Sunday, January 14, 2024

Learning to Create Taxonomies

Knowledge of what taxonomies are, what they are for, and how they are used is quite widespread, even if there are uncertainties and disagreements around the definition of “taxonomy.” People who often look up digital information are familiar with various presentations of taxonomies for selecting terms linked to content. These include hierarchical trees of topic and subtopics to browse, scroll boxes of controlled terms, type-ahead or search-suggest terms that appear below a search box after the first few letters are typed into the box, and terms or named entities grouped by various aspect types (facets) in the left margin to select from in order to limit/refine/filter search results.

Why Learn Taxonomy Creation

There is a big difference, however, between being able to use taxonomies and being able to create taxonomies.

While it is usually best to leave taxonomy creation to the experts, taxonomists are not always available, or the needed taxonomy may be small or apparently “simple,” so it may not be economical to hire a contract taxonomist or a consultant. In other situations, the taxonomy subject may be quite technical, and it would seem preferable to have subject matter experts, rather than an external taxonomist, create the taxonomy.  Thus, people who are not professional taxonomists often create taxonomies.

Generative AI now makes it easier for anyone to “generate” a taxonomy. However, knowledge of taxonomy principles is needed to make the necessary corrections and edit the taxonomy to achieve a decent level of quality. Generative AI should not be used to fully create a taxonomy (which could in fact involve extracting published taxonomies in violation of their copyright), but rather it may be used as a tool to facilitate parts of the taxonomy creation process. (See my post “Taxonomies and ChatGPT.”) The technology thus makes it easier to create taxonomies for those who are not taxonomists and have limited time for taxonomy creation tasks.

There is also the matter of taxonomy maintenance. After a contract taxonomist or consultant creates a taxonomy and leaves, the taxonomy still needs to be kept up to date, with new concepts added and others changed, and over time expanded. While documentation and guidelines written by a taxonomy consultant are helpful, a good understanding of taxonomy creation principles is also needed by anyone responsible for expanding or maintaining a taxonomy.

Finally, taxonomy creation is a collaborative effort, involving stakeholders in various roles (project management, content management, digital asset management, information technology, tagging, research, user experience, search, etc.) who are invited to contribute their perspectives. Stakeholders can provide better insights to a taxonomy if they have a better understanding of taxonomy principles. Taxonomy project managers in particular need to understand taxonomy creation even if they are not doing the actual taxonomy creation work.

How to Learn Taxonomy Creation

Fortunately, there are many resources for learning the principles and standards of taxonomy design and creation. There is, of course, my book, The Accidental Taxonomist, which, as the name implies, is intended for anyone who finds themselves, perhaps by “accident,” in a position that requires them to create, edit, or manage taxonomies.

Heather Hedden delivering a taxonomy workshop
There are also various half-day and full-day workshops at conferences, virtual short courses through professional associations and other organizations, and asynchronous online training. These usually involve some exercises for practice and provide the appropriate amount of training for getting started with creating taxonomies. I’ve offered various kinds of training, both independently and through other organizations, over the years. My current course offerings are on my website.

Upcoming Taxonomy Courses

The next live (virtual) course I will offer is a new course called “Controlled Vocabularies and Taxonomies,” offered through HS Events on GoToWebinar over four weekly sessions from February 29 through March 27. I will teach this course live (with ample time for Q&A) just once, after which it will become available as a recording for purchase.

HS (Henry Stewart) Events are best known for their dominance in the field of digital asset management (DAM), but the course I will teach is not limited to DAM professionals. Actually, this course is most appropriate for the expanding scope of HS Events, which will introduce a Semantic Data conference event, which includes the subject of taxonomies, co-located with its DAM conferences in London and in New York in 2024.

The first session is an introduction to the definitions, types, uses, benefits, and standards for taxonomies. The second deals with the project management side of planning and researching for creating controlled vocabularies and taxonomies. The third session gets into the details of creating terms and relationships. Finally, the fourth session takes up design and implementation issues. After this course takes place, the recordings will be available for purchase for on-demand viewing.

Then in June, I will be teaching a three-part weekly course, “Taxonomy Creation for Content Tagging,” through the Society for Technical Communication (STC). The focus is on taxonomies that make documents/documentation more findable, but the course is also suitable for anyone interested in learning how to create taxonomies. It will be offered on Zoom on Thursday afternoons, 4:00 – 5:30 pm EDT, June 11, 18, and 25, and the Moodle learning management system is used for additional asynchronous discussion and access to resources. Interactive exercises and live Q&A are included. I taught this course for the first time last year, but due to my increasingly busy consulting work schedule, I do not plan to teach this course again after this June. More details are on the Interactive Virtual Taxonomy Workshop page of my website.

In the future, check for my current training offerings on the Taxonomy Courses & Workshops page of my Hedden Information Management website.

Sunday, December 31, 2023

IT and Taxonomies

Taxonomies are related to many fields of work, including knowledge management, information architecture, website design, website marketing and SEO, document management, terminology management, publishing, product management (for information products), content management and strategy, digital asset management, machine learning for classification, natural language processing for auto-tagging, data management, library and information management, and information technology. Information technology is relevant to the implementation of all taxonomies.

Why is IT involved in taxonomies?

Taxonomies link users to content (and taxonomies extended into ontologies also link users to data), but this linking relies on technology. The technology could be a kind of software, such as a content management system that supports the tagging and retrieval of content by taxonomies along with the feature of taxonomy management. Often, however, additional technology is needed to link multiple software systems together, with APIs, and to move data across systems, with extract-transform-load (ETL) tools. Taxonomies are increasingly built in the SKOS (Simple Knowledge Organization System) standard/data model, which enables taxonomies and other knowledge organization systems to be machine-readable and not just human readable.
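To make “machine-readable” a little more concrete, here is a minimal sketch of how a couple of taxonomy concepts might be written out in SKOS using the Turtle syntax. It uses only the Python standard library. The `ex:` namespace, the `Concept` class, the helper names, and the sample concepts are all hypothetical, chosen for illustration; a real project would more likely export SKOS from a taxonomy management tool or an RDF library.

```python
from dataclasses import dataclass, field

# The skos: namespace is the real one; the ex: namespace is a made-up placeholder.
PREFIXES = [
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
    "@prefix ex: <http://example.org/taxonomy/> .",
]


@dataclass
class Concept:
    """A taxonomy concept with a few core SKOS properties (illustrative only)."""
    local_id: str                                  # assumed to be a valid Turtle local name
    pref_label: str                                # skos:prefLabel
    alt_labels: list = field(default_factory=list)  # skos:altLabel values
    broader: list = field(default_factory=list)     # local ids of broader concepts


def quote(label, lang):
    """Escape a label and return it as a language-tagged Turtle literal."""
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def to_turtle(concepts, lang="en"):
    """Serialize the concepts as SKOS in Turtle syntax."""
    blocks = []
    for c in concepts:
        statements = ["a skos:Concept", f"skos:prefLabel {quote(c.pref_label, lang)}"]
        statements += [f"skos:altLabel {quote(a, lang)}" for a in c.alt_labels]
        statements += [f"skos:broader ex:{b}" for b in c.broader]
        blocks.append(f"ex:{c.local_id} " + " ;\n    ".join(statements) + " .")
    return "\n".join(PREFIXES) + "\n\n" + "\n\n".join(blocks) + "\n"


concepts = [
    Concept("communication", "Communication"),
    Concept("writing", "Writing",
            alt_labels=["Written communication"], broader=["communication"]),
]
print(to_turtle(concepts))
```

The point of the sketch is only that each concept gets an identifier, a preferred label, optional alternative labels, and explicit broader relations, all in a form that other systems can parse rather than just display.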

Taxonomies are a concern of information technology professionals as they are the owners of, and often also the developers of, the systems in which taxonomies are implemented. The systems could be completely internally developed, or they could be licensed software that typically requires some customization or integration with other systems. In my experience as a taxonomy consultant, I have typically engaged in conversations with those in IT as key stakeholders of the taxonomy. However, the degree of the involvement of IT professionals in the taxonomy itself can vary.

In custom taxonomy implementations, such as in an information service/product or in an ecommerce business, IT professionals are usually not involved in the actual design of the taxonomy, but taxonomists or others who create that taxonomy need to collaborate with IT professionals to understand the system’s capabilities and limitations, and they may impose requirements of their own. Taxonomists are concerned with how the taxonomy will be displayed to the users, how the users can interact with the taxonomy, how tagging is done, and how the search functions. Custom software development has great flexibility in how it supports a taxonomy.
In implementations of taxonomies in licensed software, there may still be some development work for the IT professionals, but there are limits to what can be done or changed.

Commercial content management systems (CMS) that allow for the custom development of the user interface, referred to as “headless” CMSs, however, are becoming more common. The user interface in particular is very significant to how a taxonomy is designed and how it functions.

Who in IT is involved in taxonomies?

Those who work in IT departments and are involved in taxonomies could be in roles doing development or support for systems that manage and consume taxonomies, or they could be in systems integration roles. Additionally, there are taxonomy/metadata/ontology specialists who work within the IT department of an enterprise, especially if a knowledge/information management department does not exist in the organization.

In a survey of taxonomists I conducted in January 2022 for the 3rd edition of The Accidental Taxonomist, a multiple-choice question asked respondents what area they work in. Among the 162 people who do taxonomy work for their employers (which are not consultancies creating taxonomies for others), information technology ranked 4th out of 11 choices, with 17% of the responses, following the areas of knowledge management, content management/strategy, and product development/management, yet ahead of the specialties of library, user experience, marketing, and others.

The survey also asked all respondents to provide their job titles, and some of those working in taxonomies have job titles that are closely associated with information technology. These included titles of IT Data Analyst, Data and Technology Platform Products, SharePoint Product Owner, Senior Solutions Consultant, Implementation Project Manager, Data Architect, Senior Manager - Graph Solutions, Enterprise Architect, Staff Engineer - Systems, Information Governance Engineer, Head of Technical Services, and Director of Solutions Delivery.

What does IT do with taxonomies?

From my experience as a taxonomy consultant, I have observed that those working in IT, in their efforts to facilitate the adoption of new software and features that make use of taxonomies, may include starter taxonomies within the tool, whether selected from the offerings of the software vendor or created by the IT staff themselves. For example, IT professionals might create simple controlled vocabularies in the SharePoint term store, such as for document types, departments, locations, etc., so that users can start using the search refinements right away. This also provides an example of the functionality of a taxonomy, which can be improved upon and expanded by someone else later.

Then there is enterprise taxonomy/ontology management software, which should be connected to search systems, content management systems, and tagging systems (if not using a tagging module of the taxonomy management system). In my experience working for a taxonomy software vendor, the IT department was often involved in the software purchasing process, if not actually leading the decision-making. Representatives from the IT department attend pre-sales demos of the tool, ask questions, and compile and compare system requirements when requesting a proposal.

That taxonomy is actually an area of concern for IT was also made clear when I saw that taxonomies were mentioned in a section within a chapter on knowledge management-related systems in my son’s introductory Management Information Systems textbook for a required course for his B.S. in Information Technology.

In sum, IT professionals who support enterprise knowledge or information management systems need to have a basic understanding of taxonomy principles, standards, benefits, and uses. My website contains various taxonomy resources. Some IT professionals may even want to go further and design and create small taxonomies (lacking the time to create large taxonomies), and they may want to read my book or attend my workshops or online courses.

Thursday, November 30, 2023

Generative AI at Taxonomy Boot Camp Conference

Generative AI and large language models (LLMs), the technology behind ChatGPT, have been topics of presentations, keynotes, and attendees’ conversations at all the varied conferences I had the fortune to attend this year, including the Taxonomy Boot Camp conference held November 6-7, in Washington, DC. Taxonomy Boot Camp is the only conference dedicated to taxonomies.

Opening and Keynotes

 

Right from the beginning in the opening welcome, the conference chair Stephanie Lemieux mentioned uses of ChatGPT for taxonomy creation, such as asking these prompts: What is a category for the following list of terms? What label for a concept might be better for scientists, or better for parents? What are alternative labels for a specific concept? It has become clear that generative AI is a tool to assist taxonomists with specific tasks of a project but is not appropriate for automating the entire creation of a taxonomy. Thus, the Taxonomy Boot Camp theme this year, “Humans in the Loop,” was quite apt for the new era of generative AI, even if not specific to it.

 

The Taxonomy Boot Camp opening keynote, “Ontologies in the New Age of AI,” by Dean Allemang, was on this subject. Allemang is more of an ontologist than a taxonomist, hence the title, but he discussed both taxonomies and ontologies. Allemang made the statement that generative AI “understands” why we need a taxonomy (even if managers do not). He explained that Schema.org has put RDF on many websites, which ChatGPT “reads.” Allemang has found that ChatGPT also performs perfectly on SPARQL queries; SPARQL is the query language for data, including taxonomies, that is in RDF. Allemang gave ChatGPT query examples, such as “Return all the claims we have by claim number, open date, and close date,” and “What is the total loss of each policy where loss is the sum of loss payment, loss reserve, expense, payment, and expense reserve amount?” Allemang advised taxonomists to identify uses for taxonomies that have not been fully delivered on and use generative AI to deliver them, and if people argue that generative AI does not understand their language, taxonomists should build in a link to the taxonomy that makes generative AI understand it.

 

On the second day, Taxonomy Boot Camp registrants  attend the same shared keynote presentations with all of the KMWorld co-located conferences, and this year these mostly dealt with generative AI, including the opening keynote by Dion Hinchcliffe “Tech-Driven Enterprise Thrills & Chills: The Future of Work.” 


Regular Sessions

In addition to being mentioned in various talks, generative AI was also the subject of a session, “ChatGPT, Taxonomist: Opportunities & Challenges in AI-Assisted Taxonomy Development,”  which comprised two separate presentations.

In this session, Xia Lin presented “Chat GPT and Generative AI for Taxonomy Development,” in which he discussed the steps involved in using ChatGPT in two case studies. In one, a taxonomy for data analytics projects of a small business was developed by providing ChatGPT with the scope of the first level of the taxonomy and then asking ChatGPT to expand individual categories by adding subcategories and then to add definitions of terms and categories. The results were reviewed and revised by experts. But Lin did not stop there. He showed the results of asking ChatGPT to provide stakeholder interview questions around a category, and (for those more technically inclined) how to create a ChatGPT plug-in for various defined functions of taxonomy creation, using ChatGPT’s APIs.

Also in “ChatGPT and Generative AI for Taxonomy Development,” Marjorie Hlava and Heather Kotula jointly presented on issues with the use of ChatGPT, both to create taxonomies and in general. They explained risks involving bias, plagiarism, ethics, data quality, matching the generated taxonomy to the content, and the amplification of errors upon repeating a prompt. Regarding plagiarism, for example, if you ask ChatGPT to return a complete taxonomy on a subject domain, it may return a copyrighted taxonomy that cannot be reused without a license.

Generative AI also impacts the topics of other presentations. For example, in the presentation “In Taxonomy We Trust: Building Buy-In for Taxonomy Projects,” Bonnie Griffin mentioned the importance of “continually re-introducing the value of taxonomy, as generative AI captures attention.” It was also the subject of a debate question in somewhat humorous closing sessions “Taxonomy Showdown—Point/Counterpoint With Taxonomy Experts.”

 

More on Taxonomies and AI

Of course, there is more to AI than just generative AI. Other sessions dealt with machine learning for auto-categorization. These included presentations by both Bob Kasenchak and Rachael Maddison in the session “Machine Learning Is Coming for Your Taxonomy” (link to Bob’s slides) and Wytze Vlietstra’s presentation of “Vision for Modular Taxonomy Product at Elsevier,” in which the program included “shared infrastructure supported by AI-based decision support tools.” In fact, AI has been a theme of Taxonomy Boot Camp in the past, in 2018. It is generative AI based on large language models that is new.

For some more details on how this technology may be used for taxonomy development, see my blog post from this spring, “Taxonomies and ChatGPT.” To get another perspective on this conference, check out the recent blog post by Taxonomy Boot Camp speaker Mary Katherine Barnes, “Integrating AI: Insights from KMWorld 2023.”

Tuesday, October 31, 2023

Taxonomies for Learning and Training Content

Taxonomies are primarily for tagging digital content to make it more easily found when users search or browse on taxonomy concepts. Content can be of various kinds: articles and research reports, policies and procedures, technical documentation, product information, contracts and other legal documents, marketing content, etc. A growing area of digital content is instructional or training content, especially corporate training for employees.

The need for taxonomies for training content

When an organization offers its employees a large number of training courses, it can be difficult for employees to find desired training. Having the training content tagged with controlled terms from a taxonomy makes it easier to find.

The training content may come from different sources and thus may come with different, inconsistent metadata already applied to it. An organization may have generic training (such as on diversity and information security) produced by a corporate training company, industry-specific training (such as anti-money laundering for financial services and retail industries) produced by a different training company, and company-specific training which is internally produced. An organization may also subscribe to an offering of business skills and technical skills training offered by one or more third parties, such as LinkedIn Learning. It may be very difficult to search across all these different sources.

Furthermore, simply searching on words in training course titles might not be effective, if topics are broad or the course titles are vague. For example, a search on “communication” may yield far too many results to sort through. A search on “writing” might miss a training course with a title of “Bringing out Your Voice” or “Use Plain Language.” Tagged with the concept of “Writing,” these courses can then be found.
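The difference between searching title words and searching on a tagged concept can be illustrated with a small Python sketch. The two course titles quoted above are used as-is; the other two titles, the tag assignments, and the function names are hypothetical and only there to make the comparison visible.

```python
# Hypothetical mini-catalog: each course has a title and a set of taxonomy tags.
courses = [
    {"title": "Bringing out Your Voice", "tags": {"Writing"}},
    {"title": "Use Plain Language", "tags": {"Writing"}},
    {"title": "Business Writing Basics", "tags": {"Writing"}},        # made-up title
    {"title": "Running Effective Meetings", "tags": {"Meetings"}},    # made-up title
]


def search_titles(query, catalog):
    """Naive keyword search: match the query as a substring of the title."""
    q = query.lower()
    return [c["title"] for c in catalog if q in c["title"].lower()]


def search_tags(concept, catalog):
    """Taxonomy-based search: match courses tagged with the concept."""
    return [c["title"] for c in catalog if concept in c["tags"]]


print(search_titles("writing", courses))  # only the title that contains the word
print(search_tags("Writing", courses))    # every course tagged with the concept
```

The title search finds only the course that happens to use the word, while the tag search retrieves all three courses about writing, regardless of how their titles are worded.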

Faceted taxonomies for training content

Sample faceted taxonomy for training content in PoolParty

For the complexities of training content, a single topical taxonomy is not enough. There could be ambiguity as to the skill level or between training topic and training format. For example, the topic of “Manager training” is not clear as to whether it is for new managers or all managers. The topic of “Presentation slides” is not clear as to whether it is training on how to create presentation slides or if presentation slides is the training format/medium. This is where a faceted taxonomy can help. Facets are different aspects of content which can be combined as search filters.

Training content is especially well suited for facets. Examples of possible facets for training content are: Content type, Level, Role, Skill, Training program, and Topic. An example of a taxonomy term in each facet is as follows:
•    Content type: Video training
•    Level: Intermediate
•    Role: Customer support
•    Skill: Written communication
•    Training program: Upskilling
•    Topic: Timeliness
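How facets combine as search filters can be sketched in a few lines of Python. This is an illustration under assumptions, not a description of any particular product: the course titles and their tags are invented, and the matching rule (a course must match at least one selected term in every facet the user has filtered on) is one common convention rather than the only possible one.

```python
# Hypothetical courses, each tagged with terms from several facets.
courses = [
    {
        "title": "Answering Support Tickets Clearly",     # made-up title
        "facets": {
            "Content type": {"Video training"},
            "Level": {"Intermediate"},
            "Role": {"Customer support"},
            "Skill": {"Written communication"},
        },
    },
    {
        "title": "First Steps in Support Writing",         # made-up title
        "facets": {
            "Content type": {"Video training"},
            "Level": {"Beginner"},
            "Role": {"Customer support"},
            "Skill": {"Written communication"},
        },
    },
    {
        "title": "Coaching Your Team",                      # made-up title
        "facets": {
            "Content type": {"Workshop"},
            "Level": {"Intermediate"},
            "Role": {"Manager"},
            "Skill": {"Coaching"},
        },
    },
]


def filter_courses(catalog, selections):
    """Return titles matching every selected facet (OR within a facet, AND across facets).

    `selections` maps a facet name to the set of terms the user has checked.
    """
    results = []
    for course in catalog:
        tagged = course["facets"]
        if all(tagged.get(facet, set()) & chosen for facet, chosen in selections.items()):
            results.append(course["title"])
    return results


print(filter_courses(courses, {"Role": {"Customer support"}}))
print(filter_courses(courses, {"Role": {"Customer support"}, "Level": {"Intermediate"}}))
```

Each additional facet selection narrows the result set, which is what lets a user start broad (a role) and refine (a level, a content type) without having to guess keywords.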

It’s important to keep in mind that facets should be mutually exclusive, so the same concept, such as “Customer support,” cannot exist in both the Role and the Skill facets. Distinguishing a role and a skill can sometimes be difficult. It is important to separate out Role, though, because then there is the possibility to recommend training courses based on one’s Role.
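A check for mutual exclusivity is easy to automate once the facets are available as lists of terms. The following Python sketch is a hypothetical helper, not a feature of any specific tool; it compares labels case-insensitively, which is an assumption, and it would not catch the same concept appearing under two different labels.

```python
from collections import defaultdict


def find_shared_concepts(facets):
    """Return labels that appear in more than one facet.

    `facets` maps a facet name to a set of term labels. Labels are compared
    case-insensitively (an assumption made for this sketch).
    """
    seen = defaultdict(set)
    for facet_name, terms in facets.items():
        for term in terms:
            seen[term.casefold()].add(facet_name)
    return {label: sorted(names) for label, names in seen.items() if len(names) > 1}


# Hypothetical draft facets with one deliberate overlap.
draft = {
    "Role": {"Customer support", "Manager"},
    "Skill": {"Customer support", "Written communication"},
    "Level": {"Beginner", "Intermediate"},
}
print(find_shared_concepts(draft))
```

Any label the check reports needs a human decision: keep it in one facet only, or rename one of the two (for example, a role versus a skill with a more specific label).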

Taxonomy facets are based on metadata properties, but there likely exist many more metadata properties than are needed for the end-user to filter training content searches. Additional, administrative metadata properties should not be implemented on the front-end for course searches. These might include Organizational unit, Original source, Region, Access Level, etc.

Skills taxonomy sources and challenges

Developing a skills taxonomy facet has its own challenges. First of all, there are multiple goals of skills taxonomies. Enabling employees or their managers to find appropriate training is just one goal. Other purposes may be to describe job openings to be found by candidates with matching skills, to find an expert with a desired skill to ask questions of or have work on a project, or to map roles and skills to identify gaps and improve human resources strategies and professional development programs.

There are also varied sources for skills taxonomies. Managers and subject matter experts would list certain skills, which might differ from a list of skills proposed by human resources staff. A taxonomist, metadata specialist, or information architect working on a taxonomy would come up with a slightly different list of skills, probably not as detailed. Finally, there are external sources, but these might not be appropriate to a specific organization. The largest, best known published taxonomy of skills is ESCO (European Skills, Competences, Qualifications, and Occupations), but with 13,890 skills, it is much too large and detailed for any one organization. It might be best to start with any skills list that the HR department has and build it out further with recommendations from managers, but not as detailed as some subject matter experts might suggest. External sources could be consulted to fill in some gaps.

There is the potential to get too detailed in creating a hierarchy of skills, and some of the narrower concepts may end up being specific topics and not exactly skills. For example, a skill of project management could get narrower concepts for different project management methodologies and then various components of each methodology. This would not be appropriate for a skills taxonomy, although, if important, these narrower concepts could be included in a Topics facet instead.

Presentations on taxonomies for corporate training content

My most recent conference presentation and my next conference presentation are both about taxonomies for corporate training content.  On October 16, I presented at the LavaCon content strategy conference in San Diego “Leveraging Semantics to Provide Targeted Training Content: A Case Study,” which was jointly presented with PoolParty software proof-of-concept project customer Esther Yoon of Google gTech. In addition to some of the issues described in this blog post, I also discussed how facets can be customized and how roles and skills can be linked for recommendation, and Esther presented how the POC improved the discovery of training content for those in roles related to customer support.

On November 6, at the Taxonomy Boot Camp conference in Washington, DC, I will present “Challenges in Creating Taxonomies for Learning & Development,” which will be jointly presented with Amber Simpson of Walmart’s Walmart Academy, also a PoolParty software customer. In addition to issues described here, I will also provide specific examples of challenges in creating a Skills taxonomy facet. The slides will also be made available afterwards.


Saturday, September 30, 2023

SEMANTiCS Conference 2023: Taxonomies, Knowledge Graphs, and LLMs


The most recent conference I participated in was SEMANTiCS, September 20-22, in Leipzig, Germany. This was the 19th year of this European conference focused on the application of semantic technologies and systems. This was also my fourth year presenting a workshop/tutorial on taxonomies and ontologies at the conference. The widespread value of taxonomies across different areas of specialization is indicated by the fact that taxonomy workshops are repeatedly a part of conferences on various subjects, including semantics, knowledge management, library and information science, information architecture, content strategy, and digital asset management.


Semantics and taxonomies

Semantics means “meaning,” so semantic systems utilize standards to support the encoding of meaning of things/resources and their relations, making the semantics machine-readable. Various standards, guidelines, and data models for semantic systems were developed for what is called the Semantic Web. The Semantic Web goes beyond the simple hyperlinks of the World Wide Web to label shared metadata and specify the kinds of relations. This supports linked data, and the linking of taxonomies to other taxonomies and ontologies and their tagged content or data, which are stored on different servers.


Just as World Wide Web protocols have been adapted within enterprises (“behind the firewall”), so have Semantic Web standards. You don’t have to share your data publicly to reap the benefits of the Semantic Web: open standards to enable the migration of taxonomies and related data between systems, sharing of data with partners, extracting and transforming data from within silos across the enterprise into a standard format, and the ability to link to data on the Web to bring in new content even if not sharing content out on the Web.


Taxonomies, as controlled vocabularies, have always been about concepts, each with a unique, understood meaning, not just words or strings of text. So, using taxonomies is using semantics. The Semantic Web standard SKOS (Simple Knowledge Organization System) specifies a data model to make taxonomies and other knowledge organization systems (thesauri, classification systems, etc.) machine-readable and interchangeable on the Web. Semantic Web standards also cover ontologies with RDF-Schema and OWL. By following Semantic Web standards, taxonomies can easily be linked to and extended with ontologies, and then by linking to data stored in a graph database, knowledge graphs can be built.
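The idea of linking a taxonomy to tagged content in a graph can be sketched with plain subject-predicate-object triples in Python. This is a toy in-memory illustration, not a graph database: the `ex:` and `doc:` identifiers are hypothetical, `dcterms:subject` stands in for whatever tagging property a real system uses, and the hierarchy walk is written by hand. (SKOS itself defines `skos:broader` as non-transitive and offers `skos:broaderTransitive` for this kind of traversal.)

```python
# Each triple is (subject, predicate, object); all identifiers are made up.
triples = [
    ("ex:Writing", "skos:broader", "ex:Communication"),
    ("ex:EmailWriting", "skos:broader", "ex:Writing"),
    ("doc:course-17", "dcterms:subject", "ex:EmailWriting"),
    ("doc:course-42", "dcterms:subject", "ex:Communication"),
]


def narrower_transitive(concept, graph):
    """Collect every concept below `concept` by following skos:broader links."""
    found = set()
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for s, p, o in graph:
            if p == "skos:broader" and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def content_about(concept, graph):
    """Find content tagged with the concept or with any of its narrower concepts."""
    concepts = {concept} | narrower_transitive(concept, graph)
    return sorted(s for s, p, o in graph if p == "dcterms:subject" and o in concepts)


print(content_about("ex:Communication", triples))
print(content_about("ex:Writing", triples))
```

Because the hierarchy and the tagging live in the same graph, a query on a broad concept can also pick up content tagged only with a narrower one, which is one of the practical benefits of connecting a taxonomy to data.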


The SEMANTiCS conference

The SEMANTiCS conference is somewhat unique in being semi-academic and semi-industry. It has separate academic track and industry track chairs and additional tutorials and workshops. It’s good to bring academia and industry together in a field like this, where research topics can be applied and partnerships can be developed. The location of the conference varies, and it partners with a local higher education institution for logistical support, with graduate students volunteering to help in exchange for getting access to sessions.


This was the second year that SEMANTiCS combined its conferences with the Language Technology Industry Association, which organized a Language Intelligence track, dealing with technologies for the management of terminology, multilingual content, and machine translation. The conference also includes a one-day track focused on DBpedia, which is not the same first day as the tutorials and workshops. The entire conference lasts three full days, and has a social event one evening, and a dinner on the second evening.  


The conference has industry vendor sponsors, about eight of which were exhibiting, and a few more which did not exhibit. There are also slightly more organizations which are “partners,” including DBpedia, The Alan Turing Institute, and a number of institutes of higher education in Europe which have programs in semantic technologies. Additional organizers include Semantic Web Company, Institut für Angewandte Informatik, and the Vrije Universiteit Amsterdam, representing the three countries where SEMANTiCS has been taking place: Austria, Germany, and the Netherlands.


SEMANTiCS 2023

The 2023 conference was held September 20-22 in Leipzig, Germany, under the leadership of a new chair Sahar Vahdati of Technical University Dresden. There were about 285 participants in person and about one-third as many online. The conference has been hybrid since 2021. There were often six simultaneous sessions. Themed tracks or sessions of multiple speakers included Knowledge Graphs, Reasoning & Recommendation, Natural Language Processing and Large Language Models, Legal & Data Governance, Ontologies Data Management, and Environmental-Social-Governance (ESG). While there was not a life sciences track like last year, there was a themed subject track on cultural heritage. LLMs and ESG were both new topics this year. Poster presentations also covered the range of topics. 


Knowledge graphs are a regular theme at this conference, but this time there was the addition of LLMs. The opening keynote was “Generations of Knowledge Graphs: The Crazy Ideas and the Business” presented by Xin Luna Dong of Meta. She spoke of three generations of knowledge graphs: entity-based knowledge graphs, text-rich knowledge graphs, and dual neural knowledge graphs, using an ontology and LLMs. The second day’s keynote was “Knowledge Graphs in the Age of Large Language Models,” presented by Aidan Hogan of the University of Chile. LLMs and AI topics were also presented in the Knowledge Graphs track, such as in Andreas Blumauer’s talk “Responsible AI and LLMs.” Finally, the moderated closing panel was “Large Language Models and Knowledge Graphs: Status Quo - Risks - Opportunities” with panelists Andreas Blumauer and Jochen Hummel from software vendors and Kristina Podnar, a digital policy consultant, who were not completely in agreement.


In addition to my 3-hour tutorial, “Knowledge Engineering of Taxonomies and Ontologies,” only slightly updated from last year, I also contributed, along with Lutz Krüger, to Andreas Blumauer’s new 3-hour tutorial “The Key to Sustainable Enterprises: ESG, Knowledge Graphs, and Digitalization.” Adopting an ESG program and complying with upcoming ESG directives requires connecting a lot of information and data and aligning it with requirements and disclosure categories, and this is where a knowledge graph can be extremely helpful. Other tutorials and workshops dealt with data spaces, ontology reasoning, healthcare NLP, NLP for knowledge graph construction, and FAIR ontologies.


Past and future

Semantic technologies were very new when the conference was first launched in 2005 by Semantic Web Company, even before launching its product PoolParty Semantic Suite. But it’s never been a vendor product-based conference. The main purpose was and still is to promote the understanding and advancement of semantic technologies. Competitor software vendors sponsor and exhibit, and Semantic Web Company has stepped back from a lead organizational role. The conference is not one where sponsors do business selling their products or services, but rather one for raising awareness, making and reinforcing partnerships, exchanging ideas, and general networking, including looking for work. It is more of a community conference than anything else, but it is an open, welcoming community, with new people coming every year.


The next SEMANTiCS, celebrating its 20th year, will be September 16 - 18, 2024, in Amsterdam.