Wednesday, May 20, 2026

Hierarchies and Attributes in Taxonomies

One of the challenges in creating hierarchical taxonomies is that there can be multiple ways to categorize concepts and thus design hierarchies. There are multiple methods to deal with this, including polyhierarchy and facets. Now that taxonomies are more often extended with ontologies, attributes can also be used for additional “classifications” of things.

Dealing with multiple hierarchies


The traditional way of dealing with multiple ways of categorizing concepts has been to put the concepts into a “polyhierarchy,” which means that a concept has more than one broader concept and thus belongs to more than one hierarchy. The occasional polyhierarchy is acceptable, but if polyhierarchy becomes extensive (numerous concepts belong to the same two hierarchies) because of different methods of classification, it does not serve the purpose of helping users find the concepts and tagged content they want. When everything is in a polyhierarchy, the guiding purpose of a hierarchy gets lost.
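To make the distinction between occasional and extensive polyhierarchy concrete, here is a minimal Python sketch. It assumes a simple, hypothetical dictionary of broader relations (not the data model of any particular taxonomy tool) and lists every path from a top concept down to a given concept, plus the share of concepts that have more than one broader concept.

```python
# Hypothetical concepts: each concept maps to its list of broader concepts.
BROADER: dict[str, list[str]] = {
    "Vehicles": [],
    "Cars": ["Vehicles"],
    "Electric vehicles": ["Vehicles"],
    "Electric cars": ["Cars", "Electric vehicles"],  # polyhierarchy: two broader concepts
    "Sedans": ["Cars"],
}


def paths_to_top(concept: str) -> list[list[str]]:
    """Return every path from a top concept down to `concept`."""
    parents = BROADER[concept]
    if not parents:
        return [[concept]]
    return [path + [concept] for parent in parents for path in paths_to_top(parent)]


def polyhierarchy_share() -> float:
    """Fraction of concepts that have more than one broader concept."""
    poly = [c for c, parents in BROADER.items() if len(parents) > 1]
    return len(poly) / len(BROADER)


for path in paths_to_top("Electric cars"):
    print(" > ".join(path))
print(f"{polyhierarchy_share():.0%} of concepts are polyhierarchical")
```

Here "Electric cars" appears in two places (under Cars and under Electric vehicles). One such concept is harmless, but as the share grows, browsing either hierarchy becomes less useful as a guide.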

When the issue is multiple classifications for things, then what is known as “faceted classification” is often the answer. A faceted taxonomy design involves designating a facet for each method by which things are classified. For example, products may have facets for brand name, product type, functional use/application, industry market, user type, etc.

Sometimes, however, there may seem to be more possible ways of organizing or classifying something than are practical for facets. This can happen even within a single facet. For example, if you have a facet for product type, you could further classify the product types by product family, by generic product type (narrower “is a” sub-type of the broader), by broader system of which they are a component (narrower “is a part of” the broader), by size, or by a certain key feature or characteristic.

Recently on a project, a client suggested adding a level of hierarchy within the facet for named product models, based on a classifying feature that affected product size. The problem was that this would combine named entities (proper nouns) of product models and generic types within the same facet. This combination should be avoided in facet design, because facets enable users to search and filter by different methods, such as by name or by type, and there are scenarios when users would choose one over the other. Combining types and named entities in the same facet can cause confusion. This is where an ontology model may be the solution.

Ontologies for further classification

Ontologies enable customized relationships between classes (which tend to be the same type of high-level grouping as a facet) and customized attributes for members of classes. When we think of ontologies, we usually think of the custom relationships, but custom attributes can support what could be considered “types.” These “types” might otherwise have been extra hierarchies, so attributes provide a solution to the multiple classification problem.

If multiple methods of hierarchical classification seem to be overlapping, you should consider making one or more of them attributes instead. In my recent consulting example, the client had originally proposed top concepts for grouping product models by a classifying feature that affected product size, but we decided this would work better as an attribute of the product models. So, the facet would contain only named-entity product models, and the hierarchy would be by model family only.

If an ontology is defined as a formal naming and definition of the types, properties, and interrelationships of entities in a particular domain, we might think we have to define everything in the domain, making the creation of an ontology a large, complex project. Often, what we need is only “some” ontology. While using the features, rules, and data model of an ontology, we need to define only the types, properties, and interrelationships that are required for a business purpose. This could mean defining just a few custom attributes (properties) without adding any custom relationships.

More information about attributes is in my prior blog post, "Taxonomies and Attribute Data."

Examples

In the prior example, the product model feature had originally been proposed for the hierarchy for the purpose of “grouping,” because users might want to look up the product models by that feature. If implemented in a knowledge graph, the attributes, managed in an ontology, will also support users looking up entities by their attributes.  So, the hierarchical design is not necessary.

Any “groupings” of named entities (by region, size, role, etc.) should be reconsidered as attributes of the named entities. Other examples are groupings of vehicles by engine type, which could have engine type as an attribute instead, or groupings of appliances by energy type, which could have the fuel type as an attribute instead. So, instead of having Electric cars as a narrower concept of both Cars and Electric vehicles, Electric, Internal combustion, and Hybrid would be values of an engine type attribute for Cars.
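The difference between a grouping hierarchy and an attribute can be sketched in a few lines of Python. This is a minimal illustration, assuming hypothetical model names and plain Python objects standing in for entities and properties that, in a real knowledge graph, would be defined in an ontology (for example, as RDF properties). The facet hierarchy is built by model family only, while engine type is a simple attribute that still lets users look up models by it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductModel:
    name: str          # named entity (proper noun), e.g., a car model
    family: str        # the only hierarchical grouping kept in the facet
    engine_type: str   # custom attribute instead of an extra hierarchy level


# Hypothetical models, for illustration only.
MODELS = [
    ProductModel("Model A1", "A series", "Electric"),
    ProductModel("Model A2", "A series", "Hybrid"),
    ProductModel("Model B1", "B series", "Internal combustion"),
    ProductModel("Model B2", "B series", "Electric"),
]


def by_family(models: list[ProductModel]) -> dict[str, list[str]]:
    """Build the facet hierarchy: model family -> named models."""
    tree: dict[str, list[str]] = {}
    for m in models:
        tree.setdefault(m.family, []).append(m.name)
    return tree


def with_attribute(models: list[ProductModel], **criteria: str) -> list[str]:
    """Look up named models by attribute values, with no hierarchy needed."""
    return [m.name for m in models
            if all(getattr(m, key) == value for key, value in criteria.items())]


print(by_family(MODELS))
print(with_attribute(MODELS, engine_type="Electric"))
```

Because engine type is an attribute, it can be combined with any other attribute in a lookup (for example, family and engine type together) without multiplying hierarchies or mixing generic types into a facet of named entities.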

Conclusions

Shared data model standards based on RDF (Resource Description Framework) and the use of dedicated taxonomy/ontology management software that combines taxonomies with ontologies make this solution of using ontology features to resolve multiple hierarchies easy to attain. Instead of thinking that we could extend a taxonomy into an ontology in the future, we should be thinking of how to design a knowledge model now that best serves the body of knowledge and the users.


Thursday, April 30, 2026

Taxonomy Boot Camp London 2026

I was thrilled to participate in the Taxonomy Boot Camp London conference, which was held in person in London this past month for the first time since 2019. A sister conference to Taxonomy Boot Camp in the United States, which has been running since 2005, Taxonomy Boot Camp London had been running half-day “Bite-Sized” online editions three times per year since 2020, which were so successful that they continued through last year. The online edition will now continue once a year, next scheduled for October 7.

Taxonomy Boot Camp London has been successfully chaired by London-based taxonomy consultant Helen Lippell since its first year. She summarized this year’s conference: “I pushed the boundaries of my own knowledge and got to see a huge range of talks by our wonderful speakers …. Our workshops gave attendees the perfect grounding in foundational concepts too.”

As taxonomy is a niche specialty that is applied in other, related fields, the Taxonomy Boot Camp conference is always combined (co-located) with other conferences operated by Information Today Inc. In the United States, this has always been with KMWorld (knowledge management) and additional co-located conferences. For Taxonomy Boot Camp London, from 2016 to 2019 the conference had been co-located with Internet Librarian International to bring in enough attendance to make use of the venue and catering, but the conferences were not similar enough in content or attendance, and they did not share keynotes, exhibits, or breaks. This year, for the first time, a new conference, KMWorld Europe, was launched, and Taxonomy Boot Camp was fully combined with it, sharing keynotes, meals and breaks, exhibit space, and registration options. This made a lot more sense, due to the overlap of taxonomies and knowledge management. Personally, I also enjoyed seeing knowledge management colleagues, in addition to taxonomy colleagues, at the conference.


Conference Sessions

The format of the conference was the same as in previous years. After shared keynotes each day, the conference runs in two tracks. Unlike the tracks at Taxonomy Boot Camp in Washington, DC (beginner and case study tracks, with two tracks on the first day only), the tracks are based on loose themes, which this year were “Components of Successful Semantic Projects”; “Joining Up Data With Semantics”; “Getting the Most of Curating Content, Data, and AI”; and “Taking Structure to the Next Level.” It was difficult to decide what to attend, and I moved between tracks often.

Heather Hedden speaking at Taxonomy Boot Camp London, 2026

Taxonomy Boot Camp London differs from Taxonomy Boot Camp (DC) by including preconference workshop options on the afternoon before the main conference. There were four workshops to choose from in the single time slot, two for Taxonomy Boot Camp, and two for KMWorld. “Taxonomy Design Fundamentals,” which I taught, and “Finding a Forever Home: Governance, Ownership, & the Long-Term Care of Taxonomies” were the two taxonomy workshops. 

The keynote speakers, Ben Clinch on the first day and Noz Urbina on the second day, were both excellent, taking different angles on the topic of AI in knowledge management while also touching on taxonomy.

What was interesting about the conference sessions was the diversity of presentation subjects. While some provided the expected information on how to create good taxonomies (including my joint presentation with Joseph Busch on “Thesaurus Standards for Taxonomies”) and others were case study applications of taxonomies, there were additional, different topics. Bob Kasenchak of Factor presented an interesting perspective on semantic layers as abstraction layers. Teodora Petkova of Graphwise presented on how to embed meaning and consistency in content to support knowledge graphs and shared understanding. Craig Johnson of Xemma presented on how research was done to obtain taxonomist-user input in designing a new taxonomy management system.

Connecting to other knowledge organization systems was a common topic, with presentations on the connections of taxonomies and ontologies by Steve McComb of Semantic Arts and by Paul Appleby and Ravinder Singh, both of Graphifi; the intersection of taxonomies and terminologies by Jo Chapman; and taxonomies as metadata by Yonah Levenson.

There were, of course, numerous sessions on AI use in taxonomy building. Ahren Lehnart spoke about ways to identify the best concepts out of those being suggested by machine learning and LLMs. Panos Mitzias of Squirro presented on how AI can help accelerate tasks like concept discovery, drafting structures, and enriching taxonomies, but success still depends on clear scoping, stakeholder engagement, and ongoing governance. Fran Alexander of Expedia presented on various considerations regarding the use of LLMs in taxonomy creation, including provenance, traceability, authoritativeness, context, and the use of multiple LLM agents. Fran, Bob Kasenchak, and Stephanie Lemieux came together for an impromptu panel discussion on the use of AI in taxonomy creation (filling in for a cancelled speaker). They spoke on the various positive uses of AI and the ways in which AI is still not so good. I found this panel the most interesting, so I decided to submit such a panel topic for Taxonomy Boot Camp in Washington, DC, this November.

Sessions are not recorded, but most of the slides are available on the conference website. Ahren Lehnart also blogged on the conference themes. 

Conference Details

The joint conferences had a total of about 250 attendees, compared with 170 for Taxonomy Boot Camp London alone in prior years. (It’s not possible to break out Taxonomy Boot Camp registrants alone, since many chose an “all access pass” to both conferences.) The international aspect was great, with representatives from 29 countries.

For the first time, the London conference (Taxonomy Boot Camp and KMWorld jointly) had a nearby off-site networking drinks reception the evening after the workshops and before the main conference. The semi-enclosed rooftop bar was a great place to meet and mingle. 

The conference venue location was better than in previous years, being in central London, close to the Tower of London. The only issue was that the conference organizers were not sure how many attendees to expect, so they were conservative with the space, which turned out to be a little tight. Although there was enough seating in the conference session rooms (barely), the showcase area, which was also where breakfast, lunch, and break refreshments were served, became quite crowded at times. So, it was sometimes challenging to meet people and visit all the exhibitors.

The vendor showcase was larger and had better dedicated space compared to the former Taxonomy Boot Camp London in-person events. I recall the two or three vendors back then having tables just outside the conference room doors. The dedicated showcase space, where breakfast, lunch, and coffee breaks were served, was a benefit for the exhibitors. As the venue was on the basement level, excavated ancient Roman walls were on display behind the exhibits. More taxonomy/ontology software vendors were present than in the past: Graphwise (formerly PoolParty), Squirro (vendor of Synaptica), Graphifi (vendor of Graphologi), and a brand new entrant, Xemma. The taxonomy/ontology vendors were mixed in with the knowledge management vendors without distinction, and it was good to have this crossover to learn more about what is available.

Taxonomy Boot Camp in London and the United States

The scope of subjects and themes of Taxonomy Boot Camp London is the same as at Taxonomy Boot Camp in the United States, but many of the presenters are different, with different case studies and stories to tell, and those presenters who are the same (like myself) do not give the same presentations at both conferences. The attendees (delegates) are also different. So, if you're just getting started with taxonomies, either Taxonomy Boot Camp London or Taxonomy Boot Camp in Washington, DC, whichever is more convenient, is appropriate. If taxonomies are your profession, then you should try to attend each conference at least once. It’s worth the trip. I am looking forward to Taxonomy Boot Camp London / KMWorld Europe next time in April 2027.

Helen Lippell reflected: “I thoroughly enjoyed seeing the event come to fruition after all the hard work the team put in over the last year, and one of my abiding memories will be walking around after the last sessions seeing everyone just chatting away while the venue staff tried to tidy up! I take this as a sign of our community being in rude health and ready to grow in future years.”


Monday, February 23, 2026

Taxonomy Sources: Re-Used, Licensed, or AI-Generated

As a taxonomist, I often write about creating taxonomies from scratch, but in practice, many organizations obtain at least some taxonomies or controlled vocabularies from other sources. Although internal content about an organization’s business, products, or services requires mostly custom taxonomies, some taxonomies, such as for regions or technologies, may come from other sources. Content that comes from external sources, such as research articles, is also appropriate for tagging with taxonomies from other sources.

For “other sources,” these could be:

  • Governmental agencies or nongovernmental organizations that publish taxonomies, thesauri, and subject heading schemes for their own purposes but make them freely available

  • Companies which sell their taxonomies

  • Taxonomies that are generated by AI


Taxonomies for Re-Use or License

Types of taxonomies available can be categorized in multiple ways that overlap:

  • available for free or for a fee
  • available for commercial re-use or not available for commercial re-use
  • permissible to modify or not permitted to modify
  • designed and created for a specific content set or intended for broader use

I had previously blogged on taxonomies for license, discussing the issues of fees, availability for re-use, and permission for modification. Now I want to focus on the issue of using a taxonomy created for a specific purpose. 


Recently, I worked for a client that had created taxonomies for the life sciences industries with sections based on branches of the National Library of Medicine’s Medical Subject Headings (MeSH), because it was free. MeSH, however, had been designed for indexing medical research literature, and it turned out not to be suitable for my client’s purpose of helping biomedical and pharmaceutical companies find articles relevant to their business and market.

For example, MeSH organizes drug types by their chemical types (Heterocyclic Compounds, Enzymes and Coenzymes, etc.). For a biomedical drug discovery company or a pharmaceutical company, however, the focus and classification of drugs is instead based on what kind of disease they treat (Cancer Drugs, Alzheimer’s Drugs, etc.). Thus, using concepts from MeSH is not very suitable for a pharmaceutical industry taxonomy.


Previously, I worked at Gale, which developed and managed many controlled vocabularies (or taxonomies) for indexing periodical and reference literature, which it sold to libraries. For a time, Gale also offered for license subject-domain subsets of its subject thesaurus of over 10,000 preferred terms. I realized that the business terms to index articles in business news sources were not necessarily the same terms that a company would want to tag its business documents and intranet pages. Others seemed to realize this too, and Gale didn't sell any stand-alone taxonomy licenses as long as I worked there. 


Taxonomies that are designed purely for sale and not designed with specific content and user types in mind are more suitable for licensing and re-use. I’ve seen a few small-scale examples of this with sets of keywords for sale for tagging photos. The only commercial business I am aware of that licenses full taxonomies (with alternative labels and multiple hierarchies) in various business and industry domains is WAND. These taxonomies, enriched with alternative labels (synonyms/variants), are a decent way to get started. The taxonomies can then be edited or supplemented as needed. WAND taxonomies, which are manually developed, are particularly useful for product and services categories in various industries.

AI-Generated Taxonomies

When I first explored the use of GenAI to create taxonomies (described in my prior blog post), I felt that the results were quite inadequate, as LLMs were pulling from multiple sources, where the same term could have different meanings in different contexts, different terms could refer to the same thing, and even the hierarchy would vary for different use cases.


More recently, I’ve used ChatGPT and Claude and found that the results, especially when focused in areas of science, technology, and medicine, have improved with respect to specific taxonomy hierarchies. Even when I did not ask for a taxonomy, the LLMs often returned respectable three-level hierarchies of concepts in such topic areas as medical devices, drug types, and cell receptors. I also found AI tools useful for disambiguating similar terms or providing synonyms for technical terms I was not sure of.


AI-generated taxonomies are a potential competitor to WAND’s taxonomies for sale, but this depends on the size and subject area. The WAND taxonomies are large and detailed in the number of concepts, hierarchical levels, and alternative labels, and they have already been expertly created by humans. Using AI to create taxonomies works better on single hierarchical trees, and it always requires human editing to refine and complete the taxonomies. Hierarchies and alternative labels are created in separate steps. For multiple smaller taxonomies or taxonomy facets, AI is likely more practical than licensing full taxonomies.


So, it shouldn’t be a surprise that taxonomy management software is starting to integrate GenAI and LLMs to automate taxonomy creation. For example, Graphwise Modeling (formerly PoolParty) introduced a Taxonomy Advisor feature in 2024, which allows users to request suggestions for narrower concepts, alternative labels, and definitions. This month, Graphwise announced the additional Taxonomy Builder feature, which enables the generation of a complete taxonomy hierarchy. It can be used for small or larger portions of the taxonomy, as needed, and it’s convenient to have the capabilities within a single tool. It also takes care of the prompt creation, based on the existing hierarchy, the user-entered description of the taxonomy, and any additional instructions. I do not create taxonomy hierarchies with AI tools often enough to become good at writing the best prompts, so I appreciate it when a tool helps with that. There will be more about this later, as I am working on a white paper and will be speaking in a webinar in April on GenAI/LLMs in taxonomy creation.

When to use Other Sources

As mentioned previously, taxonomies published by external sources are best used for content from external sources. When it comes to AI-generated taxonomies, though, it’s not necessary to generate an entire taxonomy, hierarchy, or facet. AI methods are quite suitable for smaller components of a taxonomy, such as the narrower concepts of a single concept. As such, AI uses in taxonomy development are more widely applicable, including for enterprise taxonomies. For example, AI could be useful for generating a list of document types for a document type facet, and then, after review, those AI-suggested document types that are not applicable can be removed. The starter list of terms can get people thinking of what might be missing, which is easier than trying to come up with a list of terms from scratch.


In conclusion, an AI-generated taxonomy, after human review and editing, is usually a better solution than a licensed taxonomy that was created for a different purpose, such as using MeSH for the commercial side of healthcare. A taxonomy that is partially or fully generated by AI using multiple sources and appropriate prompts (such as those built into Taxonomy Builder) is typically a better source than a taxonomy that was created for a specific and different use case, or than a taxonomy whose license prohibits editing or commercial re-use. If you choose to generate taxonomies with AI, I am happy to offer my services to review and edit them!

Saturday, January 31, 2026

What a Taxonomy is Not

Although taxonomies have become increasingly common within enterprises and on websites, they are not always well understood. Taxonomies are sometimes confused with other knowledge organization systems, such as classification systems, website navigation schemes, business glossaries, or ontologies.


A taxonomy is a controlled, structurally organized set of unambiguous concepts, which may describe content, information, or data, and which users may be interested in querying. A taxonomy links users to the information they seek by bringing together various users’ terms with the terms that occur in the content or data. Prior to the emergence of modern taxonomies in applications for digital information, indexes at the back of printed books had been serving a similar role (and they still do). I have already written a blog post on Taxonomy Definition, so to further clarify what taxonomies are, it is also useful to explain what taxonomies are not.

Taxonomies are not the same as classification systems/schemes (such as industrial classification codes for economic analysis or medical classifications for health data collection or health insurance purposes), as the latter have mutually exclusive classes to which items are assigned for non-redundant analysis. Classification thus allows comparison, analysis, identification, location, and other actions associated with things based on their class. Taxonomies are organized sets of concepts tagged to content or associated with data, where the taxonomy organization serves merely to help find the desired concept or to provide context for tagging. As a result, a concept may have more than one broader concept and thus appear in more than one place in the taxonomy hierarchy.


Taxonomies are not the same as navigation systems, which are common in websites or web applications. A taxonomy is more similar to an index, while a navigation system is more similar to a table of contents. Menu labels in a navigation system can link to only one page, whereas concepts in a taxonomy are tagged to multiple pages, content items, or data records. Navigation systems are only used in browsing, but taxonomies may be both browsed and searched for their concepts. Navigation systems reflect paths and established links to content, whereas taxonomies comprise concepts that become metadata when tagged to content. Navigation systems, like classification systems, are not frequently or easily changed, whereas taxonomies can grow and change continuously, as needed.
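The structural difference between a menu and a taxonomy can be sketched as two data structures. This minimal Python example uses hypothetical menu labels, page paths, and concepts: a navigation menu is a one-to-one mapping from label to page, while taxonomy tagging is many-to-many and can be inverted into a concept-to-pages index, much like the back-of-the-book index mentioned above.

```python
from collections import defaultdict

# Hypothetical navigation: each menu label links to exactly one page.
NAVIGATION: dict[str, str] = {
    "Products": "/products",
    "Support": "/support",
}

# Hypothetical tagging: each page is tagged with any number of taxonomy concepts.
TAGS: dict[str, set[str]] = {
    "/products": {"Electric cars", "Warranties"},
    "/support": {"Warranties"},
    "/blog/charging-tips": {"Electric cars", "Charging"},
}


def concept_index(tags: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert page -> concepts tagging into concept -> pages, like an index."""
    index: defaultdict[str, set[str]] = defaultdict(set)
    for page, concepts in tags.items():
        for concept in concepts:
            index[concept].add(page)
    return dict(index)


index = concept_index(TAGS)
print(NAVIGATION["Products"])          # one label, one page
print(sorted(index["Electric cars"]))  # one concept, several pages
```

Note that "/blog/charging-tips" is not reachable from either menu label, but it is still findable through the concepts it is tagged with.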


Taxonomies are not the same as business glossaries, which are lists of terms of relevance to an organization’s business along with their definitions, although there is usually considerable overlap between the terms an organization gathers for its glossary (or glossaries) and the concepts in its taxonomy. Not only is there usually the difference of a taxonomy’s hierarchical structure (although categories could be assigned to glossary terms), but the ultimate objectives differ, resulting in differences in the scope of term inclusion. A business glossary includes terms that are important to the business but may not be understood by everyone, so definitions need to be provided. There could be terms of importance that need no definition, such as Marketing, so they are not included in the glossary. Technical terms and acronyms are usually included. A taxonomy, on the other hand, includes only the terms/concepts for which there are sufficient documents, pages, or content items to be tagged for retrieval. Sufficient content on a subject is a leading criterion for including a concept in a taxonomy.


Finally, taxonomies are not the same as ontologies. The confusion between the two may arise because taxonomies and ontologies are increasingly used in combination, and software (now referred to as TOMS, for taxonomy-ontology management system) allows you to create a taxonomy and ontology as a single project or knowledge model. An ontology can be an upper-level model of a knowledge domain, but domain-specific ontologies may include multiple hierarchical levels of subclasses, and thus include what are essentially taxonomies. A taxonomy, however, can stand on its own without an ontology and serve the functions of tagging and retrieval via browsing and/or searching without the extension of an ontology. Ontologies support complex, multi-part queries involving relations, and they support reasoning and inference, which taxonomies do not. Each utilizes different data models: SKOS for taxonomies and RDFS and OWL for ontologies.
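The contrast between the two data models can be sketched with plain RDF-style triples. This is a simplified, hypothetical example (the `example.org` resources are invented, and real tools use RDF libraries and reasoners rather than Python tuples), although the SKOS, RDF, and RDFS namespace URIs are the real W3C ones. The taxonomy part uses SKOS concepts, labels, and `skos:broader`; the ontology part uses classes and `rdfs:subClassOf`, from which a type can be inferred by repeatedly applying the RDFS subclass rule (rdfs9).

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
EX = "http://example.org/"  # hypothetical namespace for illustration

Triple = tuple[str, str, str]

# Taxonomy (SKOS): a concept with labels and a broader concept, for tagging and retrieval.
taxonomy: set[Triple] = {
    (EX + "ElectricCars", RDF + "type", SKOS + "Concept"),
    (EX + "ElectricCars", SKOS + "prefLabel", "Electric cars"),
    (EX + "ElectricCars", SKOS + "altLabel", "Battery-electric cars"),
    (EX + "ElectricCars", SKOS + "broader", EX + "Cars"),
}

# Ontology (RDFS): classes, subclasses, and an instance.
ontology: set[Triple] = {
    (EX + "Car", RDFS + "subClassOf", EX + "Vehicle"),
    (EX + "Vehicle", RDFS + "subClassOf", EX + "Thing"),
    (EX + "ModelA1", RDF + "type", EX + "Car"),
}


def infer_types(triples: set[Triple]) -> set[Triple]:
    """Apply the RDFS subclass rule until nothing new is inferred:
    if x rdf:type C and C rdfs:subClassOf D, then x rdf:type D."""
    inferred = set(triples)
    while True:
        new = {
            (x, RDF + "type", d)
            for (x, p, c) in inferred if p == RDF + "type"
            for (c2, q, d) in inferred if q == RDFS + "subClassOf" and c2 == c
        }
        if new <= inferred:
            return inferred
        inferred |= new


print((EX + "ModelA1", RDF + "type", EX + "Vehicle") in infer_types(ontology))
print(infer_types(taxonomy) == taxonomy)  # nothing is inferred from the SKOS triples
```

The SKOS triples are fully usable on their own for tagging and retrieval; it is the RDFS class axioms that make inference possible, which is one way to see why a taxonomy does not need an ontology but an ontology adds reasoning.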


Prior blog posts I have written that compare taxonomies to other knowledge organization systems in more detail are: