The Accidental Taxonomist: Hierarchical taxonomy

Wednesday, May 20, 2026

Hierarchies and Attributes in Taxonomies

One of the challenges in creating hierarchical taxonomies is that there can be multiple ways to categorize concepts and thus design hierarchies. There are multiple methods to deal with this, including polyhierarchy and facets. Now that taxonomies are more often extended with ontologies, attributes can also be used for additional “classifications” of things.

Dealing with multiple hierarchies

The traditional method of dealing with multiple methods of categorizing concepts has been to put the concepts into a “polyhierarchy,” which means the concept has more than one broader concept, and thus belongs to more than one hierarchy. The occasional polyhierarchy is acceptable, but if a polyhierarchy becomes extensive (numerous concepts belong to the same two hierarchies) due to different methods of classification, this does not serve the purpose of helping users find the concepts and tagged content desired. When everything is in a polyhierarchy, the guiding purpose of a hierarchy gets lost.
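To check whether polyhierarchy has become extensive, one option is to count how many concepts share the same pair of broader concepts. The following is a minimal sketch, assuming the taxonomy is available as a simple mapping of each concept to its broader concepts. The sample data and the function name polyhierarchy_report are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical mapping of concept -> broader concepts (illustrative data only).
broader = {
    "Electric cars": ["Cars", "Electric vehicles"],
    "Electric trucks": ["Trucks", "Electric vehicles"],
    "Sedans": ["Cars"],
    "Electric sedans": ["Cars", "Electric vehicles"],
}

def polyhierarchy_report(broader):
    """Return the polyhierarchical concepts and a count of shared parent pairs."""
    # A concept is polyhierarchical if it has more than one broader concept.
    poly = {c: sorted(parents) for c, parents in broader.items() if len(parents) > 1}
    pair_counts = Counter()
    for parents in poly.values():
        for pair in combinations(parents, 2):
            pair_counts[pair] += 1
    return poly, pair_counts

poly, pair_counts = polyhierarchy_report(broader)
print(sorted(poly))
print(pair_counts.most_common(1))
```

A pair of broader concepts with a high count suggests two competing methods of classification, which may be a sign that a facet or an attribute would serve better.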

When the issue is multiple classifications for things, then what is known as “faceted classification” is often the answer. A faceted taxonomy design involves designating a facet for each method of classifying things. For example, products may have facets for brand name, product type, functional use/application, industry market, user type, etc. Each of these could be a facet for products.
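As a rough illustration of how facets work in retrieval, the sketch below filters a small product list by any combination of selected facet values. The product records, facet names, and the function filter_by_facets are hypothetical, and each product is assumed to have a single value per facet.

```python
# Hypothetical product records; each key other than "name" is a facet.
products = [
    {"name": "Model A", "brand": "Acme", "product_type": "Pump", "industry": "Mining"},
    {"name": "Model B", "brand": "Acme", "product_type": "Valve", "industry": "Mining"},
    {"name": "Model C", "brand": "Zenith", "product_type": "Pump", "industry": "Agriculture"},
]

def filter_by_facets(items, **selected):
    """Keep only the items that match every selected facet value."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in selected.items())
    ]

# Selecting values in two facets narrows the result set.
print([p["name"] for p in filter_by_facets(products, brand="Acme", product_type="Pump")])
```

Because each facet is applied independently, users can start from whichever method of classification they prefer and combine others as needed.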

Sometimes, however, there may seem to be more possible ways of organizing or classifying something than are practical for facets. It could be within a facet. For example, if you have a facet for product type, you could further classify the product types by product family, by generic product type (narrower “is a” sub-type of the broader), by broader system of which they are a component (narrower is a part of the broader), by size, or by a certain key feature or characteristic.

Recently on a project, a client suggested an added level of hierarchy within the facet for named product models for a classifying feature that impacted the product size. The problem was that this would combine named entities (proper nouns) of product models and generic types within the same facet. This combination should be avoided in facet design, because facets enable users to search and filter by different methods, such as either by name or by type, and there are scenarios when users would choose one over the other. Combining types and named entities in the same facet can cause confusion. This is where an ontology model may be the solution.

Ontologies for further classification

Ontologies enable customized relationships between classes (which tend to be the same type of high-level grouping as a facet) and customized attributes for members of classes. When we think of ontologies, we usually think of the custom relationships, but custom attributes can support what could be considered “types.” These “types” might have been extra hierarchies, and thus attributes provide a solution to the multiple classification problem.

If multiple methods of hierarchical classification seem to be overlapping, you should consider making one or more attributes instead. In my recent consulting case example, what the client originally proposed as top concepts for grouping product models (as a classifying feature impacting the product size), we decided would work better as an attribute of the product models. So, the facet would contain only named entity product models, and the hierarchy would be by model family only.

When an ontology is defined as a formal naming and definition of the types, properties and interrelationships of entities in a particular domain, we might think we have to define everything in the domain, and thus creating an ontology is a large, complex project. Often, what we need is only “some” ontology. While using the features, rules, and data model of an ontology, we need to define only the types, properties, and interrelationships that need to be defined for a business purpose. This could be defining just a few custom attributes (properties) without even adding any custom relationships.

More information about attributes is in my prior blog post, “Taxonomies and Attribute Data.”

Examples

In the prior example, the product model feature had originally been proposed for the hierarchy for the purpose of “grouping,” because users might want to look up the product models by that feature. If implemented in a knowledge graph, the attributes, managed in an ontology, will also support users looking up entities by their attributes. So, the hierarchical design is not necessary.

Any “groupings” of named entities (by region, size, role, etc.) should be reconsidered as attributes of the named entities. Other examples are groupings of vehicles by engine type, which could have engine type as an attribute instead, or groupings of appliances by energy type, which could have the fuel type as an attribute instead. So, instead of Electric cars being narrower to both Cars and Electric vehicles, Electric, Internal combustion, and Hybrid would be attribute values for Cars.
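The car example can be sketched in a few lines of code. This is only an illustration under stated assumptions: the Concept class, the model names, and the attribute name engine_type are all hypothetical, and a real implementation would define the attribute in an ontology managed in taxonomy/ontology software.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Concept:
    label: str
    broader: Optional[str] = None                    # the single hierarchy
    attributes: dict = field(default_factory=dict)   # extra "classifications"

# Hypothetical car models: one broader concept each, engine type as an attribute.
models = [
    Concept("Roadster X", broader="Cars", attributes={"engine_type": "Electric"}),
    Concept("Tourer Y", broader="Cars", attributes={"engine_type": "Hybrid"}),
    Concept("Cruiser Z", broader="Cars", attributes={"engine_type": "Internal combustion"}),
    Concept("City E", broader="Cars", attributes={"engine_type": "Electric"}),
]

def find_by_attribute(concepts, name, value):
    """Look up concepts by an attribute value instead of by a second hierarchy."""
    return [c.label for c in concepts if c.attributes.get(name) == value]

print(find_by_attribute(models, "engine_type", "Electric"))
```

Users can still look up the electric models, but no concept needs a second broader concept to make that possible.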

Conclusions

Shared data model standards based on RDF (Resource Description Framework) and the use of dedicated taxonomy/ontology management software that combines taxonomies with ontologies make this solution of using ontology features to resolve multiple hierarchies easy to attain. Instead of thinking that we could extend a taxonomy into an ontology in the future, we should be thinking of how to design a knowledge model now that best serves the body of knowledge and the users.

Thursday, September 18, 2025

Narrower Terms vs. Alternative Terms

A number of years ago I worked on a project of cleaning up a large taxonomy on occupations and job titles. My client contact was sometimes confused between terms to be used as synonyms/variants for a preferred term and terms to be used as narrower terms to a preferred term. This initially surprised me, because the difference seemed so obvious. A more recent project raised the issue again, and I now realize the challenges.

The word “term” can be confusing, considering the different types of terms that exist. Both variant terms (also called synonyms, nonpreferred terms, or entry terms) and narrower terms are kinds of terms. By contrast, when the focus is on concepts that may have various labels, the distinction between a concept’s narrower concepts and its alternative labels is quite clear. The widely adopted SKOS (Simple Knowledge Organization System) data model standard follows the concept-based approach. SKOS is now followed by all dedicated taxonomy management software systems.

Many taxonomies, however, are not yet managed in dedicated taxonomy management systems but rather in spreadsheets or internally developed tools, neither of which follows SKOS. This was the case for both my projects in question. Each “term” in the spreadsheet-based tool had its own row, which resulted in multiple rows for the same concept. Broader categories were in another column to the right. This format is potentially confusing because the variants appeared in a column as did the hierarchical levels, and you had to remember which column was which.

Regardless of the tool used, what makes it even more confusing is that a narrower concept could be either a variant term or a hierarchically narrower term. What may variously be called synonyms, variants, nonpreferred terms, entry terms, or alternative labels are not merely literal synonyms, but they could be any terms or labels that may be used in tagging to trigger the use of the concept or preferred term. This includes terms whose meaning is narrower or more specific than the term/concept in question, since the latter includes more specific terms within its scope. So, tagging the occurrence of a concept with a “broader” concept is acceptable.

For example, in a medical taxonomy a concept can be Radiation therapy. Radiotherapy is an alternative label. But then there are specific types of radiation therapy, such as Brachytherapy, Radioimmunotherapy, and Radionuclide therapy. These could be added to the taxonomy either as narrower concepts or as alternative labels to Radiation therapy, depending on how specific the taxonomy should be.

When creating or editing a taxonomy, it is often difficult to decide how specific the taxonomy should be in certain places. Terms that are too specific to warrant use as concepts should then be relegated to the status of variants/alternative labels. Deciding what is too specific depends on the concept’s relative specificity within the entire taxonomy in addition to considering the potential usage of the specific concept.

In sum, if you are not ready to adopt SKOS-based taxonomy management software, at the very least you should adopt a SKOS-based approach in conceptualizing and labeling your taxonomy. Call things “concepts” and “labels,” not “terms.” Concepts are in hierarchical relationships to each other. Labels are the names for concepts. The “preferred label” is the displayed form of the name (such as in facets in the front-end application), and “alternative labels” are variant labels to match against strings of text that may be used for the concept and trigger tagging with the concept. Furthermore, alternative labels could be displayed differently from preferred labels, such as in italics and/or in a differently colored cell.
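The concept-and-label approach can be illustrated with a small sketch based on the Radiation therapy example. It assumes the more specific therapy types were added as alternative labels rather than as narrower concepts. The Chemotherapy entry, the sample sentences, and the function names are hypothetical, and the simple string matching stands in for whatever tagging method is actually used.

```python
import re

# Preferred label -> alternative labels (following the SKOS idea of one
# concept with several labels). Illustrative data only.
concepts = {
    "Radiation therapy": ["Radiotherapy", "Brachytherapy", "Radioimmunotherapy"],
    "Chemotherapy": ["Chemo"],
}

def build_label_index(concepts):
    """Map every label, preferred or alternative, to its concept's preferred label."""
    index = {}
    for pref, alts in concepts.items():
        for label in [pref, *alts]:
            index[label.lower()] = pref
    return index

def tag_text(text, index):
    """Return the preferred labels of all concepts whose labels occur in the text."""
    lowered = text.lower()
    found = set()
    for label, pref in index.items():
        # Word boundaries avoid matching a label inside a longer word.
        if re.search(r"\b" + re.escape(label) + r"\b", lowered):
            found.add(pref)
    return sorted(found)

index = build_label_index(concepts)
print(tag_text("The patient received brachytherapy last year.", index))
```

A text mentioning the more specific Brachytherapy is tagged with the concept Radiation therapy, which is what an alternative label with a narrower meaning is meant to do.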

Monday, March 31, 2025

Customizing Taxonomy Hierarchies

Taxonomies need to be custom-created for their purposes to be most effective. Basically, a taxonomy comprises the concepts or terms that reflect the subject domain of the content that will be tagged and retrieved with the aid of that taxonomy. Taxonomies must also be customized to the requirements (or limitations) of the implemented search technology and the user interface, and ideally the taxonomy is also customized to the needs and preferences of the users. This includes taxonomy design aspects of size, degree of detail, use of synonym/variants, use of hierarchy, and implementation as facets.

Taxonomy customization usually focuses on the concepts/terms/labels and not so much on the exact hierarchy of grouping narrower concepts under broader concepts, other than perhaps limiting the number of hierarchical levels. While the selection and definition of concepts depends on the context of the content, the hierarchical relationships between concepts are typically independent of any specific content and are usually dependent only on the context of the taxonomy itself. Such a context-independent hierarchy is what enables a single taxonomy to be used for multiple different content items of different content creators. This is also the approach used in designing classification systems, which are intended for broad, generic use.

Why Customize Hierarchy

However, a customized taxonomy may be designed for a rather specific body of content, and then the hierarchy may depend on the context of that overall body of content, if not the specific content items. For example, the concept “Piano” is often considered narrower to “Musical instruments”, but in certain contexts it may be narrower to “Furniture,” such as for the contexts of interior design, furnishing a bar or restaurant, or for moving and storage services. Furthermore, I would not always recommend that “Piano” be narrower to both broader concepts in the same taxonomy (a taxonomy feature known as “polyhierarchy”), because the same taxonomy might not be used for both contexts. It depends.

When structuring a taxonomy hierarchy, the use and purpose of the hierarchy needs to be considered. A hierarchy is not created simply because it’s a taxonomy and thus traditionally has hierarchy. Possible uses of hierarchy include:

Supporting browsing and navigation to guide users to the desired concept.
Providing context for concepts to support tagging, whether manual or automated.
Enabling “recursive” or “rolled up” retrieval, so that a user’s selection of a concept retrieves not only what has been tagged to that concept but also what has been tagged to all of its narrower concepts.
Enabling expansion of a search, so that if there are too few or no results for a specific concept, the retrieval set can be expanded to content tagged with the broader concept and/or other narrower concepts of it.
Instructing users on the appropriate classification and organization of information.

Usually, the same hierarchy can support all of the above goals, although occasionally there are conflicting needs.
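Two of the uses above, rolled-up retrieval and search expansion, can be sketched as follows. This is a simplified illustration that assumes each concept has at most one broader concept. The concept names, the tagged items, and the function names are hypothetical.

```python
# Hypothetical hierarchy: broader concept -> narrower concepts.
narrower = {
    "Events": ["Music events", "Training"],
    "Music events": ["Concert", "Music rehearsal"],
}

# Hypothetical tagging: concept -> identifiers of content tagged with it.
tagged = {
    "Music events": {"img1"},
    "Concert": {"img2", "img3"},
    "Music rehearsal": set(),
    "Training": {"img4"},
}

def descendants(concept, narrower):
    """Collect all narrower concepts of a concept, at every level."""
    found = []
    stack = list(narrower.get(concept, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.append(current)
            stack.extend(narrower.get(current, []))
    return found

def rolled_up(concept, narrower, tagged):
    """Retrieve content tagged with the concept or any of its narrower concepts."""
    docs = set(tagged.get(concept, set()))
    for child in descendants(concept, narrower):
        docs |= tagged.get(child, set())
    return docs

def expanded(concept, narrower, tagged, min_results=1):
    """Fall back to the broader concept when there are too few results."""
    docs = rolled_up(concept, narrower, tagged)
    if len(docs) >= min_results:
        return docs
    for parent, children in narrower.items():
        if concept in children:
            return rolled_up(parent, narrower, tagged)
    return docs

print(sorted(rolled_up("Music events", narrower, tagged)))
print(sorted(expanded("Music rehearsal", narrower, tagged)))
```

Both calls return the same three items: the first by rolling up the narrower concepts, the second by expanding an empty result set to the broader concept.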

Customizing Hierarchy Example

The need for customizing hierarchy became especially clear to me in a recent taxonomy consulting project I did for the business of event venue space rentals. Types of spaces (structures, rooms, etc.) were grouped under broader concepts by their potential use, rather than by structural type. To a lesser extent, events or activities for spaces were also sometimes grouped by the type of space that might be suitable. For example, a generic taxonomy might include “Dance class” and “Technical training” both under the same broader concept for “Classes/training,” but because these different types of classes need different kinds of spaces, in this taxonomy they were put in different parts of the taxonomy hierarchy. “Dance class” was made narrower to “Dance event,” and “Technical training” was made narrower to “Training.”

The hierarchy of concepts used in a taxonomy to tag images may also be structured differently than a taxonomy for tagging text content. In this case, for example, the broader grouping concepts “Small meeting” and “Large event” had been created, which may not seem logically needed when the range in number of guests was an additional search attribute/filter. However, these concepts are quite useful for tagging images that may depict a small or large event but do not utilize counts of people. Another example is grouping the activities of music rehearsals/practices along with music performance events under the same broader concept of “Music events.” Although the activities of organizing rehearsals and organizing performances are quite different from each other, the venues that are suitable for each and their images are similar.

Despite their similarities in scope and concepts, a taxonomy for venue rentals should not be the same as a taxonomy for real estate of long-term lease or sale of properties (focusing on the space but agnostic to the use), nor for events management (focusing on the details of events and less so on space), nor equipment sales and rentals (focusing on the equipment and less on the use). Even when the concepts are the same, the hierarchy may differ. While the inclusion of concepts and their labels should consider the content, the design of the hierarchy should consider the taxonomy’s use.

Sunday, March 24, 2024

History of Modern Information Taxonomies

The word “taxonomy” was coined in 1813 by the Swiss botanist A. P. de Candolle, who developed a new method of classifying plants. The word is derived from the combination of Greek words τάξις (taxis), meaning “order” or “arrangement,” and νόμος (nomos), meaning “method” or “law.” The designation of taxonomy was then applied after the fact to Carl Linnaeus’ binomial nomenclature system that had been published under the title Systema Naturae initially in 1735.

Today’s information taxonomies have their origins in a combination of classification systems, library subject heading schemes, and literature retrieval thesauri, and thus have features that combine all of these. Despite their name, information taxonomies are closer to subject heading schemes and thesauri than they are to classification systems.

Classification systems

Classification systems have a multi-level hierarchy of classes, where a subclass is fully contained in its parent class, and consequently members of a subclass are also members of the parent class. Members (things) can belong to only one class, though. Historic examples include:

Linnaean classification of organisms (1735-1758)
Paris Bookseller's classification (1842)
International Classification of Diseases (originally Bertillon Classification of Causes of Death, 1860)
Dewey Decimal Classification (1876) and other library classifications
Industry classification systems:

Standard Industrial Classification System (U.S.) (1937)
International Standard Industrial Classification (U.N.) (1948)

The requirement that a thing (an organism, book, document, medical diagnosis, economic establishment) can go into only one class supports various purposes, which are not for information retrieval:

Understanding an organism’s evolutionary background; identifying potential medicinal herbs
Locating and reshelving a book on its shelf
Performing health data analysis from hospital records; billing health insurance companies appropriately
Doing economic analysis of industries by aggregate establishment data

When it comes to information resources, classification systems may be used to determine in what (virtual) file folder a document belongs, or to support machine-learning based auto-classification.

Classification systems are also useful for data analysis, since content or records are assigned to only one classification, and this prevents any double counting. Large, data-heavy organizations might have developed their own internal classification systems for data tracking purposes. Such classifications do not serve the same purpose as a tagging/information retrieval taxonomy and should not substitute for a taxonomy but rather exist alongside it for separate purposes.
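The point about double counting can be shown with a small comparison. In this sketch, with hypothetical records and tags, each establishment has exactly one class, so the class counts add up to the number of records. Documents tagged with multiple taxonomy concepts do not add up in the same way.

```python
from collections import Counter

# Classification: each establishment is assigned exactly one class.
classified = {"est1": "Manufacturing", "est2": "Retail", "est3": "Retail"}

# Tagging: each document may be tagged with several concepts.
tagged = {
    "doc1": ["Retail", "E-commerce"],
    "doc2": ["Retail"],
    "doc3": ["Manufacturing", "E-commerce", "Logistics"],
}

class_counts = Counter(classified.values())
tag_counts = Counter(tag for tags in tagged.values() for tag in tags)

# Class counts sum to the number of records, so totals are safe to aggregate.
print(sum(class_counts.values()), len(classified))
# Tag counts exceed the number of documents, because documents are counted once per tag.
print(sum(tag_counts.values()), len(tagged))
```

This is why a classification system suits aggregate data analysis, whereas a tagging taxonomy suits retrieval, where an item is supposed to be findable in more than one way.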

Subject heading schemes

Subject heading schemes were developed to help people find books and later also articles on various subjects with more detail and flexibility for growth than classification systems. Subject headings are used for cataloguing and indexing, not for classification. Unlike classification (for shelf location), in which an item has only one class, an item (book, article, other media) can have multiple subjects.

Features of subject heading schemes:

Alphabetical arrangement of a very large number of subjects and/or named entities (proper nouns)
Cross-references of See (Use) and See also (Related)
Headings with large numbers of citations broken down to group the citations by a sub-heading or subdivision, in what is also called pre-coordination. For example, China – Foreign relations.

Back-of-the-book indexes, whose format evolved over the first half of the 20th century, follow a similar style.

Examples of early subject heading schemes:

Library of Congress Subject Headings (1898) and other national library systems
U.S. National Library of Medicine’s Medical Subject Headings (1954)

Library subject headings were adopted for periodical article indexes early on. The Readers’ Guide to Periodical Literature published by the H. W. Wilson Company had been using subject headings, including subdivisions and cross-references, since shortly after its introduction in 1901 (as can be seen in the 1900-1905 cumulative index excerpted in the screenshot below).

(The two-digit years are from the prior century.)

Eventually, subject heading schemes adopted thesaurus features of Broader term, Narrower term, and Related term relationships, as was the case for Library of Congress Subject Headings, starting in 1985. Thus, subject heading schemes and thesauri have become very similar. The name “heading” in subject headings implies that there also exist some sub-headings/subdivisions, a feature which is not typical of thesauri, though.

Thesauri

Information thesauri (in contrast to a dictionary thesaurus, like Roget’s) emerged in the mid-20th century outside of libraries for the more specialized subject needs of the federal government, scientific publishers, and technology companies. The word “thesaurus” was first used to refer to a controlled vocabulary, as a set of words/terms, not classification codes, for information retrieval in the 1950s.

Early thesauri include:

E. I. Dupont de Nemours Company’s thesaurus (1959)
Thesaurus of Armed Services Technical Information Agency (ASTIA) Descriptors, U.S. Department of Defense (1960)
Chemical Engineering Thesaurus, published by the American Institute of Chemical Engineers (1961)

Additional professional organization publishers of scientific journals created their own thesauri in the 1960s. Dialog, the first online information service for article citations, which also utilized thesauri of information publishers, was launched in 1966.

Soon thereafter, standards for thesauri were developed and published:

UNESCO Guidelines for the establishment and development of monolingual thesauri (1970)
DIN 1463 (Deutsches Institut für Normung) Guidelines for the establishment and development of monolingual thesauri (1972)
ISO 2788 Guidelines for the establishment and development of monolingual thesauri (1974) (superseded by ISO 25964-1 2011)
ANSI American National Standard for Thesaurus Structure, Construction, and Use (1974) (superseded by ANSI/NISO Z39.19 1993)

Modern information taxonomies

The word “taxonomy” for a hierarchical structure (like a classification scheme) of terms for tagging and retrieval (like a thesaurus) gradually became popular in the 1990s. These new taxonomy-like thesauri became popular, largely due to advancements of software and website user interfaces to enable interactive displays of hierarchies. Taxonomies had the same primary purpose as thesauri, which is information findability and retrieval, but taxonomy implementations introduced new designs for browsing and expanding hierarchies. It was found that “taxonomy” also tended to resonate with business audiences better than “thesaurus.” A market for business and commercial taxonomies started to be recognized by software vendors and by consultants by the end of the 1990s.

Combining an interactive user interface with a database enabled the introduction of dynamic filters or refinements of searches by selected taxonomy terms based on different aspects, and thus faceted taxonomies emerged and have since become a popular, if not dominant, implementation of taxonomies for many different use cases. Faceted taxonomies, by combining search terms for refinement, do not need to be as large and detailed as thesauri.

As for the next chapter in the history of taxonomies, that involves a convergence with ontologies. You can read more about that in my past blog article “Taxonomies vs. Ontologies.”

Monday, May 29, 2023

Taxonomies and ChatGPT

ChatGPT, generative AI, and large language models (LLMs) are hot topics of interest in fields of data, information, and knowledge management. LLMs dominated the keynote presentations and the networking conversations at the Knowledge Graph Conference in New York and were also discussed in presentations and panels of this conference and of Data Summit in Boston, both of which I attended this month. The technology is relevant to taxonomies as well.

ChatGPT is the user interface application on top of GPT (Generative Pre-Trained Transformer), a publicly available LLM developed by OpenAI, which is now in version 4. ChatGPT is thus a form of generative AI, in how it generates answers. There are many other LLMs (neural network-based AI, trained with deep learning on very large volumes of text), including those which are proprietary, restricted, or for non-commercial research, but only some have generative AI user interfaces. Although we may think of generative AI as providing answers to questions, it can do a lot more, including tasks related to taxonomies.

Organizing terms into hierarchies

Building a taxonomy is a combination of top-down design (identifying the top concepts or facets) and bottom-up building (identifying specific concepts from content analysis). The top level of a taxonomy is designed to serve user needs and thus should be based on stakeholder interviews, surveys, and brainstorming workshops, which is not something ChatGPT can do. The bottom-up building of a taxonomy, based on terms extracted from content or from search logs, may benefit from some AI involvement.

I have made a few test requests of ChatGPT for “Put the following list of terms into a hierarchical taxonomy…,” and the results are bulleted lists with indented narrower concepts. ChatGPT can also generate a taxonomy in machine-readable SKOS, in a requested RDF serialization format, as Bob DuCharme explained in his May 20 blog post “Getting ChatGPT to turn a flat vocabulary list into a hierarchical taxonomy.”
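To make concrete what a machine-readable SKOS version of an indented hierarchy looks like, here is a minimal sketch that writes a small nested hierarchy as SKOS in the Turtle serialization. It is an illustration only, not the output of ChatGPT or of any particular tool. The example.org namespace, the sample concepts, and the function names are hypothetical, and a complete SKOS taxonomy would normally also declare a concept scheme.

```python
import re

# Hypothetical hierarchy as nested dictionaries: label -> narrower concepts.
hierarchy = {
    "Musical instruments": {
        "Keyboard instruments": {"Piano": {}},
        "String instruments": {},
    }
}

BASE = "http://example.org/taxonomy/"  # hypothetical namespace

def slug(label):
    """Turn a label into a simple identifier, e.g. 'String instruments' -> 'string-instruments'."""
    return re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")

def to_skos_turtle(hierarchy, base=BASE):
    """Serialize the nested hierarchy as SKOS concepts in Turtle syntax."""
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        f"@prefix ex: <{base}> .",
        "",
    ]

    def walk(tree, parent=None):
        for label, children in tree.items():
            lines.append(f"ex:{slug(label)} a skos:Concept ;")
            if parent is not None:
                lines.append(f"    skos:broader ex:{slug(parent)} ;")
            # Labels containing double quotes would need escaping; omitted here.
            lines.append(f'    skos:prefLabel "{label}"@en .')
            walk(children, label)

    walk(hierarchy)
    return "\n".join(lines)

ttl = to_skos_turtle(hierarchy)
print(ttl)
```

Output in this form can be imported into SKOS-based taxonomy management software, which is what makes a requested RDF serialization more useful than a bulleted list.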

Like card sorting exercises, you can specify the top categories/concepts (like a “closed card sort”), or you can let ChatGPT create the top categories (like an “open card sort”). In any case, better results come with context, of course, so you should also tell ChatGPT what the subject domain or context is. Asking for a hierarchical taxonomy results in a third level of hierarchy sometimes, and not just a single level of grouping. Near duplicates usually appear next to each other in the list, and the taxonomist can then decide if and how to merge them into a single concept.

It is particularly for long lists of terms that automated methods can save the taxonomist’s time. If a taxonomist comes up with terms based on manual content analysis, stakeholder interviews, or submitted lists from subject matter experts, the term lists tend not to be very long, and even the process of coming up with the terms tends to include some thoughts toward categorization at the same time. Longer term lists (such as several hundred terms) are derived from automated term extraction (using text analytics technologies) across a corpus of dozens or hundreds of documents and from search log reports. ChatGPT is practical for putting these long lists of terms into draft hierarchies. There are inevitably some taxonomic errors in the results, which should be obvious to any taxonomist. For example, I have seen duplicated terms on different levels of the hierarchy.

In both lists of extracted terms and search log lists, terms occur that are not suitable as concepts for a taxonomy, such as verbs and adjectives or vague words. ChatGPT understands grammatical rules, so my prompt also says “Include in the taxonomy only nouns and noun phrases and omit the other terms.”

Generating alternative labels (“synonyms”) for concepts

Asking ChatGPT to “provide a list of synonyms for…” a given term can also be helpful for coming up with alternative labels for taxonomy concepts. Alternative labels should be customized for the context of the content and users, so alternative labels for a concept will vary from one taxonomy to another, and an external source, such as ChatGPT, should not be relied upon as the only source for alternative labels, but merely as a supplemental source of suggestions to be considered.

Again, context can help and should be provided. I asked “Provide a list of synonyms for ‘healthcare’” and got 20 terms. But then when I asked “Provide a list of synonyms for health care, meaning the industry,” I received a slightly more focused list of 15 terms. Interestingly, the two-word variant “health care” was not on the list, so “synonyms” is understood by ChatGPT to mean different words with the same meaning and not orthographic variations. Nevertheless, even 15 terms are too many, and the taxonomist should select from the list of suggestions. It might be a good idea to then test search the suggested alternative labels in the content and system being used.

Although by strict definition a “synonym” is a single word with the same meaning as another word, ChatGPT provides acceptable synonyms for terms which are multi-word phrases, or synonymous multi-word phrases, such as “Chemical manufacturing and distribution” provided as a synonym for “chemical industry.”

Other taxonomy-related uses of ChatGPT

Getting help in designing an ontology (a more complex, yet high-level semantic model with defined classes of concepts, customized relationships, and attributes) is also possible with ChatGPT or other LLMs. Again, submitting the request multiple times with slight variations will yield multiple different responses for the ontologist to consider and select ideas from. Ontologies are not expressed in simple text, though, so the prompt request should specify the format, such as RDF Turtle (TTL). Dean Allemang, author of Semantic Web for the Working Ontologist, has written multiple articles (medium.com/@dallemang) recently on ChatGPT and ontologies/knowledge graphs.

ChatGPT can also be used for comparing lists of terms, data conversion, and basic coding, which may be useful for taxonomists who lack coding skills. It can convert taxonomy or ontology data from one data format to another (although taxonomy/ontology management software also imports/exports in multiple formats). Taxonomies and ontologies in their raw data format are most commonly expressed in the RDF (Resource Description Framework) data model, which has various serialization formats: RDF/XML, JSON, JSON-LD, .ttl (Turtle), etc., and ChatGPT can convert data from one to another. Data extraction can also be done with ChatGPT. For example, knowledge management professional Camille Mathieu recently shared in a LinkedIn post how she used ChatGPT to write a Python script to extract text & metadata from PDFs.

Perhaps what is most intriguing as a future implementation of taxonomies and ChatGPT is to go in the other direction and have knowledge organization systems, such as taxonomies, support the creation and use of queries (also called “prompts”) for generative AI, to obtain better results. This requires some back-end development, though, and is not merely a matter of putting a taxonomy into a prompt. Since a taxonomy is created for a specific subject domain, the questions need to be confined to the domain of the taxonomy. Semantic Web Company has developed a simple publicly accessible demo “PoolParty Meets Chat GPT,” whereby you can compare the results of questions you ask in the subject area of ESG (Environmental, Social, and Governance) that are submitted directly to ChatGPT with those that are filtered through an ESG taxonomy and knowledge graph (managed in PoolParty software) so that the questions are enriched before being sent to ChatGPT. The semantically enriched questions generate answers that have more detail, better accuracy, and even web links to definitions and other articles.

Conclusions

While it’s arguable whether ChatGPT alone is a good way to obtain “facts,” there is no doubt that it is a good way to get suggestions and ideas. These suggestions can support the work of taxonomists and ontologists, and taxonomies and ontologies in turn can support the results of ChatGPT and other LLMs. Because there will be errors from ChatGPT, it should not be used to generate taxonomies by those who are not already knowledgeable about taxonomy requirements and best practices, nor should it be used as a substitute for the expertise of taxonomists.

I hope to experiment more with ChatGPT for taxonomies and share additional details in future blog posts.

Friday, December 30, 2022

Taxonomy Definition

I usually explain that a taxonomy is a structured kind of controlled vocabulary, which is a list of terms (or concepts) usually used to tag content to aid in its retrieval. The structure can be hierarchical, faceted, or a combination. Other people have defined taxonomies for a general audience in more simplistic ways as a kind of hierarchical classification system. So, while a taxonomy has two main features (naming and structure), my preferred definition has focused on the controlled vocabulary and naming aspect, whereas other definitions focus on the hierarchical classification aspect of taxonomies. However, a taxonomy and a classification system are not necessarily the same. While it is understandable that a definition is simplified for a general audience, it should not be simplified to the extent of being misleading.

I have blogged previously on the differences between taxonomies and classification systems, so I won’t repeat all the differences again. The main point is that a classification system is generic and rigid and is intended to be used widely, such as the Dewey Decimal Classification for libraries, whereas a taxonomy tends to be customized for a particular use case and context and is flexible and undergoes changes.

Meanwhile, there are also a few well-known classification systems that are called “taxonomies,” such as the Linnaean taxonomy of organisms and Bloom’s taxonomy of educational objectives. These seem quite different from the information-retrieval type of taxonomy. The Linnaean hierarchical levels have names (Kingdom, Phylum, Class, etc.). The relationships of the hierarchical levels to each other do not cover all of the types in thesaurus standards: generic-specific, generic-instance, or whole-part. Rather, the Linnaean taxonomic relationships are generic-specific only, or more precisely those of member of class or subclass. Bloom's taxonomy has a completely different hierarchical model that does not follow thesaurus standards at all.
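
The difference in relationship types can be sketched as a small data structure. The following Python example is only an illustration under stated assumptions: the `RelationType` enum, the `Link` tuple, and the sample thesaurus terms are hypothetical and are not drawn from any standard or tool. It records each broader/narrower link with one of the three thesaurus relationship types and reports which types a given hierarchy uses.

```python
from enum import Enum
from typing import NamedTuple


class RelationType(Enum):
    GENERIC = "generic-specific"    # narrower is a kind of the broader
    INSTANCE = "generic-instance"   # narrower is a named instance of the broader
    PARTITIVE = "whole-part"        # narrower is a part of the broader


class Link(NamedTuple):
    broader: str
    narrower: str
    relation: RelationType


# A fragment of the Linnaean hierarchy: every link is generic-specific.
linnaean = [
    Link("Animalia", "Chordata", RelationType.GENERIC),
    Link("Chordata", "Mammalia", RelationType.GENERIC),
]

# Hypothetical thesaurus-style links that mix all three relationship types.
thesaurus = [
    Link("Vehicles", "Cars", RelationType.GENERIC),
    Link("Cars", "Engines", RelationType.PARTITIVE),
    Link("Mountains", "Mount Everest", RelationType.INSTANCE),
]


def relation_types_used(links):
    """Return the set of relationship types that appear in a list of links."""
    return {link.relation for link in links}


if __name__ == "__main__":
    for name, links in [("Linnaean", linnaean), ("Thesaurus", thesaurus)]:
        used = sorted(r.value for r in relation_types_used(links))
        print(f"{name}: {', '.join(used)}")
```

Typing each link makes the contrast explicit: the Linnaean fragment uses a single relationship type throughout, while a hierarchy built to thesaurus standards may combine all three.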

How does a taxonomy of concepts for information retrieval relate to a scientific taxonomy? They are similar, and the differences are not so great that they should be considered different meanings of the word “taxonomy.” If we consider that taxonomies are systems to name and organize things hierarchically, then a taxonomy for information retrieval, composed of terms for tagging and retrieving content (documents, images, etc.), can be considered a taxonomy of a controlled vocabulary, in contrast to taxonomies of things, such as organisms. This is a slightly different perspective from considering a taxonomy as a kind of controlled vocabulary, as I previously had. The following diagram illustrates a possible way to consider how information-retrieval taxonomies relate to classification systems and controlled vocabularies.

Diagram showing that information taxonomies are at the intersection of classification systems and controlled vocabularies

Several kinds of knowledge organization systems are defined by their published standards. For thesauri, there are ANSI/NISO Z39.19 and ISO 25964. For terminologies, there is ISO/TC 37/SC 3 and other related standards. For ontologies, there is OWL (Web Ontology Language) from the W3C. There is no standard, however, specifically for “taxonomies” or even for “classification systems,” which is a reason why these remain difficult to define. The designations “classification system,” “classification scheme,” and “taxonomy” have been used interchangeably.

Wikipedia provides the definition at the entry for Taxonomy: “A taxonomy (or taxonomical classification) is a scheme of classification, especially a hierarchical classification, in which things are organized into groups or types.” But then it goes on to say, “it may refer to a categorisation of things or concepts.” Thus, an information-retrieval taxonomy is a categorization of concepts (also called terms in a controlled vocabulary). It is not a classification system, since the goal is not to classify things, not even the things tagged with the taxonomy concepts, but rather to organize the set of concepts that have been identified as appropriate for tagging and retrieving a set of content.
