The Accidental Taxonomist: Taxonomy definitions

Monday, June 1, 2026

Is a Taxonomy an Ontology?

At last month’s Knowledge Graph Conference, there was growing interest in ontologies, in addition to knowledge graphs and graph databases, but the role of taxonomies did not seem so well understood. For example, in one presentation I attended, it was said "you get synonyms/alternative labels into a knowledge graph via ontologies," rather than mentioning taxonomies. More than one person asked me: isn’t a taxonomy a kind of ontology?

The fact that SKOS (the data model for interoperability used for taxonomies) has technically been designed as an upper ontology can lead to the conclusion that all taxonomies modeled on SKOS are then domain ontologies, as they are instances of the SKOS upper ontology. However, that is a more theoretical than practical way to look at taxonomies.

When I write or speak about taxonomies, I aim to be practical. While theoretically a taxonomy is a kind of ontology, in practice it is not, and maintaining a distinction helps clarify how a taxonomy and an ontology can each improve on the other when they are combined.

If you are an ontologist and see everything through the lens of ontologies, then you probably consider a taxonomy to be a simple type of ontology that merely does not utilize all the features of a full ontology. If an ontology is simply defined as a knowledge model that has classes (things), relationships between the things, and attributes as properties of the things, then, yes, a taxonomy is a kind of ontology. It has concepts, hierarchical relationships, and often other attributes for concepts, which are typically merely definitions, scope notes, or other notes.

The problem of calling any taxonomy an ontology is that the benefits of semantically enriching a taxonomy with an added ontology or extending an ontology with a taxonomy might not be well understood. We add an ontology to a taxonomy in order to provide customized semantic relationships and attributes of all kinds. Additionally, basing the added ontology on OWL (Web Ontology Language) enables capabilities of inferencing and reasoning.

Furthermore, saying that a taxonomy is an ontology could lead to less than sufficient attention to the taxonomy features that ontologies alone lack. These features include alternative labels and hidden labels that match variants in both tagging and user searching, equivalent foreign language labels for concepts, concept schemes that can be implemented as search facets, and distinct fields for definitions and different kinds of notes that are standardized for interoperability.
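
As a rough illustration of how alternative and hidden labels match variants in tagging and searching, the following sketch resolves whatever wording a user types to a single concept. It is a minimal sketch in plain Python, not a SKOS implementation: the `Concept` class, its field names, the `resolve` helper, and the sample labels are all assumptions made up for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    """A taxonomy concept with SKOS-style labels (names are illustrative)."""
    pref_label: str
    alt_labels: list = field(default_factory=list)     # synonyms shown to users
    hidden_labels: list = field(default_factory=list)  # e.g. misspellings, never displayed

    def all_labels(self):
        return [self.pref_label, *self.alt_labels, *self.hidden_labels]


def build_label_index(concepts):
    """Map every label variant (case-insensitively) to its concept."""
    index = {}
    for concept in concepts:
        for label in concept.all_labels():
            index[label.casefold()] = concept
    return index


# Hypothetical sample data.
concepts = [
    Concept(
        "Temporary employees",
        alt_labels=["Temps", "Contingent workers"],
        hidden_labels=["Temporary employes"],
    ),
]
index = build_label_index(concepts)


def resolve(query: str) -> Optional[str]:
    """Return the preferred label for a query, or None if nothing matches."""
    concept = index.get(query.strip().casefold())
    return concept.pref_label if concept else None


print(resolve("temps"))               # matched through an alternative label
print(resolve("Temporary employes"))  # matched through a hidden label
```

Both lookups lead to the same concept, which is the point: the variants serve matching, while only the preferred label needs to be displayed.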

If we follow the Semantic Web’s stack of data model recommendations, then a taxonomy can be defined as what is built on SKOS (Simple Knowledge Organization System), and an ontology as what is built on RDFS (RDF Schema) and OWL (Web Ontology Language). I find that to be a very clear explanation of the difference between taxonomies and ontologies for those who are familiar with ontologies. These different data models may be integrated within the same knowledge model, and that’s how we get taxonomies extended with ontologies or ontologies extended with taxonomies.

We might call taxonomy-ontology combinations “knowledge models” or “semantic models.” If the model has mostly taxonomy (SKOS-based) data, such as a large taxonomy with a little ontology added, it is best called a taxonomy, and if the model has mostly ontology (RDFS and OWL-based) data, such as a large ontology with some taxonomy data, it is best called an ontology.
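
To make the naming rule of thumb above concrete, here is a small sketch that looks at which vocabulary the statements in a model draw on. It assumes that a model can be reduced to simple (subject, predicate, object) tuples written with the usual `skos:`, `rdfs:`, and `owl:` prefixes, and that "mostly" means a simple majority. The sample triples, the `ex:` names, and both helper functions are hypothetical, and a real model would be inspected with a proper RDF library.

```python
from collections import Counter

# Hypothetical triples, written with common prefixes for readability.
triples = [
    ("ex:Drugs", "rdf:type", "skos:Concept"),
    ("ex:Aspirin", "rdf:type", "skos:Concept"),
    ("ex:Aspirin", "skos:prefLabel", "Aspirin"),
    ("ex:Aspirin", "skos:altLabel", "Acetylsalicylic acid"),
    ("ex:Aspirin", "skos:broader", "ex:Drugs"),
    ("ex:Drug", "rdf:type", "owl:Class"),
    ("ex:treats", "rdf:type", "owl:ObjectProperty"),
    ("ex:treats", "rdfs:domain", "ex:Drug"),
]


def vocabulary_of(triple):
    """Label a triple 'skos', 'ontology' (RDFS/OWL), or 'other' by its prefixes."""
    _, predicate, obj = triple
    terms = (predicate, obj)
    if any(term.startswith("skos:") for term in terms):
        return "skos"
    if any(term.startswith(("owl:", "rdfs:")) for term in terms):
        return "ontology"
    return "other"


def describe_model(triples):
    """Name the model after whichever kind of data it mostly contains."""
    counts = Counter(vocabulary_of(t) for t in triples)
    if counts["skos"] > counts["ontology"]:
        return "taxonomy"
    if counts["ontology"] > counts["skos"]:
        return "ontology"
    return "knowledge model"


print(describe_model(triples))
```

The sample has five SKOS statements and three RDFS/OWL statements, so under these assumptions it would be called a taxonomy with a little ontology added.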

The organizers of the Knowledge Graph Conference understood the distinct role of taxonomies in knowledge graphs and thus welcomed me again to present a tutorial specifically on taxonomies.

Saturday, January 31, 2026

What a Taxonomy is Not

Although taxonomies have become increasingly common within enterprises and on websites, they are not always well understood. Taxonomies are sometimes confused with other knowledge organization systems, such as classification systems, website navigation schemes, business glossaries, or ontologies.

A taxonomy is a controlled, structurally organized set of unambiguous concepts, which may describe content, information, or data, and which users may be interested in querying about. A taxonomy links users to the information they seek by bringing together various users’ terms with the terms that occur in the content or data. Prior to the emergence of modern taxonomies in applications for digital information, indexes at the back of printed books had been serving a similar role (and they still do). I have already written a blog post on Taxonomy Definition, so to further clarify what taxonomies are, it is also useful to explain what taxonomies are not.

Taxonomies are not the same as classification systems/schemes (such as industrial classification codes for economic analysis or medical classifications for health data collection or health insurance purposes), as the latter have mutually exclusive classes to which items are assigned for non-redundant analysis. Classification thus allows comparison, analysis, identification, location, and other actions associated with things based on their class. Taxonomies are organized sets of concepts tagged to content or associated with data, where the taxonomy organization serves merely for finding the desired concept or providing context for tagging. Thus, a concept may have more than one broader concept and thus appear in more than one place in the taxonomy hierarchy.
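
The last point, that a concept may sit in more than one place in the hierarchy, can be sketched as follows. This is only an illustration under stated assumptions: the `broader` mapping and its sample concepts are made up, and the helper assumes the hierarchy contains no cycles.

```python
# Each concept maps to its broader concepts; more than one is allowed.
# Sample data is hypothetical.
broader = {
    "Electric vehicles": ["Vehicles", "Clean energy technologies"],
    "Vehicles": ["Transportation"],
    "Clean energy technologies": ["Energy"],
    "Transportation": [],
    "Energy": [],
}


def paths_to_top(concept, broader):
    """Return every hierarchy path from a top concept down to `concept`.

    Assumes the broader/narrower structure is acyclic.
    """
    parents = broader.get(concept, [])
    if not parents:
        return [[concept]]
    paths = []
    for parent in parents:
        for path in paths_to_top(parent, broader):
            paths.append(path + [concept])
    return paths


for path in paths_to_top("Electric vehicles", broader):
    print(" > ".join(path))
```

A classification system with mutually exclusive classes would allow only one such path per item, whereas here the same concept can be found by browsing either branch.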

Taxonomies are not the same as navigation systems, which are common in websites or web applications. A taxonomy is more similar to an index, while a navigation system is more similar to a table of contents. Menu labels in a navigation can link to only one page, whereas concepts in a taxonomy are tagged to multiple pages, content items, or data records. Navigation systems are only used in browsing, but taxonomies may be both browsed and searched for their concepts. Navigation systems reflect paths and established links to content, whereas taxonomies comprise concepts that become metadata when tagged to content. Navigation systems, like classification systems, are not frequently or easily changed, whereas taxonomies can grow and change continuously, as needed.

Taxonomies are not the same as business glossaries, which are lists of terms of relevance to an organization’s business along with their definitions, although there is usually considerable overlap between the terms an organization gathers for its glossary(s) and those in its taxonomy. Not only is there usually the difference of a taxonomy’s hierarchical structure (although categories could be assigned to glossary terms), but the ultimate objectives differ, resulting in differences in the scope of term inclusion. A business glossary includes all terms that are of importance to the business but may not be understood by everyone, so definitions need to be provided. There could be terms of importance that need no definition, such as Marketing, so they are not included in the glossary. Technical terms and acronyms are usually included. A taxonomy, on the other hand, includes only the terms/concepts for which there are sufficient documents, pages, or content items to be tagged for retrieval. Sufficient content on a subject is a leading criterion for including a concept in a taxonomy.

Finally, taxonomies are not the same as ontologies. The confusion between the two may arise because taxonomies and ontologies are increasingly used in combination, and software (now referred to as TOMS for taxonomy-ontology management system) allows you to create a taxonomy and ontology as a single project or knowledge model. An ontology can be an upper-level model of a knowledge domain, but domain-specific ontologies may include multiple hierarchical levels of subclasses, and thus include what are essentially taxonomies. A taxonomy, however, can stand on its own without an ontology and serve the functions of tagging and retrieval via browsing and/or searching without the extension of an ontology. Ontologies support complex, multi-part queries involving relations, and they support reasoning and inference, which taxonomies do not. Each utilizes different data models: SKOS for taxonomies and RDFS and OWL for ontologies.

Prior blog posts I have written that compare taxonomies to other knowledge organization systems in more detail are:

Wednesday, July 31, 2024

Subject Headings vs. Taxonomies

When I spoke about taxonomies at the recent SLA (Special Libraries Association) annual conference, I was asked how a taxonomy differs from a subject heading scheme. Librarians are very familiar with subject headings, which are used to catalog books and other library materials. This is an interesting question, which I answered briefly in my presentation session, but I’d like to explain further.

I have previously written about how a taxonomy differs from a classification in “Classification Systems vs. Taxonomies.” Taxonomies are more similar to subject heading schemes. Libraries use both classification systems (such as the Dewey Decimal Classification), which are for determining the physical location of books and other library materials on shelves based on their codes, and subject heading schemes (such as Library of Congress Subject Headings), which are used to identify books and other materials by their specific subject matter. The same subject could be used to catalog books and materials of different types (nonfiction, fiction, sound recordings, children’s) with very different classifications.

How Taxonomies and Subject Heading Schemes are Similar

Taxonomies and subject heading schemes are both considered types of controlled vocabularies, and they share similar uses and features. They both serve users who are looking up subjects to find information or resources available on the subject, rather than (or not yet) for identifying the physical location of the resource. In addition, they both:

have structures, but their focus is on the concepts
can be both searched and browsed
exist for both general and specific subject domains (Medical Subject Headings (MeSH) published by the National Library of Medicine is an example of a specific subject-domain subject heading scheme.)
have some structured, thesaurus-type relationships between terms, including broader/narrower and related
bring together different names as synonyms/alternative labels/nonpreferred terms/used-for terms
may include named entities (proper nouns for people, organizations, or geographic places) alongside topical subjects
may have scope notes on select terms

How Taxonomies and Subject Heading Schemes Differ

With so many similarities, one might wonder if there are any differences between subject heading schemes and taxonomies.

Subject heading schemes and taxonomies have different histories and originally different formats. Subject heading schemes were designed for the print format and have been adapted to digital environments, whereas information “taxonomies” as we know them have existed only since the emergence of digital navigation and search systems.

Structural Differences with Subdivisions

The name “subject headings” refers to the traditional browsable display of headings in an index, and under headings may appear sub-headings or subdivisions to further refine multiple references/citations/linked results. This structure is the main difference between subject heading schemes and taxonomies. The heading-subheading/subdivision structure is characteristic of back-of-the-book indexes and of indexes to articles when such indexes previously appeared in print, although it is still used online.

A subject heading may be subdivided by the addition of different types of subdivisions: topical, geographical (such as a country name), chronological (such as century, decade, or wartime), and form (for the content type, such as Periodicals). Some topical subdivisions are rather generic and can be applied to many headings, such as “Management,” “Research,” or “Law and legislation,” but most are specific to only a limited number of headings. For example, the subdivision “Lighting” is to be used under headings for structures, rooms, vehicles, installations, etc. See the full list of Library of Congress subdivisions.

The way that subdivisions refine a heading can be compared to the function of facets in a faceted taxonomy, which was noted by someone in the audience of my conference session. (See also the post “Faceted Classification and Faceted Taxonomies.”) Subdivisions and facets are both aspects of something. That does not mean, however, that a faceted taxonomy and a subject heading scheme are the same.

The structure of a faceted taxonomy has facets at the top-level, and the facets are relevant to a specific set of content, so they are aspects of the content, rather than aspects of a heading term.
There can be hierarchies of terms within a facet of a faceted taxonomy, but subdivisions do not have internal hierarchy. Instead, subdivisions may subdivide each other, but this is more like a prescribed navigation path, and they must follow a standard sequence. For example:
English literature—20th century—History and criticism

Application Differences of Subdivisions vs. Attributes

Another facet-like implementation of taxonomies is to have attributes to refine the search results of a specific term within a hierarchical taxonomy. Attributes are common in e-commerce taxonomies, which involve a hierarchical taxonomy for product categories and attributes for product features. Attributes are more like subdivisions, in the way that they refine topics from the hierarchical taxonomy, but they are applied (tagged) differently than subdivisions.

The combination of a subject heading and a subdivision is done at the time of indexing an article or cataloging a book, and there are rules about which combinations are permitted. The combinations are indexed as if they were a single compound concept. Catalogers are required to use established heading-subdivision combinations and cannot just make up their own. Any string of multiple subdivisions must be applied in a prescribed order, such as geographic-topical-chronological-form for Library of Congress Subject Headings that are topics authorized for geographic subdivision.

Unlike the practice of cataloging or indexing with subject headings and subdivisions, taxonomy terms and attributes for refinement:

are assigned more independently of each other, although the type of taxonomy term may restrict which attributes are available
come in a greater number of attribute types, and a piece of content is tagged with values from most or all of the attribute types
may even have more than one attribute value of the same type applied (such as an item having two colors)
have no ranked order in which to apply attributes or to search on them
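
The points in the list above can be sketched with a small e-commerce style example. It is a minimal sketch with made-up data: the product records, the attribute names, and the `find_items` helper are all assumptions, and real systems would store tags in a search index rather than in Python dictionaries.

```python
# Hypothetical items: one category term from the hierarchical taxonomy,
# plus attribute values that are tagged independently of it.
products = [
    {"name": "Trail jacket", "category": "Jackets",
     "attributes": {"color": {"red", "black"}, "material": {"nylon"}}},
    {"name": "Rain shell", "category": "Jackets",
     "attributes": {"color": {"red"}, "material": {"polyester"}}},
    {"name": "Wool cap", "category": "Hats",
     "attributes": {"color": {"black"}, "material": {"wool"}}},
]


def find_items(items, category, **attribute_filters):
    """Return names of items in `category` that carry every requested value.

    Attributes are checked independently, so the order in which the
    filters are given makes no difference to the result.
    """
    results = []
    for item in items:
        if item["category"] != category:
            continue
        tagged = item["attributes"]
        if all(value in tagged.get(attr, set())
               for attr, value in attribute_filters.items()):
            results.append(item["name"])
    return results


print(find_items(products, "Jackets", color="red", material="nylon"))
print(find_items(products, "Jackets", material="nylon", color="red"))
```

Both calls return the same item, and the trail jacket carries two values of the same attribute type. A heading-subdivision string, by contrast, would be assigned as one precoordinated unit in a prescribed order.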

Convergence of Subject Headings Schemes and Taxonomies

While subject heading schemes and taxonomies have traditionally had different styles, they have become more similar in more recent decades.

Many subject heading schemes and taxonomies have both adopted thesaurus features. Originally, the Library of Congress Subject Headings had only See (Use) and See also relationships (as in an index), but in 1987 it adopted the thesaurus relationships of broader term/narrower term, and related term in place of See also. Meanwhile, the differences between taxonomies and thesauri have also been blurred, as taxonomies may have related-term relationships, and thesauri may have an over-arching hierarchical structure. The leading reason taxonomies and thesauri are difficult to distinguish, in my opinion, is that the same software tools are used to develop and manage both, and the software makes no distinction between “taxonomy” and “thesaurus.”

Another way in which subject headings have become more like taxonomies is that subject headings may be used without subdivisions. This is increasingly common as subject headings get reused in search and retrieval systems which do not support the complexity of subdivisions. For example, newer online publishers of medical information have adopted Medical Subject Headings without their subdivisions, which are still used by the National Library of Medicine. Additionally, auto-tagging is not easily done with multiple levels of indexing. Without subdivisions, subject heading schemes are essentially the same as taxonomies, as long as they have a hierarchical structure.

Conclusions

Taxonomies have similarities to and differences from both classification systems and subject heading schemes. In fact, I would say that modern information taxonomies have inherited features of both. Taxonomies are not always well defined, but they are flexible and adaptable to business needs.

Controlled vocabularies have existed for a long time, but their applications are becoming more varied. This has led to differences and also convergences of their features. Nevertheless, certain controlled vocabularies are more common in certain implementations. Subject heading schemes remain common in libraries, whereas taxonomies are more common in business and commercial implementations.

Friday, December 30, 2022

Taxonomy Definition

I usually explain that a taxonomy is a structured kind of controlled vocabulary, which is a list of terms (or concepts) usually used to tag content to aid in its retrieval. The structure can be hierarchical, faceted, or a combination. Other people have defined taxonomies for a general audience in more simplistic ways as a kind of hierarchical classification system. So, while a taxonomy has two main features (naming and structure), my preferred definition has focused on the controlled vocabulary and naming aspect, whereas other definitions focus on the hierarchical classification aspect of taxonomies. However, a taxonomy and a classification system are not necessarily the same. While it is understandable that a definition is simplified for a general audience, it should not be simplified to the extent of being misleading.

I have blogged previously on the differences between taxonomies and classification systems, so I won’t repeat all the differences again. The main point is that a classification system is generic and rigid and is intended to be used widely, such as the Dewey Decimal Classification for libraries, whereas a taxonomy tends to be customized for a particular use case and context and is flexible and undergoes changes.

Meanwhile, there are also a few well-known classification systems that are called “taxonomies,” such as the Linnaean taxonomy of organisms and Bloom’s taxonomy of educational objectives. These seem quite different from the information-retrieval type of taxonomy. The Linnaean hierarchical levels have names (Kingdom, Phylum, Class, etc.). The relationships of the hierarchical levels to each other do not span all of the thesaurus-standard types: generic-specific, generic-instance, or whole-part. Rather, the Linnaean taxonomic relationships are generic-specific only, or more precisely that of member of class or subclass. Bloom's taxonomy has a completely different hierarchical model that does not follow thesaurus standards at all.

How does a taxonomy of concepts for information retrieval relate to a scientific taxonomy? They are similar, and the differences are not so great that they should be considered different meanings of the word “taxonomy.” If we consider that taxonomies are systems to name and organize things hierarchically, then a taxonomy for information retrieval, comprised of terms for tagging and retrieving content (documents, images, etc.), can be considered a taxonomy of a controlled vocabulary, in contrast to taxonomies of things, such as organisms. This is a slightly different perspective from considering a taxonomy as a kind of controlled vocabulary, as I previously had. The following diagram illustrates a possible way to consider how information-retrieval taxonomies relate to classification systems and controlled vocabularies.

Diagram showing that information taxonomies are at the intersection of classification systems and controlled vocabularies

Several kinds of knowledge organization systems are defined by their published standards. For thesauri, there are ANSI/NISO Z39.19 and ISO 25964. For terminologies, there is ISO/TC 37/SC 3 and other related standards. For ontologies, there is OWL (Web Ontology Language) from the W3C. There is no standard, however, specifically for “taxonomies” or even for “classification systems,” which is a reason why these remain difficult to define. The designations “classification system,” “classification scheme,” and “taxonomy” have been used interchangeably.

Wikipedia provides the definition at the entry for Taxonomy: “A taxonomy (or taxonomical classification) is a scheme of classification, especially a hierarchical classification, in which things are organized into groups or types.” But then it goes on to say, “it may refer to a categorisation of things or concepts.” Thus, an information-retrieval taxonomy is a categorization of concepts (also called terms in a controlled vocabulary). It is not a classification system, since the goal is not to classify things, not even the things tagged with the taxonomy concepts, but rather to organize the set of concepts that have been identified as appropriate for tagging and retrieving a set of content.

Sunday, February 9, 2020

Classification Systems vs. Taxonomies

Is a taxonomy the same as a classification scheme or system? Or, to put it another way, is a classification system, such as the Dewey Decimal System, a kind of taxonomy? Both of these kinds of knowledge organization systems have the feature of arranging topical terms in a hierarchy of multiple levels, without having related-term relationships or necessarily synonyms/nonpreferred terms, which are features of thesauri. So, it appears as if the only difference is that classification systems have some kind of notation or alphanumeric code associated with each term, and taxonomies do not. The differences, however, are greater than that.

Classification systems

The codes/notations in classifications are not merely shortcut conveniences. They represent a way to divide up the area of knowledge into broad classes, sub-classes, sub-sub-classes, etc. The codes/notations are not an after-thought but are planned from the beginning of the design of a classification system.

The classification is comprehensive; everything in the subject domain is covered with a classification code + label. There is often not a lot of room for expansion, except for a few unused sub-unit codes in each area for new topics. The word classification means to put into a predefined class or grouping. The approach to classification is thinking “where does this go?” (Digital documents may go into more than one classification.)

Classification systems are not just used in libraries, but in corporate settings too, such as for research literature or detailed manufacturing product catalogs. The standard for defining knowledge organization systems for interoperability on the web, the Simple Knowledge Organization System (SKOS), developed by the World Wide Web Consortium (W3C), recognizes classification systems by having a designated element for “notation.”
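
To illustrate how a notation can carry the division into classes and sub-classes, here is a minimal sketch. It assumes a made-up scheme whose codes are dot-separated, so that each parent class can be derived from the code itself. The `ClassEntry` type, the codes, and the labels are all hypothetical, and the `notation` field merely plays the role that the notation element has in SKOS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassEntry:
    notation: str  # the classification code, planned as part of the design
    label: str


# Hypothetical corporate classification scheme.
scheme = [
    ClassEntry("10", "Materials"),
    ClassEntry("10.2", "Metals"),
    ClassEntry("10.2.1", "Aluminum alloys"),
    ClassEntry("20", "Processes"),
]
by_notation = {entry.notation: entry for entry in scheme}


def ancestors(notation):
    """Derive the notations of all broader classes from the code alone."""
    parts = notation.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts))]


def class_path(notation):
    """Return the labels from the broadest class down to the given class."""
    return [by_notation[n].label for n in ancestors(notation) + [notation]]


print(ancestors("10.2.1"))
print(class_path("10.2.1"))
```

Because the position of a class is built into its code, adding a new topic means finding an unused code in the right place, which is one reason such schemes leave limited room for expansion.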

Taxonomies

A taxonomy is a kind of knowledge organization system that has its terms hierarchically related to each other. The starting point in creating a taxonomy might be a few top terms or facets, but then the focus of taxonomy development is on the specific terms needed, rather than the division of a domain into classes and subclasses, etc. What this means is that the terms do not have to comprehensively cover the subject domain in an abstract manner. Rather the terms have to “cover” the topics appearing in the body of content to be tagged with the taxonomy.

The taxonomy is used for tagging or indexing, not for classification or cataloging. So, rather than thinking where (into what class) does this document go, the question is, what is/are the main topic(s) of this document. The topics might not fall into neat balanced hierarchies. For example, an intranet taxonomy might have a term for Temporary employees, because there are some human resources policies dealing with this topic specifically, but have no term for Full-time employees, since that is the default, and the term would not be useful (and likely inconsistently tagged).

Taxonomies vs. Classification Systems Comparison Table

Different mindsets

Lumpers and splitters are historically two opposing viewpoints in categorization and classification: whether you "lump" items into large categories, focusing on the similarities, or "split" items into a greater number of smaller categories, focusing on the differences. Of course, there is often a combination of both approaches, but it is my feeling that the design of modern taxonomies tends to involve more lumping, whereas the design of classification systems has involved more splitting.

One of the challenges of working with subject matter experts (SMEs) in building a taxonomy is that SMEs, as experts in their domain, may tend to think of how to classify their domain, and propose a taxonomy that resembles a classification system, even if it lacks the codes/notations. So, it’s very important to provide precise guidelines to SMEs contributing to a taxonomy, explaining that the terms are intended for tagging common topics that appear in the content and are for limiting/filtering search results, and that full classification is not necessary.

Students of library science may also tend to think of classification systems as serving for taxonomies. They learn about classification systems when they study cataloging, and subject cataloging is also about where the book or other library material belongs (often literally, on the shelf). So, even librarians need training on taxonomies and the taxonomy mindset if they want to become taxonomists. I will be giving a taxonomy workshop at the Computers in Libraries conference in March, so I will be sharing these ideas with those who attend.

Wednesday, June 22, 2016

Taxonomies vs. Thesauri: Practical Implementations

The differences between taxonomies and thesauri, and when to implement which, have been a subject of previous presentations of mine and a previous blog post, Taxonomies vs. Thesauri. Most recently, I gave a presentation of a case study of controlled vocabularies at Cengage Learning at the “Taxonomy Café” session of the SLA annual conference this month, and the post-presentation roundtable discussions got me thinking more about the differences in practical implementations.

To summarize the differences, while both taxonomies and thesauri have hierarchical relationships among their terms, in a taxonomy all terms are connected into a few large hierarchies with a limited number of top terms so as to serve top-down navigation or drilling-down of topics. While faceted taxonomies function differently, each facet label can be seen as a top term. Associative relationships (related terms) are a standard feature of thesauri but not of taxonomies. Synonyms/nonpreferred terms/alternate labels are required for thesauri, but could be optional in small taxonomies. Taxonomies serve browsing and drilling down by end users who are exploring topics, whereas thesauri serve users who search for (look up) a specific concept and then may follow “use” (preferred term), broader, narrower, or related term links to find the best term. A taxonomy works well for a controlled vocabulary that is limited in scope and easily categorized into hierarchies, whereas a thesaurus works better for content and a set of terms that is not easily categorizable and does not have a limited scope.

In practice, I have found that taxonomies are useful for classifying products and services (such as in ecommerce), general enterprise document management, implementations in content management systems which support taxonomies, and all faceted or filtering implementations (SharePoint search, Endeca, and other post-search filtering enterprise search software). Thesauri, on the other hand, are more suitable for indexing and retrieval of research literature (articles, white papers, conference presentations and proceedings, patents, etc.), whether commercially published or not.

Taxonomies are easier to create and often easier to implement than thesauri. They generally do not have associative (related term) relationships. In the absence of associative relationships between terms and with the emphasis on creating large top-term hierarchies, the thesaurus standard (ANSI/NISO Z39.19) rules for hierarchical relationships do not always have to be strictly followed. The inclusion of synonyms/nonpreferred terms also tends to be less thorough in taxonomies than in thesauri. Thesauri, on the other hand, require greater expertise in the field of information/knowledge organization, particularly to distinguish between hierarchical and associative relationships and to create the optimal number of those relationships and the optimal number of nonpreferred terms. Taxonomies, whether hierarchical or faceted, also tend to be easy to understand and use, accommodated by out-of-the-box content management software, and easier to maintain (and could be maintained by subject matter experts instead of taxonomists). Therefore, if a taxonomy, rather than a thesaurus, will suffice, then it makes more sense to create and maintain a taxonomy.

Thesauri, on the other hand, are more appropriate for indexing repositories of content for research because they do not restrict the inclusion of terms to established hierarchies. Any terms that represent a minimal threshold of content can be added, even if at first glance they may seem out of scope. For example, a term “Hot drinks” would not likely fit into a taxonomy on health/medicine, but the term would be desired for articles on research correlating the drinking of very hot beverages to esophageal cancer. Thesauri allow for inclusion of terms that, in combination with other terms, can achieve a more nuanced meaning, which may be needed in the research and discovery of what is contained in a body of research literature.

Indeed, in practice, the majority of new controlled vocabularies that are being created are taxonomies, not thesauri, and in fact taxonomies are usually all that are needed. The new implementations tend to be of the kind that are suitable for taxonomies. New repositories of documents for research, on the other hand, while highly important to be indexed with thesauri, do not arise as frequently. More often, collections of documents for researching are already established and often already have thesauri. These thesauri do require the work of taxonomists to update and maintain them, though.