Friday, December 30, 2022

Taxonomy Definition

I usually explain that a taxonomy is a structured kind of controlled vocabulary, which is a list of terms (or concepts) usually used to tag content to aid in its retrieval. The structure can be hierarchical, faceted, or a combination. Other people have defined taxonomies for a general audience more simplistically, as a kind of hierarchical classification system. So, while a taxonomy has two main features (naming and structure), my preferred definition has focused on the controlled vocabulary and naming aspect, whereas other definitions focus on the hierarchical classification aspect of taxonomies. However, a taxonomy and a classification system are not necessarily the same. While it is understandable that a definition is simplified for a general audience, it should not be simplified to the extent of being misleading.
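To make the "tag content to aid in its retrieval" idea concrete, here is a minimal Python sketch of a small hierarchical controlled vocabulary. The terms, the document IDs, and the names BROADER, TAGS, narrower_terms, and retrieve are all hypothetical and chosen only for illustration. It is one possible way to model the idea, not a standard implementation.

```python
from collections import defaultdict

# Hypothetical controlled vocabulary: each term maps to its broader (parent)
# term, or None if it is a top-level term.
BROADER = {
    "Energy": None,
    "Renewable energy": "Energy",
    "Solar power": "Renewable energy",
    "Wind power": "Renewable energy",
}

# Hypothetical content items, each tagged with one or more taxonomy terms.
TAGS = {
    "doc-1": {"Solar power"},
    "doc-2": {"Wind power"},
    "doc-3": {"Energy"},
}


def narrower_terms(term):
    """Return the term itself plus all of its descendants in the hierarchy."""
    children = defaultdict(set)
    for child, parent in BROADER.items():
        if parent is not None:
            children[parent].add(child)
    found, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(children[current])
    return found


def retrieve(term):
    """Return the IDs of items tagged with the term or any narrower term."""
    wanted = narrower_terms(term)
    return sorted(doc for doc, tags in TAGS.items() if tags & wanted)


print(retrieve("Renewable energy"))
```

In this sketch, the naming side of the taxonomy is the fixed set of terms, and the structure side is the broader/narrower relationship that lets a search on a broader term also pick up content tagged with its narrower terms.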

I have blogged previously on the differences between taxonomies and classification systems, so I won’t repeat them all here. The main point is that a classification system is generic and rigid and is intended to be used widely, such as the Dewey Decimal Classification for libraries, whereas a taxonomy tends to be customized for a particular use case and context, is flexible, and undergoes changes.

Meanwhile, there are also a few well-known classification systems that are called “taxonomies,” such as the Linnaean taxonomy of organisms and Bloom’s taxonomy of educational objectives. These seem quite different from the information-retrieval type of taxonomy. The Linnaean hierarchical levels have names (Kingdom, Phylum, Class, etc.). The relationships between the hierarchical levels do not span all of the types in the thesaurus standards: generic-specific, generic-instance, and whole-part. Rather, the Linnaean taxonomic relationships are generic-specific only, or more precisely, that of member of a class or subclass. Bloom’s taxonomy has a completely different hierarchical model that does not follow thesaurus standards at all.

How does a taxonomy of concepts for information retrieval relate to a scientific taxonomy? They are similar, and the differences are not so great that they should be considered different meanings of the word “taxonomy.” If we consider that taxonomies are systems to name and organize things hierarchically, then a taxonomy for information retrieval, comprised of terms for tagging and retrieving content (documents, images, etc.), can be considered a taxonomy of a controlled vocabulary, in contrast to taxonomies of things, such as organisms. This is a slightly different perspective from considering a taxonomy as a kind of controlled vocabulary, as I previously had. The following diagram illustrates a possible way to consider how information-retrieval taxonomies relate to classification systems and controlled vocabularies.

Diagram showing that information taxonomies are at the intersection of classification systems and controlled vocabularies

Several kinds of knowledge organization systems are defined by their published standards. For thesauri, there are ANSI/NISO Z39.19 and ISO 25964. For terminologies, there is ISO/TC 37/SC 3 and other related standards. For ontologies, there is OWL (Web Ontology Language) from the W3C. There is no standard, however, specifically for “taxonomies” or even for “classification systems,” which is a reason why these remain difficult to define. The designations “classification system,” “classification scheme,” and “taxonomy” have been used interchangeably.

Wikipedia provides the definition at the entry for Taxonomy: “A taxonomy (or taxonomical classification) is a scheme of classification, especially a hierarchical classification, in which things are organized into groups or types.” But then it goes on to say, “it may refer to a categorisation of things or concepts.” Thus, an information-retrieval taxonomy is a categorization of concepts (also called terms in a controlled vocabulary). It is not a classification system, since the goal is not to classify things, not even the things tagged with the taxonomy concepts, but rather to organize the set of concepts that have been identified as appropriate for tagging and retrieving a set of content.


6 comments:

  1. This last sentence intrigues me. Knowledge organization need not be for retrieval; sometimes taxonomies are created to facilitate communication across an organization. It is possible, even likely, that this agreed-upon set of terminology will at some point be reflected in the naming of databases, schemas, tables and columns. At that point, yes, it’s for retrieval. However, capturing the “language” of an organization into agreed-upon terms, definitions and relations has its own value.
    I would like to know of others who are in this fascinating application of taxonomies.

    Replies
    1. Correct, knowledge organization need not be for information retrieval. That's why I qualified the "information-retrieval taxonomy" as a kind of taxonomy. This is by far the most common kind of taxonomy, though.

  2. These are tricky concepts to untangle, thank you for your efforts.
    This may be off topic, but you have me thinking anew about how the goal of hierarchical classification schemes (assign an item only one classification code) can operate across a collection, but also operate within specific taxonomy entries. For example, one goal of information retrieval for your invoices might be to ensure you can answer questions like this: "Did business in Maryland in December rebound from the November numbers?" The hierarchical structure of dates (years contain months, which contain days) keeps the December invoices separate from the November invoices and also ensures counts by month will sum to the count by year. Of course, my controlled vocabulary might distinguish between the "order date," "shipping date," and "delivery date," and I had better make sure I know my audience. [One can also see this nesting of a classification system into a taxonomy with locations (country contains states/regions, which contain postal codes), where it might be the "billing address" or the "delivery address."] Perhaps date and location are special cases of universally applicable hierarchical classification schemes, as opposed to custom schemes for specific domains.
    I can’t help but think, however, that it's almost as if in the digital world you have an army of librarians who can reorganize the stacks instantly - and with each iteration each item is cataloged in a way that it has one place it should be.

  3. Physical items, such as books, can be cataloged in only one place, and categories in a classification scheme occur in only one location.
    By contrast, concepts in a digital taxonomy can occur in more than one hierarchy, and content can be tagged with multiple concepts.

  4. Thanks for posting. It really inspired me to write down a few things that I had been thinking about. It's taken me a while to put my finger on exactly what it was that set me spinning.

    Here's my take: I just don't like the word "taxonomy" (other than as the name for the practice of arrangement). A "taxonomy" -- as a thing -- is just really hard to pin down. For some reason, Wayne Booth's memorable expression "simply the flinging of Greek-fed, polysyllabic bullshit" comes to mind. I can almost feel one of my grad school mentors standing over my shoulder, talking about Wittgenstein's language-games and Kuhn's guidance on the normalization of science (and practice). Ultimately, I'm just not sure what the word actually does for us. I don't think Cutter or Melvil Dewey talked about taxonomies. Nor did Julius Kaiser or S.R. Ranganathan. The word has just kind of appeared as a synonym for the cumbersome "knowledge organization system," throwing the unlikely bedfellows of gazetteers, subject headings, and semantic networks into the same semantic sleeping bag...

    [I go on at length on the topic at: https://kwocurriculum.blogspot.com/2023/01/deflating-taxonomies-do-we-really-need.html]

    Replies
    1. True, there are problems with the word "taxonomy": it lacks an agreed-upon definition and is used inconsistently. But we are stuck with it. There are many people with "taxonomist" or "taxonomy" in their job title making a living from this work and providing useful services, so we need to respect that.
