Tuesday, January 31, 2023

Taxonomies vs. Ontologies

The question often comes up: how are taxonomies and ontologies different? While there are some short simple answers (such as: taxonomies are hierarchies, and ontologies are semantic networks), it is understandable that the distinction is not that clear. There is considerable overlap. Ontologies may contain taxonomies, and taxonomies can be semantically enriched to become ontology-like. The same software tools, for example PoolParty, support the creation of both.

One of the trends in data/information/knowledge management is the convergence of systems, methods, and technologies, including the convergence of taxonomies and ontologies. It has gotten to the point that some people refer to taxonomies and ontologies almost interchangeably, as if they were essentially the same thing. They are not, although they are increasingly combined. It’s interesting that one of the most active discussion channels within the Taxonomy Talk community on Discord is on ontologies.

Uses

Although both taxonomies and ontologies are kinds of knowledge organization systems, which support access to information, their specific uses tend to differ. The primary use of information taxonomies is for consistent tagging and accurate and comprehensive retrieval of content items. These could be documents, components (sections) of documents, web or intranet pages, or digital assets (image, audio, video files, etc.). Ontologies, with their inclusion of or linkages to instances/individuals and their various attributes, are more focused on the specifics of data: data retrieval, data comparison, and data analysis. Taxonomies are primarily for what a content item is about (although content/document types may also be part of a taxonomy), as in “get me all the information resources about…,” or “get me a list of products with…” followed by a set of features and a price range as filters. Ontologies, on the other hand, can support more complex, multistep queries, such as “get me a list of products with…” a set of features and price range, whose vendors are located in Canada and have a minimum annual revenue of CAD $50 million.
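
To make the contrast concrete, here is a minimal Python sketch of the two kinds of query. The product and vendor records, the field names, and both function names are invented for illustration, and the price range is simplified to a ceiling. A real system would query a tagged content index or a graph database rather than in-memory dictionaries.

```python
# Hypothetical data: every name and value below is made up for illustration.
PRODUCTS = [
    {"name": "Trail Tent 2", "features": {"waterproof", "two-person"},
     "price": 180, "vendor": "Northpeak"},
    {"name": "Summit Tent 4", "features": {"waterproof", "four-person"},
     "price": 420, "vendor": "Alpenrose"},
    {"name": "Lake Tent 2", "features": {"waterproof", "two-person"},
     "price": 150, "vendor": "Bayline"},
]
VENDORS = {
    "Northpeak": {"country": "Canada", "annual_revenue_cad": 75_000_000},
    "Alpenrose": {"country": "Canada", "annual_revenue_cad": 120_000_000},
    "Bayline": {"country": "Canada", "annual_revenue_cad": 20_000_000},
}


def faceted_search(features, max_price):
    """Taxonomy-style retrieval: filter items by tagged features and price."""
    return [
        p["name"]
        for p in PRODUCTS
        if features <= p["features"] and p["price"] <= max_price
    ]


def multistep_query(features, max_price, country, min_revenue):
    """Ontology-style retrieval: also follow the product-to-vendor link
    and test attributes of the vendor."""
    results = []
    for p in PRODUCTS:
        vendor = VENDORS[p["vendor"]]
        if (
            features <= p["features"]
            and p["price"] <= max_price
            and vendor["country"] == country
            and vendor["annual_revenue_cad"] >= min_revenue
        ):
            results.append(p["name"])
    return results
```

The second function differs only in that it traverses a relationship (product to vendor) and then filters on the attributes of the linked entity, which is the extra step the ontology makes possible.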

In comparing retrieval of content and data, for example, taxonomies can retrieve a spreadsheet file, whereas ontologies can retrieve data from individual cells in the spreadsheet. Ontologies can traverse data in a database. While this could be a relational database, increasingly ontologies are used with graph databases, since ontologies are also structured as graphs.

Origins

Another major difference between taxonomies and ontologies is their origins. Information taxonomies (not biological taxonomies) originated in the discipline of library science. Specifically, I would say that taxonomies have evolved as a kind of flexible hybrid of classification systems and thesauri. Ontologies, on the other hand, (when not in philosophy) tend to be taught and researched as a part of computer science. Again, there has also been convergence of library science and computer science in the field of information science. Nevertheless, library/information science and computer/information science are different approaches.

Taxonomies have also become an area of interest in information architecture, user experience design, content management, and digital asset management. Taxonomies are also related to terminology management and information search and retrieval. Ontologies, on the other hand, have become an area of interest in data science, data engineering, and graph data management. Ontologies also borrow concepts from set theory in mathematics and logic from philosophy.

Taxonomies and ontologies follow different standards, but the standards have also converged in a way. Taxonomies have no standard of their own but follow the thesaurus standards (ANSI/NISO Z39.19 and ISO 25964) for recommended best practices. Ontologies are based on the W3C standards of RDF, RDF Schema (RDF-S), and the formal language of OWL (Web Ontology Language). The W3C then published a recommendation for taxonomies, thesauri, and other knowledge organization systems called SKOS (Simple Knowledge Organization System) in 2009, and it has since become widely adopted. SKOS is based on RDF, as is the ontology standard RDF-S. As a result, SKOS and RDF-S statements or namespaces can be combined in the same knowledge organization system, and taxonomies and ontologies can thus be combined.
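
As a rough illustration of that last point, the Python sketch below stores SKOS and RDF-S statements side by side as plain subject-predicate-object tuples. The three W3C namespace URIs are the real ones. The `http://example.org/` namespace, the resource names, the `soldBy` and `hasCategory` properties, and the helper function are assumptions made up for this example, and a real project would more likely use an RDF library or a taxonomy/ontology management tool.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/"  # hypothetical namespace for the example data

# One set of triples that mixes taxonomy (SKOS) and ontology (RDF-S) statements.
TRIPLES = {
    # Taxonomy side: a concept with a label and a broader concept.
    (EX + "Tents", RDF + "type", SKOS + "Concept"),
    (EX + "Tents", SKOS + "prefLabel", "Tents"),
    (EX + "Tents", SKOS + "broader", EX + "CampingGear"),
    # Ontology side: classes and a property linking them.
    (EX + "Product", RDF + "type", RDFS + "Class"),
    (EX + "Vendor", RDF + "type", RDFS + "Class"),
    (EX + "soldBy", RDFS + "domain", EX + "Product"),
    (EX + "soldBy", RDFS + "range", EX + "Vendor"),
    # An individual product that is also categorized with a taxonomy concept.
    (EX + "TrailTent2", RDF + "type", EX + "Product"),
    (EX + "TrailTent2", EX + "hasCategory", EX + "Tents"),
}


def vocabularies_used(triples):
    """Return which of the three W3C vocabularies appear in the triples."""
    namespaces = {"rdf": RDF, "rdfs": RDFS, "skos": SKOS}
    found = set()
    for triple in triples:
        for term in triple:
            for prefix, namespace in namespaces.items():
                if term.startswith(namespace):
                    found.add(prefix)
    return found
```

Because every statement is just an RDF triple, nothing in the data structure separates the SKOS part from the RDF-S part; only the namespace of each term does.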

Features

Both taxonomies and ontologies aim to describe a knowledge domain with collections of entities structured into groups or types, with relationships between them. Ontologies go further in describing the relationships in more detail. Attributes are also more extensive in ontologies. Both support the options for notes or definitions.

Concepts or Entities

Taxonomies are comprised of concepts (sometimes called terms), which are things. Concepts can be generic or specific and may even include named entities (unique proper nouns). Taxonomies do not differentiate between generic concepts and named entities, which correspond to “individuals” in an ontology. Ontologies, on the other hand, distinguish between two types of entities: classes and individuals. Classes can be broad or specific, but, as the name implies, they are intended to contain something, either subclasses or individuals. By contrast, leaf nodes (the narrowest concepts in a hierarchy) in a taxonomy could actually be quite broad in meaning.

Individuals, as defined by an ontology, tend to be named entities (proper nouns), and they should be uniquely individual. This may not be obvious. A brand name product is a proper noun, but technically it is not an individual, because there are numerous specific instances of the product owned by different people. There may be some differences of opinion on how to define individuals.

Relationships

Taxonomies follow thesaurus standards for relationships. Thesaurus hierarchical relationships comprise three types: generic-specific or “is a” kind of relationship, generic-instance (where the instance is a named entity or proper noun), and whole-part. Ontologies have only generic-specific “is a” hierarchical relationships, which are between classes and subclasses. The relationship between an individual and a class is not considered hierarchical in an ontology but rather a relationship of class membership. Also, the whole-part relationship is not considered hierarchical in ontologies (but could be created as a semantic relationship).
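
The difference can be sketched in a few lines of Python. Everything here is a simplified illustration: the terms, the dictionary-based model, and the function names are invented, and real ontologies express these relationships in RDF-S or OWL rather than in Python dictionaries.

```python
# Thesaurus-style hierarchical links: (narrower, broader, kind).
# All example terms are invented.
THESAURUS_LINKS = [
    ("Tents", "Camping gear", "generic"),
    ("Mount Everest", "Mountains", "instance"),
    ("Tent poles", "Tents", "whole-part"),
]


def ontology_treatment(kind):
    """How each thesaurus hierarchical type is handled in an ontology,
    following the description in the paragraph above."""
    return {
        "generic": "subclass of (hierarchical)",
        "instance": "class membership (not hierarchical)",
        "whole-part": "semantic relationship (not hierarchical)",
    }[kind]


# Ontology-style model: subclass links and membership links are kept apart.
SUBCLASS_OF = {"Tent": "CampingGear", "CampingGear": "Product"}
MEMBER_OF = {"tent_serial_0001": "Tent"}  # individual -> its direct class


def superclasses(cls):
    """Walk the subclass hierarchy upward from a class."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def classes_of(individual):
    """An individual belongs to its direct class and to every superclass."""
    direct = MEMBER_OF[individual]
    return [direct] + superclasses(direct)
```

Keeping membership separate from the subclass hierarchy is what lets the individual be reported as a member of every superclass without ever being treated as a "narrower" class itself.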

While generic-instance is a permitted hierarchical relationship type in a taxonomy, named entity concepts (proper nouns) are not often made narrower to a corresponding generic concept. Instead, they tend to be grouped in their own separate concept scheme to serve as a separate search facet or filter.

A generic associative (“related”) relationship may exist in taxonomies, although it is more of a feature of thesauri. It is bidirectional and reciprocal, and it tends to be used between concepts within the same concept scheme, which often corresponds to a class in an ontology. Ontologies do not have a generic associative relationship. Instead, ontologies have semantic relations which are designated by the ontology creator, just as the classes are designated, and they are not used within classes but across a specified pair of classes. Suggestions of what might be of related interest to the end-user are not within the scope of an ontology’s purpose, which is more structured and based on rules. Ontologies may have other bidirectional reciprocal relationships, such as “goes with,” “has sibling,” “accompanies,” etc.

Equivalency and alternative labels

In a taxonomy, each concept has a single preferred label in each language for display and any number of alternative labels and hidden labels per language to help match on searching or tagging. In the traditional thesaurus model, “nonpreferred” terms redirect to “preferred” terms. The alternative labels are sufficiently equivalent in the context of the taxonomy and content to be used for a given concept, and thus might not be exact synonyms. Alternative labels include synonyms, near synonyms, and possibly even narrower terms not deemed needed as concepts with preferred labels.
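
A minimal sketch of how alternative and hidden labels support matching, assuming a single language and a tiny invented vocabulary (the concept IDs, labels, and function names are all hypothetical):

```python
# Hypothetical concepts: one preferred label, plus alternative and hidden labels.
CONCEPTS = {
    "c1": {"pref": "Automobiles",
           "alt": ["Cars", "Motor cars"],
           "hidden": ["Automobles"]},  # hidden label: a common misspelling
    "c2": {"pref": "Trucks", "alt": ["Lorries"], "hidden": []},
}


def build_index(concepts):
    """Map every label of every concept, lowercased, to its concept ID."""
    index = {}
    for concept_id, labels in concepts.items():
        for label in [labels["pref"], *labels["alt"], *labels["hidden"]]:
            index[label.casefold()] = concept_id
    return index


def resolve(term, concepts, index):
    """Return the preferred label for a search or tagging term, if any."""
    concept_id = index.get(term.strip().casefold())
    return concepts[concept_id]["pref"] if concept_id else None
```

Any of the labels leads to the same concept, but only the preferred label is returned for display, which mirrors the "nonpreferred redirects to preferred" behavior of the thesaurus model.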

In ontologies, the OWL element sameAs is intended for equivalency of individuals, and equivalentClass is for the equivalency of classes, and they mean exact equivalence. But there is no designation of one name being preferred and the other alternative. They are all preferred. Moreover, sameAs and equivalentClass are not intended for use within a single ontology, but rather across different ontologies. So, those OWL elements are similar to the SKOS exactMatch relationship, which is used across concept schemes or taxonomies. They do not support search within the same data set as alternative labels do.

Enforcement of rules

SKOS is a data model for taxonomies and thesauri, but it does not specify any rules for usage. Rather, the taxonomy creator should attempt to follow the guidelines, not exactly rules, in the thesaurus standards (ANSI/NISO Z39.19 and ISO 25964-1). The quality standards include disjoint labels (a label can be used only once for a concept, preferred or alternative, and for only one concept), single relationships (a pair of concepts may have a hierarchical or an associative relationship between them, but not both), and no hierarchical cycles. The standard for ontologies, on the other hand, OWL, has many rules built into it. This makes OWL ontologies more powerful by supporting inferencing and reasoning.
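
Since these taxonomy guidelines are not enforced by the data model, they are usually checked by the taxonomy tool or by a script. The following Python sketch shows one possible way to test the three quality conditions named above. The function names and the input shapes (a dictionary of labels per concept, and sets of concept pairs) are assumptions for illustration and do not come from any standard or product.

```python
def duplicate_labels(labels_by_concept):
    """Return labels (lowercased) that occur more than once,
    whether within one concept or across concepts."""
    seen, duplicates = set(), set()
    for labels in labels_by_concept.values():
        for label in labels:
            key = label.casefold()
            if key in seen:
                duplicates.add(key)
            seen.add(key)
    return duplicates


def relationship_clashes(broader, related):
    """Return concept pairs linked both hierarchically and associatively.
    Both arguments are sets of 2-tuples; direction is ignored."""
    hierarchical = {frozenset(pair) for pair in broader}
    associative = {frozenset(pair) for pair in related}
    return hierarchical & associative


def has_cycle(broader):
    """Detect a hierarchical cycle in a set of (narrower, broader) pairs
    using a depth-first search."""
    parents = {}
    for child, parent in broader:
        parents.setdefault(child, set()).add(parent)

    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # reached a node already on the current path
        visiting.add(node)
        found = any(visit(parent) for parent in parents.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(parents))
```

An OWL reasoner performs comparable consistency checks automatically from the axioms in the ontology, whereas for a taxonomy checks like these sit outside the data model.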

Conclusions

Taxonomies and ontologies share some features, but each has its own additional features. Thus, a combination of a SKOS taxonomy with an OWL ontology combines the features of both. Furthermore, the combination of a taxonomy with an ontology also enables a combination of uses, namely the search and retrieval of both content and data together. Rather than simply converging, taxonomies and ontologies are carefully and deliberately combined to maximize their benefits.

 

 

9 comments:

  1. I enjoyed your article and may I suggest if you could please elucidate with examples under each explanation. Nonetheless, a good learning!

    1. Since this is just a blog post, I chose not to go into such details, which are more common for a white paper or journal article. But perhaps I will expand it for such publication in the future, and I will consider your suggestion.

  2. In writing my field guide, I write about the relationships between networks and hierarchies and I've dealt with calling what in the future will be either a taxonomy or an ontology of system processes -- the patterns and processes that make up all systems. Now I have a reference and basis for reasoning and discussion when issues come up. Thanks!

  3. That's good to hear.
    Well, I am generally focused on concepts not processes, although the name of a process can also be a concept.

  4. I have been doing SEO for 12 years now and I admit that the concepts of taxonomies and ontologies (although many websites today display their content using taxonomies) are still new to me. I can't say if your article is missing any information, but it is certainly helping me to understand the basic principles.

    Am I wrong to say that entities can be associated with other entities?

    1. Benjamin, as I understand it, SEO deals with keywords (among other things), but keywords are not necessarily restricted to a controlled vocabulary list, as a taxonomy is.
      Entities can be associated with other entities. We tend not to call them entities in a taxonomy but rather "concepts", but ontologies have what are called entities, of which there are two types: classes and individuals/instances.

    2. Thanks for your feedback, it's clearer now. Indeed in SEO, the keyword is still very present in the speeches but personally I tend to understand and use the concepts and entities much more.

  5. This comment has been removed by the author.

  6. Thanks for sharing this guide, Heather. I found it extremely useful, and it has built my understanding of taxonomies and ontologies.
