Showing posts with label Ontologies. Show all posts

Monday, June 1, 2026

Is a Taxonomy an Ontology?

At last month’s Knowledge Graph Conference, in addition to knowledge graphs and graph databases, there was growing interest in ontologies, but the role of taxonomies did not seem so well understood. For example, in one presentation I attended, it was said "you get synonyms/alternative labels into a knowledge graph via ontologies," rather than mentioning taxonomies. More than one person asked me: isn’t a taxonomy a kind of ontology?

The fact that, technically, SKOS (the data model for interoperability used for taxonomies) has been designed as an upper ontology can lead to the conclusion that all taxonomies modeled on SKOS are then domain ontologies, as they are instances of the SKOS upper ontology. However, that is a more theoretical than practical way to look at taxonomies.

When I write or speak about taxonomies, I aim to be practical. While theoretically a taxonomy is a kind of ontology, in practice it is not, and maintaining a distinction helps clarify how a taxonomy and an ontology can each improve on the other when they are combined.

If you are an ontologist and see everything through the lens of ontologies, then you probably consider a taxonomy to be a simple type of ontology that merely does not utilize all the features of a full ontology. If an ontology is simply defined as a knowledge model that has classes (things), relationships between the things, and attributes as properties of the things, then, yes, a taxonomy is a kind of ontology. It has concepts, hierarchical relationships, and often other attributes for concepts, which are typically merely definitions, scope notes, or other notes.

The problem of calling any taxonomy an ontology is that the benefits of semantically enriching a taxonomy with an added ontology or extending an ontology with a taxonomy might not be well understood. We add an ontology to a taxonomy in order to provide customized semantic relationships and attributes of all kinds. Additionally, basing the added ontology on OWL (Web Ontology Language) enables capabilities of inferencing and reasoning.

Furthermore, saying that a taxonomy is an ontology could lead to less than sufficient attention to the taxonomy features that ontologies alone lack. These features include alternative labels and hidden labels that match variants in both tagging and user searching, equivalent foreign language labels for concepts, concept schemes that can be implemented as search facets, and distinct fields for definitions and different kinds of notes that are standardized for interoperability.

If we follow the Semantic Web’s stack of data model recommendations, then a taxonomy can be defined as what is built on SKOS (Simple Knowledge Organization System), and an ontology is defined as what is built on RDFS (RDF Schema) and OWL (Web Ontology Language). I find that a very clear explanation of the difference between taxonomies and ontologies for those who are familiar with ontologies. These different data models may be integrated within the same knowledge model, and that’s how we get taxonomies extended with ontologies or ontologies extended with taxonomies.

We might call taxonomy-ontology combinations “knowledge models” or “semantic models.” If the model has mostly taxonomy (SKOS-based) data, such as a large taxonomy with a little ontology added, it is best called a taxonomy, and if the model has mostly ontology (RDFS and OWL-based) data, such as a large ontology with some taxonomy data, it is best called an ontology.
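
To make the data-model distinction concrete, here is a minimal Python sketch, not tied to any particular tool. It treats a knowledge model as a list of triples and labels each one as taxonomy data (SKOS) or ontology data (RDFS/OWL). The "ex:" names and the sample triples are invented for illustration, and the "call it whichever it mostly is" rule is only a rough reading of the guideline above.

```python
from collections import Counter

# Hypothetical sample model: (subject, predicate, object) triples.
# The "ex:" names are invented; the skos:, rdfs:, and owl: terms are real.
TRIPLES = [
    ("ex:ElectricCars", "rdf:type", "skos:Concept"),
    ("ex:ElectricCars", "skos:prefLabel", "Electric cars"),
    ("ex:ElectricCars", "skos:altLabel", "EVs"),
    ("ex:ElectricCars", "skos:broader", "ex:Cars"),
    ("ex:Vehicle", "rdf:type", "owl:Class"),
    ("ex:rangeKm", "rdf:type", "owl:DatatypeProperty"),
    ("ex:rangeKm", "rdfs:domain", "ex:Vehicle"),
]


def layer(triple):
    """Label a triple as taxonomy (SKOS) or ontology (RDFS/OWL) data."""
    _, predicate, obj = triple
    terms = (predicate, obj)
    if any(term.startswith("skos:") for term in terms):
        return "taxonomy"
    if any(term.startswith(("rdfs:", "owl:")) for term in terms):
        return "ontology"
    return "other"


def describe_model(triples):
    """Name the model after whichever kind of data dominates."""
    counts = Counter(layer(t) for t in triples)
    return "taxonomy" if counts["taxonomy"] >= counts["ontology"] else "ontology"


print(describe_model(TRIPLES))
```

In this sample, four triples are SKOS-based and three are RDFS/OWL-based, so the combined model would still be called a taxonomy.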

The organizers of the Knowledge Graph Conference understood the distinct role of taxonomies in knowledge graphs and thus welcomed me again to present a tutorial specifically on taxonomies.

Wednesday, May 20, 2026

Hierarchies and Attributes in Taxonomies

One of the challenges in creating hierarchical taxonomies is that there can be multiple ways to categorize concepts and thus design hierarchies. There are multiple methods to deal with this, including polyhierarchy and facets. Now that taxonomies are more often extended with ontologies, attributes can also be used for additional “classifications” of things.

Dealing with multiple hierarchies


The traditional method of dealing with multiple methods of categorizing concepts has been to put the concepts into a “polyhierarchy,” which means the concept has more than one broader concept, and thus belongs to more than one hierarchy.  The occasional polyhierarchy is acceptable, but if a polyhierarchy becomes extensive (numerous concepts belong to the same two hierarchies) due to different methods of classification, this does not serve the purpose of helping users find the concepts and tagged content desired. When everything is in a polyhierarchy, the guiding purpose of a hierarchy gets lost.

When the issue is multiple classifications for things, then what is known as “faceted classification” is often the answer. A faceted taxonomy design involves designating a facet for each method of classifying things. For example, products may have facets for brand name, product type, functional use/application, industry market, user type, etc. Each of these could be a facet for products.

Sometimes, however, there may seem to be more possible ways of organizing or classifying something than are practical for facets. This can happen even within a single facet. For example, if you have a facet for product type, you could further classify the product types by product family, by generic product type (narrower “is a” sub-type of the broader), by broader system of which they are a component (narrower is a part of the broader), by size, or by a certain key feature or characteristic.

Recently on a project, a client suggested an added level of hierarchy within the facet for named product models for a classifying feature that impacted the product size. The problem was that this would combine named entities (proper nouns) of product models and generic types within the same facet. This combination should be avoided in facet design, because facets enable users to search and filter by different methods, such as either by name or by type, and there are scenarios when users would choose one over the other. Combining types and named entities in the same facet can cause confusion. This is where an ontology model may be the solution.

Ontologies for further classification

Ontologies enable customized relationships between classes (which tend to be the same type of high-level grouping as a facet) and customized attributes for members of classes. When we think of ontologies, we usually think of the custom relationships, but custom attributes can support what could be considered “types.” These “types” might have been extra hierarchies, and thus attributes provide a solution to the multiple classification problem. 

If multiple methods of hierarchical classification seem to be overlapping, you should consider making one or more attributes instead.  In my recent consulting case example, what the client originally proposed as top concepts for grouping product models (as a classifying feature impacting the product size), we decided would work better as an attribute of the product models. So, the facet would contain only named entity product models, and the hierarchy would be by model family only.

When an ontology is defined as a formal naming and definition of the types, properties and interrelationships of entities in a particular domain, we might think we have to define everything in the domain, and thus creating an ontology is a large, complex project. Often, what we need is only “some” ontology. While using the features, rules, and data model of an ontology, we need to define only the types, properties, and interrelationships that need to be defined for a business purpose.  This could be defining just a few custom attributes (properties) without even adding any custom relationships.  

More information about attributes is in my prior blog post, "Taxonomies and Attribute Data."

Examples

In the prior example, the product model feature had originally been proposed for the hierarchy for the purpose of “grouping,” because users might want to look up the product models by that feature. If implemented in a knowledge graph, the attributes, managed in an ontology, will also support users looking up entities by their attributes.  So, the hierarchical design is not necessary.

Any “groupings” of named entities (by region, size, role, etc.) should be reconsidered as attributes of the named entities. Other examples are groupings of vehicles by engine type, which could have engine type as an attribute instead, or groupings of appliances by energy type, which could have the energy type as an attribute instead. So, instead of Electric cars narrower to both Cars and Electric vehicles, Electric, Internal combustion, and Hybrid would be attributes for Cars.
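
The car example can be sketched in a few lines of Python. This is only an illustration under assumed names: the `Concept` class, the `engine_type` attribute, and the model names are all hypothetical and do not come from any particular taxonomy tool.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    """A taxonomy concept with one broader concept plus ontology-style attributes."""
    label: str
    broader: Optional[str] = None                   # a single hierarchy, no polyhierarchy
    attributes: dict = field(default_factory=dict)  # hypothetical datatype properties


# Hypothetical named-entity car models: all narrower to Cars only.
models = [
    Concept("Model A", broader="Cars", attributes={"engine_type": "Electric"}),
    Concept("Model B", broader="Cars", attributes={"engine_type": "Hybrid"}),
    Concept("Model C", broader="Cars", attributes={"engine_type": "Electric"}),
]


def find_by_attribute(concepts, name, value):
    """Look up concepts by attribute value instead of by an extra hierarchy."""
    return [c.label for c in concepts if c.attributes.get(name) == value]


print(find_by_attribute(models, "engine_type", "Electric"))
```

Each model keeps a single broader concept, and the lookup by engine type still works, which is the point of moving the grouping out of the hierarchy.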

Conclusions

Shared data model standards based on RDF (Resource Description Framework) and the use of dedicated taxonomy/ontology management software that combines taxonomies with ontologies make this solution of using ontology features to resolve multiple hierarchies easy to attain. Instead of thinking that we could extend a taxonomy into an ontology in the future, we should be thinking of how to design a knowledge model now that best serves the body of knowledge and the users.


Tuesday, December 30, 2025

Taxonomy Benefits Over an Ontology

In a recent conversation based on a LinkedIn post, someone asked “Why choose a taxonomy over an ontology?” This is a good question, since there has been a growing understanding that ontologies build upon taxonomies by adding more semantics, which enable additional benefits. I have presented at conferences on the topic of extending a taxonomy with an ontology. Taxonomies, however, have benefits that ontologies alone cannot provide.

I have compared taxonomies and ontologies in a past blog post (Taxonomies vs. Ontologies). Compared with taxonomies, ontologies support more complex multi-part searches, enable searching on data and not just content or full documents, and can connect across data in different repositories and sources, which leads to creating knowledge graphs or a semantic layer. Additionally, ontologies support modeling and exploration of complex relationships, graph visualizations, and reasoning and inferencing based on logic. Meanwhile, ontologies also include the basic feature of taxonomies of unlimited hierarchies of classes and subclasses. Thus, it may seem as if ontologies are superior to taxonomies and provide greater benefits than taxonomies.

Taxonomies, however, especially those based on the SKOS (Simple Knowledge Organization System) data model, have features and benefits not supported by ontologies alone which are based only on OWL and RDFS standards.  These taxonomy (or more broadly “controlled vocabulary”) features include the incorporation of synonyms to support searching and tagging, the support of multilingual concepts, the inclusion of definitions and notes in a standardized manner, the ability to map and link taxonomies together based on equivalent or related concepts, the alignment of the taxonomy with end-user applications including browsable hierarchies and facets for filtering, and finally the ease of implementation into various content systems.

Taxonomies are richer than ontologies in their linguistic aspects, including both synonyms and labels in other languages. Taxonomies are traditionally based on thesauri, which include the feature of having “equivalence” among multiple terms, whereby a preferred term may be “used for” other nonpreferred terms. The SKOS data model specifies a preferred label and any number of alternative labels and hidden labels for a concept. Furthermore, concepts may have labels in multiple languages, and this supports tagging content in different languages and retrieval by users of different languages.

In ontologies, there exists the OWL property of sameAs for equivalence of individuals and equivalentClass for equivalence of classes, but both tend to be used to declare equivalence across different datasets rather than for use within a single ontology, as there is no designation of preferred and alternative names. So, these OWL properties are more like mapping properties than support of synonyms within a controlled vocabulary. As such they do not support the basic purpose of alternative labels in a taxonomy, which is to enable matches to support searching on variant labels and tagging despite different words in texts for the same thing.
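
As a rough illustration of what alternative and hidden labels do in practice, the following Python sketch builds a lookup index so that any variant resolves to the same concept. The concept identifier, the labels, and the function names are made up for the example.

```python
# Hypothetical concept with SKOS-style preferred, alternative, and hidden labels.
CONCEPTS = {
    "ex:Automobiles": {
        "pref": "Automobiles",
        "alt": ["Cars", "Motor cars"],
        "hidden": ["Automobles"],  # a common misspelling, matched but never displayed
    },
}


def build_label_index(concepts):
    """Map every label variant (case-insensitively) to its concept identifier."""
    index = {}
    for concept_id, labels in concepts.items():
        for label in [labels["pref"], *labels["alt"], *labels["hidden"]]:
            index[label.casefold()] = concept_id
    return index


def lookup(index, text):
    """Resolve a search string or a phrase found in text to a concept, if any."""
    return index.get(text.strip().casefold())


index = build_label_index(CONCEPTS)
print(lookup(index, "cars"))
print(lookup(index, "Automobles"))
```

Both the synonym and the misspelling resolve to the one concept, while the preferred label remains the single name shown to users.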

The SKOS data model for taxonomies defines properties for scope notes, editorial notes, history notes, examples, and definitions. These are standardized fields and thus the meanings of these notes fields are consistent across taxonomies, supporting interoperability and migration. In OWL ontologies there exists an annotation property, but its use broadly includes labels, definitions, synonyms, attribution, notes, or comments.  With such inconsistent use, annotations are not well supported in importing, exporting, or linking of ontologies.

SKOS also has a set of mapping relationships. While OWL supports equivalence with sameAs and equivalentClass, SKOS taxonomies have not only an equivalence relationship, exactMatch, but also closeMatch, narrowMatch, broadMatch, and relatedMatch, and thus all concepts in two separate taxonomies can be mapped to each other, unlike two ontologies which may share only a few matches. The full mapping of one taxonomy to another supports various uses, including using one taxonomy in the front end and the other in the back end, tagged to content.
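
The front-end/back-end use can be sketched as follows. This is a simplified Python illustration: the "front:" and "back:" concept names and the document identifiers are hypothetical, and only the SKOS mapping property names are real.

```python
# Hypothetical mappings from a front-end taxonomy to a back-end taxonomy.
MAPPINGS = [
    ("front:Cars", "skos:exactMatch", "back:Automobiles"),
    ("front:E-bikes", "skos:broadMatch", "back:Bicycles"),  # the target is broader
    ("front:Scooters", "skos:closeMatch", "back:Mopeds"),
]

# Hypothetical content tagged only with back-end concepts.
TAGGED_CONTENT = {
    "back:Automobiles": ["doc-1", "doc-2"],
    "back:Bicycles": ["doc-3"],
    "back:Mopeds": ["doc-4"],
}


def retrieve(front_concept, accepted=("skos:exactMatch", "skos:closeMatch")):
    """Find content for a front-end concept by following accepted mapping types."""
    results = []
    for source, relation, target in MAPPINGS:
        if source == front_concept and relation in accepted:
            results.extend(TAGGED_CONTENT.get(target, []))
    return results


print(retrieve("front:Cars"))
print(retrieve("front:E-bikes", accepted=("skos:broadMatch",)))
```

Which mapping types to accept is a design choice: following only exactMatch and closeMatch keeps results precise, while also following broadMatch widens recall at the cost of less specific content.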

Finally, taxonomies are better suited for various content-based implementations and applications, especially with out-of-the-box systems, such as web content management systems, digital asset management systems, SharePoint, etc. A taxonomy modeled as several SKOS concept schemes can designate each concept scheme as a facet in a faceted search/browse system, in which a facet serves as a filter. A taxonomy built as a hierarchy tree can be implemented so that users can expand the tree to browse to narrower concepts and then retrieve content tagged with the most specific concept desired. Ontologies, even if they contain hierarchies of classes and subclasses, are typically visualized as graphs, and any hierarchies are not displayed in a front-end application. Furthermore, ontology visualizations are usually not linked to actual content or data as they serve just for visualizing.

In sum, while ontologies add richer semantics/meaning to relationships and attributes, taxonomies have richer semantics/meaning for concepts. Combining a taxonomy and ontology can bring the best of both worlds, and the Semantic Web standards of SKOS, OWL, and RDFS are all compatible for combining within a single project, since they are all based on the RDF (Resource Description Framework) data model. However, in many cases, a taxonomy with rich meaning for concepts, support for synonyms in search and tagging, along with interactive displays of hierarchies and/or facets, is all that is needed. You can always add an ontology later.

Monday, May 5, 2025

Taxonomies and Attribute Data

In the past (such as my 2021 blog post "Attributes in Taxonomies"), I have explained that “attributes” serve as filters to refine search results on content, results that have already been narrowed by a hierarchical taxonomy concept or category. As such, the attributes available for filtering can vary based on a taxonomy concept or category that had been selected. To the end user, high-level taxonomy facets and attributes both function similarly as filters, and the distinction between facets and attributes may not be apparent. If the distinction is not noticeable to end users, then facets and attributes may be confused. It’s best to describe attributes for what they are, and not merely by what they can do. That’s what this blog post aims to do.

Attributes

Data is information in the form of specific values that are relevant to something such as an asset, object, product, person, event, or transaction. Since data is relevant to something else, we can refer to data as an “attribute” of something. When attributes are standardized and used in information/data management, then attributes are metadata. Metadata schema are structures to organize data.

Examples of attribute metadata are:

  • for people: birth date, gender, occupation, nationality, phone number
  • for products: brand, price, color, size, SKU number
  • for documents: title, author, publication date, language, word count, publication status, file type

Almost all metadata, both descriptive and administrative, are attributes of something. (Only structural metadata, that which is used to mark up text, would not be an attribute.)  Attributes, as metadata, can serve various purposes, including identification, comparison, sorting, filtering, and finding something based on its attributes.

Attribute values may be of different types: text, numbers, dates, or yes/no (also called “Boolean”). As text strings, attribute values may be uncontrolled free text or terms from a controlled list.

Taxonomies

Taxonomies are structures of concepts, which are used primarily for tagging and retrieval of content, although there are secondary uses. The concepts include subjects and named entities. In all cases, the concepts are of controlled vocabularies. The structures may be primarily hierarchical or primarily faceted, although a combination, such as limited hierarchies within a facet, is also possible. The structure of the taxonomy provides context for tagging and supports interaction by users.

When a taxonomy is structured into facets, typically each facet serves also as a metadata property.  A hierarchical topical taxonomy can also provide values for a metadata property. Taxonomies are structures to organize controlled vocabulary concepts.

Examples of taxonomy facets include:

  • Topics
  • Activities
  • Industries
  • Product/service types
  • Brand names
  • Companies
  • Organizations
  • Names of people
  • Types of people/Roles
  • Events/Occasions

Thus, the types of things that are facets are usually not the same types of things that are considered attributes. 

Metadata schema are structures to organize data, whereas taxonomies are structures to organize controlled vocabulary concepts that can populate metadata properties.

Where Attributes and Taxonomies Overlap

Considering again the examples of different types of attributes for different things, there are some attributes that could be managed in a “taxonomy” instead of merely as “attributes”:

  • For people:  Name
  • For products:  Product type/category
  • For documents:  Subject/topic

Technically, each of these characteristics is also an attribute, but it is usually more practical to manage them as taxonomies so that they can support the implemented benefits of a taxonomy, such as semantic tagging, searching (including type-ahead search suggest), and browsing.

Thus, when we talk about “attributes” in the context of taxonomies, we mean those characteristics of something that are better managed as attributes and not managed as taxonomies. The decision is one of knowledge modeling.

For example, to support the refinement of searches, a taxonomy of expert people for an organization may have the following taxonomy facets:

  •  Name
  •  Subject of expertise
  •  Organizational unit
  •  Location

Then in addition to the facets, the taxonomy may have the following attributes associated with each record of a person:

  • Job title
  • Academic degree
  • Email address
  • Phone number
  • URL of headshot image

This is selected data of interest, but not values that are used in initial search or browsing for finding and retrieving content. Attributes are metadata, and taxonomy facets are also metadata, but that does not mean that they are the same, because different metadata can have different functions or purposes.

Ontologies: Bridging Taxonomies and Attributes

When we enrich a taxonomy with features of an ontology, not only can we add semantic relationships, but we can also add attributes to taxonomy concepts. Usually, when taxonomists first learn about ontologies, they think primarily of the addition of customized relationships between concepts, and they might not be aware of the importance of the addition of attributes.

In ontologies, semantic relationships are formally called “object properties,” and attributes are called “datatype properties.” Both are equally important. Meanwhile, the feature of “classes” in an ontology typically corresponds to taxonomy concept schemes or facets.

To add attributes to a taxonomy, the best way to do it is through adding an ontology, which can be very simple and not even include semantic relationships. As the availability of different attributes may vary based on a hierarchy branch of concepts, this can be managed by creating classes, which are assigned to hierarchical branches, facets, or concept schemes. Then, attributes (datatype properties) are applied and used with concepts based on the class the concept belongs to. 
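
A minimal Python sketch of this class-based approach follows. The class names, attribute names, and concepts are hypothetical examples, and a real ontology would declare them as OWL classes and datatype properties rather than as Python dictionaries.

```python
# Hypothetical classes and the attributes (datatype properties) defined for each.
CLASS_ATTRIBUTES = {
    "Person": {"job_title", "email_address"},
    "Product": {"sku_number", "price"},
}

# Hypothetical assignment of taxonomy concepts to classes.
CONCEPT_CLASS = {"Jane Doe": "Person", "Widget 3000": "Product"}


def allowed_attributes(concept):
    """Return the attributes available to a concept, based on its class."""
    return CLASS_ATTRIBUTES[CONCEPT_CLASS[concept]]


def set_attribute(records, concept, name, value):
    """Store an attribute value only if the concept's class defines that attribute."""
    if name not in allowed_attributes(concept):
        raise ValueError(f"{name!r} is not defined for class {CONCEPT_CLASS[concept]!r}")
    records.setdefault(concept, {})[name] = value


records = {}
set_attribute(records, "Jane Doe", "job_title", "Taxonomist")
print(records)
```

Trying to set a price on a person raises an error here, which mirrors how the available attributes vary by class.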

Conclusion

The following table summarizes the differences between taxonomy facets and attributes.

Taxonomy Facets                           Attributes

Basic structure of many taxonomies        Additional data added to taxonomies

Controlled vocabularies                   Controlled or uncontrolled terms, text,
                                          numbers, dates, Boolean options, etc.

Concepts as nouns or noun phrases         If text, any kind of text string

Top organizational level of a taxonomy    Values relevant to any taxonomy concept

Concept Schemes in SKOS, or               Metadata on a concept, or
Classes in an OWL ontology                datatype properties in an OWL ontology

Thursday, December 19, 2024

Ontologies vs. Knowledge Graphs

At the Connected Data London (CDL) conference I attended last week, ontologies were humorously referred to as the “O” word. The thought was that, until recently, experts preferred not to mention “ontology,” lest they alienate their audience, customers, or stakeholders. The word comes across as too technical. It is a term from philosophy, after all, and it does not help that it sounds very similar to “oncology” (as “taxonomy” has been confused with “taxidermy”). The term “knowledge graph” on the other hand, is more user friendly, and even if it is not perfectly understood, its general meaning can be guessed. Thus, people would refer to knowledge graphs regardless of whether they meant a knowledge graph or an ontology.

At the conference, however, it was discussed that there is a growing acceptance of the word “ontology,” not just among experts but also among varied stakeholders who need to implement them. This was noted by several conference speakers, especially in the wrap-up panel session for the Data Modeling track, which was titled “The ‘O’ Word: How Ontologies Drive Interoperable Data and Business Innovation.” The panel moderator Katariina Kari explained that this recent shift has happened because of LLMs, explaining: “We need a reliable natural language repository. LLMs works on a network of mimicking language, LLMs are primed for language.” So, now use of the word ontology can even help a startup get funding from venture capitalists, she observed.

However, there remains some confusion over what an ontology is. At one end there is the difference between ontologies and taxonomies, and at the other end the difference between ontologies and knowledge graphs. I clarified the distinction between taxonomies and ontologies in a prior blog post, “Taxonomies vs. Ontologies” (January 2023). While knowledge graphs are a relatively new concept, and ontologies have existed for much longer, it is the varied understanding of ontologies that has given rise to confusion.

An ontology is defined as a model of a domain of knowledge, which comprises classes (sets of things), attributes (types of characteristics of things) and relationships between classes. According to this definition, an ontology is a somewhat generic model of a domain, and it does not include all of the individual members or instances of each class (such as the names of individual companies in the class called Company) nor the specific attributes of each attribute type (such as the address of each specific company for the attribute type called Address).

However, the W3C recommendation for ontologies, OWL (Web Ontology Language) includes the designation “individuals,” and ontology software tools, such as Protégé, support the inclusion of individuals and their specific attributes. Thus, it is easy to think that an ontology, by definition, includes all specific individuals. But just because OWL covers the recommendation for how to include instances of a class, and software supports the inclusion of instances of classes does not necessarily mean that the instances or individuals are actually a component of an ontology. The ontology experts on this CDL conference panel confirmed that an ontology is the upper-level semantic model.

Then, what do we call an ontology plus all of the individual members (instances) of classes and their specific attributes? That is essentially what a knowledge graph is. This is especially true when individuals are specific to an organization or enterprise, such as names of individual customers, products, employees, etc., and we call that an “enterprise knowledge graph.”
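
One way to picture the split is the Python sketch below: the ontology holds only classes, attribute types, and relationships, and the knowledge graph adds individuals and their specific facts. All of the names and data are invented for illustration, and the `conforms` check is a simplification rather than OWL reasoning.

```python
# Hypothetical ontology: the generic model only, with no individuals.
ONTOLOGY = {
    "classes": {"Company", "Person"},
    "attributes": {"Company": {"address"}, "Person": {"email"}},
    "relationships": {("Person", "worksFor", "Company")},
}

# Hypothetical knowledge graph content: individuals and facts about them.
INDIVIDUALS = {"Acme Corp": "Company", "Jane Doe": "Person"}
FACTS = [
    ("Jane Doe", "worksFor", "Acme Corp"),
    ("Acme Corp", "address", "1 Main Street"),
]


def conforms(fact):
    """Check whether a fact about individuals follows the ontology's model."""
    subject, predicate, obj = fact
    subject_class = INDIVIDUALS.get(subject)
    if subject_class is None:
        return False
    if obj in INDIVIDUALS:  # relationship between two individuals
        return (subject_class, predicate, INDIVIDUALS[obj]) in ONTOLOGY["relationships"]
    # Otherwise treat the object as a literal attribute value.
    return predicate in ONTOLOGY["attributes"].get(subject_class, set())


print(all(conforms(fact) for fact in FACTS))
```

The ontology dictionary never changes as more customers, products, or employees are added; only the individuals and facts grow.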

The first applications of ontologies in information/data science were in biomedicine, in which individuals included such things as names of organisms (including bacteria and viruses) and chemicals, etc. Thus, the notion of an individual in science is not quite the same as in business, which has also been a source of confusion over what an individual is and the inclusion of individuals in an ontology. In enterprise knowledge graphs, the instances can be very numerous and specific, including individual “events,” such as interactions or transactions.

In conclusion, an ontology is typically a defining feature and component of a knowledge graph, but it is not all of what goes into a knowledge graph. A knowledge graph also includes individuals, which may be named entity instances or they may be specific taxonomy concepts (abstract things that are not unique named entities, such as the concepts “Data ethics” or “Performance measurement”), and a knowledge graph also includes specific attributes of individuals. It may be said that a knowledge graph is the instantiation of an ontology, and an ontology is the knowledge model. Katariina further explained: “knowledge graphs that actually follow an ontology will have an LLM perform better than just a KG that is unharmonized, not yet adhering to a clear ontology.”

Thursday, October 31, 2024

The Semantic Data Conference

I was honored to be accepted to speak at the first “Semantic Data” conference in New York, a one-day event held on October 23, following the inaugural event held in London on June 27. Semantic Data, organized by Henry Stewart (HS) Events, is co-located with its better-known DAM (Digital Asset Management) conference, which has been running for over 20 years in New York, London, and Los Angeles.

The full name of the conference was “Semantic Data: Taxonomy, Ontology, and Knowledge Graphs,” so the conference was less focused on data than on what you can do with data and content when combined with the semantics of taxonomies and ontologies. There was no presentation dedicated to knowledge graphs this time, with only sessions in the single-day one-track event. Less of a focus on knowledge graphs was fine, since the Knowledge Graph Conference, held in New York in May, covers that topic very thoroughly over multiple days. The emphasis on “semantics,” though, is welcome, since there is no conference dedicated to that subject in the United States. (There is the SEMANTiCS conference in Europe, but it is semi-academic.)

 

Presentations at Semantic Data, New York

The topics of the sessions at Semantic Data included: securing taxonomy and ontology strategy buy-in, why and how to connect taxonomies and ontologies, use of MS Copilot in taxonomy development, a use case in leveraging an LLM-based for content integration and a consumer-based semantic layer, and how to apply semantic models (taxonomies and ontologies) that reduce biases, especially for machine learning models. The opening keynote by Lulit Tesfaye was on realizing the semantic layer, and the closing keynote by Gary Carlson and Bram Wessel of the lead sponsor, Factor, was on building an organizational semantic mindset. Additional sponsored talks were on how ontologies accelerate innovation in the life sciences, as done by the sponsor SciBite, and how semantics enhances modern data platforms, such as that of the sponsor Datavid.

I presented “Taxonomies to Ontologies: How When and Why to Connect or Extend.” I summarized the benefits of taxonomies and ontologies, including what you could or could not do with each alone, but what you could do with both combined. The fact that both taxonomies and ontologies are now based on compatible Semantic Web standards, which are supported by many tools, makes it easy to combine or extend them. Whether you are “combining” a taxonomy with an ontology or “extending” a taxonomy into an ontology depends merely on your starting point and definition of ontology. Now that I am again vendor neutral, I included screenshots from four different commercial tools for combined taxonomy/ontology management.

About the Semantic Data Conference 2024

Semantic Data New York was similar to Semantic Data Europe (London) in its format and organization. Both provided a combination of session types: instructional talks, industry use cases, round table participant discussions, and thought leadership panels. Both events were chaired by Madi Weland Solomon and featured the same keynote presentation by Lulit Tesfaye on the subject of the semantic layer. The rest of the speakers differed between the two events, and each event had different sponsors, based on geographic location. While there were only three sponsors of Semantic Data in New York and only two in London, they shared the same exhibit hall with the main DAM (digital asset management) conference and thus reached a wider audience.

The London and New York events had a similar number of registrants, about 50. Although the larger co-located DAM conference had separate registration, some registrants of the DAM conference were also seen in Semantic Data sessions. Registrants of Semantic Data represented diverse industries, including financial services, healthcare, software/technology, media, entertainment, publishing, travel and tourism, education, government, and consulting. Roles were also diverse, including company leadership, project and program managers, IT, and content/DAM/taxonomy/information architecture practitioner roles.

I find that the roles and activities of taxonomists, ontologists, information architects, digital asset managers, etc. overlap, so a conference dedicated to semantics brings them together for knowledge sharing. This way, their projects can also be broadened and shared within their organizations. I hope the Semantic Data conference can grow in the future to fill this need, and I look forward to next year.

Tuesday, January 31, 2023

Taxonomies vs. Ontologies

The question often comes up: how are taxonomies and ontologies different? While there are some short simple answers (such as: taxonomies are hierarchies, and ontologies are semantic networks), it is understandable that the distinction is not that clear. There is considerable overlap. Ontologies may contain taxonomies, and taxonomies can be semantically enriched to become ontology-like. The same software tools, for example PoolParty, support the creation of both.

One of the trends in data/information/knowledge management is the convergence of systems, methods, and technologies, including the convergence of taxonomies and ontologies. It’s gotten to the point that some people will refer to taxonomies and ontologies almost interchangeably, as if they are essentially the same thing. They are not, although they are increasingly combined. It’s interesting that one of the most active discussion channels within the Taxonomy Talk community on Discord is on ontologies.

Uses

Although both taxonomies and ontologies are kinds of knowledge organization systems, which support access to information, their specific uses tend to differ. The primary use of information taxonomies is for consistent tagging and accurate and comprehensive retrieval of content items. These could be documents, components (sections) of documents, web or intranet pages, or digital assets (image, audio, video files, etc.). Ontologies, with their inclusion of or linkages to instances/individuals, with their various attributes, are more focused on the specifics of data: data retrieval, data comparison, and data analysis. Taxonomies are primarily for what a content item is about (although content/document types may also be part of a taxonomy), as in “get me all the information resources about…,” or “get me a list of products with…” and specifying a set of features and a price range as filters. Ontologies, on the other hand, can support more complex, multistep queries, such as “get me a list of products with…” a set of features and price range, whose vendors are located in Canada and have a minimum annual revenue of CAD $50 million.
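
The difference between the two kinds of query can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the product and vendor records, the field names, and the helper functions filter_products and products_from_vendors are all invented for the example, and a real system would more likely run such a query against a graph database.

```python
# Hypothetical sample data: products tagged with features, each linked to a
# vendor that carries its own attributes (country, annual revenue).
VENDORS = {
    "v1": {"name": "Maple Gear", "country": "Canada", "revenue_cad_millions": 80},
    "v2": {"name": "Nordic Tools", "country": "Sweden", "revenue_cad_millions": 120},
    "v3": {"name": "Prairie Goods", "country": "Canada", "revenue_cad_millions": 20},
}

PRODUCTS = [
    {"name": "Trail Tent", "features": {"waterproof", "lightweight"}, "price": 250, "vendor": "v1"},
    {"name": "Base Tent", "features": {"waterproof"}, "price": 150, "vendor": "v2"},
    {"name": "Summit Tent", "features": {"waterproof", "lightweight"}, "price": 300, "vendor": "v3"},
    {"name": "Day Pack", "features": {"lightweight"}, "price": 90, "vendor": "v1"},
]


def filter_products(products, features, min_price, max_price):
    """Taxonomy-style faceted filter: required features plus a price range."""
    return [
        p for p in products
        if features <= p["features"] and min_price <= p["price"] <= max_price
    ]


def products_from_vendors(products, vendors, country, min_revenue):
    """Ontology-style second step: follow the product-to-vendor relationship
    and test attributes of the vendor rather than of the product."""
    return [
        p for p in products
        if vendors[p["vendor"]]["country"] == country
        and vendors[p["vendor"]]["revenue_cad_millions"] >= min_revenue
    ]


faceted = filter_products(PRODUCTS, {"waterproof", "lightweight"}, 100, 400)
multistep = products_from_vendors(faceted, VENDORS, "Canada", 50)

print([p["name"] for p in faceted])
print([p["name"] for p in multistep])
```

The first function only looks at how each product is tagged. The second has to traverse a relationship to a different kind of entity and read that entity's attributes, which is the part an ontology models explicitly.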

In comparing retrieval of content and data, for example, taxonomies can retrieve a spreadsheet file, whereas ontologies can retrieve data from individual cells in the spreadsheet. Ontologies can traverse data in a database. While this could be a relational database, increasingly ontologies are used with graph databases, since ontologies are also structured as graphs.

Origins

Another major difference between taxonomies and ontologies is their origins. Information taxonomies (not biological taxonomies) originated in the discipline of library science. Specifically, I would say that taxonomies have evolved as a kind of flexible hybrid of classification systems and thesauri. Ontologies, on the other hand, (when not in philosophy) tend to be taught and researched as a part of computer science. Again, there has also been convergence of library science and computer science in the field of information science. Nevertheless, library/information science and computer/information science are different approaches.

Taxonomies have also become an area of interest in information architecture, user experience design, content management, and digital asset management. Taxonomies are also related to terminology management and information search and retrieval. Ontologies, on the other hand, have become an area of interest in data science, data engineering, and graph data management. Ontologies also borrow concepts from set theory in mathematics and logic from philosophy.

Taxonomies and ontologies follow different standards, but the standards have also converged in a way. Taxonomies have no standard of their own but follow the thesaurus standards (ANSI/NISO Z39.19 and ISO 25964) for recommended best practices. Ontologies are based on W3C standards of RDF, RDF-Schema, and the formal language of OWL (Web Ontology Language). The W3C then published a recommendation for taxonomies, thesauri, and other knowledge organization systems called SKOS (Simple Knowledge Organization System) in 2009, and since then it has become widely adopted. SKOS is based on RDF, as is the ontology standard RDF-S. As a result, SKOS and RDF-S statements or namespaces can be combined in the same knowledge organization system, and taxonomies and ontologies can thus be combined.
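
As a rough sketch of what combining SKOS and RDF-S statements looks like, the following Python represents RDF statements as plain tuples. The skos, rdfs, and rdf namespace IRIs are the published W3C ones, but everything under the ex: prefix (the concepts, the ProductCategory class, the soldBy property) is a made-up example, and the helper functions are hypothetical.

```python
# Namespace IRIs for skos, rdfs, and rdf are the W3C ones; "ex:" is invented.
PREFIXES = {
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "ex": "http://example.org/",
}

TRIPLES = [
    # Taxonomy (SKOS) statements
    ("ex:Laptops", "rdf:type", "skos:Concept"),
    ("ex:Laptops", "skos:prefLabel", '"Laptops"@en'),
    ("ex:Laptops", "skos:altLabel", '"Notebook computers"@en'),
    ("ex:Laptops", "skos:broader", "ex:Computers"),
    # Ontology (RDF-S) statements about the same resource
    ("ex:ProductCategory", "rdf:type", "rdfs:Class"),
    ("ex:Laptops", "rdf:type", "ex:ProductCategory"),
    ("ex:soldBy", "rdfs:domain", "ex:ProductCategory"),
    ("ex:soldBy", "rdfs:range", "ex:Vendor"),
]


def expand(term):
    """Expand a prefixed name such as skos:Concept to a full IRI.
    Quoted literals are returned unchanged."""
    if term.startswith('"'):
        return term
    prefix, local = term.split(":", 1)
    return "<" + PREFIXES[prefix] + local + ">"


def namespaces_used(triples):
    """Collect the prefixes that appear anywhere in the statements."""
    prefixes = set()
    for triple in triples:
        for term in triple:
            if not term.startswith('"'):
                prefixes.add(term.split(":", 1)[0])
    return prefixes


# Print the statements with full IRIs, one statement per line.
for s, p, o in TRIPLES:
    print(expand(s), expand(p), expand(o), ".")
```

Because every statement is just a subject, predicate, and object, nothing prevents the same resource (here ex:Laptops) from being described with SKOS properties and RDF-S classes at the same time.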

Features

Both taxonomies and ontologies aim to describe a knowledge domain with collections of entities structured into groups or types, with relationships between them. Ontologies go further in describing the relationships in more detail. Attributes are also more extensive in ontologies. Both support the options for notes or definitions.

Concepts or Entities

Taxonomies are comprised of concepts (sometimes called terms), which are things. Concepts can be generic or specific and may even include named entities (unique proper nouns). Taxonomies do not differentiate between generic concepts and named entities, which correspond to “individuals” in an ontology. Ontologies, on the other hand, distinguish between two types of entities: classes and individuals. Classes can be broad or specific, but, as the name implies, they are intended to contain something, either subclasses or individuals. By contrast, leaf nodes (the narrowest concepts in a hierarchy) in a taxonomy could actually be quite broad in meaning.

Individuals, as defined by an ontology, tend to be named entities (proper nouns), and they should be uniquely individual. This may not be obvious. A brand name product is a proper noun, but technically it is not an individual, because there are numerous specific instances of the product owned by different people. There may be some differences of opinion on how to define individuals.

Relationships

Taxonomies follow thesaurus standards for relationships. Thesaurus hierarchical relationships comprise three types: generic-specific or “is a” kind of relationship, generic-instance (where the instance is a named entity or proper noun), and whole-part. Ontologies have only generic-specific “is a” hierarchical relationships, which are between classes and subclasses. The relationship between an individual and a class is not considered hierarchical in an ontology but rather a relationship of class-member. Also, the whole-part relationship is not considered hierarchical in ontologies (but could be created as a semantic relationship).

While generic-instance is a permitted hierarchical relationship type in a taxonomy, named entity concepts (proper nouns) are not so often narrower to a corresponding generic concept, but rather tend to be grouped in their own separate concept scheme to serve as a separate search facet or filter.

A generic associative (“related”) relationship may exist in taxonomies, although it is more of a feature of thesauri. It is bidirectional and reciprocal, and it tends to be used between concepts within the same concept scheme, which often corresponds to a class in an ontology. Ontologies do not have a generic associative relationship. Instead, ontologies have semantic relations which are designated by the ontology creator, just as the classes are designated, and they are not used within classes but across a specified pair of classes. Suggestions of what might be of related interest to the end-user are not within the scope of an ontology’s purpose, which is more structured and based on rules. Ontologies may have other bidirectional reciprocal relationships, such as “goes with,” “has sibling,” “accompanies,” etc.

Equivalency and alternative labels

In a taxonomy, each concept has a single preferred label in each language for display and any number of alternative labels and hidden labels per language to help match on searching or tagging. In the traditional thesaurus model, “nonpreferred” terms redirect to “preferred” terms. The alternative labels are sufficiently equivalent in the context of the taxonomy and content to be used for a given concept, and thus might not be exact synonyms. Alternative labels include synonyms, near synonyms, and possibly even narrower terms not deemed needed as concepts with preferred labels.
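
How alternative and hidden labels help matching can be shown with a small lookup sketch. The concepts, labels, and function names below are hypothetical, and only one language is modeled for brevity.

```python
# Hypothetical concepts with SKOS-style preferred, alternative, and hidden labels.
CONCEPTS = {
    "c1": {"pref": "Automobiles", "alt": ["Cars", "Motor cars"], "hidden": ["Automobles"]},
    "c2": {"pref": "Motorcycles", "alt": ["Motorbikes"], "hidden": []},
}


def build_label_index(concepts):
    """Map every label of every concept (case-insensitively) to its concept ID."""
    index = {}
    for concept_id, labels in concepts.items():
        for label in [labels["pref"], *labels["alt"], *labels["hidden"]]:
            index[label.casefold()] = concept_id
    return index


def lookup(index, concepts, text):
    """Return the preferred label of the concept matching the search text,
    or None when no label matches."""
    concept_id = index.get(text.strip().casefold())
    return concepts[concept_id]["pref"] if concept_id else None


INDEX = build_label_index(CONCEPTS)
print(lookup(INDEX, CONCEPTS, "cars"))
print(lookup(INDEX, CONCEPTS, "Automobles"))
print(lookup(INDEX, CONCEPTS, "Trucks"))
```

Whatever variant the user types or the tagging tool encounters, the match resolves to one concept, and only the preferred label is displayed.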

In ontologies, the OWL element sameAs is intended for equivalency of individuals, and equivalentClass is for the equivalency of classes, and they mean exact equivalence. But there is no designation of one name being preferred and the other alternative. They all are preferred. The elements sameAs and equivalentClass are not intended for use within a single ontology, but rather across different ontologies. So, those OWL elements are similar to the SKOS exactMatch relationship, which is used across concept schemes or taxonomies. They do not support search within the same data set as alternative labels do.

Enforcement of rules

SKOS is a data model for taxonomies and thesauri, but it does not specify any rules for usage. Rather, the taxonomy creator should attempt to follow the guidelines, not exactly rules, in the thesaurus standards (ANSI/NISO Z39.19 and ISO 25964-1). The quality standards include disjoint labels (a label can be used only once for a concept, preferred or alternative, and for only one concept), single relationships (a pair of concepts may have hierarchical or associative relationships between them, but not both), and no hierarchical cycles. The standard for ontologies, on the other hand, OWL, has many rules built into it. This makes OWL ontologies more powerful by supporting inferencing and reasoning.
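
The three quality guidelines named above can each be expressed as a simple check. The following is a minimal sketch, assuming a taxonomy held in plain Python dictionaries; the sample data and the function names are hypothetical, and taxonomy management tools typically run their own versions of such checks.

```python
from collections import Counter

# Hypothetical taxonomy data.
LABELS = {  # concept -> all of its labels (preferred and alternative)
    "Fruit": ["Fruit"],
    "Apples": ["Apples", "Apple"],
    "Pears": ["Pears"],
}
BROADER = {  # concept -> set of its broader concepts
    "Apples": {"Fruit"},
    "Pears": {"Fruit"},
}
RELATED = {("Apples", "Pears")}  # associative pairs


def duplicate_labels(labels):
    """Disjoint labels: report any label used more than once, whether within
    one concept or across concepts (compared case-insensitively)."""
    counts = Counter(l.casefold() for ls in labels.values() for l in ls)
    return sorted(l for l, n in counts.items() if n > 1)


def conflicting_pairs(broader, related):
    """Single relationships: report pairs that are linked both by a direct
    hierarchical relationship and by an associative one."""
    hierarchical = {frozenset((n, b)) for n, bs in broader.items() for b in bs}
    return sorted(tuple(sorted(pair)) for pair in related if frozenset(pair) in hierarchical)


def has_cycle(broader):
    """No hierarchical cycles: depth-first search along broader links,
    looking for a concept that turns out to be its own ancestor."""
    visiting, done = set(), set()

    def visit(concept):
        if concept in done:
            return False
        if concept in visiting:
            return True
        visiting.add(concept)
        found = any(visit(b) for b in broader.get(concept, ()))
        visiting.discard(concept)
        done.add(concept)
        return found

    return any(visit(c) for c in list(broader))


print(duplicate_labels(LABELS), conflicting_pairs(BROADER, RELATED), has_cycle(BROADER))
```

These are after-the-fact checks that a taxonomist chooses to run, which is different from OWL, where the constraints are part of the model and a reasoner draws inferences from them.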

Conclusions

Taxonomies and ontologies share some features, but each has its own additional features. Thus, a combination of a SKOS taxonomy with an OWL ontology combines the features of both. Furthermore, the combination of a taxonomy with an ontology also enables a combination of uses, namely the search and retrieval for both content and data together. Rather than a convergence of taxonomies and ontologies, they are carefully and deliberately combined to maximize their benefits.

Friday, December 17, 2021

Named Entities in Taxonomies

I have long felt that there is some uncertainty as to where named entities (names of specific people, places, organizations, products, etc.) fit into taxonomies. Standards suggest one way, and practice tends to follow a different way in dealing with these proper nouns. As taxonomy trends evolve, so does the position on these named entities. The fact that taxonomies are not well-defined leaves it open to question as to whether taxonomies should have any named entities in them, or if taxonomies should comprise only topics.

Historical trends

A historical perspective is needed. Modern, digital information retrieval taxonomies evolved out of thesauri. Thesauri, which originally came out in print format, first appeared in the 1960s and then were formalized by various standards published in the 1970s. The thesaurus standards state clearly that the relationship between a named instance and its type is one of the three kinds of hierarchical relationships permitted and supported in thesauri (the other two being generic-specific and whole-part). While taxonomies may omit the associative (related term) relationship of thesauri, they tend to follow the hierarchical standards of thesauri. Thus, named entities could be included in the taxonomy as the narrowest terms, narrower to a term for whatever “type” they are. But should it always be this way?

Then faceted taxonomies started being implemented in the early 2000s, first in ecommerce and then by the end of the decade in intranets, content management systems, digital asset management systems, and various content-rich websites. Once facets became adopted in information retrieval applications (aside from ecommerce), it became obvious from a user design perspective that named entities belonged in a different facet than the subjects. Facets are for refining a complex search query by different aspects. Sometimes these aspects follow the types of questions: What? Who? Where? When? “What” is usually for a subject, but “who,” “where,” and “when” (for taxonomy terms naming events, not date ranges) refer to named entities. Sometimes people start a query about a subject, and sometimes people start a query about a named entity, and facets allow people to start off searching any way they wish.

Then in 2009 the World Wide Web Consortium published the Simple Knowledge Organization System (SKOS) recommendation for taxonomies, thesauri, and other controlled vocabularies, which over the following decade became adopted as the standard model for building machine-readable taxonomies. One of the elements described in SKOS is that of the concept scheme, which is defined merely as “an aggregation of one or more SKOS concepts.” There is nothing comparable in the thesaurus standards. While a taxonomist may choose what to do with an “aggregation” of concepts, it has proven practical to separate out different kinds of named entities into concept schemes separate from concept schemes for topics. Thus, the widespread adoption of SKOS has contributed to the trend of separating different named entity sets, which had already started with faceted taxonomies.

My initial, and longest, experience in the domain of taxonomies and controlled vocabularies was as a controlled vocabulary editor at the library database vendor Gale. At Gale (and its predecessor company), named entity controlled vocabularies ("name authorities") have been separate from the subjects, but there were reasons for this. The named entities (named persons, companies, organizations and agencies, named works, products, laws, events, and fictional characters) each have had different sets of attributes and rules for maintenance. Some even have different customized relationships with other controlled vocabularies. Interestingly, it was not always this way. Before I joined in the mid-1990s, some of these named entities (agencies, organizations, works, geographics, and events) were mixed in with the “descriptors” in a Subject MegaFile. But eventually specific attributes and relations, not to mention the growing number of terms and a new vocabulary management system, combined to make it more logical to split off each of the named entity vocabularies. The Events were the last to be split out of the Subjects. So, it’s not because the controlled vocabularies were named entities per se, but rather their growing specialized maintenance needs due to an increase in specific attributes that led to managing them as separate controlled vocabularies. Attributes include, for example, birth date and place for a person, latitude and longitude for a location, and website URL and address for companies and organizations, among many more.

Taxonomies and ontologies

This feature of attributes brings us to the most recent trend in taxonomies, which is the occasional, but growing, convergence of taxonomies and ontologies. Ontologies divide up a knowledge domain into classes, and each class (like the Gale named-entity controlled vocabularies) has its own set of attributes and customized relationships with other classes. Ontologies, according to the Web Ontology Language (OWL) standard, however, have a different perspective on named entities. Ontologies are comprised of classes and subclasses, in hierarchies, which, in turn, contain “instances” or “individuals,” which are unique named entities. The relationship between an instance and a class (or subclass) is not, however, considered hierarchical, but rather of a “member” type. Thus, while thesauri make no distinction for named entities, and taxonomies separate out named entities when it’s practical, ontologies make a strict distinction.

Furthermore, for ontologies, which originated in the domains of philosophy and computer science, a named entity as a proper noun is not what matters. Rather, it’s the fact that the instance is unique, and there is only one. This is true for people, companies/organizations, and places. It is not true for brand name products, though. A named product is a proper noun, such as MacBook Pro or Honda Accord, but it is not a unique instance, because there are millions of individual MacBook Pros and Honda Accords in existence. It’s a similar matter for named works, such as books, where one title has millions of copies. “Named entities” or “proper nouns” are grammatical or linguistic designations, which are OK for taxonomies and thesauri, but are not a feature of ontologies, with their philosophical origins.

Fortunately, you don’t have to worry about this philosophical problem if you choose to follow the approach of applying a high-level ontology model to an existing taxonomy or set of controlled vocabularies to extend the ontology with specific terms and named entities (or, from the other direction, to extend the taxonomy with semantic relations and attributes). The OWL-based ontology then may comprise only as many classes and subclasses needed to designate the usage of distinct custom relations and attributes.  With this approach, a different ontology class is mapped to each subset or hierarchy or SKOS concept scheme of a larger taxonomy. Each named entity type would typically correspond to a different ontology class, based on the named entity’s own attributes and relations. So, each named entity type would be in its own controlled vocabulary or SKOS concept scheme.
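
The mapping described here, one ontology class per concept scheme with that class determining the available attributes, can be sketched as follows. The scheme names, class names, and sample concepts are hypothetical; the attribute examples are the ones mentioned earlier in this post.

```python
# Hypothetical mapping: each SKOS concept scheme corresponds to one ontology
# class, and the class determines which custom attributes its concepts carry.
SCHEME_TO_CLASS = {
    "People": "Person",
    "Organizations": "Organization",
    "Places": "Location",
    "Topics": "Topic",
}

CLASS_ATTRIBUTES = {
    "Person": ["birth date", "birth place"],
    "Organization": ["website URL", "address"],
    "Location": ["latitude", "longitude"],
    "Topic": [],
}

# Which concept scheme each sample concept belongs to.
CONCEPT_SCHEME = {
    "Ada Lovelace": "People",
    "Gale": "Organizations",
    "Toronto": "Places",
    "Machine learning": "Topics",
}


def attributes_for(concept):
    """Look up a concept's ontology class via its concept scheme and return
    the class together with the attributes that class allows."""
    ontology_class = SCHEME_TO_CLASS[CONCEPT_SCHEME[concept]]
    return ontology_class, CLASS_ATTRIBUTES[ontology_class]


for concept in CONCEPT_SCHEME:
    print(concept, "->", attributes_for(concept))
```

Only four classes are needed here, however many concepts each scheme contains, which reflects the point that the ontology layer can stay small when the taxonomy carries the individual terms.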

The fact that OWL ontologies may include named instances as members of a subclass does not mean you have to set up your knowledge model that way. This is similar to the idea of the thesaurus standard, which permits named entities to be narrower terms to generic subjects, but you don’t have to set it up that way. Omitting an option described in the thesaurus or ontology standards does not mean you are not in compliance with those standards.

So, in conclusion, while some things about taxonomies have remained constant, other things, such as where to put named entities, have changed over time.