The Accidental Taxonomist

Monday, May 29, 2023

Taxonomies and ChatGPT

ChatGPT, generative AI, and large language models (LLMs) are hot topics in the fields of data, information, and knowledge management. LLMs dominated the keynote presentations and the networking conversations at the Knowledge Graph Conference in New York and were also discussed in presentations and panels at this conference and at Data Summit in Boston, both of which I attended this month. The technology is relevant to taxonomies as well.

ChatGPT is the user interface application on top of GPT (Generative Pre-Trained Transformer), a publicly available LLM developed by OpenAI, which is now in version 4. ChatGPT is thus a form of generative AI, in how it generates answers. There are many other LLMs (Neural network-based AI, trained with deep learning on very large volumes of text), including those which are proprietary, restricted, or for non-commercial research, but only some have generative AI user interfaces. Although we may think of generative AI for providing answers to questions, it can do a lot more, including tasks related to taxonomies.

Organizing terms into hierarchies

Building a taxonomy is a combination of top-down design (identifying the top concepts or facets) and bottom-up building (identifying specific concepts from content analysis). The top level of a taxonomy is designed to serve user needs and thus should be based on stakeholder interviews, surveys, and brainstorming workshops, which is not something ChatGPT can do. The bottom-up building of a taxonomy, based on terms extracted from content or from search logs, may benefit from some AI involvement.

I have made a few test requests of ChatGPT for “Put the following list of terms into a hierarchical taxonomy…,” and the results are bulleted lists with indented narrower concepts. ChatGPT can also generate a taxonomy in machine-readable SKOS in a requested RDF serialization format, as Bob DuCharme explained in his May 20 blog post “Getting ChatGPT to turn a flat vocabulary list into a hierarchical taxonomy.”

As in card sorting exercises, you can specify the top categories/concepts (like a “closed card sort”), or you can let ChatGPT create the top categories (like an “open card sort”). In either case, results are better with context, of course, so you should also tell ChatGPT what the subject domain or context is. Asking for a hierarchical taxonomy sometimes results in a third level of hierarchy, and not just a single level of grouping. Near duplicates usually appear next to each other in the list, and the taxonomist can then decide if and how to merge them into a single concept.

It is particularly for long lists of terms that automated methods can save the taxonomist’s time. If a taxonomist comes up with terms based on manual content analysis, stakeholder interviews, or submitted lists from subject matter experts, the term lists tend not to be very long, and even the process of coming up with the terms tends to include some thoughts toward categorization at the same time. Longer term lists (such as several hundred terms) are derived from automated term extraction (using text analytics technologies) across a corpus of dozens or hundreds of documents and from search log reports. ChatGPT is practical for putting these long lists of terms into draft hierarchies. There are inevitably some taxonomic errors in the results, which should be obvious to any taxonomist. For example, I have seen duplicated terms on different levels of the hierarchy.
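Duplicated terms on different levels are the kind of error that is easy to check for mechanically once the draft hierarchy is in a structured form. The following is a minimal Python sketch, assuming the draft has been transcribed into nested dictionaries (label mapped to its narrower concepts). The function name and the sample terms are hypothetical, for illustration only.

```python
from collections import defaultdict


def find_duplicate_labels(tree):
    """Return labels that occur more than once, with every path where they occur."""
    seen = defaultdict(list)

    def walk(node, path):
        for label, children in node.items():
            current = path + (label,)
            # Normalize case and whitespace so near-identical labels are caught.
            seen[label.strip().lower()].append(" > ".join(current))
            walk(children, current)

    walk(tree, ())
    return {label: paths for label, paths in seen.items() if len(paths) > 1}


# Hypothetical draft hierarchy, as it might be transcribed from a bulleted list.
draft = {
    "Energy": {
        "Renewable energy": {"Solar power": {}, "Wind power": {}},
        "Solar power": {},
    },
    "Policy": {"Energy policy": {}},
}

duplicates = find_duplicate_labels(draft)
for label, paths in duplicates.items():
    print(label, "appears at:", paths)
```

The check only finds exact repeats after normalizing case and whitespace. Deciding which occurrence to keep remains the taxonomist's judgment.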

In both lists of extracted terms and search log lists, terms occur that are not suitable as concepts for a taxonomy, such as verbs and adjectives or vague words. ChatGPT understands grammatical rules, so my prompt also says “Include in the taxonomy only nouns and noun phrases and omit the other terms.”
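The prompt elements discussed in this section (the request for a hierarchy, the subject domain, optional top categories, and the noun-only restriction) can be assembled consistently with a small helper. This is only a sketch: the function name and the wording for the closed and open card sort options are my own assumptions, and it builds the prompt text without calling any API.

```python
def build_taxonomy_prompt(terms, domain, top_categories=None):
    """Assemble a prompt asking for a draft hierarchy from a flat term list."""
    lines = [
        "Put the following list of terms into a hierarchical taxonomy "
        f"for the subject domain of {domain}.",
        "Include in the taxonomy only nouns and noun phrases and omit the other terms.",
    ]
    if top_categories:
        # "Closed card sort": the top level is prescribed (assumed wording).
        lines.append("Use only these top-level categories: " + ", ".join(top_categories) + ".")
    else:
        # "Open card sort": the model proposes the top level (assumed wording).
        lines.append("Create suitable top-level categories yourself.")
    lines.append("Terms:")
    lines.extend(f"- {term}" for term in terms)
    return "\n".join(lines)


prompt = build_taxonomy_prompt(
    ["solar power", "wind power", "install", "energy policy"],
    domain="renewable energy",
    top_categories=["Technologies", "Policy"],
)
print(prompt)
```

The resulting text can be pasted into ChatGPT as is, and varying the domain or top categories makes it easy to compare several draft hierarchies.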

Generating alternative labels (“synonyms”) for concepts

Asking ChatGPT to “provide a list of synonyms for…” a given term can also be helpful for coming up with alternative labels for taxonomy concepts. Alternative labels should be customized for the context of the content and users, so alternative labels for a concept will vary from one taxonomy to another. An external source, such as ChatGPT, should not be relied upon as the only source for alternative labels, but merely as a supplemental source of suggestions to be considered.

Again, context can help and should be provided. I asked “Provide a list of synonyms for ‘healthcare’” and got 20 terms. But then when I asked “Provide a list of synonyms for health care, meaning the industry,” I received a slightly more focused list of 15 terms. Interestingly, the two-word variant “health care” was not on the first list, so “synonyms” is understood by ChatGPT to mean different words with the same meaning and not orthographic variations. Nevertheless, even 15 terms are too many, and the taxonomist should select from the list of suggestions. It might be a good idea to then test-search the suggested alternative labels in the content and system being used.

Although by strict definition a “synonym” is a single word with the same meaning as another word, ChatGPT provides acceptable synonyms for terms which are multi-word phrases, or synonymous multi-word phrases, such as “Chemical manufacturing and distribution” provided as a synonym for “chemical industry.”

Other taxonomy-related uses of ChatGPT

Getting help in designing an ontology (a more complex, yet high-level semantic model with defined classes of concepts, customized relationships, and attributes) is also possible with ChatGPT or other LLMs. Submitting the request multiple times with slight variations will yield multiple different responses for the ontologist to consider and select ideas from. Ontologies are not expressed in simple text, though, so the prompt should specify the output format, such as RDF Turtle (TTL). Dean Allemang, author of Semantic Web for the Working Ontologist, has written multiple articles (medium.com/@dallemang) recently on ChatGPT and ontologies/knowledge graphs.

ChatGPT can also be used for comparing lists of terms, data conversion, and basic coding, which may be useful for taxonomists who lack coding skills. It can convert taxonomy or ontology data from one data format to another (although taxonomy/ontology management software also imports/exports in multiple formats). Taxonomies and ontologies in their raw data format are most commonly expressed in the RDF (Resource Description Framework) data model, which has various serialization formats: RDF/XML, JSON, JSON-LD, Turtle (.ttl), etc., and ChatGPT can convert data from one to another. Data extraction can also be done with ChatGPT. For example, knowledge management professional Camille Mathieu recently shared in a LinkedIn post how she used ChatGPT to write a Python script to extract text & metadata from PDFs.
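To give a sense of what one of these serializations looks like, here is a minimal Python sketch that writes a small nested hierarchy as SKOS in Turtle using plain string formatting. The base URI, the slug function, and the sample concepts are hypothetical, and labels are assumed not to contain double quotes. For real conversion work, a dedicated RDF library or taxonomy management software would be the safer choice.

```python
def slug(label):
    """Turn a label into a simple local name (hypothetical convention)."""
    return "".join(ch if ch.isalnum() else "-" for ch in label.lower()).strip("-")


def to_skos_turtle(tree, base="http://example.org/taxonomy/"):
    """Serialize a nested dict (label -> narrower concepts) as SKOS Turtle."""
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        f"@prefix ex: <{base}> .",
        "",
    ]

    def walk(node, parent):
        for label, children in node.items():
            concept = f"ex:{slug(label)}"
            lines.append(f"{concept} a skos:Concept ;")
            if parent:
                # Each narrower concept points to its broader concept.
                lines.append(f"    skos:broader {parent} ;")
            lines.append(f'    skos:prefLabel "{label}"@en .')
            walk(children, concept)

    walk(tree, None)
    return "\n".join(lines)


ttl = to_skos_turtle({"Energy": {"Solar power": {}, "Wind power": {}}})
print(ttl)
```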

Perhaps what is most intriguing as a future implementation of taxonomies and ChatGPT is to go in the other direction and have knowledge organization systems, such as taxonomies, support the creation and use of queries (also called “prompts”) for generative AI, to obtain better results. This requires some back-end development, though, and is not merely a matter of putting a taxonomy into a prompt. Since a taxonomy is created for a specific subject domain, the questions need to be confined to the domain of the taxonomy. Semantic Web Company has developed a simple publicly accessible demo “PoolParty Meets Chat GPT,” whereby you can compare the results of questions in the subject area of ESG (Environmental, Social, and Governance) that are submitted directly to ChatGPT with the results of questions that are filtered through an ESG taxonomy and knowledge graph (managed in PoolParty software) so that the questions are enriched before being sent to ChatGPT. The semantically enriched questions generate answers that have more detail, better accuracy, and even web links to definitions and other articles.

Conclusions

While it’s arguable whether ChatGPT alone is a good way to obtain “facts,” there is no doubt that it is a good way to get suggestions and ideas. These suggestions can support the work of taxonomists and ontologists, and taxonomies and ontologies in turn can support the results of ChatGPT and other LLMs. Because there will be errors from ChatGPT, it should not be used to generate taxonomies by those who are not already knowledgeable about taxonomy requirements and best practices, nor should it be used as a substitute for the expertise of taxonomists.

I hope to experiment more with ChatGPT for taxonomies and share additional details in future blog posts.

Sunday, April 30, 2023

Taxonomies for Content Components

The primary purpose of taxonomies is to support consistent topical tagging (indexing) of content and full and accurate content retrieval based on the tagged taxonomy concepts that the end-user selects. The unit of content that is tagged makes a difference in the retrieval results and user experience. Users want to find specific content, such as a paragraph, a captioned image, or a timestamped section within an audio or video file. This is not always possible. The traditional method of tagging is to tag the entire file, document, or web page, even if the specific topic with the desired information is only part of the larger file, such as a few sentences within a web page or document of multiple paragraphs. The user then spends time (or wastes time) trying to find the desired information in the larger file.

Content components

Fortunately, there are methods to tag and retrieve content at smaller units, such as a text section identified with a heading, within a longer document. These methods depend on having “structured” content, where sections are marked off using a markup language, most commonly Extensible Markup Language (XML). As XML is rather generic, there have emerged standards specifically for XML-based component-based content management, including DITA (Darwin Information Typing Architecture).

www.dita-ot.org

Structuring content was not originally developed for the purpose of detailed topical tagging/indexing and retrieval, though, but rather for the purpose of creating (authoring) and publishing content, especially to the web, more efficiently. Originally, the focus of structured content was on marking up the document style and supporting keyword tags for the entire document. The first content management systems (CMSs) were developed shortly after the web in the 1990s to facilitate the publishing of web pages, although later a distinction emerged between web content management systems and enterprise content management systems.

By the early 2000s, component content management systems (CCMSs) emerged, whereby content is managed in units (components) smaller and more specific than an entire document. CCMSs enable content publishing to be more modular and flexible, supporting content reuse, and making it easier to update content, by updating only the relevant components, instead of the entire document. CCMSs are especially used for creating technical documentation, but they are not limited to that use. Examples of CCMSs include Adobe FrameMaker, Documentum, Heretto, Kontent.ai, Quark, Paligo, Sanity, and Tridion Docs. While more precise tagging was not the original goal of CCMSs, it is a beneficial outcome.

Taxonomies and component content management

CCMSs, along with all CMSs, have come to support taxonomies and tagging better over the years. This includes both support for more taxonomy features, such as hierarchies and synonyms (alternative labels), and support for importing and exporting taxonomies in standard interoperable formats. With respect to CCMSs, taxonomies can be built out to a greater level of detail, with concepts specific to the component topics of the CCMS. However, whoever is creating the taxonomy should remember not to create concepts that are so specific that a concept is applicable to only a single component topic. A single taxonomy concept should retrieve multiple results.

CCMSs, along with all CMSs, can also connect to or integrate with taxonomies managed in dedicated taxonomy management systems, such as PoolParty. Since organizations tend to have multiple CMSs, each for different kinds of content and purposes, they are likely to end up creating multiple, separate (siloed) taxonomies with similar or overlapping concepts. Therefore, the best strategy for enterprise taxonomy management is to manage taxonomies centrally, either as a single master taxonomy or with multiple taxonomies linked together in dedicated taxonomy management software, which can connect to CMSs with APIs (application programming interfaces) to push the taxonomy out to the CMSs, including CCMSs. Additionally, prebuilt integrations of taxonomy management systems and CCMSs, such as PoolParty and Tridion Docs, are becoming more common.

There is also a growing interest in taxonomies at conferences dealing with component content management. Last October I attended the LavaCon conference for content strategy for the first time, where my pre-conference workshop on taxonomies was well attended. Two weeks ago, I participated in the ConVEx conference, where there is more focus on component content management than at LavaCon. (ConVEx was formerly the DITA North America conference.) In contrast to LavaCon’s two presentations on taxonomies, ConVEx had a track with the “taxonomy” theme and five presentations focused on taxonomies and another three presentations with topics related to taxonomies.

Component content management enables more targeted topic tagging and opens up more possibilities for rich taxonomies. Thus, as a taxonomist, I look forward to learning more about CCMSs and how taxonomies can best be applied in these systems.

Friday, March 31, 2023

Taxonomy and Information Architecture Compared

There is considerable overlap between the fields of information taxonomies and information architecture. Both involve information organization, labeling, search, and findability. In some organizations the job roles and titles are combined. I previously blogged on “Information Architecture and Taxonomies,” observing that “information architecture” in name seemed to be declining while aspects of its practice continued to be strong, since it was an underlying theme in several of the talks at the major taxonomy conference, Taxonomy Boot Camp, in 2013.

Information Architecture Conference opening. Photo Marisela Meskus

This week, for the first time, I am attending in person the Information Architecture Conference, being held in New Orleans March 28 - April 1, so it’s been interesting to hear how information architects consider taxonomies.

How Information Architecture and Taxonomy Overlap

The fields of information architecture and taxonomy are related beyond the stated shared practices of information organization, labeling, search, and findability.

When I give an introduction to taxonomies, I explain that a taxonomy is an intermediary between users and content to connect users to content by means of terms that the users understand and by the display of the terms in hierarchies, facet-filters, or type-ahead suggestions, which enable users to explore and interact with the taxonomy. This is clearly an aspect of information architecture.

In my own career path, I discovered taxonomy and information architecture at the same time. I had been working as a “controlled vocabulary editor” and had the opportunity to work on an interdisciplinary team for a newly designed information product. A school library research database included a hierarchical taxonomy that was designed to fit with a particular user interface.

At the Information Architecture Conference, I asked for a show of hands from my session audience of how many had worked with taxonomies, and it seemed to be over 80%. At the conference, I met information architects who specialized in taxonomies, and taxonomists who had an interest in and had done some work in information architecture. Even though I identify as a taxonomist, I already knew a number of speakers at the Information Architecture Conference due to the overlapping communities.

How Information Architecture and Taxonomy Differ

Information architecture is a discipline and a profession that is larger and more established than that of taxonomies. Although taxonomy work is growing, there are still more college courses on information architecture than on taxonomies, more books on information architecture than on taxonomies, and more people with “information architect” than “taxonomist” as a job title (based on LinkedIn searches).

Listening to sessions at the Information Architecture Conference and having discussions with participants, I began to see a clearer picture on how the fields of information architecture and taxonomies differ.

The Information Architecture Conference brings together a community of professionals who share ideas and experiences. There is no comparable taxonomist community, as taxonomy work, compared to information architecture work, tends to be done by those with different professional backgrounds: information architects, librarians, content managers, metadata architects, indexers, ontologists, etc. It’s telling that there is not just one conference at which I present about taxonomies but multiple. (Knowledge management, content strategy, knowledge graphs, and data science are the fields of conferences at which I have spoken about taxonomies in the past year.) The only conference about taxonomies, Taxonomy Boot Camp, is more of a specialized track within the KM World conference, and aims to provide taxonomy best practices and case studies to managers and directors of content, product, or knowledge management. It is not really a forum for taxonomists to discuss topics of their profession, as the Information Architecture Conference is.

It seems that information architecture is more of a discipline and a field, whereas taxonomy is more of a tool or system (although a very important one). In addition to information architects in organizations in various industries and consultants, the Information Architecture Conference includes professors and students in the field. By contrast, taxonomy is not a field of study, research, or focus in academia. It is a focus area only in industry and consulting. Information architecture seems to allow more room for theory than does the taxonomy field.

How Information Architecture and Taxonomy Are Related

From a "taxonomic" perspective, which is broader? For information architects, taxonomy is narrower than information architecture. There is no doubt that information architecture is broader in various ways, including content/information organization, design, user experience, and even organization of non-digital information spaces. For example, information architects are concerned not only with taxonomies to support searching and browsing for information, but also with content organization and navigation menu structuring in websites and in software user interfaces.

Taxonomists, on the other hand, do not consider taxonomies as a sub-field of information architecture, but rather consider the two fields as adjacent and closely related. This is because the taxonomies that information architects create tend to be small, such as term lists for metadata properties or facets or as hierarchies to model menu navigation or site maps. Professional taxonomists tend to work on large dynamic taxonomies or thesauri that are used to tag/index and retrieve content or data in one or more systems, often where the user interface is already prescribed.

The related fields or disciplines are also different. Information architecture has a closer relationship with fields of design, user experience, sociology, and psychology. Taxonomy has a closer relationship with indexing/tagging, natural language processing, ontologies, Semantic Web technologies, and knowledge management. One related field shared by both information architecture and taxonomy is structured content, which was also a subject of presentations at this year's Information Architecture conference and the field of my next conference.

Saturday, February 25, 2023

Related Concepts in Taxonomies


A and B are related; C and D are related.

Taxonomies and thesauri are characterized by having hierarchical relationships linking their terms. The associative relationship (or related concept, Related Term, or RT), on the other hand, is a fundamental feature of thesauri, but it is merely an optional feature of taxonomies.

An over-simplistic distinction between taxonomies and thesauri is the presence of associative relationships, although I would disagree, because taxonomies can have associative relationships, and there are other structural design differences between taxonomies and thesauri. (See my past blog posts Taxonomies vs. Thesauri and Taxonomies vs. Thesauri: Practical Implementations)

The associative (related) relationship is a generic, nonhierarchical, symmetrical (same in both directions), reciprocal relationship between pairs of terms/concepts in a thesaurus or taxonomy. "Related concept" actually refers to a kind of relationship, not a kind of concept. The following figure illustrates that Data protection and Privacy are related.
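The symmetrical, reciprocal nature of the relationship can be sketched in a few lines of Python: whenever one direction is recorded, the reverse direction is recorded too. The class and method names here are hypothetical, and this is only an illustration of the principle, not how any particular taxonomy tool stores its data.

```python
from collections import defaultdict


class Vocabulary:
    """A toy store for associative (related) relationships."""

    def __init__(self):
        self.related = defaultdict(set)

    def add_related(self, concept_a, concept_b):
        # Symmetrical: recording A related B always records B related A.
        self.related[concept_a].add(concept_b)
        self.related[concept_b].add(concept_a)


vocab = Vocabulary()
vocab.add_related("Data protection", "Privacy")

print(vocab.related["Data protection"])
print(vocab.related["Privacy"])
```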

It is true that many taxonomies do not have associative relationships. This is for various reasons. The function of the taxonomy in the user interface may not require the support of related concepts, such as when the taxonomy is displayed only as facets for refining results or only as type-ahead taxonomy term suggestions when a user enters a search string into a search box. The taxonomy may be implemented in a system (such as a commercial off-the-shelf content management system or SharePoint) that does not support the links/navigating to related concepts in the user interface. A taxonomy may be too small to make beneficial use of associative relationships if most of the taxonomy can quickly be browsed and seen. Finally, and perhaps of the greatest potential significance, is that relationships across different types of concepts can instead be better supported with customized semantic relationships based on custom schema and ontologies, which can be applied to a taxonomy. For example, having Physicians practice Medicine and Medicine isPracticedBy Physicians, instead of Physicians related Medicine.

It is not so much the presence but rather the extent of associative relationships that also distinguishes thesauri from taxonomies. In a traditional thesaurus, associative relationships are as prolific as hierarchical relationships, and perhaps even more so, and they occur between terms of all different kinds and different types of relatedness. The thesaurus standards (ANSI/NISO Z39.19 and ISO 25964-1) provide a list of possible types of associative relationships (process and agent, action and target, cause and effect, object and property, object and origins, and discipline and object, among many others). When taxonomies have associative relationships, they tend to be limited to only certain categories, facets, or concept schemes of the taxonomy.

Related Concepts and SKOS Concept Schemes

Most taxonomies these days, if they are of any significant size (hundreds or thousands of concepts) and intended for use in more than one application, are created in the SKOS (Simple Knowledge Organization System) data model. (Smaller taxonomies might be created in a spreadsheet and imported into a content management system.) The highest level of organizational structure in SKOS is the concept scheme. SKOS-based taxonomy management software will group and display multiple concept schemes together in a single “project” or “knowledge model,” which is intended for a single business use, set of content, user audience, or implementation (with some overlap of multiple use cases acceptable). While SKOS does not provide any recommendation on what you should use concept schemes for, it has become common practice to designate a concept scheme for a taxonomy facet or a metadata property/field. Even when concept schemes are not currently implemented as facets, they might be in the future, so it is good practice to create concept schemes to represent facets. The structure of concept schemes representing facets is also a good organizing principle for constructing any taxonomy. Concept schemes also tend to reflect top-level “classes” of ontologies (although not the very esoteric top class of “Thing”).

SKOS permits the creation of related concept relationships both within and between concept schemes. SKOS also has mapping relationships called matching properties, including relatedMatch, for use between concept schemes, whether they are in the same “project” (sharing the same, initial, domain part of a URI) or not. The option to use either related or relatedMatch across concept schemes of the same project can be a source of confusion.

Best Practices for SKOS Related Concepts

If you are implementing concept schemes each as a facet/filter/refinement in a user interface, then it is best practice not to create associative (related) relationships between concepts in different concept schemes. Facets function as mutually exclusive aspects or dimensions of content items and queries. Any “relatedness” is implicit based on the search results, but not from the taxonomy itself, which should be flexible to allow any combination of concepts from facets and not prescribe relatedness. For example, a user may want to filter a search on movies by which movies meet selected criteria (facets) of a chosen genre, actor, director, topical theme, and country of production, and the result set will implicitly indicate in which movies these aspects are related.
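The movie example can be sketched in Python to show why no related relationships between facets are needed: the combination of selected facet values is resolved against the tagged content at query time. The movie records, facet names, and function name below are hypothetical placeholders.

```python
# Hypothetical tagged content: each movie carries one value per facet.
movies = [
    {"title": "Movie A", "genre": "Drama", "director": "Director X", "country": "France"},
    {"title": "Movie B", "genre": "Drama", "director": "Director Y", "country": "Canada"},
    {"title": "Movie C", "genre": "Comedy", "director": "Director X", "country": "France"},
]


def filter_by_facets(items, **selected):
    """Keep the items that match every selected facet value."""
    return [
        item
        for item in items
        if all(item.get(facet) == value for facet, value in selected.items())
    ]


results = filter_by_facets(movies, genre="Drama", country="France")
print([movie["title"] for movie in results])
```

Nothing in the facets themselves says that Drama and France are related. The result set shows it only because a tagged movie happens to combine them.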

Enriching a taxonomy with the semantics of an ontology, in addition to supporting additional data attributes (such as movie production year, actor nationality and birth date, etc.), supports connections across concept types that can be utilized in a front-end application. The user can search not only for movies, but also for other entities, such as actors (who appear in movies of a certain genre directed by a certain director), or directors (who directed movies on certain themes from certain countries), etc. This involves creating customized, semantic relationships between classes which correspond to the concept schemes: Actor performsIn Movie title and Movie title hasActor Actor, Movie title isProducedIn Country and Country isOriginOf Movie title, etc. These semantic relationships, of course, make any generic SKOS related relationships across the concept schemes unnecessary, redundant, and rather meaningless.
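As a rough Python illustration of these paired relationships, the sketch below stores statements as subject-predicate-object triples, derives the inverse statements, and then follows two relationships in sequence to find actors by country of production. The relationship names come from the example above, while the data, helper functions, and the use of plain tuples instead of an RDF store are assumptions for illustration.

```python
# Each custom relationship is paired with its inverse.
INVERSES = {"performsIn": "hasActor", "isProducedIn": "isOriginOf"}


def with_inverses(triples, inverses):
    """Return the triples plus the inverse statement for each paired relationship."""
    both_ways = {**inverses, **{inv: rel for rel, inv in inverses.items()}}
    full = set(triples)
    for subject, predicate, obj in triples:
        if predicate in both_ways:
            full.add((obj, both_ways[predicate], subject))
    return full


def objects(graph, subject, predicate):
    """All objects linked from a subject by a given relationship."""
    return {o for s, p, o in graph if s == subject and p == predicate}


# Hypothetical asserted statements.
asserted = {
    ("Actor 1", "performsIn", "Movie A"),
    ("Actor 2", "performsIn", "Movie A"),
    ("Actor 2", "performsIn", "Movie B"),
    ("Movie A", "isProducedIn", "France"),
    ("Movie B", "isProducedIn", "Canada"),
}
graph = with_inverses(asserted, INVERSES)

# Actors who perform in movies produced in France: Country -> Movie -> Actor.
actors_in_french_movies = {
    actor
    for movie in objects(graph, "France", "isOriginOf")
    for actor in objects(graph, movie, "hasActor")
}
print(sorted(actors_in_french_movies))
```

Because each relationship says what kind of connection it is, a generic related link between, say, Actor 1 and France would add nothing.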

Thus, regardless of the use of your concept schemes, the related concept relationship is best not used between concepts in different concept schemes. Rather, the related concept relationship is better used between concepts within a concept scheme, especially topical (subject) concepts, for example, relating the concepts Data quality and Quality management. Relatedness between named entities within a concept scheme, on the other hand, such as concept schemes for People, Organizations, and Geographic places, is best left to be implicit from the retrieved content and not prescribed in a taxonomy, since such relatedness may be dependent on the content, change over time, and be too subjective.

Even if the current end-user application of a taxonomy does not support user interaction with related links, associative relationships can support tagging, both manual and automated. Finally, a taxonomy typically has a longer life than a single application, so incorporating related concept relationships while the taxonomy is being built and regularly maintained is a good practice for the future use of the taxonomy.

Tuesday, January 31, 2023

Taxonomies vs. Ontologies

The question often comes up: how are taxonomies and ontologies different? While there are some short simple answers (such as: taxonomies are hierarchies, and ontologies are semantic networks), it is understandable that the distinction is not that clear. There is considerable overlap. Ontologies may contain taxonomies, and taxonomies can be semantically enriched to become ontology-like. The same software tools, for example PoolParty, support the creation of both.

One of the trends in data/information/knowledge management is the convergence of systems, methods, and technologies, including the convergence of taxonomies and ontologies. It’s gotten to the point that some people will refer to taxonomies and ontologies almost interchangeably, as if they are essentially the same thing. They are not, although they are increasingly combined. It’s interesting that one of the most active discussion channels within the Taxonomy Talk community on Discord is on ontologies.

Taxonomy vs. Ontology (https://graphviews.poolparty.biz/GraphViews)

Uses

Although both taxonomies and ontologies are kinds of knowledge organization systems, which support access to information, their specific uses tend to differ. The primary use of information taxonomies is for consistent tagging and accurate and comprehensive retrieval of content items. These could be documents, components (sections) of documents, web or intranet pages, or digital assets (image, audio, video files, etc.). Ontologies, with their inclusion of or linkages to instances/individuals, with their various attributes, are more focused on the specifics of data: data retrieval, data comparison, and data analysis. Taxonomies are primarily for what a content item is about (although content/document types may also be part of a taxonomy), as in “get me all the information resources about…,” or “get me a list of products with…” and specifying a set of features and price range as filters. Ontologies, on the other hand, can support more complex, multistep queries, such as “get me a list of products with…” a set of features and price range, whose vendors are located in Canada and have a minimum annual revenue of CAD $50 million.
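The difference between the two product queries can be sketched in Python. The first function filters products on their own tagged features and price, as a taxonomy with facets would. The second adds a step that follows the link from each product to its vendor and tests the vendor's attributes. All product and vendor data, field names, and function names are hypothetical, and plain dictionaries stand in for what would be a knowledge graph in practice.

```python
# Hypothetical product records tagged with features, plus a link to a vendor.
products = [
    {"name": "Product 1", "features": {"wireless", "waterproof"}, "price": 120, "vendor": "Vendor A"},
    {"name": "Product 2", "features": {"wireless", "waterproof"}, "price": 90, "vendor": "Vendor B"},
    {"name": "Product 3", "features": {"wireless"}, "price": 80, "vendor": "Vendor A"},
]

# Hypothetical vendor individuals with their own attributes.
vendors = {
    "Vendor A": {"country": "Canada", "annual_revenue_cad": 75_000_000},
    "Vendor B": {"country": "Canada", "annual_revenue_cad": 20_000_000},
}


def taxonomy_style_filter(items, features, max_price):
    """Single step: filter products on their own features and price."""
    return [p for p in items if features <= p["features"] and p["price"] <= max_price]


def ontology_style_query(items, vendor_data, features, max_price, country, min_revenue):
    """Multistep: filter products, then follow the vendor link and test its attributes."""
    matches = []
    for product in taxonomy_style_filter(items, features, max_price):
        vendor = vendor_data[product["vendor"]]
        if vendor["country"] == country and vendor["annual_revenue_cad"] >= min_revenue:
            matches.append(product)
    return matches


wanted = {"wireless", "waterproof"}
filtered = taxonomy_style_filter(products, wanted, max_price=150)
queried = ontology_style_query(products, vendors, wanted, 150, "Canada", 50_000_000)

print([p["name"] for p in filtered])
print([p["name"] for p in queried])
```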

In comparing retrieval of content and data, for example, taxonomies can retrieve a spreadsheet file, whereas ontologies can retrieve data from individual cells in the spreadsheet. Ontologies can traverse data in a database. While this could be a relational database, increasingly ontologies are used with graph databases, since ontologies are also structured as graphs.

Origins

Another major difference between taxonomies and ontologies is their origins. Information taxonomies (not biological taxonomies) originated in the discipline of library science. Specifically, I would say that taxonomies have evolved as a kind of flexible hybrid of classification systems and thesauri. Ontologies, on the other hand, (when not in philosophy) tend to be taught and researched as a part of computer science. Again, there has also been convergence of library science and computer science in the field of information science. Nevertheless, library/information science and computer/information science are different approaches.

Taxonomies have also become an area of interest in information architecture, user experience design, content management, and digital asset management. Taxonomies are also related to terminology management and information search and retrieval. Ontologies, on the other hand, have become an area of interest in data science, data engineering, and graph data management. Ontologies also borrow concepts from set theory in mathematics and logic from philosophy.

Taxonomies and ontologies follow different standards, but the standards have also converged in a way. Taxonomies have no standard of their own but follow the thesaurus standards (ANSI/NISO Z39.19 and ISO 25964) for recommended best practices. Ontologies are based on W3C standards of RDF, RDF-Schema, and the formal language of OWL (Web Ontology Language). The W3C then published a recommendation for taxonomies, thesauri, and other knowledge organization systems called SKOS (Simple Knowledge Organization System) in 2009, and since then it has become widely adopted. SKOS is based on RDF, as is the ontology standard RDF-S. As a result, SKOS and RDF-S statements or namespaces can be combined in the same knowledge organization system, and taxonomies and ontologies can thus be combined.

Features

Both taxonomies and ontologies aim to describe a knowledge domain with collections of entities structured into groups or types, with relationships between them. Ontologies go further in describing the relationships in more detail. Attributes are also more extensive in ontologies. Both support the options for notes or definitions.

Concepts or Entities

Taxonomies are comprised of concepts (sometimes called terms), which are things. Concepts can be generic or specific and may even include named entities (unique proper nouns). Taxonomies do not differentiate between generic concepts and named entities, which correspond to “individuals” in an ontology. Ontologies, on the other hand, distinguish between two types of entities: classes and individuals. Classes can be broad or specific, but, as the name implies, they are intended to contain something, either subclasses or individuals. By contrast, leaf nodes (the narrowest concepts in a hierarchy) in a taxonomy could actually be quite broad in meaning.

Individuals, as defined by an ontology, tend to be named entities (proper nouns), and they should be uniquely individual. This may not be obvious. A brand name product is a proper noun, but technically it is not an individual, because there are numerous specific instances of the product owned by different people. There may be some differences of opinion on how to define individuals.

Relationships

Taxonomies follow thesaurus standards for relationships. Thesaurus hierarchical relationships comprise three types: generic-specific or “is a” kind of relationship, generic-instance (where the instance is a named entity or proper noun), and whole-part. Ontologies have only generic-specific “is a” hierarchical relationships, which are between classes and subclasses. The relationship between an individual and a class is not considered hierarchical in an ontology but rather a relationship of class membership. Also, the whole-part relationship is not considered hierarchical in ontologies (but could be created as a semantic relationship).

While generic-instance is a permitted hierarchical relationship type in a taxonomy, named entity concepts (proper nouns) are not so often narrower to a corresponding generic concept, but rather tend to be grouped in their own separate concept scheme to serve as a separate search facet or filter.
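The mapping between the three thesaurus hierarchical types and their treatment in an ontology can be sketched as a small lookup table. This is only an illustration of the paragraph above: rdfs:subClassOf and rdf:type are real RDF-S and RDF properties, while ex:isPartOf is a hypothetical custom semantic relationship and the sample terms are made up.

```python
from enum import Enum


class ThesaurusHierarchy(Enum):
    GENERIC_SPECIFIC = "generic-specific"  # "is a" relationship
    GENERIC_INSTANCE = "generic-instance"  # the instance is a named entity
    WHOLE_PART = "whole-part"


# For each thesaurus type: the predicate an ontology might use, and whether
# the ontology treats that relationship as hierarchical.
ONTOLOGY_TREATMENT = {
    ThesaurusHierarchy.GENERIC_SPECIFIC: ("rdfs:subClassOf", True),
    ThesaurusHierarchy.GENERIC_INSTANCE: ("rdf:type", False),  # class membership
    ThesaurusHierarchy.WHOLE_PART: ("ex:isPartOf", False),     # hypothetical property
}


def to_ontology(narrower, broader, kind):
    """Return the (subject, predicate, object) triple and a hierarchical flag."""
    predicate, hierarchical = ONTOLOGY_TREATMENT[kind]
    return (narrower, predicate, broader), hierarchical


if __name__ == "__main__":
    print(to_ontology("Red wine", "Wine", ThesaurusHierarchy.GENERIC_SPECIFIC))
    print(to_ontology("Napa Valley", "Wine region", ThesaurusHierarchy.GENERIC_INSTANCE))
    print(to_ontology("Cork", "Wine bottle", ThesaurusHierarchy.WHOLE_PART))
```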

A generic associative (“related”) relationship may exist in taxonomies, although it is more of a feature of thesauri. It is bidirectional and reciprocal, and it tends to be used between concepts within the same concept scheme, which often corresponds to a class in an ontology. Ontologies do not have a generic associative relationship. Instead, ontologies have semantic relations which are designated by the ontology creator, just as the classes are designated, and they are not used within classes but across a specified pair of classes. Suggesting what might be of related interest to the end user is not within the scope of an ontology’s purpose, which is more structured and based on rules. Ontologies may have other bidirectional reciprocal relationships, such as “goes with,” “has sibling,” “accompanies,” etc.

Equivalency and alternative labels

In a taxonomy, each concept has a single preferred label in each language for display and any number of alternative labels and hidden labels per language to help match on searching or tagging. In the traditional thesaurus model, “nonpreferred” terms redirect to “preferred” terms. The alternative labels are sufficiently equivalent in the context of the taxonomy and content to be used for a given concept, and thus might not be exact synonyms. Alternative labels include synonyms, near synonyms, and possibly even narrower terms not deemed needed as concepts with preferred labels.
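As a rough sketch of how alternative and hidden labels help match on searching or tagging, the following plain-Python example builds a lookup index from every label to its concept. The concept identifier, the labels, and the single-language simplification are all assumptions for illustration; a real taxonomy would keep labels per language.

```python
def build_label_index(concepts):
    """Map every label (preferred, alternative, hidden) to its concept ID.

    Matching is case-insensitive; a single language is assumed for simplicity.
    """
    index = {}
    for concept_id, labels in concepts.items():
        all_labels = [labels["pref"], *labels.get("alt", []), *labels.get("hidden", [])]
        for label in all_labels:
            index[label.casefold()] = concept_id
    return index


def lookup(index, query):
    """Return the concept ID matching a search or tagging string, or None."""
    return index.get(query.strip().casefold())


# Hypothetical concept: one preferred label for display, alternative labels
# for synonyms, and a hidden label that catches a common misspelling.
CONCEPTS = {
    "c1": {
        "pref": "Automobiles",
        "alt": ["Cars", "Motor cars"],
        "hidden": ["Automobles"],
    },
}

if __name__ == "__main__":
    index = build_label_index(CONCEPTS)
    for query in ["cars", "Automobles", "Trucks"]:
        print(query, "->", lookup(index, query))
```

Whichever label matches, the concept is always displayed under its single preferred label, which is the distinction from OWL equivalence described next.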

In ontologies, the OWL element sameAs is intended for equivalency of individuals, and equivalentClass is for the equivalency of classes, and they mean exact equivalence. But there is no designation of one name being preferred and the other alternative. They all are preferred. The sameAs and equivalentClass elements are not intended for use within a single ontology, but rather across different ontologies. So, those OWL elements are similar to the SKOS exactMatch relationship, which is used across concept schemes or taxonomies. They do not support search within the same data set as alternative labels do.

Enforcement of rules

SKOS is a data model for taxonomies and thesauri, but it does not specify any rules for usage. Rather, the taxonomy creator should attempt to follow the guidelines, not exactly rules, in the thesaurus standards (ANSI/NISO Z39.19 and ISO 25964-1). The quality standards include disjoint labels (a label can be used only once for a concept, preferred or alternative, and for only one concept), single relationships (a pair of concepts may have a hierarchical or an associative relationship between them, but not both), and no hierarchical cycles. The standard for ontologies, on the other hand, OWL, has many rules built into it. This makes OWL ontologies more powerful by supporting inferencing and reasoning.
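Since SKOS itself does not enforce these guidelines, they have to be checked separately. The three quality checks named above could be sketched in plain Python as follows. The data shapes (a dictionary of labels per concept, and sets of concept pairs for broader and related relationships) and all sample terms are assumptions made for this example, not part of any standard or tool.

```python
from collections import defaultdict


def find_label_clashes(labels):
    """Disjoint labels: return labels used more than once, with the concepts using them.

    labels maps a concept ID to the list of all its labels (preferred and alternative).
    """
    owners = defaultdict(list)
    for concept_id, concept_labels in labels.items():
        for label in concept_labels:
            owners[label.casefold()].append(concept_id)
    return {label: ids for label, ids in owners.items() if len(ids) > 1}


def find_double_relationships(broader, related):
    """Single relationships: return pairs linked both hierarchically and associatively.

    broader is a set of (narrower, broader) pairs; related is a set of concept pairs.
    """
    hierarchical_pairs = {frozenset(pair) for pair in broader}
    related_pairs = {frozenset(pair) for pair in related}
    return hierarchical_pairs & related_pairs


def has_hierarchical_cycle(broader):
    """No hierarchical cycles: True if any concept is its own ancestor."""
    parents = defaultdict(set)
    for narrower, wider in broader:
        parents[narrower].add(wider)

    def reaches(start, current, seen):
        # Walk upward through broader concepts, looking for the starting concept.
        for parent in parents.get(current, ()):
            if parent == start:
                return True
            if parent not in seen:
                seen.add(parent)
                if reaches(start, parent, seen):
                    return True
        return False

    return any(reaches(concept, concept, set()) for concept in list(parents))


if __name__ == "__main__":
    broader = {("Merlot", "Red wine"), ("Red wine", "Wine")}
    print(has_hierarchical_cycle(broader))
    print(has_hierarchical_cycle(broader | {("Wine", "Merlot")}))
    print(find_double_relationships(broader, {("Wine", "Red wine")}))
    print(find_label_clashes({"c1": ["Cars", "Autos"], "c2": ["autos"]}))
```

In an OWL ontology, comparable constraints can be stated in the model itself and checked by a reasoner, rather than by separate validation code like this.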

Conclusions

Taxonomies and ontologies share some features, but each has its own additional features. Thus, a combination of a SKOS taxonomy with an OWL ontology combines the features of both. Furthermore, the combination of a taxonomy with an ontology also enables a combination of uses, namely the search and retrieval for both content and data together. Rather than a convergence of taxonomies and ontologies, they are carefully and deliberately combined to maximize their benefits.
