The Accidental Taxonomist: Knowledge organization systems


Sunday, August 18, 2024

Taxonomies and Ontologies as Semantic Models

In describing what taxonomies and ontologies are and what they can do, we are hearing the word “semantics” more often. “Semantics” means “meaning,” which is nothing new, and taxonomies and ontologies are not new either. What is new is that taxonomies and ontologies are now more often combined, so we need a way to describe them together, and the word “semantic” serves that purpose. Furthermore, taxonomies and ontologies are being implemented in new and expanded applications, where the word “semantic(s)” has significance.

Semantics in Taxonomies and Ontologies

Taxonomies have semantics in their concepts. A taxonomy is not just a term base or a term list, but rather is an organized set of concepts, each with its own unambiguous meaning. The concepts bring together different labels, like “synonyms” for the same thing, and their meaning and usage is further clarified by their arrangement in a hierarchy. It’s often said that a taxonomy comprises “things” (concepts), not mere “strings” (of text).
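The “things, not strings” idea can be illustrated with a minimal sketch. The concept IDs, labels, and helper names below are hypothetical and chosen only for illustration; they do not come from any particular taxonomy tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Concept:
    """One taxonomy concept: a 'thing' with several labels, not a bare string."""
    concept_id: str
    pref_label: str
    alt_labels: List[str] = field(default_factory=list)
    broader: Optional[str] = None  # concept_id of the parent concept, if any


# Hypothetical sample concepts, for illustration only.
concepts = [
    Concept("c1", "Vehicles"),
    Concept("c2", "Automobiles", alt_labels=["Cars", "Motor cars"], broader="c1"),
]

by_id: Dict[str, Concept] = {c.concept_id: c for c in concepts}

# Index every label (preferred or alternative) to the concept it names.
label_index: Dict[str, Concept] = {}
for c in concepts:
    for label in [c.pref_label, *c.alt_labels]:
        label_index[label.lower()] = c


def resolve(label: str) -> Optional[Concept]:
    """Return the concept that a label refers to, ignoring case."""
    return label_index.get(label.strip().lower())


def ancestors(concept: Concept) -> List[str]:
    """Walk up the hierarchy and collect the preferred labels of broader concepts."""
    path = []
    while concept.broader is not None:
        concept = by_id[concept.broader]
        path.append(concept.pref_label)
    return path
```

Here `resolve("cars")` and `resolve("Automobiles")` return the same concept object, and `ancestors(...)` places that concept under “Vehicles,” which is how the hierarchy further clarifies meaning.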

Ontologies have a higher level of semantics than taxonomies. Even if they don’t contain synonyms, the relationships between concepts (entities) and sets (classes) of entities have additional semantics. The relationships in an ontology convey meanings beyond mere hierarchy or a generic “related term.” For example, relationships between entities may be “is located in,” “has customer,” and “sells product.” Furthermore, entities in an ontology may have various types of attributes, such as contact information for offices and people, which is another application of semantic data.
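As a rough sketch of what named relationships and attributes add, the statements can be stored as subject-relationship-object triples. The entity names, the phone number, and the helper functions are all made up for illustration.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relationship, object)

# Hypothetical entities and statements, using the relationship names from the text.
relations: Set[Triple] = {
    ("Acme Corp", "is located in", "Boston"),
    ("Acme Corp", "has customer", "Globex"),
    ("Acme Corp", "sells product", "Widget"),
    ("Globex", "is located in", "Chicago"),
}

# Attributes hold literal values about one entity rather than links to another.
attributes: Dict[str, Dict[str, str]] = {
    "Acme Corp": {"phone": "555-0100"},
}


def objects(subject: str, relationship: str) -> List[str]:
    """Everything the subject points to through the given relationship."""
    return sorted(o for s, p, o in relations if s == subject and p == relationship)


def subjects(relationship: str, obj: str) -> List[str]:
    """Everything that points to the object through the given relationship."""
    return sorted(s for s, p, o in relations if p == relationship and o == obj)
```

Because each relationship has a name, `objects("Acme Corp", "sells product")` and `objects("Acme Corp", "has customer")` return different answers, which a generic “related term” link could not distinguish.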

Bringing Together Taxonomies and Ontologies

Taxonomies and ontologies have different origins, but now they are increasingly based on shared Semantic Web data models and guidelines, which enables them to be integrated seamlessly. Taxonomies have their origins in library science structures, including thesauri, subject headings, and classification schemes. Ontologies have their origins in computer science and data science with a focus on data models.

Combining them brings the benefits of both: the linguistic aspect of controlled terms and their synonyms, arranged in a hierarchical structure, from taxonomies, and the custom semantic relationships and other additional properties from ontologies. This allows users to search for concepts/things, not just text strings, while also linking to other things related in a specific way and creating complex multi-step queries.
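A multi-step query of this kind might look like the following sketch, which chains a synonym lookup, a hierarchy expansion, and two relationship hops. All labels, company names, and function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Taxonomy side: labels (including synonyms) and hierarchy. Hypothetical data.
label_to_concept: Dict[str, str] = {
    "computers": "Computers", "pcs": "Computers",
    "laptops": "Laptops", "notebooks": "Laptops",
    "tablets": "Tablets", "printers": "Printers",
}
broader: Dict[str, str] = {"Laptops": "Computers", "Tablets": "Computers"}

# Ontology side: named relationships between entities and concepts.
relations: List[Tuple[str, str, str]] = [
    ("Acme", "sells product", "Laptops"),
    ("Globex", "sells product", "Tablets"),
    ("Initech", "sells product", "Printers"),
    ("Acme", "is located in", "Boston"),
    ("Globex", "is located in", "Chicago"),
    ("Initech", "is located in", "Denver"),
]


def narrower_closure(concept: str) -> Set[str]:
    """The concept itself plus every concept below it in the hierarchy."""
    found = {concept}
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for child, parent in broader.items():
            if parent == current and child not in found:
                found.add(child)
                frontier.append(child)
    return found


def locations_selling(label: str) -> List[str]:
    """Where are the companies located that sell this kind of product?"""
    concept = label_to_concept[label.lower()]        # step 1: label -> concept
    kinds = narrower_closure(concept)                # step 2: include narrower concepts
    sellers = {s for s, p, o in relations            # step 3: who sells any of them
               if p == "sells product" and o in kinds}
    return sorted({o for s, p, o in relations        # step 4: where those sellers are
                   if p == "is located in" and s in sellers})
```

Calling `locations_selling("PCs")` starts from a synonym, not a preferred label, and still reaches the companies selling laptops and tablets.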

Taxonomies are considered a kind of “controlled vocabulary” or “knowledge organization system.” Ontologies are considered a kind of “knowledge model” and a knowledge representation system, rather than a knowledge organization system. When we combine taxonomies and ontologies or speak of them collectively, it’s logical to use the word “semantic,” whether as semantic structures or semantic models, because they both involve semantics and both are usually based on Semantic Web guidelines.

Taxonomies are increasingly based on the Semantic Web recommendation (published by the World Wide Web Consortium) of SKOS (Simple Knowledge Organization System), which is based on RDF (Resource Description Framework). Most ontologies are based on RDF-Schema, an extension of RDF, and OWL (Web Ontology Language), another Semantic Web recommendation. The data models of SKOS, RDF, RDF-S, and OWL may all be integrated into the same knowledge model for a combined taxonomy-ontology. Most software for dedicated taxonomy-ontology management uses these data models.
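To make the integration concrete, here is a small sketch of a single set of triples that mixes SKOS, RDF, RDF-S, and OWL terms. It uses plain Python tuples rather than an RDF library. The `ex:` namespace and every resource under it are hypothetical; the other namespace IRIs are the ones published by the W3C.

```python
from typing import Dict, List, Set, Tuple

# W3C namespace IRIs; "ex" is a made-up example namespace.
NAMESPACES: Dict[str, str] = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "ex": "http://example.org/",
}

Triple = Tuple[str, str, str]

TRIPLES: List[Triple] = [
    # Taxonomy side: a SKOS concept with labels and a broader concept.
    ("ex:BostonOffice", "rdf:type", "skos:Concept"),
    ("ex:BostonOffice", "skos:prefLabel", '"Boston office"'),
    ("ex:BostonOffice", "skos:altLabel", '"Boston branch"'),
    ("ex:BostonOffice", "skos:broader", "ex:Offices"),
    # Ontology side: a class, a subclass link, and a custom relationship.
    ("ex:Office", "rdf:type", "owl:Class"),
    ("ex:Office", "rdfs:subClassOf", "ex:Location"),
    ("ex:isLocatedIn", "rdf:type", "owl:ObjectProperty"),
    ("ex:isLocatedIn", "rdfs:domain", "ex:Office"),
    ("ex:isLocatedIn", "rdfs:range", "ex:City"),
    # The link between the two: the concept is also an instance of the class.
    ("ex:BostonOffice", "rdf:type", "ex:Office"),
    ("ex:BostonOffice", "ex:isLocatedIn", "ex:Boston"),
]


def is_literal(term: str) -> bool:
    """In this sketch, literals are written with surrounding double quotes."""
    return term.startswith('"')


def expand(term: str) -> str:
    """Turn a prefixed name such as skos:broader into its full IRI."""
    if is_literal(term):
        return term
    prefix, local_name = term.split(":", 1)
    return NAMESPACES[prefix] + local_name


def prefixes_used(triples: List[Triple]) -> Set[str]:
    """Which vocabularies appear in the model."""
    return {term.split(":", 1)[0]
            for triple in triples for term in triple
            if not is_literal(term)}
```

The same resource, `ex:BostonOffice`, carries SKOS labels and hierarchy while also taking part in an OWL-defined relationship, which is the sense in which the data models share one knowledge model.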

Semantic Search and Semantic Tagging

Taxonomies support semantic search and tagging. “Semantic search” is the third-ranked autocomplete suggested search phrase in a Google search I did recently on “semantic,” so this is clearly a popular application of semantics. Semantic search refers to search that focuses on concepts and meaning rather than just strings of text. This is not new, but since search that is based on text strings and statistical algorithms is so common, improving search results with the focus on semantics is getting more attention.

Semantic search is best enabled with the tagging of taxonomy concepts, which we may call “semantic tagging” (which I first heard of when asked to write an article on it in 2008). Advanced text analytics technologies, going beyond entity recognition and natural language processing to include natural language understanding so as to analyze sentence structure, syntax, and sentiment, can also yield search results based somewhat on meaning and not just words.
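The difference between string search and concept-based search over tagged content can be sketched as follows. The documents and labels are invented, and the tagger is a deliberately naive whole-word matcher, not a stand-in for real text analytics or auto-tagging software.

```python
import re
from typing import Dict, List, Set

# Hypothetical taxonomy concepts with their labels (singular and plural forms).
concept_labels: Dict[str, List[str]] = {
    "Automobiles": ["automobiles", "automobile", "cars", "car"],
    "Bicycles": ["bicycles", "bicycle", "bikes", "bike"],
}

# Hypothetical content items.
documents: Dict[str, str] = {
    "doc1": "Electric cars are getting cheaper.",
    "doc2": "The automobile industry faces new rules.",
    "doc3": "City bikes are popular with commuters.",
}


def tag(text: str) -> Set[str]:
    """Naive semantic tagging: assign a concept when any of its labels occurs as a whole word."""
    found = set()
    for concept, labels in concept_labels.items():
        pattern = r"\b(?:" + "|".join(re.escape(label) for label in labels) + r")\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.add(concept)
    return found


# Tags are stored per document, as they would be in a content system.
tags: Dict[str, Set[str]] = {doc_id: tag(text) for doc_id, text in documents.items()}


def string_search(query: str) -> List[str]:
    """Plain text search: match the literal string only."""
    return sorted(d for d, text in documents.items() if query.lower() in text.lower())


def semantic_search(query: str) -> List[str]:
    """Concept search: resolve the query to a concept, then retrieve by tag."""
    for concept, labels in concept_labels.items():
        if query.lower() in labels:
            return sorted(d for d, doc_tags in tags.items() if concept in doc_tags)
    return []
```

With this data, `string_search("car")` finds only the document containing that string, while `semantic_search("car")` also returns the document that says “automobile,” because both are tagged with the same concept.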

Semantic Data

Taxonomies are traditionally for tagging and retrieving content, whereas ontologies are traditionally for exploring and retrieving data. The combination of a taxonomy and an ontology enables users to retrieve both content and data that are related to each other. Semantics for content is a given, because content (whether text, image, or other media), by its very nature, has meaning. Data by itself may not have much meaning, unless it is related to other data and that relationship has meaning, too. Thus, “semantic data” is significant. We hear reference to “semantic data” much more often than to “semantic content.”

You don’t need to add a taxonomy to content to make it “semantic” and understood (rather a taxonomy helps you find the content). However, depending on how data is presented, you may need to add an ontology or at least a semantic data model (a method to describe objects in a database and their relationship to one another) to make data “semantic.” Experts can analyze raw data, but the data is more valuable if non-experts can understand it, too, and that’s why “semantic data” is important. There is also a lot of attention on “semantic data models.”

Semantic Layer

The idea of a “semantic layer” as a framework or approach to make an organization’s information, both data and content, more structured, findable, and actionable, has been gaining popularity recently. Whether the “semantic layer” is new or just a new way of describing something is arguable.

A semantic layer is a standardized framework that organizes and abstracts organizational data and serves as a connector for all knowledge assets. It’s a method to bridge content and data silos through a structured and consistent approach that connects data rather than consolidating it, as data warehouses do. The idea of a “layer” is that it is part of an enterprise-wide architecture of information, data and content, that connects horizontally across siloed content and data repositories. Taxonomies and ontologies, in addition to potentially other knowledge organization systems, such as a business glossary, are key components of a semantic layer.

More Talk of Semantics with Taxonomies and Ontologies

I’ve definitely been hearing of “semantics” more in the world of taxonomies and ontologies, and now I am bringing the word more into my own presentations. Following are some past and future examples.

“Core Concepts of Semantic Intelligence” was a presentation I gave in June 2022 in the Semantic Content Graph Guild, a community of practice led by Michael Iantosca.
“The Role of Taxonomy and Ontology in Semantic Layers” was a webinar in which I presented in April 2024.
“Enterprise Knowledge Graphs: The Importance of Semantics” was a presentation I gave at the Data Summit conference in May 2024.
“Semantic Data: Taxonomy, Ontology, and Knowledge Graphs” is the name of a new conference organized by Henry Stewart Events, first held on June 27 in London and upcoming on October 23 in New York. I will be presenting at it.
“Semantic Data Enrichment: Taxonomies and Ontologies” is a new asynchronous course I will teach through eLearningCurve and which will be available in spring 2025.

Saturday, December 5, 2020

Differing Definitions of Ontologies

In my last blog post I discussed the different definitions and features of thesauri. Now, I will turn to the next kind of knowledge organization system in the spectrum of complexity: ontologies.

Actually, to consider an ontology as a more (or most) complex type of controlled vocabulary or knowledge organization system, after thesauri, due to additional features, is just one perspective or definition of ontologies, which is not universally shared.

When I first learned about ontologies, coming from my taxonomist perspective, I considered ontologies as merely a more complex type of taxonomy or thesaurus, characterized by customized semantic relationships between concepts (rather than merely hierarchical or associative relationships), more expressive attributes for concepts (rather than mere scope notes), and the grouping of concepts into classes to manage the semantic relationships and attribute types. In fact, I wrote in 2008 for the first edition of my book “An ontology can be considered a type of taxonomy with even more complex relationships than in a thesaurus,” which the following graphic represents.

As my understanding has evolved, I would consider this to be just one kind of understanding or definition of ontology among others. In other words, a controlled vocabulary that has the features of semantic relationships, classes of concepts, and attributes for concepts can be considered a kind of ontology, but there are other definitions and understandings of ontology within the field of information/knowledge management.

While we usually refer to “controlled vocabularies” as the over-arching category for these things, it is probably better to go up a further level and call an ontology a kind of “knowledge organization system,” rather than a kind of controlled vocabulary. Controlled vocabularies are kinds of knowledge organization systems, where the emphasis is on managed terms or concepts for the purpose of tagging or categorizing and information retrieval. Ontologies, by themselves, are not necessarily for information retrieval, at least not directly. And this is one of the points of differing definitions of ontologies.

Differing definitions and perspectives

There are differing definitions of the word ontology: (1) branch of philosophy that studies existence, being, becoming, and reality (Wikipedia: Ontology), and (2) a representation, formal naming, and definition of categories, entities, properties, and relations within a domain (Wikipedia: Ontology (information science)). Of course, we are interested in the second definition, although there are some connections between the two.

The second definition, however, is already multidisciplinary, as it is a concept shared in both information science and computer science. Information scientists (including librarians, taxonomists, and knowledge managers) and computer scientists do not have different definitions of ontologies, but rather different approaches to and perspectives of ontologies and different purposes for the ontologies they create. For computer scientists, modeling data and information helps them design a computer program to perform desired functions. For information scientists, modeling data and information makes it easier to retrieve information with complex queries. Information scientists consider an ontology as a kind of knowledge organization system, whereas computer scientists tend to consider an ontology as a form of knowledge representation.

Yet even among information scientists, who consider ontologies as knowledge organization systems and have the same objectives in developing ontologies, there are different understandings of what exactly constitutes an ontology and how it relates to other knowledge organization systems, such as taxonomies. This is due to (1) different emphasis on various ontology components, (2) the question of adherence to ontology standards, and (3) the way different ontology software tools model ontologies and their relations to taxonomies differently.

Differing understandings of ontology components

There is a shared understanding that ontologies are composed of things, their properties/attributes, and their relationships.

[Figure: Ontology example with components: classes, relations, and attributes]

However, there are differences in the understanding of the two kinds of “things”: classes and individuals. Classes are categories or groups of things with shared characteristics, whereas individuals are specific instances of things. This seems obvious, but if you approach ontology design from the perspective of taxonomy design it can become less certain. Is an individual the most specific concept (also called “leaf node”) in a hierarchy, or is an individual a named entity/proper noun? The definition of components of ontologies does not answer this question, because ontology structures are meant to model data, not to organize taxonomy concepts that could be either generic (common nouns) or named entities (proper nouns). Drawing the line between classes and individuals can be challenging, but whether this matters may depend on what tool you are using.

Furthermore, ontologies may have other components, such as axioms, rules, restrictions, events, and function terms, but ontologies as knowledge organization systems rarely have most of these.

Differing ontology standards or languages

In 2004 the World Wide Web Consortium (W3C) published the Web Ontology Language (OWL) specification, which is based on the Resource Description Framework (RDF), as “a Semantic Web language designed to represent rich and complex knowledge about things, groups of things, and relations between things,” which has become widely adopted. Now it is common to think that ontologies must follow OWL guidelines. But (information science) ontologies existed before OWL, and an ontology does not have to follow OWL to be called an ontology. There are other ontology languages besides OWL, but they are not as common. To share and reuse ontologies, it is recommended to follow the OWL standard.

Differing ontology modeling software

While one could design the high-level model of an ontology in a mind-mapping tool, there would be no enforcement of standards or best practices (preventing duplications or incomplete data, etc.), and it’s difficult to scale, so dedicated ontology modeling software is recommended. However, ontology modeling/editing software does not model ontologies all in the same way.

The main difference is probably between stand-alone ontology software (such as Protégé or TopBraid Composer) and software that combines ontology with taxonomy/thesaurus development and editing (such as PoolParty, Semaphore, or Graphite). Stand-alone ontology editing software supports creating a detailed ontology as a single model, thus including classes, multiple levels of subclasses, and individuals (instance concepts). In integrated software that combines taxonomy/thesaurus development with ontology development, the taxonomy or thesaurus (or multiple controlled vocabularies) is created in one space with one set of software features, and the ontology is created in another space with a different set of features. The ontology (or even just parts of it) is then applied to the taxonomy, so that concepts in the taxonomy inherit the attribute types and relationships of their associated class, and the taxonomy concepts are like individuals in the ontology. The ontology can be considered a semantic layer in the model, as the following graphic illustrates.
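The integrated approach, in which taxonomy concepts take on the attribute types and relationships of an associated ontology class, can be sketched roughly as below. This is not how any of the named products store their models; the class names, concepts, and functions are hypothetical.

```python
from typing import Dict, Set

# Ontology "space": classes with their attribute types and relationships.
# Each relationship maps to the class its target must belong to. Hypothetical model.
CLASS_ATTRIBUTES: Dict[str, Set[str]] = {
    "Person": {"birth date", "email"},
    "Organization": {"website"},
}
CLASS_RELATIONS: Dict[str, Dict[str, str]] = {
    "Person": {"works for": "Organization"},
    "Organization": {"has customer": "Organization"},
}

# Taxonomy "space": concepts, each associated with one ontology class.
CONCEPT_CLASS: Dict[str, str] = {
    "Jane Doe": "Person",
    "Acme Corp": "Organization",
    "Globex": "Organization",
}


def allowed_attributes(concept: str) -> Set[str]:
    """Attribute types a concept inherits from its associated class."""
    return CLASS_ATTRIBUTES[CONCEPT_CLASS[concept]]


def can_relate(source: str, relationship: str, target: str) -> bool:
    """True if the source's class defines the relationship and the target is of the expected class."""
    expected_class = CLASS_RELATIONS[CONCEPT_CLASS[source]].get(relationship)
    return expected_class is not None and CONCEPT_CLASS[target] == expected_class
```

For example, `can_relate("Jane Doe", "works for", "Acme Corp")` is allowed, while the reverse direction is not, because the Organization class defines no “works for” relationship. The taxonomy concepts behave like individuals of the ontology classes.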

These two different approaches to ontology modeling thus result in different definitions of an ontology. An ontology is likely to be considered a more complex type of knowledge organization system by users of stand-alone ontology software, whereas an ontology is likely to be considered an expressive semantic layer applied to one or more taxonomies by users of integrated taxonomy/ontology software.

Ontology lite or ontology-like

When I was still considering ontologies more akin to thesauri with semantic relationships, and I expressed such views in a discussion forum, someone (whom I don’t remember), referred to this kind of ontology as “ontology lite,” since it has features of an ontology, but does not fully follow an ontology model and standards. This is not necessarily a bad thing. Controlled vocabularies and knowledge organization systems can be considered along a continuum, and you should build what works for your situation.

Another kind of ontology-like structure arises when you start linking multiple controlled vocabularies together. My initial experience with working on commercially implemented ontologies had been with such ontology-like systems, which were not actually called ontologies, at a former employer, Gale. There we had controlled vocabularies (also called object classes) for subjects, persons, places, events, products, companies/organizations, named works, etc., many of which had customized reciprocal relationship pairs between them (such as the relationship pair Creator/Creatby, between person names who were authors, and named works) and many customized term attributes (such as Birthdate, Death date, Birth city/state/country, Death city/state/country for persons).
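Reciprocal relationship pairs between linked vocabularies can be sketched as follows. The relationship names “creator of” and “created by” are hypothetical stand-ins and are not the actual names used in the system described above.

```python
from typing import Dict, Set, Tuple

# Each relationship names its inverse, so every link can be read in both directions.
# These relationship names are illustrative only.
INVERSE: Dict[str, str] = {
    "creator of": "created by",
    "created by": "creator of",
}

relations: Set[Tuple[str, str, str]] = set()


def add_relation(source: str, relationship: str, target: str) -> None:
    """Store a relationship and automatically store its reciprocal."""
    if relationship not in INVERSE:
        raise ValueError(f"No reciprocal defined for relationship: {relationship!r}")
    relations.add((source, relationship, target))
    relations.add((target, INVERSE[relationship], source))


# A term from a persons vocabulary linked to a term from a named-works vocabulary.
add_relation("Jane Austen", "creator of", "Pride and Prejudice")
```

After the single call, both directions of the pair exist, so a user starting from either the person or the named work can follow the link.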

I also heard this approach described recently by a speaker, Ahren Lehnert, at the Taxonomy Boot Camp conference, who characterized the linking of controlled vocabularies with related match (not equivalent match) relationships as “trending toward” creating an ontology.

Friday, March 29, 2019

Knowledge Modeling

I usually have spoken or written only of creating controlled vocabularies, or more specifically taxonomies, rather than creating knowledge models. Now, I am beginning to think of knowledge models and knowledge modeling.

A knowledge model is not just a fancy buzzword for a controlled vocabulary. It’s more complex than that. A knowledge model is more similar to a knowledge organization system, which I defined in an earlier blog post. As a system or a model, it comprises not only the concepts, their labels and attributes, and their relationships, but also rules or policies for their use. Furthermore, a knowledge model is either a complex type of knowledge organization system, such as a thesaurus or an ontology, or a set of multiple controlled vocabularies to be used in combination for the same content set that form a set of taxonomies, such as facets, but it is not a simple single controlled vocabulary. The designation of “model” is also what is used for RDF, SKOS, and OWL-based systems. These are often called semantic models.

The activity of “knowledge modeling” is also slightly different and more complex than mere “taxonomy creation.” Taxonomy creation involves identifying concepts through obtaining input from stakeholders/users and from surveying the content, possibly with some additional external resources, but the extent of obtaining user input may vary. It is possible to build a taxonomy, especially one for external users, with no user input and just input from some other stakeholders. Knowledge modeling also involves inputs of people and content, but more emphasis is on stakeholder/user input. Content contains information, but people contain knowledge, so knowledge modeling requires the input of various people, with the input gathered in a comprehensive and systematic way, such as through interactive brainstorming workshops and interviews. Furthermore, knowledge modeling does not look at merely content, but starts out considering the body of “knowledge” that can be derived from the content.

Knowledge modeling may also involve a slightly different way of thinking for the taxonomist or knowledge modeler. Instead of thinking of what terms are needed for indexing and retrieval of a set of content, the knowledge modeler thinks of what the possible classes, facets, or concept schemes are to describe a domain of knowledge, and what various user activities and use cases could be supported. From there, specific concepts are then created. Taxonomy creation involves a combination of top-down and bottom-up approaches to the hierarchy of concepts, but knowledge modeling puts more emphasis on the top-down approach.

Knowledge modeling is a very apt description for what is involved in designing and creating ontologies, which are knowledge organization systems that describe a domain of knowledge, through concepts, classes of concepts, and customized semantic relationships between concepts of different classes. (Ontologies, by definition, should also follow the OWL standards of the World Wide Web Consortium for data representation.) There are knowledge organization systems which are not ontologies yet make use of some semantic relationships, and designing these also involves the activity of knowledge modeling. Determining what additional semantic relationships are desired, how specific they should be, and what they should be named in both directions is very much a knowledge modeling task.

Knowledge modeling also suggests that it is an activity of knowledge management and not merely information management. Knowledge management is defined as “the process of capturing, distributing, and effectively using knowledge” (Tom Davenport, 1994), which goes beyond the mere support of search, discovery, and retrieval. Knowledge management is especially for internal enterprise-level knowledge.

I think knowledge modeling is more challenging than mere taxonomy creation, but I am up for the challenge.

Friday, March 17, 2017

Taxonomies as Knowledge Organization Systems

A taxonomy is a kind of controlled vocabulary. A taxonomy is also a kind of knowledge organization system. So, the question is: what’s the difference, if any, between a controlled vocabulary and a knowledge organization system? When I first heard of “knowledge organization system” I perceived it as merely a more academic term for controlled vocabulary. While it’s true that knowledge organization systems are discussed more in library and information science literature and courses than they are in corporate enterprises, there are additional nuanced differences between the two.

Controlled vocabularies comprise simple term lists, synonym rings (search thesauri), authority files, taxonomies, and thesauri. Knowledge organization systems comprise all of these, plus categorization schemes, classification schemes, dictionaries, gazetteers, glossaries, ontologies, semantic networks, subject heading schemes, and terminologies. As such, knowledge organization systems can be considered to be broader than controlled vocabularies, including all kinds of controlled vocabularies and more.

Yet, it’s not simply a matter of more types that distinguishes knowledge organization systems. Knowledge organization systems include “schemes” that go beyond how the terms are organized and related to each other. Categorization schemes, classification schemes, semantic networks, and ontologies present not only terms and relationships but also models of how information/knowledge can be managed and organized. These typically involve additional specifications and documentation on how they are to be used. There is indeed something to the name “knowledge organization system.” A “system” is more than just terms and their relationships.

As such, there is more discourse around knowledge organization systems than controlled vocabularies, per se (separate from discussions specifically about taxonomies or thesauri). Conference sessions of the Association for Information Science & Technology (ASIS&T) more often have “knowledge organization systems” in their titles than “controlled vocabularies.” There is even a professional association dedicated to knowledge organization systems, the International Society for Knowledge Organization (ISKO). There is no comparable organization for controlled vocabularies or just taxonomies or thesauri. ISKO holds conferences with sessions around the various issues of knowledge organization systems, including taxonomies. Recognizing that taxonomies are an important kind of knowledge organization system, the ISKO UK chapter co-sponsors the Taxonomy Boot Camp London conference.

Taxonomies are not only included within knowledge organization systems, but they are also a part of the field of knowledge management. As a consultant, I worked with clients who managed taxonomies within their knowledge management services, headed by a manager or director of knowledge management. Also, at a consultancy where I previously worked, taxonomy consulting was part of the larger knowledge management consulting practice.

I used to describe taxonomies as only a kind of controlled vocabulary, but now I will start referring to them as knowledge organization systems as well.