Sunday, June 26, 2022

SKOS Taxonomies

Over the 26 years that I have been involved in controlled vocabularies, thesauri, and taxonomies, the biggest change I have seen in the field is the adoption of SKOS (Simple Knowledge Organization System) as a schema model and standard.

If you are creating taxonomies exclusively within a single system (such as the SharePoint Term Store or controlled tags or categories of a content management system, documentation management system, DAM, etc.), then you probably have not paid much attention to SKOS. It’s true that taxonomies created within and used within a single system, do not have to follow an external standard. But that is not the trend of information management and technology anymore. Connectivity, interoperability, data sharing and reuse, data-centric architecture, vendor-neutral formats, linked data and linked open data, breaking down data silos, enterprise-wide knowledge, and enterprise knowledge graphs have become the preferred trends and directions.

Different Kinds of Standard

With respect to standards, there exist two basic kinds: (1) standards for design, functionality, and a consistent user experience, and (2) standards for compatibility, interoperability, and machine-readability. For this reason, there are two separate sets of standards for taxonomies and other knowledge organization systems. Another way to think of it is that there are standards for each the front end (user interface and experience) and the back end (computer-readable code) of taxonomies, and they are somewhat independent yet still compatible with each other.

For taxonomies and thesauri, more has been written about the front-end design and best practice standards than the back-end interoperability standards. This is for several reasons. The design and best practices standards (ANSI/NISO Z39.19 and ISO 25964 and its predecessors ISO 2788 and ISO 5964), have been around longer. They are lengthier and more detailed than interoperability standards, and they apply to taxonomies and thesauri regardless of their digital or nondigital format. So, this article will focus instead on the back-end, interoperability standard, which is SKOS.

SKOS logo
SKOS Background

SKOS is a recommendation for "a common data model for sharing and linking knowledge organization systems via the Semantic Web". These knowledge organization systems include thesauri (as defined by the ANSI/NISO and ISO thesaurus standards), taxonomies, classification schemes, subject heading systems, and other controlled vocabularies. SKOS is based on RDF (Resource Description Framework), a World Wide Web Consortium (W3C) standard for description and exchange of graph data. RDF specifies that all statements consist of subject-predicate-object triples, and all resources have URIs (uniform resource identifiers).

The development of SKOS aimed to  build upon RDF to provide a recommended schema for thesauri.  SKOS development was first undertaken as the Semantic Web Advanced Development for Europe (SWAD-Europe) project before being adopted and supported by the W3C in 2004. The W3C formally released the SKOS recommendation in 2009.

Meanwhile, the W3C had been working on other recommendations for web-based ontologies, including RDF Schema (RDFS) and Web Ontology Language(OWL). SKOS is compatible with RDFS and OWL, and elements from the different models can be combined. Furthermore, SKOS can even be considered as a very generic upper ontology itself, and the W3C documentation describes SKOS in terms of OWL and RDFS expressions.

The main types of elements of SKOS are concepts, lexical labels, documentation properties (notes), semantic relationships, mapping properties, and concept collections. (Concepts, concept schemes, and collections are ontology classes, and the others are ontology properties.) In their machine-readable form, the SKOS elements are concatenated with no spaces, such as preLabel, scopeNote, and exactMatch.

SKOS Concepts, Labels, and Notes

SKOS is concept-centric. Making a distinction between concepts and labels is the biggest departure from traditional thesaurus standards and past controlled vocabulary practice. A concept is an idea of something, and a label is a name for that idea. Thus, a concept may have multiple labels. For the organization of a vocabulary, especially as a hierarchy, one of the various labels needs to be designated as the preferred displayed label. The others are alternative labels and its sub-type, hidden label, which may be used to designate that the label should not display to end-users. Labels for the same concept may exist in multiple languages, but there may be only one preferred label per language.

Notation is intended for use as an appending part of a label, such as an alpha-numeric code, which is commonly used in classification schemes.

Documentation comprises various types of notes, including scope note, editorial note, change note, and history note. Definition and example are additional documentation types. Scope notes are commonly used in thesauri to clarify the usage of a concept in tagging/indexing for the specific context of controlled vocabulary and its set of content. They serve an important role for manual tagging. Other note types may be utilized for administration and management of the controlled vocabulary. Definitions may be entered for more technical controlled vocabularies or when the controlled vocabulary also serve the function of a glossary.

SKOS Concept Schemes and Collections

What constitutes an individual "taxonomy," "thesaurus" or other controlled vocabulary? This may not be very clear. SKOS introduces the formal organizing unit called a concept scheme, as a “collection of concepts.” A concept scheme is a single controlled vocabulary, thesaurus, hierarchical taxonomy, facet within a faceted taxonomy, or metadata property within a larger metadata schema.

There are some advanced, lesser used features of SKOS, including in scheme, which allows you to control whether a concept is in a concept scheme regardless of whether it’s within the concept scheme’s hierarchy (which is otherwise the default). There is also a special designation of top concept for the top concepts of a concept scheme, a designation which could be utilized for a front-end display implementation.

Collections are an additional optional way to designate a grouping of concepts for a purpose, such as the taxonomy concepts to be used in only specified implementations or those of subject categories for subject matter expert review. Furthermore, concepts can be ordered within collections.

SKOS Relations and Mapping Properties

SKOS includes what are called semantic relations, although this name could cause confusion, since they are the basic thesaurus relationships (broader, narrower, and related), not customizable semantic relations characteristic of ontologies. These thesaural-type relationships are used between concepts within the same concept scheme. In addition, SKOS specifies broader transitive and narrower transitive, meaning the inheritance of the relationship to additional levels of the hierarchy. This is usually assumed to be the case by default, and thus these specifically transitive relations are rarely implemented, but if there are reasons not to inherit and extend the logical hierarchy by default, then the transitive relations may be used. (I have not come across a use case, though.)

Since SKOS specifies concept schemes, SKOS also specifies an additional set of relation types called mapping properties that are to be used between concepts in different concept schemes or different taxonomies.  These comprise exact match, close match, narrower match, broader match, and related match. Exact match and close match are used to map existing taxonomies together, often so that one is used in the tagging and the other is used in the retrieval. The other mapping relations may be used to extend one taxonomy with another while still maintaining a distinction between the two.

Following is a table of SKOS elements by type (class or property) with the concatenated machine-readable forms.

Implementation of SKOS

Most commercial and open-source taxonomy/thesaurus management software now supports SKOS. There are also simple free tools called SKOS editors. SKOS elements are presented in their full human readable names (such as Preferred Label, instead of prefLabel), so it is intuitive to understand. Thus, taxonomists don’t have to worry about SKOS, but should at least be familiar with its principles. Familiarity with SKOS makes it easier to switch from using one software package to another. Software may vary, however, in how well they support some of the less common features, such as in scheme, collections, and broader/narrower transitive.

Taxonomy/thesaurus management software often has the additional administrative grouping of related concept schemes for the same implementation into what may be called a “project” or “knowledge model.” SKOS mapping relations tend to be used more often across concept schemes that are managed in different projects, rather than within the same project. Within the same project, concept schemes tend to represent facets (which have no relations between them) or ontology classes (which have customized semantic relations between them).

Since all elements of SKOS are standard machine-readable, you can leverage any element with rules for usage, such as for how tagging should be done and how concepts and relationships are displayed. Custom applications of SKOS vocabularies are thus common.

If you want to dive into all the details of SKOS, consult these resources from the W3C:

SKOS is intended to be flexible, and it is more suggestive than restrictive. Thus, a SKOS-based taxonomy or thesaurus could still be poorly designed, and that’s why the other standards for best practices, ANSI/NISO Z39.19 and ISO 25964 are also important.