The Accidental Taxonomist: 2022

Friday, December 30, 2022

Taxonomy Definition

I usually explain that a taxonomy is a structured kind of controlled vocabulary, which is a list of terms (or concepts) usually used to tag content to aid in its retrieval. The structure can be hierarchical, faceted, or a combination. Other people have defined taxonomies for a general audience in more simplistic ways as a kind of hierarchical classification system. So, while a taxonomy has two main features (naming and structure), my preferred definition has focused on the controlled vocabulary and naming aspect, whereas other definitions focus on the hierarchical classification aspect of taxonomies. However, a taxonomy and a classification system are not necessarily the same. While it is understandable that a definition is simplified for a general audience, it should not be simplified to the extent of being misleading.

I have blogged previously on the differences between taxonomies and classification systems, so I won’t repeat all the differences again. The main point is that a classification system is generic and rigid and is intended to be used widely, such as the Dewey Decimal Classification for libraries, whereas a taxonomy tends to be customized for a particular use case and context and is flexible and undergoes changes.

Meanwhile, there are also a few well-known classification systems that are called “taxonomies,” such as the Linnaean taxonomy of organisms and Bloom’s taxonomy of educational objectives. These seem quite different from the information-retrieval type of taxonomy. The Linnaean hierarchical levels have names (Kingdom, Phylum, Class, etc.). The relationships of the hierarchical levels to each other do not cover all of the types in the thesaurus standards: generic-specific, generic-instance, or whole-part. Rather, the Linnaean taxonomic relationships are generic-specific only, or more precisely that of member of class or subclass. Bloom's taxonomy has a completely different hierarchical model that does not follow thesaurus standards at all.

How does a taxonomy of concepts for information retrieval relate to a scientific taxonomy? They are similar, and the differences are not so great that they should be considered different meanings of the word “taxonomy.” If we consider that taxonomies are systems to name and organize things hierarchically, then a taxonomy for information retrieval, comprised of terms for tagging and retrieving content (documents, images, etc.), can be considered a taxonomy of a controlled vocabulary, in contrast to taxonomies of things, such as organisms. This is a slightly different perspective from considering a taxonomy as a kind of controlled vocabulary, as I previously had. The following diagram illustrates a possible way to consider how information-retrieval taxonomies relate to classification systems and controlled vocabularies.

Diagram showing that information taxonomies are at the intersection of classification systems and controlled vocabularies

Several kinds of knowledge organization systems are defined by their published standards. For thesauri, there are ANSI/NISO Z39.19 and ISO 25964. For terminologies, there is ISO/TC 37/SC 3 and other related standards. For ontologies, there is OWL (Web Ontology Language) from the W3C. There is no standard, however, specifically for “taxonomies” or even for “classification systems,” which is a reason why these remain difficult to define. The designations “classification system,” “classification scheme,” and “taxonomy” have been used interchangeably.

Wikipedia provides the definition at the entry for Taxonomy: “A taxonomy (or taxonomical classification) is a scheme of classification, especially a hierarchical classification, in which things are organized into groups or types.” But then it goes on to say, “it may refer to a categorisation of things or concepts.” Thus, an information-retrieval taxonomy is a categorization of concepts (also called terms in a controlled vocabulary). It is not a classification system, since the goal is not to classify things, not even the things tagged with the taxonomy concepts, but rather to organize the set of concepts that have been identified as appropriate for tagging and retrieving a set of content.

Sunday, November 27, 2022

Taxonomies to Bridge Silos

There is increasing interest in organizations to “break down silos” of content and data. Silos may be different software applications, distinct web or intranet content, or merely different computer drives and folders. The goal is to enable search and retrieval across content that is stored in different content/document management systems and shared folders and the analysis and comparison of data stored in different kinds of database management systems, records management systems, and spreadsheets. This results in better, more complete information to enable more informed decisions and knowledge discovery, along with improved user satisfaction, while also saving time. Breaking down or bridging such silos was a theme of my two most recent conferences.

LavaCon: Connecting Content Silos

The 20th annual LavaCon conference on content strategy, held October 23-26 in New Orleans, had the theme this year of “Connecting content silos across the Enterprise.” The conference had a number of presentations tied to the theme, 10 of which had “silos” in their titles. Two presentations I especially enjoyed were by leading content strategy consultants about how to connect silos.

Sarah O’Keefe of Scriptorium, in her presentation “From Silo Busting to CaaStle Building,” with a fairy tale castle metaphor, explained that completely unified content cannot be achieved, because CMSs are tuned to specific content domains, corporate websites accommodate different goals of different groups, content silos have their own delivery pipelines, and silos often match the organizational structure. Her solution was to provide Content as a Service (CaaS), or a “CaaStle in the cloud(s).” Silos are kept, allowing for unique requirements, and perhaps reduced in number, but are connected where needed.

Val Swisher of Content Rules, in her presentation “Creating a Unified (Siloed) Content Experience: The Importance of Terminology and Taxonomy,” explained that siloed content results in different user experiences for each silo. But silos are not going away, because there is no single toolset, particular content has its owners, and certain content may be considered special. Therefore, the user experience should be improved to “ensure that all content looks like it comes from the same company” and to “eliminate the confusion that users experience when they consume content created by various silos.” This is done by standardizing the content, the search, page layout, navigation, content types, terminology, and taxonomy.

At LavaCon, I presented a pre-conference workshop with the title “Using Taxonomies and Tagging to Connect Content Across the Enterprise.” While most of my workshop addressed the general principles and best practices for taxonomy creation, along with the basics of tagging, I did discuss how a centrally managed taxonomy, external from but linked to various content management systems and other applications or repositories of content, can bridge silos. Taxonomy management software positioned as “middleware,” such as PoolParty, connects to these different content applications and repositories, and then the taxonomy is presented to the user in a single user interface.

Taxonomy Boot Camp: Taxonomy Breaking Down Silos

At the annual Taxonomy Boot Camp conference, held November 7-8 in Washington, DC, and co-located with the KM World conference, I spoke in a two-presentation session titled “Taxonomy Breaking Down Silos.” The idea is that taxonomies provide the connections to break down barriers between different systems and teams. I presented on taxonomy linking jointly with Donna Popky, Senior Taxonomy & Information Architecture Specialist, Harvard Business School. I explained the principles of taxonomy project linking, and Donna presented a case study of taxonomy linking using a hub and spoke method to link separate taxonomies managed by different business units with separate content repositories for different purposes at Harvard Business School. So, this was a case of creating a hub taxonomy linked to the various business unit spoke taxonomies.
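The hub-and-spoke idea can be sketched in a few lines of Python. This is only an illustration of the general principle, not the implementation presented in the case study: the unit names, concept labels, and `hub:` identifiers are all invented, and a real project would manage these mappings in taxonomy management software rather than in a dictionary.

```python
# Each spoke concept is mapped once, to a concept in the hub taxonomy.
# All unit names, concept labels, and hub identifiers are hypothetical.
spoke_to_hub = {
    ("unit_a", "Mgmt"): "hub:management",
    ("unit_b", "Management"): "hub:management",
    ("unit_c", "Leadership & Management"): "hub:management",
    ("unit_b", "Accounting"): "hub:accounting",
}

def equivalents(spoke, concept):
    """Find concepts in other spokes that are linked to the same hub concept."""
    hub = spoke_to_hub.get((spoke, concept))
    if hub is None:
        return []  # this spoke concept has not been linked to the hub
    return sorted(
        key for key, value in spoke_to_hub.items()
        if value == hub and key != (spoke, concept)
    )

print(equivalents("unit_a", "Mgmt"))
```

The point of the design is that each spoke taxonomy needs only one set of links, to the hub, instead of separate links to every other spoke taxonomy.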

The other speaker in the session, Rachael Maddison, Content Infrastructure Architect & Taxonomy Product Manager for Adobe Digital Media Experience and Engagement, presented on taxonomy adoption across corporate silos and not merely content silos. Collaboration plays a role in wider taxonomy adoption, and as Rachael stated: “Mapping or merging can’t happen until there is stakeholder buy-in.”

Over the years, my list of the benefits of taxonomies has grown. Linking data, content, and corporate silos are additional benefits. This can be done with a single, enterprise taxonomy or with multiple linked taxonomies. In either case, the taxonomy needs to be managed externally from any individual siloed application in a dedicated taxonomy management system. Taxonomies can then break down corporate silos and connect content and data silos.

Tuesday, October 18, 2022

The Accidental Taxonomist, Third Edition

The third edition of my book, The Accidental Taxonomist, will officially be published November 7, and I just received advance printed copies, so now is a good time to talk about it. Details of the book are on its website. For those who wonder how this edition differs from the prior edition, I discuss that in the preface of the 3rd edition, which I have copied here.

****

I am thrilled that taxonomies are as relevant now as they were when I was writing my first edition in 2009 and second edition in 2015 and even more so. Some people had previously thought that improved search algorithms would largely replace the need for taxonomies, but users want to be able to select search refinement terms, and the greater adoption of search has led to more taxonomies. Some thought that AI technologies of text analytics and auto-classification might replace human-created taxonomies, but, on the contrary, they made taxonomies more valuable. Some thought that ontologies would replace taxonomies, but instead ontologies have connected and extended taxonomies, providing additional uses for taxonomies. Innovations and trends in digital content and data have given rise to new uses for taxonomies, including support for recommendation, personalization, data-centric enterprise knowledge management, voice of the customer analysis, and chatbot design.

There are signs of interest in taxonomies in various places: social media posts, conference presentations and workshops in a greater number of different conferences, and continued strong enrollment trends in my online taxonomy course. Taxonomy consultants I know are doing well with business. A search on “taxonomy” in Google Trends shows a continued steady interest in the term since around 2006. Membership of the Taxonomy and Ontology Community of Practice LinkedIn group has grown from 3,330 in 2015 to 5,564 in June 2022. More people continually get involved in taxonomy work, as our survey of taxonomists indicates relatively more people with fewer years of experience. (See Appendix A, Question 2.) The number of jobs for taxonomists continues to increase, as evidenced by repeated taxonomy job searches over the years on job boards, job alert postings, and direct queries colleagues of mine have reported receiving from recruiters. The trend toward remote work, especially for knowledge workers, has opened up more job possibilities for taxonomists, who are no longer limited by their geographic location, which had previously been an issue for this very niche specialization. We may soon see more digital nomad taxonomists living and working all over the world.

Meanwhile, as I have continued to engage in taxonomist discourse, consulted for more taxonomy clients, and attended and created new conference presentations, I have continued to learn more and thus refine how I understand and explain taxonomies. It is time that this book also catches up to how I have been explaining taxonomies in my most recent presentations and workshops. I have even revised my thinking on the definitions and types of controlled vocabularies, so the definitions and types section of chapter 1 has been rewritten in this edition. Also in the first chapter, additional uses for taxonomies have been included.

In addition, perspectives on taxonomies have gradually changed, and I am finally catching up. One of the main updates to this third edition has been to move decisively away from the traditional thesaurus model and to adopt the language of SKOS (Simple Knowledge Organization System) with respect to taxonomies. Most significantly this means referring to concepts and their labels and not to terms. An oft-repeated phrase is that it’s about “things, not strings.” Concepts are things, whereas terms, as words or phrases, are merely strings (of text). This has also involved removing the equivalence relationship section from the chapter on relationships and adding a section on alternative labels to the chapter Creating Concepts and Labels (which has been renamed from Creating Terms).

When I updated the 2nd edition, I was working at the time for a library database vendor, so my perspective was somewhat biased toward that industry and use case, despite having had experience as a consultant too. Now, with not only more consulting experience in the interim, but also the perspective of working for a taxonomy software vendor, I see better the varied uses and implementations of taxonomies. As a result, I have changed a number of the examples. I also made updates to the chapter on manual tagging (formerly called human indexing) and replaced many references to “indexing” with “tagging,” in recognition of the more commonly used term, although they are not identical. I had entered this field as an indexer, but I should no longer let my indexing roots influence my perspective. I also cut out some information on thesauri, such as details of the various thesaurus print display formats.

This edition features a new chapter on ontologies. This is not merely because ontologies may be of interest to taxonomists, but because ontologies in business and industry are increasingly created as an extension of existing taxonomies, thus enabling taxonomies to serve more purposes. A convergence of taxonomies and ontologies is now possible with SKOS-based taxonomies, whereby both taxonomies and ontologies are based on RDF and other W3C standards. I am also seeing more taxonomist/ontologist hybrid jobs posted.

Technologies and vendors change, so the chapters on software and auto-categorization needed updating. There have been evolving trends in software, such as the ability to connect and integrate with other systems through APIs, instead of exporting and importing taxonomies, and including auto-tagging within the same tool. Other updates include data from a new survey, nearly all new screenshots, and updated information on taxonomy courses, conferences, and other resources in the final chapter. About half of the chapter head quotes are also new.

In case you missed it in the preface to the second edition, the updates from the first to the second edition (and thus also updates between the first and the third edition) include the following: managing taxonomies in SharePoint, the relationship between taxonomies and metadata, reference to the updated ISO 25964 standard of 2011 and 2013, the introduction of the SKOS standard, and improved explanations on planning and designing taxonomies, along with results of a new taxonomist survey and software information updates.

Friday, September 30, 2022

Taxonomies and Semantics

How are taxonomies related to “semantics”? I considered this question, as the latest conference I participated in was SEMANTiCS, the European conference of semantic technologies, which took place this year in Vienna, Austria, September 13 - 15. Topics presented and discussed in this conference included ontologies, knowledge graphs, semantic models and reasoning, linked open data, machine learning, natural language processing, and other language technologies. Yet taxonomies were also discussed in a number of presentations. In contrast to a conference dedicated to taxonomies, such as Taxonomy Boot Camp, where taxonomies are the focus, at SEMANTiCS, in the context of semantic technologies, taxonomies are a component or an underlying layer in the application of semantic technologies.

Semantics means “meaning.” Like the words “taxonomy” and “ontology,” there is a traditional meaning that is more academic and, in the case of semantics and ontology, also connected to philosophy, but there is also a modern meaning that deals with information science and knowledge management. For example, “semantic search,” means searching for concepts and ideas, not merely matching search strings of text. Thus, a taxonomy or thesaurus supports semantic search by comprising unambiguous concepts of “things, not strings” of text.

Semantics also implies the Semantic Web, with technology that complies with the Semantic Web standards that have been developed by the World Wide Web Consortium (W3C). The Semantic Web, also known as Web 3.0, is not a component of the World Wide Web nor a different web, but rather a kind of extension of the web to include not merely content and simple hyperlinks, but also all kinds of data that is semantically linked (where the links/relationships also have meaning). The Semantic Web allows more complex data, and data stored and organized in graph databases, to be machine-readable. This could be either on the public web or within an organization that follows Semantic Web standards for managing its data and content.

Taxonomies were mentioned in a number of other presentations as a given foundation to ontologies, semantic networks, or knowledge graphs. For example, taxonomies and ontologies were the basis of a knowledge-based recommendation system, described by Andreas Blumauer in his presentation on that subject. In her talk “Real World Case Studies: Five Success Factors to Implementing an Enterprise Data Fabric,” Lulit Tesfaye explained that the components of a data fabric are metadata, taxonomy, ontology, knowledge graph, connections and integrations, and front-end applications.

A session titled Taxonomies included a talk on “Taxonomy and Terminology,” which compared and contrasted taxonomies and terminologies with respect to their kinds of terms and purposes, but also explained the semantic role of taxonomies. The presenter, Klaus Fleischmann, said that terminologies guide content creators, ensuring consistent, correct use of language company-wide, whereas taxonomies provide a semantic layer on top of content and metadata, often for semantic applications. Fleischmann also explained that taxonomies can be extended to ontologies or, in his words, taxonomies “modeled relationships via ontologies.” Also speaking in the Taxonomies session, Nimit Mehta, whose presentation was titled “The Semantic Data Stack - A user story on building a data fabric,” described taxonomies as “A layer between your data and your business applications” and a “governance layer.”

Finally, I presented a taxonomy-related tutorial, although not on taxonomy creation alone, but rather titled “Knowledge Engineering of Taxonomies, Thesauri, and Ontologies,” in which I explained that taxonomies and ontologies are not so much distinct knowledge organization systems, but rather that an ontology is a semantic layer that is applied to and extends a taxonomy, giving it a greater degree of semantics.

I hope to participate in the next SEMANTiCS conference in September 2023 in Leipzig, Germany.

Wednesday, August 31, 2022

SKOS-XL for Taxonomies

I recently posted about SKOS (Simple Knowledge Organization System). If you have read anything about SKOS, then you might have come across SKOS-XL (SKOS eXtension for Labels) and wondered what that is. The World Wide Web Consortium (W3C) released its recommendations for SKOS and SKOS-XL at the same time in 2009 but chose to make them separate recommendations. One way to see it is that, by separating out SKOS-XL, SKOS is indeed truly “simple.” In the detailed SKOS reference, SKOS-XL is an appendix.

www.w3.org/TR/skos-reference/skos-xl.html

Extending labels to become resources

“Things, not strings” is a tagline for semantic models, such as SKOS, which emphasize concepts in taxonomies and other knowledge organization systems and not terms or words. Of course, strings of text exist, and when associated with concepts they are called “labels.” A label and the concept that the label describes may seem indistinguishable, or the distinction may seem merely philosophical. The main difference is that concepts are unique within a taxonomy, but labels are not. A concept may have multiple labels (synonyms or names in different languages), and the same label might apply to different concepts (homographs).

SKOS specifies preferred labels, alternative labels, and hidden labels as options for concepts. Hidden labels can be considered as a type of alternative label that should never be displayed. Alternative labels may display, depending on the front-end application. Preferred labels are what are displayed, especially in hierarchies and facets.
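As a minimal sketch of how the three label types behave, the following Python models a single concept. The `Concept` class, its method names, the example URI, and the labels are all made up for illustration and are not part of SKOS itself. The sketch assumes, in line with the SKOS reference, that hidden labels are available for search matching even though they are never displayed.

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    """A concept with the three SKOS label types (illustrative structure only)."""
    uri: str
    pref_label: str
    alt_labels: list = field(default_factory=list)
    hidden_labels: list = field(default_factory=list)

    def matches(self, query: str) -> bool:
        """Any label type, including hidden labels, can match a search string."""
        q = query.casefold()
        labels = [self.pref_label, *self.alt_labels, *self.hidden_labels]
        return any(q == label.casefold() for label in labels)

    def display_labels(self, show_alternatives: bool = False) -> list:
        """Preferred label first; alternatives only if the front end shows them.
        Hidden labels are never returned for display."""
        shown = [self.pref_label]
        if show_alternatives:
            shown.extend(self.alt_labels)
        return shown

concept = Concept(
    uri="http://example.org/taxonomy/automobiles",  # hypothetical URI
    pref_label="Automobiles",
    alt_labels=["Cars"],
    hidden_labels=["Automobles"],  # a misspelling: matched in search, never shown
)

print(concept.matches("automobles"), concept.display_labels(show_alternatives=True))
```

Because the concept is identified by its URI and not by any one of its labels, the preferred label can be changed later without affecting what has been tagged with the concept.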

Concepts, as things, have properties or characteristics. Labels do not. But sometimes there are reasons to assign properties to labels, such as to indicate the purpose or use of different labels. In this sense, you would want to turn a string into a thing. More correctly, a thing is called a resource, as described by the Resource Description Framework (RDF), the model upon which SKOS is based. This is what SKOS-XL supports: converting labels to resources. It does this by adding elements not found in SKOS: a label class, a literal form property that holds the text of the label, label properties (preferred, alternative, and hidden) that link a concept to a label resource, and a label relation property for links between labels. Further details are in the W3C's SKOS-XL recommendation, which I am not going to repeat here.
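To make the difference concrete, here is a small Python sketch that writes the same alternative label first in plain SKOS and then in SKOS-XL, using simple (subject, predicate, object) tuples in place of a real RDF library. The SKOS and SKOS-XL namespace URIs are those of the W3C recommendations; the `http://example.org/` URIs, the example concept, and the `labelType` property are invented for illustration.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
SKOSXL = "http://www.w3.org/2008/05/skos-xl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/taxonomy/"  # hypothetical namespace

concept = EX + "world-health-organization"
label = EX + "label/who"

# Plain SKOS: the label is a literal string, so nothing more can be said about it.
skos_triples = [
    (concept, SKOS + "prefLabel", "World Health Organization"),
    (concept, SKOS + "altLabel", "WHO"),
]

# SKOS-XL: the label is a resource with its own URI, so it can carry properties.
xl_triples = [
    (concept, SKOSXL + "altLabel", label),        # concept -> label resource
    (label, RDF_TYPE, SKOSXL + "Label"),          # the label is a thing
    (label, SKOSXL + "literalForm", "WHO"),       # the string itself
    (label, EX + "labelType", "acronym"),         # hypothetical custom property
]

def describe(resource, triples):
    """Collect predicate -> object for every statement about one resource."""
    return {p: o for s, p, o in triples if s == resource}

print(describe(label, xl_triples))
```

In the plain SKOS version there is no resource for the string "WHO" to describe, which is why the label type cannot be recorded there.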

Use for SKOS-XL

A typical use case for SKOS-XL is assigning properties to labels when you want to have different labels for different user groups, such as a medical taxonomy for shared medical content to be accessed by both medical professionals and lay people. Medical professionals may prefer a concept labeled Neoplasms, while lay people could call it Cancer. Different user groups could be based in different regions. Although different ISO-code based language labels can be used to distinguish regions in addition to language (such as en-US and en-GB), you may not want to duplicate the vast majority of preferred labels and merely distinguish the few that are actually different.

While SKOS permits multiple alternative labels, aside from hidden labels, there is no way to distinguish their types or purposes in SKOS. You may want alternative labels to support search in one front-end application and not another. You may want to designate official acronyms as distinct from other alternative labels. You may even want to distinguish between different kinds of hidden labels, such as those that should be hidden because they might be pejorative or offensive, and those that you wish to hide only from a type-ahead display because they are near duplicates of other alternative labels and too many alternative labels would clutter up the display. Finally, there may be alternative labels used by only certain users or in certain regions.

SKOS-XL lets you assign properties or attributes to labels. Assigning the purpose or use of the label is only one possibility, although it is the most common use of SKOS-XL. You may wish to manage more administrative metadata about labels, such as the source or origin of different labels.
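One way a front-end application might use such label properties is sketched below in Python. The `Label` class, the `display_for` property, and the audience names are assumptions made for this example and are not defined by SKOS-XL, which leaves the choice of custom label properties to the implementer. The sketch keeps a single preferred label and flags an alternative label for display to a particular audience.

```python
from dataclasses import dataclass

@dataclass
class Label:
    """A label treated as a resource, so it can carry its own properties."""
    literal_form: str
    kind: str                            # "pref", "alt", or "hidden"
    display_for: frozenset = frozenset() # hypothetical property: audiences to show it to

# Labels of one concept, using the medical example above.
labels = [
    Label("Neoplasms", "pref"),
    Label("Cancer", "alt", frozenset({"lay"})),
    Label("Tumors", "alt"),
]

def display_label(labels, audience):
    """Show a label flagged for this audience, else fall back to the preferred label."""
    for label in labels:
        if label.kind != "hidden" and audience in label.display_for:
            return label.literal_form
    return next(l.literal_form for l in labels if l.kind == "pref")

print(display_label(labels, "lay"), display_label(labels, "professional"))
```

Only the few concepts whose labels actually differ by audience need a flagged label; every other concept falls back to its preferred label.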

Implementing SKOS-XL

The principle of SKOS-XL is not complex, but implementation can be more challenging, and if you are building taxonomies with the SKOS-XL capability, you would want to use taxonomy management software that supports SKOS-XL, such as PoolParty. Taxonomy management software products are quite consistent when it comes to their user interface for supporting the editing of basic SKOS taxonomies, but they are not the same for creating and editing SKOS-XL labels, which is a less common function.

Having properties, such as types, for terms is not new, but it required some more innovation in the SKOS model of things (concepts), not strings (terms). It was common for non-SKOS taxonomy/thesaurus management software, which treated different terms with the same meaning as equivalence relationships, to support the customization of relationships, including the equivalence relationship. SKOS-XL ensures that this earlier feature is supported in the current standard, in machine-readable format.

For SKOS-XL to be more widely used, and maybe even more elegantly supported, requires greater sharing of use cases. I hope the taxonomist community will share their experiences with SKOS-XL, so we can talk about practices and recommendations and not just theory.

Further information:

“Taxonomy Management Based on SKOS-XL” 2016 presentation slides
“From SKOS over SKOS-XL to Custom Ontologies” 2016 webinar video and slides
“What SKOS-XL adds to SKOS” 2011 blog post by Bob DuCharme

Sunday, July 31, 2022

Taxonomy Challenges Discussed at SLA Conference

When it comes to conferences dealing with the subject of taxonomy creation, implementation, and maintenance, without a doubt Taxonomy Boot Camp and Taxonomy Boot Camp London are by far the best conferences for their content, speakers, and networking opportunities. However, there are other conferences that have sessions on taxonomies.

The annual conference of the Special Libraries Association (SLA) usually has multiple taxonomy-related sessions. This year's conference, held July 31 - August 2 in Charlotte, NC, and the first in-person conference in three years, was no exception.

Thanks to the volunteer programming efforts of SLA’s Taxonomy Community (one of over 20 specialized topic groups, formerly called “Divisions”), the annual conference is able to include multiple taxonomy sessions, some of which bring together multiple speakers, either co-presenting a single talk or coming together on a panel. Even sessions not organized by the Taxonomy Community may include taxonomy topics, such as those dealing with knowledge management, information architecture, or research that uses a taxonomy. A Taxonomy Community networking event is also regularly part of the SLA conference.

This year’s conference is hybrid, so some of the taxonomy sessions are in-person, and some are pre-recorded and available on-demand. Live-streaming was also done for keynotes and some sessions. The following are the in-person taxonomy sessions at the SLA 2022 conference:

“The Role of DEI in Taxonomy Development, Maintenance, Search, and Retrieval,” presented by Marisa Hughes. (This presentation on a popular topic was additionally live-streamed and pre-recorded for on-demand viewing.)

“Current Challenges and Advanced Taxonomy Topics” panel comprising Marisa Hughes, Heather Kotula, John Bertland, and myself.

“Research Sources and Methodologies for Taxonomy Development,” jointly presented by Marisa Hughes and myself.

The following are pre-recorded, on-demand only taxonomy sessions:

“There ain’t no Sanity Clause: Taxonomy and Data Analysis” presented by Michele Lamorte

“Metadata Governance” presented by John Horodyski

Conference session on diversity, equity, and inclusion in taxonomies

Diversity, Equity & Inclusion (DEI) is a growing area of interest in information management/sharing and content creation. Marisa Hughes, the taxonomist who edits the APA Thesaurus of Psychological Index Terms, explained the challenges of revising the thesaurus terms to reflect DEI, for which she gave the following definitions:

Diversity: “The vast range of differences among individuals and groups.”
Equity: “The contain of being fair and impartial”
Inclusion: “Welcoming and respecting diverse individuals and groups. Diversity in practice.”

She has been reviewing thousands of terms for accuracy, currency, inclusivity, avoidance of bias, stereotypes, or discrimination. Areas that this DEI review has focused on are:

Racial, ethnic, and cultural identity
Gender diversity and sexual orientation
Age, disability status, and socioeconomic class bias

In the area of disability status, for example, the term should put the person first rather than the disability. Thus, “Hearing impaired” is changed to “Person with hearing loss”; and “Mentally ill” is changed to “Individual with a mental illness.”

Marisa Hughes presenting “The Role of DEI in Taxonomy”

Additional challenges include hierarchical relationships, term usage, and change management. If users can see hierarchical relationships, even if not the full hierarchy, these relationships need to be appropriate. For example, certain personal conditions and behaviors should not be narrower terms of “Disorders.” Term frequency of usage (also called “literary warrant”) is important, but the larger goal is to have respectful terms. Change management involves taking care that the term changes do not impact search and retrieval. Marisa oversees the large job of reindexing content with new terms and adding change notes or history notes to changed terms.

Conference panel on current taxonomy challenges

In this session, the four panelists each gave brief opening talks, then were asked questions by the moderator, Judith Theodori, and then it was opened up for general Q&A and discussion with the audience.

I presented on the themes of challenges which came from 138 taxonomist survey responses to the question "What are the pain points or challenges in your taxonomy work?" The leading trends in the responses were:

Achieving stakeholder understanding and buy-in
Competing interests, expectations, and requests
Organizational challenges
Tools and technology that are inadequate or not integrated

John Bertland, Digital Librarian and Content Specialist at the Presidio Trust, spoke of the taxonomy challenges in his organization, including governance at a time of organizational change, and funding. A specific challenge is expanding and adapting a taxonomy that was originally just for digital asset management to include the content of the intranet.

“Current Taxonomy Challenges” panelists Marisa Hughes, John Bertland, Heather Kotula, and Heather Hedden

Marisa Hughes, Taxonomist at the American Psychological Association, related the challenge of having to quickly come up with all the COVID-related taxonomy in time for the usual thesaurus update scheduled in April 2020. This involved a lot of research on literature that was still rather lacking on the subject.

Another challenging project was to determine the role of historical data in the vocabulary of 3500 terms for the period of 1967 to 1973, which involved removing offensive terms. It was a judgement call of whether to continue to use a potentially offensive term as a nonpreferred term (alternative label) or not. Heather Kotula, VP, Marketing and Communications of Access Innovations, Inc., the fourth panelist, also discussed the same subject of excluding pejorative terms, referred to as “semantic censorship.” In the end it was concluded that often pejorative terms are actually not that much in use in the documents being tagged.

Sunday, June 26, 2022

SKOS Taxonomies

Over the 26 years that I have been involved in controlled vocabularies, thesauri, and taxonomies, the biggest change I have seen in the field is the adoption of SKOS (Simple Knowledge Organization System) as a schema model and standard.

If you are creating taxonomies exclusively within a single system (such as the SharePoint Term Store or controlled tags or categories of a content management system, document management system, DAM, etc.), then you probably have not paid much attention to SKOS. It’s true that taxonomies created within and used within a single system do not have to follow an external standard. But that is not the trend of information management and technology anymore. Connectivity, interoperability, data sharing and reuse, data-centric architecture, vendor-neutral formats, linked data and linked open data, breaking down data silos, enterprise-wide knowledge, and enterprise knowledge graphs have become the preferred trends and directions.

Different Kinds of Standard

With respect to standards, there exist two basic kinds: (1) standards for design, functionality, and a consistent user experience, and (2) standards for compatibility, interoperability, and machine-readability. For this reason, there are two separate sets of standards for taxonomies and other knowledge organization systems. Another way to think of it is that there are standards for each of the front end (user interface and experience) and the back end (computer-readable code) of taxonomies, and they are somewhat independent yet still compatible with each other.

For taxonomies and thesauri, more has been written about the front-end design and best practice standards than the back-end interoperability standards. This is for several reasons. The design and best practices standards (ANSI/NISO Z39.19 and ISO 25964 and its predecessors ISO 2788 and ISO 5964) have been around longer. They are lengthier and more detailed than interoperability standards, and they apply to taxonomies and thesauri regardless of their digital or nondigital format. So, this article will focus instead on the back-end interoperability standard, which is SKOS.

SKOS Background

SKOS is a recommendation for "a common data model for sharing and linking knowledge organization systems via the Semantic Web". These knowledge organization systems include thesauri (as defined by the ANSI/NISO and ISO thesaurus standards), taxonomies, classification schemes, subject heading systems, and other controlled vocabularies. SKOS is based on RDF (Resource Description Framework), a World Wide Web Consortium (W3C) standard for description and exchange of graph data. RDF specifies that all statements consist of subject-predicate-object triples, and all resources have URIs (uniform resource identifiers).
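To make the triple idea concrete, here is a minimal sketch in plain Python (no RDF library). The SKOS and RDF namespace URIs are the real W3C ones, but the example.org URIs and the concepts themselves are made up for illustration.

```python
# Real W3C namespaces.
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Hypothetical namespace for an example vocabulary (an assumption).
EX = "http://example.org/vocab/"

# Every statement is a (subject, predicate, object) triple.
# Resources are identified by URIs; a label is a (text, language) literal.
triples = [
    (EX + "taxonomies", RDF + "type", SKOS + "Concept"),
    (EX + "taxonomies", SKOS + "prefLabel", ("Taxonomies", "en")),
    (EX + "taxonomies", SKOS + "broader", EX + "controlled-vocabularies"),
]


def objects(subject, predicate):
    """Return all objects stated for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


print(objects(EX + "taxonomies", SKOS + "prefLabel"))
```

In practice a taxonomy management tool or an RDF library handles this for you; the point is only that everything in a SKOS vocabulary reduces to statements of this shape.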

The development of SKOS aimed to build upon RDF to provide a recommended schema for thesauri. SKOS development was first undertaken as the Semantic Web Advanced Development for Europe (SWAD-Europe) project before being adopted and supported by the W3C in 2004. The W3C formally released the SKOS recommendation in 2009.

Meanwhile, the W3C had been working on other recommendations for web-based ontologies, including RDF Schema (RDFS) and Web Ontology Language (OWL). SKOS is compatible with RDFS and OWL, and elements from the different models can be combined. Furthermore, SKOS can even be considered as a very generic upper ontology itself, and the W3C documentation describes SKOS in terms of OWL and RDFS expressions.

The main types of elements of SKOS are concepts, concept schemes, lexical labels, documentation properties (notes), semantic relationships, mapping properties, and concept collections. (Concepts, concept schemes, and collections are ontology classes, and the others are ontology properties.) In their machine-readable form, the SKOS elements are concatenated with no spaces, such as prefLabel, scopeNote, and exactMatch.

SKOS Concepts, Labels, and Notes

SKOS is concept-centric. Making a distinction between concepts and labels is the biggest departure from traditional thesaurus standards and past controlled vocabulary practice. A concept is an idea of something, and a label is a name for that idea. Thus, a concept may have multiple labels. For the organization of a vocabulary, especially as a hierarchy, one of the various labels needs to be designated as the preferred displayed label. The others are alternative labels and hidden labels, the latter of which may be used to designate that the label should not display to end-users. Labels for the same concept may exist in multiple languages, but there may be only one preferred label per language.
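As a rough sketch of the concept/label distinction, the following Python models a concept with its three kinds of labels and checks the one-preferred-label-per-language rule. The class, function, URI, and sample labels are all hypothetical and are not taken from any particular tool.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A concept is identified by its URI; labels are just names for it."""
    uri: str
    pref_labels: list = field(default_factory=list)    # (text, language) pairs
    alt_labels: list = field(default_factory=list)
    hidden_labels: list = field(default_factory=list)  # matched in search, not displayed


def languages_with_multiple_pref_labels(concept):
    """Return the languages that break the one-preferred-label-per-language rule."""
    counts = Counter(lang for _text, lang in concept.pref_labels)
    return sorted(lang for lang, n in counts.items() if n > 1)


cars = Concept(
    uri="http://example.org/vocab/automobiles",  # made-up URI
    pref_labels=[("Automobiles", "en"), ("Automobile", "de")],
    alt_labels=[("Cars", "en"), ("Autos", "de")],
    hidden_labels=[("Automobles", "en")],  # a misspelling worth matching but not showing
)

print(languages_with_multiple_pref_labels(cars))
```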

Notation is intended for a code, such as an alphanumeric code, that is appended to a label, as is common in classification schemes.

Documentation comprises various types of notes, including scope note, editorial note, change note, and history note. Definition and example are additional documentation types. Scope notes are commonly used in thesauri to clarify the usage of a concept in tagging/indexing for the specific context of the controlled vocabulary and its set of content. They serve an important role for manual tagging. Other note types may be utilized for administration and management of the controlled vocabulary. Definitions may be entered for more technical controlled vocabularies or when the controlled vocabulary also serves the function of a glossary.

SKOS Concept Schemes and Collections

What constitutes an individual "taxonomy," "thesaurus" or other controlled vocabulary? This may not be very clear. SKOS introduces the formal organizing unit called a concept scheme, as a “collection of concepts.” A concept scheme is a single controlled vocabulary, thesaurus, hierarchical taxonomy, facet within a faceted taxonomy, or metadata property within a larger metadata schema.

There are some advanced, lesser-used features of SKOS, including in scheme, which allows you to control whether a concept is in a concept scheme regardless of whether it’s within the concept scheme’s hierarchy (which is otherwise the default). There is also a special designation of top concept for the top concepts of a concept scheme, a designation which could be utilized for a front-end display implementation.

Collections are an additional optional way to designate a grouping of concepts for a purpose, such as the taxonomy concepts to be used in only specified implementations or those of subject categories for subject matter expert review. Furthermore, concepts can be ordered within collections.
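A minimal sketch of how concept schemes, top concepts, and collections might sit side by side in data. The dictionary keys echo the SKOS names inScheme and topConceptOf, but the structure, the scheme names, and the collection name are assumptions made for illustration.

```python
# Hypothetical concepts, each assigned to a concept scheme.
concepts = {
    "Software": {"inScheme": "Products", "topConceptOf": "Products", "broader": []},
    "Educational software": {"inScheme": "Products", "topConceptOf": None, "broader": ["Software"]},
    "United States": {"inScheme": "Places", "topConceptOf": "Places", "broader": []},
}

# A collection groups concepts for a purpose; a list keeps them in order.
collections = {"For SME review": ["Educational software", "Software"]}


def concepts_in_scheme(scheme):
    """All concepts assigned to a scheme, whatever their place in the hierarchy."""
    return sorted(name for name, c in concepts.items() if c["inScheme"] == scheme)


def top_concepts(scheme):
    """Concepts a front end could display as the starting points of a scheme."""
    return sorted(name for name, c in concepts.items() if c["topConceptOf"] == scheme)


print(concepts_in_scheme("Products"))
print(top_concepts("Products"))
```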

SKOS Relations and Mapping Properties

SKOS includes what are called semantic relations, although this name could cause confusion, since they are the basic thesaurus relationships (broader, narrower, and related), not customizable semantic relations characteristic of ontologies. These thesaural-type relationships are used between concepts within the same concept scheme. In addition, SKOS specifies broader transitive and narrower transitive, meaning the inheritance of the relationship to additional levels of the hierarchy. This is usually assumed to be the case by default, and thus these specifically transitive relations are rarely implemented, but if there are reasons not to inherit and extend the logical hierarchy by default, then the transitive relations may be used. (I have not come across a use case, though.)
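To illustrate the difference between a direct broader link and its extension up the hierarchy, here is a small sketch that follows broader links upward to collect every ancestor. The sample concepts and function names are invented for the example.

```python
# Direct broader links only (hypothetical sample hierarchy).
broader = {
    "Educational software": ["Software"],
    "Software": ["Information technology"],
    "Information technology": [],
}


def broader_transitive(concept, broader_map):
    """Collect every ancestor reachable by following broader links upward."""
    seen = []
    stack = list(broader_map.get(concept, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(broader_map.get(current, []))
    return seen


def narrower_of(concept, broader_map):
    """Narrower is simply the inverse of broader."""
    return sorted(c for c, parents in broader_map.items() if concept in parents)


print(broader_transitive("Educational software", broader))
print(narrower_of("Software", broader))
```

Here "Information technology" is not a direct broader concept of "Educational software," but it is reached once the relationship is extended to additional levels.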

Since SKOS specifies concept schemes, SKOS also specifies an additional set of relation types called mapping properties that are to be used between concepts in different concept schemes or different taxonomies. These comprise exact match, close match, narrower match, broader match, and related match. Exact match and close match are used to map existing taxonomies together, often so that one is used in the tagging and the other is used in the retrieval. The other mapping relations may be used to extend one taxonomy with another while still maintaining a distinction between the two.
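As a sketch of the tagging-versus-retrieval use of mappings, the following translates tags from one taxonomy into another using only the exact and close matches. The two vocabularies ("a:" and "b:") and their concepts are made up; the property names are the SKOS machine-readable forms.

```python
# (concept in taxonomy A, mapping property, concept in taxonomy B)
mappings = [
    ("a:Automobiles", "exactMatch", "b:Cars"),
    ("a:Laptops", "closeMatch", "b:Portable computers"),
    ("a:Tablets", "broadMatch", "b:Mobile devices"),
]

# Treating only these two properties as equivalences is an assumption of this sketch.
EQUIVALENT = {"exactMatch", "closeMatch"}


def translate_tags(tags, mapping_list):
    """Convert tags applied with taxonomy A into taxonomy B concepts for retrieval."""
    lookup = {src: dst for src, prop, dst in mapping_list if prop in EQUIVALENT}
    return [lookup[tag] for tag in tags if tag in lookup]


print(translate_tags(["a:Automobiles", "a:Tablets"], mappings))
```

The broadMatch link is left out of the translation on purpose, since it extends one taxonomy with another rather than declaring two concepts equivalent.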

Following is a table of SKOS elements by type (class or property) with the concatenated machine-readable forms.
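The same information can be sketched as a Python mapping from human-readable names to the concatenated SKOS identifiers. The identifiers are the standard SKOS ones; the spelled-out names on the left are my own rendering and may differ from what a given tool displays.

```python
SKOS_ELEMENTS = {
    "class": {
        "Concept": "Concept",
        "Concept Scheme": "ConceptScheme",
        "Collection": "Collection",
        "Ordered Collection": "OrderedCollection",
    },
    "property": {
        "Preferred Label": "prefLabel",
        "Alternative Label": "altLabel",
        "Hidden Label": "hiddenLabel",
        "Notation": "notation",
        "Note": "note",
        "Scope Note": "scopeNote",
        "Definition": "definition",
        "Example": "example",
        "History Note": "historyNote",
        "Editorial Note": "editorialNote",
        "Change Note": "changeNote",
        "In Scheme": "inScheme",
        "Has Top Concept": "hasTopConcept",
        "Top Concept Of": "topConceptOf",
        "Broader": "broader",
        "Narrower": "narrower",
        "Related": "related",
        "Broader Transitive": "broaderTransitive",
        "Narrower Transitive": "narrowerTransitive",
        "Member": "member",
        "Member List": "memberList",
        "Exact Match": "exactMatch",
        "Close Match": "closeMatch",
        "Broader Match": "broadMatch",
        "Narrower Match": "narrowMatch",
        "Related Match": "relatedMatch",
    },
}

# Print one row per element: type, readable name, machine-readable form.
for kind, elements in SKOS_ELEMENTS.items():
    for name, identifier in elements.items():
        print(f"{kind:<9} {name:<20} skos:{identifier}")
```

Note that classes start with an uppercase letter and properties with a lowercase letter, and that the broader and narrower mapping properties are shortened to broadMatch and narrowMatch.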

Implementation of SKOS

Most commercial and open-source taxonomy/thesaurus management software now supports SKOS. There are also simple free tools called SKOS editors. In these tools, SKOS elements are presented with their full human-readable names (such as Preferred Label, instead of prefLabel), so they are intuitive to understand. Thus, taxonomists don’t have to worry about SKOS, but should at least be familiar with its principles. Familiarity with SKOS makes it easier to switch from using one software package to another. Software packages may vary, however, in how well they support some of the less common features, such as in scheme, collections, and broader/narrower transitive.

Taxonomy/thesaurus management software often has the additional administrative grouping of related concept schemes for the same implementation into what may be called a “project” or “knowledge model.” SKOS mapping relations tend to be used more often across concept schemes that are managed in different projects, rather than within the same project. Within the same project, concept schemes tend to represent facets (which have no relations between them) or ontology classes (which have customized semantic relations between them).

Since all elements of SKOS are standard and machine-readable, you can leverage any element with rules for usage, such as for how tagging should be done and how concepts and relationships are displayed. Custom applications of SKOS vocabularies are thus common.

If you want to dive into all the details of SKOS, consult these resources from the W3C:

SKOS is intended to be flexible, and it is more suggestive than restrictive. Thus, a SKOS-based taxonomy or thesaurus could still be poorly designed, and that’s why the other standards for best practices, ANSI/NISO Z39.19 and ISO 25964, are also important.

Tuesday, May 31, 2022

A Taxonomist Community

Taxonomists and others whose work involves taxonomies have not been a unified professional community. Taxonomy development work is interdisciplinary, spanning different specializations, and different organizational functions, including the following:

Information services taxonomies and thesauri, developed by those with a background in library/information science, thesauri, and cataloging, and possibly indexing
Product/ecommerce taxonomies, which may be developed by those with varied backgrounds but experience in retail and product information management
Digital asset management taxonomies and metadata, developed by digital asset managers and others, who might have a background in image and media curation and management
Website taxonomies developed by information architects with a focus on the user experience
Enterprise taxonomies developed by those who are primarily knowledge managers but have also learned about taxonomies
Taxonomies for auto-categorization of large volumes of text, developed by those with expertise in natural language processing, machine learning, and other text analytics technologies
Taxonomies, as controlled vocabularies, to support metadata and master data management, developed by metadata architects, data managers, and possibly data scientists
Taxonomies in support of knowledge graphs, integrated with ontologies, developed by ontologists and other experts in semantic technologies

Thus, people who work with taxonomies, accidental taxonomists and others, associate themselves with different professions and belong to different groups or professional organizations. These include

Special Libraries Association (SLA) and its Taxonomies Community. The next annual SLA conference will be held July 31 – August 2 in Charlotte, NC.
International Society for Knowledge Organization (ISKO) and its various country/region chapters, such as ISKO UK. The 17th international ISKO conference will be held at Aalborg University, Denmark, July 6-8, 2022.
AIIM (Association for Information and Image Management International)
American Society for Indexing and its Taxonomies & Controlled Vocabularies Special Interest Group
CILIP (Chartered Institute of Library and Information Professionals) (UK) and its Knowledge and Information Management Group
Association for Information Science & Technology (ASIS&T)
Knowledge Graph Conference (KGC) and its associated community, which has an ongoing Slack space for discussion that is open to all

For information architects, the Information Architecture Institute dissolved in 2019 after 17 years, and until now, information architects have temporarily been gathering on Discord servers associated with the virtual IAConference and the World IA Day conference, but these have been relatively inactive at other times of the year.

Discussion Groups

Taxonomists are thus dispersed among these groups and more. It does not make sense to create a new professional membership association for taxonomists, especially at this time when traditional professional membership associations are experiencing declining membership.

Thus, online discussion groups that do not require a paid professional association membership are a better option. The first taxonomy group, Taxonomy Community of Practice, was started as a Yahoo group in 2004. It became quite popular, with over 1000 members posting questions and suggestions about taxonomies. However, Yahoo groups declined, and LinkedIn groups grew, so this group was migrated over to a LinkedIn group, later renamed Taxonomy and Ontology Community of Practice. The problem is that this group, like most LinkedIn groups, is less of a community of practice and more of an announcement forum. People are reluctant to post basic questions, as it might indicate that they are not sufficiently knowledgeable. Another former Yahoo group, which migrated to Groups.io, is Controlled Vocabulary, which is focused on activities of metadata and controlled vocabulary development and tagging of digital assets, mostly images.

Communities Discussed at Conferences

The need for a community of practitioners, whether of taxonomists or of related specialties, is something that has been raised at conferences.

At the most recent Information Architecture Conference (IAC), in April 2022, the co-presidents for World IA Day, Grace Lau and Andrea Rosenbusch, gave a talk, “(Re)Architecting a Community,” discussing their hopes and plans to transform World IA Day from merely a single-day annual event into a community.

At the most recent Knowledge Graph Conference (KGC), Katariina Kari led a brainstorming workshop “Building Ontologies and Knowledge Graphs,” as “a working group for publicly sharing best practices, stories, and the particularities of our craft of building ontologies and knowledge graphs,” seeking a “soundboard for ideas” that others could participate in.

The upcoming SLA conference will have a live panel session “Communities of Practice: Where Everybody Knows Your Name” on August 2, 2022, in which I will be one of the panel speakers.

A new Taxonomist Community: Taxonomy Talk

Out of conversations and research conducted by Grace Lau in the lead-up to her IAC talk on an information architecture community, Grace and I discussed in January the idea of an additional, dedicated taxonomist community. I then invited another taxonomist, Bob Kasenchek, to come up with ideas, including what to use for a free platform. Slack, as used by KGC, was dismissed, since the free version has limited data storage, and old messages get deleted. So, we decided to adopt Discord, as it had been used by the virtual IAC in 2021 and 2022. The community was launched on April 12 and quickly gained sufficient members that they could contribute ideas and be polled for a name. On May 1 it was named Taxonomy Talk. A charter and mission are still in the works. There are several moderators, including Grace, Bob, and myself.

As of this writing Taxonomy Talk has just over 300 members. It has a number of dedicated subject “channels,” some of which are:

New-to-taxonomy
Looking-for-help
Conferences-events
Jobs-and-opportunities
Tools-for-thought
Learning-resources
Reference-resources
Best-practice
Ontologies
Standards
Vocabularies

New channels are created as requested, and we might decide to retire or merge low-use ones.

Discord supports features such as direct one-on-one chats and one-on-one or group video meetings. There are still features I have yet to learn.

So, if you are not yet in the Taxonomy Talk Community and want to join:

https://discord.com/invite/3qyMVYCAsw
(Please use your real name to promote networking. Some existing Discord users are continuing to use their Discord nicknames.)

Saturday, April 30, 2022

Polyhierarchy in Taxonomies

A defining characteristic of taxonomies is that terms/concepts are arranged in broader-narrower hierarchies, which may resemble tree structures. A limited number of top concepts each have narrower concepts, which in turn may have narrower concepts, etc., and the narrowest concepts at the bottom of the hierarchy are sometimes referred to as leaf nodes, as “leaf” extends the metaphor of “tree.” The tree model has its limits, though, because taxonomies may also have occasional cases of “polyhierarchy,” whereby a concept may have two or more broader concepts, instead of just one.

People who are new to taxonomies, however, might not consider polyhierarchies, because they tend to think of taxonomies as classification systems. Hierarchical information taxonomies have their origin in classification systems, such as the Linnaean taxonomy of organisms, library classification systems, and industry classification systems. Classification systems, however, do not allow polyhierarchy within the system. Originally, classification systems were for physical things, such as books, which can belong in only one place, so there could be no polyhierarchy. Standard classification systems, such as industry classification systems, were developed by governmental, international, or nongovernmental organizations with a primary purpose of gathering and organizing statistical data about classes, and thus polyhierarchy is not permitted, as it would lead to double-counting of members of a class.

The primary purpose of hierarchy in a taxonomy is to provide guided browsing of topics to end-users, who may start out looking at broad categories and then drill down to find the narrowest concept of interest. Thus, polyhierarchy serves the same purpose. The idea is that different people will start at different points at the top of the hierarchy to arrive at the same concept of interest, which is tagged to the same content set. A polyhierarchy should be implemented if the concept’s relationship is correctly and inherently hierarchical in both of its cases. An example of a polyhierarchy is Educational software, which has both Software and Educational products as broader concepts. Educational software is a kind of software, fully included within Software, and Educational software is a kind of educational product, fully included within Educational products.
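The Educational software example can be sketched in a few lines of Python to show how a polyhierarchy gives users two browsing routes to the same concept. The dictionary structure and function name are illustrative assumptions.

```python
# Each concept lists its direct broader concepts.
broader = {
    "Educational software": ["Software", "Educational products"],
    "Software": [],
    "Educational products": [],
}


def paths_to_top(concept, broader_map):
    """Return every browsing path from a top concept down to the given concept."""
    parents = broader_map.get(concept, [])
    if not parents:
        return [[concept]]
    paths = []
    for parent in parents:
        for path in paths_to_top(parent, broader_map):
            paths.append(path + [concept])
    return paths


for path in paths_to_top("Educational software", broader):
    print(" > ".join(path))
```

Both paths end at the same concept, so the content tagged with it is the same no matter where the user started.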

Taxonomy standards and polyhierarchy issues

Taxonomy/thesaurus standards (ANSI/NISO Z39.19 and ISO 25964) describe three kinds of hierarchical relationships (generic-specific, generic-instance, and whole-part), and polyhierarchy may exist within any of these types. Polyhierarchy that combines different hierarchical types, however, can be problematic, so it is best to avoid mixing hierarchical relationship types. For example, the following polyhierarchy mixes different types:

Washington, DC

Broader: United States (whole-part)

Broader: Capital cities (generic-instance)

The reason to avoid creating a mixed-type polyhierarchy is simply that the browsable hierarchy user experience can become compromised and potentially confusing. Extensive hierarchies with large numbers of narrower concept relationships would result. A hierarchical taxonomy tree should be designed with a dominant hierarchy design. An exception is a thesaurus, which is not designed so much for top-down browsing but for browsing from term to term. Mixing hierarchical types within a thesaurus is thus acceptable.
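If hierarchical relationships are recorded with their type, mixed-type polyhierarchies can be found with a simple check. This is a sketch under that assumption; many tools do not store the hierarchical type at all, and the data layout here is invented.

```python
# (narrower concept, broader concept, hierarchical relationship type)
broader_links = [
    ("Washington, DC", "United States", "whole-part"),
    ("Washington, DC", "Capital cities", "generic-instance"),
    ("Educational software", "Software", "generic-specific"),
    ("Educational software", "Educational products", "generic-specific"),
]


def mixed_type_polyhierarchies(links):
    """Return concepts whose broader links use more than one hierarchical type."""
    types_by_concept = {}
    for narrower, _broader, rel_type in links:
        types_by_concept.setdefault(narrower, set()).add(rel_type)
    return sorted(c for c, types in types_by_concept.items() if len(types) > 1)


print(mixed_type_polyhierarchies(broader_links))
```

Educational software is a polyhierarchy too, but both of its links are generic-specific, so it is not flagged.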

It is also recommended to avoid creating hierarchical relationships across different facets in a faceted taxonomy. This is because facets are designed to be mutually exclusive, so that concepts from multiple facets can be used in combination to limit/filter/refine a search. As such, facets are designed to be distinct aspects. There could be an occasional exception of polyhierarchy, though, but more than 2-3 polyhierarchies across an entire faceted taxonomy should be a cause for review.

With the wider adoption of the SKOS (Simple Knowledge Organization System) model for taxonomies and in taxonomy management systems, taxonomies are more commonly organized into concept schemes. A concept scheme can be represented as a facet in a faceted taxonomy, but it is not limited to use as a facet. Utilizing concept schemes, it makes sense to have separate concept schemes with different hierarchical types, some for generic-specific (for types, categories, topics), one or more for whole-part (geography, organizational structures), and some containing lists of instances (named entities). In this model, Washington, DC, would be narrower only to the United States in the whole-part hierarchical concept scheme for geographic places. It could also be linked to Capital cities, which is in a different concept scheme for place types, with a different kind of relationship (“related” or perhaps a semantic relationship from an ontology).

Although SKOS permits hierarchical relationships across different concept schemes, it is best practice not to do this but rather to create hierarchical relationships and polyhierarchies confined within a concept scheme, just as it is recommended not to have polyhierarchy across facets.
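A minimal sketch of a check for this practice, using the Washington, DC, example: it lists any broader or narrower relationship whose two concepts sit in different concept schemes. The scheme names and data layout are assumptions for illustration.

```python
# Which concept scheme each concept belongs to (hypothetical scheme names).
in_scheme = {
    "Washington, DC": "Geographic places",
    "United States": "Geographic places",
    "Capital cities": "Place types",
}

# (subject, relation, object)
relations = [
    ("Washington, DC", "broader", "United States"),
    ("Washington, DC", "related", "Capital cities"),
]

HIERARCHICAL = {"broader", "narrower"}


def cross_scheme_hierarchy(relation_list, scheme_of):
    """Return hierarchical links whose two concepts are in different schemes."""
    return [
        (s, p, o)
        for s, p, o in relation_list
        if p in HIERARCHICAL and scheme_of[s] != scheme_of[o]
    ]


print(cross_scheme_hierarchy(relations, in_scheme))
```

With the relationships modeled as described, the check finds nothing to report; changing the link to Capital cities from related to broader would cause it to be listed.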

Additional polyhierarchy considerations

Polyhierarchy concerns concepts in the taxonomy, and it is not about objects, items, or assets that get tagged with taxonomy concepts, such as an individual publication, document, image, product record, etc. Each of these may get tagged with multiple taxonomy concepts, and as such may have multiple “classifications” and thus can appear as if they are in a polyhierarchy, if a front-end application displays tagged items as if they are leaf nodes in a taxonomy.

A polyhierarchy usually involves only two broader concepts, not more. Having more than two broader concepts is extremely rare. If you find yourself creating polyhierarchies of three or more multiple times in a taxonomy, check to make sure you are not doing something wrong with the hierarchy design.
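One way to run that check is to count the broader concepts of each concept and list those that exceed two. The sketch below assumes the taxonomy can be exported as narrower/broader pairs; the sample concepts and the function name are made up.

```python
from collections import Counter

# (narrower concept, broader concept) pairs from a hypothetical export.
broader_pairs = [
    ("Educational software", "Software"),
    ("Educational software", "Educational products"),
    ("Tablets", "Computers"),
    ("Tablets", "Mobile devices"),
    ("Tablets", "Consumer electronics"),
]


def concepts_over_limit(pairs, limit=2):
    """Return concepts that have more broader concepts than the limit."""
    counts = Counter(narrower for narrower, _broader in pairs)
    return sorted(c for c, n in counts.items() if n > limit)


print(concepts_over_limit(broader_pairs))
```

A single result may be legitimate; several results suggest the hierarchy design deserves a second look.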

Some content management systems that have built-in taxonomy management and tagging features do not support polyhierarchy. The best known is SharePoint, with taxonomies managed in its Term Store feature. Taxonomy terms may be “reused” across Term Sets, but reuse is not permitted within a Term Set, where polyhierarchy is most suitable. See my past post, Polyhierarchy in the SharePoint Term Store, for more details.