Sunday, June 26, 2022

SKOS Taxonomies

Over the 26 years that I have been involved in controlled vocabularies, thesauri, and taxonomies, the biggest change I have seen in the field is the adoption of SKOS (Simple Knowledge Organization System) as a schema model and standard.

If you are creating taxonomies exclusively within a single system (such as SharePoint Term Store or controlled tags or categories of a content management system, document management system, DAM, etc.), then you probably have not paid much attention to SKOS. It’s true that taxonomies created and used within a single system do not have to follow an external standard. But that is no longer the trend in information management and technology. Connectivity, interoperability, data sharing and reuse, data-centric architecture, vendor-neutral formats, linked data and linked open data, breaking down data silos, enterprise-wide knowledge, and enterprise knowledge graphs have become the preferred trends and directions.

With respect to standards, there exist two basic kinds: (1) standards for design, functionality, and a consistent user experience, and (2) standards for compatibility, interoperability, and machine-readability. For this reason, there are two separate sets of standards for taxonomies and other knowledge organization systems. Another way to think of it is that there are standards for each of the front end (user interface and experience) and the back end (computer-readable code) of taxonomies, and they are somewhat independent yet still compatible with each other.

For taxonomies and thesauri, more has been written about the front-end design and best practice standards than the back-end interoperability standards. This is for several reasons. The design and best practices standards (ANSI/NISO Z39.19 and ISO 25964 and its predecessors ISO 2788 and ISO 5964) have been around longer. They are lengthier and more detailed than interoperability standards, and they apply to taxonomies and thesauri regardless of their digital or nondigital format. So, this article will focus instead on the back-end, interoperability standard, which is SKOS.

SKOS Background

SKOS is a recommendation for "a common data model for sharing and linking knowledge organization systems via the Semantic Web". These knowledge organization systems include thesauri (as defined by the ANSI/NISO and ISO thesaurus standards), taxonomies, classification schemes, subject heading systems, and other controlled vocabularies. SKOS is based on RDF (Resource Description Framework), a World Wide Web Consortium (W3C) standard for description and exchange of graph data. RDF specifies that all statements consist of subject-predicate-object triples, and all resources have URIs (uniform resource identifiers).
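
To make the triple structure concrete, here is a minimal Python sketch of one concept described as subject-predicate-object triples. The SKOS namespace URI is the one published by the W3C; the `http://example.org/taxonomy/` base and the concept names are hypothetical placeholders chosen only for illustration.

```python
# SKOS namespace as published by the W3C.
SKOS = "http://www.w3.org/2004/02/skos/core#"
# Hypothetical base URI for an example taxonomy (an assumption for illustration).
EX = "http://example.org/taxonomy/"

# Each statement is a (subject, predicate, object) triple.
# Subjects and predicates are URIs; objects are URIs or literal values.
triples = [
    (EX + "software", SKOS + "prefLabel", "Software"),
    (EX + "educational-software", SKOS + "prefLabel", "Educational software"),
    (EX + "educational-software", SKOS + "broader", EX + "software"),
]

def objects_of(subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

print(objects_of(EX + "educational-software", SKOS + "broader"))
```

A real implementation would normally use an RDF library and a triple store rather than a plain list, but the shape of the data is the same.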

The development of SKOS aimed to build upon RDF to provide a recommended schema for thesauri. SKOS development was first undertaken as part of the Semantic Web Advanced Development for Europe (SWAD-Europe) project before being adopted and supported by the W3C in 2004. The W3C formally released the SKOS recommendation in 2009.

Meanwhile, the W3C had been working on other recommendations for web-based ontologies, including RDF Schema (RDFS) and Web Ontology Language (OWL). SKOS is compatible with RDFS and OWL, and elements from the different models can be combined. Furthermore, SKOS can even be considered as a very generic upper ontology itself, and the W3C documentation describes SKOS in terms of OWL and RDFS expressions.

The main types of elements of SKOS are concepts, concept schemes, lexical labels, documentation properties (notes), semantic relationships, mapping properties, and concept collections. (Concepts, concept schemes, and collections are ontology classes, and the others are ontology properties.) In their machine-readable form, the SKOS elements are concatenated with no spaces, such as prefLabel, scopeNote, and exactMatch.

SKOS Concepts, Labels, and Notes

SKOS is concept-centric. Making a distinction between concepts and labels is the biggest departure from traditional thesaurus standards and past controlled vocabulary practice. A concept is an idea of something, and a label is a name for that idea. Thus, a concept may have multiple labels. For the organization of a vocabulary, especially as a hierarchy, one of the various labels needs to be designated as the preferred displayed label. The others are alternative labels or hidden labels; a hidden label can still be matched in search but should not display to end-users. Labels for the same concept may exist in multiple languages, but there may be only one preferred label per language.
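
The concept-versus-label distinction can be sketched as a small data structure. This is only an illustration under assumptions: the `Concept` class, its method names, and the example labels are hypothetical and are not part of SKOS or of any particular tool. The one rule it enforces is the one stated above, a single preferred label per language.

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    """A concept with SKOS-style lexical labels, keyed by language tag."""
    uri: str
    pref_labels: dict = field(default_factory=dict)    # language -> single label
    alt_labels: dict = field(default_factory=dict)     # language -> set of labels
    hidden_labels: dict = field(default_factory=dict)  # language -> set of labels

    def set_pref_label(self, label, lang):
        # Only one preferred label is allowed per language.
        if lang in self.pref_labels:
            raise ValueError(f"{self.uri} already has a preferred label for '{lang}'")
        self.pref_labels[lang] = label

    def add_alt_label(self, label, lang):
        self.alt_labels.setdefault(lang, set()).add(label)

    def add_hidden_label(self, label, lang):
        # Hidden labels support search matching but are not displayed.
        self.hidden_labels.setdefault(lang, set()).add(label)

    def searchable_labels(self, lang):
        """All labels in one language that a search index could match against."""
        labels = set()
        if lang in self.pref_labels:
            labels.add(self.pref_labels[lang])
        labels |= self.alt_labels.get(lang, set())
        labels |= self.hidden_labels.get(lang, set())
        return labels


concept = Concept("http://example.org/taxonomy/educational-software")
concept.set_pref_label("Educational software", "en")
concept.set_pref_label("Lernsoftware", "de")
concept.add_alt_label("Learning software", "en")
concept.add_hidden_label("Eductional software", "en")  # a misspelling kept for search only
```

Calling `concept.set_pref_label("Courseware", "en")` at this point would raise a `ValueError`, because English already has a preferred label.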

Notation is intended for a code, such as an alphanumeric classification code, that accompanies a label to identify a concept, as is common in classification schemes.

Documentation comprises various types of notes, including scope note, editorial note, change note, and history note. Definition and example are additional documentation types. Scope notes are commonly used in thesauri to clarify the usage of a concept in tagging/indexing for the specific context of the controlled vocabulary and its set of content. They serve an important role for manual tagging. Other note types may be utilized for administration and management of the controlled vocabulary. Definitions may be entered for more technical controlled vocabularies or when the controlled vocabulary also serves the function of a glossary.

SKOS Concept Schemes and Collections

What constitutes an individual "taxonomy," "thesaurus" or other controlled vocabulary? This may not be very clear. SKOS introduces the formal organizing unit called a concept scheme, as a “collection of concepts.” A concept scheme is a single controlled vocabulary, thesaurus, hierarchical taxonomy, facet within a faceted taxonomy, or metadata property within a larger metadata schema.

There are some advanced, lesser-used features of SKOS, including in scheme, which allows you to control whether a concept is in a concept scheme regardless of whether it’s within the concept scheme’s hierarchy (which is otherwise the default). There is also a special designation of top concept for the top concepts of a concept scheme, a designation which could be utilized for a front-end display implementation.

Collections are an additional optional way to designate a grouping of concepts for a purpose, such as the taxonomy concepts to be used in only specified implementations or those of subject categories for subject matter expert review. Furthermore, concepts can be ordered within collections.

SKOS Relations and Mapping Properties

SKOS includes what are called semantic relations, although this name could cause confusion, since they are the basic thesaurus relationships (broader, narrower, and related), not customizable semantic relations characteristic of ontologies. These thesaural-type relationships are used between concepts within the same concept scheme. In addition, SKOS specifies broader transitive and narrower transitive, meaning the inheritance of the relationship to additional levels of the hierarchy. This is usually assumed to be the case by default, and thus these specifically transitive relations are rarely implemented, but if there are reasons not to inherit and extend the logical hierarchy by default, then the transitive relations may be used. (I have not come across a use case, though.)
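
One way to read broader transitive is as the set of all ancestors reachable by following a chain of direct broader links. The sketch below computes that set; the `broader` dictionary and its concept names are hypothetical example data, and the function is an illustration rather than how any particular tool implements it.

```python
# Direct broader links, keyed by concept (hypothetical example data).
broader = {
    "Spreadsheet software": {"Business software"},
    "Business software": {"Software"},
    "Software": set(),
}

def broader_transitive(concept):
    """Return every ancestor reachable by following broader links upward."""
    ancestors = set()
    stack = list(broader.get(concept, ()))
    while stack:
        current = stack.pop()
        if current not in ancestors:
            ancestors.add(current)
            stack.extend(broader.get(current, ()))
    return ancestors

print(sorted(broader_transitive("Spreadsheet software")))
# → ['Business software', 'Software']
```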

Since SKOS specifies concept schemes, SKOS also specifies an additional set of relation types called mapping properties that are to be used between concepts in different concept schemes or different taxonomies.  These comprise exact match, close match, narrower match, broader match, and related match. Exact match and close match are used to map existing taxonomies together, often so that one is used in the tagging and the other is used in the retrieval. The other mapping relations may be used to extend one taxonomy with another while still maintaining a distinction between the two.
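
As a rough illustration of tagging with one taxonomy and retrieving with another, the sketch below translates tags through mapping properties. The two vocabularies, their concept names, and the `translate_tags` helper are all hypothetical; only the property names (exactMatch, closeMatch, broadMatch, relatedMatch) come from SKOS.

```python
# Hypothetical mappings from a tagging vocabulary to a retrieval vocabulary.
# Each entry is (source concept, mapping property, target concept).
mappings = [
    ("tagging:Automobiles", "exactMatch", "retrieval:Cars"),
    ("tagging:Minivans", "broadMatch", "retrieval:Cars"),
    ("tagging:Car dealers", "relatedMatch", "retrieval:Cars"),
]

def translate_tags(tags, allowed=("exactMatch", "closeMatch")):
    """Map tagging concepts to retrieval concepts via the allowed properties."""
    return {
        target
        for source, prop, target in mappings
        if source in tags and prop in allowed
    }

print(translate_tags({"tagging:Automobiles", "tagging:Car dealers"}))
# → {'retrieval:Cars'}
```

Restricting the default to exact and close matches reflects the usage described above; passing a wider `allowed` tuple would also follow the looser mapping relations.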

Following is a table of SKOS elements by type (class or property) with the concatenated machine-readable forms.
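
The elements named in this post can also be listed by type in code form. The selection and the human-readable names on the left are my own illustrative choices, not a complete rendering of the SKOS vocabulary; the machine-readable names on the right and the namespace URI are from the W3C SKOS recommendation. The `skos_uri` helper is hypothetical.

```python
# SKOS elements named in this post, grouped by type.
# Keys are human-readable names; values are the machine-readable local names.
SKOS_ELEMENTS = {
    "class": {
        "Concept": "Concept",
        "Concept Scheme": "ConceptScheme",
        "Collection": "Collection",
        "Ordered Collection": "OrderedCollection",
    },
    "property": {
        "Preferred Label": "prefLabel",
        "Alternative Label": "altLabel",
        "Hidden Label": "hiddenLabel",
        "Notation": "notation",
        "Scope Note": "scopeNote",
        "Editorial Note": "editorialNote",
        "Change Note": "changeNote",
        "History Note": "historyNote",
        "Definition": "definition",
        "Example": "example",
        "In Scheme": "inScheme",
        "Has Top Concept": "hasTopConcept",
        "Broader": "broader",
        "Narrower": "narrower",
        "Related": "related",
        "Broader Transitive": "broaderTransitive",
        "Narrower Transitive": "narrowerTransitive",
        "Exact Match": "exactMatch",
        "Close Match": "closeMatch",
        "Broader Match": "broadMatch",
        "Narrower Match": "narrowMatch",
        "Related Match": "relatedMatch",
    },
}

SKOS_NAMESPACE = "http://www.w3.org/2004/02/skos/core#"

def skos_uri(human_name):
    """Look up a human-readable name and return its full SKOS URI."""
    for names in SKOS_ELEMENTS.values():
        if human_name in names:
            return SKOS_NAMESPACE + names[human_name]
    raise KeyError(human_name)
```

Note that class names start with an uppercase letter and property names with a lowercase letter, and that the match properties use the shortened forms broadMatch and narrowMatch.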

Implementation of SKOS

Most commercial and open-source taxonomy/thesaurus management software now supports SKOS. There are also simple free tools called SKOS editors. In these tools, SKOS elements are presented under their full human-readable names (such as Preferred Label, instead of prefLabel), so they are intuitive to understand. Thus, taxonomists don’t have to worry about SKOS, but should at least be familiar with its principles. Familiarity with SKOS makes it easier to switch from using one software package to another. Software packages vary, however, in how well they support some of the less common features, such as in scheme, collections, and broader/narrower transitive.

Taxonomy/thesaurus management software often has the additional administrative grouping of related concept schemes for the same implementation into what may be called a “project” or “knowledge model.” SKOS mapping relations tend to be used more often across concept schemes that are managed in different projects, rather than within the same project. Within the same project, concept schemes tend to represent facets (which have no relations between them) or ontology classes (which have customized semantic relations between them).

Since all elements of SKOS are standardized and machine-readable, you can leverage any element with rules for usage, such as for how tagging should be done and how concepts and relationships are displayed. Custom applications of SKOS vocabularies are thus common.

If you want to dive into all the details of SKOS, consult these resources from the W3C:

SKOS is intended to be flexible, and it is more suggestive than restrictive. Thus, a SKOS-based taxonomy or thesaurus could still be poorly designed, and that’s why the other standards for best practices, ANSI/NISO Z39.19 and ISO 25964, are also important.

Tuesday, May 31, 2022

A Taxonomist Community

Taxonomists and others whose work involves taxonomies have not been a unified professional community. Taxonomy development work is interdisciplinary, spanning different specializations, and different organizational functions, including the following:

  • Information services taxonomies and thesauri, developed by those with a background in library/information science, thesauri, and cataloging, and possibly indexing
  • Product/ecommerce taxonomies, which may be developed by those with varied backgrounds but experience in retail and product information management
  • Digital asset management taxonomies and metadata, developed by digital asset managers and others, who might have a background in image and media curation and management
  • Website taxonomies developed by information architects with a focus on the user experience
  • Enterprise taxonomies developed by those who are primarily knowledge managers but have also learned about taxonomies
  • Taxonomies for auto-categorization of large volumes of text, developed by those with expertise in natural language processing, machine learning, and other text analytics technologies
  • Taxonomies, as controlled vocabularies, to support metadata and master data management, developed by metadata architects, data managers, and possibly data scientists
  • Taxonomies in support of knowledge graphs, integrated with ontologies, developed by ontologists and other experts in semantic technologies

Thus, people who work with taxonomies, accidental taxonomists and others, associate themselves with different professions and belong to different groups or professional organizations. These include 

For information architects, the Information Architecture Institute dissolved in 2019 after 17 years, and until now, information architects have temporarily been gathering on Discord servers associated with the virtual IAConference and the World IA Day conference, but these have been relatively inactive at other times of the year. 

Discussion Groups

Taxonomists are thus dispersed among these groups and more. It does not make sense to create a new professional membership association for taxonomists, especially at this time when traditional professional membership associations are experiencing declining membership.

Thus, online discussion groups that do not require a paid professional association membership are a better option. The first taxonomy group, Taxonomy Community of Practice, was started as a Yahoo group in 2004. It became quite popular, with over 1000 members posting questions and suggestions about taxonomies. However, Yahoo groups declined, and LinkedIn groups grew, so this group was migrated over to a LinkedIn group, later renamed Taxonomy and Ontology Community of Practice. The problem is that this group, like most LinkedIn groups, is less a community of practice and more an announcement forum. People are reluctant to post basic questions, as doing so might suggest that they are not sufficiently knowledgeable. Another former Yahoo group, which migrated to Groups.io, is Controlled Vocabulary, which focuses on metadata and controlled vocabulary development and the tagging of digital assets, mostly images.

Communities Discussed at Conferences

The need for a community of practitioners, whether of taxonomists or of related specialties, is something that has been raised at conferences.

At the most recent Information Architecture Conference (IAC), in April 2022, the co-presidents for World IA Day, Grace Lau and Andrea Rosenbusch, gave a talk, “(Re)Architecting a Community,” discussing their hopes and plans to transform World IA Day from merely a single-day annual event into a community.

At the most recent Knowledge Graph Conference (KGC), Katariina Kari led a brainstorming workshop, “Building Ontologies and Knowledge Graphs,” as “a working group for publicly sharing best practices, stories, and the particularities of our craft of building ontologies and knowledge graphs,” seeking a “soundboard for ideas” that others could participate in.

The upcoming SLA conference will have a live panel session “Communities of Practice: Where Everybody Knows Your Name” on August 2, 2022, in which I will be one of the panel speakers.

A New Taxonomist Community: Taxonomy Talk

Out of conversations and research conducted by Grace Lau in the lead-up to her IAC talk on an information architecture community, Grace and I discussed in January the idea of an additional, dedicated taxonomist community. I then invited another taxonomist, Bob Kasenchek, to come up with ideas, including what to use for a free platform. Slack, as used by KGC, was dismissed, since the free version has limited data storage, and old messages get deleted. So, we decided to adopt Discord, as it had been used by the virtual IAC in 2021 and 2022. The community was launched on April 12 and quickly gained enough members that they could contribute ideas and be polled for a name. On May 1 it was named Taxonomy Talk. A charter and mission are still in the works. There are several moderators, including Grace, Bob, and myself.

As of this writing Taxonomy Talk has just over 300 members. It has a number of dedicated subject “channels,” some of which are:

  • New-to-taxonomy
  • Looking-for-help
  • Conferences-events
  • Jobs-and-opportunities
  • Tools-for-thought
  • Learning-resources
  • Reference-resources
  • Best-practice
  • Ontologies
  • Standards
  • Vocabularies

New channels are created as requested, and we might decide to retire or merge low-use ones.

Discord supports features such as direct one-on-one chats and one-on-one or group video meetings. There are still features I have yet to learn.

So, if you are not yet in the Taxonomy Talk Community and want to join:

https://discord.com/invite/3qyMVYCAsw
(Please use your real name to promote networking. Some existing Discord users are continuing to use their Discord nicknames.)

Saturday, April 30, 2022

Polyhierarchy in Taxonomies

A defining characteristic of taxonomies is that terms/concepts are arranged in broader-narrower hierarchies, which may resemble tree structures. A limited number of top concepts each have narrower concepts, which in turn may have narrower concepts, etc., and the narrowest concepts at the bottom of the hierarchy are sometimes referred to as leaf nodes, as “leaf” extends the metaphor of “tree.” The tree model has its limits, though, because taxonomies may also have occasional cases of “polyhierarchy,” whereby a concept may have two or more broader concepts.

 

People who are new to taxonomies, however, might not consider polyhierarchies, because they tend to think of taxonomies as classification systems. Hierarchical information taxonomies have their origin in classification systems, such as the Linnean taxonomy of organisms, library classification systems, and industry classification systems. Classification systems, however, do not allow polyhierarchy within the system. Originally, classification systems were for physical things, such as books, which can belong in only one place, so there could be no polyhierarchy. Standard classification systems, such as industry classification systems, were developed by governmental, international, or nongovernmental organizations with a primary purpose of gathering and organizing statistical data about classes, and thus polyhierarchy is not permitted, as it would lead to double-counting of members of a class.

 

The primary purpose of hierarchy in a taxonomy is to provide guided browsing of topics to end-users, who may start out looking at broad categories and then drill down to find the narrowest concept of interest. Thus, polyhierarchy serves the same purpose. The idea is that different people will start at different points at the top of the hierarchy to arrive at the same concept of interest, which is tagged to the same content set. A polyhierarchy should be implemented if the concept’s relationship is correctly and inherently hierarchical in both of its cases. An example of a polyhierarchy is Educational software, which has both Software and Educational products as broader concepts. Educational software is a kind of software, fully included within Software, and Educational software is a kind of educational product, fully included within Educational products.
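
The idea that different users start from different top concepts and arrive at the same concept can be sketched in a few lines. The `broader` mapping below holds only the example from the paragraph above, and `paths_to_top` is a hypothetical helper, not a feature of any particular taxonomy tool.

```python
# Broader concepts for each concept, using the example from the text.
broader = {
    "Educational software": ["Software", "Educational products"],
    "Software": [],
    "Educational products": [],
}

def paths_to_top(concept):
    """Return every path from a top concept down to the given concept."""
    parents = broader.get(concept, [])
    if not parents:
        return [[concept]]
    paths = []
    for parent in parents:
        for path in paths_to_top(parent):
            paths.append(path + [concept])
    return paths

for path in paths_to_top("Educational software"):
    print(" > ".join(path))
# → Software > Educational software
# → Educational products > Educational software
```

Each printed line is one browsing route; a concept in a polyhierarchy simply has more than one.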

 



 

Taxonomy standards and polyhierarchy issues

 

Taxonomy/thesaurus standards (ANSI/NISO Z39.19 and ISO 25964) describe three kinds of hierarchical relationships (generic-specific, generic-instance, and whole-part), and polyhierarchy may exist within any of these types. Polyhierarchy that combines different hierarchical types, however, can be problematic, so it is best to avoid mixing hierarchical relationship types. For example, the following polyhierarchy mixes different types:

 

Washington, DC

Broader: United States (whole-part)

Broader: Capital cities (generic-instance)

 

The reason to avoid creating a mixed-type polyhierarchy is simply that the browsable hierarchy user experience can get compromised and potentially confusing. Extensive hierarchies with large numbers of narrower concept relationships would result. A hierarchical taxonomy tree should be designed with a dominant hierarchy design. An exception is a thesaurus, which is not designed so much for top-down browsing but for browsing from term to term. Mixing hierarchical types within a thesaurus is thus acceptable.
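
If a taxonomy records the hierarchical type of each broader link, mixed-type polyhierarchies can be found with a simple check. The sketch below assumes such type annotations exist, which many tools do not store; the data and the function name are hypothetical.

```python
# Broader links annotated with their hierarchical type (hypothetical data).
# Each entry is (narrower concept, broader concept, relationship type).
broader_links = [
    ("Washington, DC", "United States", "whole-part"),
    ("Washington, DC", "Capital cities", "generic-instance"),
    ("Educational software", "Software", "generic-specific"),
    ("Educational software", "Educational products", "generic-specific"),
]

def mixed_type_polyhierarchies(links):
    """Return concepts whose broader links use more than one hierarchical type."""
    types_by_concept = {}
    for narrower, _broader, link_type in links:
        types_by_concept.setdefault(narrower, set()).add(link_type)
    return sorted(c for c, types in types_by_concept.items() if len(types) > 1)

print(mixed_type_polyhierarchies(broader_links))
# → ['Washington, DC']
```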

 

It is also recommended to avoid creating hierarchical relationships across different facets in a faceted taxonomy. This is because facets are designed to be mutually exclusive, so that concepts from multiple facets can be used in combination to limit/filter/refine a search. As such, facets are designed to be distinct aspects. There could be an occasional exception of polyhierarchy, though, but more than 2-3 polyhierarchies across an entire faceted taxonomy should be a cause for review.

 

With the wider adoption of the SKOS (Simple Knowledge Organization System) model for taxonomies and in taxonomy management systems, taxonomies are more commonly organized into concept schemes. A concept scheme can be represented as a facet in a faceted taxonomy, but it is not limited to use as a facet. Utilizing concept schemes, it makes sense to have separate concept schemes with different hierarchical types, some for generic-specific (for type, categories, topics), one or more for whole-part (geography, organizational structures), and some containing lists of instances (named entities). In this model, Washington, DC, would be narrower only to the United States in the whole-part hierarchical concept scheme for geographic places. It could also be linked to Capital cities, which is in a different concept scheme for place types, with a different kind of relationship (“related” or perhaps a semantic relationship from an ontology).

 

Although SKOS permits hierarchical relationships across different concept schemes, it is best practice not to do this but rather to create hierarchical relationships and polyhierarchies confined within a concept scheme, just as it is recommended not to have polyhierarchy across facets.

 

Additional polyhierarchy considerations

Polyhierarchy concerns concepts in the taxonomy, and it is not about objects, items, or assets that get tagged with taxonomy concepts, such as an individual publication, document, image, product record, etc. Each of these may get tagged with multiple taxonomy concepts, and as such may have multiple “classifications” and thus can appear as if they are in a polyhierarchy, if a frontend application displays tagged items as if they are leaf nodes in a taxonomy.

A polyhierarchy usually involves only two broader concepts, not more. Having more than two broader concepts is extremely rare. If you find yourself creating polyhierarchies of three or more multiple times in a taxonomy, check to make sure you are not doing something wrong with the hierarchy design.
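
A quick review check along these lines is easy to script. The sketch below flags concepts with more than two broader concepts; the example data, including the "Educational games" concept, and the `flag_for_review` helper are hypothetical.

```python
# Direct broader concepts per concept (hypothetical example data).
broader = {
    "Educational software": {"Software", "Educational products"},
    "Educational games": {"Games", "Software", "Educational products"},
    "Software": set(),
}

def flag_for_review(broader, max_parents=2):
    """Return concepts that have more broader concepts than max_parents."""
    return sorted(c for c, parents in broader.items() if len(parents) > max_parents)

print(flag_for_review(broader))
# → ['Educational games']
```

A flagged concept is not necessarily wrong; the output is only a list of places where the hierarchy design deserves a second look.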

Some content management systems, which have built-in taxonomy management and tagging features, do not support polyhierarchy. The best known is SharePoint with taxonomies managed in its Term Store feature. Taxonomy terms may be “reused” across Term Sets, but reuse is not permitted within a Term Set, where polyhierarchy is most suitable. See my past post, Polyhierarchy in the SharePoint Term Store, for more details.

Tuesday, March 22, 2022

Taxonomy Quotes

Taxonomies are very valuable, but not always easy to define, and they are described in various ways. They are also interdisciplinary, as taxonomies are developed by people in different fields for slightly different, yet similar purposes. I have heard various comments about taxonomies over the decades.

 

In the earlier years of the Taxonomy Community of Practice discussion group, a Yahoo group, which was the precursor of the current Taxonomy and Ontology Community of Practice LinkedIn group, the group’s moderator, Seth Earley, put out a call to the group’s members for a motto for the group. The winning quote, which became the group’s motto, was: “Taxonomies: That’s classified information,” by Jordan Cassel.

 

 

There were over a dozen other good suggestions for the motto, which were posted in the group in January 2009. That turned out to be shortly before I wrote the first edition of my book, The Accidental Taxonomist, so, with permission, I used additional motto quotes as the opening headers of each of the 12 chapters of my book. The same quotes were kept in my second edition, published in 2016.

 

As I now am preparing a third edition (expected out in late fall 2022), I decided to refresh the chapter head quotes. Last month I put out a call for quotes in both the Taxonomy and Ontology Community of Practice LinkedIn group and in my own network. Some quotes were lengthier than before, as they were no longer submissions for a motto. I received far more submissions than I have chapters, and I have also decided to keep some of the original quotes (including the first one). Yet many of these quotes are quite thoughtful and/or clever, so I would like to share these new quotes here.

 

In true taxonomist fashion, I have categorized these quotes as about taxonomies, about taxonomy creation, about ontologies as compared to taxonomies, about taxonomists, and a few particularly witty quotes at the end.

 

About taxonomies

 

Taxonomies: organizing the disorganized.
—June Tsang

 

Without Taxonomies; entropy!

—Hakan Strom

 

Ambiguity is the thief of Knowledge.

—Robert Vane

 

Good taxonomy is a love letter to the future.

—Gary Carlson

 

Taxonomies - organised, effective tagging. 

—Alison Jones

 

Taxonomy: Levels in the Playing Field

—Merridy Cox (Bradley)

 

Knowledge organisation, search, and use combine to enable us to navigate the workplace.

—Bill Proudfit

 

Your Taxonomy, like all metadata, is an expression of what's important to you and to the collection.

—Peter Krogh

 

Taxonomies are, first of all, an act of self discovery on how we understand the world.

—Andrea Splendiani

 

 

About taxonomy creation

 

Taxonomy: generalize or specify, that is the question.

—Fabiola Aparecida Vizentim

 

Taxonomy: The perfect mix of art and science.

—Mollee Marcus

 

Taxonomies: Normalizing to help you find, report and aggregate across data & content

—Rita M. Benitez

 

Regardless of domain, taxonomy is the science of sorting and labelling information so it can be retrieved for future use.

—Leah B.

 

Do your best to ignore even your most strongly held convictions. If you want to create a user-friendly taxonomy/ontology system, follow the data, not your heart.
—Rebecca B. Weiss

 

Successful data management requires a model-based architecture for operational efficiency, usability, and governance. Taxonomies extend these benefits to information and content.
—Vanessa Vavra-Laughlin

 

Taxonomy is such a great battleground to focus consistently on improving the user experience; it’s a first key activity to drive the user experience.

—Vellaichamy Shunmugavel

 

To ontologize or not to ontologize, that is the question you should ask yourself in the first place.

—Erick Antezana

 

 

 

About ontologies (or ontologies compared with taxonomies)

 

Taxonomies tell stories, ontologies create worlds.

—Fran Alexander

 

Taxonomies classify; ontologies reify.

—Beatrice Larentis

 

Ontology: generating knowledge by connecting the dots.

Taxonomy: is like a drawer organizer for kitchen cutlery.
—Brigita Perchutkaite Vollstedt

 

If a taxonomy is an elevator, an ontology is a Wonkavator!

—Caroline Coward

(Referencing Willy Wonka and the Chocolate Factory: like an elevator but also can go sideways and in all directions.)

 

Ontologies make the implications explicit.

—Michele Ann Jenkins

 

A good ontology maps the way out of chaosville.

—Mark Atkins

 

Ontologies: organizational substrate for your data, information, and know-how enzymes.

—Heather Fox

 

 

About taxonomists

 

I wanted to figure out my place in the world, so I hired a taxonomist.

—Meg Morrissey

 

Only when one’s data is all over the place is it discovered that a taxonomist is necessary.

—Rebecca Custis

 

Be the Taxonomy you want to see in the World!
— Elaine Chu

 

I say this categorically, taxonomists are an organized bunch.

—Jordan Cassel

 

Taxonomies: now you're where you belong.

—Alan S. Michaels

 


And the especially witty ones 😉

 

Ontology, Category, Property - Happy user will be! Try me, Find me, Surprise me :)

—Dorothee Balas

 

Year Make Model Engine Transmission Leather Navi Owners Accidents Miles Color: = my used-Taxi Taxonomy.

—Tony Mariella

 

Taxonomy is taxidermy for data -- mounted on a framework and stuffed for the purpose of display and study.

—Phil Taylor

 

Ontology: One graph to rule them all, one graph to find them, one graph to bring them all and in the semantic web bind them.
—Xeni Kechagioglou


I never metadata I didn't like

—Paul Belfanti

 

Taxonomy? Taxonoyou!

—Ron Cascella

 

Friday, February 4, 2022

Defining a Taxonomy’s Scope

In planning a taxonomy, I have often said that it is important at the beginning to define the taxonomy’s scope, specifically the subject area scope of the taxonomy’s terms, but without going into more detail. Recently I was asked by a client how to define a taxonomy’s scope. This is a good question. The taxonomy should be suited to the subject area scope of the content that will be tagged with the taxonomy and to the scope of the user’s expectations. Terms or topics only marginal to the subject scope, however, could occur in the content, and whether they should also be included in the taxonomy is a question. Ultimately, that should depend on whether user expectations justify it, as the needs of users should also be a factor in creating a taxonomy. A taxonomy should suit both its content and its users.

Sources for Taxonomy Terms

For content as a source of taxonomy terms, a combination of manual and automated approaches is recommended. By manually reviewing sample individual documents or content items, you can discern the main ideas and main topics, which should form the start and basic structure of the taxonomy and also help define its scope. Automated methods of extracting terms, through text analytics technologies, can bring in many additional terms from a much larger corpus of documents more quickly, picking up terms that a limited manual review would miss. Even though automated text analytics extracts terms based on relevancy and frequency of occurrence, such terms could be out of scope of the subject domain. That’s why it’s important to start first with a manual review of content to define the subject scope.  Then, when you enrich the taxonomy with automated extraction, you can approve terms that appear to be in scope or at least closely relevant and reject others. But should you reject all that are out of scope, even if they appear with sufficient frequency and relevancy? My advice is to try to assume the role of the user. Ask yourself: Might a user want to search for content on this term in this content collection?
 
For user needs and expectations as a contributing source of taxonomy terms, obtaining this information can be very direct, such as by creating a user questionnaire (at least for your internal users) that asks what the topics of importance are, how those users would define the scope, and what “marginal” topics would be acceptable for them to include. You could also request sample challenging queries (rather than expected, basic, or typical ones) that the users would make. Another good way to obtain input from the user side is to look at search query logs that list search strings that users have entered over a period of time, ranked by frequency. If a search phrase that is slightly out of scope of the subject occurs frequently, then the term should still be considered for inclusion in the taxonomy.
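
The search-log review can be roughed out in code. The sketch below ranks logged queries that are not yet taxonomy terms by frequency; the sample log, the taxonomy terms, the `min_count` threshold, and the `candidate_terms` helper are all hypothetical, and real logs would need more cleanup than simple lowercasing.

```python
from collections import Counter

def candidate_terms(query_log, taxonomy_terms, min_count=2):
    """Rank logged queries not yet in the taxonomy by how often they occur."""
    known = {term.lower() for term in taxonomy_terms}
    # Normalize case and surrounding whitespace, skipping empty queries.
    counts = Counter(q.strip().lower() for q in query_log if q.strip())
    return [
        (query, count)
        for query, count in counts.most_common()
        if count >= min_count and query not in known
    ]

log = ["nutrition facts", "Nutrition Facts ", "lasagna", "lasagna", "air fryer", "nutrition facts"]
taxonomy = ["Lasagna", "Baking"]
print(candidate_terms(log, taxonomy))
# → [('nutrition facts', 3)]
```

The output is a list of candidates for human review, not a list of terms to add automatically.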

In either case, the scope of the subject gets better defined as the taxonomy is created. For example, a taxonomy for recipes may initially be scoped to comprise terms for the names of dishes, ingredients, and cooking methods. But then a different term shows up with significant frequency, “Nutrition Facts.” If it occurs in both the content and the user research, then it likely should be included. If it shows up in the content only, but is not validated in user research, then it is more questionable.

Taxonomy Structure

The initial taxonomy structure itself tends to impose limits on scope. Taxonomies tend to be hierarchical with a limited number of top terms. If a candidate term appears in the content that does not seem to belong anywhere in the current taxonomic hierarchy, you might be inclined to exclude it. Factors of user needs (they might want to look up this term in this content), however, should take precedence. For example, the term “COVID-19” might be marginal but still of enough interest to be included in many taxonomies on varied subjects, but there would be no broader term for diseases in those taxonomies. Then adjustments need to be made, such as renaming or adding broader terms, or perhaps, more likely, the proposed term should be modified to fit the context of the taxonomy, such as becoming “COVID-19 impacts.”

Another thing to consider is adopting more of a thesaurus structure than a taxonomy structure, at least for the facet or concept scheme of the taxonomy that is for miscellaneous “topics.” One characteristic of thesauri is that they do not rely so heavily on extensive hierarchical trees. What this means is that you could decide that it is acceptable that not all terms have broader terms and thus it’s OK to have a very large number of top terms, with the more specific terms linked to other terms only by related-term relationships, another feature of thesauri, if not by broader/narrower-term relationships. Abandoning the full hierarchical tree structure should only be considered if this hierarchy is not displayed as a navigation to the end users.

Documenting Policy

In any case, you need to define policies regarding what kinds of terms can be added and what kinds should not. This will evolve out of the activity of building the taxonomy, especially from evaluating what extracted terms to approve and what search log terms to approve. Whoever is doing this task (hopefully more than one person) should document each instance of uncertainty. While many term approvals and rejections will be obvious, there will be a gray area. These uncertain cases should be collected and discussed together, and then a policy can emerge.