Tuesday, September 23, 2014

One or More Taxonomies


In the various definitions of taxonomy, one aspect of the definition that is often missing is what constitutes a single taxonomy (or thesaurus) versus multiple related taxonomies (or thesauri). If you hire a taxonomy consultant, they won’t tell you because they will defer to their client’s terminology. If you are designing a taxonomy/taxonomies for your own organization, however, this is often an issue of concern.

Hierarchies and other relationships

In simple hierarchical taxonomies, a single hierarchy could be a single taxonomy. Not all terms on the same subject, however, may fit neatly in one hierarchy while complying with ANSI/NISO hierarchical relationship guidelines. So, more often than not, a hierarchical taxonomy may have multiple top terms. For example, a taxonomy on health care might have top terms for hierarchies on conditions and diseases, diagnostic procedures, treatments, and medical equipment and supplies. If for some reason you needed a single hierarchy, then you would bend the hierarchical-relationship rules to make such top terms narrower terms of the term that is the name of the taxonomy. Thus, whether there is one top term or multiple top terms, it is still considered one taxonomy.

Facets are a special case. Each facet consists of its own hierarchy of terms, or may even have multiple top-term hierarchies of similar-type terms on the same subject, and there are no relationships between terms in different facets. So, you might consider each facet to be a taxonomy. However, the facets are intended to be used only in combination, not in isolation. In fact, we often speak of a “faceted taxonomy,” implying a single taxonomy composed of multiple facets. So, a single facet is not a taxonomy.

A more thesaurus-like structure may have fewer large hierarchies and more small hierarchies with more numerous top terms, but it will also have associative relationships that link terms across hierarchies. So, a possible definition of a taxonomy or thesaurus is a set of terms in which every term has at least some kind of relationship with at least one other term in the same set. However, you could end up with just a couple of terms that are related to each other but not linked to any other terms in the taxonomy. So, additional criteria are needed to define a single taxonomy in a way that includes such terms.
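To see why the "every term is related to at least one other term" definition falls short, here is a minimal Python sketch. It treats terms as nodes and relationships as links, then finds the groups of terms that are connected to each other. The term names, the relationship pairs, and the function names are all made up for illustration; they do not come from any particular taxonomy tool.

```python
from collections import defaultdict


def build_neighbours(relationships):
    """Map each term to the set of terms it is directly related to."""
    neighbours = defaultdict(set)
    for a, b in relationships:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return neighbours


def every_term_is_related(terms, relationships):
    """The weak test: each term has at least one relationship."""
    neighbours = build_neighbours(relationships)
    return all(neighbours[term] for term in terms)


def connected_groups(terms, relationships):
    """Group terms that are linked by any chain of relationships."""
    neighbours = build_neighbours(relationships)
    seen = set()
    groups = []
    for term in terms:
        if term in seen:
            continue
        group = set()
        stack = [term]
        while stack:
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(neighbours[current] - group)
        seen |= group
        groups.append(group)
    return groups


# Hypothetical health care terms and relationships.
terms = [
    "Diseases", "Diabetes", "Treatments", "Insulin therapy",
    "Medical equipment", "Stethoscopes",
]
relationships = [
    ("Diseases", "Diabetes"),               # hierarchical (BT/NT)
    ("Treatments", "Insulin therapy"),      # hierarchical (BT/NT)
    ("Diabetes", "Insulin therapy"),        # associative (RT)
    ("Medical equipment", "Stethoscopes"),  # hierarchical (BT/NT)
]

all_related = every_term_is_related(terms, relationships)
groups = connected_groups(terms, relationships)
print(all_related)
print(len(groups))
```

In this sample data every term passes the weak test, yet the terms fall into two separate groups: the equipment terms are related only to each other. The relationships alone cannot say whether that is one taxonomy or two, which is why scope and purpose also have to be considered.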

Thus, at a minimum, a taxonomy comprises one or more hierarchies, but what about at a maximum? The question came up in my online course, in an assignment to create polyhierarchies, in which I suggest that the broader terms be from different hierarchies. A student asked: “Are the different hierarchies supposed to be within the same Taxonomy, or merely two different hierarchies from two different Taxonomies?” Generally, standard hierarchical and associative relationships do not cross the boundary between one taxonomy and another. An exception would be instance-type hierarchical relationships between topics in a taxonomy and named entities (proper nouns) maintained in a separate controlled vocabulary. Other types of relationships may link terms across multiple taxonomies, but they would likely be special-purpose relationships, such as equivalency mappings or translations.
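The rule of thumb above can be turned into a simple check. The following Python sketch flags any standard relationship (BT, NT, RT, USE, UF) whose two terms belong to different taxonomies, while letting special-purpose links through. The taxonomy names, the terms, and the labels "MAPS_TO" and "INSTANCE_OF" are hypothetical, chosen only for this example; real tools name and store these things in their own ways.

```python
# Relationship types treated as "standard" in this sketch.
STANDARD_TYPES = {"BT", "NT", "RT", "USE", "UF"}


def cross_taxonomy_violations(term_taxonomy, relationships):
    """Return standard relationships that link terms in different taxonomies."""
    violations = []
    for source, rel_type, target in relationships:
        crosses = term_taxonomy[source] != term_taxonomy[target]
        if crosses and rel_type in STANDARD_TYPES:
            violations.append((source, rel_type, target))
    return violations


# Which taxonomy (or other controlled vocabulary) each term belongs to.
term_taxonomy = {
    "Insulin pumps": "Health",
    "Medical equipment": "Health",
    "Diabetes treatments": "Health",
    "Hospitals": "Health",
    "Medical device startups": "Business",
    "Insulin infusion pumps": "Partner vocabulary",
    "Mayo Clinic": "Named entities",
}

relationships = [
    # Polyhierarchy: two broader terms from different hierarchies
    # within the same taxonomy.
    ("Insulin pumps", "BT", "Medical equipment"),
    ("Insulin pumps", "BT", "Diabetes treatments"),
    # A standard associative relationship that crosses taxonomies.
    ("Insulin pumps", "RT", "Medical device startups"),
    # Special-purpose links, labeled with made-up type names.
    ("Insulin pumps", "MAPS_TO", "Insulin infusion pumps"),
    ("Mayo Clinic", "INSTANCE_OF", "Hospitals"),
]

violations = cross_taxonomy_violations(term_taxonomy, relationships)
print(violations)
```

Here the polyhierarchy passes because both broader terms sit in the same taxonomy, the mapping and the instance link pass because they are not standard types, and only the cross-taxonomy RT is flagged.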

Subject scope and purpose

In addition to considering the relationships between terms, another determining factor of what constitutes a single taxonomy is the subject area scope. One taxonomy is for one subject area, although that subject area could be very broad, especially if the taxonomy’s purpose is to support indexing of the topics in a daily national newspaper. More often, a taxonomy is more limited in scope, such as just technology topics or health topics.

Related to subject scope is how the taxonomy will be used in both indexing/tagging and retrieval. Generally, a single taxonomy is utilized in a single indexing/tagging method and with its own indexing policy. Policy, comprising both editorial style for terms and indexing rules, is often a defining factor for a single taxonomy. Different taxonomies will have different policies. For the end-user, a retrieval function is served by a single taxonomy, such as supporting a search function or providing a set of browse categories. If you want to enable multiple unrelated methods of retrieval (such as type-ahead for the search box, dynamic filtering facets, and a navigational browse), then you will need to create a separate taxonomy for each. At a former employer I built taxonomies for SharePoint, and it turned out that I had to build three completely separate taxonomies: (1) the consistently labeled hierarchy of libraries and folders, (2) terms and their variants to support search with a third-party auto-classification tool, and (3) controlled vocabularies of terms for consistent tagging and metadata management of uploaded documents.

There is also the question of whether the content to be accessed by the taxonomy is together in one set or separated out for different purposes or different audiences. A taxonomy should be designed to suit its own content. This is the case in a project I am currently working on. There are two distinct sets of content available on a web site. The content sets have many similarities, so they could be browsed via a single hierarchical taxonomy, but they are for potentially different audiences. If the content sets were to remain separate, we would have created two separate taxonomies, each customized to best suit its own set of content. But the site owners decided that the two sets of content would be presented together, “blended,” to cross-sell content, in addition to standing on their own elsewhere on the site. Thus, a single taxonomy was the chosen option. The use of two content categories for terms within the taxonomy will enable the additional, separate content set option.

Conclusions


In sum, a single taxonomy:

  • Has standard relationships (BT/NT, RT, USE/UF) confined within it. Cross-taxonomy links, if any, are of non-standard types.
  • Has a defined, restricted subject scope.
  • Has its own indexing/tagging policy.
  • Could function in isolation, unlike a single facet (although it may be supplemented by other controlled vocabularies/metadata).
  • Has its own implementation, function, and purpose (although taxonomies can be reused and repurposed).

It’s important for a taxonomist to determine what constitutes a single taxonomy versus multiple taxonomies, not so much for communicating with stakeholders, but rather to plan the initial design of the taxonomy within a taxonomy management tool. Taxonomy/thesaurus software allows for the designation of one or more taxonomies/thesauri that may be linked to each other or not. Multiple so-called files, thesauri, vocabularies, objects, classes, categories, etc. are the different ways in which the various software tools allow the taxonomist to control the divisions between and within taxonomies.