Tuesday, January 28, 2014

Taxonomies vs. Thesauri

Two taxonomy consulting projects I worked on last year seemed to lend themselves more to the development of a thesaurus than to a set of hierarchical taxonomies. But clients usually ask for a taxonomy and not a thesaurus. Perhaps we need to ask what the client has in mind with the notion of a “taxonomy.” When someone wants a “taxonomy” developed, do they want a structured kind of controlled vocabulary to support consistent indexing/tagging and retrieval (the broad meaning of taxonomy), or do they specifically want a browse display of topics in a top-down navigation structure in a user interface (the narrower meaning of taxonomy)? The broad meaning of “taxonomy” includes thesauri, too. So, if you are looking for the former, maybe it is actually a thesaurus that you want.

In its broad meaning, “taxonomy” often refers to any of various kinds of controlled vocabularies: synonym rings to support search without being displayed (which a search vendor might call a “thesaurus”), hierarchical topic trees without synonyms, faceted taxonomies, and finally the more complex taxonomies that include all of hierarchical relationships, associative relationships, and synonyms. The last of these is what may be called a thesaurus. In such a case, I would be asked for “a taxonomy with hierarchical relationships, associative relationships, and synonyms, and possibly term notes or definitions,” rather than “a thesaurus.” The word “taxonomy” has become the standard term of reference in the business, outside library applications.

The usual distinction between a strictly defined taxonomy (its narrower meaning) and a thesaurus is that a thesaurus has all the features of a taxonomy plus associative relationships. This is largely true, and I will add that a thesaurus also must have equivalence relationships (between a “preferred term” and its synonyms or nonpreferred terms), whereas synonyms/nonpreferred terms are merely optional in taxonomies, depending on the taxonomy size. Thesauri should also be built according to the standards of ANSI/NISO Z39.19 or ISO 25964, whereas taxonomies can be a little more flexible in their adherence to standards.
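The three relationship types can be made concrete with a small data structure. The following Python is only a minimal sketch, not how any particular thesaurus tool stores its data: the class names, method names, and the example terms are all hypothetical, and the BT/NT/RT/UF comments use the customary thesaurus abbreviations for broader term, narrower term, related term, and used for.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Term:
    """One preferred term and its immediate relationships."""
    label: str
    broader: Set[str] = field(default_factory=set)       # BT (hierarchical)
    narrower: Set[str] = field(default_factory=set)      # NT (hierarchical)
    related: Set[str] = field(default_factory=set)       # RT (associative)
    nonpreferred: Set[str] = field(default_factory=set)  # UF (equivalence)


class Vocabulary:
    """Hypothetical container for a controlled vocabulary."""

    def __init__(self) -> None:
        self.terms: Dict[str, Term] = {}
        self.use: Dict[str, str] = {}  # lowercased nonpreferred -> preferred

    def term(self, label: str) -> Term:
        # Create the term on first mention, otherwise return the existing one.
        return self.terms.setdefault(label, Term(label))

    def add_hierarchical(self, broader: str, narrower: str) -> None:
        # Hierarchical relationships are reciprocal: BT on one side, NT on the other.
        self.term(broader).narrower.add(narrower)
        self.term(narrower).broader.add(broader)

    def add_associative(self, a: str, b: str) -> None:
        # Associative relationships are symmetric.
        self.term(a).related.add(b)
        self.term(b).related.add(a)

    def add_equivalence(self, preferred: str, nonpreferred: str) -> None:
        self.term(preferred).nonpreferred.add(nonpreferred)
        self.use[nonpreferred.lower()] = preferred

    def resolve(self, entry: str) -> Optional[str]:
        """Map an entry to its preferred term, or None if it is unknown."""
        if entry in self.terms:
            return entry
        return self.use.get(entry.lower())

    def has_thesaurus_features(self) -> bool:
        # Both associative and equivalence relationships are present.
        has_associative = any(t.related for t in self.terms.values())
        return has_associative and bool(self.use)


# Made-up example terms, for illustration only.
vocab = Vocabulary()
vocab.add_hierarchical("Vehicles", "Automobiles")
vocab.add_equivalence("Automobiles", "Cars")
vocab.add_associative("Automobiles", "Traffic safety")
```

With only the `add_hierarchical` calls, the structure is a plain hierarchical taxonomy. Once equivalence and associative relationships are added as well, it has the relationship types that distinguish a thesaurus, although conformance to ANSI/NISO Z39.19 or ISO 25964 involves far more than this sketch checks.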

The extent of hierarchies

However, in my experience, there is another very important distinction between a narrowly defined taxonomy and a thesaurus. A taxonomy has hierarchical relationships that bring all of the terms/concepts into a limited number (one or more) of hierarchical tree structures or facets. (We can consider a facet as a simple two-level hierarchy comprising the facet label and its narrower facet values.) Think of a taxonomy as supporting classification, categorization, and concept organization, with a basis in the Linnaean taxonomy of animals and plants, which is the best-known meaning of “taxonomy.” The user typically enters a taxonomy from the top down.

In a thesaurus, by contrast, it is not necessary to structure all concepts (terms) into a limited number of top-level hierarchies. A thesaurus focuses on terms and their immediate relationships with other terms. Hierarchical relationships between terms may result in extended hierarchies of various degrees, whether just two terms or more, but these do not extend the depth of the entire thesaurus. Thus, numerous isolated hierarchies could exist. What this means is that a top-down hierarchical display of a thesaurus would not comprise simply a few equally sized hierarchies, but rather numerous hierarchies of varied sizes and specificities. “Top terms” are not all of equal weight, importance, or generalness. Therefore, while any thesaurus could be displayed hierarchically, a hierarchical display might not be desirable. Instead, the user might browse the terms of a thesaurus alphabetically to select a term. The display for the selected term then shows that term’s hierarchical relationships.
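To illustrate what “numerous hierarchies of varied sizes” looks like in practice, here is a small Python sketch. It is an illustration under stated assumptions, not a description of any real thesaurus or tool: the terms, the `PAIRS` data, and the function names are all made up. It finds the top terms (those with no broader term), measures the size of the hierarchy under each, and produces an alphabetical browse in which each term shows only its immediate broader and narrower terms.

```python
from collections import defaultdict

# Hypothetical (broader, narrower) pairs; the terms are invented for illustration.
PAIRS = [
    ("Energy", "Renewable energy"),
    ("Renewable energy", "Solar power"),
    ("Renewable energy", "Wind power"),
    ("Contracts", "Leases"),
]
# "Ergonomics" has no hierarchical relationships at all.
ALL_TERMS = {
    "Energy", "Renewable energy", "Solar power", "Wind power",
    "Contracts", "Leases", "Ergonomics",
}


def build_index(pairs):
    """Return (broader, narrower) lookup tables built from the pairs."""
    broader, narrower = defaultdict(set), defaultdict(set)
    for parent, child in pairs:
        narrower[parent].add(child)
        broader[child].add(parent)
    return broader, narrower


def top_terms(terms, broader):
    """Terms with no broader term; each one heads its own hierarchy."""
    return sorted(t for t in terms if not broader.get(t))


def hierarchy_size(term, narrower):
    """Count the term plus all of its descendants."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(narrower.get(current, ()))
    return len(seen)


def alphabetical_browse(terms, broader, narrower):
    """Yield each term in A-Z order with its immediate BT and NT lists."""
    for t in sorted(terms, key=str.lower):
        yield t, sorted(broader.get(t, ())), sorted(narrower.get(t, ()))


broader, narrower = build_index(PAIRS)
for top in top_terms(ALL_TERMS, broader):
    print(top, hierarchy_size(top, narrower))
for label, bts, nts in alphabetical_browse(ALL_TERMS, broader, narrower):
    print(label, "| BT:", bts, "| NT:", nts)
```

In this made-up data the three top terms head hierarchies of four, two, and one term respectively, which is the uneven picture described above. A top-down display would give them equal visual weight, whereas the alphabetical browse lets the user land on any term directly and see its neighbors.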

The idea of navigating without high-level hierarchies through which to drill down may seem odd, especially since hierarchy trees have become so common in website navigation. But there is no single right way to navigate. “Navigate” and “browse” are not synonymous with “drill down” through a hierarchy. Browsing could start out alphabetically and then jump from one term to the next via both hierarchical and associative relationships.

Blurred distinctions

You may have a hierarchical taxonomy with the additional thesaurus features of associative relationships, synonyms, scope notes for terms, etc., and then you can call it “a taxonomy with thesaurus features.” On the other hand, you may have a thesaurus that does in fact have an over-arching hierarchical structure, and you may call it “a thesaurus with a taxonomy structure.” Both of these kinds of “taxonomies” and “thesauri” would thus have essentially the same structure.

An organization might start calling its taxonomy a “thesaurus” if it chose to follow the terminology of its selected thesaurus software vendor. The following vendors, for example, call their products thesaurus management software and the vocabularies created with them “thesauri”: Synaptica, Data Harmony, PoolParty, and MultiTes. These vendors have developed software that is full-featured, so not only can the software be used to create simple hierarchical taxonomies, but it also supports the full range of relationship types (hierarchical, associative, and equivalence) along with term notes, term attributes, and other maintenance tracking features. Thus, it is thesaurus management software that may be used for thesauri, taxonomies, anything in between, and other simpler types of controlled vocabularies.

Choosing the approach

The choice between adopting a hierarchical taxonomy and a thesaurus depends on the nature of the content and the users.
A hierarchical taxonomy would be fine if:
- The content is of a homogeneous type that can be characterized by the same set of facets.
- The nature of the topics for the content falls neatly into a hierarchy.
- Users are not experts in the subjects and need to be guided by hierarchies.
A thesaurus would be more suitable if:
- Multiple, overlapping subject areas or domains are covered with diverse content.
- The terms need to be highly specific for detailed indexing.
- The topics do not lend themselves to neat hierarchies.
- Users are knowledgeable about the subject and will likely look for specific terms.