Wednesday, June 22, 2016

Taxonomies vs. Thesauri: Practical Implementations


The differences between taxonomies and thesauri, and when to implement which, have been the subject of previous presentations of mine and a previous blog post, Taxonomies vs. Thesauri. Most recently, I presented a case study of controlled vocabularies at Cengage Learning at the “Taxonomy Café” session at the SLA annual conference this month, and the post-presentation roundtable discussions got me thinking more about the differences in practical implementations.

To summarize the differences: both taxonomies and thesauri have hierarchical relationships among their terms, but in a taxonomy all terms are connected into a few large hierarchies with a limited number of top terms, so as to serve top-down navigation or drilling down through topics. Faceted taxonomies function differently, but each facet label can be seen as a top term. Associative relationships (related terms) are a standard feature of thesauri but not of taxonomies. Synonyms/nonpreferred terms/alternate labels are required for thesauri, but can be optional in small taxonomies. Taxonomies serve browsing and drilling down by end users who are exploring topics, whereas thesauri serve users who search for (look up) a specific concept and then may follow “use” (preferred term), broader, narrower, or related term links to find the best term. A taxonomy works well for a controlled vocabulary that is limited in scope and easily categorized into hierarchies, whereas a thesaurus works better for content and a set of terms that are not easily categorized and do not have a limited scope.
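
As a rough illustration of these structural differences, here is a minimal Python sketch of a controlled vocabulary. The class names, method names, and sample terms are all hypothetical and chosen only for this example; they do not come from any particular taxonomy tool. A taxonomy would mostly use the broader/narrower links and the short list of top terms, while a thesaurus would also rely on the related-term links and the “use” references from nonpreferred terms.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One preferred term and its relationships (all names are illustrative)."""
    label: str
    broader: set = field(default_factory=set)       # hierarchical: parent terms
    narrower: set = field(default_factory=set)      # hierarchical: child terms
    related: set = field(default_factory=set)       # associative (thesaurus feature)
    nonpreferred: set = field(default_factory=set)  # synonyms / alternate labels


class Vocabulary:
    def __init__(self):
        self.terms = {}  # preferred label -> Term
        self.use = {}    # lowercased nonpreferred label -> preferred label

    def add(self, label, broader=None, nonpreferred=()):
        """Add a preferred term, optionally under a broader term."""
        term = self.terms.setdefault(label, Term(label))
        if broader is not None:
            parent = self.terms.setdefault(broader, Term(broader))
            term.broader.add(broader)
            parent.narrower.add(label)
        for alt in nonpreferred:
            term.nonpreferred.add(alt)
            self.use[alt.lower()] = label
        return term

    def relate(self, a, b):
        """Record a reciprocal related-term link; both terms must already exist."""
        self.terms[a].related.add(b)
        self.terms[b].related.add(a)

    def top_terms(self):
        """Terms with no broader term: the entry points for top-down browsing."""
        return sorted(l for l, t in self.terms.items() if not t.broader)

    def lookup(self, query):
        """Thesaurus-style lookup: follow a 'use' reference to the preferred term."""
        q = query.lower()
        if q in self.use:
            return self.terms[self.use[q]]
        for label, term in self.terms.items():
            if label.lower() == q:
                return term
        return None


# Made-up sample data for the sketch.
vocab = Vocabulary()
vocab.add("Coffee", broader="Beverages", nonpreferred=["Java"])
vocab.add("Coffee makers", broader="Appliances")
vocab.relate("Coffee", "Coffee makers")

print(vocab.top_terms())           # browsing starts from the top terms
print(vocab.lookup("java").label)  # searching resolves a synonym to its preferred term
```

In this sketch, a browsing user would start from `top_terms()` and drill down through `narrower`, whereas a searching user would call `lookup()` and then follow `broader`, `narrower`, or `related` from the term that comes back.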

In practice, I have found that taxonomies are useful for classifying products and services (such as in ecommerce), general enterprise document management, implementations in content management systems that support taxonomies, and all faceted or filtering implementations (SharePoint search, Endeca, and other post-search filtering enterprise search software). Thesauri, on the other hand, are more suitable for indexing and retrieving research literature (articles, white papers, conference presentations and proceedings, patents, etc.), whether commercially published or not.

Taxonomies are easier to create and often easier to implement than thesauri. They generally do not have associative (related term) relationships. In the absence of associative relationships between terms, and with the emphasis on creating large top-term hierarchies, the thesaurus standard (ANSI/NISO Z39.19) rules for hierarchical relationships do not always have to be strictly followed. The inclusion of synonyms/nonpreferred terms also tends to be less thorough in taxonomies than in thesauri. Thesauri, on the other hand, require greater expertise in the field of information/knowledge organization, particularly to distinguish between hierarchical and associative relationships and to create the optimal number of those relationships and the optimal number of nonpreferred terms. Taxonomies, whether hierarchical or faceted, also tend to be easier to understand and use, are accommodated by out-of-the-box content management software, and are easier to maintain (they can be maintained by subject matter experts instead of taxonomists). Therefore, if a taxonomy, rather than a thesaurus, will suffice, then it makes more sense to create and maintain a taxonomy.

Thesauri, on the other hand, are more appropriate for indexing repositories of content for research, because they do not restrict the inclusion of terms to established hierarchies. Any term that is warranted by at least a minimal amount of content can be added, even if at first glance it may seem out of scope. For example, the term “Hot drinks” would not likely fit into a taxonomy on health/medicine, but the term would be desired for articles on research correlating the drinking of very hot beverages to esophageal cancer. Thesauri allow for the inclusion of terms that, in combination with other terms, can achieve a more nuanced meaning, which may be needed in the research and discovery of what is contained in a body of research literature.
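
To make the idea of combining terms concrete, here is a small Python sketch, assuming a simple inverted index from indexing terms to documents. The function names, article IDs, and term assignments are invented purely for illustration and do not describe any real system or dataset. Searching on two terms together retrieves only the documents indexed with both.

```python
from collections import defaultdict


def build_index(docs):
    """Map each indexing term to the set of document IDs tagged with it."""
    index = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].add(doc_id)
    return index


def search_all(index, *terms):
    """Return the documents indexed with every one of the given terms."""
    if not terms:
        return set()
    sets = [index.get(term, set()) for term in terms]
    return set.intersection(*sets)


# Hypothetical documents and the thesaurus terms assigned to them.
docs = {
    "article-1": {"Hot drinks", "Esophageal cancer"},
    "article-2": {"Hot drinks", "Hydration"},
    "article-3": {"Esophageal cancer", "Tobacco smoking"},
}

index = build_index(docs)
print(search_all(index, "Hot drinks", "Esophageal cancer"))
```

Neither term alone pins down the topic, but the combination narrows the results to the one article that is about both.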

Indeed, in practice, the majority of new controlled vocabularies being created are taxonomies, not thesauri, and in fact taxonomies are usually all that is needed. The new implementations tend to be of the kind that are suitable for taxonomies. New repositories of documents for research, on the other hand, do not arise as frequently, although it is highly important that they be indexed with thesauri. More often, collections of documents for research are already established and often already have thesauri. These thesauri do require the work of taxonomists to update and maintain them, though.