The differences between taxonomies
and thesauri, and when to implement which, have been the subject of previous
presentations of mine and a previous blog post, Taxonomies vs. Thesauri. Most recently, I presented a case study
of controlled vocabularies at Cengage Learning at the “Taxonomy
Café” session at the SLA annual conference this month, and the post-presentation
roundtable discussions got me thinking more about the differences in practical
implementations.
To summarize the differences, while both taxonomies and
thesauri have hierarchical relationships among their terms, in a taxonomy all
terms are connected into a few large hierarchies with a limited number of top
terms so as to serve top-down navigation or drilling-down of topics. While
faceted taxonomies function differently, each facet label can be seen as a top
term. Associative relationships (related terms) are a standard feature of
thesauri but not of taxonomies. Synonyms/nonpreferred terms/alternate labels are
required for thesauri, but could be optional in small taxonomies. Taxonomies
serve browsing and drilling down by end users who are exploring topics, whereas
thesauri serve users who search for (look up) a specific concept and then may
follow “use” (preferred term), broader, narrower, or related term links to
find the best term. A taxonomy works well for a controlled vocabulary that is
limited in scope and easily categorized into hierarchies, whereas a thesaurus
works better for content and a set of terms that is not easily categorizable
and does not have a limited scope.
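To make the structural difference concrete, here is a minimal sketch in Python of how the two kinds of vocabulary might be modeled. It is an illustration only, not how any particular taxonomy or thesaurus tool stores its data; the class names, function names, and sample terms (including "Caffeine" and "Java (beverage)") are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TaxonomyNode:
    """A taxonomy term, reached top-down from one of a few top terms."""
    label: str
    children: list[TaxonomyNode] = field(default_factory=list)


@dataclass
class ThesaurusTerm:
    """A thesaurus term, looked up directly and then explored via its links."""
    label: str
    broader: list[str] = field(default_factory=list)
    narrower: list[str] = field(default_factory=list)
    related: list[str] = field(default_factory=list)   # associative links
    use_for: list[str] = field(default_factory=list)   # nonpreferred terms


def drill_down(node: TaxonomyNode, path: list[str]) -> TaxonomyNode:
    """Browse a taxonomy by following child labels down from a top term."""
    for label in path:
        node = next(child for child in node.children if child.label == label)
    return node


def look_up(thesaurus: dict[str, ThesaurusTerm], entry: str) -> ThesaurusTerm:
    """Resolve a search entry to its preferred term, following "use" if needed."""
    if entry in thesaurus:
        return thesaurus[entry]
    for term in thesaurus.values():
        if entry in term.use_for:
            return term
    raise KeyError(entry)


# Hypothetical sample data: one small hierarchy and one thesaurus entry.
beverages = TaxonomyNode("Beverages", [
    TaxonomyNode("Hot drinks", [TaxonomyNode("Coffee"), TaxonomyNode("Tea")]),
    TaxonomyNode("Cold drinks"),
])

thesaurus = {
    "Coffee": ThesaurusTerm(
        "Coffee",
        broader=["Hot drinks"],
        related=["Caffeine"],
        use_for=["Java (beverage)"],
    ),
}

browsed = drill_down(beverages, ["Hot drinks", "Coffee"])
found = look_up(thesaurus, "Java (beverage)")
```

In this sketch the taxonomy user starts at "Beverages" and drills down, while the thesaurus user enters "Java (beverage)", is led to the preferred term "Coffee", and can then follow its broader or related links.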
In practice, I have found that taxonomies are useful for classifying
products and services (such as in ecommerce), general enterprise document
management, implementations in content management systems which support
taxonomies, and all faceted or filtering implementations (SharePoint search,
Endeca, and other post-search filtering enterprise search software). Thesauri,
on the other hand, are more suitable for indexing and retrieving research
literature (articles, white papers, conference presentations and proceedings,
patents, etc.), whether commercially published or not.
Taxonomies are easier to create and often easier to
implement than thesauri. They generally do not have associative (related term)
relationships. In the absence of associative relationships between terms and with
the emphasis on creating large top-term hierarchies, the thesaurus standard
(ANSI/NISO Z39.19) rules for hierarchical relationships do not always have to
be strictly followed. The inclusion of synonyms/nonpreferred terms also tends
to be less thorough in taxonomies than in thesauri. Thesauri, on the other
hand, require greater expertise in the field of information/knowledge
organization, particularly to distinguish between hierarchical and associative
relationships and to create the optimal number of those relationships and the
optimal number of nonpreferred terms. Taxonomies, whether hierarchical or
faceted, also tend to be easy to understand and use, are accommodated by out-of-the-box
content management software, and are easier to maintain (they can even be maintained by
subject matter experts instead of taxonomists). Therefore, if a taxonomy,
rather than a thesaurus, will suffice, then it makes more sense to create and
maintain a taxonomy.
Thesauri, on the other hand, are more appropriate for
indexing repositories of content for research because they do not restrict the
inclusion of terms to established hierarchies. Any term that is supported by a
minimal threshold of content can be added, even if at first glance it may
seem out of scope. For example, the term “Hot drinks” would not likely fit into a
taxonomy on health/medicine, but the term would be desired for articles on
research correlating the drinking of very hot beverages to esophageal
cancer. Thesauri allow for inclusion of
terms that, in combination with other terms, can achieve a more nuanced
meaning, which may be needed in the research and discovery of what is contained
in a body of research literature.
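As a rough illustration of how terms combine at search time, the following Python sketch indexes a few documents with thesaurus terms and retrieves only those that carry every requested term. The document IDs, the extra terms "Smoking" and "Coffee industry", and the `search` function are hypothetical and do not describe any actual indexing system.

```python
from __future__ import annotations


def search(index: dict[str, set[str]], *terms: str) -> set[str]:
    """Return the IDs of documents indexed with all of the given terms."""
    wanted = set(terms)
    return {doc_id for doc_id, doc_terms in index.items() if wanted <= doc_terms}


# Hypothetical index: document ID -> thesaurus terms assigned by an indexer.
index = {
    "article-1": {"Hot drinks", "Esophageal cancer"},
    "article-2": {"Esophageal cancer", "Smoking"},
    "article-3": {"Hot drinks", "Coffee industry"},
}

combined = search(index, "Hot drinks", "Esophageal cancer")
```

Neither term alone pinpoints the research on hot beverages and esophageal cancer, but under these assumptions the combination narrows the results to the single article indexed with both.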
Indeed, in practice, the majority of new controlled
vocabularies being created are taxonomies, not thesauri, and in fact
taxonomies are usually all that is needed. The new implementations tend to be
of the kind that is suitable for taxonomies. New repositories of documents for
research, on the other hand, while they very much need to be indexed with
thesauri, do not arise as frequently. More often, collections of documents for
research are already established and often already have thesauri. These thesauri do require the work of taxonomists to update and maintain them, though.