Sunday, April 23, 2017

Taxonomy Term Specificity

One of the challenges in creating or editing taxonomies is determining how specific the terms should be. This is a key issue in customizing a taxonomy for a particular implementation, which involves a unique set of content to be tagged/indexed and a particular set of users. Highly specific terms tend to be the consequence of deeper hierarchies. So, the decision of how specific the terms should be is also related to the decision of how many hierarchical levels deep the taxonomy should be. Taxonomies that are organized into multiple facets, on the other hand, tend to have more limited hierarchy, if any, and terms that are not so specific.

Having taxonomy terms that are more specific than necessary inevitably means that there are more taxonomy terms than necessary. A larger taxonomy is more difficult to keep both current and consistent. Terms that are more specific than necessary are also likely to be more specific than users expect, so they might get overlooked and never used. If the taxonomy is searched, users are unlikely to search for such highly specific terms. If the taxonomy is browsed, users might stop at a higher-level broader term and be satisfied with that. Furthermore, users like to retrieve multiple results (content items or references) for a single search term, so that they can browse the list and evaluate what they want. Highly specific terms will match fewer content items, so retrieved results could comprise only one or two items per taxonomy term, which may not satisfy most users. Having a greater number of more specific terms can also lead to more inconsistency in the indexing/tagging, whether manual or automated.

Having taxonomy terms that are not specific enough means that each taxonomy term is indexed to a relatively large number of content items, and users may have to scroll through multiple screens of returned results and look at multiple items to find what they really want. The availability of additional filters or facets can help limit the results, though. Having terms that are not specific enough also makes it more difficult for users to “discover” potential related topics of interest, whether the terms have “related-term”/“see also” relationships between them or whether “related” terms are suggested by shared tagged occurrence among content items.

Taxonomists sometimes refer to term specificity as “granularity” or a taxonomy being “granular.” There is an irony here: although the scope and meaning of specific terms are granular/narrow/small, the terms themselves are not small. The “granular” terms tend to be longer, more complex, multi-word terms. When they combine multiple concepts into a single term, such terms might also be called “pre-coordinated” terms. The following are examples of specific, granular taxonomy terms from different specialized taxonomies:

  • Possessed object access systems (in an information technology taxonomy)
  • Fingerstick blood sugar testing (in a health care taxonomy)
  • Standard manufacturing overhead cost (in a business taxonomy)

The taxonomist typically creates specific/granular terms, based on the concepts of sample content to be tagged. There may be a document with the phrase in the title, an image with the phrase in its caption, a product with this description as its type, a department with the phrase in its name, etc. Obviously, source phrases would need to be edited to become well-formed taxonomy terms, but they may still be multi-word, complex terms. Creating a taxonomy from scratch usually involves a combination of a top-down and bottom-up approach in the development of terms and hierarchical relationships. The specific/granular terms are the result of the bottom-up component of taxonomy development.

Taxonomies available for license might be appropriate in their subject area and scope, but chances are that their terms are either too specific or not specific enough for a given implementation. Thus, if you choose to license a taxonomy, make sure your license allows you to customize the taxonomy so that you can either delete terms that are too specific or add more specific terms, as narrower terms to existing terms, to suit your content.

Creating or deleting specific terms is also part of periodic taxonomy maintenance. If a term that has no narrower terms is heavily used in indexing, it might be time to “break it up” by creating a few more specific, narrower terms so that the large content set is indexed and retrieved with more specific terms for more manageable result numbers. If, over a period of time, a specific term has been applied in indexing very few times, or not at all, it should probably be deleted. The deleted term can be changed to a variant/nonpreferred term/alternative label for an existing broader concept. The specificity of a taxonomy should match the specificity of the content being tagged with it, and this can change over time.
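
For those who keep tagging counts alongside their taxonomy, the maintenance check described above can be roughly sketched in Python. This is only an illustration under stated assumptions: the `Term` structure, the `review_terms` function, and the two thresholds (`split_above`, `retire_below`) are hypothetical, and sensible cutoff values depend entirely on the size of the content set and on how users retrieve it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Term:
    """A taxonomy term with its tagging count (hypothetical structure)."""
    label: str
    usage_count: int  # number of content items indexed with this term
    narrower: list[Term] = field(default_factory=list)


def walk(term: Term):
    """Yield the term and all of its narrower terms, depth first."""
    yield term
    for child in term.narrower:
        yield from walk(child)


def review_terms(root: Term, split_above: int = 500, retire_below: int = 3):
    """Flag terms with no narrower terms for review.

    Returns two lists of labels: candidates to break up into narrower
    terms, and candidates to retire as variant/nonpreferred labels of
    their broader term. Both thresholds are illustrative assumptions.
    """
    to_split, to_retire = [], []
    for term in walk(root):
        if term.narrower:
            continue  # only terms without narrower terms are assessed
        if term.usage_count > split_above:
            to_split.append(term.label)
        elif term.usage_count < retire_below:
            to_retire.append(term.label)
    return to_split, to_retire
```

The output is a list of candidates for the taxonomist to review, not a set of automatic changes. A rarely used term may still be worth keeping if new content on that topic is expected.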