Friday, August 26, 2016

Synonyms, Alternate Labels, and Nonpreferred Terms

"Synonyms, Alternate Labels, and Nonpreferred Terms" is the title of my next conference presentation in October and in a different, briefer co-presented format as "How Many Synonyms Should You Have?" in November. So, now would be a good time to explore the topic in this blog.



"Synonyms" is the simple, nonexpert designation to the different names for the same term or concept in a taxonomy or other kind of controlled vocabulary. This is an over-simplification, for what may be involved is far more than just synonyms. Synonyms are words with the same meaning, but a taxonomy comprises terms that are typically phrases, often of two or three words, not just words. Furthermore, synonyms by definition have identical meaning, but in a taxonomy, we can have multiple names for a concept that are merely "close enough" in meaning to function as desired.

"Alternate labels" is a much better designation and is the nomenclature adopted by SKOS-compliant vocabularies. SKOS, which stands for Simply Knowledge Organization System, is a recommended standard of the World Wide Web Consortium for the application of the RDF (Resource Description Framework) interoperability format. Alternate labels refer to "concepts" which are known by their "preferred labels." You could certainly use the designation of "alternate labels" even if the controlled vocabulary or taxonomy is not SKOS compliant, and I have seen that sometimes.

"Nonpreferred terms" is the nomenclature of the thesaurus standard described in either ANSI/NISO Z39-19 Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies or ISO 25964 Thesauri and interoperability with other vocabularies, Part 1: Thesauri for Information Retrieval. Trained taxonomists, especially those with a library science background, are most familiar and comfortable with this designation, but its meaning is obviously not as intuitive to non-taxonomists.

Although the aforementioned three designations are the most common, there are others out there. I have run into the use of the following: Aliases, Alternate terms, Cross-references, Entry terms, Equivalent terms, Keywords, Nondescriptors, Non-postable terms, NPTs, See references, Use for terms, Use references, Used for terms, and Variants.

Taxonomy/thesaurus/ontology management software that supports the SKOS standard will typically use the SKOS designation of "alternate label," and software that supports the thesaurus standards will typically use the language of "nonpreferred term." In software that supports both standards, which is becoming increasingly common, "alternate labels" or "alternate terms" is more common than "nonpreferred terms," and other designations might be used, such as "variants." So, for want of an unambiguous single-word designation, I will refer to these as "variants" for the remainder of this post.

Techniques for creating variants


Since synonyms are for single words, and most taxonomy terms are multi-word phrases, a common technique is to substitute a synonym for one word of a multi-word phrase. For example, Movie reviews and Film reviews are variants of each other.
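The substitution technique is mechanical enough to sketch in a few lines of Python. This is only an illustration under my own assumptions: the function name substitution_variants and the small word-level synonym table are hypothetical, and a taxonomist would still review each generated candidate rather than accept it automatically.

```python
def substitution_variants(term, word_synonyms):
    """Generate candidate variants by swapping one word at a time for a synonym."""
    words = term.split()
    variants = []
    for i, word in enumerate(words):
        for synonym in word_synonyms.get(word.lower(), []):
            candidate = " ".join(words[:i] + [synonym] + words[i + 1:])
            # Capitalize the first letter to match the taxonomy's term style.
            variants.append(candidate[0].upper() + candidate[1:])
    return variants


# Hypothetical single-word synonym table.
WORD_SYNONYMS = {"movie": ["film"]}

print(substitution_variants("Movie reviews", WORD_SYNONYMS))  # ['Film reviews']
```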

Variants that are not exactly synonyms would also include technical and layperson language, such as Neoplasms and Cancer; older and newer designations, such as Near East and Middle East; and lexical variants, such as Hair loss and Baldness. Experts will tell you that none of these pairs are true synonyms. They are sufficiently equivalent, though, for most taxonomies.

This brings us to another important point. Variants should be roughly equivalent within the context of the taxonomy and the body of content it is used to index. What serves well as a variant in one taxonomy might not be suitable for the same term in another taxonomy.

The number of variants to create for each taxonomy term/concept depends on the search technology and on the display of the taxonomy in the user interface. While a taxonomy could be browsed, it is more common for a taxonomy to be searched. The user searches for terms within the taxonomy, matching search strings against any variant of a term, if not the preferred term itself. The search does not have to be an exact match: it may match taxonomy terms that contain at least the same words (in any order) and grammatically stemmed versions of the words (such as education and educational). With this in mind, taxonomists do not need to create variants for every possible variation of a term, as the search technology will be able to take care of some of that.
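As a rough illustration of why the search technology reduces the number of variants needed, here is a small Python sketch of word-order-independent, stemmed matching. Everything in it is an assumption for demonstration: the stem function is a deliberately crude suffix stripper standing in for a real stemmer, and the LABELS table (including the term Educational policy) is made-up sample data, not taken from any actual taxonomy or search product.

```python
import re


def stem(word):
    """Crude stand-in for a real stemmer: strip a trailing 'al' or 's'."""
    for suffix in ("al", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text):
    """Lowercase, split into words, stem each word, and ignore word order."""
    return frozenset(stem(w) for w in re.findall(r"[a-z0-9]+", text.lower()))


def search(query, labels_to_preferred):
    """Return preferred terms whose labels contain all of the query's stemmed words."""
    wanted = normalize(query)
    if not wanted:
        return []
    return sorted({
        preferred
        for label, preferred in labels_to_preferred.items()
        if wanted <= normalize(label)
    })


# Hypothetical sample: each label (preferred or variant) maps to its preferred term.
LABELS = {
    "Educational policy": "Educational policy",
    "Hair loss": "Hair loss",
    "Baldness": "Hair loss",
}

print(search("policy education", LABELS))  # ['Educational policy']
print(search("baldness", LABELS))          # ['Hair loss']
```

In this sketch, no one had to enter "Education policy" or "Policy, educational" as variants, but Baldness still had to be added by hand, because no amount of stemming or reordering turns it into Hair loss.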

As for sources for variants, other than the taxonomist's own knowledge of language, any term variations in sample source documents to be indexed should be considered. If content to be indexed with the taxonomy has already been published, and users have been searching for it on a website or content management system, the user-entered search strings found in the search logs can be an excellent source for variant terms. External reference sources and similar taxonomies can be consulted as a source, but should not be relied upon as the primary source for variants.
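Mining search logs for candidate variants can start very simply. The Python sketch below counts user-entered search strings and lists those that recur but do not yet exist as a label in the taxonomy. The function name candidate_variants, the min_count threshold, and the sample log entries are all hypothetical; real logs would need more cleanup, and each candidate still needs a taxonomist's judgment about which concept, if any, it belongs to.

```python
from collections import Counter


def candidate_variants(search_log, known_labels, min_count=2):
    """List recurring search strings that are not already labels in the taxonomy."""
    known = {label.casefold() for label in known_labels}
    counts = Counter(q.strip().casefold() for q in search_log if q.strip())
    return [
        (query, n)
        for query, n in counts.most_common()
        if query not in known and n >= min_count
    ]


# Made-up sample data for illustration only.
search_log = ["baldness", "Baldness", "hair loss", "alopecia", "alopecia ", "hair thinning"]
known_labels = ["Hair loss", "Baldness"]

print(candidate_variants(search_log, known_labels))  # [('alopecia', 2)]
```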