Thursday, December 31, 2015

Vocabularies and Controlled Vocabularies

I have long considered a taxonomy as a particular, structured kind of controlled vocabulary. More recently, however, I have been hearing of “vocabularies” without the word “controlled” in front, although still for the purposes of information management and retrieval, which is cause to wonder: are controlled vocabularies and vocabularies the same thing or not?

Controlled Vocabularies


Definition

Standards drive both the definitions and the scope of meaning. “Controlled vocabularies” have been most authoritatively defined and scoped by ANSI/NISO Z39.19-2005, Guidelines for the construction, format, and management of monolingual controlled vocabularies. The standard’s glossary defines a controlled vocabulary as “A list of terms that have been enumerated explicitly.” Vocabulary control is an important part of that definition: synonyms are linked together, homographs are distinguished, and concepts are defined or scoped so that they are unambiguous.
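As a rough illustration of what vocabulary control means in practice, here is a minimal Python sketch. It is not taken from either standard: the `Concept` class, the `build_index` function, and the sample terms are all hypothetical, chosen only to show synonyms being linked to one preferred label and homographs being told apart by a qualifier.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One concept in a hypothetical controlled vocabulary."""
    preferred: str                      # the one consistent label for the concept
    qualifier: str = ""                 # tells homographs apart, e.g. "birds"
    synonyms: set = field(default_factory=set)  # entry terms linked to the concept
    scope_note: str = ""                # defines or scopes the concept

    @property
    def label(self) -> str:
        # Homographs get a parenthetical qualifier; other terms stand alone.
        if self.qualifier:
            return f"{self.preferred} ({self.qualifier})"
        return self.preferred


def build_index(concepts):
    """Map every term (preferred or synonym), lowercased, to concept labels."""
    index = {}
    for concept in concepts:
        for term in {concept.preferred, *concept.synonyms}:
            index.setdefault(term.lower(), []).append(concept.label)
    return index


# Illustrative sample data (an assumption, not from any published vocabulary).
concepts = [
    Concept("Cranes", "birds", synonyms={"Gruidae"}),
    Concept("Cranes", "lifting equipment", synonyms={"Hoisting cranes"}),
    Concept("Automobiles", synonyms={"Cars", "Motor cars"}),
]
index = build_index(concepts)

print(index["cars"])    # ['Automobiles']
print(index["cranes"])  # ['Cranes (birds)', 'Cranes (lifting equipment)']
```

A synonym such as “Cars” resolves to a single preferred label, while the homograph “Cranes” resolves to two distinct, qualified concepts, so an indexer or searcher has to choose between them.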

Although “controlled vocabularies” is not part of the standard’s name, ISO 25964 Thesauri and interoperability with other vocabularies (parts 1 and 2 published in 2011 and 2013) also defines controlled vocabularies in its glossary, where it states that a controlled vocabulary is a “prescribed list of terms, headings or codes, each representing a concept.” It also notes: “Controlled vocabularies are designed for applications in which it is useful to identify each concept with one consistent label, for example when classifying documents, indexing them and/or searching them.”

Scope
As for what is included within the scope of controlled vocabularies, ANSI/NISO Z39.19-2005 states in its Scope section, on the first page, that controlled vocabularies include:
  • Lists of controlled terms
  • Synonym rings
  • Taxonomies
  • Thesauri
In ISO 25964, the scope of controlled vocabularies is less clear. The glossary definition for controlled vocabulary states: “Thesauri, subject heading schemes and name authority lists are examples of controlled vocabularies,” but a complete list of controlled vocabularies is not presented.

What is significant is that ISO 25964 does make a distinction between “controlled vocabulary” and just “vocabulary.” ISO 25964 describes more kinds of vocabularies, but then addresses the issue of vocabulary control in each. Types of vocabularies that ISO 25964 discusses as having vocabulary control are:
  • Thesauri
  • Classification schemes
  • Classification schemes for records management
  • Taxonomies
  • Subject heading schemes
  • Name authority lists
According to ISO 25964 part 2, terminologies and ontologies usually have vocabulary control, but vocabulary control is not a requirement. So, it can be inferred that most, but not all, terminologies (discussed in my last blog post) and ontologies are controlled vocabularies. Name authority lists are “usually controlled vocabularies” according to ISO 25964 part 2 (section 23.1.1). Synonym rings do not have vocabulary control (section 24.2.3).

Structured Vocabularies


Definition

Another, less commonly used designation is “structured vocabulary.” It appears in the name of the British Standard, BS 8723 Structured vocabularies for information retrieval – Guide. BS 8723 was published in five parts between 2005 and 2008, revising and expanding on the earlier BS and ISO standards for monolingual and multilingual thesauri, and, in turn, became the basis for the current ISO 25964 pair of standards.

ISO 25964 also includes “structured vocabulary” in its glossary, defined as an “organized set of terms, headings or codes representing concepts and their inter-relationships, which can be used to support information retrieval,” and goes on to note: “A structured vocabulary can also be used for other purposes. In the context of information retrieval, the vocabulary needs to be accompanied by rules for how to apply the terms.” Meanwhile, ANSI/NISO Z39.19-2005 does not mention “structured vocabularies.”

Scope
The scope of structured vocabularies is not stated as clearly. Based on the title of BS 8723, Structured vocabularies for information retrieval – Guide, it can be assumed that all the vocabularies covered by the standard are “structured vocabularies.” These are:
  • Thesauri
  • Classification schemes
  • Business classification schemes for records management
  • Taxonomies
  • Subject heading schemes
  • Ontologies
  • Authority lists
ISO 25964 seems to use “vocabularies” and “structured vocabularies” somewhat interchangeably. While the standard’s title refers to “thesauri and … other vocabularies,” its foreword states “ISO 25964-2 will cover interoperability between different thesauri and with other types of structured vocabulary, such as classification schemes, name authority lists, ontologies, etc.”

If all the types of vocabularies in part 2 are indeed considered “structured vocabularies,” then the scope of structured vocabularies would cover:
  • Thesauri
  • Classification schemes
  • Classification schemes for records management
  • Taxonomies
  • Subject heading schemes
  • Ontologies
  • Terminologies
  • Name authority lists
  • Synonym rings
The last two, however, might not be included as structured vocabularies. ISO 25964 part 2 says that name authority lists “may also be structured vocabularies” (23.1.1), implying that they are not always structured vocabularies, and it also explains that synonym rings are “not hierarchically structured.”
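The overlap between the two designations can be worked out with simple set operations. The following Python sketch is only my own reading of the lists above, not a classification given by ISO 25964 itself; the variable names are hypothetical, and the two marginal types are set aside as “not always structured.”

```python
# Types that ISO 25964 discusses as having vocabulary control
# (as listed earlier in this post).
controlled = {
    "thesauri",
    "classification schemes",
    "classification schemes for records management",
    "taxonomies",
    "subject heading schemes",
    "name authority lists",
}

# All types covered in ISO 25964 part 2, read here as candidate
# structured vocabularies (an assumption based on the foreword).
structured_candidates = {
    "thesauri",
    "classification schemes",
    "classification schemes for records management",
    "taxonomies",
    "subject heading schemes",
    "ontologies",
    "terminologies",
    "name authority lists",
    "synonym rings",
}

# Types the standard hedges on: "may also be structured" or
# "not hierarchically structured".
not_always_structured = {"name authority lists", "synonym rings"}

firmly_structured = structured_candidates - not_always_structured

both = controlled & firmly_structured             # controlled and structured
controlled_only = controlled - firmly_structured  # controlled, structure uncertain
structured_only = firmly_structured - controlled  # structured, control not required

print(sorted(both))
print(sorted(controlled_only))
print(sorted(structured_only))
```

Under these assumptions, thesauri, taxonomies, subject heading schemes, and both kinds of classification scheme fall in both sets, name authority lists are controlled but not firmly structured, and ontologies and terminologies are structured without vocabulary control being required.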

Vocabularies


The simple one-word designation “vocabulary,” when used in the context of support for information retrieval, comprises all controlled and structured vocabularies. That includes those at the margins of the definitions, which do not always meet the strict requirements for controlled or structured vocabularies, such as ontologies, terminologies, name authority lists, and synonym rings, along with other flat (unstructured) term lists.

Vocabularies, not necessarily controlled or structured, are also what other frameworks and web contexts refer to, such as SKOS (Simple Knowledge Organization System) vocabularies, Semantic Web vocabularies, and Linked Open Vocabularies.

It is interesting to note which other topics are discussed when ISO 25964 part 2, Interoperability with other vocabularies, uses the term “controlled vocabulary” as opposed to “vocabulary” alone. Controlled vocabularies are discussed in the context of entry terms, pre-coordination, post-coordination, near synonyms, and indexing. Vocabularies in general are discussed in the context of equivalence mapping, interoperability, resources and authorities, registries, multilingual types, and management software/systems.

Conclusions


Taxonomies, thesauri, subject heading schemes, and classification schemes are both controlled vocabularies and structured vocabularies. Most controlled vocabularies are structured vocabularies, and almost all structured vocabularies are controlled vocabularies. But there are other vocabularies that do not meet the criteria of one definition or the other, and to recognize and include them, especially as resources or for the mapping of terms, we refer to them as just vocabularies.