Saturday, September 30, 2017

Vocabulary Management Issues



“Issues in Vocabulary Management” is the latest Technical Report (TR-06-2017) published by the National Information Standards Organization (NISO), approved on September 25, 2017. I had the honor of serving on its working group, specifically on its subgroup for Vocabulary Use/Reuse.

The most significant NISO publication for controlled vocabularies is ANSI/NISO Z39.19-2005 (R2010) Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies, which is referenced several times in TR-06. ANSI/NISO Z39.19 focuses on how to design and create controlled vocabularies (especially thesauri and taxonomies), whereas TR-06 addresses issues in the use of controlled vocabularies. Furthermore, as a Technical Report rather than a Standard, this 49-page document does not contain requirements, but rather serves an informative purpose. It does have a page of recommendations, though, covering a vocabulary’s definition and attribute types, best practices for its documentation, and its licensing or provisions for use and reuse.

Over time, the need to create new controlled vocabularies from scratch diminishes as more vocabularies come into existence, especially those made available for sharing or licensing (see my blog post Directories and Databases of Published Controlled Vocabularies). The need to maintain, revise, and reuse them grows, however, so this Technical Report serves a valuable role.

What are the “issues” in vocabulary management? They could vary, based on the organization and implementation, but this document considers three areas:

  • Vocabulary use and reuse, dealing with permissions, licenses, maintenance, versioning, extending and mapping vocabularies.
  • Vocabulary documentation, dealing with governance issues and how to document vocabulary properties.
  • Vocabulary preservation, dealing with issues of abandoned or “orphaned” vocabularies, which is especially common among vocabularies developed by nonprofit organizations that have lost their funding to maintain them.

These issues are relevant to both proprietary controlled vocabularies, which may be reused through licensing agreements, and publicly available vocabularies, which are increasingly shared and reused through linked data on the web, or more specifically the Semantic Web and the Linked Open Data environment. For publicly available or open vocabularies, there are also the issues of finding or discovering suitable and sustainable vocabularies, evaluating them, and then communicating between the vocabulary owner and user.

TR-06 takes a somewhat broader view of “vocabularies,” covering not just “controlled vocabularies” but also ontologies, unstructured term lists, terminologies, synonym rings, etc. I explored these differences and definitions in detail in my blog post Vocabularies and Controlled Vocabularies, which I wrote shortly after starting work on the NISO working group. The vocabularies of concern to TR-06 also include element sets, which comprise metadata properties/fields and not merely the controlled vocabulary terms/values within those properties.

TR-06 does not read much like a typical “technical report,” since it includes several real-life examples and use cases. To a certain extent, it explains by example. Appendices include a glossary of terms with extensive definitions; a descriptive list of vocabulary directories, repositories, or collections (something that I worked on); a list of free and open vocabulary tools (far more extensive than those I described in a previous blog post, Free Taxonomy Management Software); and a list of additional resources with links, besides its bibliography, making this quite a valuable resource.

TR-06 “Issues in Vocabulary Management” will now be added to my list of recommended resources for controlled vocabulary and taxonomy management, and I hope that many of those who manage taxonomies will take a look at it.