The Accidental Taxonomist: NISO


Saturday, September 30, 2017

Vocabulary Management Issues

“Issues in Vocabulary Management” is the latest Technical Report (TR-06-2017) published by the National Information Standards Organization (NISO), approved on September 25, 2017. I had the honor of serving on its working group, specifically on its subgroup for Vocabulary Use/Reuse.

The most significant NISO publication for controlled vocabularies is ANSI/NISO Z39.19-2005 (R2010) Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies, which is referenced several times in TR-06. ANSI/NISO Z39.19 focuses on how to design and create controlled vocabularies (especially thesauri and taxonomies), whereas TR-06 addresses issues in the use of controlled vocabularies. Furthermore, as a Technical Report rather than a Standard, this 49-page document does not contain requirements but serves an informative purpose. It does have a page of recommendations, though, covering a vocabulary’s definition and attribute types, best practices for documentation, and licensing or provisions for use and reuse.

Over time, the need to create new controlled vocabularies from scratch diminishes as more vocabularies come into existence, especially those that are made available for sharing or licensing (see my blog post Directories and Databases of Published Controlled Vocabularies). The need to maintain, revise, and reuse them grows, however, so this Technical Report serves a valuable role.

What are the “issues” in vocabulary management? They could vary, based on the organization and implementation, but this document considers three areas:

Vocabulary use and reuse, dealing with permissions, licenses, maintenance, versioning, extending and mapping vocabularies.
Vocabulary documentation, dealing with governance issues and how to document vocabulary properties.
Vocabulary preservation, dealing with issues of abandoned or “orphaned” vocabularies, which is especially common for vocabularies developed by nonprofit organizations that have lost the funding to maintain them.

These issues are relevant to both proprietary controlled vocabularies, which may be reused through licensing agreements, and publicly available vocabularies, which are shared and reused increasingly through linked data on the web, or more specifically the Semantic Web and the Linked Open Data environment. For publicly available or open vocabularies there are also the issues of simply finding suitable and sustainable vocabularies, evaluating them, and then communicating between the vocabulary owner and user.

TR-06 takes a somewhat broader view of “vocabularies,” covering not just “controlled vocabularies” but also ontologies, unstructured term lists, terminologies, synonym rings, etc. I explored these differences and definitions in detail in my blog post Vocabularies and Controlled Vocabularies, which I wrote shortly after starting work on the NISO working group. The vocabularies of concern to TR-06 also include element sets, which comprise metadata properties/fields and not merely the controlled vocabulary terms/values within those properties.

TR-06 does not read much like a “technical report.” It includes several real-life examples and use cases and, to a certain extent, explains by example. Appendices include a glossary of terms with extensive definitions; a descriptive list of vocabulary directories, repositories or collections (something that I worked on); a list of free and open vocabulary tools (far more extensive than those I described in a previous blog post Free Taxonomy Management Software); and a list of additional resources with links, besides its bibliography, making this quite a valuable resource.

TR-06 “Issues in Vocabulary Management” will now be added to my list of recommended resources for controlled vocabulary and taxonomy management, and I hope that many of those who manage taxonomies will take a look at it.

Sunday, June 18, 2017

Standards for Taxonomies

Since “taxonomies” are rather loosely defined, standards specifically for taxonomies do not exist, but there are standards that are relevant to taxonomies. A taxonomy is a kind of controlled vocabulary, and there are standards for controlled vocabularies. There are also standards specifically for thesauri, a kind of controlled vocabulary with which taxonomies typically share many features.

Standards serve various purposes. Two leading purposes for standards are:

To ensure consistency and ease of use across different products or systems used by different users.
To ensure interoperability, the sharing or exchange of products/services/information.

Standards for Consistency

Standards aimed at ensuring consistency and ease of use would include those for buttons on devices, menus in user interfaces, and pedals in cars. With such standards, users can expect the same experience from manufacturers or service providers, and thus they are able to easily use products or systems from different manufacturers/providers/vendors. In the case of information systems, this kind of standard includes those for the design and style of book indexes and thesauri. These “standards” tend to be guidelines, recommendations, or accepted conventions, and not exactly strict standards, even if issued by a standards body. For thesauri, the “standard” is issued by the National Information Standards Organization (NISO), but it is called a “guideline”: ANSI/NISO Z39.19 Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies. The corresponding ISO standard is ISO 25964 Part 1: Thesauri for Information Retrieval.

These guidelines cover style and form of terms, circumstances for creating the various kinds of relationships between terms, use of notes on terms, etc. They are all about how to create well-formed thesauri with consistent design features that are then easy and intuitive to use. For example, when a user sees that two terms are in a hierarchical relationship, the user understands that the narrower term is a kind of, instance of, or integral part of the broader term, and not merely an aspect of or some other related concept of the broader term. In fact, the end-user of a thesaurus does not even need to know and understand thesaurus principles to be able to make use of a thesaurus to find desired concepts and content.
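The hierarchical relationship described above can be illustrated with a minimal Python sketch. It is not taken from ANSI/NISO Z39.19 or ISO 25964; the class name Thesaurus, its methods, and the sample terms are all hypothetical, chosen only to show reciprocal broader/narrower (BT/NT) links.

```python
from collections import defaultdict


class Thesaurus:
    """Tiny in-memory thesaurus with reciprocal BT/NT links (illustrative only)."""

    def __init__(self):
        self.broader = defaultdict(set)   # term -> its broader terms (BT)
        self.narrower = defaultdict(set)  # term -> its narrower terms (NT)

    def add_hierarchy(self, narrower_term, broader_term):
        # Record both directions so the relationship is always reciprocal.
        self.broader[narrower_term].add(broader_term)
        self.narrower[broader_term].add(narrower_term)

    def ancestors(self, term):
        """Return every term reachable by following BT links upward."""
        found = set()
        stack = [term]
        while stack:
            current = stack.pop()
            for parent in self.broader.get(current, ()):
                if parent not in found:
                    found.add(parent)
                    stack.append(parent)
        return found


# Hypothetical sample terms: each narrower term is "a kind of" its broader term.
thesaurus = Thesaurus()
thesaurus.add_hierarchy("Poodles", "Dogs")
thesaurus.add_hierarchy("Dogs", "Mammals")
print(sorted(thesaurus.ancestors("Poodles")))  # ['Dogs', 'Mammals']
```

Because every narrower term is a kind of, instance of, or part of its broader term, following the links upward always yields concepts that legitimately include the starting term, which is what makes such hierarchies intuitive to browse.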

Standards for Interoperability

The other kind of standards, those aimed at ensuring interoperability, would include standards for size and units of measure, data exchange, and communications protocols. Interoperability standards are important for those controlled vocabularies which are intended to be shared or reused. Thus, the content to which controlled vocabularies link can be accessed by third parties or made publicly accessible over the Web. Controlled vocabularies may be “reused” if the original creator of a controlled vocabulary decides to license the vocabulary (without linked content) to other publishers to use on their own content, so that the second publisher does not have to reinvent a controlled vocabulary that already exists in the same subject area.

Interoperability standards for controlled vocabularies include ZThes (a thesaurus schema for XML, which has since gone out of style), World Wide Web Consortium (W3C) specifications for the Semantic Web including SKOS (Simple Knowledge Organization System) and the Web Ontology Language (OWL) for ontologies, and ISO 25964 Part 2: Interoperability with other vocabularies. Indeed, ISO 25964 covers consistency standards in its first part and interoperability standards in its second part.
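To give a sense of what SKOS looks like in practice, here is a minimal Python sketch that writes one concept in Turtle syntax using only string formatting. The SKOS namespace and the properties skos:prefLabel, skos:altLabel, and skos:broader are part of the W3C specification; the ex: namespace, the function name concept_to_turtle, and the sample terms are assumptions made for illustration. The sketch does not escape special characters in labels, so a real project would more likely rely on an RDF library.

```python
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."
EX_PREFIX = "@prefix ex: <http://example.org/vocab/> ."  # hypothetical namespace


def concept_to_turtle(concept_id, pref_label, alt_labels=(), broader=()):
    """Serialize one concept as a SKOS Turtle statement (labels assumed plain English text)."""
    lines = [f"ex:{concept_id} a skos:Concept ;"]
    lines.append(f'    skos:prefLabel "{pref_label}"@en ;')
    for label in alt_labels:
        lines.append(f'    skos:altLabel "{label}"@en ;')
    for parent in broader:
        lines.append(f"    skos:broader ex:{parent} ;")
    # Replace the trailing " ;" on the last line with " ." to close the statement.
    lines[-1] = lines[-1][:-2] + " ."
    return "\n".join(lines)


# Hypothetical concept with one non-preferred label and one broader concept.
turtle = "\n".join([
    SKOS_PREFIX,
    EX_PREFIX,
    "",
    concept_to_turtle("dogs", "Dogs", alt_labels=["Canines"], broader=["mammals"]),
])
print(turtle)
```

Any system that understands SKOS can read such output and recognize the preferred label, the variant, and the broader concept without knowing anything about the tool that produced it.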

Metadata Schema

Since taxonomies or other controlled vocabularies may be used to provide terms that fill a certain metadata element/property/field within a larger set of metadata, the use of a standard metadata schema or model is yet another way in which interoperability standards involve taxonomies. If structured content is to be shared or exchanged, the metadata fields need to be standardized with the same names, abbreviations, and purposes.

Examples of standard metadata schema include MARC for library materials, Dublin Core (DCMI) for generic online networked resources, IPTC (International Press Telecommunications Council) for photographs and other media, DDI (Data Documentation Initiative) for describing data from the social sciences, and PREMIS (Preservation Metadata: Implementation Strategies) for repositories of digital objects. Adopting such a metadata schema would be another way to enable sharing of content tagged with the metadata.
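The way a controlled vocabulary fills a field within a metadata schema can be sketched in a few lines of Python. The field names title, date, and subject are Dublin Core elements; the vocabulary contents, the function name validate_record, and the record layout (a dictionary of lists) are hypothetical choices for this example only.

```python
# Hypothetical controlled vocabulary for the Dublin Core "subject" element.
SUBJECT_VOCABULARY = {"Taxonomies", "Thesauri", "Metadata"}


def validate_record(record, controlled_fields):
    """Return (field, value) pairs whose value is not in that field's vocabulary."""
    problems = []
    for field, allowed in controlled_fields.items():
        for value in record.get(field, []):
            if value not in allowed:
                problems.append((field, value))
    return problems


# A record using Dublin Core element names; each element may repeat.
record = {
    "title": ["Standards for Taxonomies"],
    "date": ["2017-06-18"],
    "subject": ["Taxonomies", "Standards"],
}

problems = validate_record(record, {"subject": SUBJECT_VOCABULARY})
print(problems)  # [('subject', 'Standards')]
```

Only the subject field is checked against a vocabulary here; free-text fields such as title pass through unchanged, which mirrors how a schema typically mixes controlled and uncontrolled elements.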

I was pleased to have the opportunity to learn more about information and publishing standards recently at the Society for Scholarly Publishing conference, where I attended the pre-meeting seminar “All About Standards.”