Sunday, June 18, 2017

Standards for Taxonomies



Since “taxonomies” are rather loosely defined, standards specifically for taxonomies do not exist, but there are standards that are relevant to taxonomies. A taxonomy is a kind of controlled vocabulary, and there are standards for controlled vocabularies. There are also standards specifically for thesauri, a kind of controlled vocabulary with which taxonomies typically share many features. 

Standards serve various purposes. Two leading purposes for standards are:
  1. To ensure consistency and ease of use across different products or systems used by different users.
  2. To ensure interoperability, the sharing or exchange of products/services/information.

Standards for Consistency

Standards aimed at ensuring consistency and ease of use would include buttons on devices, menus in user interfaces, pedals in cars. With such standards, users can expect the same experience from manufacturers or service providers and thus they are able to easily use products or systems from different manufacturers/providers/vendors. In the case of information systems, this kind of standard includes those for the design and style of book indexes and thesauri. These “standards” tend to be guidelines, recommendations, or accepted conventions, and not exactly strict standards, even if issued from a standards body. For thesauri, the “standard” is issued by the NationalInformation Standards Institute (NISO), but it is called a "guideline”: ANSI/NISOZ.39.19 Guidelines for the Construction of Monolingual Controlled Vocabularies. The corresponding ISO standard is ISO 25964 Part 1: Thesauri for Information Retrieval.

These guidelines cover style and form of terms, circumstances for creating the various kinds of relationships between terms, use of notes on terms, etc. They are all about how to create well-formed thesauri with consistent design features that are then easy and intuitive to use. For example, when a user sees that two terms are in a hierarchical relationship, the user understands that the narrower term is a kind of, instance of, or integral part of the broader term, and not merely an aspect of or some other related concept of the broader term. In fact, the end-user of a thesaurus does not even need to know and understand thesaurus principles to be able to make use of a thesaurus to find desired concepts and content.

Standards for Interoperability

The other kind of standards, those aimed at ensuring interoperability, would include standards for size and units of measure, data exchange, and communications protocols. Interoperability standards are important for those controlled vocabularies which are intended to be shared or reused. Thus, the content to which controlled vocabularies link can be accessed by third parties or made publicly accessible over the Web. Controlled vocabularies may be “reused”, if the original creator of a controlled vocabulary decides to license the vocabulary (without linked content) to other publishers to use on their own content, so that the second publisher does not have to reinvent a controlled vocabulary that already exists in same subject area.

Interoperability standards for controlled vocabularies include ZThes (a thesaurus schema for XML, which is has since gone out of style), World Wide Web Consortium (W3C) specifications for the Semantic Web including SKOS (Simple KnowledgeOrganization System) and the Web Ontology Language (OWL) for ontologies, and ISO 25964 Part 2: Interoperability with other vocabularies. Indeed, ISO 25964 covers consistency standards in its first part and interoperability standards in its second part. 

Metadata Schema

Since taxonomies or other controlled vocabularies may be used to provide terms that fill a certain metadata element/property/field within a larger set of metadata, the use of a standard metadata schema or model is yet another way in which interoperable standards involve taxonomies.  If structured content is to be shared or exchanged, the metadata fields need to be standardized with the same names, abbreviations, and purposes.

Examples of standard metadata schema include MARC for library materials, Dublin Core (DCMI) for generic online networked resources, IPTC (International Press Telecommunications Council) for photographs and other media, DDI (Data Documentation Initiative) for describing data from the social sciences, and PREMIS (Preservation Metadata: Implementation Strategies) for repositories of digital objects. Adopting such a metadata schema would be another way to enable sharing of content tagged with the metadata.

I was pleased to have the opportunity to learn more about information and publishing standards recently at the Society for Scholarly Publishing conference in attending pre-meeting seminar “All About Standards.”