Showing posts with label Taxonomy standards. Show all posts

Sunday, November 9, 2025

Schema Vocabularies and Value Vocabularies

There are different types of controlled vocabularies for information and knowledge management. Usually, we think of the various kinds of controlled vocabularies for purposes of tagging and finding information, such as term lists, authority files, thesauri, and taxonomies. In the broader context of information and knowledge management, there also exist higher-level controlled vocabularies called schema vocabularies. In this context, the better known (default) controlled vocabularies comprising specific concepts or terms for tagging content are called value vocabularies, since their terms/concepts are considered values.

This dichotomy of schema and value vocabularies occurs particularly within the context of metadata. Metadata management comprises two components: (1) a list of metadata types, also called elements, properties, or fields; and (2) the terms or values possible for each metadata element. I discussed types of metadata in more detail in my last blog post, "Types of Metadata Schema." Thus, a schema vocabulary comprises the names of metadata elements, and a value vocabulary is a list of terms/concepts for a specific metadata element. For example, a schema vocabulary might include Country, Language, Source, and Topic; and the multiple value vocabularies would be the lists of approved countries, languages, sources, and topics. It should be noted that in some systems, e.g. RDF, OWL, etc., the distinction between metadata elements and metadata values can be fuzzy. Furthermore, not all schema vocabulary elements have a corresponding value vocabulary (a controlled vocabulary), as some metadata elements may be for such values as title, description, and date.
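The split between the two kinds of vocabulary can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the element names come from the example above, but the approved terms, the names SCHEMA_VOCABULARY and VALUE_VOCABULARIES, and the validate_record function are all hypothetical and not taken from any real system.

```python
# Schema vocabulary: the names of the metadata elements.
# Title, Description, and Date are elements without a value vocabulary.
SCHEMA_VOCABULARY = {
    "Country", "Language", "Source", "Topic",
    "Title", "Description", "Date",
}

# Value vocabularies: the approved terms for each controlled element.
# These terms are illustrative placeholders, not a real vocabulary.
VALUE_VOCABULARIES = {
    "Country": {"France", "Germany", "Spain"},
    "Language": {"French", "German", "Spanish"},
    "Source": {"Newsletter", "Report"},
    "Topic": {"Metadata", "Taxonomies"},
}


def validate_record(record):
    """Return a list of problems found in a record (dict of element -> value)."""
    problems = []
    for element, value in record.items():
        if element not in SCHEMA_VOCABULARY:
            # The element name itself is not part of the schema vocabulary.
            problems.append(f"unknown element: {element}")
        elif element in VALUE_VOCABULARIES and value not in VALUE_VOCABULARIES[element]:
            # The element is controlled, and the value is not an approved term.
            problems.append(f"value not in {element} vocabulary: {value}")
    return problems
```

In this sketch, a record such as {"Title": "Any free text", "Country": "France"} passes, because Title has no value vocabulary and France is an approved Country term, while an unapproved country name or an element name outside the schema is reported as a problem.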

In my observation, we speak of “vocabularies” rather than “controlled vocabularies” in this context, especially with respect to schema, for various reasons. Schema vocabularies are referred to simply as “vocabularies,” rather than “controlled vocabularies,” because they are not traditional controlled vocabularies used for tagging, and also because their “control” is different from the control of value vocabularies. Value vocabularies can be changed, but only through defined policies and procedures, which depend on the implementation and ownership, and changes can be frequent, e.g. weekly, monthly, quarterly, or annually. Schema vocabularies, on the other hand, are intended to be standard, and are updated only very infrequently, such as once every 5-10 years, and usually by a standards body. Schema vocabularies provide control by their very nature. Meanwhile, it is often necessary to call out the controlled feature of value vocabularies, since some metadata properties may have uncontrolled keywords as their values.

Schema vocabularies may be metadata schema, such as Dublin Core (for published resources) or IPTC metadata (for photos), but other kinds of information and content management schema can also be considered as schema vocabularies in that a “vocabulary” defines the various elements. Such other schema vocabularies include SKOS (Simple Knowledge Organization System), DCAT (Data Catalog Vocabulary), and iiRDS (intelligent information Request and Delivery Standard), among others. Our panel, “Using Schema and Value Vocabularies to Provide Consistency Across Structured Content,” at the recent DCMI (Dublin Core Metadata Initiative) conference in Barcelona in October addressed these schema as well as other data frameworks, such as OWL and DITA, which are similar to but not the same as schema. The other speakers were Joseph Busch, who had the idea of this topic for a conference panel, Lief Erickson, Noz Urbina, and Peter Winstanley.

DCMI 2025 Panel: "Schema and Value Vocabularies for Consistency"

My presentation on the DCMI panel was "Schema and Value Vocabularies for Thesauri and Taxonomies," which explained that SKOS is a schema vocabulary, and specific SKOS-based taxonomies and thesauri are value vocabularies. SKOS (Simple Knowledge Organization System) is the W3C data model schema for knowledge organization systems, especially taxonomies and thesauri. It can also be considered a schema vocabulary, because it has standard elements with defined display names and machine-readable concatenated forms. In fact, the designation “elements” is what is used in the SKOS model. SKOS, however, is a special kind of schema vocabulary, and it’s not a metadata schema. When SKOS-based taxonomies or thesauri serve as the value vocabularies for metadata elements, those metadata elements are managed as specific SKOS Concept Schemes. In a faceted taxonomy, each Concept Scheme serves as a facet.
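To make the relationship between Concept Schemes and facets concrete, here is a minimal sketch in plain Python rather than RDF. The comments point to the corresponding SKOS elements (skos:prefLabel, skos:inScheme, skos:broader), but the Concept class, the facets function, and the sample concepts are hypothetical illustrations, not part of SKOS or of any taxonomy management tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Concept:
    pref_label: str                 # corresponds to skos:prefLabel
    in_scheme: str                  # corresponds to skos:inScheme
    broader: Optional[str] = None   # corresponds to skos:broader (label of the parent)


# Illustrative concepts only; each belongs to one Concept Scheme.
CONCEPTS = [
    Concept("Romance languages", "Language"),
    Concept("French", "Language", broader="Romance languages"),
    Concept("Knowledge organization", "Topic"),
    Concept("Taxonomies", "Topic", broader="Knowledge organization"),
]


def facets(concepts):
    """Group concept labels by Concept Scheme; each scheme acts as one facet."""
    grouped = {}
    for concept in concepts:
        grouped.setdefault(concept.in_scheme, []).append(concept.pref_label)
    return grouped
```

Calling facets(CONCEPTS) groups the labels under "Language" and "Topic", which mirrors how each Concept Scheme supplies the values for one metadata element.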

Taxonomists don’t usually think of vocabularies being classified as either "schema vocabularies" or "value vocabularies." However, as taxonomies have increasingly been integrated with metadata and serve purposes beyond just browsing, searching and retrieving content, it’s important to see the bigger picture of where taxonomies as value vocabularies fit in, and where taxonomies can provide more benefits.

Saturday, June 28, 2025

A Multilingual Thesaurus Standard


Standards for taxonomies are of two kinds:
1) data models for interoperability and machine-readability, namely SKOS (Simple Knowledge Organization System) published by the W3C, and
2) best practices guidelines, which focus on thesauri but are relevant for taxonomies. These are ANSI/NISO Z39.19 and ISO 25964. The International Organization for Standardization will publish a revised edition of ISO 25964 Part 1: Thesauri for Information Retrieval later this year. I have been contributing to the revision as a member of its international working group.

I have written before on Standards for Taxonomies, which is at a high level, and I will likely write again about the revisions in the new version of ISO 25964 Part 1 when it is published. For now, I’d like to discuss some of the specifics of defining an international standard, which I have been working on recently.

Different Language Versions

The international standard is written in English, and it will be translated into other languages in the future. Since it cannot be assumed that it will be translated into every language, and since the standard covers multilingual thesauri, it needs to include examples in different languages. Some of the examples within the sections of ISO 25964-1 are translated into common languages, such as French, German, and Spanish, but other languages are not included. Thus, this standard also includes an extensive table of the “tags” and “expansions,” or terminology, that appear in a thesaurus for 10 additional languages. Examples of tags include BT (Broader term), NT (Narrower term), and SN (Scope note).

A German reviewer pointed out some errors in the German column of the table, which prompted me to look more carefully, and I noticed some issues in the Russian and Arabic, which are languages I had studied long ago and which are not represented by native speakers in our working group. I then sought other sources on thesauri in those languages, examples of thesauri on the web, and native-speaker experts.

As it turns out, for the specialized use of thesauri, it’s not just a matter of translation, but of what is used in the context. “Scope note” could have various translations in a language, as both the words “scope” and “note” can have different translations. Even “broader,” “narrower,” and “related” can be translated differently. “Broad” can mean “wide,” and thus perhaps “superordinate” and “subordinate” are better translations in another language.

Variations and Lack of Standards

The thesaurus terminology is quite standardized in English and somewhat less so in other languages. Although the original ISO and German DIN thesaurus standards go back to 1974 and 1972 respectively, these standards have never been free and are actually rather expensive for the number of pages, unlike the ANSI/NISO standard, which has been made freely available since 1974. Thus, the free English-language standard from the United States has been more widely read and followed than the ISO standard. Creators of other standards sometimes translate from English, but inconsistently, rather than relying on a standard in their own language.

There are different reasons for such variations. Some thesaurus authors prefer to use terminology closer to English, while others prefer to use terminology that is more native, when near-synonyms exist. For example, in Russian, “related” could be “assotsiativny” (similar to associative) or “rodstvenny,” and “concept” could be “kontsept” or “ponyatiya.” There is also the matter of saving space with concise labels. While English has a single word each for “broader” and “narrower,” a correct translation of the comparative requires two words, as in “more broad” or “more narrow,” in other languages, such as French, Spanish, and Russian. Often the word for “more” is omitted to save space, but in other thesauri it is included for precision, such as inserting the word más in Spanish. Arabic-language thesauri additionally vary in their use of tags/terminology depending on the region within the Arabic-speaking world of 22 countries.

I found the multilingual UNESCO thesaurus and the UN library’s UNBIS thesaurus to be good sources to consult, since you can change not only the term display, but also the user interface with its tags and designations into different languages. However, these two UN-related sources are not even consistent with each other!

I suspect that in some thesauri the terminology was simply translated from English by a translator who was not familiar with thesauri, rather than developed by a thesaurus specialist/taxonomist who would research the formats of other thesauri in that language.

Legacy Standards and Future Direction

Thesauri were originally developed to be presented in print, where space is an issue, so short tags were created. Now thesauri are online, and two-letter tags are not needed and rarely displayed. But the new edition of the standard continues to include tags to be comprehensive and provide consistency with printed thesauri. However, it is my personal opinion that we should not invent comprehensive tags for all languages where they have not previously existed.

Should the standard be more descriptive or prescriptive? Descriptive would mean describing what is done in thesauri in existence. I looked up various thesauri online to see what tags and terminology they were using. If a certain designation is used more than another, such as the phrase used to mean “broader term,” then we could decide that is the standard for a language.

Prescriptive would mean to dictate the standard, typically based on expertise and belief in what would be best. In the face of inconsistencies, the standard should be prescriptive. Being prescriptive would also mean that the latest revision of the standard should try to follow the prior edition and any previous translations of it, rather than merely following the usage practice of the leading examples of thesauri on the Web. The conclusion was to include both.

Although the distinction between terms and concepts is addressed in the current ISO thesaurus standard, the current summary table of tags addresses only “terms” and term relationships. The nuance of term versus concept was discussed at length by the working group, and the conclusion was to include both: “concept” to recognize an idea, and “term” to be the representation of that concept. Thus, the table of tags and terminology in the new version will now include Broader concept, Narrower concept, and Related concept (which do not have tags). As new additions to the standard, the names for these in other languages thus need to be prescribed by the standard. Relying on thesauri published on the web, I found, results in too much inconsistency. Official translations of the SKOS data model are a good source for this, but the translations exist for only some languages. I even looked at the German user interface of a SKOS-based taxonomy management software (PoolParty) and found yet other translations for broader, narrower, and related that were not consistent with the official German SKOS translation.

I hope the new edition of ISO 25964 Part 1: Thesauri for Information Retrieval will be read more widely and provide more consistency for thesauri.