Sunday, November 9, 2025

Schema Vocabularies and Value Vocabularies

There are different types of controlled vocabularies for information and knowledge management. Usually, we think of the various kinds of controlled vocabularies for purposes of tagging and finding information, such as term lists, authority files, thesauri, and taxonomies. In the broader context of information and knowledge management, there also exist higher-level controlled vocabularies called schema vocabularies. In this context, the better known (default) controlled vocabularies comprising specific concepts or terms for tagging content are called value vocabularies, since their terms/concepts are considered values.

This dichotomy of schema and value vocabularies occurs particularly within the context of metadata. Metadata management comprises two components: (1) a list of metadata types, also called elements, properties, or fields; and (2) the terms or values possible for each metadata element. I discussed types of metadata in more detail in my last blog post, "Types of Metadata Schemas." Thus, a schema vocabulary comprises the names of metadata elements, and a value vocabulary is a list of terms/concepts for a specific metadata element. For example, a schema vocabulary might include Country, Language, Source, and Topic; and the multiple value vocabularies would be the lists of approved countries, languages, sources, and topics. It should be noted that in some systems, such as RDF and OWL, the distinction between metadata elements and metadata values can be fuzzy. Furthermore, not all schema vocabulary elements have a corresponding value vocabulary (a controlled vocabulary), as some metadata elements may be for such values as title, description, and date.
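
As a rough illustration of the distinction, here is a minimal Python sketch. The element names Country, Language, Source, and Topic come from the example above; the Title element, the specific values listed, and the function name is_valid are assumptions made only for illustration, not part of any standard.

```python
# Schema vocabulary: the names of the metadata elements.
SCHEMA_VOCABULARY = ["Title", "Country", "Language", "Source", "Topic"]

# Value vocabularies: one list of approved terms per element.
# The specific values here are hypothetical placeholders.
VALUE_VOCABULARIES = {
    "Country": {"Canada", "France", "Japan"},
    "Language": {"English", "French", "Japanese"},
    "Source": {"Internal report", "News article"},
    "Topic": {"Taxonomies", "Metadata", "Knowledge graphs"},
    # "Title" is deliberately absent: it takes free text, so it has
    # no corresponding value vocabulary.
}


def is_valid(element: str, value: str) -> bool:
    """Check a value against the schema and, if one exists, the value vocabulary."""
    if element not in SCHEMA_VOCABULARY:
        return False  # not a recognized metadata element
    allowed = VALUE_VOCABULARIES.get(element)
    # Elements without a value vocabulary accept any value.
    return allowed is None or value in allowed
```

In this sketch, is_valid("Country", "France") passes, is_valid("Country", "Atlantis") fails because the value is not in the value vocabulary, and is_valid("Color", "Red") fails because Color is not in the schema vocabulary.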

In my observation, we speak of “vocabularies” rather than “controlled vocabularies” in this context, especially with respect to schema, for various reasons. Schema vocabularies are referred to simply as “vocabularies,” rather than “controlled vocabularies,” because they are not traditional controlled vocabularies used for tagging, and also because their “control” is different from the control of value vocabularies. Value vocabularies can be changed, but only through defined policies and procedures, which depend on the implementation and ownership, and changes can be frequent, e.g. weekly, monthly, quarterly, or annually. Schema vocabularies, on the other hand, are intended to be standard, and are updated only very infrequently, such as once every 5-10 years, and usually by a standards body. Schema vocabularies provide control by their very nature. Meanwhile, it is often necessary to call out the controlled feature of value vocabularies, since some metadata properties may have uncontrolled keywords as their values.

Schema vocabularies may be metadata schemas, such as Dublin Core (for published resources) or IPTC metadata (for photos), but other kinds of information and content management schemas can also be considered as schema vocabularies in that a “vocabulary” defines the various elements. Such other schema vocabularies include SKOS (Simple Knowledge Organization System), DCAT (Data Catalog Vocabulary), and iiRDS (intelligent information Request and Delivery Standard), among others. At the recent DCMI (Dublin Core Metadata Initiative) conference in Barcelona in October, our panel “Using Schema and Value Vocabularies to Provide Consistency Across Structured Content” addressed these schemas and other data frameworks, such as OWL and DITA, which are similar to but not the same as schemas. The other speakers were Joseph Busch, who had the idea of this topic for a conference panel, Lief Erickson, Noz Urbina, and Peter Winstanley.

DCMI 2025 Panel: "Schema and Value Vocabularies for Consistency"

My presentation on the DCMI panel was "Schema and Value Vocabularies for Thesauri and Taxonomies," which explained that SKOS is a schema vocabulary, and specific SKOS-based taxonomies and thesauri are value vocabularies. SKOS (Simple Knowledge Organization System) is the W3C data model schema for knowledge organization systems, especially taxonomies and thesauri. It can also be considered a schema vocabulary, because it has standard elements with defined display names and machine-readable concatenated forms. In fact, the designation “elements” is what is used in the SKOS model. SKOS, however, is a special kind of schema vocabulary, and it’s not a metadata schema. When SKOS-based taxonomies or thesauri serve as the value vocabularies for metadata elements, those metadata elements are managed as specific SKOS Concept Schemes. In a faceted taxonomy, each Concept Scheme serves as a facet.

Taxonomists don’t usually think of vocabularies being classified as either "schema vocabularies" or "value vocabularies." However, as taxonomies have increasingly been integrated with metadata and serve purposes beyond just browsing, searching and retrieving content, it’s important to see the bigger picture of where taxonomies as value vocabularies fit in, and where taxonomies can provide more benefits.

Friday, October 31, 2025

Types of Metadata Schemas

Taxonomies or sets of controlled vocabularies are typically implemented as values for various metadata elements (also called metadata properties or fields). Metadata elements that contain controlled vocabularies could be Topic, Activity, Location, Organization Name, People/Role Type, Document Type, Content Language, etc. These are often implemented as facets in faceted search, although they do not have to be. There may be additional metadata elements for non-taxonomy values, such as Document Title, Image Caption, Creator/Author, Creation Date, Rights Status, etc. In addition to designing taxonomies, in my consulting projects I often also design such broader metadata schemas.


Custom Metadata Schemas

A custom (use case-specific) metadata schema specifies which metadata elements to include for different purposes. These purposes include content tagging and management, content workflow management, end-user search filters, or mere display on content records for identification.

A custom metadata schema may specify the following:

  • A definition for each metadata element
  • Sample values for each metadata element
  • In what user interfaces the metadata element appears
  • The ownership or authority of a metadata element, whether a department or role 

A custom metadata schema also specifies rules about the application of each metadata element, including:

  • The value type for the metadata element (For example, controlled vocabulary terms, uncontrolled keywords, free text, date, integers, Boolean yes/no, etc.)
  • Whether assignment of a value from the metadata element is required or optional for each content item (or depends on the specific type of content item).
  • Whether the assignment of the values from the metadata element is limited to just one or can be multiple, which is referred to as “cardinality.” (For example, the assignment of only one Document Type but up to four Topics per content item.)

Example of a Custom Metadata Schema
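
The rules listed above (value type, required or optional, and cardinality) can also be written down in machine-readable form. The following is a minimal Python sketch, not a definitive implementation. The limits of one Document Type and up to four Topics come from the example above; the class name MetadataElement, the function check_record, the element definitions, and the value_type strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MetadataElement:
    name: str
    definition: str
    value_type: str            # e.g. "controlled", "free_text", "date"
    required: bool             # must every content item have a value?
    max_values: Optional[int]  # cardinality limit; None means unlimited


# Hypothetical three-element schema for illustration only.
SCHEMA = [
    MetadataElement("Document Type", "The form of the content item", "controlled", True, 1),
    MetadataElement("Topic", "What the content item is about", "controlled", False, 4),
    MetadataElement("Document Title", "The displayed title", "free_text", True, 1),
]


def check_record(record: dict[str, list[str]]) -> list[str]:
    """Return a list of rule violations for one tagged content item."""
    problems = []
    for element in SCHEMA:
        values = record.get(element.name, [])
        if element.required and not values:
            problems.append(f"{element.name}: required but missing")
        if element.max_values is not None and len(values) > element.max_values:
            problems.append(
                f"{element.name}: {len(values)} values, maximum is {element.max_values}"
            )
    return problems
```

A content item tagged with two Document Types, five Topics, and no Document Title would produce three violations under this sketch, one for each rule it breaks.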
 

Standard Metadata Element Sets and Schemas

In the context of metadata schemas, there exist not only these custom metadata schemas, but also standard metadata sets of elements and their schemas. They provide predefined metadata elements that are intended to be sufficiently generic for various use cases. Perhaps the most widely used standard metadata schema is Dublin Core, which is a set of 15 basic (core) elements intended for published documents. These elements are Title, Subject, Description, Type, Source, Relation, Coverage, Creator, Publisher, Contributor, Rights, Date, Format, Identifier, and Language. There are other standard metadata schemas that are somewhat more specific for a subject domain, such as IPTC (International Press Telecommunications Council) metadata, which is intended for images. When the metadata is expressed in a standard data notation, such as XML or RDF, whose specification may also be part of the standard metadata schema, the metadata can then be shared.

Standard metadata schemas include information for each element such as definition and type, but unlike custom metadata schemas, standard metadata schemas do not include any instructions on their application, such as cardinality and implementation, as that depends on each use case. Therefore, if you choose to apply a standard metadata schema, you need to additionally decide and document how it should be applied, especially which elements are to be used for which purposes, in which systems, along with metadata element-specific rules of requirements and cardinality, as described above. This kind of document is referred to as an application profile.

My most recent conference presentation, a panel at the DCMI (Dublin Core Metadata Initiative) conference in Barcelona, October 22-25, addressed application profiles. Panel organizer, Joseph Busch, explained in his presentation: “An application profile defines a specific set of requirements, settings, and metadata for a particular application to ensure compatibility and functionality. The profile adapts general standards or frameworks to meet the needs of a specific use case, for example.”

Taxonomists usually don’t speak to their stakeholders or clients of "application profiles," because such specifications are typically already included within a larger taxonomy governance plan, something taxonomists commonly create and promote. When taxonomists work specifically with metadata experts, however, they should consider the specific needs of an application profile.

Finally, a standard metadata schema, with its predefined labels for metadata elements, can also be considered a kind of (controlled) vocabulary. This is the topic of my next blog post, "Schema Vocabularies and Value Vocabularies." 

Saturday, October 18, 2025

Semantic Data Conference 2025

This week I attended the second annual conference “SemanticData: Taxonomy, Ontology, and Knowledge Graphs,” hosted by Henry Stewart (HS) Events and co-located with the HS DAM (Digital Asset Management) conference. I found this conference to be very worthwhile to attend, even without presenting, for its networking opportunities and ideas shared. As a one-day one-track-only conference, it had only 12 speakers, so I was not a speaker again this year, as I was last year, in order to let others speak.

Ideas of Semantics

Semantic data means enriching data with meaning from controlled vocabularies, especially taxonomies, and with meaningful relationships and specific attributes, provided by ontologies. Taxonomies and ontologies are referred to then as “semantic models.” A knowledge graph is a semantic model plus all of the connected data, which is stored in a graph database.

How “semantics” was discussed was up to each speaker. Jessica Talisman gave an overview of semantic models in what she describes as the “semantic pipeline.” In his talk on information ethics, Gary Carlson stayed high-level, stating “Semantics is about moving information from one place to another.” By contrast, Ashleigh Faith focused on the practical application of semantic tags to benefit AI. In his keynote, Ahren Lehnart spoke of the need to trust semantic models and concluded by focusing on the people, listing what “semantic professionals” do, including driving semantic adoption within an organization, engaging with subject matter experts, seeking out and staying involved in AI projects, targeting high-risk semantic cases, and designing transparency into semantic models.

Turning to practice, Melissa Knudtson Monsalve explained the adoption of “just enough semantics” as a solution for organizations facing challenges of implementing semantic models. The conference also had some interesting case studies. Laura Rodriguez spoke about taxonomy governance strategies undertaken at HealthStream. Tracy Forzaglia explained the use of taxonomy and tagging at Scholastic. Mindy Carner explained the implementation of the DITA structured content standard in conjunction with a controlled vocabulary to manage and deliver Help Center content at LinkedIn. Finally, Dr. Robert Sanderson explained and demonstrated Yale’s LUX Collections Discovery utilizing a cultural heritage ontology and knowledge graph.

Comparisons with Semantic Data 2024

I had blogged about the first conference, Semantic Data 2024, last year. The format was the same: Individual half-hour presentations, the first as a “keynote”, a participant discussion activity, and a panel discussion moderated by the chair. By comparison, the conference was larger this year, up from about 50 attendees to about 70, making the room quite full. Aside from the chair and two of the sponsors, all but one of the speakers were also different this year from last. 

Madi Weland Solomon was again the conference chair and moderator, and Factor and Datavid were again sponsors with sponsored talks that were not promotional. Gary Carlson of Factor presented on the importance of data quality in semantic architecture, and Tim Padilla of Datavid presented on the AI-readiness of enterprise data. Progress Software was a new sponsor, but instead of a sponsored talk, Jim Morris of Progress spoke on the closing panel.

Panel: Solomon, Morris, Sanderson, and Faith

The theme of AI (especially generative AI and LLMs) was somewhat more prominent in the conference this year, taken up in almost half of the sessions. Ashleigh Faith’s talk, “How Semantic Tags Benefit AI,” was especially practical and informative. AI was woven through Ahren Lehnart’s opening keynote, when he discussed semantic trends and predictions. Tracy Forzaglia’s case study was about tagging with AI. Finally, the closing panel discussion had a focus on AI this time even in its title, “Semantic Architects vs. AI: Who Curates the Future?” In fact, the conference could be titled “Semantic Data: Taxonomy, Ontology, Knowledge Graphs, and AI.” The importance of “human in the loop” with regard to AI and semantic automation was emphasized.

The “roundtable” group discussion members addressed questions of their organization’s semantic maturity, important changes in the past year, and what topics they would like to have addressed next year. This proved to be a popular session, although the large number of attendees required more time than allotted, and the room did not have tables. Perhaps a larger room or two tracks will be needed next year. I hope to participate next fall, if my schedule allows. Meanwhile, those of you in Europe may attend Semantic Data Europe on June 25, 2026, in London.

 

Thursday, September 18, 2025

Narrower Terms vs. Alternative Terms

A number of years ago I worked on a project of cleaning up a large taxonomy on occupations and job titles. My client contact was sometimes confused between terms to be used as synonyms/variants for a preferred term and terms to be used as narrower terms to a preferred term. This initially surprised me, because the difference seemed so obvious. A more recent project raised the issue again, and I now realize the challenges.

The word “term” can be confusing, considering the different types of terms that exist. Both variant terms (also called synonyms, nonpreferred terms, or entry terms) and narrower terms are kinds of terms. By contrast, when focusing on concepts that may have various labels, the distinction between a concept’s narrower concepts and its alternative labels is quite clear. The widely adopted SKOS (Simple Knowledge Organization System) data model standard follows the concept-based approach. SKOS is now followed by all dedicated taxonomy management software systems.

Many taxonomies, however, are not yet managed in dedicated taxonomy management systems but rather in spreadsheets or internally developed tools, neither of which follows SKOS. This was the case in both of my projects in question. Each “term” in the spreadsheet-based tool had its own row, which resulted in multiple rows for the same concept. Broader categories were in another column to the right. This format is potentially confusing because the variants appeared in a column as did the hierarchical levels, and you had to remember which column was which.

Regardless of the tool used, what makes it even more confusing is that a narrower concept could be either a variant term or a hierarchically narrower term. What may variously be called synonyms, variants, nonpreferred terms, entry terms, or alternative labels are not merely literal synonyms, but they could be any terms or labels that may be used in tagging to trigger the use of the concept or preferred term. This includes terms whose meaning is narrower or more specific than the term/concept in question, since the latter includes more specific terms within its scope. So, tagging the occurrence of a concept with a broader concept is acceptable.

For example, in a medical taxonomy a concept can be Radiation therapy. Radiotherapy is an alternative label. But then there are specific types of radiation therapy, such as Brachytherapy, Radioimmunotherapy, and Radionuclide therapy. These could be added to the taxonomy either as narrower concepts or as alternative labels to Radiation therapy, depending on how specific the taxonomy should be.
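
The two modeling options can be sketched in Python, under the assumption of a simple SKOS-like structure with a preferred label, alternative labels, and narrower concepts. The class Concept and the function build_label_index are hypothetical names for illustration and do not correspond to any particular taxonomy management tool.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    pref_label: str
    alt_labels: list[str] = field(default_factory=list)
    narrower: list["Concept"] = field(default_factory=list)


def build_label_index(concept, index=None):
    """Map each lower-cased label to the preferred label of the concept it triggers."""
    if index is None:
        index = {}
    for label in [concept.pref_label, *concept.alt_labels]:
        index[label.lower()] = concept.pref_label
    for child in concept.narrower:
        build_label_index(child, index)
    return index


# Option A: the specific therapies are narrower concepts in their own right.
option_a = Concept(
    "Radiation therapy",
    alt_labels=["Radiotherapy"],
    narrower=[
        Concept("Brachytherapy"),
        Concept("Radioimmunotherapy"),
        Concept("Radionuclide therapy"),
    ],
)

# Option B: the specific therapies are only alternative labels.
option_b = Concept(
    "Radiation therapy",
    alt_labels=[
        "Radiotherapy",
        "Brachytherapy",
        "Radioimmunotherapy",
        "Radionuclide therapy",
    ],
)

index_a = build_label_index(option_a)
index_b = build_label_index(option_b)
```

With option A, text mentioning "brachytherapy" is tagged with the concept Brachytherapy; with option B, the same text is tagged with the broader concept Radiation therapy. In both options, "radiotherapy" resolves to Radiation therapy.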

When creating or editing a taxonomy, it is often difficult to decide how specific the taxonomy should be in certain places. Terms that are too specific to warrant use as concepts should then be relegated to the status of variants/alternative labels. Deciding what is too specific depends on the concept’s relative specificity within the entire taxonomy in addition to considering the potential usage of the specific concept.

In sum, if you are not ready to adopt SKOS-based taxonomy management software, at the very least you should adopt a SKOS-based approach in conceptualizing and labeling your taxonomy. Call things “concepts” and “labels”, not “terms.” Concepts are in hierarchical relationships to each other. Labels are the names for concepts. The “preferred label” is the displayed form of the name (such as in facets in the front-end application), and “alternative labels” are variant labels to match against strings of text that may be used for the concept and trigger tagging with the concept. Furthermore, alternative labels could be displayed differently from preferred labels, such as in italics and/or a different colored shaded cell.

 

Sunday, August 10, 2025

When to Design a New Taxonomy for a New System

Often organizations determine that a suitable time to adopt a new taxonomy is in conjunction with adopting a new system for its implementation, such as a content management system (CMS) or digital asset management system (DAM). They can budget taxonomy design and development services as part of the consulting services needed for the content migration and system implementation project, and they can improve and optimize the taxonomy for its new implementation and use.

There is the question of timing, though. Recently, a prospective consulting client asked me whether the new taxonomy should be developed prior to the selection and implementation of a new system or afterwards. Ideally, both the taxonomy project and the CMS or DAM adoption can happen simultaneously. However, the design and development of a taxonomy takes less time (typically 3-4 months) than the adoption of a new CMS or DAM. Altogether, a system selection, with a trial or a proof-of-concept project, implementation, data/content migration, and user training, can take 6-18 months.

Benefits of Taxonomy Development Prior to System Adoption

The primary benefit of developing a taxonomy prior to system adoption is that you can make it a system requirement that the new system supports the taxonomy that you have designed to best serve your users, your desired tagging method, and the nature of your content. These criteria should take precedence over designing a taxonomy to fit the requirements (or limitations) of a CMS or DAM.

Over time, your organization will adopt other systems, and the taxonomy should be suitable for multiple systems, rather than being system specific. Especially if you have an enterprise (enterprise-wide) taxonomy as your eventual goal, designing your ideal taxonomy first should be your approach. If one system cannot take advantage of all features of your taxonomy, another system may. There are also usually development work-arounds to get the full use out of your taxonomy.

Benefits of Taxonomy Development After System Adoption

A CMS or DAM has a variety of functions, and tagging and retrieval of content with a taxonomy is only one of those functions. Workflow management, rights management, authoring features (for CMS) and image/video editing features (for DAM) tend to matter more than taxonomy use among the requirements for a system. You can make “good support of taxonomy management and tagging” a requirement for your new CMS or DAM without getting into the specifics.

Adding features to a taxonomy (such as polyhierarchy, related-concept relationships, end-user scope notes, different sets of synonyms/alternative labels to support tagging and searching respectively) if the system you later adopt does not support them is a waste of time and resources. It’s better to wait until a system is selected and implemented before fully designing a taxonomy.

Iterative Taxonomy Design Approach

When implementing a new taxonomy with a new system, the ideal approach is to spread out the taxonomy design and development tasks over the phases of the system selection and implementation process.

You should consider basic taxonomy requirements early in the system selection process. To do this, you might categorize different taxonomy support features as essential or nice-to-have. The method of tagging (automated, manual, automated with human review, or a mix) needs to be determined as both a system requirement and as a factor in the design of the taxonomy.

Then during the lengthy process of system testing and selection, information-gathering work for the taxonomy may take place. This involves stakeholder interviews, user focus groups or brainstorming sessions, content analysis, and review of existing/legacy taxonomies and other controlled vocabularies. Draft versions of portions of the taxonomy, without all features, may be created and reviewed, prior to the system selection decision.

After the CMS or DAM is selected and is in the process of being implemented, the taxonomy design can be refined with features that the new system can support, and then the taxonomy can be fully built out. The new taxonomy can also be tested in the new system for its suitability for tagging and retrieval, and final enhancements can be made based on the test results. The documentation of the taxonomy, including guidelines for its maintenance (a governance plan), should be started early in the taxonomy design process, but additional system-specific documentation is created after the new system is implemented.