The Accidental Taxonomist: October 2025

Friday, October 31, 2025

Types of Metadata Schemas

Taxonomies or sets of controlled vocabularies are typically implemented as values for various metadata elements (also called metadata properties or fields). Metadata elements that contain controlled vocabularies could be Topic, Activity, Location, Organization Name, People/Role Type, Document Type, Content Language, etc. These are often implemented as facets in faceted search, although they do not have to be. There may be additional metadata elements for non-taxonomy values, such as Document Title, Image Caption, Creator/Author, Creation Date, Rights Status, etc. In addition to designing taxonomies, in my consulting projects I often also design such broader metadata schemas.

Custom Metadata Schemas

A custom (use case-specific) metadata schema specifies which metadata elements to include for different purposes. These purposes include content tagging and management, content workflow management, end-user search filters, or mere display on content records for identification.

A custom metadata schema may specify the following:

A definition for each metadata element
Sample values for each metadata element
In what user interfaces the metadata element appears
The ownership or authority of a metadata element, whether a department or role

A custom metadata schema also specifies rules about the application of each metadata element, including:

The value type for the metadata element (For example, controlled vocabulary terms, uncontrolled keywords, free text, date, integers, Boolean yes/no, etc.)
Whether assignment of a value from the metadata element is required or optional for each content item (or depends on the specific type of content item).
Whether the assignment of the values from the metadata element is limited to just one or can be multiple, which is referred to as “cardinality.” (For example, the assignment of only one Document Type but up to four Topics per content item.)
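
As a rough illustration of how the rules above fit together, here is a minimal Python sketch of a custom metadata schema and a check that applies its requirement and cardinality rules to one content item. The class name, field names, element definitions, and limits are all assumptions made up for this example, not taken from any particular system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MetadataElement:
    name: str
    definition: str
    value_type: str            # e.g. "controlled vocabulary", "free text", "date"
    required: bool
    max_values: Optional[int]  # cardinality limit; None means no upper limit


# Hypothetical schema; names, definitions, and limits are illustrative only.
SCHEMA = [
    MetadataElement("Document Type", "The genre or form of the content item",
                    "controlled vocabulary", required=True, max_values=1),
    MetadataElement("Topic", "What the content item is about",
                    "controlled vocabulary", required=False, max_values=4),
    MetadataElement("Document Title", "The display title of the content item",
                    "free text", required=True, max_values=1),
]


def validate(record: dict[str, list[str]]) -> list[str]:
    """Return a list of rule violations for one content item's metadata."""
    problems = []
    for element in SCHEMA:
        values = record.get(element.name, [])
        if element.required and not values:
            problems.append(f"{element.name}: a value is required")
        if element.max_values is not None and len(values) > element.max_values:
            problems.append(
                f"{element.name}: at most {element.max_values} value(s) allowed"
            )
    return problems
```

Calling `validate({"Document Type": ["Report", "Memo"], "Topic": ["Governance"]})` would report two problems: too many Document Type values and a missing Document Title. A real schema would also record ownership and the user interfaces where each element appears.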

Example of a Custom Metadata Schema

Standard Metadata Element Sets and Schemas

In the context of metadata schemas, there exist not only these custom metadata schemas, but also standard metadata element sets and their schemas. They provide predefined metadata elements that are intended to be sufficiently generic for various use cases. Perhaps the most widely used standard metadata schema is Dublin Core, which is a set of 15 basic (core) elements intended for published documents. These elements are Title, Subject, Description, Type, Source, Relation, Coverage, Creator, Publisher, Contributor, Rights, Date, Format, Identifier, and Language. There are other standard metadata schemas that are somewhat more specific to a subject domain, such as IPTC (International Press Telecommunications Council) metadata, which is intended for images. When the metadata is expressed in a standard data notation, such as XML or RDF, whose specification may also be part of the standard metadata schema, the metadata can then be shared.

Standard metadata schemas include information for each element, such as definition and type, but unlike custom metadata schemas, standard metadata schemas do not include any instructions on their application, such as cardinality and implementation, as that depends on each use case. Therefore, if you choose to apply a standard metadata schema, you need to additionally decide and document how it should be applied, especially which elements are to be used for which purposes, in which systems, along with metadata element-specific rules of requirements and cardinality, as described above. This kind of document is referred to as an application profile.
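
To make the idea of an application profile more concrete, the following Python sketch layers local rules on top of the 15 Dublin Core elements. The selection of elements, the system names ("CMS" and "search"), and the requirement and cardinality values are hypothetical; an actual application profile is usually a governed document rather than code.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE = {
    "Title", "Subject", "Description", "Type", "Source", "Relation",
    "Coverage", "Creator", "Publisher", "Contributor", "Rights", "Date",
    "Format", "Identifier", "Language",
}

# Hypothetical application profile: which elements are used, and how.
# max_values of None means no upper limit on the number of values.
PROFILE = {
    "Title":    {"required": True,  "max_values": 1,    "used_in": ["CMS", "search"]},
    "Subject":  {"required": False, "max_values": 4,    "used_in": ["CMS", "search"]},
    "Creator":  {"required": True,  "max_values": None, "used_in": ["CMS"]},
    "Language": {"required": True,  "max_values": 1,    "used_in": ["search"]},
}


def unknown_elements(profile: dict) -> set[str]:
    """Elements named in the profile that are not Dublin Core elements."""
    return set(profile) - DUBLIN_CORE


def elements_for(system: str, profile: dict) -> list[str]:
    """Elements the profile assigns to a given system, in profile order."""
    return [name for name, rule in profile.items() if system in rule["used_in"]]
```

Here the standard supplies only the element names, while the profile records the local decisions: which elements are used, in which systems, and with what requirement and cardinality rules.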

My most recent conference presentation, a panel at the DCMI (Dublin Core Metadata Initiative) conference in Barcelona, October 22-25, addressed application profiles. Panel organizer, Joseph Busch, explained in his presentation: “An application profile defines a specific set of requirements, settings, and metadata for a particular application to ensure compatibility and functionality. The profile adapts general standards or frameworks to meet the needs of a specific use case, for example.”

Taxonomists usually don’t speak to their stakeholders or clients of "application profiles," because such specifications are typically already included within a larger taxonomy governance plan, something taxonomists commonly create and promote. When taxonomists work specifically with metadata experts, however, they should consider the specific needs of an application profile.

Finally, a standard metadata schema, with its predefined labels for metadata elements, can also be considered a kind of (controlled) vocabulary. This is the topic of my next blog post, "Schema Vocabularies and Value Vocabularies."

Saturday, October 18, 2025

Semantic Data Conference 2025

This week I attended the second annual conference “SemanticData: Taxonomy, Ontology, and Knowledge Graphs,” hosted by Henry Stewart (HS) Events and co-located with the HS DAM (Digital Asset Management) conference. I found this conference to be very worthwhile to attend, even without presenting, for its networking opportunities and ideas shared. As a one-day, single-track conference, it had only 12 speakers, so, having spoken last year, I did not speak this year, in order to let others speak.

Ideas of Semantics

Semantic data means enriching data with meaning from controlled vocabularies, especially taxonomies, and with meaningful relationships and specific attributes, provided by ontologies. Taxonomies and ontologies are referred to then as “semantic models.” A knowledge graph is a semantic model plus all of the connected data, which is stored in a graph database.
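
The distinction between a semantic model and a knowledge graph can be sketched in a few lines of Python. This is only a toy illustration: plain lists of subject-predicate-object triples stand in for a graph database, and the concepts, predicate names, and item identifiers are all invented for the example.

```python
# Semantic model: a small taxonomy hierarchy plus one ontology-style statement.
MODEL = [
    ("Painting", "broader", "Artwork"),
    ("Sculpture", "broader", "Artwork"),
    ("createdBy", "domain", "Artwork"),
]

# Instance data connected to the model (invented for illustration).
DATA = [
    ("item-001", "type", "Painting"),
    ("item-001", "createdBy", "person-042"),
    ("item-002", "type", "Sculpture"),
]

# In this sketch, the knowledge graph is the model plus the connected data.
GRAPH = MODEL + DATA


def narrower_or_same(concept: str) -> set[str]:
    """The concept itself plus everything below it in the taxonomy."""
    result = {concept}
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in MODEL:
            if predicate == "broader" and obj == current and subject not in result:
                result.add(subject)
                frontier.append(subject)
    return result


def items_of_type(concept: str) -> list[str]:
    """Items typed with the concept or any of its narrower concepts."""
    concepts = narrower_or_same(concept)
    return sorted(s for s, p, o in GRAPH if p == "type" and o in concepts)
```

Asking for `items_of_type("Artwork")` returns both items, because the taxonomy says that paintings and sculptures are artworks, even though no item is tagged "Artwork" directly.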

How “semantics” was discussed was up to each speaker. Jessica Talisman gave an overview of semantic models in what she describes as the “semantic pipeline.” In his talk on information ethics, Gary Carlson stayed high-level, stating “Semantics is about moving information from one place to another.” By contrast, Ashleigh Faith focused on the practical application of semantic tags to benefit AI. In his keynote, Ahren Lehnart spoke of the need to trust semantic models and concluded by focusing on the people, listing what “semantic professionals” do, including driving semantic adoptions within an organization, engaging with subject matter experts, seeking out and staying involved in AI projects, targeting high-risk semantic cases, and designing transparency into semantic models.

Turning to practice, Melissa Knudtson Monsalve explained the adoption of “just enough semantics” as a solution for organizations facing challenges of implementing semantic models. The conference also had some interesting case studies. Laura Rodriguez spoke about taxonomy governance strategies undertaken at HealthStream. Tracy Forzaglia explained the use of taxonomy and tagging at Scholastic. Mindy Carner explained the implementation of the DITA structured content standard in conjunction with a controlled vocabulary to manage and deliver Help Center content at LinkedIn. Finally, Dr. Robert Sanderson explained and demonstrated Yale’s LUX Collections Discovery utilizing a cultural heritage ontology and knowledge graph.

Comparisons with Semantic Data 2024

I had blogged about the first conference, Semantic Data 2024, last year. The format was the same: Individual half-hour presentations, the first as a “keynote”, a participant discussion activity, and a panel discussion moderated by the chair. By comparison, the conference was larger this year, up from about 50 attendees to about 70, making the room quite full. Aside from the chair and two of the sponsors, all but one of the speakers were also different this year from last.

Madi Weland Solomon was again the conference chair and moderator, and Factor and Datavid were again sponsors with sponsored talks that were not promotional. Gary Carlson of Factor presented on the importance of data quality in semantic architecture, and Tim Padilla of Datavid presented on the AI-readiness of enterprise data. Progress Software was a new sponsor, but instead of a sponsored talk, Jim Morris of Progress spoke on the closing panel.

Panel: Solomon, Morris, Sanderson, and Faith

The theme of AI (especially generative AI and LLMs) was somewhat more prominent in the conference this year, taken up in almost half of the sessions. Ashleigh Faith’s talk, “How Semantic Tags Benefit AI,” was especially practical and informative. AI was woven through Ahren Lehnart’s opening keynote, when he discussed semantic trends and predictions. Tracy Forzaglia’s case study was about tagging with AI. Finally, the closing panel discussion had a focus on AI this time, even in its title: “Semantic Architects vs. AI: Who Curates the Future?” In fact, the conference could be titled “Semantic Data: Taxonomy, Ontology, Knowledge Graphs, and AI.” The importance of “human in the loop” with regard to AI and semantic automation was emphasized.

In the “roundtable” group discussion, participants addressed questions about their organization’s semantic maturity, important changes in the past year, and what topics they would like to have addressed next year. This proved to be a popular session, although the large number of attendees required more time than allotted, and the room did not have tables. Perhaps a larger room or two tracks will be needed next year. I hope to participate next fall, if my schedule allows. Meanwhile, those of you in Europe may attend Semantic Data Europe on June 25, 2026, in London.