Monday, July 17, 2017

Metadata and Taxonomies

Metadata and taxonomies are related. In The Accidental Taxonomist, 2nd edition (pp. 15-18), I explain that most, but not all, taxonomies (not purely navigational taxonomies) serve to populate terms/values in metadata fields/elements; and some, but definitely not all, metadata fields are populated by terms/values from controlled vocabularies or, more specifically, taxonomies (in contrast to free text or key words).

The question remains whether to start with creating the overall metadata strategy and schema and then build taxonomies as part of it as needed, or to start with creating a taxonomy and then, in the process, identify the various descriptive metadata.  Ideally the two are developed for implementation combination, as part of an integrated strategy. However, an expert in taxonomy development (a taxonomist) and an expert in metadata design (a metadata architect) are usually not the same person.

A metadata architect can become an accidental taxonomist, and a taxonomist can become an accidental metadata architect, or the two experts can work together on the same project, although it is not so common for an organization to have both such experts on staff.  Whether an organization has a metadata architect or taxonomist depends on the nature of the organization’s content and content organization needs.

Organizations that start with the metadata expertise and approach to information management tend to be those with significant needs in digital asset management (with image or other media collections), records management (in highly regulated industries), publishing, or cultural preservation (museums or libraries). Organizations that start with the taxonomy expertise and approach include product or service providers, distributors and retailers (especially in ecommerce), and organizations focused on providing information resources.

A hierarchical taxonomy can be integrated with metadata, when one of the metadata fields is for “Topic” or “Subject,” and there is a hierarchical taxonomy of subject terms associated with that field. However, it is the faceted type of taxonomy in particular that unites the tasks of taxonomy creation and metadata design.

Faceted Taxonomies and Metadata

A faceted taxonomy comprises a set of facets, each an individual controlled vocabulary, whose terms are generally not linked/related to terms in the other controlled vocabulary facets, but the combination of terms from a combination of facets are used to tag the same set of content, and users search/filter on terms in combination from various facets. Examples of facets may be Product/Service, Market Segment, Location, Document Type, Supplier, etc. A faceted taxonomy is a common type for both enterprise taxonomies and ecommerce or product review taxonomies, and it’s a type of taxonomy that taxonomists are familiar with creating. It’s called a “taxonomy” even though it differs from the classical hierarchical “tree” type of taxonomy, because it involves controlled vocabulary and classification. The name for each facet and the terms within the facet constitutes a simply two-level hierarchy.

Each facet is also a metadata field/element. The taxonomist designing a faceted taxonomy is thus also designing metadata, at least some of it. There are usually more metadata fields to describe the content beyond those which comprise the taxonomy facets. For a faceted taxonomy to best serve the user who is trying to find/discover content based on what it is and what it is about, the number of facets should be limited. (See my earlier post "How Many Facets.") Metadata, however, can serve additional purposes beyond helping users find content. Metadata may describe content for purposes of full identification, source citation, or information on how the content can be used, including rights data.  The taxonomist or metadata architect needs to decide which metadata fields will constitute a displayed faceted taxonomy for the end-user to utilize in search/discovery, and which metadata fields will not but will rather display on a selected content record.

On the other hand, there may be additional metadata fields beyond the scope and definition of “taxonomy” that are nevertheless made available to the end-user to filter/refine results alongside the other, taxonomy facets. These could be for author/creator, date, title keyword, text keyword, file format, etc. Sometimes the distinction between taxonomy facet and other metadata in this case is not so clear, such as for Document/Content Type, Audience, or Language, when these fields utilize controlled vocabularies. Due to this overlap and blurred distinction between taxonomy facets and displayed metadata for filtering, it is a good idea to design the taxonomy and metadata together as an integrated strategy.

1 comment:

  1. This article is my first article on your blog and I look forward to reading more.

    Ontologists, semantic modellers, taxonomists, data modellers, object modellers, data managers are all on a collision course. Each niche discipline has its strengths and weaknesses. As each discipline evolves to reduce the weaknesses they stray into their "cousin" disciplines. They do seem to be on a convergent path. The key barrier to convergent though is language. As an information and data architect I find myself having to reconcile and translate the terminology, for most of the concepts are common. What would be useful would be to build a thesaurus (another discipline) to bring together the concepts and different terms.

    Are you aware of anything of this nature?