Metadata and taxonomies are related. In The Accidental Taxonomist, 2nd edition (pp. 15-18), I explain that most taxonomies (all but the purely navigational ones) serve to populate terms/values in metadata fields/elements, and that some, but definitely not all, metadata fields are populated by terms/values from controlled vocabularies or, more specifically, taxonomies (in contrast to free text or keywords).
The question remains whether to start with creating the
overall metadata strategy and schema and then build taxonomies as part of it as
needed, or to start with creating a taxonomy and then, in the process, identify
the various descriptive metadata.
Ideally the two are developed for implementation in combination, as part of
an integrated strategy. However, an expert in taxonomy development (a
taxonomist) and an expert in metadata design (a metadata architect) are usually
not the same person.
A metadata architect can become an accidental taxonomist,
and a taxonomist can become an accidental metadata architect, or the two experts
can work together on the same project, although it is not so common for an
organization to have both such experts on staff. Whether an organization has a metadata
architect or taxonomist depends on the nature of the organization’s content and
content organization needs.
Organizations that start with the metadata expertise and
approach to information management tend to be those with significant needs in
digital asset management (with image or other media collections), records
management (in highly regulated industries), publishing, or cultural
preservation (museums or libraries). Organizations that start with the taxonomy
expertise and approach include product or service providers, distributors and
retailers (especially in ecommerce), and organizations focused on providing
information resources.
A hierarchical taxonomy can be integrated with metadata
when one of the metadata fields is for “Topic” or “Subject,” and there is a
hierarchical taxonomy of subject terms associated with that field. However, it is
the faceted type of taxonomy in particular that unites the tasks of taxonomy
creation and metadata design.
Faceted Taxonomies and Metadata
A faceted taxonomy comprises a set of facets, each an
individual controlled vocabulary. Terms in one facet are generally not
linked/related to terms in the other facets, but terms from several facets are
used in combination to tag the same set of content, and users search/filter on
terms in combination from various facets. Examples of facets are
Product/Service, Market Segment, Location, Document Type, and Supplier.
A faceted taxonomy is a common type for both enterprise taxonomies and
ecommerce or product review taxonomies, and it’s a type of taxonomy that
taxonomists are familiar with creating. It’s called a “taxonomy” even though it
differs from the classical hierarchical “tree” type of taxonomy, because it
involves controlled vocabulary and classification. The name of each facet and
the terms within the facet constitute a simple two-level hierarchy.
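To make the structure concrete, here is a minimal Python sketch of a faceted taxonomy, offered as an illustration under stated assumptions rather than as how any particular system implements it. The facet names come from the examples above; the terms, the ContentItem class, and the tag and filter_items helpers are all hypothetical. The filtering rule (a match is required in every selected facet, while any one of the selected terms within a facet is enough) is a common convention that I am assuming here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Each facet is its own controlled vocabulary (terms are illustrative only).
FACETS: Dict[str, Set[str]] = {
    "Product/Service": {"Consulting", "Software", "Training"},
    "Market Segment": {"Healthcare", "Finance", "Retail"},
    "Location": {"North America", "Europe", "Asia"},
    "Document Type": {"White Paper", "Case Study", "Brochure"},
}


@dataclass
class ContentItem:
    title: str
    # facet name -> set of terms from that facet applied to this item
    tags: Dict[str, Set[str]] = field(default_factory=dict)


def tag(item: ContentItem, facet: str, term: str) -> None:
    """Apply a term to an item, rejecting anything outside the vocabulary."""
    if facet not in FACETS:
        raise KeyError(f"Unknown facet: {facet}")
    if term not in FACETS[facet]:
        raise ValueError(f"'{term}' is not a term in facet '{facet}'")
    item.tags.setdefault(facet, set()).add(term)


def filter_items(items: List[ContentItem],
                 selections: Dict[str, Set[str]]) -> List[ContentItem]:
    """Assumed rule: AND across facets, OR among terms within one facet."""
    return [
        item for item in items
        if all(item.tags.get(facet, set()) & terms
               for facet, terms in selections.items())
    ]


# Usage: two items tagged from the same set of facets.
case_study = ContentItem("Retail case study")
tag(case_study, "Market Segment", "Retail")
tag(case_study, "Document Type", "Case Study")

white_paper = ContentItem("Finance white paper")
tag(white_paper, "Market Segment", "Finance")
tag(white_paper, "Document Type", "White Paper")

hits = filter_items([case_study, white_paper], {"Market Segment": {"Retail"}})
print([item.title for item in hits])  # prints ['Retail case study']
```

Note that the terms in one facet never reference terms in another; the only thing that connects them is the content item that carries tags from several facets at once.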
Each facet is also a metadata field/element. The taxonomist
designing a faceted taxonomy is thus also designing metadata, at least some of
it. There are usually more metadata fields to describe the content beyond those
which comprise the taxonomy facets. For a faceted taxonomy to best serve the
user who is trying to find/discover content based on what it is and what it is
about, the number of facets should be limited. (See my earlier post "How Many Facets.")
Metadata, however, can serve additional purposes beyond helping users find
content. Metadata may describe content for purposes of full identification,
source citation, or information on how the content can be used, including
rights data. The taxonomist or metadata
architect needs to decide which metadata fields will constitute a displayed
faceted taxonomy for the end-user to utilize in search/discovery, and which
metadata fields will instead appear on a selected content record.
On the other hand, there may be additional metadata fields
beyond the scope and definition of “taxonomy” that are nevertheless made
available to the end-user to filter/refine results alongside the taxonomy
facets. These could be for author/creator, date, title keyword, text keyword,
file format, etc. Sometimes the distinction between taxonomy facet and other
metadata in this case is not so clear, such as for Document/Content Type,
Audience, or Language, when these fields
utilize controlled vocabularies. Due to this overlap and blurred distinction
between taxonomy facets and displayed metadata for filtering, it is a good idea
to design the taxonomy and metadata together as an integrated strategy.
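One way to record these design decisions in a single place is a small schema in which every metadata field notes whether it uses a controlled vocabulary and how it is presented to the end-user. The Python sketch below is only an illustration: the Role names, the MetadataField class, the helper functions, and the assignment of each field to a role are my assumptions for the example, not a standard. In particular, treating Document Type as a taxonomy facet and Language as an "other" filter is an arbitrary choice that a real project would have to make for itself.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Role(Enum):
    TAXONOMY_FACET = "taxonomy facet"  # displayed facet, part of the taxonomy
    OTHER_FILTER = "other filter"      # displayed filter outside the taxonomy
    RECORD_ONLY = "record only"        # shown on the content record only


@dataclass(frozen=True)
class MetadataField:
    name: str
    controlled: bool  # True if values come from a controlled vocabulary
    role: Role


# Hypothetical schema; the role assignments are illustrative.
SCHEMA: List[MetadataField] = [
    MetadataField("Product/Service", True, Role.TAXONOMY_FACET),
    MetadataField("Market Segment", True, Role.TAXONOMY_FACET),
    MetadataField("Document Type", True, Role.TAXONOMY_FACET),
    MetadataField("Language", True, Role.OTHER_FILTER),
    MetadataField("Author", False, Role.OTHER_FILTER),
    MetadataField("Date", False, Role.OTHER_FILTER),
    MetadataField("Rights", False, Role.RECORD_ONLY),
    MetadataField("Source Citation", False, Role.RECORD_ONLY),
]


def filter_panel(schema: List[MetadataField]) -> List[str]:
    """Fields the end-user can filter on: taxonomy facets plus other filters."""
    return [f.name for f in schema if f.role is not Role.RECORD_ONLY]


def record_only(schema: List[MetadataField]) -> List[str]:
    """Fields that appear on a selected content record but not as filters."""
    return [f.name for f in schema if f.role is Role.RECORD_ONLY]


def blurred(schema: List[MetadataField]) -> List[str]:
    """Controlled-vocabulary filters that are not classed as taxonomy facets."""
    return [f.name for f in schema
            if f.controlled and f.role is Role.OTHER_FILTER]
```

Listing the fields this way makes the overlap visible: anything returned by blurred(SCHEMA) is a field whose status as taxonomy facet or plain metadata the taxonomist and the metadata architect would need to agree on.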
This article is my first article on your blog and I look forward to reading more.
Ontologists, semantic modellers, taxonomists, data modellers, object modellers, data managers are all on a collision course. Each niche discipline has its strengths and weaknesses. As each discipline evolves to reduce the weaknesses they stray into their "cousin" disciplines. They do seem to be on a convergent path. The key barrier to convergence though is language. As an information and data architect I find myself having to reconcile and translate the terminology, for most of the concepts are common. What would be useful would be to build a thesaurus (another discipline) to bring together the concepts and different terms.
Are you aware of anything of this nature?