Content management systems (CMSs) are a very common place to
implement taxonomies. The content managed in this kind of software can be
diverse: office application files, PDF documents, image files, audio files, video
files, and, in the case of web content management systems, also HTML and any
kind of file to be published to the Web. The “management” this kind of software
supports is also diverse: enhancing, annotating, tagging, categorizing,
reviewing, approving, sharing, assigning, publishing, archiving, and
deprecating of content. Finally, the users can be diverse: content creators,
content managers, and anyone in an organization who needs access to a subset of
the content.
Due to the diversity of content types and purposes, the
metadata associated with each content item plays a very important
role in a CMS. As for taxonomies, in the context of a CMS, it is probably best
to consider a taxonomy as a subset of metadata, although the distinction
between taxonomy and metadata can get blurred. Metadata about content can be
descriptive, structural, or administrative. Descriptive metadata comprises the
attributes that help make the content item findable, including title, author, source, date,
audience, document type, and also metadata for what the content is about
(abstract, keywords, subjects, etc.). Many of these metadata fields
should be populated with terms from a controlled vocabulary list for each
field. In some cases, such as the “subject” field, the controlled vocabulary may be
rather large and therefore organized into a hierarchy, in which case it constitutes a
hierarchical taxonomy of subjects. In
other cases, various aspects of what content is about might be categorized into
different metadata fields with controlled vocabularies, such as: industry, process,
specialty, department, location, etc. Taken together, the set of controlled
vocabularies, one for each field, can be considered a faceted taxonomy, with
each of these descriptive metadata fields functioning as a facet.
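
The faceted arrangement described above can be sketched as a small data structure. This is only an illustrative sketch, not how any particular CMS stores its metadata: the facet names come from the examples in the text, while the terms, file names, and the helper functions `validate_metadata` and `filter_items` are hypothetical.

```python
# Hypothetical facets and terms, for illustration only.
# Each descriptive metadata field is a facet with its own controlled vocabulary.
FACETS = {
    "industry": {"Healthcare", "Finance", "Manufacturing"},
    "department": {"Marketing", "Engineering", "Legal"},
    "location": {"Boston", "London", "Singapore"},
}


def validate_metadata(metadata):
    """Return a list of problems; an empty list means every value is a controlled term."""
    problems = []
    for field, values in metadata.items():
        allowed = FACETS.get(field)
        if allowed is None:
            problems.append(f"unknown facet: {field}")
            continue
        for value in values:
            if value not in allowed:
                problems.append(f"{field}: '{value}' is not a controlled term")
    return problems


def filter_items(items, **selected):
    """Faceted retrieval: keep items that carry the selected term in every chosen facet."""
    return [
        name
        for name, metadata in items.items()
        if all(term in metadata.get(facet, ()) for facet, term in selected.items())
    ]


# Hypothetical content items tagged with terms from the facets above.
items = {
    "q3-report.pdf": {"industry": ["Finance"], "location": ["London"]},
    "plant-tour.mp4": {"industry": ["Manufacturing"], "location": ["Boston"]},
}

print(validate_metadata({"industry": ["Banking"]}))
print(filter_items(items, industry="Finance", location="London"))
```

Because each facet is validated and filtered independently, a new facet can be added to `FACETS` without touching the other vocabularies.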
With this in mind, the task of actually defining the
descriptive metadata fields or taxonomy facets needs to involve various stakeholders,
including both users and other experts and managers. Users include the various
people who upload content and will tag the content with metadata and taxonomy
terms, and the various end-users who will browse and search for the content
using the metadata and taxonomy. Other stakeholders to involve from the beginning
may include content managers, metadata architects, content strategists,
business analysts, and user experience designers.
A CMS tends to offer two methods of classification: folders
and tags. Folders (which in a CMS tend to be “virtual” folders, not actual file
directory paths) offer an intuitive user interface for users to put content
into categories and then find the content. Tags, on the other hand, are
appropriate for assigning all kinds of metadata. Typically, if a dominant means
of categorizing is identified through conversations with users, such as content type or subject category, this
categorization scheme can be used for the folders, and then all other means of
categorization and classification can be handled with the tags.
Recently a colleague asked me which method I thought was
best for associating subject disciplines with multimedia content stored in a
repository where the system offered both options: put them into folders named
for each discipline or assign metadata tags for the disciplines. The answer, of
course, is “it depends.” It depends on:
- Workflow: Will the files always stay in this repository or will they “travel” downstream to other applications? If the content will likely move to other systems, then tags are preferred.
- Taxonomy size: Is the taxonomy under consideration for folders large? A large set of folders may be cumbersome to browse through, and a large taxonomy is better suited to type-ahead lookup in a metadata field lookup table or search box.
- User preference: Do users who upload prefer to use folders or tags only? Do users who need to retrieve the content prefer to browse through folders or only search on tags?
- Categorization enforcement: Can you require users to assign descriptive tags? If you are concerned that they will not, folders will better enforce the use of the categories.
- Support for hierarchy: Will the system support a hierarchy of categories within the lookup controlled vocabulary lists for the tag fields, or are hierarchies only supported as folders, or neither? Then consider which fields would benefit most from a hierarchy.
- Support for synonyms: Do the lookup controlled vocabulary lists for the tag fields support synonyms/alternate labels? If so, and if the controlled vocabulary is large, then tags have the advantage over folders, which cannot have synonym labels.
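
To make the synonym and type-ahead points concrete, here is a minimal sketch of a controlled vocabulary whose terms carry alternate labels. It assumes a simple in-memory lookup. The discipline terms, their synonyms, and the functions `build_index`, `resolve`, and `type_ahead` are all hypothetical, and a real CMS would expose this through its own lookup fields.

```python
# Hypothetical discipline vocabulary: preferred label -> alternate labels (synonyms).
VOCABULARY = {
    "Biology": ["Life Sciences", "Biosciences"],
    "Computer Science": ["Computing", "CS"],
    "Economics": ["Econ"],
}


def build_index(vocabulary):
    """Map every label, preferred or alternate, to its preferred term (case-insensitive)."""
    index = {}
    for preferred, alternates in vocabulary.items():
        for label in [preferred, *alternates]:
            index[label.lower()] = preferred
    return index


INDEX = build_index(VOCABULARY)


def resolve(label):
    """Return the preferred term for an exact label match, or None if there is none."""
    return INDEX.get(label.strip().lower())


def type_ahead(prefix):
    """Return the sorted preferred terms that have any label starting with the prefix."""
    prefix = prefix.strip().lower()
    return sorted({term for label, term in INDEX.items() if label.startswith(prefix)})


print(resolve("life sciences"))
print(type_ahead("co"))
```

A user who types a synonym still lands on the single preferred tag, which is something a folder name cannot offer.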
After determining what part of the categorization system, if
any, goes into folders, and what goes into tags, the next task is to figure out
how many descriptive metadata tag fields to create. Issues include:
- What metadata can be assigned automatically and what must be done manually? If it can be assigned automatically (such as file format type or language by auto-detect software or maybe even subject category by use of auto-categorization software), that’s great, but manually assigned metadata should be limited so as not to make the task burdensome.
- What fields are users likely to search on in retrieval? You need to cover the basics, but there is no need for additional fields that users are not likely to use as lookup criteria.
- What method of classification is important to the users? “Subjects” is a catch-all field, but if users are always thinking of something else too, such as Discipline or Product, then these should be pulled out into separate fields or facets.
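
The split between automatic and manual metadata can also be sketched in code. This example assumes that file format is derived from the file name with Python's standard `mimetypes` module. The field names, the `MAX_MANUAL_FIELDS` limit, and the functions `auto_metadata` and `build_record` are hypothetical choices for illustration, not features of any specific CMS.

```python
import mimetypes

# Hypothetical limit on how many fields a person must fill in by hand.
MAX_MANUAL_FIELDS = 3


def auto_metadata(filename):
    """Derive what can be derived from the file name without human input."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    return {"format": mime_type or "unknown"}


def build_record(filename, manual):
    """Merge automatic and manual metadata, rejecting an oversized manual form."""
    if len(manual) > MAX_MANUAL_FIELDS:
        raise ValueError(
            f"too many manual fields: {len(manual)} > {MAX_MANUAL_FIELDS}"
        )
    return {**auto_metadata(filename), **manual}


print(build_record("lecture.pdf", {"discipline": "Biology"}))
```

Keeping the automatic fields in their own function makes it easy to move a field from the manual form to auto-detection later, for example if auto-categorization software becomes available.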
Finally, when designing taxonomy and metadata for a CMS, the
taxonomist should have access to a test instance of the system to try out the
implementation of the taxonomy in the CMS user interface. A taxonomy that looks
good offline (in Excel or a taxonomy management system) might appear awkward
within the limitations of a CMS’s user interface.