It’s becoming more common to index images with taxonomy terms, rather than indexing only text documents or merely keyword-tagging images. A taxonomy for the subject indexing of images need not be significantly different from a taxonomy for indexing text documents, but the other metadata differs, and the indexing activity itself is also quite different.
A dedicated taxonomy for images might be needed for various reasons:
1. There is no subject-indexing of text documents by an organization.
2. The same organization uses different software systems to manage images and to manage text documents.
3. Text documents of the same organization are large and thus indexed or cataloged at a broader level.
1. No text indexing
Some organizations have a large image collection, and that is what they
focus their indexing efforts on. They thus design or adapt a taxonomy
specific to their image collection. They likely do not have any
taxonomy for indexing text. They either don’t find a need for text
document search and retrieval, or if they do, they simply use a
search engine instead, since, after all, search engines can search
text but not images.
2. Different systems
Image collections are increasingly managed in dedicated digital asset
management systems, which are designed to support the various metadata
associated with images and other nontext media files. Text documents, on
the other hand, may be managed in document management systems, records
management systems, or collaboration systems such as SharePoint. Each of
these kinds of systems supports some form of controlled vocabulary for
tagging content. But if the images are in one system and the text
documents are in another system, different controlled vocabularies are
likely to be developed. Of course, a generic “content management system”
may be used for both images and text documents, but many organizations
don’t manage all their content in a single system.
3. Different levels of indexing detail
A classic example of different levels of detail is the
Library of Congress, which developed its Subject Headings for
cataloging library materials, which are generally
monographs, such as books, or video recordings of films, or sound
recordings of music collections. While the subjects of these works might
be quite specific, they are often not as specific as the subject of an
individual graphic material. (An entire book may have numerous specific images.)
But over the years, individual images also became part of its
collection, and the LC Subject Headings were not specific enough, so the
Library of Congress developed the Thesaurus for Graphic Materials,
which is freely available. The fact that the Thesaurus for Graphic
Materials exists does not mean that a dedicated thesaurus for images is
always needed, only that it was needed in the context of the Library of
Congress collections and the shortcomings of the Library of Congress
Subject Headings for indexing images.
If you already have a detailed taxonomy for
documents, it certainly can be used for images as well. Some terms,
such as those for abstract concepts (for example, “Beliefs”), will simply not be
needed in the image indexing, whereas new terms might need to be added
(such as the name of a specific type of flower).
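As a rough illustration of adapting a document taxonomy for images, here is a minimal Python sketch. It assumes the taxonomy is held as a simple mapping from each term to its narrower terms. The function name `adapt_for_images`, the sample hierarchy, and the added term "Tulips" are all hypothetical and chosen only for illustration; a real taxonomy management tool would handle this through its own editing features.

```python
def adapt_for_images(document_taxonomy, not_depictable, additions):
    """Return a copy of a term -> narrower-terms mapping adapted for images.

    document_taxonomy: dict mapping each term to a list of its narrower terms
    not_depictable:    terms (such as abstract concepts) to leave out
    additions:         dict mapping an existing term to new narrower terms
    """
    skip = set(not_depictable)
    # Copy the hierarchy, leaving out terms that are not needed for images.
    adapted = {
        term: [n for n in narrower if n not in skip]
        for term, narrower in document_taxonomy.items()
        if term not in skip
    }
    # Add the more specific terms that image indexing calls for.
    for parent, new_terms in additions.items():
        children = adapted.setdefault(parent, [])
        for new_term in new_terms:
            if new_term not in children:
                children.append(new_term)
            adapted.setdefault(new_term, [])
    return adapted


# Hypothetical sample data, not taken from any real taxonomy.
document_taxonomy = {
    "Beliefs": [],
    "Plants": ["Flowers"],
    "Flowers": [],
}
image_taxonomy = adapt_for_images(
    document_taxonomy,
    not_depictable=["Beliefs"],
    additions={"Flowers": ["Tulips"]},
)
```

The original document taxonomy is left unchanged, so both versions can be maintained side by side if the text documents and the images end up in different systems.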
There is definitely unique metadata for images, of which subjects for indexing
are just a part. Examples of other possible image metadata include
Creator/photographer, Location shown, Location of creation (camera
location), Collection name, Time or part of day (especially if
outdoors), Date taken (in contrast to date the image was digitized or
edited), Number of people depicted, Copyright, Intended purpose, etc.
The Thesaurus for Graphic Materials has had a separate “genre” facet
that is very specific for types of graphical works (such as terms for
Abstract paintings, Family trees, HVAC drawings, and Magazine covers).
Image metadata standards include the Photo Metadata standard of the IPTC
(International Press Telecommunications Council) for photojournalism. Different metadata may be needed for different kinds of images (news, commercial/advertising, art, etc.).
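To make the list of image metadata more concrete, here is a minimal Python sketch of what a record for one image might look like. The class name `ImageRecord`, its field names, and the sample values are hypothetical and only mirror the examples listed above; they are not the field names of the IPTC standard or of any particular digital asset management system.

```python
from dataclasses import dataclass, field, fields
from datetime import date
from typing import List, Optional


@dataclass
class ImageRecord:
    """Hypothetical metadata record for one image (illustrative field names)."""
    subjects: List[str] = field(default_factory=list)  # taxonomy terms
    genre: Optional[str] = None              # type of graphical work
    creator: Optional[str] = None            # creator/photographer
    location_shown: Optional[str] = None
    location_created: Optional[str] = None   # camera location
    collection: Optional[str] = None
    time_of_day: Optional[str] = None
    date_taken: Optional[date] = None        # not the digitized/edited date
    people_depicted: Optional[int] = None    # 0 is a valid value
    copyright_notice: Optional[str] = None
    intended_purpose: Optional[str] = None

    def missing_fields(self):
        """List the names of fields that have not been filled in yet."""
        missing = []
        for f in fields(self):
            value = getattr(self, f.name)
            if value is None or value == []:
                missing.append(f.name)
        return missing


# Hypothetical sample values for illustration only.
record = ImageRecord(
    subjects=["Flowers"],
    genre="Magazine covers",
    time_of_day="Morning",
    people_depicted=0,
)
missing = record.missing_fields()
```

Keeping the subject terms in their own field, separate from genre and the descriptive fields, reflects the point that subjects for indexing are only one part of image metadata. Which fields are required would depend on the kind of images being managed.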
Indexing images is different from indexing text documents. First of all, it’s
mostly manual because automated recognition of image content is very limited
(although it may be able to detect people’s faces). It’s also more subjective as to what
is of key importance in an image than in a document. An indexer may also
tend to index not what is actually depicted but what is implied,
which often, but not always, should be avoided.
I recently attended a conference presentation on this subject, “Get the Picture: Use Your Taxonomy to Classify Images”
at the SLA conference in Boston earlier this month. The presenter, Ann
Poole from Corbis, mentioned various challenges of image indexing,
including over-indexing by photographer-submitters, indexing for
emotions depicted or implied, and indexing for the backstory of an image
in a known place.