Wednesday, July 1, 2015

Taxonomies for Indexing Images

It’s becoming more common to index images with taxonomy terms, rather than using taxonomies only for text documents or tagging images only with keywords. A taxonomy for the subject indexing of images need not be significantly different from a taxonomy for indexing text documents, but the other metadata differs, and the indexing activity itself is quite different.

A dedicated taxonomy for images might be needed for various reasons:
1.    The organization does no subject indexing of its text documents.
2.    The organization uses different software systems to manage its images and its text documents.
3.    The organization’s text documents are long and are thus indexed or cataloged at a broader level.

1.    No text indexing
Some organizations have a large image collection, and that is where they focus their indexing efforts. They likely never had a taxonomy for indexing text, so they design or adapt a taxonomy specifically for their image collection. Either they don’t need to search for and retrieve text documents, or, if they do, they simply use a search engine, since search engines, after all, can search the text itself, which is not possible with images.

2.    Different systems
Large image collections are increasingly managed in dedicated digital asset management systems, which are designed to support the various metadata associated with images and other nontext media files. Text documents, on the other hand, may be managed in document management systems, records management systems, or collaboration systems such as SharePoint. Each of these kinds of systems supports some form of controlled vocabulary for tagging content. But if the images are in one system and the text documents are in another, different controlled vocabularies are likely to be developed. Of course, a generic “content management system” may be used for both images and text documents, but many organizations don’t manage all their content in a single system.

3.    Different levels of indexing detail
The classic example of different levels of detail is the materials at the Library of Congress, which developed the Library of Congress Subject Headings for cataloging library materials. These are generally monographs, such as books, video recordings of films, or sound recordings of music collections. While the subjects of these works might be quite specific, they are often not as specific as those of an individual graphic item. (An entire book may contain numerous specific images.) Over the years, however, individual images also became part of the collection, and the LC Subject Headings were not specific enough, so the Library of Congress developed the Thesaurus for Graphic Materials, which is freely available. The existence of the Thesaurus for Graphic Materials does not mean that a dedicated thesaurus for images is always needed, only that one was needed in the context of the Library of Congress collections, given the shortcomings of the Library of Congress Subject Headings for indexing images.

If you already have a detailed taxonomy for documents, it can certainly be used for images as well. Some terms, such as those for abstract concepts (for example, “Beliefs”), will simply not be needed for image indexing, whereas new terms might need to be added (such as the names of specific types of flowers).

Images do have their own distinct metadata, of which subjects for indexing are just one part. Examples of other possible image metadata include Creator/photographer, Location shown, Location of creation (camera location), Collection name, Time or part of day (especially if outdoors), Date taken (as opposed to the date the image was digitized or edited), Number of people depicted, Copyright, Intended purpose, etc. The Thesaurus for Graphic Materials has a separate “genre” facet with very specific terms for types of graphic works (such as Abstract paintings, Family trees, HVAC drawings, and Magazine covers). Image metadata standards include the Photo Metadata standard of the IPTC (International Press Telecommunications Council), used for photojournalism. Different metadata may be needed for different kinds of images (news, commercial/advertising, art, etc.).

Indexing images is different from indexing text documents. First, it is mostly manual, because automated image recognition is very limited (although it may be able to detect people’s faces). Deciding what is of key importance is also more subjective for an image than for a document. In addition, an indexer may tend to index what is implied rather than what is actually depicted, which usually, though not always, should be avoided.

I recently attended a presentation on this subject, “Get the Picture: Use Your Taxonomy to Classify Images,” at the SLA conference in Boston last month. The presenter, Ann Poole from Corbis, described various challenges of image indexing, including over-indexing by photographer-submitters, indexing for emotions depicted or implied, and indexing for the backstory of an image taken in a known place.