Saturday, March 15, 2014

Indexing vs. Thesaurus Creation


Back-of-the-book indexing, document/digital asset indexing, and thesaurus/taxonomy creation all require similar skills, but each has its own requirements. Indeed, a typical career path toward becoming an accidental taxonomist is to work first as an indexer. You might think that the two kinds of indexing are similar to each other and that thesaurus creation differs more, but having done all three, I can attest that back-of-the-book indexing and thesaurus/taxonomy creation are more similar to each other than the two kinds of indexing are.

What is indexing

In my previous blog post “Tagging vs. Indexing,” I explained that indexing involves designating descriptive terms or labels for what some content is about, and that these terms are organized into a browsable index. There are two kinds of indexing:

  1. “Closed indexing,” or back-of-the-book indexing, where the index is created based solely on concepts that the indexer identifies within the text of a single monograph. The index is created for that one monograph and then is finished ("closed").
  2. “Open indexing,” or what has been called “database indexing,” for the indexing of articles, documents, content items, or digital assets, whereby the indexer pulls index terms from a controlled vocabulary or thesaurus and assigns them to multiple individual documents or digital assets. The set of content grows, so the same index terms point to more and more documents over time. It is called “open” indexing because the task is ongoing. The thesaurus helps ensure consistent indexing over time.
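
As a rough illustration of open indexing, the sketch below assigns terms from a controlled vocabulary to documents as they arrive. The class name `OpenIndex`, the vocabulary, and the document identifiers are all hypothetical and chosen only for this example.

```python
from collections import defaultdict

# Hypothetical controlled vocabulary; in practice it would come from the thesaurus.
CONTROLLED_VOCABULARY = {"Indexing", "Taxonomies", "Thesauri"}


class OpenIndex:
    """An index that keeps growing as new documents are indexed."""

    def __init__(self, vocabulary):
        self.vocabulary = set(vocabulary)
        self.postings = defaultdict(list)  # index term -> list of document ids

    def index_document(self, doc_id, terms):
        # Only terms from the controlled vocabulary may be assigned.
        unknown = set(terms) - self.vocabulary
        if unknown:
            raise ValueError(f"Not in controlled vocabulary: {sorted(unknown)}")
        for term in terms:
            if doc_id not in self.postings[term]:
                self.postings[term].append(doc_id)


index = OpenIndex(CONTROLLED_VOCABULARY)
index.index_document("article-001", ["Indexing", "Thesauri"])
index.index_document("article-002", ["Indexing"])  # same term, one more document
```

Rejecting terms that are not in the vocabulary is one simple way to model how a thesaurus keeps indexing consistent over time. A closed (book) index has no such check, because the indexer creates the terms.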

Both kinds of indexing require the skill of analyzing content to determine which concepts are important and deserve indexing. The biggest difference between back-of-the-book indexing and database indexing is that book indexing requires the indexer to create the index terms as well, and not merely pull them from a thesaurus.

What is a thesaurus

I use the designation thesaurus here, because I mean the type of taxonomy that features the full set of relationship types between its terms, with each term designating an unambiguous concept (noun or noun phrase). The relationship types are:
  • Hierarchical (broader term/narrower term)
  • Equivalence (use/used for, linking “nonpreferred terms” or “synonyms” to preferred terms)
  • Associative (related terms)
To best support manual indexing, all of these kinds of relationships help direct indexers to the most appropriate terms to describe the content they are indexing. The same thesaurus, or parts of it, may be displayed to end users to help guide them to the most appropriate terms for the idea about which they are searching for information. The thesaurus thus not only standardizes the language for the concepts, but also provides a guiding structure.
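
The three relationship types can be sketched as a small data structure. This is only a minimal illustration, not how any particular thesaurus management tool stores its data; the class names, method names, and sample terms are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    label: str
    broader: set = field(default_factory=set)   # BT
    narrower: set = field(default_factory=set)  # NT
    related: set = field(default_factory=set)   # RT
    used_for: set = field(default_factory=set)  # UF (nonpreferred terms)


class Thesaurus:
    def __init__(self):
        self.terms = {}  # preferred label -> Term
        self.use = {}    # nonpreferred label -> preferred label (USE)

    def add_term(self, label):
        return self.terms.setdefault(label, Term(label))

    def add_hierarchy(self, broader, narrower):
        # Hierarchical relationships are reciprocal: BT on one side, NT on the other.
        self.add_term(broader).narrower.add(narrower)
        self.add_term(narrower).broader.add(broader)

    def add_equivalence(self, preferred, nonpreferred):
        # The nonpreferred term is only an entry point; it is not a Term itself.
        self.add_term(preferred).used_for.add(nonpreferred)
        self.use[nonpreferred] = preferred

    def add_related(self, a, b):
        # Associative relationships are symmetric.
        self.add_term(a).related.add(b)
        self.add_term(b).related.add(a)

    def resolve(self, label):
        """Return the preferred term for a label, or None if it is unknown."""
        if label in self.terms:
            return label
        return self.use.get(label)


# Hypothetical sample terms
thesaurus = Thesaurus()
thesaurus.add_hierarchy("Vehicles", "Automobiles")
thesaurus.add_equivalence("Automobiles", "Cars")
thesaurus.add_related("Automobiles", "Automobile repair")
```

An indexer or end user who looks up “Cars” would be directed to “Automobiles” by `resolve`, and could then browse its broader, narrower, and related terms from there.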

How they are related

Open/database indexing and thesaurus creation are obviously related, because the thesaurus is used to support this kind of indexing. In an organization that does such indexing, it is not unusual for former indexers to become editors of the thesaurus, since they are already very familiar with it and understand the needs of the indexer-users.

Closed/book indexing and thesaurus creation are related, because they both involve the development of original terms and relationships between them.

Thesaurus and book index similarities and differences

Thesauri and back-of-the-book indexes both have what can be called multiple points of entry. In a book index these can be either See cross-references or “double-posts,” whereby additional variant terms or synonyms are included in the index, and they all point to the same set of page numbers. In a thesaurus, these are the equivalence relationships, where nonpreferred terms or synonyms point to the preferred terms (Use/UF). The difference is that a thesaurus distinguishes between preferred and nonpreferred terms, whereas double-posts in a book index are all of equal standing and none is “preferred.”
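
The contrast between double-posts and Use/UF can be shown with a few dictionaries. The entries, page numbers, and function names below are made up for illustration.

```python
# Book index: double-posted entries have equal standing and the same page numbers.
book_index = {
    "automobiles": [12, 45, 78],
    "cars": [12, 45, 78],
}

# Book index alternative: a See cross-reference that redirects to another entry.
see_references = {"motorcars": "automobiles"}


def lookup_in_book_index(entry):
    """Follow a See reference if there is one, then return the page numbers."""
    entry = see_references.get(entry, entry)
    return book_index.get(entry, [])


# Thesaurus: each nonpreferred term points to exactly one preferred term (USE).
use = {"Cars": "Automobiles", "Motorcars": "Automobiles"}


def used_for(preferred):
    """Return the nonpreferred terms (UF) recorded for a preferred term."""
    return sorted(np for np, p in use.items() if p == preferred)
```

In the book index, “automobiles” and “cars” are interchangeable entries. In the thesaurus, only “Automobiles” can be assigned to content, and the other labels exist solely to lead to it.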

Thesauri and back-of-the-book indexes both have hierarchical structure among their terms. In a thesaurus there are narrower terms under a broader term (BT/NT). In an index, there are subentries indented under a main entry. However, these hierarchies are not identical. In a thesaurus, narrower terms must be generic types, instances, or integral parts of the broader term. In a book index, a subentry can be any aspect of the main entry or merely another concept in combination with it. In fact, an indexer may choose to switch the main entry and subentry (the subentry becoming a main entry and the main entry becoming its subentry) with no problems. Don’t try to do that in a thesaurus or taxonomy!

Finally, thesauri and back-of-the-book indexes both have indications of related concepts. Thesauri have the associative relationship called Related Term (RT), and book indexes have See also cross-references. While in general these function the same way, the rules for thesauri are stricter. If the “related” terms are really hierarchical, then they must have the hierarchical relationship instead. In a book index, it is acceptable to have a See also between two terms where one is actually broader in meaning than the other.
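
The stricter thesaurus rule lends itself to an automated check. The following is a minimal sketch, assuming the hierarchy is stored as a mapping from each term to its broader terms; the function names and sample terms are hypothetical.

```python
def ancestors(term, broader):
    """Collect every broader term reachable from `term`, directly or indirectly."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in broader.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def misplaced_related_terms(broader, related):
    """Return the RT pairs in which one term is really an ancestor of the other."""
    problems = []
    for a, b in related:
        if a in ancestors(b, broader) or b in ancestors(a, broader):
            problems.append((a, b))
    return problems


# Hypothetical sample data: term -> its broader terms, plus a list of RT pairs.
broader = {
    "Thesauri": ["Controlled vocabularies"],
    "Controlled vocabularies": ["Knowledge organization systems"],
}
related = [
    ("Thesauri", "Indexing"),
    ("Thesauri", "Knowledge organization systems"),
]

flagged = misplaced_related_terms(broader, related)
```

Here the second pair would be flagged, because “Knowledge organization systems” sits above “Thesauri” in the hierarchy. In a book index, the same pair could stand as a See also cross-reference without any problem.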

I will be giving a presentation on this in greater detail at the annual conference of the American Society for Indexing, on April 30, 2015, in Seattle, WA.