The activities of back-of-the-book indexing, document/digital asset indexing, and thesaurus/taxonomy creation all require similar skills, but each has its own unique requirements. Indeed, a typical career path toward becoming an accidental taxonomist is to work first as an indexer. You might think that the two kinds of indexing are similar to each other and that thesaurus creation differs more, but having done all three, I can attest that back-of-the-book indexing and thesaurus/taxonomy creation are more similar to each other than the two kinds of indexing are.
What is indexing
In my previous blog post
“Tagging vs. Indexing,” I explain that indexing involves designating
descriptive terms or labels for what some content is about, and that these
terms are organized into a browsable index.
There are two kinds of indexing:
- “Closed indexing,” or back-of-the-book indexing, where the index is created based solely on concepts that the indexer identifies within the text of a single monograph. The index is created for that one monograph and then is finished ("closed").
- “Open indexing,” or what has been called “database indexing,” for the indexing of articles, documents, content items, or digital assets, whereby the indexer pulls index terms from a controlled vocabulary or thesaurus and assigns them to multiple individual documents or digital assets. The set of content grows over time, and the same terms in the index will point to more and more documents. It is called “open” indexing because the task is ongoing. The thesaurus helps ensure consistent indexing over time.
Both kinds of indexing
require the skill of analyzing content to determine what concepts are important
and deserve indexing. The biggest difference between back-of-the-book indexing and
database indexing is that book indexing requires that the indexer additionally invent
the index terms and not merely pull them from a thesaurus.
What is a thesaurus
I use the designation
thesaurus here, because I mean the type of taxonomy that features the full set
of relationship types between its terms, with each term designating an
unambiguous concept (noun or noun phrase). The relationship types are:
- Hierarchical (broader term/narrower term)
- Equivalence (use/used for, between preferred terms and “nonpreferred terms” or “synonyms”)
- Associative (related terms)
To best support manual
indexing, the existence of all these different kinds of relationships helps direct
the indexers to the most appropriate terms to describe the content they are
indexing. The same thesaurus, or parts of it, may be displayed to the end-users
to help guide them to find the most appropriate terms to describe the idea about
which they are searching for information. The thesaurus thus not only
standardizes the language for the concepts, but also provides a guiding
structure.
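The three relationship types can be pictured as a small data structure. What follows is only a minimal sketch in Python, not how any particular thesaurus management tool stores its data. The class names, method names, and sample terms ("Vehicles", "Automobiles", "Cars", "Roads") are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One preferred term and its relationships (illustrative only)."""
    label: str
    broader: set = field(default_factory=set)    # BT
    narrower: set = field(default_factory=set)   # NT
    related: set = field(default_factory=set)    # RT
    used_for: set = field(default_factory=set)   # UF (nonpreferred terms)


class Thesaurus:
    def __init__(self):
        self.terms = {}   # preferred label -> Term
        self.use = {}     # nonpreferred label -> preferred label (USE)

    def add_term(self, label):
        return self.terms.setdefault(label, Term(label))

    def add_hierarchy(self, broader, narrower):
        # Hierarchical relationships are reciprocal: BT on one side, NT on the other.
        self.add_term(broader).narrower.add(narrower)
        self.add_term(narrower).broader.add(broader)

    def add_equivalence(self, preferred, nonpreferred):
        # Only the preferred term is a full term; the synonym just points to it.
        self.add_term(preferred).used_for.add(nonpreferred)
        self.use[nonpreferred] = preferred

    def add_related(self, a, b):
        # Associative relationships are symmetric.
        self.add_term(a).related.add(b)
        self.add_term(b).related.add(a)

    def resolve(self, entry):
        """Return the preferred term for an entry, or None if it is unknown."""
        if entry in self.terms:
            return entry
        return self.use.get(entry)


# Hypothetical sample data
thesaurus = Thesaurus()
thesaurus.add_hierarchy("Vehicles", "Automobiles")
thesaurus.add_equivalence("Automobiles", "Cars")
thesaurus.add_related("Automobiles", "Roads")
```

In this sketch, an indexer who looks up "Cars" is directed to "Automobiles" by resolve(), and from there can see the broader, narrower, and related terms as candidates for a better fit.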
How they are related
Open/database indexing and
thesaurus creation are obviously related, because the thesaurus is used to
support this kind of indexing. In an organization which is involved in such
indexing, it is not unusual for former indexers to become editors of the
thesaurus, since they are already very familiar with it and understand the
needs of the indexer-users.
Closed/book indexing and
thesaurus creation are related, because they both involve the development of original
terms and relationships between them.
Thesaurus and book index similarities and differences
Thesauri and
back-of-the-book indexes both have what can be called multiple points of entry.
In a book index these can be either See cross-references or “double-posts,” whereby additional
variant terms or synonyms are included in the index, and they all point to the
same set of page numbers. In a thesaurus, these are the equivalence
relationships, where nonpreferred terms or synonyms point to the preferred
terms (Use/UF). The difference is that a thesaurus distinguishes between the
preferred and nonpreferred terms, whereas double-posts in a book index are
all of equal standing and none is “preferred.”
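The contrast between double-posts and Use references can be sketched in a few lines of Python. This is an illustration under assumed data, not a representation of any indexing software. The headings, page numbers, and function names are hypothetical.

```python
# Book index: double-posted headings each carry the page locators themselves,
# so neither heading is "preferred" over the other.
locators = {
    "automobiles": [12, 45, 46],
    "cars": [12, 45, 46],        # double-post of "automobiles"
}
# A See cross-reference carries no locators; it only names another heading.
see_refs = {"motor cars": "automobiles"}


def pages_for(heading):
    """Return page locators for a heading, following a See reference if needed."""
    target = see_refs.get(heading, heading)
    return locators[target]


# Thesaurus: every synonym is nonpreferred and points to one preferred term.
use_refs = {"cars": "automobiles", "motor cars": "automobiles"}


def preferred(label):
    """Return the preferred term for a label (a preferred term maps to itself)."""
    return use_refs.get(label, label)
```

Here "cars" is an entry in its own right in the book index, but in the thesaurus it exists only as a pointer to "automobiles".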
Thesauri and
back-of-the-book indexes both have hierarchical structure among their terms. In
a thesaurus there are narrower terms to a broader term (BT/NT). In an index,
there are subentries indented under a main entry. However, these hierarchies
are not identical. In a thesaurus, narrower terms must be generic types,
instances or integral parts of the broader term. In a book index, subentries
may be any aspect of the main entry or merely another concept combined with it.
In fact, an indexer may choose to switch
the main entry and subentry (the subentry becoming a main entry and the main
entry becoming its subentry) with no problems. Don’t try to do that in a
thesaurus or taxonomy!
Finally, thesauri and
back-of-the-book indexes both have indications of related concepts. Thesauri
have the associative relationship called Related Term (RT), and book indexes
have See also cross-references. While in general these function the same, the
rules for thesauri are stricter. If the “related” terms are really
hierarchical, then they must have the hierarchical relationship instead. In a
book index, it is acceptable to have a See also between two terms where one is
actually broader in meaning than the other.
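The stricter thesaurus rule lends itself to an automated check. The sketch below, in Python, flags any Related Term pair in which one term is also an ancestor of the other in the hierarchy. It is one possible way to express the rule, with hypothetical function names and sample terms, and it assumes broader-term links are stored as a simple mapping.

```python
def ancestors(term, broader):
    """Collect every broader term reachable from `term` by following BT links."""
    seen = set()
    stack = [term]
    while stack:
        for parent in broader.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def misplaced_related_pairs(broader, related):
    """Return RT pairs whose terms are actually in a hierarchical relationship."""
    flagged = []
    for a, b in related:
        if b in ancestors(a, broader) or a in ancestors(b, broader):
            flagged.append((a, b))
    return flagged


# Hypothetical sample data: term -> set of its broader terms
broader = {
    "automobiles": {"vehicles"},
    "sports cars": {"automobiles"},
}
related = [
    ("automobiles", "roads"),      # acceptable: no hierarchical link
    ("sports cars", "vehicles"),   # not acceptable: "vehicles" is an ancestor
]
flagged = misplaced_related_pairs(broader, related)
```

A book index would need no such check, since a See also between "sports cars" and "vehicles" would be perfectly acceptable there.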
I will be giving a presentation on this in greater detail at the annual conference of the American Society for Indexing, on April 30, 2015, in Seattle, WA.