Monday, August 31, 2015

Taxonomies and Indexes

Taxonomies and indexes are similar in that they both help guide people to find desired information on a selected topic. While they could be searched, they are designed specifically to be browsed. The obvious difference is that taxonomies for end-users are arranged hierarchically (or by facets), and indexes are arranged alphabetically. I have blogged previously on a comparison of index creation and taxonomy/thesaurus creation, but for those who are not already skilled at creating one or the other, let’s step back and further compare taxonomies and indexes themselves.

Taxonomy and Index Similarities and Differences

Taxonomies and indexes were developed for different kinds of media. Modern taxonomies are designed to function well in online implementations (through clicking on hyperlinks to narrower topics or plus signs to expand hierarchical trees), although taxonomies have existed in print as well. Indexes, specifically the back-of-the-book style, are designed to function well in print (through scanning a large number of entries and subentries on a page), although displayed indexes occasionally exist online as site A-Z indexes on small, static websites. Hyperlinked indexes at the end of ebooks are also possible, but the inadequate application of ebook standards has hindered such indexes from becoming commonplace.

Taxonomies and indexes serve different kinds of content. Taxonomies work well for content in a subject area that is easy or logical to categorize: products or product types, industries, geographic areas, occupational areas, media or document types, etc. Indexes work well for content in a subject area that is more abstract and does not lend itself to hierarchical categories: management concepts, history, news, etc. Indexes, since they are arranged alphabetically, are also excellent for browsing names/proper nouns. Taxonomies work well for a defined scope, such as collections of documents of the same type (all resumes, all marketing materials, all legal documents, etc.). Indexes, on the other hand, tend to serve better for content with a less defined scope, such as general encyclopedic information or detailed user manuals. Not surprisingly, book-like content continues to be best served by indexes.

The differences in structure are not as simple as taxonomies being hierarchical and indexes being alphabetical. Taxonomies also have alphabetical aspects, as terms at the same level of a hierarchy are typically (or by default) arranged alphabetically. Indexes, meanwhile, also have hierarchical aspects, as there are main entries with subentries under them. Some large indexes even have a third level of sub-subentries. Then there are kinds of taxonomies, called thesauri, which are structured more around terms and relationships than hierarchical trees, and such thesauri may be arranged alphabetically. In fact, the same thesaurus can be displayed either hierarchically or alphabetically, with the click of a toggle button in a thesaurus management system. But re-sorting a thesaurus alphabetically does not change it into an index. It will still lack the subentry features of an index.
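To make the toggle idea concrete, here is a minimal sketch in Python. The terms, the NARROWER mapping, and both function names are hypothetical and invented for illustration; they do not come from any particular thesaurus management system. The point is only that both displays are generated from the same stored terms and relationships.

```python
# Hypothetical mini-taxonomy: each term maps to its narrower terms.
NARROWER = {
    "Products": ["Software", "Hardware"],
    "Hardware": ["Printers", "Monitors"],
    "Software": [],
    "Printers": [],
    "Monitors": [],
}
TOP_TERMS = ["Products"]


def hierarchical_view(terms, depth=0):
    """Return display lines for the tree, with siblings sorted alphabetically."""
    lines = []
    for term in sorted(terms, key=str.lower):
        lines.append("  " * depth + term)
        lines.extend(hierarchical_view(NARROWER[term], depth + 1))
    return lines


def alphabetical_view():
    """Return every term in one flat A-Z list (the toggled display)."""
    return sorted(NARROWER, key=str.lower)


print("\n".join(hierarchical_view(TOP_TERMS)))
print("\n".join(alphabetical_view()))
```

Note that the alphabetical view is just a flat list of terms. It has no subentries and no pointers to content, which is why it still falls short of being an index.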

The defining difference between a taxonomy and an index is that an index is not an index unless it is linked to content, as the word “index” means “to indicate” or “to point,” as in to point to content. A taxonomy is still a taxonomy whether or not it is linked to content. (But it is not really useful, unless it is linked to content.)

Where Taxonomies and Indexes Meet

In addition to back-of-the-book indexes, there also exist periodical article indexes, such as the green-bound printed volumes of the Readers’ Guide to Periodical Literature and subsequent online periodical and reference databases accessed through libraries (InfoTrac, ProQuest, EBSCOhost, etc.). What happens is that indexers index the articles with terms from the taxonomy (or thesaurus or controlled vocabulary). The result of the indexing, an alphabetical arrangement of the taxonomy terms that were used in the indexing with their links to content, constitutes an index. So, the index comprises terms in the taxonomy that are linked to content and arranged alphabetically. Displayed browsable alphabetical indexes, however, have become less common in online services, as they have been replaced by features that search on the index terms instead.
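The relationship can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the vocabulary, the article IDs, and the build_index function are all made up, and real periodical databases are of course far more elaborate. It shows how an index falls out of the indexing work: the assignments of terms to articles are inverted, and the terms that were actually used are sorted alphabetically with their links to content.

```python
from collections import defaultdict

# Hypothetical controlled vocabulary (the taxonomy's terms).
TAXONOMY_TERMS = {"Agriculture", "Banking", "Energy policy", "Zoning"}

# Hypothetical indexing results: the terms an indexer assigned to each article.
INDEXING = {
    "article-001": ["Energy policy", "Banking"],
    "article-002": ["Banking"],
    "article-003": ["Agriculture", "Energy policy"],
}


def build_index(indexing, vocabulary):
    """Invert article -> terms into an alphabetical list of (term, articles)."""
    postings = defaultdict(list)
    for doc_id in sorted(indexing):
        for term in indexing[doc_id]:
            if term not in vocabulary:
                raise ValueError(f"{term!r} is not in the controlled vocabulary")
            postings[term].append(doc_id)
    # Only terms that were actually assigned to content appear in the index.
    return sorted(postings.items(), key=lambda item: item[0].lower())


for term, articles in build_index(INDEXING, TAXONOMY_TERMS):
    print(f"{term}: {', '.join(articles)}")
```

In this sketch "Zoning" is in the taxonomy but was never assigned to an article, so it does not appear in the index. The taxonomy can hold terms without content; the index cannot.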

The trend toward “multi-channel publishing” means that the same original content may appear in different formats and media, such as print and online. Online, however, may mean more than just a PDF or other ebook format of the printed version. Rather, digital text content gets chunked into units of a size or length that can be indexed as a whole with taxonomy terms, and images and new multimedia exist as separate assets that can also be indexed with taxonomy terms. What this means is that a manual, user guide, or textbook that in print had a back-of-the-book index consists, in the digital or online medium, of multiple files, one for each section or unit and one for each media asset, which are indexed and thus retrieved by taxonomy terms instead of by the back-of-the-book index.

Index Entries for Taxonomy Terms?

I have worked on projects where printed content (books, manuals, etc.) was digitized and put into small chunks or files to be indexed with a taxonomy, and the original printed volume had a back-of-the-book index. So, the issue arose: to what extent should the legacy back-of-the-book index be utilized when developing the new digital retrieval taxonomy? I had access to the index as a source of candidate taxonomy terms and was encouraged to utilize it.

My conclusion has been that the back-of-the-book index serves a slightly different purpose for users than does an indexed taxonomy. A back-of-the-book index serves to locate the page where a specific topic is mentioned. Users of a reference work, however, may at other times consult the table of contents to navigate and find the relevant sections and subsections. A taxonomy serves a purpose that combines, or falls somewhere between, those of a table of contents and a back-of-the-book index. It’s for searching (as in an index) and also for navigating (as in a table of contents), but it points to the subsection level (as in a detailed table of contents), not to a page (as in an index). Also, more content is expected to be linked to a taxonomy term (a section unit, and often multiple such units) than is indicated by an index entry (as little as one sentence). So, it would not be right to use all or most of the main entries of a back-of-the-book index to create a taxonomy for the same content.