Saturday, January 30, 2016

Polyhierarchy in the SharePoint Term Store



Last year I had the opportunity to create some taxonomy in the SharePoint Term Store (also called Managed Metadata), and while I am pleased that hierarchical taxonomies are supported in this widely used platform, I had some concerns about the support of polyhierarchy, as information about this capability is inconsistent. So I experimented further. 

Polyhierarchy means a taxonomy term has more than one broader term or parent term. In a traditional hierarchical taxonomy structure, a term has one broader term (unless it is the top term, in which case it has no broader term) and multiple narrower terms. Occasionally, though, the logic of the hierarchy and the practical need to guide users down different possible paths make it beneficial to give a term two or more broader terms. It may appear to the user that the term is duplicated in different locations in the taxonomy, but this duplication is in appearance only, because it is the same term and thus linked/indexed to the same content, no matter which broader term path the user clicked down through.

An example would be the term Financial report, which is shown in the Figure 1 screenshot from the SharePoint Term Store.
Fig. 1 Financial report as a narrower term of Financial documents.

It would be practical to have a broader term of Financial documents and another broader term of Reports. Some users will look for the term under Financial documents, and other users will look for it under Reports.

The SharePoint 2010 or 2013 Term Store claims to support the creation of polyhierarchy, but it has significant limitations.

Polyhierarchy permitted only across different hierarchies

 

The support of polyhierarchy in the SharePoint Term Store takes the notion of “polyhierarchy” too literally by insisting that the two broader terms of a term in a polyhierarchy actually belong to different hierarchies. This means that the polyhierarchy can only be created across different Term Sets in SharePoint. A Term Set is a hierarchy or a facet with a single top term. It is prohibited to create a polyhierarchy within the same Term Set. This is quite problematic, because I find that the vast majority of the time that I want to create a polyhierarchy, it is within the same top-level hierarchy or facet.

In the example of Financial report, it is logical to have two broader terms of Financial documents and Reports. Both of these broader terms, however, are within the same Term Set or facet, which I might call Document type, so the SharePoint Term Store will not permit this polyhierarchy. Having the term Financial documents appear under a second broader term within any other Term Set or facet, on the other hand, such as the Department or Location facet, is permitted by SharePoint, but this would not be a correct hierarchical structure by taxonomy standards. 

Only one method to create polyhierarchy

 

In the SharePoint Term Store, you cannot create a broader term relationship; you can create only narrower term relationships. Thus, you can only create hierarchies from the top down. The normal way to create a polyhierarchy, however, is to add a second broader term relationship, but this is not possible in SharePoint. Instead, the same term has to be made as a narrower term to a second term.

So, if you have the term Financial report as narrower to Financial documents, and you want to make Reports also a broader term (and Reports exists in another Term Set), you might go to the second term that will be the new broader term (Reports), click on Create Term, and type in the full name of the existing term (Financial report). SharePoint, however, does not enforce taxonomy standards: it permits you to create a new term with the same name as another term (Financial report), but it will not be the same term. You can see at the bottom of the General information pane that the duplicate Financial report term’s unique identifier is different from that of the original Financial report term, as shown in Figure 2.

Fig. 2 General Information for a selected term


This matters, because terms are used for indexing/tagging. The term with one ID in one location may be indexed to some of the content, and the term with the other ID in the other location will be indexed to other content, and neither term will be indexed to all the content. This would be bad for retrieval. So, this method should not be used to create polyhierarchy.

To create polyhierarchy in SharePoint, go to the second term that is intended to be the additional broader term (Reports), click on Create Term, and start typing the name of the existing term (Financial report). At the bottom of the screen you will see “Suggestions”: type-ahead matches, highlighted in yellow, to existing terms in another Term Set or even another taxonomy group. If you select one of these suggested terms, then you will indeed be creating a polyhierarchy. After doing so, you will notice that the tag icon preceding the term becomes the “reused tag” icon, as shown in Figure 3, in both locations, under the new broader term and under the existing broader term. You will also notice, when you select the term and view its General details, that the box under Member Of shows that the term is a member of both hierarchies.
Fig. 3 Reused tag example for the term Marketing
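
To make the difference between a duplicate term and a reused term concrete, here is a minimal sketch in Python. It is not the SharePoint API; the `Term` class, the `tag` and `documents_for` helpers, and the document file names are all hypothetical and serve only to illustrate why the unique identifier, rather than the name, decides what content is retrieved.

```python
import uuid
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Term:
    """A term is identified by its ID, not by its name (illustrative model)."""
    name: str
    term_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def tag(index, term, document):
    """Record that a document was tagged with a term, keyed by term ID."""
    index[term.term_id].add(document)


def documents_for(index, term):
    """Return the documents tagged with exactly this term ID."""
    return sorted(index[term.term_id])


# Case 1: typing the name again creates a second term with its own ID.
original = Term("Financial report")
duplicate = Term("Financial report")
split_index = defaultdict(set)
tag(split_index, original, "q1-results.docx")   # tagged under Financial documents
tag(split_index, duplicate, "q2-results.docx")  # tagged under Reports
split_results = documents_for(split_index, original)

# Case 2: the existing term is reused, so both parents hold the same term.
children = {"Financial documents": [original], "Reports": [original]}
reused_index = defaultdict(set)
tag(reused_index, children["Financial documents"][0], "q1-results.docx")
tag(reused_index, children["Reports"][0], "q2-results.docx")
merged_results = documents_for(reused_index, original)

print(split_results)   # only the document tagged with the original ID
print(merged_results)  # both documents, whichever path was used
```

In the first case each ID retrieves only part of the content; in the second, both paths lead to the same term and therefore to all of the content.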


Importing a taxonomy with polyhierarchy

 

If you import an externally created taxonomy in CSV format as a Term Set via the Term Store’s import feature and that taxonomy has polyhierarchy, the Term Store will not recognize the polyhierarchy, but rather will treat the polyhierarchy terms as distinct terms with duplicate names, assigning them unique IDs. Thus, they could be used inconsistently in indexing/tagging. Therefore, you should ensure that imported CSV taxonomies do not have any polyhierarchy.
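
One way to check a CSV file before import is to look for term names that appear under more than one parent. The following is a rough sketch, assuming the file has one term per row with its path spread over columns named "Level 1 Term" through "Level 7 Term"; those column names, the function name, and the sample rows are my assumptions for illustration, so adjust them to the file you actually have.

```python
import csv
import io
from collections import defaultdict

# Assumed column names for the hierarchy levels in the import file.
LEVEL_COLUMNS = [f"Level {n} Term" for n in range(1, 8)]


def find_repeated_terms(csv_text):
    """Return term names that occur under more than one parent path."""
    parents_by_name = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        path = []
        for column in LEVEL_COLUMNS:
            value = (row.get(column) or "").strip()
            if value:
                path.append(value)
        if not path:
            continue
        # The last filled level is the term; everything before it is its parent path.
        term, parent_path = path[-1], tuple(path[:-1])
        parents_by_name[term].add(parent_path)
    return {
        name: sorted(parents)
        for name, parents in parents_by_name.items()
        if len(parents) > 1
    }


SAMPLE = """Level 1 Term,Level 2 Term,Level 3 Term
Financial documents,,
Financial documents,Financial report,
Reports,,
Reports,Financial report,
Reports,Annual report,
"""

repeated = find_repeated_terms(SAMPLE)
for name, parents in repeated.items():
    print(name, "appears under", parents)
```

Any name this reports would be imported as separate terms with separate IDs, so it is worth resolving each one to a single location before importing.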

If you import a taxonomy created in an external taxonomy/thesaurus/ontology management system which permits polyhierarchy, and that software has a feature or connector to import to SharePoint Term Store, there are different methods of dealing with the polyhierarchy issue. The default of some software, such as Semaphore Ontology Editor and TopBraid Enterprise Vocabulary Net, is to retain only one of the pair of broader term relationships upon export. For example, in Semaphore, the first hierarchical relationship encountered for a term is retained and any others are not, but the user gets an alert. Wordmap also provides a validation error if there is a polyhierarchy for import into the same Term Set. Rather than retaining an arbitrary one of multiple broader term relationships, Synaptica strips out all broader term relationships if there is more than one, and then the former polyhierarchy terms show up on the orphan term list for review. In some software, such as TopBraid EVN, the user can define quality/validation rules that would identify polyhierarchy, so the user can remove any before importing into SharePoint. Other software vendors, such as Data Harmony and PoolParty, say they have work-arounds for the SharePoint import to sort of support polyhierarchy, but I have not tested these.

In conclusion, the Term Store’s support of polyhierarchy only across Term Sets (hierarchies or facets) is not very useful, since most of the time that we would want to create a polyhierarchy, it is within the same Term Set, especially if the Term Set is to be used as a facet. A term with the same name in more than one facet typically would have a slightly different meaning and usage.

Thursday, December 31, 2015

Vocabularies and Controlled Vocabularies

I have long considered a taxonomy as a particular, structured kind of controlled vocabulary. More recently, however, I have been hearing of “vocabularies” without the word “controlled” in front, although still for the purposes of information management and retrieval, which is cause to wonder: are controlled vocabularies and vocabularies the same thing or not?

Controlled Vocabularies


Definition

It’s the standards that drive the definitions and also the scope of meaning. “Controlled vocabularies” have been most authoritatively defined and scoped by ANSI/NISO Z39.19-2005 Guidelines for the construction, format, and management of monolingual controlled vocabularies. The Standard’s glossary defines it as: “A list of terms that have been enumerated explicitly.” Vocabulary control is an important part of the definition of controlled vocabularies, whereby synonyms are linked together, homographs are distinguished, and unambiguous concepts are defined or scoped.

Although not part of the standard’s name, ISO 25964 Thesauri and interoperability with other vocabularies (parts 1 and 2 published in 2011 and 2013) also defines controlled vocabularies in its glossary, where it states that a controlled vocabulary is a “prescribed list of terms, headings or codes, each representing a concept.” It is also noted: “Controlled vocabularies are designed for applications in which it is useful to identify each concept with one consistent label, for example when classifying documents, indexing them and/or searching them.”

Scope
As for what is included within the scope of controlled vocabularies, ANSI/NISO Z39.19-2005 states in its Scope section, on the first page, that controlled vocabularies include:
  • Lists of controlled terms
  • Synonym rings
  • Taxonomies
  • Thesauri
In ISO 25964, the scope of inclusion of controlled vocabularies is less clear. In the glossary definition for controlled vocabulary, it states: “Thesauri, subject heading schemes and name authority lists are examples of controlled vocabularies,” but a complete list of controlled vocabularies is not presented.

What is significant is that ISO 25964 does make a distinction between “controlled vocabulary” and just vocabulary. ISO 25964 describes more kinds of vocabularies, but then addresses the issue of vocabulary control in each.  Types of vocabularies that ISO 25964 discusses as having vocabulary control are:
  • Thesauri
  • Classification schemes
  • Classification schemes for records management
  • Taxonomies
  • Subject heading schemes
  • Name authority lists
According to ISO 25964 part 2, terminologies and ontologies usually have vocabulary control, but vocabulary control is not a requirement. So, it can be inferred that most but not all terminologies (discussed in my last blog post) or ontologies are controlled vocabularies. Name authority lists are “usually controlled vocabularies” according to ISO 25964 part 2 (section 23.1.1). Synonym rings do not have vocabulary control (section 24.2.3).

Structured Vocabularies


Definition

Another, less commonly used designation is “structured vocabulary.” It appears in the name of the British Standard, BS 8723 Structured vocabularies for information retrieval – Guide. BS 8723 was published in five parts from 2005 to 2008, revising and expanding on the earlier BS and ISO standards for monolingual and multilingual thesauri, and, in turn, became the basis for the current ISO 25964 pair of standards.

ISO 25964 also includes “structured vocabulary” in its glossary, defined as an “organized set of terms, headings or codes representing concepts and their inter-relationships, which can be used to support information retrieval,” and goes on to note: “A structured vocabulary can also be used for other purposes. In the context of information retrieval, the vocabulary needs to be accompanied by rules for how to apply the terms.”  Meanwhile, ANSI/NISO Z39.19-2005 does not mention “structured vocabularies.”

Scope
As for what is included within the scope of structured vocabularies, while that is not so clearly stated, it can be assumed, based on the title of BS 8723 Structured vocabularies for information retrieval – Guide, that the vocabularies included within the standard are all “structured vocabularies.” These are:
  • Thesauri
  • Classification schemes
  • Business classification schemes for records management
  • Taxonomies
  • Subject heading schemes
  • Ontologies
  • Authority lists
ISO 25964 seems to use “vocabularies” and “structured vocabularies” somewhat interchangeably. While the standard’s title refers to “thesauri and … other vocabularies,” its foreword states “ISO 25964-2 will cover interoperability between different thesauri and with other types of structured vocabulary, such as classification schemes, name authority lists, ontologies, etc.”

If all the types of vocabularies in part 2 are indeed considered as “structured vocabularies” then the scope of structured vocabularies would cover:
  • Thesauri
  • Classification schemes
  • Classification schemes for records management
  • Taxonomies
  • Subject heading schemes
  • Ontologies
  • Terminologies
  • Name authority lists
  • Synonym rings
The last two, however, might not be included as structured vocabularies. ISO 25964 part 2 says that name authority lists “may also be structured vocabularies” (23.1.1), implying that they are not always structured vocabularies, and it also explains that synonym rings are “not hierarchically structured.”

Vocabularies


The simple one-word designation of “vocabulary,” when used in the context of support for information retrieval, comprises all controlled and structured vocabularies, including those that sit at the margins of the definitions or do not always meet the strict requirements of controlled or structured vocabularies, such as ontologies, terminologies, name authority lists, and synonym rings, along with other flat (unstructured) term lists.

Vocabularies, not necessarily controlled or structured, are also what are referred to in other frameworks or web contexts, such as SKOS (Simple Knowledge Organization System) vocabularies, Semantic Web Vocabularies, and Linked Open Vocabularies.

What is interesting to note is what other topics are being discussed when the terms “controlled vocabulary” and “vocabulary” alone are used in ISO 25964 part 2 Interoperability with other vocabularies. Controlled vocabularies are discussed in the context of entry terms, pre-coordination, post-coordination, near synonyms, and indexing. Vocabularies in general are discussed in the context of equivalence mapping, interoperability, resources and authorities, registries, multilingual types, and management software/systems.

Conclusions


Taxonomies, thesauri, subject heading schemes, and classification schemes are both controlled vocabularies and structured vocabularies. Most controlled vocabularies are structured vocabularies, and almost all structured vocabularies are controlled vocabularies.  But there are other vocabularies that do not meet the criteria of one definition or another, and to recognize and include them, especially as resources or for the mapping of terms, we refer to them as just vocabularies.

Friday, November 27, 2015

Taxonomies and Terminologies

The current specialties of taxonomy management and terminology management have different histories and serve different purposes, but they are in fact closely related, and taxonomies and terminologies can be linked to share knowledge. At the annual Taxonomy Boot Camp conference in Washington, DC, earlier this month I met a terminologist attendee (Beate Früh of Büro b3) from Germany, who explained to me that the fields are quite similar, and that’s why she was attending a taxonomy conference. Also at the conference I met a vendor of a new software company (Jochen Hummel, CEO of Coreon), whose product provides both taxonomy and terminology management.

As with the field of taxonomies and taxonomy management, there are varying definitions of terminologies and terminology management.  The original meanings of both taxonomy and terminology are as fields of study, with taxonomy being the study of naming and classifying and terminology being the study of terms and their use. More commonly though, we refer to taxonomies and terminologies as sets of terms or concepts for a particular subject area or purpose.

Definitions of terminology include “technical or special terms used in a business, art, science, or special subject” (www.merriam-webster.com), and a “set of designations belonging to one special language” (ISO 1087-1:2000, 3.5.1), with “each designation representing a concept” (ISO 25964-2:2013). According to the International Information Centre for Terminology (Infoterm): “The systematic organization and definition of concepts is called terminology management – which also includes classification.” (T.E.R.M.I.N.O.L.O.G.Y. PDF)

Differences


There are several differences between taxonomies and terminologies. The most obvious difference is that taxonomies have hierarchical relationships between the terms/concepts so as to create an overall hierarchical structure, and terminologies generally do not. Another difference is that terminologies contain more detailed terms than are found in a taxonomy for a comparable subject area. Furthermore, while taxonomies are limited to nouns and noun phrases (including verbal nouns), terminologies may contain some specific adjectives. Terminologies generally include definitions for every term, which is not so typical for taxonomies. Many terminologies are used to support foreign language translation, so there are usually foreign language equivalents for every term, something found in only a small minority of taxonomies. In general, there is more data for a term in a terminology than in a taxonomy.

The most significant difference between taxonomies and terminologies is how they are used. Taxonomies serve information retrieval, through a combination of indexing/tagging use and browsing/navigation and/or search support. Rather than serve information retrieval, the main purposes of terminologies are to support standard use of terms, especially technical terms, with agreed-upon meaning for creating technical documentation and for foreign language translations. Translation has historically been the field of greatest use of terminologies. As such, many terminologists have a background in translation or linguistics. The co-authors of a leading book in the field of terminology, Handbook of Terminology Management, are both professors of translation.

Another difference is in regional use. Taxonomies are especially widely used in the United States and other English-speaking countries, while growing elsewhere too, whereas terminologies are more widely used in Europe and bilingual countries such as Canada. Member organizations of Infoterm, the independent international association focused on terminology, include numerous organizations in Europe, a few in each of Africa, Asia, Latin America, and Canada, but there are no organizations in the United States.

Finally, there is a greater number of standards for terminologies. ISO committee 37 for Terminology and Other Language and Content Resources has a large number of currently published standards, including five standards of the Principles and Methods subcommittee, 14 of the Terminographical and Lexicographical Working Methods subcommittee, and five standards of the Systems to Manage Terminology, Knowledge and Content subcommittee, including ISO 30042:2008 TermBase eXchange (TBX). For taxonomies, on the other hand, standards are fewer, or, if considering specifically taxonomies, there actually are no standards, as the most relevant standards are for thesauri (ISO 25964 or ANSI/NISO Z39.19), ontologies (OWL, based on RDF), or more broadly web-based knowledge organization systems (SKOS).

Similarities


Despite their differences, taxonomies and terminologies both are kinds of vocabularies or controlled vocabularies (depending on how “controlled vocabulary” is defined, the topic of my next blog post). The international standard ISO 25964 Thesauri and interoperability with other vocabularies (part 1 in 2011 and part 2 in 2013) discusses the following “other” vocabularies (as listed in its table of contents): classification schemes, taxonomies, subject heading schemes, ontologies, terminologies, name authority lists, and synonym rings. Thus, terminologies are listed right along with taxonomies and ontologies. The United States standard ANSI/NISO Z39.19-2005 Guidelines for the Construction, Format and Management of Monolingual Controlled Vocabularies, however, does not include terminologies in its more limited scope: “Controlled vocabularies covered in by this Standard includes lists of controlled terms, synonyms rings, taxonomies, and thesauri.” (Section 2 Scope).

The most important similarity is that both taxonomies and terminologies refer to terms and unique concepts and not to mere words. As such, they often include and bring together synonyms or other variants to disambiguate concepts. While terminologies don’t characteristically have relationships between terms, they sometimes do.

Linkages


Due to these similarities, it is quite feasible to have connections, links, mappings, etc., between terms in a taxonomy and in a terminology. Taxonomies and terminologies for internal content within the same organization will have a lot of overlap, so it makes sense to leverage the same knowledge bases and either reuse the same terms in taxonomies and terminologies or at least link/map the equivalencies, both to save effort and to ensure consistency of understanding across an organization. ISO 25964-2 Thesauri and interoperability with other vocabularies includes a section on guidelines for the interoperability between thesauri (and, by extension, taxonomies) and terminologies:
  • Concepts may be mapped between a thesaurus and a terminology, and should follow the same methods and best practices as mapping between two thesauri (22.3.2)
  • Terminologies are useful as sources of concepts or terms when building or maintaining a thesaurus. They can also be referred to when writing scope notes. (22.3.3)
  • A search thesaurus or synonym ring may be built using a combination of a thesaurus and a terminology. (22.3.4)

Hopefully, more organizations will develop both taxonomies and terminologies where they are lacking and also build connections between the two.



Tuesday, October 6, 2015

Taxonomies and Tables of Contents

A table of contents and a hierarchical taxonomy appear to be quite similar. In my last blog post I looked at taxonomies and indexes, and in the end concluded: “A taxonomy serves a purpose that is both, or something in-between, that of a table of contents and a back-of-the-book index. It’s for searching (like in an index) and also for navigating (like in a table of contents), but it points to the subsection level (as in a detailed table of contents), not to a page (as in an index).” Taxonomies, especially the thesaurus kind, have many similarities to indexes when it comes to looking up a topic. Taxonomies, especially the hierarchical kind, are also similar to a table of contents or the navigation aid to a set of content.

Despite the apparent similarities in hierarchical structure and the purpose of supporting browse navigation, the differences between a table of contents and a hierarchical taxonomy are far greater than the differences between a displayed index and a search-supporting thesaurus.

A table of contents provides navigation, whether for a printed book or large document or for an electronic document or collection. In fact, in an MS Word document with headings, a table of contents that is generated in the left margin pane from those headings is called “Navigation.” Labels in a table of contents or navigation system are arranged like a taxonomy but are not exactly a kind of taxonomy.

Navigation is not a taxonomy

 

Navigation or a table of contents has to perfectly reflect the content that it belongs to. It is completely customized. Two books on the same subject cannot have the same table of contents.  The same taxonomy, however, may be used for more than one content source and typically is. In a table of contents or navigation, each navigation entry, menu label, or heading matches one-to-one to a single, specific section or web page.  Terms in a taxonomy are intended to be used more than once, so each term in a taxonomy is linked to multiple documents or content items.  As such, taxonomy terms need to be somewhat generic, whereas labels or headings in a table of contents or navigation can be specific. Taxonomy terms also need to be created with the anticipation of serving not only current content but also future content, whereas navigation or table of contents entries need only reflect the current content.

Different label wording 

In addition to being more generic, taxonomy terms differ from table of contents entries or navigation labels in other ways.

  • The names of chapters and headings may be longer descriptions (such as “Procedures to Enhance the Accuracy and Integrity of Information Furnished”), whereas taxonomy terms should be concise to aid skimming. A complex topic with a complex heading can be covered with a combination of taxonomy terms instead of a single complex term, because taxonomy terms do not need to match all content one-to-one (such as the combination of terms: Information accuracy, Information integrity, and Information-gathering procedures).
  • The names of chapters and headings might be question phrases (such as “Why study statistics?”), whereas taxonomy terms should be nouns or adjective-noun phrases and start off with a “keyword” likely to be looked up (not “Why”) to support alphabetical lookup options. Even in a hierarchical taxonomy display, the terms at the same hierarchical level tend to be arranged alphabetically.
  • Table of contents entries may be context-specific based on the parent/broader level (such as “Identification and General Terms” or “Special Concerns”), and, in fact, the same sub-heading could repeat under different broader headings. In a taxonomy, each term should be independently unambiguous.
  • Tables of contents often start off naming introductory information (such as “Introduction to Identity Theft”) or have sections for Conclusions, neither of which should be terms in a taxonomy. If the same topic is covered three times, in an introduction, body, and conclusions, it will be indexed with the same single taxonomy term, and the end-user will retrieve all indexed results on that topic grouped together.
  • Table of contents or navigation headings can be like titles, which may be “catchy” or enticing to the reader, especially at the top level. Taxonomy terms, by contrast, are clear, concise, and common (based on what most users would call the concept), and not especially creative.

Different structure

 

Tables of contents and taxonomies also differ in their structure. Tables of contents or navigation schemes reflect the organization of content, which may be chronological, pedagogical, from fundamental to detailed, from most important to least important, or the order of perceived user interest. In a taxonomy, the terms at each hierarchical level are arranged alphabetically by default. Navigation has no separate “related terms” relationship, so what appear as subtopics might not be true narrower terms in the taxonomic sense, but merely related topics. Taxonomies, on the other hand, must follow the ANSI/NISO Z39.19 guidelines or ISO 25964 with respect to structuring hierarchical relationships: narrower terms must be specific types, instances, or integral parts of their broader terms. By having this standard format, a taxonomy provides organizational predictability for all kinds of users and all kinds of content.

There are certain editorial conventions for content, such as having units of a roughly standard length, which then impact the table of contents or navigation. While there are some variations, one chapter or section is typically not twice as long as another. To achieve balance, a large topic may be spread out over two or more sections, whereas several small topics are grouped together under a heading that is a serial list (such as “Poverty, Inequality, and Mobility”), or under “Other.” Thus, the topics in a table of contents are based on the amount of material presented. Taxonomy structure, on the other hand, looks at the terms/concepts only, and does not take into consideration the amount of content per term. There is one concept per term, not a list. Rare occurrences of two concepts combined into a single term, such as “Author voice and tone,” are the consequence of two topics being very closely related with overlapping meaning and usage.

Conclusions


While a table of contents or navigation system is not a taxonomy, nor should it be used as a taxonomy, when a legacy print source is converted to units of digital content, a table of contents is still an excellent source for creating a taxonomy.




Monday, August 31, 2015

Taxonomies and Indexes



Taxonomies and indexes are similar in that they both help guide people to find desired information on a selected topic. While they could be searched, they are designed specifically to be browsed. The obvious difference is that taxonomies for end-users are arranged hierarchically (or by facets), and indexes are arranged alphabetically. I have blogged previously on a comparison of index creation and taxonomy/thesaurus creation, but for those who are not already skilled at creating one or the other, let’s step back and further compare taxonomies and indexes themselves.

Taxonomy and Index Similarities and Differences


Taxonomies and indexes were developed for different kinds of media. Modern taxonomies are designed to function well in online implementations (through clicking on hyperlinks to narrower topics or plus signs to expand hierarchical trees), although taxonomies have existed in print as well. Indexes, specifically the back-of-the-book style, are designed to function well in print (through scanning a large number of entries and subentries on a page), although displayed indexes occasionally exist online as site A-Z indexes on small, static websites. Hyperlinked indexes at the end of ebooks are also possible, but the inadequate application of ebook standards has hindered such indexes from becoming commonplace.

Taxonomies and indexes serve different kinds of content. Taxonomies work well for content in a subject area that is easy or logical to categorize: products or product types, industries, geographic areas, occupational areas, media or document types, etc. Indexes work well for content on a subject area that is more abstract and does not lend itself to hierarchical categories: management concepts, history, news, etc. Indexes, since they are arranged alphabetically, are also excellent for browsing names/proper nouns. Taxonomies work well for a defined scope, such as collections of documents of the same type (all resumes, all marketing materials, all legal documents, etc.). Indexes, on the other hand, tend to serve better for content with a less defined scope, such as general encyclopedic information or detailed user manuals. Not surprisingly, book-like content continues to be best served by indexes.

The differences in structure are not as simple as taxonomies being hierarchical and indexes being alphabetical. Taxonomies also have alphabetical aspects, as terms at the same level of a hierarchy are typically (or by default) arranged alphabetically. Indexes, meanwhile, also have hierarchical aspects, as there are main entries with subentries under them. Some large indexes even have a third level of sub-subentries. Then there are kinds of taxonomies, called thesauri, which are structured more around terms and relationships than hierarchical trees, and such thesauri may be arranged alphabetically. In fact, the same thesaurus can be arranged either hierarchically or alphabetically, with the click of a toggle button in a thesaurus management system. But re-sorting a thesaurus alphabetically does not change it into an index. It will still lack the subentry features of an index.
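
As a small illustration of that toggle, the sketch below renders the same set of terms two ways. It is only a toy model with made-up terms and function names, not how any particular thesaurus management system works.

```python
# Toy thesaurus: each broader term maps to its narrower terms (hypothetical data).
narrower = {
    "Documents": ["Reports", "Contracts"],
    "Reports": ["Financial reports", "Annual reports"],
}


def hierarchical_view(narrower, top, depth=0):
    """List a term and its narrower terms as an indented tree, siblings A-Z."""
    lines = ["  " * depth + top]
    for child in sorted(narrower.get(top, []), key=str.lower):
        lines.extend(hierarchical_view(narrower, child, depth + 1))
    return lines


def alphabetical_view(narrower):
    """List every term once in a flat alphabetical sequence."""
    terms = set(narrower)
    for children in narrower.values():
        terms.update(children)
    return sorted(terms, key=str.lower)


tree = hierarchical_view(narrower, "Documents")
flat = alphabetical_view(narrower)
print("\n".join(tree))
print("\n".join(flat))
```

The alphabetical view is just a flat list of the same terms; nothing in it corresponds to the subentries an index would have.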

The defining difference between a taxonomy and an index is that an index is not an index unless it is linked to content, as the word “index” means “to indicate” or “to point,” as in to point to content. A taxonomy is still a taxonomy whether or not it is linked to content. (But it is not really useful, unless it is linked to content.)

Where Taxonomies and Indexes Meet


In addition to back-of-the-book indexes, there also exist periodical article indexes, such as the green-bound printed volumes of the Reader’s Guide to Periodical Literature and subsequent online periodical and reference databases accessed through libraries (InfoTrac, ProQuest, EBSCOhost, etc.). What happens is that indexers index the articles with terms from the taxonomy (or thesaurus or controlled vocabulary). The result of the indexing, an alphabetical arrangement of taxonomy terms that were used in the indexing with their links to content, constitutes an index. So, the index comprises terms in the taxonomy that are linked to content and arranged alphabetically. Displayed browsable alphabetical indexes, however, have become less common in online services, as they have been replaced by features that search on the index terms instead.
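
The way an index emerges from indexing with a taxonomy can be sketched in a few lines of Python. The article titles and terms below are invented for illustration, and `build_index` is a hypothetical helper, not part of any indexing product.

```python
from collections import defaultdict


def build_index(tagged_articles):
    """Invert article -> terms into an alphabetical list of term -> articles."""
    postings = defaultdict(list)
    for article, terms in tagged_articles.items():
        for term in terms:
            postings[term].append(article)
    # Only terms actually used in indexing appear, arranged alphabetically.
    return [(term, sorted(postings[term])) for term in sorted(postings, key=str.lower)]


# Hypothetical articles, each indexed with terms from the taxonomy.
tagged_articles = {
    "Budget cuts hit schools": ["Education", "Public finance"],
    "New tax rules explained": ["Public finance", "Taxation"],
}

index = build_index(tagged_articles)
for term, articles in index:
    print(term + ": " + "; ".join(articles))
```

Taxonomy terms that were never assigned to an article do not appear at all, which is one way to see that the index is the taxonomy as linked to content, not the taxonomy itself.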

The trend toward “multi-channel publishing” means that the same original content may appear in different formats and media, such as print and online. Online, however, may mean more than just a PDF or other ebook format of the printed version. Rather, digital text content gets chunked into units of the size or length that could be indexed as a whole with taxonomy terms, and images and new multimedia exist as separate assets that can also be indexed with taxonomy terms.  What this means is that a manual, user guide, or textbook that in print had a back-of-the-book index, in the digital or online medium consists of multiple files for each section or unit and for each media asset, which are indexed and thus retrieved by taxonomy terms instead of using the back-of-the-book index.

Index Entries for Taxonomy Terms?


I have worked on projects where printed content (books, manuals, etc.) was digitized and put into small chunks or files to be indexed with a taxonomy, and the original printed volume had a back-of-the-book index. So, the issue arose: to what extent should the legacy back-of-the-book index be utilized when developing the new digital retrieval taxonomy? I had access to the index for candidate taxonomy terms and was encouraged to utilize it.

My conclusion has been that the back-of-the-book index serves a slightly different purpose for users than does an indexed taxonomy. A back-of-the-book index serves to locate the page where something was mentioned on a specific topic. Users of a reference work, however, may at other times consult the table of contents to navigate and find the relevant sections and subsections. A taxonomy serves a purpose that is both, or something in-between, that of a table of contents and a back-of-the-book index. It’s for searching (like in an index) and also for navigating (like in a table of contents), but it points to the subsection level (as in a detailed table of contents), not to a page (as in an index). Also, more content is expected to be linked to a taxonomy term (a section unit, and often multiple such units) than is indicated by an index entry (as little as one sentence). So, it would not be right to use all or most of the main entries of a back-of-the-book index to create a taxonomy for the same content.