Monday, April 22, 2013

Capitalization in Taxonomies

The question often comes up: what is the preferred style for the capitalization of taxonomy terms? Other than all proper nouns being capitalized, there is no strict rule for generic terms. In making the determination, it’s important to address the following questions. What kind of taxonomy is it? How will it be used? Who are the users, and what might they be accustomed to or expect?

The ANSI/NISO standard Z39.19-2005 Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies states: “predominantly lowercase terms should be used for terms in controlled vocabularies” and continues: “capitals should be used only for the initial letter of proper names, trade names and those components of taxonomic names, such as genus, which are conventionally capitalized.” But remember that ANSI/NISO Z39.19 comprises guidelines and not strict requirements, so the stylistic matter of case does not have to follow ANSI/NISO Z39.19, if a house style dictates otherwise.

Note that there are three options, not just two, for non-proper nouns/names, as these explanations themselves illustrate:
1.    all lower case (including the first letter of the first word)
2.    First letter of first word upper case
3.    Title Case (First Letter of the First and Main Words Capitalized)
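
For anyone applying one of these styles to a term list in bulk, the three options can be sketched in a few lines of Python. This is only an illustration, not part of any standard: the function name apply_case_style, the style labels, and the list of "minor" words left in lower case under title capitalization are all assumptions made for the example, and proper nouns have to be supplied by the caller so that they are left untouched.

```python
# Hypothetical helper for illustration; names and the minor-word list are assumptions.
MINOR_WORDS = {"a", "an", "and", "as", "at", "but", "by", "for",
               "in", "of", "on", "or", "the", "to"}


def _cap_first(word):
    """Upper-case only the first character, leaving the rest unchanged."""
    return word[:1].upper() + word[1:]


def apply_case_style(term, style, proper_nouns=frozenset()):
    """Return term in one of three styles: 'lower', 'initial', or 'title'.

    Words listed in proper_nouns are kept exactly as given.
    """
    if style not in ("lower", "initial", "title"):
        raise ValueError("unknown style: " + style)
    result = []
    for i, word in enumerate(term.split()):
        if word in proper_nouns:
            result.append(word)  # proper nouns keep their capitalization
            continue
        lowered = word.lower()
        if style == "lower":
            result.append(lowered)
        elif style == "initial":
            # only the first word of the term is capitalized
            result.append(_cap_first(lowered) if i == 0 else lowered)
        else:  # title: first word and all main words are capitalized
            if i == 0 or lowered not in MINOR_WORDS:
                result.append(_cap_first(lowered))
            else:
                result.append(lowered)
    return " ".join(result)


print(apply_case_style("content management systems", "lower"))    # content management systems
print(apply_case_style("content management systems", "initial"))  # Content management systems
print(apply_case_style("content management systems", "title"))    # Content Management Systems
print(apply_case_style("history of France", "lower", {"France"}))  # history of France
```

The last call shows why proper nouns need separate handling: they stay capitalized even in an otherwise all-lower-case vocabulary.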

While the distinctions between “controlled vocabularies,” “thesauri,” and hierarchical or faceted “taxonomies” can be blurred, these different types do tend to have different practices for capitalization.

A “controlled vocabulary,” as the word “vocabulary” might suggest, is a list of terms (as single words or phrases), similar to what might be found in a glossary, with the possible added feature of synonyms/variants for each preferred term. Capitalization, therefore, could be expected to follow dictionary rules and thus not be used except for proper names. A “synonym ring” type of controlled vocabulary, in which no terms are designated as “preferred” and none are even displayed to the user, has no need for any capitalization.

A “thesaurus” is a more complex type of controlled vocabulary with hierarchical and/or associative relationships relating various terms to each other. What are called thesauri tend to be more term-focused than hierarchically focused, and they tend to be large with many detailed terms. The terms can be quite specific, and proper nouns can be mixed in. Thesauri have traditionally been used by indexers to manually index multiple documents consistently over time. The resulting display of terms associated with content for the end-user to browse through is a type of index. Indexes (such as those at the backs of books) often follow the style of lower-case entries for non-proper names, too. If the terms are numerous and specific, they will appear to be, and will be used as, “index terms” rather than “categories.” Thus, if it’s called a thesaurus, it will more likely have terms in lower case. The choice of initial capitalization for a thesaurus, though, would not be incorrect, and is probably becoming more common, just as initial capitalization is becoming more common in main entries in back-of-the-book indexes.

A “taxonomy” implies a hierarchical classification or categorization of concepts. When we think of categories we think of labels or headings with subcategories. Headings in general tend to have initial capitalization or title capitalization. Thus, if it’s a strictly hierarchical taxonomy, where all terms are interconnected into a single hierarchy or a limited number of hierarchies, then it will more likely have initial capitalization or title capitalization. Such capitalization is particularly common on the relatively smaller/less detailed taxonomies that are proliferating on websites, intranets, and content management systems. It fits in with the web design style of capitalization on headings and categories.

In faceted taxonomies, which have become more popular in web/online taxonomies, proper names can be separated into their own facet(s), and confusion between proper names and generic terms is reduced. However, I would still recommend only the first letter of the first word capitalized, rather than title case, to minimize any confusion with proper names. The facet name itself, however, could be in title capitalization, since it represents a category heading and not a term for indexing. In fact, it might even be desirable to distinguish the facet labels from the values/terms within each facet by use of a different case style.

A mixed style of different capitalization at different levels is possible in hierarchical taxonomies, too. But I would recommend only the top terms, if any, have a different capitalization style. It would not be a good idea to have only the bottom-level terms (“leaf nodes”) in a different case style, because they could change. If you decided that a leaf node should later have narrower terms added, you wouldn't want to have to worry about changing the case of the term. A good application of the mixed capitalization style is when the top-level terms are not actually used in indexing/tagging but are really just categories/groupings of the actual index terms, which in turn are arranged hierarchically underneath. (Other typographical methods of distinction could also be used for any non-indexable top-level categories.)
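
The mixed style can be illustrated with a small Python sketch in which the case style depends only on a term's depth in the hierarchy, never on whether it happens to be a leaf. Everything here is hypothetical: the sample terms, the nested-dictionary representation, and the helper names title_cap, initial_cap, and render are assumptions made for the example, and terms are assumed to be stored in lower case with any proper nouns already capitalized.

```python
# Illustrative sketch only; data, names, and the minor-word list are assumptions.
MINOR_WORDS = {"a", "an", "and", "by", "for", "in", "of", "on", "or", "the", "to"}


def initial_cap(term):
    """Capitalize only the first letter of the first word."""
    return term[:1].upper() + term[1:]


def title_cap(term):
    """Capitalize the first word and every main (non-minor) word."""
    words = term.split()
    return " ".join(
        w[:1].upper() + w[1:] if i == 0 or w not in MINOR_WORDS else w
        for i, w in enumerate(words)
    )


def render(tree, depth=0):
    """Return indented display labels: title case at the top level,
    initial capitalization at every level below it."""
    lines = []
    for term, narrower in tree.items():
        label = title_cap(term) if depth == 0 else initial_cap(term)
        lines.append("  " * depth + label)
        lines.extend(render(narrower, depth + 1))
    return lines


taxonomy = {
    "information management": {            # top-level grouping, not used for tagging
        "controlled vocabularies": {
            "synonym rings": {},
        },
        "indexing of documents": {},       # currently a leaf node
    },
}

print("\n".join(render(taxonomy)))
# Information Management
#   Controlled vocabularies
#     Synonym rings
#   Indexing of documents
```

Because the style is tied to depth, adding narrower terms under "indexing of documents" later would not change how that term itself is displayed.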

In sum, all-lower case is most appropriate for non-displayed controlled vocabularies, any controlled vocabularies or thesauri that integrate proper nouns into the same hierarchies as generic terms, and large thesauri used to support manual indexing. Initial capitalization is fine for end-user browsable hierarchical taxonomies on the web. Title capitalization is OK for facet labels or the top categories in a hierarchical taxonomy. Whichever style is chosen, however, should be applied consistently.

