
Sunday, January 28, 2018

Best Practices for Different Taxonomies

A question was recently posted to a group: “I'm wondering if anyone knows of a standard for designing taxonomies for industrial components (widgets).” So far, no one has replied.

To clarify, taxonomies for different subject areas and different content don’t have different standards. Standards, whether for interoperability, such as SKOS, or for structural design, such as ANSI/NISO Z39.19 or ISO 25964, are the same and just as relevant for taxonomies and thesauri in all subject areas. Taxonomies for different subject areas and content may have different design best practices, though. The published standards don’t spell out everything; there is room for design and style differences for different taxonomies, including those that differ in their subject domain and content.

Areas of taxonomy design best practices that may differ include:
  • Degree of term specificity or granularity
  • Depth of hierarchical levels
  • Number of terms at the same level (i.e., the number of narrower terms a term has)
  • Length of terms
  • Use of parenthetical modifiers and other term label fields
  • Additional attribute details for terms (notes or controlled value fields)
There are also issues of relationships between terms (whether a term may have more than one broader term, and whether there should be associative/related-term relationships) and of how extensive alternative labels/synonyms should be. Best practices for these issues, however, depend more on the implementation and user interface of the taxonomy than on its subject area.

In the case of an industrial component taxonomy, best practices for the aforementioned points would likely be the following:
  • There should be a relatively high level of term specificity, so as to include all components.
  • The depth of the hierarchy should accurately reflect standard component categories and subcategories, so it could be deeper than in other business taxonomies. The depth may also vary in different parts of the taxonomy.
  • The number of terms at the same level should also accurately reflect standard component categories and subcategories, so there could be a large number of terms at the same level.
  • Term names should be long enough to be complete and unambiguous, but any component number should be managed in a separate field.
  • An additional numeric or alphanumeric classification system may be desired. If so, the classification code would be another field or component of the term, separate from the term name, used for sorting.
  • Additional attribute details for each term would be desired and expected. These may include a component number, size, price, and other specifications. (Attribute fields may or may not be searchable. They are not for filtering, though, as facets are.)
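
To make the separation of fields concrete, here is a minimal Python sketch of a component term record, assuming a simple in-memory model. The field names (class_code, component_number, attributes) and the sample values are hypothetical, not taken from any standard or taxonomy tool. The label holds only the complete name, while the code, component number, and specifications live in their own fields.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ComponentTerm:
    """Hypothetical record for one term in an industrial component taxonomy."""
    label: str                              # complete, unambiguous name only
    class_code: str                         # hypothetical classification code, used for sorting
    component_number: Optional[str] = None  # kept out of the label on purpose
    attributes: Dict[str, str] = field(default_factory=dict)  # size, price, specs, ...
    narrower: List["ComponentTerm"] = field(default_factory=list)


def sorted_by_code(terms: List[ComponentTerm]) -> List[ComponentTerm]:
    """Order sibling terms by classification code rather than alphabetically by label.

    Plain string comparison assumes fixed-width (e.g. zero-padded) codes.
    """
    return sorted(terms, key=lambda t: t.class_code)


fasteners = ComponentTerm("Fasteners", "21")
fasteners.narrower = [
    ComponentTerm("Anchor bolts", "21-30", "AB-200", {"thread": "M12"}),
    ComponentTerm("Hex bolts", "21-10", "HB-100", {"thread": "M8"}),
]
print([t.label for t in sorted_by_code(fasteners.narrower)])
# ['Hex bolts', 'Anchor bolts']
```

Sorting on the code field lets the classification scheme, rather than the alphabet, determine the display order of sibling terms.
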
In contrast, a consumer products ecommerce taxonomy would follow different best practices:
  • Terms should not be more specific than what users would be familiar with. Specificity should reflect the number of units (SKUs) covered by the term category. A term that refers to only 1-5 products is probably too specific. If there are additional refinement filters, then a category term may be broad enough to include 10-50 items.
  • Hierarchy should not be too deep, probably no more than 3 levels.
  • Terms per level should be limited, such as 3-12 terms per hierarchy level.
  • Term names should be concise for easy browsing, yet unambiguous, usually 1-3 words.
  • Terms should probably not have any other fields/components or parenthetical qualifiers.
  • Attribute details would include at minimum product number/SKU and description. Price would be managed as a separate filter, rather than as merely an attribute.
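
Guidelines like these are easy to check mechanically. The following is a minimal Python sketch, assuming a hypothetical Category tree with a product count on each category; the thresholds are the rules of thumb listed above, not a standard, and product counts are checked only on leaf categories.

```python
from dataclasses import dataclass, field
from typing import List

# Rules of thumb from the guidelines above (not a standard); adjust per organization.
MAX_DEPTH = 3                      # no more than 3 hierarchy levels
MIN_CHILDREN, MAX_CHILDREN = 3, 12
MAX_LABEL_WORDS = 3                # concise labels of 1-3 words
MIN_SKUS = 6                       # 1-5 products is probably too specific


@dataclass
class Category:
    label: str
    sku_count: int = 0             # hypothetical: products assigned to this category
    children: List["Category"] = field(default_factory=list)


def check(cat: Category, depth: int = 1) -> List[str]:
    """Walk the category tree (top category = level 1) and return warnings."""
    warnings = []
    name = repr(cat.label)
    if depth > MAX_DEPTH:
        warnings.append(f"{name}: at level {depth}, deeper than {MAX_DEPTH} levels")
    if len(cat.label.split()) > MAX_LABEL_WORDS:
        warnings.append(f"{name}: more than {MAX_LABEL_WORDS} words")
    if "(" in cat.label:
        warnings.append(f"{name}: has a parenthetical qualifier")
    if cat.children and not MIN_CHILDREN <= len(cat.children) <= MAX_CHILDREN:
        warnings.append(f"{name}: {len(cat.children)} narrower terms, "
                        f"outside {MIN_CHILDREN}-{MAX_CHILDREN}")
    if not cat.children and cat.sku_count < MIN_SKUS:
        warnings.append(f"{name}: only {cat.sku_count} products, probably too specific")
    for child in cat.children:
        warnings.extend(check(child, depth + 1))
    return warnings


kitchen = Category("Kitchen", children=[
    Category("Cookware", 40),
    Category("Small appliances", 25),
    Category("Stainless steel garlic presses (manual)", 2),
])
for warning in check(kitchen):
    print(warning)
```

Here only the last category is flagged: its label is too long, carries a parenthetical qualifier, and covers too few products.
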
These best practices are not “standards” because they tend not to be shared outside of an organization. Each organization comes up with their own policies and guidelines, just as they have their own taxonomies. The best practices could be considered internal standards, though. Regardless of what they are called, these guidelines should be documented and overseen as part of a taxonomy governance plan.

Monday, April 22, 2013

Capitalization in Taxonomies

The question often comes up: what is the preferred style for the capitalization of taxonomy terms? Other than all proper nouns being capitalized, there is no strict rule for generic terms. In making the determination, it’s important to address the following questions. What kind of taxonomy is it? How will it be used? Who are the users, and what might they be accustomed to or expect?

The ANSI/NISO standard Z39.19-2005 Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies states: “predominantly lowercase terms should be used for terms in controlled vocabularies” and continues: “capitals should be used only for the initial letter of proper names, trade names and those components of taxonomic names, such as genus, which are conventionally capitalized.” But remember that ANSI/NISO Z39.19 comprises guidelines, not strict requirements, so the stylistic matter of case does not have to follow ANSI/NISO Z39.19 if a house style dictates otherwise.

Note that for non-proper nouns and names there are three options, not just two, as the following descriptions themselves illustrate:
1.    all lower case (including the first letter of the first word)
2.    First letter of first word upper case
3.    Title Case (First Letter of the First and Main Words Capitalized)
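
The three options can be expressed as simple string transformations. This is a minimal Python sketch that assumes the term contains no proper names or acronyms, since each function lowercases first and would flatten names that must keep their capitals. The set of minor words left lowercase in title case is a hypothetical choice; house styles differ.

```python
# Hypothetical set of minor words left lowercase in title case; house styles vary.
MINOR_WORDS = {"a", "an", "and", "as", "at", "but", "by", "for", "in",
               "of", "on", "or", "the", "to", "with"}


def lower_case(term: str) -> str:
    """Option 1: all lowercase."""
    return term.lower()


def sentence_case(term: str) -> str:
    """Option 2: only the first letter of the first word capitalized."""
    lowered = term.lower()
    return lowered[:1].upper() + lowered[1:]


def title_case(term: str) -> str:
    """Option 3: first word and main words capitalized, minor words lowercase."""
    words = term.lower().split()
    return " ".join(
        word if i > 0 and word in MINOR_WORDS else word[:1].upper() + word[1:]
        for i, word in enumerate(words)
    )


term = "Bolts and nuts for steel frames"
print(lower_case(term))     # bolts and nuts for steel frames
print(sentence_case(term))  # Bolts and nuts for steel frames
print(title_case(term))     # Bolts and Nuts for Steel Frames
```
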

While the distinctions between “controlled vocabularies,” “thesauri,” and hierarchical or faceted “taxonomies” can be blurred, these different types do tend to have different practices for capitalization.

A “controlled vocabulary,” as the word “vocabulary” might suggest, is a list of terms (as single words or phrases), similar to what might be found in a glossary, with the possible added feature of synonyms/variants for each preferred term. Capitalization, therefore, could be expected to follow dictionary rules and thus not be used except for proper names. A “synonym ring” type of controlled vocabulary, in which no terms are designated as “preferred” and none are even displayed to the user, has no need for any capitalization.

A “thesaurus” is a more complex type of controlled vocabulary with hierarchical and/or associative relationships relating various terms to each other. What are called thesauri tend to be more term-focused than hierarchy-focused, and they tend to be large with many detailed terms. The terms can be quite specific, and proper nouns can be mixed in. Thesauri have traditionally been used by indexers to manually index multiple documents consistently over time. The resulting display of terms associated with content, for the end user to browse through, is a type of index. Indexes (such as those at the backs of books) often follow the style of lowercase entries for non-proper names, too. If the terms are numerous and specific, they will appear to be, and will be used as, “index terms” rather than “categories.” Thus, if it’s called a thesaurus, it will more likely have terms in lowercase. The choice of initial capitalization for a thesaurus, though, would not be incorrect, and it is probably becoming more common, just as initial capitalization is becoming more common in main entries in back-of-the-book indexes.

A “taxonomy” implies a hierarchical classification or categorization of concepts. When we think of categories, we think of labels or headings with subcategories. Headings in general tend to have initial capitalization or title capitalization. Thus, if it’s a strictly hierarchical taxonomy, where all terms are interconnected into a single hierarchy or a limited number of hierarchies, then it will more likely have initial capitalization or title capitalization. Such capitalization is particularly common in the relatively small, less detailed taxonomies that are proliferating on websites, intranets, and content management systems. It fits in with the web design convention of capitalizing headings and categories.

In faceted taxonomies, which have become more popular for web and online use, proper names can be separated into their own facet(s), so confusion between proper names and generic terms is reduced. However, I would still recommend capitalizing only the first letter of the first word, rather than using title case, to minimize any confusion with proper names. The facet name itself, however, could be in title capitalization, since it represents a category heading and not a term for indexing. In fact, it might even be desirable to distinguish the facet labels from the values/terms within each facet by using a different case style.

A mixed style of different capitalization at different levels is possible in hierarchical taxonomies, too. However, I would recommend that only the top terms, if any, have a different capitalization style. It would not be a good idea to have only the bottom-level terms (“leaf nodes”) in a different case style, because they could change: if you later decided that a leaf node should have narrower terms added, you wouldn't want to have to worry about changing the case of the term. A good application of the mixed capitalization style is when the top-level terms are not actually used in indexing/tagging but are really just categories/groupings of the actual index terms, which in turn are arranged hierarchically underneath. (Other typographical methods of distinction could also be used for any non-indexable top-level categories.)

In sum, all lowercase is most appropriate for non-displayed controlled vocabularies, for any controlled vocabularies or thesauri that integrate proper nouns into the same hierarchies as generic terms, and for large thesauri used to support manual indexing. Initial capitalization is fine for end-user browsable hierarchical taxonomies on the web. Title capitalization is OK for facet labels or the top categories in a hierarchical taxonomy. Whichever style is chosen, however, should be applied consistently.
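
Consistency is also easy to audit. Below is a minimal Python sketch that flags every label breaking the chosen style, allowing one style for top-level terms (or facet labels) and another for everything beneath them. It assumes labels arrive as hypothetical (label, level) pairs and that proper names are kept out of these levels, for example in their own facet, because the comparison lowercases words before re-capitalizing them. The minor-word list is again a hypothetical house-style choice.

```python
from typing import List, Tuple

# Hypothetical minor-word list for title case; adjust to the house style.
MINOR_WORDS = {"a", "an", "and", "as", "at", "but", "by", "for", "in",
               "of", "on", "or", "the", "to", "with"}


def to_style(label: str, style: str) -> str:
    """Render a label in 'lower', 'sentence', or 'title' case."""
    words = label.lower().split()
    if style == "lower":
        return " ".join(words)
    if style == "sentence":
        text = " ".join(words)
        return text[:1].upper() + text[1:]
    if style == "title":
        return " ".join(
            w if i > 0 and w in MINOR_WORDS else w[:1].upper() + w[1:]
            for i, w in enumerate(words)
        )
    raise ValueError(f"unknown style: {style}")


def case_exceptions(terms: List[Tuple[str, int]], top_style: str = "title",
                    term_style: str = "sentence") -> List[Tuple[str, str]]:
    """Return (label, expected) for every label that breaks the chosen style.

    terms holds (label, level) pairs, level 1 being the top of the hierarchy.
    """
    exceptions = []
    for label, level in terms:
        expected = to_style(label, top_style if level == 1 else term_style)
        if label != expected:
            exceptions.append((label, expected))
    return exceptions


terms = [("Building Materials", 1), ("Fasteners", 2), ("hex bolts", 3), ("Anchor Bolts", 3)]
print(case_exceptions(terms))
# [('hex bolts', 'Hex bolts'), ('Anchor Bolts', 'Anchor bolts')]
```

Passing the same value for top_style and term_style enforces a single style throughout, which is the simplest consistent option.
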