Monday, July 30, 2018

Taxonomy Hierarchy Levels


A taxonomy comprises a hierarchy of concepts (terms), and those hierarchies can be considered to have different levels. In practice, levels are somewhat artificial, and it's important not to think of them too strictly. In some taxonomies the levels are even named (for example: Domain, Category, Subcategory, Topic), but I would caution against such a practice.


Why we may tend to name levels


The most famous taxonomy, the Linnaean taxonomy of organisms, has well-known names for each of its hierarchical levels: Domain, Kingdom, Phylum, Class, Order, Family, Genus, and Species. There are issues with this named-level system, however. In some cases, a Family may contain only a single Genus, and/or a Genus may contain only a single Species (such as Homo sapiens). In other cases, a Species may have so much variety within it that we have created names for subspecies or other deeper levels (such as dog breeds). For a digital navigation or information taxonomy of concepts, it would be considered bad style for a term to have only a single narrower term (as with Homo sapiens). A term should have either no narrower terms or at least two narrower terms, but not just one.
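
The "none or at least two" guideline is easy to check mechanically. The following is a minimal sketch, assuming the taxonomy is held as a plain Python dictionary that maps each term to its narrower terms; the beverage terms and the function name single_child_terms are hypothetical, made up for illustration only.

```python
# Hypothetical taxonomy fragment: each term maps to its list of narrower terms.
# Terms with no narrower terms are simply absent as keys.
narrower = {
    "Beverages": ["Coffee", "Tea"],
    "Coffee": ["Espresso"],  # only one narrower term: questionable style
    "Tea": ["Green Tea", "Black Tea"],
}

def single_child_terms(narrower_terms):
    """Return the terms that have exactly one narrower term, sorted by label."""
    return sorted(
        term for term, children in narrower_terms.items() if len(children) == 1
    )

flagged = single_child_terms(narrower)
print(flagged)
```

A flagged term is a prompt for review rather than an automatic error: the usual remedies are to add sibling narrower terms or to fold the lone narrower term into its broader term.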

Besides the legacy of the Linnaean taxonomy, we may think in terms of designated levels because the most common tool for developing taxonomies is MS Excel. In Excel, each column is used to designate a deeper hierarchical level, broader to more specific, from left to right. People may feel compelled to add column headers (a typical thing to do in spreadsheets), whether as names or merely as Level 1, Level 2, etc. Excel is not intended to be taxonomy management software, and dedicated taxonomy management software tools do not name or number hierarchical levels by default, since there is no need for it in a taxonomy.
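
To illustrate why the column position carries no meaning of its own, here is a rough sketch of reducing a column-per-level layout to plain broader/narrower pairs. It assumes each spreadsheet row holds the full path to a term, left to right; many real spreadsheets use a staggered layout instead, which would need different handling. The rows and the function name broader_narrower_pairs are hypothetical.

```python
# Hypothetical rows as they might be read from a spreadsheet: one column per
# level, left to right, with each row listing the full path to a term.
rows = [
    ["Business", "Marketing", "Digital Marketing"],
    ["Business", "Marketing", "Market Research"],
    ["Business", "Accounting", ""],
    ["Law", "", ""],
]

def broader_narrower_pairs(path_rows):
    """Turn column-per-level paths into a set of (broader, narrower) pairs."""
    pairs = set()
    for row in path_rows:
        # Skip empty cells, such as the trailing blanks in shorter rows.
        path = [cell.strip() for cell in row if cell and cell.strip()]
        # Each adjacent pair in the path is one hierarchical relationship.
        for broader, narrower in zip(path, path[1:]):
            pairs.add((broader, narrower))
    return pairs

pairs = broader_narrower_pairs(rows)
for broader, narrower in sorted(pairs):
    print(f"{broader} > {narrower}")
```

Once the relationships are extracted, the column headers (Level 1, Level 2, and so on) are no longer needed: the hierarchy is fully described by the pairs themselves.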


Why we should not name levels


Unlike the Linnaean taxonomy, the goal of a digital navigation or information taxonomy of concepts is not necessarily to classify concepts, but rather to arrange concepts (terms) in logical hierarchical relationships, so as to help guide the user to find the desired concept (which in turn is linked to content). A classification system (such as industry classification codes or the Dewey Decimal system), which also has enumerated levels, is often considered a different kind of controlled vocabulary from a taxonomy.

A distinction needs to be made between hierarchical relationships and hierarchies. A good taxonomy or thesaurus design practice is to create hierarchical relationships between terms where they are logical: when one term is a specific type or an integral part of another term, so users find narrower terms where they expect them. The extension of multiple hierarchical relationships, particularly when terms have both broader-term and narrower-term relationships, naturally results in the manifestation of hierarchies. But the resulting “natural” hierarchies are not consistent. They may be many levels deep in some places and only two levels deep in others. Terms that are on the same “level” may have quite different degrees of specificity. I recently created a taxonomy for a discipline in which terms that were the equivalent of textbook courses ranged anywhere from the top to the fourth level. Fortunately, I was not constrained to have course as the first level.

Sometimes a taxonomy owner wants to set a policy as to how many levels deep the taxonomy should be.  It is understandable to limit the depth of a taxonomy in some cases: a hierarchy of navigation for public site visitors who want to get to content in the fewest clicks, lest they leave the site; a hierarchy of categories whose labels are to be picked up by search engines (supporting search engine optimization); or a hierarchy within a facet with limitations on browsing.  But there is a difference between limiting the total levels of depth and designating what the levels are called and are supposed to represent.
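
A depth limit can be enforced without saying anything about what each level is called. The sketch below is one possible way to do that, assuming a simple tree in which every term has a single broader term and there are no cycles; the academic terms, the limit of three levels, and the function name term_depths are all hypothetical.

```python
# Hypothetical taxonomy: each term maps to its narrower terms.
# Assumes a tree (one broader term per term) with no cycles.
narrower = {
    "Science": ["Physics", "Biology"],
    "Biology": ["Genetics", "Ecology"],
    "Genetics": ["Genomics", "Epigenetics"],
    "Arts": ["Music", "Painting"],
}
top_terms = ["Science", "Arts"]
max_depth = 3  # policy limit on total depth; the levels themselves stay unnamed

def term_depths(top_terms, narrower_terms):
    """Return {term: depth}, counting top terms as depth 1."""
    depths = {}
    stack = [(term, 1) for term in top_terms]
    while stack:
        term, depth = stack.pop()
        depths[term] = depth
        for child in narrower_terms.get(term, []):
            stack.append((child, depth + 1))
    return depths

depths = term_depths(top_terms, narrower)
too_deep = sorted(term for term, depth in depths.items() if depth > max_depth)
print(too_deep)
```

Note that the check only counts steps from the top. It reports that one branch runs four deep while another stops at two, which is exactly the unevenness that natural hierarchies tend to have.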


Examples of problems from named levels


Designating the names or types of levels inevitably results in the inaccurate application of level names, or in terms placed at inappropriate or inconsistent levels. For example, for a taxonomy of job titles I worked on, the project owner proposed that the top level be called Occupations and the narrower terms to those be called Specializations. This often works, but not always. Consider the term Electrician and its narrower term Electrician Apprentice. Electrician was called an Occupation, and Electrician Apprentice was called a Specialization. Although an Electrician Apprentice can be a kind of (narrower term of) Electrician, it is not actually a “specialization” of Electrician. Also, a unique specialized job title may not have a broader job title, so it would have to be called an Occupation. For example, Endoscopy Technician was designated as an Occupation, as it lacked a broader term, whereas Nurse Practitioner was a Specialization, since it had the broader term of Registered Nurse.

In another example, for a taxonomy of academic areas of study I worked on, I was told that the taxonomy could have only two levels, with the top level called Discipline and the second level called Subdiscipline. The levels and designations were based on content management and business needs. Thus, while Marketing would normally be considered a narrower term to Business, both were Disciplines at the same level. Some of the Disciplines were very specific, such as Real Estate Law (since Law did not exist as a discipline in this case), and some of the Subdisciplines were very broad, such as Computer Science (because it had a broader term of Computing). I concluded that this was not actually a taxonomy, but rather a metadata property with its values structured into two levels.

Taxonomies naturally have hierarchies, but they do not naturally have levels, which are an artificial layer that sometimes gets imposed.