Thursday, August 30, 2018

Taxonomy Hierarchical Relationship Issues


A common feature of taxonomies is the hierarchical relationship between terms. Terms are linked to each other in a relationship that indicates that one is the broader term (BT) of the other, and in the other direction, one is the narrower term (NT) of the other. You don’t need to be a taxonomist to understand this basic principle. However, even taxonomists can sometimes be challenged in determining whether it’s correct to put two terms in a hierarchical relationship.

Standards for Hierarchical Relationships


There are guidelines for the hierarchical relationship provided by the standards of ANSI/NISO Z39.19-2005 (R2010) Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies and ISO 25964-1: Information and Documentation — Thesauri and Interoperability with other Vocabularies — Part 1: Thesauri for Information Retrieval. The standards say that in a correct hierarchical relationship the term that is narrower to the broader term may be a specific type of the generic broader term, a named instance of the generic broader term, or an integral part of the whole broader term.
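To make the three permitted cases concrete, here is a minimal Python sketch of a vocabulary that only accepts a broader/narrower link when it is tagged as a specific type, a named instance, or an integral part. The class and method names (HierarchyType, Vocabulary, add_link, explain) are my own assumptions for illustration and do not come from either standard, and the sample terms are just examples.

```python
from enum import Enum


class HierarchyType(Enum):
    """The three justifications the thesaurus standards accept for BT/NT."""
    GENERIC = "is a specific type of"
    INSTANCE = "is a named instance of"
    WHOLE_PART = "is an integral part of"


class Vocabulary:
    """Stores broader/narrower links, each tagged with its justification."""

    def __init__(self):
        self.broader = {}   # narrower label -> {broader label: HierarchyType}
        self.narrower = {}  # broader label -> set of narrower labels

    def add_link(self, narrower, broader, kind):
        # Refuse a link that cannot name one of the three justifications.
        if not isinstance(kind, HierarchyType):
            raise ValueError(
                f"no hierarchical justification for {narrower!r} under {broader!r}"
            )
        self.broader.setdefault(narrower, {})[broader] = kind
        self.narrower.setdefault(broader, set()).add(narrower)

    def explain(self, narrower, broader):
        """Spell out why the narrower term sits under the broader term."""
        kind = self.broader[narrower][broader]
        return f"{narrower} {kind.value} {broader}"


# Illustrative sample terms, one per relationship type.
vocab = Vocabulary()
vocab.add_link("Marketing directors", "Marketing managers", HierarchyType.GENERIC)
vocab.add_link("Mount Everest", "Mountains", HierarchyType.INSTANCE)
vocab.add_link("Central nervous system", "Nervous system", HierarchyType.WHOLE_PART)
print(vocab.explain("Marketing directors", "Marketing managers"))
```

Requiring a justification on every link is a design choice for the sketch: it forces the question "is this a type, an instance, or a part?" to be answered at the moment the relationship is created.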

These standards, however, are for thesauri, not taxonomies. Thesauri additionally have a non-hierarchical associative relationship between terms, known as “related term” (RT). In taxonomies that lack related-term relationships, the conditions under which the hierarchical relationship is permitted need not be followed quite as strictly. Nevertheless, the thesaurus standards for creating the hierarchical relationship should be the starting point and the default for hierarchical relationships in taxonomies.

Challenges in Coming up with Broader Terms


Hierarchical taxonomies may be created from the top down, the bottom up, or a combination of both approaches. The top-down approach involves creating broadest categories first, then adding narrower terms and then adding narrower terms to narrower terms. This approach makes it easier to create good hierarchical relationships. In reality, though, we don’t always create terms based purely on their broader terms. Rather, analysis of content yields specific terms that are needed, so some degree of bottom-up taxonomy creation takes place. In the bottom-up approach there may be the challenge of determining and creating the appropriate broader term.

When I have been completely challenged in coming up with a broader term, I admit I have looked up the term in Wikipedia to see what is listed under “Categories” for that term, at the bottom of the page. “Categories” implies a broader term, but these are not necessarily good or correct broader terms. An example of Categories that are not exactly broader terms is for the term Stress management: Stress, Management by type, Psychotherapy, and Psychiatric treatments. Stress management is not exclusively done as (is a part of) Psychotherapy or Psychiatric treatments, so those are not suitable broader terms. “Management by type” is definitely not a good taxonomy term, and the term Management alone has a different meaning of its own. As for the term “Stress,” this is more complicated. Technically, Stress management is not a kind of Stress or a part of Stress, so Stress should not be its broader term. If the two terms were in a thesaurus, they would definitely be related terms. If your controlled vocabulary is not a thesaurus, and the related-term relationship is not supported, then you may ignore the thesaurus rule in this case, and make Stress the broader term of Stress management. This relationship is likely to be expected and accepted by users.

Challenges in Special Circumstances


Even when creating a taxonomy from the top down, taxonomists may encounter challenges or confusion with the hierarchical relationships. One challenging case is the concept of membership. Things and their members could be industries and their companies or international organizations and their member countries. It may seem logical to list the affiliate members “under” the industry or organization of which they are a part, but this is based too much on context and time. Companies can change their industries, and countries can change their international organization affiliation. More significantly, the whole-part hierarchical relationship is about integral parts, not participation in the sense of taking “part.” Finally, it may be more practical to put each type (companies, industries, countries, organizations) in a separate facet and not establish any relationship between them in a taxonomy (in contrast to a thesaurus or ontology).

Another potentially confusing case involves occupations and job titles. The subordinate nature of narrower terms should not be confused with the subordinate role of one job title to another. Thus, while a marketing specialist reports to a marketing manager, Marketing managers is not a broader term of Marketing specialists. Furthermore, while a marketing manager reports to a marketing director, we might make the hierarchical relationship in the other direction, with Marketing directors as a narrower term to Marketing managers, because directors are a kind of manager. Managers include directors.

Perhaps the most confusing case involves specificity that is not taxonomical specificity. For example, Syllabi (plural of syllabus), as instructional outlines, are in a certain sense more specific than Curricula (plural of curriculum), which are also a kind of instructional outline. Syllabi are for individual courses, and curricula are for a series of courses, such as an entire program of study or degree. Thus, it might seem logical that Syllabi would have the broader term of Curricula. But a syllabus is neither a specific type of curriculum, nor is it part of a curriculum. It is something different. So, it would be better not to have Curricula as a broader term of Syllabi, even in a taxonomy that is lacking related-term relationships.

Parent-Child Confusions


Sometimes the hierarchical relationship is referred to as “parent-child.” While it’s correct that a subsidiary company is a narrower term of its parent company, because it is part of the parent company, a biological child is not a narrower term of its parent, because it is not a part of the parent, but rather an offspring. To avoid confusion, it’s better to describe the relationship as broader/narrower, rather than as parent/child.

Monday, July 30, 2018

Taxonomy Hierarchy Levels


A taxonomy comprises a hierarchy of concepts (terms), and those hierarchies can be considered to be in different levels. In actuality, levels are somewhat artificial, and it’s important not to think of levels too strictly. In some taxonomies the levels are even named (for example: Domain, Category, Subcategory, Topic), but I would caution against such a practice.


Why we may tend to name levels


The most famous taxonomy, the Linnaean taxonomy of organisms, has well-known names for each of its hierarchical levels: Domain, Kingdom, Phylum, Class, Order, Family, Genus, and Species. There are issues, however, with this named-level system. In some cases, a Family may contain only a single Genus, and/or a Genus may contain only a single Species (such as Homo sapiens). In other cases, a Species may have so much variety within it that we have created names for subspecies or other deeper levels (such as for dog breeds) in order to describe it. For a digital navigation or information taxonomy of concepts, it would be considered bad style for a term to have only a single narrower term (as with Homo sapiens). A term should have no narrower terms or at least two narrower terms, but not just one.
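The "none or at least two" guideline is easy to check mechanically. Below is a minimal sketch, assuming the taxonomy is held as a plain dictionary that maps each broader term to a list of its narrower terms. The function name terms_with_single_narrower and the sample data are hypothetical and only for illustration.

```python
def terms_with_single_narrower(narrower_terms):
    """Return the broader terms that have exactly one narrower term."""
    return sorted(
        broader
        for broader, narrower in narrower_terms.items()
        if len(narrower) == 1
    )


# Illustrative fragment: one genus with a single species, one with two.
sample = {
    "Homo": ["Homo sapiens"],
    "Canis": ["Canis lupus", "Canis latrans"],
    "Homo sapiens": [],
}

# Terms flagged here are candidates for review, not automatic errors.
print(terms_with_single_narrower(sample))
```

A flagged term usually means either that sibling terms are missing or that the lone narrower term could be merged into its broader term.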

Besides the legacy of the Linnaean taxonomy, we may think of designated levels of a taxonomy because the most common tool for developing taxonomies is MS Excel. In Excel, each column is used to designate a deeper hierarchical level, broader to more specific, from left to right. People may feel compelled to designate column headers (a typical thing to do in spreadsheets), whether as names or merely as Level 1, Level 2, etc. Excel is not intended to be taxonomy management software, and dedicated taxonomy management software tools do not support the default naming or numbering of hierarchical levels, since there is no need for it in a taxonomy.
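The column-per-level spreadsheet layout carries nothing more than broader/narrower pairs, which a short script can recover without any level names. This is a minimal sketch, assuming each row holds exactly one term as a text cell and that the column position of that cell indicates its depth. The function name rows_to_pairs is hypothetical, and the rows are typed in directly rather than read from an actual Excel file.

```python
def rows_to_pairs(rows):
    """Turn indented spreadsheet rows into (broader, narrower) pairs.

    A top-level term is paired with None because it has no broader term.
    """
    pairs = []
    path = []  # path[i] is the most recent term seen in column i
    for row in rows:
        filled = [(i, cell.strip()) for i, cell in enumerate(row) if cell.strip()]
        if not filled:
            continue  # skip blank rows
        depth, label = filled[0]
        if depth > len(path):
            raise ValueError(f"{label!r} skips a level: no term in the column to its left")
        del path[depth:]  # forget terms at this depth and deeper
        pairs.append((path[-1] if path else None, label))
        path.append(label)
    return pairs


# Columns run from broader (left) to narrower (right); no header row is needed.
rows = [
    ["Business", ""],
    ["", "Marketing"],
    ["Computing", ""],
    ["", "Computer Science"],
]
print(rows_to_pairs(rows))
```

Once the pairs are extracted, the column positions can be discarded, which is a reminder that the spreadsheet columns are a drafting convenience rather than part of the taxonomy.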


Why we should not name levels


Unlike the Linnaean taxonomy, the goal of a digital navigation or information taxonomy of concepts is not necessarily to classify concepts, but rather to arrange concepts (terms) in logical hierarchical relationships, so as to help guide the user to find the desired concept (which in turn is linked to content). A classification system (such as industry classification codes or the Dewey Decimal system), which also has enumerated levels, is often considered a different kind of controlled vocabulary from a taxonomy.

A distinction needs to be made between hierarchical relationships and hierarchies. A good taxonomy or thesaurus design practice is to create hierarchical relationships between terms where they are logical: when one term is a specific type or an integral part of another term, so users find narrower terms where they expect them. The extension of multiple hierarchical relationships, particularly when terms have both broader-term and narrower-term relationships, naturally results in the manifestation of hierarchies. But the resulting “natural” hierarchies are not consistent. There may be many levels deep in some places and only two levels deep in other places. Terms that are on the same “level” may have relatively different degrees of specificity. I recently created a taxonomy for a discipline in which terms that were the equivalent of textbook courses ranged everywhere from the top to the fourth level. Fortunately, I was not constrained to have courses as the first level.
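To see how uneven a “natural” hierarchy is, you can measure how deep each branch ends. The following is a minimal sketch, assuming the taxonomy is a dictionary of broader terms mapped to lists of narrower terms. The function name leaf_depths is hypothetical, and the job-title sample data is only illustrative.

```python
def leaf_depths(narrower_terms, top_terms):
    """Map each term without narrower terms to its depth (top terms are depth 1)."""
    depths = {}

    def walk(term, depth, seen):
        if term in seen:
            raise ValueError(f"cycle detected at {term!r}")
        children = narrower_terms.get(term, [])
        if not children:
            # With polyhierarchy a term can be reached twice; keep the deepest.
            depths[term] = max(depth, depths.get(term, 0))
        for child in children:
            walk(child, depth + 1, seen | {term})

    for top in top_terms:
        walk(top, 1, frozenset())
    return depths


job_titles = {
    "Registered Nurse": ["Nurse Practitioner"],
    "Electrician": ["Electrician Apprentice"],
}
tops = ["Registered Nurse", "Endoscopy Technician", "Electrician"]

# Branches end at different depths, so "level" says little about specificity.
print(leaf_depths(job_titles, tops))
```

The spread of depths in the result is a quick indicator of how poorly any fixed set of level names would fit the taxonomy.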

Sometimes a taxonomy owner wants to set a policy as to how many levels deep the taxonomy should be.  It is understandable to limit the depth of a taxonomy in some cases: a hierarchy of navigation for public site visitors who want to get to content in the fewest clicks, lest they leave the site; a hierarchy of categories whose labels are to be picked up by search engines (supporting search engine optimization); or a hierarchy within a facet with limitations on browsing.  But there is a difference between limiting the total levels of depth and designating what the levels are called and are supposed to represent.


Examples of problems from named levels


Designating the names or types of levels inevitably results in the inaccurate application of level names or terms at inappropriate or inconsistent levels. For example, for a taxonomy of job titles I worked on, the project owner proposed that the top level be called Occupations and the narrower terms to those be called Specializations. This often works, but not always. Take the term Electrician and its narrower term Electrician Apprentice. Electrician was called an Occupation, and Electrician Apprentice was called a Specialization. Although an Electrician Apprentice can be a kind of (narrower term of) Electrician, it is not actually a “specialization” of Electrician. Also, a unique specialized job title may not have a broader job title as its broader term, so it would have to be called an Occupation. For example, Endoscopy Technician was designated as an Occupation, as it lacked a broader term, whereas Nurse Practitioner was a Specialization, since it had the broader term of Registered Nurse.

In another example, for a taxonomy of academic areas of study I worked on, I was told that the taxonomy could have only two levels, with the top level called Discipline and the second level called Subdiscipline. The levels and designations were based on content management and business needs. Thus, while Marketing would normally be considered a narrower term to Business, both were Disciplines at the same level. Some of the Disciplines were very specific, such as Real Estate Law (since Law did not exist as a discipline in this case), and some of the Subdisciplines were very broad, such as Computer Science (because it had a broader term of Computing). I concluded that this was not actually a taxonomy, but rather a metadata property with its values structured into two levels.

Taxonomies naturally have hierarchies, but do not naturally have levels, which are an artificial layer that sometimes gets imposed.