Saturday, September 26, 2020

Adjectives as Terms in Taxonomies

Taxonomies need not follow strict standards, but rather best practices. There are standards for thesauri (ANSI/NISO Z39.19 and ISO 25964), and as taxonomies are similar to thesauri, it’s a good idea to follow thesaurus standards for taxonomy design to the extent applicable. According to thesaurus standards, terms should be nouns or noun phrases, not verbs or adjectives. Similarly, taxonomies usually comprise terms of only nouns or noun phrases. An exception would be in a faceted taxonomy, where there is a facet for a kind of attribute or characteristic, such as color, and there the terms could be adjectives. In this sense, taxonomies are more flexible and have more applications than thesauri do.

Product taxonomies tend to have adjective terms for some of their facets/attributes, including color, size, style, type, status, etc. These kinds of adjectives are reasonably straightforward, although there may be nuances among colors and styles that are not generally known among the users of the taxonomy. It is the other, more descriptive adjectives that are challenging to include in a taxonomy, because their meaning tends to be much more subjective than that of noun-based terms, and thus it’s difficult to tag/index consistently with them.

I recently did some work on a taxonomy where descriptive adjectives were included in an “attribute descriptor” term set or facet. This was a taxonomy for images, including photographs, illustrations and graphical design components. Adjective terms included Elegant, Formal, Funny, Ornate, Simple, Modern, Vintage, among others. I also filled in the role of tagging for a short period of time and found how subjective it was to tag with such adjective terms. I was not confident that I was tagging with such adjectives in a consistent manner.  While the adjectives might have seemed like a good idea originally, they were not that practical compared to other components of the taxonomy. Fortunately, the attribute descriptors were not displayed to the user as a dynamic facet but rather supported search, so insufficiencies in adjective tagging were not so obvious.

A recent article in Vogue Business described how adjectives in fashion product ecommerce taxonomies are used, such as by Nordstrom, Rebag, and The Yes. These include terms such as Bright, Chic, Whimsical, Flowy, Billowy, Comfortable, etc. I wouldn’t want to try to tag with those. However, in these cases, the tagging was not manual but automated, using algorithms, hundreds of examples, and machine learning. While auto-categorization is not necessarily more correct than manual tagging, it is more consistent, and when it comes to the subjectivity of adjectives, the challenges are more around consistency than correctness. So, I can see that auto-categorization can be a solution to dealing with the challenges of adjective terms.
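
Consistency of adjective tagging can be checked with a simple comparison of two taggers' work on the same items. The following is a minimal sketch, not a standard method: the function name, the sample image IDs, and the tag assignments are all made up for illustration, and it uses the mean Jaccard overlap (shared tags divided by all tags applied) as one possible measure.

```python
def tag_agreement(tags_a, tags_b):
    """Mean Jaccard overlap of two taggers' tag sets, over items both tagged."""
    shared_items = tags_a.keys() & tags_b.keys()
    if not shared_items:
        return 0.0
    scores = []
    for item in shared_items:
        a, b = set(tags_a[item]), set(tags_b[item])
        union = a | b
        # Two empty tag sets count as full agreement.
        scores.append(len(a & b) / len(union) if union else 1.0)
    return sum(scores) / len(scores)


# Hypothetical adjective tags applied by two people to the same three images.
TAGGER_1 = {"img1": {"Elegant", "Formal"}, "img2": {"Vintage"}, "img3": {"Simple", "Modern"}}
TAGGER_2 = {"img1": {"Elegant", "Ornate"}, "img2": {"Vintage"}, "img3": {"Funny"}}

print(round(tag_agreement(TAGGER_1, TAGGER_2), 2))
```

A low score on the adjective facet compared with the noun-based facets would suggest that the adjective terms need clearer examples or fewer, more distinct terms.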

Now that it’s established that taxonomies can, in certain circumstances, contain adjective terms, including adjectives in a taxonomy should be done with care. If you will have adjectives as terms, my recommendations are:

  • Keep them separate from other taxonomy terms, by having them in their own term list, vocabulary, or facet.
  • Ideally, keep the number of adjective terms limited to a few clearly distinguishable terms.
  • Expect to spend more time and possibly expertise in developing, editing, and maintaining adjective terms than noun-based terms.
  • Consider implementing auto-categorization (auto-tagging), if resources permit it.
  • Whether tagging is manual or automated, prepare multiple examples of assets/content items for each adjective term to demonstrate what is the appropriate content for tagging with each adjective.

A thesaurus is more specific than a taxonomy, as a thesaurus has terms for what content is about. A taxonomy has terms for what content is about but also for other aspects and attributes of content. Thus, a taxonomy may include adjectives, whereas a thesaurus does not. Adjective terms, however, should be created with care and special attention to how they will be used in tagging.

Sunday, August 23, 2020

Taxonomy Terms for Different End-Users

The names of taxonomy terms need to be understood by the taxonomy’s users, and all users need to share the same understanding of what the term means. Typically, a taxonomy has two fundamental sets of users: those who tag content with the taxonomy terms and those who retrieve content with the taxonomy terms, the end-users. The taggers can usually be supported by definitions or scope notes for the terms. The end-users rarely have access to such explanatory notes for terms, and even if they did, it would be in some inconvenient collection of documentation that very few end-users would find and read. Therefore, the terms should represent concepts that are obvious and intuitive to the end-users and need no explanation. To this end, it is important to understand the users’ perspective and the terms that they would likely use to describe concepts. User research is thus an important part of taxonomy design.

Some taxonomies have two different kinds of end-users, and this is where it can get more complicated. Examples include health information whose end-users include both healthcare providers and patients or their family members; published educational content whose tagging producers are publishers, but whose end-users include both students and instructors; marketplace websites whose end-users include both sellers and buyers; and job search platforms whose end-users include both employers and job seekers. It is important in these cases that the different kinds of end-users have the same understanding of what a term means, but this is sometimes not the case.

Example: The Problem with “Entry Level”

I recently noticed an example of a taxonomy term in the case of job search platforms (LinkedIn, Glassdoor, Indeed, etc.) that seemed to be understood differently by employers and job seekers. Several controlled vocabularies can be used in the “advanced” (or faceted) job search features: Job Type (Full-time, Part-time, Contract, Temporary, etc.), Location, Company, Industry, and Experience Level. I took an interest in Experience Level (also called Seniority Level), because I wanted to help identify additional jobs for my daughter, who had just graduated from college. So, I selected the filter for “Entry level.” The full set of options includes Internship, Entry Level, Associate (in LinkedIn), Mid Senior Level, Director, and Executive.
Experience level options in Glassdoor, LinkedIn, and Indeed
I was dismayed to see so many jobs classified as “Entry level” requiring at least 2 years and sometimes as many as 5 years of experience. That is certainly not entry-level by the definition of a recent college graduate.

Then one day (after my daughter found a job) I noticed a job posting for a taxonomist that on LinkedIn was classified as Entry level. It required at least 2 years of experience designing and managing taxonomies. It was clearly not entry-level for a fresh college graduate. This time, however, I was looking at the job differently. I was familiar with the employer, and it was clear that for the employer this was an entry-level professional position in their firm. Even though prior experience was expected, this was the most junior professional position available. So, apparently the human resources representative of the company considered it entry-level compared to other jobs they might hire for and classified it that way. It became obvious that employers and job seekers do not use the same terms, such as “Entry level,” to mean the same thing.

One way to make the term Entry level clear, short of creating a definition or scope note that the users/end-users will never read, might be to replace it with two other terms, one for Recent grad and one for Junior associate, but the exact wording may still have drawbacks and requires more research. Simplicity and elegance may have to be sacrificed for clarity. This is just one of the many trade-offs to deal with when creating taxonomies.

Sunday, July 19, 2020

How Many Facets Should a Taxonomy Have

I’ve given a rule of thumb of 3-8 facets to create in a faceted taxonomy, but it’s not that simple, and there are various factors to consider. Creating facets is an assignment in the online taxonomy course I teach, and a student recently submitted a good set of facets with sample terms, but there were 12 of them. So, why might that be too many facets?

Schematic diagram of a set of four facets.

Consider the users.

Are the users internal trained employees who deal with content, most or all of the employees of an organization, external but repeat users such as partners or researchers, or the general public? Internal employees, especially those who are content managers or digital asset managers, who receive some training to become familiar with the facets, should be able to handle any number of facets. It is their job to classify and/or retrieve content by their facets, so they should have the time and inclination to go through a long list of facets. A broader cross-section of employees or external repeat users may have access to documentation but not read it, will likely not be trained, and are often more rushed when they deal with content, so a shorter list of facets would be more suitable. Finally, the general public is likely to use only facets that are easy to understand and fit into the window display (not requiring scrolling), so a relatively short list of facets is recommended for them.

Consider the content.

In addition to considering the shared attributes of the content, as you cannot create more facets than conceptually exist for the content, you need to consider the volume of the content. A relatively small collection of content items or assets does not need as many facets as filters as a larger collection of content does. If users select a term from each facet, they should not too often be getting zero results or just one or two items. Remember, the main use of facets is to filter and limit results down to a list that can then be easily browsed. If the user retrieves only one or two results, however, they will likely consider the search too narrow and try again to broaden it.
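
To get a feel for whether a content collection can support a given number of facets, one rough check is to count how many combinations of facet terms would return too few results. The sketch below is only an illustration under assumed data: the facet names, the six sample items, and the `sparse_share` helper are hypothetical, and the threshold of fewer than three results simply mirrors the "zero, one, or two items" concern above.

```python
from collections import Counter
from itertools import product

# Hypothetical content items, each tagged with one term per facet.
ITEMS = [
    {"Document Type": "Report", "Audience": "Staff", "Region": "Europe"},
    {"Document Type": "Report", "Audience": "Staff", "Region": "Asia"},
    {"Document Type": "Policy", "Audience": "Staff", "Region": "Europe"},
    {"Document Type": "Policy", "Audience": "Public", "Region": "Europe"},
    {"Document Type": "Report", "Audience": "Public", "Region": "Europe"},
    {"Document Type": "Report", "Audience": "Staff", "Region": "Europe"},
]


def sparse_share(items, facets, min_results=3):
    """Share of possible term combinations across `facets` that
    retrieve fewer than `min_results` items."""
    # Count the items that match each combination actually in use.
    counts = Counter(tuple(item[f] for f in facets) for item in items)
    # Build every combination a user could select, one term per facet.
    values = [sorted({item[f] for item in items}) for f in facets]
    combos = list(product(*values))
    sparse = [combo for combo in combos if counts[combo] < min_results]
    return len(sparse) / len(combos)


for n in (1, 2, 3):
    facets = ["Document Type", "Audience", "Region"][:n]
    print(n, "facet(s):", sparse_share(ITEMS, facets))
```

With this tiny sample, each added facet raises the share of selections that lead to nearly empty result lists, which is the effect to watch for in a small collection.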

Microbial Life Educational Resources website facets are just enough to fill the length of a computer monitor display.

Consider the user interface.

Sometimes the taxonomist has influence over the user interface design, such as when it’s an internally designed research portal, but often content management systems do not offer much flexibility in how facets are displayed. The first thing to consider is how many facets will be displayed by default in the initial screen view (without scrolling) on the most commonly used devices. If facets can be collapsed to show only the facet names and not any values/terms within them, then a greater number of facets can more easily be included. Hiding the values, however, might not be desired, since the display of sample values makes it clear to the user what the facets are for.

Consider what constitutes a facet or filter.

What may be considered a “faceted taxonomy” is only a subset of all the possible metadata properties of the content. Some of the other, default non-taxonomy metadata (such as date, creator, file name or title, or file type) may also be desired as end-user filters alongside the taxonomy facets, which then further increases the number of filters or refinements displayed to the user, who sees no difference between taxonomy facets and non-taxonomy filters.

There is no strict definition of a “taxonomy facet.” I would say it is a facet whose values or terms must be created by a person, such as a taxonomist or metadata architect, rather than those that are system-generated. In addition, taxonomy terms are those that must be tagged to the content, rather than already being a part of content. For example, if File Format is based on the file extension, then it is already part of the content and need not be “tagged,” so it’s not a taxonomy facet by my definition.
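
The File Format case can be illustrated with a few lines of code. This is a minimal sketch, assuming a made-up mapping of extensions to display labels: the value is derived from the file name itself, so nobody has to create or tag it.

```python
from pathlib import PurePath

# Hypothetical mapping from file extension to a display label for the filter.
FORMAT_LABELS = {
    ".pdf": "PDF",
    ".docx": "Word document",
    ".jpg": "JPEG image",
    ".jpeg": "JPEG image",
}


def file_format(filename):
    """Derive the File Format filter value from the file extension."""
    suffix = PurePath(filename).suffix.lower()
    return FORMAT_LABELS.get(suffix, "Other")


print(file_format("Annual_Report.PDF"))
print(file_format("team-photo.jpeg"))
```

A facet such as Topic or Audience has no equivalent shortcut, since its values must be decided on and applied by a person or by a trained auto-tagging process.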

A faceted taxonomy is more, though, than a single facet of topics alongside other non-taxonomy metadata. The idea behind creating a faceted taxonomy is to split up what could be a large hierarchical taxonomy into different aspects.  For example, instead of having a term Business service agreements that is in a hierarchy narrower to both Vendor contracts and Business services, you could have just the term Vendor contracts in the Document Type facet and Business services in the Business Type facet, and the combination of the terms from each facet will suffice.
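
The way a combination of facet terms stands in for a single pre-combined term can be sketched as follows. The documents, titles, and the `filter_by_facets` helper are hypothetical, and the additional terms Leases, Invoices, and Real estate are invented for the example.

```python
# Hypothetical documents, each tagged with one term in each of two facets.
DOCUMENTS = [
    {"title": "Cleaning services agreement",
     "Document Type": "Vendor contracts", "Business Type": "Business services"},
    {"title": "Office lease",
     "Document Type": "Leases", "Business Type": "Real estate"},
    {"title": "Consulting invoice",
     "Document Type": "Invoices", "Business Type": "Business services"},
]


def filter_by_facets(documents, selected):
    """Return titles of documents matching every selected facet term."""
    return [
        doc["title"]
        for doc in documents
        if all(doc.get(facet) == term for facet, term in selected.items())
    ]


# Selecting one term in each facet retrieves what a combined term
# such as "Business service agreements" would have retrieved.
print(filter_by_facets(DOCUMENTS, {
    "Document Type": "Vendor contracts",
    "Business Type": "Business services",
}))
```

No term for the combined concept has to exist in the taxonomy; the intersection of the two selections does the work.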

Faceted taxonomies, more so than hierarchical taxonomies or thesauri, need to consider the factors of users, content, and user interface when it comes to their design.

Saturday, June 20, 2020

When a Taxonomy Should not be Hierarchical

The traditional taxonomy is hierarchical. Thus, after it is determined a taxonomy is needed, often it is thought that it should be designed as a hierarchy. However, in practical terms, a hierarchical taxonomy might not be the kind that is appropriate.

A taxonomy provides value (1) as a controlled vocabulary of concepts to support consistent tagging and comprehensive, accurate retrieval of content and (2) by having some organized structure of these concepts to guide users to desired concepts. That structure is traditionally a hierarchy, but increasingly we are seeing a slightly different structure, which is faceted. A faceted structure defines different aspects, types, issues, dimensions, etc., by which content may be classified and then organizes the taxonomy concepts (terms) into those facets. Examples of facets could be document type, location, function, department, audience, subject discipline, line of business, etc., as the needs of the content dictate. It is certainly possible to have a hierarchy of concepts within a facet, but with a well-designed set of facets, the addition of hierarchy may no longer be needed.

       Hierarchical Taxonomy Structure    vs.    Faceted Taxonomy Structure

Recently I had a small consulting project where I was asked to make recommendations and improvements on a newly created taxonomy, including putting the Topics into a hierarchy. There were only 68 Topics (besides other facets). I made changes that involved over half of the terms, including deletions, additions, name changes, and moving terms from/to the Industries facet, but in the end, there were about the same number of Topic terms. However, although I made significant improvements to the Topics taxonomy, I did not feel it was needed or practical to put the terms into a hierarchy, even though the client had initially made that request. The small size, the type of display, and the nature of the terms were all reasons not to have a hierarchy.

Following are reasons not to have a hierarchy:
  • The term set in question is not that large and can easily be browsed (even with some scrolling) without a hierarchy to organize it.
  • The hierarchy will not display (or not display well) to the end-users, who might, for example, just have a small scroll box or a type-ahead or auto-complete search on the taxonomy terms.
  • It is not easy or possible according to hierarchical relationship standards to put most terms into a hierarchy. For example, the term set in question is a collection of common tags/keywords/topics that occur in the content but are not necessarily related to each other, so it would be difficult to include all of them in a hierarchy, and the only way to create a logical hierarchy would be to introduce additional broader/category terms which are not practical to use for tagging.
  • Putting only some terms into hierarchical relationships results in a non-intuitive top-level display comprising both specific terms and categories (of narrower terms) at the same level.
  • Your user research indicates that users (including taggers) prefer type-ahead or auto-complete search on the taxonomy terms, rather than drilling down through hierarchies.

When the taxonomy is displayed to the user through a scroll box, and only a limited number of terms, such as 5-10, may be displayed at once in the scroll box display, it’s easier for a user to scroll and select terms from a list of 50-60 terms if the terms are in an alphabetical list rather than in a hierarchy. Actually, hierarchies are not designed to be scrolled but rather to be expanded from the top down in their tree structure. Expanding a hierarchical taxonomy (such as by clicking on plus signs next to terms) might be a feature in the taxonomy management system or in the tagging interface, but it is less common in end-user interfaces. Expandable tree hierarchies might not even be desirable in the end-user interface, since it takes the user more time and effort to find a term that way. Most end-users want to get to the content as quickly as possible rather than spend time exploring a taxonomy.

A number of content management systems and the SharePoint Managed Metadata Term Store support the creation of individual term sets or facets and hierarchies within those facets. So, for the less experienced taxonomist, it may seem logical to make full use of a system’s feature to support hierarchical taxonomies. Just because a taxonomy can be created as a hierarchy, however, does not mean it always should be created as a hierarchy. I have seen awkwardly deep hierarchies created by non-taxonomists in content management systems.

Hierarchies should be created if they serve a purpose. Following are some likely purposes for hierarchies:
  • Making it easier for the end user to quickly identify the concept they want for retrieving content.
  • Educating users (such as students) on the hierarchical structure of a subject area.
  • Providing context to terms for manual indexers/taggers so that they apply the correct term. (Such a hierarchy need not be displayed to end-users, though.)
  • Providing the context of a broader concept to aid in auto-classification. 
  • Allowing a term to retrieve not only what was tagged to it, but also what was tagged to each of its narrower terms. (Such a hierarchy need not be displayed to end-users, though.)
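
The last purpose in the list, retrieving what was tagged to a term together with what was tagged to its narrower terms, can be sketched like this. The term names, document IDs, and both helper functions are hypothetical, and the sketch assumes the hierarchy contains no cycles.

```python
# Hypothetical hierarchy: each broader term maps to its narrower terms.
NARROWER = {
    "Beverages": ["Coffee", "Tea"],
    "Tea": ["Green tea", "Black tea"],
}

# Hypothetical tagging: each term maps to the IDs of content tagged with it.
TAGGED = {
    "Beverages": {"doc1"},
    "Coffee": {"doc2"},
    "Green tea": {"doc3", "doc4"},
}


def expand(term, narrower):
    """Return the term plus all of its narrower terms at any depth."""
    terms = [term]
    for child in narrower.get(term, []):
        terms.extend(expand(child, narrower))
    return terms


def retrieve(term, narrower, tagged):
    """Return content tagged with the term or any of its narrower terms."""
    results = set()
    for t in expand(term, narrower):
        results |= tagged.get(t, set())
    return results


print(sorted(retrieve("Tea", NARROWER, TAGGED)))
print(sorted(retrieve("Beverages", NARROWER, TAGGED)))
```

The expansion happens behind the scenes at search time, which is why such a hierarchy can be useful even if it is never shown to end-users.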

Even if a pair of concepts has an inherently hierarchical relationship with each other, according to thesaurus standards (ANSI/NISO Z39.19 or ISO 25964-1), it does not mean that they must be put into a hierarchy in a taxonomy, if you’ve decided to avoid creating hierarchies and especially if what you are creating is a simple taxonomy and not a thesaurus.