The Accidental Taxonomist: 2020

Saturday, December 5, 2020

Differing Definitions of Ontologies

In my last blog post I discussed the different definitions and features of thesauri. Now, I will turn to the next kind of knowledge organization system in the spectrum of complexity: ontologies.

Actually, considering an ontology to be a more (or the most) complex type of controlled vocabulary or knowledge organization system, one step beyond thesauri because of its additional features, is just one perspective on or definition of ontologies, and it is not universally shared.

When I first learned about ontologies, coming from my taxonomist perspective, I considered ontologies as merely a more complex type of taxonomy or thesaurus, characterized by customized semantic relationships between concepts (rather than merely hierarchical or associative relationships), more expressive attributes for concepts (rather than mere scope notes), and the grouping of concepts into classes to manage the semantic relationships and attribute types. In fact, I wrote in 2008 for the first edition of my book “An ontology can be considered a type of taxonomy with even more complex relationships than in a thesaurus,” which the following graphic represents.

As my understanding has evolved, I now consider this to be just one kind of understanding or definition of ontology among others. In other words, a controlled vocabulary that has the features of semantic relationships, classes of concepts, and attributes for concepts can be considered a kind of ontology, but there are other definitions and understandings of ontology within the field of information/knowledge management.

While we usually refer to “controlled vocabularies” as the over-arching category for these things, it is probably better to go up a further level and call an ontology a kind of “knowledge organization system,” rather than a kind of controlled vocabulary. Controlled vocabularies are kinds of knowledge organization systems, where the emphasis is on managed terms or concepts for the purpose of tagging or categorizing and information retrieval. Ontologies, by themselves, are not necessarily for information retrieval, at least not directly. And this is one of the points of differing definitions of ontologies.

Differing definitions and perspectives

There are differing definitions of the word ontology: (1) branch of philosophy that studies existence, being, becoming, and reality (Wikipedia: Ontology), and (2) a representation, formal naming, and definition of categories, entities, properties, and relations within a domain (Wikipedia: Ontology (information science)). Of course, we are interested in the second definition, although there are some connections between the two.

The second definition, however, is already multidisciplinary, as it is a concept shared in both information science and computer science. Information scientists (including librarians, taxonomists, and knowledge managers) and computer scientists do not have different definitions of ontologies, but rather different approaches to and perspectives of ontologies and different purposes for the ontologies they create. For computer scientists, modeling data and information helps them design a computer program to perform desired functions. For information scientists, modeling data and information makes it easier to retrieve information with complex queries. Information scientists consider an ontology as a kind of knowledge organization system, whereas computer scientists tend to consider an ontology as a form of knowledge representation.

Yet even among information scientists, who consider ontologies as knowledge organization systems and have the same objectives in developing ontologies, there are different understandings of what exactly constitutes an ontology and how it relates to other knowledge organization systems, such as taxonomies. This is due to (1) different emphasis on various ontology components, (2) the question of adherence to ontology standards, and (3) the way different ontology software tools model ontologies and their relations to taxonomies differently.

Differing understandings of ontology components

There is a shared understanding that ontologies are composed of things, their properties/attributes, and their relationships.


Ontology example with components: classes, relations, and attributes

However, there are differences in the understanding of the two kinds of “things”: classes and individuals. Classes are categories or groups of things with shared characteristics, whereas individuals are specific instances of things. This seems obvious, but if you approach ontology design from the perspective of taxonomy design it can become less certain. Is an individual the most specific concept (also called “leaf node”) in a hierarchy, or is an individual a named entity/proper noun? The definition of components of ontologies does not answer this question, because ontology structures are meant to model data, not to organize taxonomy concepts that could be either generic (common nouns) or named entities (proper nouns). Drawing the line between classes and individuals can be challenging, but whether this matters may depend on what tool you are using.

Furthermore, ontologies may have other components, such as axioms, rules, restrictions, events, and function terms, but ontologies as knowledge organization systems rarely have most of these.
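
As a rough illustration of the three shared components described in this section (classes, individuals with attributes, and relationships), here is a minimal Python sketch. The type names, the "creator_of" relation, and the sample data are my own assumptions for illustration only; they do not come from OWL or from any particular ontology tool.

```python
from dataclasses import dataclass, field


@dataclass
class OntologyClass:
    """A category of things that share characteristics."""
    name: str
    attribute_types: list[str] = field(default_factory=list)


@dataclass
class Individual:
    """A specific instance that belongs to one class."""
    name: str
    cls: OntologyClass
    attributes: dict[str, str] = field(default_factory=dict)


@dataclass(frozen=True)
class Relation:
    """A named semantic relationship between two individuals."""
    subject: str
    predicate: str
    obj: str


# Illustrative data only (assumed for this sketch).
person = OntologyClass("Person", ["birth_date"])
work = OntologyClass("Named work", ["publication_year"])

author = Individual("Jane Austen", person, {"birth_date": "1775-12-16"})
novel = Individual("Pride and Prejudice", work, {"publication_year": "1813"})

relations = [Relation(author.name, "creator_of", novel.name)]


def related(name, predicate, relations):
    """Return the objects linked to `name` by the given relationship type."""
    return [r.obj for r in relations if r.subject == name and r.predicate == predicate]


print(related("Jane Austen", "creator_of", relations))
```

Note that the sketch sidesteps the question raised above: it treats named entities as individuals, which is only one of the possible ways to draw the line between classes and individuals.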

Differing ontology standards or languages

In 2004 the World Wide Web Consortium (W3C) published the Web Ontology Language (OWL) specification, which is based on the Resource Description Framework (RDF), as “a Semantic Web language designed to represent rich and complex knowledge about things, groups of things, and relations between things,” and OWL has become widely adopted. Now it is common to think that ontologies must follow OWL guidelines. But (information science) ontologies existed before OWL, and an ontology does not have to follow OWL to be called an ontology. There are other ontology languages besides OWL, but they are not as common. To share and reuse ontologies, it is recommended to follow the OWL standard.

Differing ontology modeling software

While one could design the high-level model of an ontology in a mind-mapping tool, there would be no enforcement of standards or best practices (preventing duplications or incomplete data, etc.), and it’s difficult to scale, so dedicated ontology modeling software is recommended. However, ontology modeling/editing software does not model ontologies all in the same way.

The main difference is probably between stand-alone ontology software (such as Protégé or TopBraid Composer) and software that combines ontology with taxonomy/thesaurus development and editing (such as PoolParty, Semaphore, or Graphite). Stand-alone ontology editing software supports creating a detailed ontology as a single model, thus including classes, multiple levels of subclasses, and individuals (instance concepts). In integrated software that combines taxonomy/thesaurus development with ontology development, the taxonomy or thesaurus (or multiple controlled vocabularies) is created in one space with one set of software features, and the ontology is created in another space with a different set of features. The ontology (or even just parts of it) is then applied to the taxonomy, so that concepts in the taxonomy inherit the attribute types and relationships of their associated class, and the taxonomy concepts are like individuals in the ontology. The ontology can be considered a semantic layer in the model, as the following graphic illustrates.

These two different approaches to ontology modeling thus result in different definitions of an ontology. An ontology is likely to be considered a more complex type of knowledge organization system by users of stand-alone ontology software, whereas an ontology is likely to be considered an expressive semantic layer applied to one or more taxonomies by users of integrated taxonomy/ontology software.
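
The "semantic layer" approach can be sketched in a few lines of Python. This is only a simplified illustration under my own assumptions: the class names, attribute types, relationship types, and the allowed_for helper are hypothetical and do not reflect how PoolParty, Semaphore, or Graphite actually store their models.

```python
# Ontology layer: each class defines which attribute types and
# relationship types apply to the concepts associated with it.
ontology = {
    "Person": {"attributes": ["Birthdate"], "relations": ["Creator of"]},
    "Named work": {"attributes": ["Publication date"], "relations": ["Created by"]},
}

# Taxonomy layer: concepts are managed separately, and each one is
# associated with a single ontology class (illustrative data).
taxonomy = {
    "Jane Austen": "Person",
    "Pride and Prejudice": "Named work",
}


def allowed_for(concept):
    """Return the attribute and relationship types a concept inherits from its class."""
    return ontology[taxonomy[concept]]


print(allowed_for("Jane Austen"))
```

In this picture the taxonomy concepts behave like individuals, and the ontology stays small because it only describes the classes and what applies to them.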

Ontology lite or ontology-like

When I was still considering ontologies more akin to thesauri with semantic relationships, and I expressed such views in a discussion forum, someone (whom I don’t remember), referred to this kind of ontology as “ontology lite,” since it has features of an ontology, but does not fully follow an ontology model and standards. This is not necessarily a bad thing. Controlled vocabularies and knowledge organization systems can be considered along a continuum, and you should build what works for your situation.

Another kind of ontology-like structure is when you start linking multiple controlled vocabularies together. My initial experience with working on commercially implemented ontologies had been with such ontology-like systems, which were not actually called ontologies, at a former employer, Gale. There we had controlled vocabularies (also called object classes) for subjects, persons, places, events, products, companies/organizations, named works, etc., many of which had customized reciprocal relationship pairs between them (such as the relationship pair Creator/Creatby, between person names who were authors, and named works) and many customized term attributes (such as Birthdate, Death date, Birth city/state/country, Death city/state/country for persons).

I also heard this approach recently from a speaker, Ahren Lehnart, at the Taxonomy Boot Camp conference, who described the linking of controlled vocabularies with related match (not equivalent match) relationships as “trending toward” creating an ontology.

Sunday, November 22, 2020

What is a Thesaurus and What is it Good For

It is somewhat ironic that in the domain of controlled vocabularies and knowledge organization systems there continue to exist differing meanings for “controlled vocabulary,” “taxonomy,” “thesaurus,” “ontology,” and “knowledge graph.” Hopefully, I have provided some clarification regarding what a taxonomy is and is not in my previous posts on taxonomy vs. classification, taxonomy vs. navigation, and when a taxonomy should not be hierarchical. Let’s turn now to thesauri.

Different meanings of thesaurus

I recently attended a webinar on taxonomies, ontologies, and knowledge graphs, in which a thesaurus was described as a set of synonyms for each identified concept in a list. This is not the right definition for this context. A set of synonyms for each of a list of concepts is what we taxonomists call a “synonym ring”, and what administrators of enterprise search engines would call a “search thesaurus.” The use of the word “thesaurus” in this case refers to the dictionary-type thesaurus (as in the default Thesaurus entry in Wikipedia) such as Roget’s Thesaurus, where synonyms are presented for each word. Synonyms are included to support search, by matching potential words and phrases entered by users into the search box with the words and phrases that likely occur in the text of content, so that content is not missed due to the searcher using a different synonym.

The “search thesaurus” (synonym ring) differs from the synonym-dictionary thesaurus, however, in several ways, due to their different uses:

A search thesaurus includes phrases, not just single words as in a dictionary thesaurus.
A search thesaurus comprises concepts that are nouns, verbal nouns, or noun phrases, not just any part of speech as a dictionary may include.
The “synonyms” in a search thesaurus are appropriately equivalent terms that can be used interchangeably in all cases for the content repository, not synonyms that may be used in only some cases, as the dictionary suggests.
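
To make the idea of a synonym ring concrete, here is a minimal Python sketch of query expansion. The rings and the expand_query function are hypothetical examples of my own; real enterprise search engines each have their own configuration formats for this.

```python
# Each ring is a set of terms treated as fully interchangeable for search.
# Note that rings may contain phrases, not just single words.
synonym_rings = [
    {"car", "automobile", "motor vehicle"},
    {"film", "movie", "motion picture"},
]


def expand_query(query, rings):
    """Return every equivalent term for the query, or just the query if no ring matches."""
    q = query.strip().lower()
    for ring in rings:
        if q in ring:
            return sorted(ring)
    return [q]


print(expand_query("Movie", synonym_rings))  # ['film', 'motion picture', 'movie']
```

There is no preferred term in a ring: whichever member the user types, the search is run on all of them.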

However, in the context of taxonomies/ontologies (not the context of search administration), the designation thesaurus has a significantly different meaning. This kind of thesaurus is also referred to as an information thesaurus or information-retrieval thesaurus (to distinguish it from the synonym dictionary type), and there is a separate entry in Wikipedia for Thesaurus (Information Retrieval), which defines it as “a form of controlled vocabulary that seeks to dictate semantic manifestations of metadata in the indexing of content objects.” This is the meaning that relates to taxonomies and ontologies. More significant than the Wikipedia definition are the published standards/guidelines for how to construct thesauri: ISO 25964 Thesauri and interoperability with other vocabularies and ANSI/NISO Z39.19-2005 (R2010) Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies. While the latter does not name thesauri in its title (although it did in an earlier version), it is essentially about thesauri and defines, in section 4.1 Definitions, a thesaurus: “A controlled vocabulary arranged in a known order and structured so that the various relationships among terms are displayed clearly and identified by standardized relationship indicators.”

So, a thesaurus is a kind of controlled vocabulary or a kind of knowledge organization system which is quite structured and has certain standard features: terms that are noun phrases, hierarchical relationships between terms, associative (related, but not hierarchically) relationships between terms, “synonyms” or variants, which are called nonpreferred terms, and scope notes on terms. Other metadata on terms is possible, and variations of hierarchical and associative relationships may also be possible.
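
The standard features listed above can be pictured as a single term record. The following Python sketch is only one possible representation, with field names and sample terms that I have made up for illustration; the comments use the conventional BT/NT/RT/UF/SN abbreviations, but this is not a schema prescribed by the standards.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One preferred term in a thesaurus, with its standard features."""
    preferred: str
    broader: list[str] = field(default_factory=list)       # BT
    narrower: list[str] = field(default_factory=list)      # NT
    related: list[str] = field(default_factory=list)       # RT (associative)
    nonpreferred: list[str] = field(default_factory=list)  # UF ("synonyms" or variants)
    scope_note: str = ""                                    # SN


# Illustrative sample entry (assumed for this sketch).
thesaurus = {
    "Automobiles": Term(
        preferred="Automobiles",
        broader=["Motor vehicles"],
        narrower=["Electric cars"],
        related=["Automobile industry"],
        nonpreferred=["Cars", "Autos"],
        scope_note="Passenger vehicles only.",
    ),
}


def resolve(entry, thesaurus):
    """Map an entry term (preferred or nonpreferred) to its preferred term, or None."""
    wanted = entry.lower()
    for term in thesaurus.values():
        names = [term.preferred] + term.nonpreferred
        if wanted in (n.lower() for n in names):
            return term.preferred
    return None


print(resolve("cars", thesaurus))  # Automobiles
```

Unlike a synonym ring, each record has one preferred term, and the nonpreferred terms point to it.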

Thesaurus usefulness

On the continuum chart of controlled vocabulary (knowledge organization system) types, a thesaurus falls between a taxonomy and an ontology in its level of complexity and support for semantics.

Controlled vocabulary types

Since both taxonomies and ontologies are recognized as useful, it would seem illogical that something that is in between should not be considered at least as useful. A thesaurus has the benefits of supporting more semantics than a taxonomy while not being as complex as an ontology.

Even if most relationships are hierarchical, there may be times when creating an associative relationship between related subjects seems logical and would be helpful to users, such as relating between a process and agent, action and property, cause and effect, object and origins, discipline and practitioner, etc. Or it might not be subjects. For example, ecommerce may want to recommend “related” product categories, or content on activities could relate activities to products. In an expert people finder, person names can be related to subject areas of expertise. If the scope of “related” types is kept limited, then the generic associative relationships (“related term”) may suffice without getting to the level of complexity of an ontology where there are multiple types of defined semantic relationships.

The added associative relationships and comprehensive inclusion of synonyms/nonpreferred terms also support better (more comprehensive) tagging, whether manual or automated, by providing suggestions to the indexers or providing context for the auto-classification tool.

Finally, the overall structure of a thesaurus is more flexible than that of a taxonomy. A taxonomy groups concepts into categories with a limited number of top concepts (or “top terms”). A concept which has no broader and no narrower concept relationships, sometimes called an “orphan,” is considered an error in a taxonomy. In a thesaurus, on the other hand, where an over-arching hierarchical structure is not required (although one may exist) and associative relationships are included, it is OK to have a concept with no broader and no narrower relationships, but at least an associative relationship. Thus, the taxonomist does not always have to force a new concept into an existing hierarchy which might not be ideal.
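
The difference in how "orphans" are treated can be expressed as a simple check. This Python sketch is an illustration under my own assumptions (the find_orphans function and the sample concepts are hypothetical), not a feature of any specific thesaurus management software.

```python
def find_orphans(concepts, hierarchical, associative=()):
    """Return concepts that take part in none of the given relationships.

    hierarchical: (narrower, broader) pairs
    associative:  (concept, related concept) pairs; pass them only when
                  checking by thesaurus rules, where they count as links
    """
    linked = set()
    for narrower, broader in hierarchical:
        linked.update((narrower, broader))
    for a, b in associative:
        linked.update((a, b))
    return sorted(c for c in concepts if c not in linked)


# Illustrative sample data.
concepts = ["Contracts", "Vendor contracts", "Negotiation", "Miscellany"]
hierarchical = [("Vendor contracts", "Contracts")]
associative = [("Negotiation", "Contracts")]

# Taxonomy rules: only hierarchical links count.
print(find_orphans(concepts, hierarchical))               # ['Miscellany', 'Negotiation']
# Thesaurus rules: an associative link is enough.
print(find_orphans(concepts, hierarchical, associative))  # ['Miscellany']
```

By thesaurus rules, Negotiation is acceptable because it has a related term, whereas a concept with no relationships at all is flagged either way.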

Software for thesaurus management

Software to support the development and maintenance of thesauri has also been available for some time. (Taxobank has a historic list, not updated since 2013.) There actually is no such thing as “taxonomy” management software, because the software used to create taxonomies is really “thesaurus” management software, and the added thesaurus features, such as associative relationships, are just not utilized when creating a simple taxonomy.

As taxonomies have become more popular than thesauri, the software vendors have reflected that by having a hierarchical display (instead of alphabetical) as the default, and by marketing their solutions for taxonomies and ontologies and de-emphasizing or omitting mention of thesauri. For example, the basic core module of the PoolParty Semantic suite is appropriately named Thesaurus Server, since you can easily create thesauri with it, but the default hierarchical display suggests the use for taxonomies, whereas the website's product page says it’s for “Enterprise Taxonomy and Ontology Management.”

Thesauri today

Thesaurus design principles are applicable to both thesauri and taxonomies. Therefore, thesauri continue to be taught in library science and information science degree programs, including courses on information architecture. The book Information Architecture for the Web and Beyond (Rosenfeld, Morville, and Arango)(aka the polar bear book, due to its cover design), even in its 4th edition of 2015, devotes 20 pages, nearly half the chapter “Thesauri, Controlled Vocabularies and Metadata,” to thesauri.

The main impediment to thesauri is that the most common implementations these days, variations of off-the-shelf content management systems (CMS), usually do not support features of thesauri. Associative relationships are rarely supported. Synonyms/nonpreferred terms may be only partially supported (such as in the tagging view but not in retrieval). Thus, we tend to see thesauri implemented only in custom (home-grown) end-user systems, such as those of publishers of information retrieval databases.

Information retrieval thesauri have been around for a long time, and perhaps that is also part of the problem in their acceptance today in business and industry. People may consider thesauri as some kind of legacy knowledge organization system that was more predominant when we only had printed systems, not digital systems. It’s true that thesauri are designed to be useful in print, but their design is also adaptable and relevant to digital implementations. They can also form part of a larger system of interlinked controlled vocabularies.

This brings us to the next topic, ontologies, which can link to thesauri. Next month’s blog post will address the different meanings of ontology.

Sunday, October 25, 2020

Customizing Taxonomy Facets

It has become more common to design faceted taxonomies than purely hierarchical taxonomies, as I discussed in a previous blog post. Faceted taxonomies integrate better with search, serve for filters and sorting, and provide a good user experience for both novices and subject matter experts. The main drawback is that not all kinds of content are suitable for facets. Another reason it seems that most taxonomies newly created from scratch these days are faceted is that faceted taxonomies really need to be custom created, whereas hierarchical taxonomies, thesauri, and ontologies can more easily be reused.

A hierarchical taxonomy or a thesaurus represents the topics of a subject area, and the topics are arranged hierarchically (and possibly with additional nonhierarchical, associative interrelationships). In certain subject areas (academic disciplines, medicine, law, economics, engineering, etc.), the same taxonomy or thesaurus may be used in multiple implementations, with perhaps modifications of the level of detail (depth) of the taxonomy used, especially if used for the topics of articles, news, reports, research studies, etc.

A faceted taxonomy is designed to support users’ interaction with the content based on subjects, names, features, and attributes. It’s not simply a matter of finding the most appropriate topics based on drilling down through levels of hierarchy or exploring related topics. The user selects from the faceted taxonomy to filter available content, whether initially or after executing a search to reduce the content set size. Facets for a faceted taxonomy in an enterprise content management system could be for department, document type, business function, market, location, and subject, and the concepts within each facet would be tailored to the content of a specific enterprise. Facets for an ecommerce taxonomy would be customized for the product category, and could be for size, color, user type, technology, and specific categories of product features. Both the facets and the concepts within the facets are custom-designed for the content. The following illustrations show how specific customized facets might be.
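
Filtering by facet selections, as described above, can be sketched in Python as follows. The facet names come from the enterprise example in the text, but the content items and the filter_items function are hypothetical, and a real content management system would do this in its search index rather than in application code.

```python
# Each content item is tagged with one term per facet (illustrative data).
items = [
    {"title": "Office lease renewal", "Document type": "Contract", "Department": "Legal"},
    {"title": "Travel policy", "Document type": "Policy", "Department": "Finance"},
    {"title": "Supplier agreement", "Document type": "Contract", "Department": "Finance"},
]


def filter_items(items, selections):
    """Keep only the items that match every selected facet term."""
    return [
        item for item in items
        if all(item.get(facet) == term for facet, term in selections.items())
    ]


# Selecting a term in one facet narrows the set; adding a second facet narrows it further.
contracts = filter_items(items, {"Document type": "Contract"})
finance_contracts = filter_items(items, {"Document type": "Contract", "Department": "Finance"})

print([i["title"] for i in contracts])
print([i["title"] for i in finance_contracts])
```

Each added selection can only reduce the result set, which is why facets work well for refining the results of a search.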

Furthermore, the distinction between the narrowest term in a hierarchy and a facet is not always obvious and will depend on the content and search behavior. For example, in an ecommerce taxonomy that includes refrigerators, there would be types for freezer-on-top, freezer-on-bottom, and freezer-on-side. These could be subcategories of refrigerators in a hierarchical taxonomy or they could be facet terms in a facet called Type. Which way to design the taxonomy depends on other factors, such as how many levels deep the hierarchy already is, how many facets there are already, and how many content items would be classified this way.

While hierarchical taxonomies can get quite specific, they are specific in only one area, subjects, and greater detail could be ignored when re-using an existing taxonomy. Facets can get detailed in various areas: subject, document types, events, locations, people, etc., but since facets often don’t have hierarchies within them, you cannot simply ignore greater levels of depth, as you could with a hierarchical taxonomy, if you had hoped to reuse parts of existing facets.

Aside from the matter of reusing taxonomies, there is also the matter of taxonomy design. It is possible, although generally not recommended, to design a hierarchical taxonomy for a subject area without first analyzing the content to be tagged/indexed (although, likely the taxonomy will need revising after tagging begins). It is not possible to design a faceted taxonomy without analyzing the content first. A faceted taxonomy is tied too closely to its content.

So, creating a faceted taxonomy is not an academic exercise, but rather a part of a real-life content management and information retrieval case. I teach a course in taxonomy creation where creating a faceted taxonomy is, in fact, an academic exercise. So, the only way to do that properly is to have a clear definition and understanding of the hypothetical content, just as hypothetical users (personas) are needed for user experience design. Designing facets will also be covered in my upcoming half-day virtual conference workshop, "Taxonomy 101," on November 12.

Saturday, September 26, 2020

Adjectives as Terms in Taxonomies

Taxonomies need not follow strict standards, but rather best practices. There are standards for thesauri (ANSI/NISO Z39.19 and ISO 25964), and as taxonomies are similar to thesauri, it’s a good idea to follow thesaurus standards for taxonomy design to the extent applicable. According to thesaurus standards, terms should be nouns or noun phrases, not verbs or adjectives. Similarly, taxonomies usually comprise terms of only nouns or noun phrases. An exception would be in a faceted taxonomy, where there is a facet for a kind of attribute or characteristic, such as color, and there the terms could be adjectives. In this sense, taxonomies are more flexible and have more applications than thesauri do.

Product taxonomies tend to have adjective terms for some of their facets/attributes, including color, size, style, type, status, etc. These kinds of adjectives are reasonably straight-forward, although there may be nuances among colors and styles that are not generally known among the users of the taxonomy. It is rather other, descriptive adjectives that can be more challenging to include in a taxonomy because their meaning tends to be much more subjective than noun-based terms, and thus it’s difficult to tag/index consistently with them.

I recently did some work on a taxonomy where descriptive adjectives were included in an “attribute descriptor” term set or facet. This was a taxonomy for images, including photographs, illustrations and graphical design components. Adjective terms included Elegant, Formal, Funny, Ornate, Simple, Modern, Vintage, among others. I also filled in the role of tagging for a short period of time and found how subjective it was to tag with such adjective terms. I was not confident that I was tagging with such adjectives in a consistent manner. While the adjectives might have seemed like a good idea originally, they were not that practical compared to other components of the taxonomy. Fortunately, the attribute descriptors were not displayed to the user as a dynamic facet but rather supported search, so insufficiencies in adjective tagging were not so obvious.

A recent article in Vogue Business described how adjectives in fashion product ecommerce taxonomies are used, such as by Nordstrom, Rebag, and The Yes. These include terms such as Bright, Chic, Whimsical, Flowy, Billowy, Comfortable, etc. I wouldn’t want to try to tag with those. However, in these cases, the tagging was not manual but automated, using algorithms, hundreds of examples, and machine learning. While auto-categorization is not necessarily more correct than manual tagging, it is more consistent, and when it comes to the subjectivity of adjectives, the challenges are more around consistency than correctness. So, I can see that auto-categorization can be a solution to dealing with the challenges of adjective terms.

Now that it’s established that taxonomies can, in certain circumstances, contain adjective terms, the inclusion of adjectives in a taxonomy should be done with care. If you will have adjectives as terms, my recommendations are:

Keep them separate from other taxonomy terms, by having them in their own term list, vocabulary, or facet.
Ideally, keep the number of adjective terms limited to a few clearly distinguishable terms.
Expect to spend more time and possibly expertise in developing, editing, and maintaining adjective terms than noun-based terms.
Consider implementing auto-categorization (auto-tagging), if resources permit it.
Whether tagging is manual or automated, prepare multiple examples of assets/content items for each adjective term to demonstrate what is the appropriate content for tagging with each adjective.

A thesaurus is more specific than a taxonomy, as a thesaurus has terms for what content is about. A taxonomy has terms for what content is about but also for other aspects and attributes of content. Thus, a taxonomy may include adjectives, whereas a thesaurus does not. Adjective terms, however, should be created with care and special attention to how they will be used in tagging.

Sunday, August 23, 2020

Taxonomy Terms for Different End-Users

The names of taxonomy terms need to be understood by the taxonomy’s users, and all users need to share the same understanding of what the term means. Typically, a taxonomy has two fundamental sets of users: those who tag content with the taxonomy terms and those who retrieve content with the taxonomy terms, the end-users. The taggers can usually be supported by definitions or scope notes for the terms. The end-users rarely have access to such explanatory notes for terms, and even if they did, it would be in some inconvenient collection of documentation that very few end-users would find and read. Therefore, the terms should represent concepts that are obvious and intuitive to the end-users and need no explanation. To this end, it is important to understand the users’ perspective and the terms that they would likely use to describe concepts. User research is thus an important part of taxonomy design.

Some taxonomies have two different kinds of end-users, and this is where it can get more complicated. Examples include health information whose end-users include both healthcare providers and patients or their family members; published educational content whose tagging producers are publishers, but the end-users include both students and instructors; marketplace websites whose end-users include both sellers and buyers; and job search platforms whose end-users include both employers and job seekers. It is important in these cases that the different kinds of end-users have the same understanding of what a term means, but this is sometimes not the case.

Example: The Problem with “Entry Level”

I recently noticed an example of a taxonomy term in the case of job search platforms (LinkedIn, Glassdoor, Indeed, etc.) that seemed to be understood differently by employers and job seekers. There are several controlled vocabularies that can be used in the “advanced” (or faceted) job search features: Job type (Full-time, Part-time, Contract, Temporary, etc.), Location, Company, Industry, and Experience Level. I took an interest in Experience Level (also called Seniority Level), because I wanted to help identify additional jobs for my daughter, who had just graduated from college. So, I selected the filter for "Entry level." The options include Internship, Entry Level, Associate (in LinkedIn), Mid Senior Level, Director, and Executive.

Experience level options in Glassdoor, LinkedIn, and Indeed

I was dismayed to see so many jobs classified as “Entry level” requiring at least 2 years and sometimes as many as 5 years of experience. That is certainly not entry-level by the definition of a recent college graduate.

Then one day (after my daughter found a job) I noticed a job posting for a taxonomist that on LinkedIn was classified as Entry level. It required at least 2 years of experience designing and managing taxonomies. It was clearly not entry level for a fresh college graduate. This time, however, I was looking at the job differently. I was familiar with the employer, and it was clear that for the employer this was an entry-level professional position in their firm. Even though prior experience was expected, this was the most junior professional position available. So, apparently the human resources representative of the company considered it entry-level compared to other jobs they might hire for and classified it that way. It became obvious that employers and job-seekers do not use the same terms, such as “Entry level,” to mean the same thing.

One way to make the term Entry level clear, short of creating a definition or scope note that the users/end-users will never read, might be to replace it with two other terms, one for Recent grad and one for Junior associate, but the exact wording may still have drawbacks and requires more research. Simplicity and elegance may have to be sacrificed for clarity. This is just one of the many trade-offs to deal with when creating taxonomies.

Sunday, July 19, 2020

How Many Facets Should a Taxonomy Have

I’ve given a rule-of-thumb of 3-8 facets to create in a faceted taxonomy, but it’s not that simple, and there are various factors to consider. Creating facets is an assignment in the online taxonomy course I teach, and a student recently submitted a good set of facets with sample terms, but there were 12 of them. So, why might that be too many facets?

Schematic diagram of a set of four facets.

Consider the users.

Are the users internal trained employees who deal with content, most or all of the employees of an organization, external but repeat users such as partners or researchers, or the general public? Internal employees, especially those who are content managers or digital asset managers, who receive some training to become familiar with the facets, should be able to handle any number of facets. It is their job to classify and/or retrieve content by their facets, so they should have the time and inclination to go through a long list of facets. A broader cross-section of employees or external repeat users may have access to documentation but not read it, will likely not be trained, and are often more rushed when they deal with content, so a shorter list of facets would be more suitable. Finally, the general public is likely to use only facets that are easy to understand and fit into the window display (not requiring scrolling), so a relatively short list of facets is recommended for them.

Consider the content.

In addition to considering the shared attributes of the content, as you cannot create more facets than conceptually exist for the content, you need to consider the volume of the content. A relatively small collection of content items or assets does not need as many facets as filters as a larger collection of content does. If users select a term from each facet, they should not be getting zero results or just one or two items too often. Remember, the main use of facets is to filter and limit results down to a list that can then be easily browsed. If the user retrieves only one or two results, however, they will likely consider the search as too narrow and try again to broaden it.
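
One way to test whether a set of facets is too fine-grained for the volume of content is to count how many items each combination of facet terms retrieves. The following Python sketch does this for a tiny made-up collection; the sparse_combinations function, the threshold of two items, and the sample data are all my own assumptions for illustration.

```python
from itertools import product


def sparse_combinations(items, facets, threshold=2):
    """List the facet-term combinations that retrieve `threshold` items or fewer."""
    # Collect the distinct terms actually used in each facet.
    values = {f: sorted({item[f] for item in items}) for f in facets}
    sparse = []
    for combo in product(*(values[f] for f in facets)):
        count = sum(
            all(item[f] == term for f, term in zip(facets, combo))
            for item in items
        )
        if count <= threshold:
            sparse.append((combo, count))
    return sparse


# Illustrative collection: four items tagged in two facets.
items = [
    {"Type": "Shirt", "Color": "Red"},
    {"Type": "Shirt", "Color": "Red"},
    {"Type": "Shirt", "Color": "Red"},
    {"Type": "Hat", "Color": "Blue"},
]

for combo, count in sparse_combinations(items, ["Type", "Color"]):
    print(combo, count)
```

If most combinations turn up in such a list, the collection is probably too small for that many facets, or some facets could be merged or dropped.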

Microbial Life Educational Resources website facets are just enough to fill the length of a computer monitor display.

Consider the user interface.

Sometimes the taxonomy has influence over the user interface design, such as when it’s an internally designed research portal, but often content management systems do not offer much flexibility in how facets are displayed. The first thing to consider is how many facets will be displayed by default in the initial screen view (without scrolling) in the most commonly used devices. If facets can be collapsed to show only the facet names and not any values/terms within them, then a greater number of facets can more easily be included. Hiding the values, however, might not be desired, since the display of sample values makes it clear to the user what the facets are for.

Consider what constitutes a facet or filter.

What may be considered a “faceted taxonomy” is only a subset of all the possible metadata properties of the content. Some of the other, default non-taxonomy metadata (such as date, creator, file name or title, or file type) may also be desired as end-user filters alongside the taxonomy facets, which then further increases the number of filters or refinements displayed to the user, who sees no difference between taxonomy facets and non-taxonomy filters.

There is no strict definition of a “taxonomy facet.” I would say it is a facet whose values or terms must be created by a person, such as a taxonomist or metadata architect, rather than those that are system-generated. In addition, taxonomy terms are those that must be tagged to the content, rather than already being a part of content. For example, if File Format is based on the file extension, then it is already part of the content and need not be “tagged,” so it’s not a taxonomy facet by my definition.
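
The File Format example can be shown in a couple of lines of Python: the value is derived from the file name itself, so nobody has to create a term or tag the content with it. The file_format function is a hypothetical illustration, not part of any content management system.

```python
from pathlib import PurePath


def file_format(filename):
    """Derive a File Format value from the file extension (system-generated, not tagged)."""
    return PurePath(filename).suffix.lstrip(".").upper()


print(file_format("annual-report.pdf"))  # PDF
```

A Document Type value such as Vendor contracts, by contrast, cannot be computed this way, which is what makes it a taxonomy facet by the definition above.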

A faceted taxonomy is more, though, than a single facet of topics alongside other non-taxonomy metadata. The idea behind creating a faceted taxonomy is to split up what could be a large hierarchical taxonomy into different aspects. For example, instead of having a term Business service agreements that is in a hierarchy narrower to both Vendor contracts and Business services, you could have just the term Vendor contracts in the Document Type facet and Business services in the Business Type facet, and the combination of the terms from each facet will suffice.

Faceted taxonomies, more so than hierarchical taxonomies or thesauri, need to consider the factors of users, content, and user interface when it comes to their design.