The Accidental Taxonomist: Faceted taxonomy


Monday, May 5, 2025

Taxonomies and Attribute Data

In the past (such as in my 2021 blog post "Attributes in Taxonomies"), I have explained that “attributes” serve as filters to refine search results on content, results that have already been narrowed by a hierarchical taxonomy concept or category. As such, the attributes available for filtering can vary based on the taxonomy concept or category that has been selected. To the end user, high-level taxonomy facets and attributes both function similarly as filters, and the distinction between facets and attributes may not be apparent. If the distinction is not noticeable to end users, then facets and attributes may be confused. It’s best to describe attributes for what they are, and not merely by what they can do. That’s what this blog post aims to do.

Attributes

Data is information in the form of specific values that are relevant to something such as an asset, object, product, person, event, or transaction. Since data is relevant to something else, we can refer to data as an “attribute” of something. When attributes are standardized and used in information/data management, then attributes are metadata. Metadata schema are structures to organize data.

Examples of attribute metadata are:

for people: birth date, gender, occupation, nationality, phone number
for products: brand, price, color, size, SKU number
for documents: title, author, publication date, language, word count, publication status, file type

Almost all metadata, both descriptive and administrative, are attributes of something. (Only structural metadata, that which is used to mark up text, would not be an attribute.) Attributes, as metadata, can serve various purposes, including identification, comparison, sorting, filtering, and finding something based on its attributes.

Attribute values may be of different types: text, numbers, dates, or yes/no (also called “Boolean”). As text strings, attribute values may be uncontrolled free text or terms from a controlled list.
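To make these value types concrete, here is a minimal Python sketch of attribute definitions for a document record, with a check that a value matches its declared type and, for controlled text, its list of allowed terms. The attribute names, the controlled list of languages, and the validation rules are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class AttributeDef:
    """Hypothetical attribute definition: a name, a value type, and an optional controlled list."""
    name: str
    value_type: type                       # str, int, date, or bool
    allowed: Optional[frozenset] = None    # controlled list for text values; None means free text


def is_valid(attr: AttributeDef, value) -> bool:
    """Return True if the value matches the attribute's type (and controlled list, if any)."""
    # bool is a subclass of int in Python, so reject booleans for numeric attributes explicitly.
    if attr.value_type is int and isinstance(value, bool):
        return False
    if not isinstance(value, attr.value_type):
        return False
    if attr.allowed is not None:
        return value in attr.allowed
    return True


# Hypothetical attributes for documents, one per value type described above.
DOCUMENT_ATTRIBUTES = [
    AttributeDef("title", str),                                                 # free text
    AttributeDef("language", str, frozenset({"English", "French", "German"})),  # controlled list
    AttributeDef("word_count", int),                                            # number
    AttributeDef("publication_date", date),                                     # date
    AttributeDef("is_published", bool),                                         # yes/no (Boolean)
]

attrs = {a.name: a for a in DOCUMENT_ATTRIBUTES}
print(is_valid(attrs["language"], "French"))
print(is_valid(attrs["language"], "Klingon"))   # not in the controlled list
```

A real metadata schema would also specify things such as required fields and cardinality, which this sketch leaves out.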

Taxonomies

Taxonomies are structures of concepts, which are used primarily for tagging and retrieval of content, although there are secondary uses. The concepts include subjects and named entities. In all cases, the concepts belong to controlled vocabularies. The structures may be primarily hierarchical or primarily faceted, although a combination, such as limited hierarchies within a facet, is also possible. The structure of the taxonomy provides context for tagging and supports interaction by users.

When a taxonomy is structured into facets, typically each facet serves also as a metadata property. A hierarchical topical taxonomy can also provide values for a metadata property. Taxonomies are structures to organize controlled vocabulary concepts.

Examples of taxonomy facets include:

Topics
Activities
Industries
Product/service types
Brand names
Companies
Organizations
Names of people
Types of people/Roles
Events/Occasions

Thus, the types of things that are facets are usually not the same types of things that are considered attributes.

Metadata schema are structures to organize data, whereas taxonomies are structures to organize controlled vocabulary concepts that can populate metadata properties.

Where Attributes and Taxonomies Overlap

Considering again the examples of different types of attributes for different things, there are some attributes that could be managed in a “taxonomy” instead of merely as “attributes”:

For people: Name
For products: Product type/category
For documents: Subject/topic

Technically, each of these characteristics is also an attribute, but it is usually more practical to manage them as taxonomies so that they can support the implemented benefits of a taxonomy, such as semantic tagging, searching (including type-ahead search suggest), and browsing.

Thus, when we talk about “attributes” in the context of taxonomies, we mean those characteristics of something that are better managed as attributes and not managed as taxonomies. The decision is one of knowledge modeling.

For example, to support the refinement of searches, a taxonomy of expert people for an organization may have the following taxonomy facets:

Name
Subject of expertise
Organizational unit
Location

Then in addition to the facets, the taxonomy may have the following attributes associated with each record of a person:

Job title
Academic degree
Email address
Phone number
URL of headshot image

This is selected data of interest, but not values that are used in initial search or browsing for finding and retrieving content. Attributes are metadata, and taxonomy facets are also metadata, but that does not mean that they are the same, because different metadata can have different functions or purposes.
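The split between facets and attributes in the experts example can be sketched as a simple data structure. In this hypothetical Python sketch, searching consults only the facet fields, while the attributes dictionary (job title, email, and so on) is simply carried along for display. The people, units, and locations are invented.

```python
from dataclasses import dataclass, field


@dataclass
class ExpertRecord:
    """Hypothetical expert record: facet values for finding, attributes for display."""
    # Facets: controlled-vocabulary values used to search and refine results.
    name: str
    expertise: set[str]
    org_unit: str
    location: str
    # Attributes: selected data shown with the record but not used for retrieval.
    attributes: dict[str, str] = field(default_factory=dict)


def find_experts(records, name=None, expertise=None, org_unit=None, location=None):
    """Filter on facet values only; the attributes dictionary is never consulted."""
    results = []
    for r in records:
        if name is not None and r.name != name:
            continue
        if expertise is not None and expertise not in r.expertise:
            continue
        if org_unit is not None and r.org_unit != org_unit:
            continue
        if location is not None and r.location != location:
            continue
        results.append(r)
    return results


# Hypothetical sample records.
experts = [
    ExpertRecord("A. Rivera", {"Data governance", "Metadata"}, "IT", "Boston",
                 {"job_title": "Data Architect", "email": "arivera@example.com"}),
    ExpertRecord("B. Chen", {"Metadata"}, "Library Services", "Chicago",
                 {"job_title": "Metadata Librarian", "email": "bchen@example.com"}),
]

for expert in find_experts(experts, expertise="Metadata", location="Chicago"):
    # The job title is displayed with the result, not used to find it.
    print(expert.name, expert.attributes["job_title"])
```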

Ontologies: Bridging Taxonomies and Attributes

When we enrich a taxonomy with features of an ontology, not only can we add semantic relationships, but we can also add attributes to taxonomy concepts. Usually, when taxonomists first learn about ontologies, they think primarily of the addition of customized relationships between concepts, and they might not be aware of the importance of the addition of attributes.

In ontologies, semantic relationships are formally called “object properties,” and attributes are called “datatype properties.” Both are equally important. Meanwhile, the feature of “classes” in an ontology typically corresponds to taxonomy concept schemes or facets.

The best way to add attributes to a taxonomy is through an ontology, which can be very simple and need not even include semantic relationships. Because the availability of different attributes may vary by hierarchical branch of concepts, this can be managed by creating classes, which are assigned to hierarchical branches, facets, or concept schemes. Then, attributes (datatype properties) are applied to concepts based on the class each concept belongs to.
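A minimal sketch of this arrangement in plain Python (rather than an actual OWL or RDF toolkit), assuming a hypothetical product taxonomy: each class is assigned to the root of a hierarchical branch, each datatype property declares the class it applies to, and a concept is offered only the attributes of its class. The concepts, class names, and property names are invented for illustration.

```python
# Hypothetical hierarchy: concept -> broader concept.
BROADER = {
    "Laptops": "Computers",
    "Desktops": "Computers",
    "Computers": "Electronics",
    "Dresses": "Clothing",
}

# Hypothetical classes, each assigned to the root of a hierarchical branch.
CLASS_OF_BRANCH = {"Computers": "ComputerClass", "Clothing": "ClothingClass"}

# Hypothetical datatype properties: property name -> (domain class, value datatype).
DATATYPE_PROPERTIES = {
    "screenSize": ("ComputerClass", float),
    "ramGigabytes": ("ComputerClass", int),
    "fabric": ("ClothingClass", str),
    "dressSize": ("ClothingClass", str),
}


def class_of(concept):
    """Walk up the broader-concept chain until reaching a branch that has a class assigned."""
    current = concept
    while current is not None:
        if current in CLASS_OF_BRANCH:
            return CLASS_OF_BRANCH[current]
        current = BROADER.get(current)
    return None


def attributes_for(concept):
    """Return the names of datatype properties whose domain is the concept's class."""
    cls = class_of(concept)
    return sorted(p for p, (domain, _datatype) in DATATYPE_PROPERTIES.items() if domain == cls)


print(attributes_for("Laptops"))
print(attributes_for("Dresses"))
print(attributes_for("Electronics"))  # above the classed branches, so no attributes
```

In an actual OWL ontology, the same idea would be expressed with `owl:DatatypeProperty` declarations whose `rdfs:domain` is the relevant class.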

Conclusion

The following table summarizes the differences between taxonomy facets and attributes.

Taxonomy Facets	Attributes
Basic structure of many taxonomies	Additional data added to taxonomies
Controlled vocabularies	Controlled or uncontrolled terms, text, numbers, dates, Boolean options, etc.
Concepts as nouns or noun phrases	If text, any kind of text string
Top organizational level of a taxonomy	Values relevant to any taxonomy concept
Concept Schemes in SKOS, or Classes in an OWL ontology	Metadata on a concept, or datatype properties in an OWL ontology

Monday, September 30, 2024

Topical Taxonomies for Filtering Searches

PoolParty GraphSearch

We taxonomists have long been advocating that a taxonomy of disambiguated concepts tagged to content retrieves more accurate results than search algorithms alone. But if users prefer simply entering text strings into a search box rather than browsing taxonomies, how best to support them with a taxonomy can be a challenge.

A faceted taxonomy whose facets serve as filters for refining search results has become a common taxonomy solution, especially for intranets, partner portals, and knowledge bases. For these purposes, certain facets, such as Content type, Product/Service, Location, and Department, are common and logical. When it comes to designating “Topics,” however, it’s not so easy.

Specific Terms Gathered from Analysis

When gathering information and sources for terms, most sources will yield highly specific terms. These include terms arising from search log analysis, brainstorming sessions with sample users, automated text analytics term extraction from a large corpus of content, and manual review of a representative sample of documents/pages. These are all standard methods for taxonomy design, which I conduct as a consultant.

The difficulty is that there are often so many specific topics that the new topical taxonomy could potentially have many hundreds of terms. Some may be relevant to only one or two documents or occur in only a couple of searches out of thousands. Such terms would not serve the purpose of refining searches.

Another problem is that many of the terms suggested from these methods are not even topical. Often, the top searches found in search logs of enterprise/intranet searches are for commonly used named tools, platforms, or services.

The main issue, however, in deriving terms for a topical facet/filter based on search terms is that the objective of the topical facet, like all facets, is to limit searches, not to duplicate searches. What is really needed in the topical facet are topical categories that are broader than the search terms. How to identify these broader topical categories can be more challenging.
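To make the distinction concrete, here is a hedged Python sketch of rolling specific search-log terms up into broader topical categories, so that each candidate filter value covers many queries rather than duplicating a single one. The search counts and the term-to-category mapping are invented; in practice the mapping is the hard part and would come from the stakeholder engagement and content analysis that this post describes.

```python
from collections import Counter

# Hypothetical search-log counts for specific query terms.
SEARCH_COUNTS = {
    "expense report deadline": 120,
    "per diem rates": 45,
    "mileage reimbursement": 30,
    "vpn setup": 200,
    "password reset": 310,
    "parental leave": 60,
    "pto carryover": 25,
}

# Hypothetical mapping from specific terms to broader topical categories.
BROADER_CATEGORY = {
    "expense report deadline": "Travel and expenses",
    "per diem rates": "Travel and expenses",
    "mileage reimbursement": "Travel and expenses",
    "vpn setup": "IT support",
    "password reset": "IT support",
    "parental leave": "Leave and time off",
    "pto carryover": "Leave and time off",
}


def roll_up(search_counts, mapping):
    """Sum search counts per broader category; unmapped terms are collected for review."""
    totals = Counter()
    unmapped = []
    for term, count in search_counts.items():
        category = mapping.get(term)
        if category is None:
            unmapped.append(term)
        else:
            totals[category] += count
    return totals, unmapped


totals, unmapped = roll_up(SEARCH_COUNTS, BROADER_CATEGORY)
for category, count in totals.most_common():
    print(f"{category}: {count}")
print("Needs review:", unmapped)
```

Seven specific terms collapse into three candidate filter values, which is the kind of reduction a topical facet needs.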

Identifying Broader Topical Categories

Identifying broader terms or categories for topic filters is not as simple as identifying specific search terms, nor as straightforward as identifying the set of facets. The typical methods of obtaining candidate terms from both users and the content still need to be carried out, but with a focus on identifying broader terms or categories.

Categories from Stakeholder Engagement

Engaging stakeholders or other sample users in activities to brainstorm taxonomy terms will result in a mix of specific and broad terms. It is then the task of the taxonomist-facilitator to help guide the participants to identify which terms are broader and which are narrower within the same topical facet. Involving stakeholders/sample users is important, because if a single taxonomist or an external consulting team tries to do this on their own, their designated broader terms, while hierarchically correct, might not suit the intended users. The taxonomist-facilitator may suggest broader terms and then obtain immediate validation from the participants of the appropriateness of those suggestions.

Categories from Content Analysis

Analyzing content for broad topics is more effectively done manually than with automated methods. Manual content analysis will yield both specific and potentially broader concepts. A taxonomist or content strategist experienced in content analysis for identifying meaning will be able to determine the main concept for a piece of content.

Automated methods, based on text analytics technologies, tend to focus on term extraction, and will extract terms even more specific and less useful than search log results. However, if a list of derived search terms is large enough (as search logs and automated term extraction lists tend to be), another, newer option is to make use of LLMs and generative AI technologies to categorize the specific terms and thus generate broader terms. The LLMs should be trained on the same or similar content, which is internal enterprise content, not the public web, to provide the correct context. Even then, the identified broader terms or categories will not always be correct, and an experienced taxonomist will need to review them.

Saturday, February 24, 2024

Faceted Classification and Faceted Taxonomies

I have argued before that a taxonomy is not the same as a classification system, despite the original meaning of the word taxonomy as a system for classification. (See the blog post Classification Systems vs. Taxonomies.) Modern taxonomies that are used to support information management and findability are more similar to information retrieval thesauri and subject heading schemes than they are to classification systems. One type of classification, the method of “faceted classification,” does, however, apply to types of taxonomies. I would not consider “faceted classification” exactly a synonym of “faceted taxonomy,” though, as not all faceted taxonomies are the same.

What is faceted classification?

Facets for jobs

Facet means face, side, dimension, or aspect. Here, facets are aspects of classification. A diamond, an object, or a digital content item is multi-faceted. A digital content item (text document, presentation, image, video, etc.) has multiple informational dimensions or aspects to it and thus multiple ways to be classified.

Classification is about putting an item, such as a content item (document, page, or digital asset), into a class or category. If it’s a physical object (a book), it goes on the shelf for its class. In faceted classification, an item still cannot physically be in more than one place, but it can be “assigned to” more than one class. So, while the book itself can be on only one shelf, the record about the book can be assigned to more than one class.

Faceted classification assigns classes/categories/terms/concepts from each of multiple facets to a content item, allowing users to find the item by choosing concepts from whichever facet they consider first. Different users will consider different classification facets first. Users then narrow the search results by selecting concepts from additional facets in any order they wish, until they get a targeted result set meeting the criteria of multiple facet selections. The user interface of faceted classification is sometimes referred to as faceted browsing.
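The interaction described above can be sketched in a few lines of Python. This is a minimal sketch, assuming each item carries a set of concepts per facet, that a user selects one concept per facet, and that selections across facets are combined with AND; the items and concepts are hypothetical. The per-facet counts mimic the result counts that faceted browsing interfaces commonly display next to each filter value.

```python
from collections import Counter

# Hypothetical content items, each tagged with concepts from several facets.
ITEMS = [
    {"id": "doc1", "Content type": {"Report"}, "Subject": {"Climate"}, "Region": {"Europe"}},
    {"id": "doc2", "Content type": {"Report"}, "Subject": {"Energy"}, "Region": {"Asia"}},
    {"id": "doc3", "Content type": {"Video"}, "Subject": {"Climate"}, "Region": {"Asia"}},
    {"id": "doc4", "Content type": {"Report"}, "Subject": {"Climate", "Energy"}, "Region": {"Asia"}},
]


def refine(items, selections):
    """Keep items tagged with every selected concept (one concept per facet, AND across facets)."""
    return [
        item for item in items
        if all(concept in item.get(facet, set()) for facet, concept in selections.items())
    ]


def facet_counts(items, facet):
    """Count how many items in the current result set carry each concept of a facet."""
    counts = Counter()
    for item in items:
        counts.update(item.get(facet, set()))
    return counts


# Two users start from different facets but converge on the same result set.
by_subject_first = refine(refine(ITEMS, {"Subject": "Climate"}), {"Region": "Asia"})
by_region_first = refine(refine(ITEMS, {"Region": "Asia"}), {"Subject": "Climate"})
print([item["id"] for item in by_subject_first])
print([item["id"] for item in by_region_first])
print(facet_counts(refine(ITEMS, {"Subject": "Climate"}), "Content type"))
```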

History of faceted classification

The idea of faceted classification as a superior alternative to traditional hierarchical classification, whereby an item (such as a book or article) can be classified in multiple different ways instead of in just a single class/category, is not new. The first such faceted classification, called Colon Classification (since the colon punctuation was originally used to separate the multiple facets), was developed and published by mathematician/librarian S.R. Ranganathan in 1933 as an alternative to the Dewey Decimal System for classifying books. In addition to subject categories, it has the following facets:

Personality – topic or orientation
Matter – things or materials
Energy – actions
Space – places or locations
Time – times or time periods

Although it was not widely adopted internationally due to its complexity in the pre-digital era, Colon Classification has been used by libraries in India.

In the late 20th century, digital library research systems based on databases enabled faceted classification and search, with different fields of a database record represented in different search facets. Users interacted with these systems through an “advanced search” form of multiple fields. Faceted classification and browsing gained widespread adoption with the advancement of interactive user interfaces on websites and in web applications in the late 1990s and early 2000s. Facets thus started being displayed in more user-friendly ways that were no longer “advanced.”

Structure of facets

It’s not necessary to follow Ranganathan’s suggested five facets, but they are a good way to start thinking about faceted classification. Another way to look at faceted classification is to consider a facet for each of several question words: What, Who, Where, and When.

What kind of thing is it – content type
What is it primarily about – subject
Who is it for or about – audience or user group
Where is it for/applicable, or what place does it depict (for media) – geographic region
When is it about – event or season (not date of creation, which is administrative metadata rather than a taxonomy concept)

The additional question words of “why” and “how” are relevant in some cases, but less common. An individual content item typically does not address all of these questions, but usually addresses more than one. When creating facets, most of the facet types should be applicable to most of the content types.

Another good way to think about faceted classification is to put the word “by” after each facet, to suggest classification and filtering “by” the aspect type. A logical and practical number of facets tends to be in the range of three to seven.

A standard feature of facets is that they are mutually exclusive. A concept/type belongs to only one facet. This is typical practice for the design of classification systems. The difference is that in faceted classification it is merely the concept/type/term that belongs to just one facet, not the content item or thing itself that would belong to only one classification in traditional classification systems.

When a faceted taxonomy is not for classification

The design, implementation, and use of facets to construct or refine searches has become so popular that facets are no longer used just for classification aspects. Rather, a faceted taxonomy design may be used for any faceted grouping of search concepts or metadata types that are relevant to the content and users.

Faceted classification is intended to classify things that share all the same facets. For example, all technical documentation content has a product, feature, issue, and content type, so these are faceted classifications. But with more heterogeneous content, facets are not universally shared. While the facets may still be a useful tool, it would be best not to call it faceted classification when facets are applicable to only some content types.
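One way to check whether a set of facets behaves like faceted classification is to measure how much of the content each facet actually applies to. The following Python sketch computes facet coverage; the sample documents, the product name “Router X,” and the threshold rule in `is_faceted_classification` are hypothetical assumptions, not an established criterion.

```python
def facet_coverage(items, facets):
    """For each facet, the fraction of items tagged with at least one concept from it."""
    total = len(items)
    return {
        facet: (sum(1 for item in items if item.get(facet)) / total if total else 0.0)
        for facet in facets
    }


def is_faceted_classification(items, facets, threshold=1.0):
    """Hypothetical rule: every facet must apply to at least `threshold` of the items."""
    return all(share >= threshold for share in facet_coverage(items, facets).values())


FACETS = ["Product", "Feature", "Issue", "Content type"]

# Hypothetical homogeneous content: technical documentation.
TECH_DOCS = [
    {"Product": {"Router X"}, "Feature": {"Firewall"}, "Issue": {"Setup"},
     "Content type": {"How-to"}},
    {"Product": {"Router X"}, "Feature": {"VPN"}, "Issue": {"Connectivity"},
     "Content type": {"Troubleshooting"}},
]

# Hypothetical heterogeneous content: the same docs plus items lacking some facets.
MIXED_CONTENT = TECH_DOCS + [
    {"Content type": {"Press release"}},
    {"Product": {"Router X"}, "Content type": {"Datasheet"}},
]

print(facet_coverage(MIXED_CONTENT, FACETS))
print(is_faceted_classification(TECH_DOCS, FACETS))
print(is_faceted_classification(MIXED_CONTENT, FACETS))
```

Lowering `threshold` (for example, to 0.9) would correspond to facets that apply to “nearly all” of the content rather than all of it.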

While faceted classification tends to be quite limited in the number of its facets, non-classification faceted taxonomies, whether based on subject types or separate controlled vocabularies, could result in a rather large number of facets.

Faceted taxonomies that would not be considered faceted classification include those where multiple facets are created for organizing and breaking down subjects or when multiple facets are created for reflecting multiple different controlled vocabularies. These faceted taxonomies stretch the meaning of “facet,” since the facets are not necessarily faces, dimensions, or aspects, but simply “types” suitable for filtering.

Facets for organizing subjects

In faceted classification we assign an object or content item to multiple different classes. However, for classification, these classes are relevant to the content item as a whole. This contrasts with indexing or tagging for subjects or names of relevance that occur within a text or are depicted within a media asset. These names and subjects can be grouped into facets for filtering/limiting search results, without being about the “classification” of the content item. This is common for specialized subject areas. Faceted taxonomies provide a form of guided navigation and are easier to browse and use than deep hierarchical taxonomies, so a large “subject” taxonomy could be broken down into specific subject-type facets.

Examples of specific subject-type facets include:

Organization types
Product types
Technologies
Activities
Industries
Disciplines
Job roles
Event types
Topics

The “Topics” facet is then used for the leftover generic subject concepts that do not belong in any of the other specialized facets. Unlike faceted classification, each facet is applicable to only some content items.

Any content item could be tagged with any number of concepts from any number of these facets. The facets make it easier for users to find taxonomy concepts and combine them. But the facets are not for “classifying” the content.

While the facets of a faceted taxonomy should ideally also be mutually exclusive, the occasional exception of a concept belonging to more than one subject-type facet (question word of “What”) does not create a problem in search, in contrast to the principle of faceted classification. For example, the same concept, Data catalogs, could be in both the Product types and the Technologies facets, as long as this type of polyhierarchy is kept to a minimum to avoid confusion. This would not be considered a case of classic polyhierarchy, because it’s not simply a matter of different broader concepts, but rather of different facets or concept schemes. A different focus or approach to the topic results in the concept being in more than one facet, offering an additional starting point for searchers.
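Keeping cross-facet concepts to a minimum is easier if they are monitored. Here is a hedged Python sketch that reports every concept appearing in more than one facet, along with the share of such exceptions. Under strict faceted classification the report should be empty; in a subject-type faceted taxonomy it should stay short. Apart from the Data catalogs example, the facets and concepts are hypothetical.

```python
from collections import defaultdict

# Hypothetical subject-type facets; "Data catalogs" deliberately appears in two of them.
FACETS = {
    "Product types": {"Data catalogs", "CRM software", "Analytics platforms"},
    "Technologies": {"Data catalogs", "Machine learning", "Knowledge graphs"},
    "Activities": {"Data stewardship", "Migration"},
}


def concepts_in_multiple_facets(facets):
    """Map each concept that occurs in more than one facet to the sorted list of those facets."""
    locations = defaultdict(list)
    for facet, concepts in facets.items():
        for concept in concepts:
            locations[concept].append(facet)
    return {c: sorted(fs) for c, fs in locations.items() if len(fs) > 1}


def exception_rate(facets):
    """Share of distinct concepts that sit in more than one facet; the goal is to keep this small."""
    distinct = {c for concepts in facets.values() for c in concepts}
    return len(concepts_in_multiple_facets(facets)) / len(distinct) if distinct else 0.0


print(concepts_in_multiple_facets(FACETS))
print(f"{exception_rate(FACETS):.0%} of concepts are in more than one facet")
```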

Facets for organizing controlled vocabularies

Faceted filters/refinement may be based on different controlled vocabulary types: one or more of term lists, name authorities, and subject thesauri/taxonomies. The “facets” are based on how the set of multiple controlled vocabularies is organized rather than based on “aspects” of the content.

Facets could be used for any controlled vocabulary filters that are logical, such as:

Named people (mentioned/discussed)
Organizations (mentioned/discussed)
Products/brands (mentioned/discussed)
Divisions, departments, units (mentioned/discussed)
Named works/document titles (mentioned/discussed)
Places (mentioned/discussed)
Topics (mentioned/discussed)

Because these facets reflect controlled vocabularies of concepts used to tag content for relevant occurrences of the subject/name and not for classification of the content, this kind of faceted taxonomy would not be considered faceted classification. There could, however, be additional faceted classification types, such as content type.

The Topics facet could contain a large hierarchical taxonomy or thesaurus. As such, this faceted search/browse structure may not even be considered a “faceted taxonomy,” but rather merely a faceted search interface to a set of taxonomies. Thus, there is even a nuanced difference between a faceted browse UI that utilizes a taxonomy (among other controlled vocabularies) and a “faceted taxonomy.”
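When a facet such as Topics holds a hierarchical taxonomy, a search interface has to decide whether selecting a broader concept also retrieves content tagged with its narrower concepts. The post does not prescribe this; the Python sketch here assumes that it should, and expands the selection down the hierarchy before filtering. The hierarchy and tagged items are hypothetical.

```python
# Hypothetical Topics facet hierarchy: broader concept -> narrower concepts.
NARROWER = {
    "Finance": ["Accounting", "Investment"],
    "Accounting": ["Auditing", "Tax accounting"],
    "Investment": ["Equities"],
}

# Hypothetical content items tagged with the most specific applicable topic.
TAGS = {
    "doc1": {"Auditing"},
    "doc2": {"Equities"},
    "doc3": {"Finance"},
    "doc4": {"Marketing"},
}


def with_narrower(concept, narrower):
    """Return the concept plus all of its narrower concepts, at any depth."""
    result, stack = set(), [concept]
    while stack:
        current = stack.pop()
        if current not in result:
            result.add(current)
            stack.extend(narrower.get(current, []))
    return result


def filter_by_topic(tags, concept, narrower):
    """Return items tagged with the selected topic or any concept beneath it."""
    wanted = with_narrower(concept, narrower)
    return sorted(doc for doc, topics in tags.items() if topics & wanted)


print(filter_by_topic(TAGS, "Finance", NARROWER))
print(filter_by_topic(TAGS, "Accounting", NARROWER))
```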

Facets for heterogeneous content

Finally, whether a faceted taxonomy is considered an implementation of faceted “classification” or not may depend on the context and type of content. If the content is homogeneous and all items share the same facets, then it may be considered faceted classification, but if the content is heterogeneous, and the facets are only relevant to some content, then it would not be considered classification.

Consider the following example of specialized subject-based facets for the field of medicine:

Diseases or conditions
Body parts (anatomy)
Signs and symptoms
Treatments
Patient population types

If all the content comprised just clinical case studies, then these facets actually could be considered faceted classification, since they all apply to nearly all the content and are aspects of the content. The content is classified by these facets. On the other hand, if the content dealt with all kinds of documents that had something to do with health or medicine, then these facets would not be for classification of the content but rather just for grouping of subjects for search filters.

When faceted classification is not a taxonomy

Attributes for computers

Finally, I would not consider all faceted structures to be faceted taxonomies.

Taxonomies are primarily for subjects and may include named entities. Content types/document types may also be included in the scope of taxonomy. There exists additional metadata that may be desired for filtering/refining searches that is out of scope of a definition of taxonomy. This includes date published/uploaded, file format, author/creator, document/approval status, etc. If it is important to the end users, these additional metadata properties could be included among the browsable facets and be considered classification aspects.

Attributes are a form of faceted classification, but a set of attributes is not really a faceted taxonomy. Often ecommerce taxonomies are presented as examples of faceted taxonomies. In fact, ecommerce taxonomies tend to be hierarchical, as they present categories and subcategories of types of products for the users to browse. At lower, more specific levels of the hierarchy, the user then has the additional option to narrow the results further by selecting values from various attributes that are shared among the products within the same product category. These include color, size/dimensions, price range, and product-specific features. I would not consider numeric values to be a taxonomy, but some attributes, such as for features, are more within the realm of taxonomies. Whether these should be called facets or attributes is a matter of debate. More about attributes is discussed in my past blog post “Attributes in Taxonomies.”
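The ecommerce pattern described here, hierarchical category browsing followed by attribute filters shared within a category, can be sketched in Python as follows. The catalog, category paths, and attribute names are hypothetical, and treating an attribute as available only if every product in the current category has it is an assumption; real storefronts may relax that rule.

```python
from collections import defaultdict

# Hypothetical product catalog: each product has a category path and attribute values.
PRODUCTS = [
    {"name": "Laptop A", "path": ("Electronics", "Computers", "Laptops"),
     "attrs": {"color": "Silver", "screen_in": 14, "price": 899}},
    {"name": "Laptop B", "path": ("Electronics", "Computers", "Laptops"),
     "attrs": {"color": "Black", "screen_in": 16, "price": 1299}},
    {"name": "Monitor C", "path": ("Electronics", "Computers", "Monitors"),
     "attrs": {"color": "Black", "screen_in": 27, "price": 349, "refresh_hz": 144}},
    {"name": "Dress D", "path": ("Clothing", "Women", "Dresses"),
     "attrs": {"color": "Red", "size": "M", "price": 79}},
]


def in_category(products, category_path):
    """Products at or below the browsed category (hierarchical navigation)."""
    n = len(category_path)
    return [p for p in products if p["path"][:n] == tuple(category_path)]


def available_attributes(products):
    """Attributes shared by every product in the result set, with the values on offer."""
    if not products:
        return {}
    shared = set.intersection(*(set(p["attrs"]) for p in products))
    values = defaultdict(set)
    for p in products:
        for attr in shared:
            values[attr].add(p["attrs"][attr])
    return {attr: sorted(vals) for attr, vals in values.items()}


laptops = in_category(PRODUCTS, ["Electronics", "Computers", "Laptops"])
print(available_attributes(laptops))
print(sorted(available_attributes(PRODUCTS)))  # only attributes shared across all categories
```

Note that screen size and price here are numeric values, which, as the post argues, are attributes rather than taxonomy concepts.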

Conclusions

Not all faceted taxonomies are faceted classifications, but some are. Not all faceted classifications are taxonomies, but some are. The differences are nuanced, and end users may neither care about nor need to know these naming distinctions, but the taxonomist should. Having a deep understanding of facets helps taxonomists and information architects design better facets. The goal is to serve users with the faceted design most suitable to their needs and to the set of content.

Tuesday, October 31, 2023

Taxonomies for Learning and Training Content

Taxonomies are primarily for tagging digital content to make it more easily found when users search or browse on taxonomy concepts. Content can be of various kinds: articles and research reports, policies and procedures, technical documentation, product information, contracts and other legal documents, marketing content, etc. A growing area of digital content is instructional or training content, especially corporate training for employees.

The need for taxonomies for training content

When an organization offers its employees a large number of training courses, it can be difficult for employees to find desired training. Having the training content tagged with controlled terms from a taxonomy makes it easier to find.

The training content may come from different sources and thus may come with different, inconsistent metadata already applied to it. An organization may have generic training (such as on diversity and information security) produced by a corporate training company, industry-specific training (such as anti-money laundering for financial services and retail industries) produced by a different training company, and company-specific training which is internally produced. An organization may also subscribe to business skills and technical skills training offered by one or more third parties, such as LinkedIn Learning. It may be very difficult to search across all these different sources.

Furthermore, simply searching on words in training course titles might not be effective, if topics are broad or the course titles are vague. For example, a search on “communication” may yield far too many results to sort through. A search on “writing” might miss a training course with a title of “Bringing out Your Voice” or “Use Plain Language.” Tagged with the concept of “Writing,” these courses can then be found.
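The difference between searching on title words and searching on a tagged concept can be shown with a tiny Python sketch. The titles “Bringing out Your Voice” and “Use Plain Language” come from the example above; the other courses and all of the Skill tags are hypothetical.

```python
# Hypothetical course catalog: titles plus concepts tagged from a Skill facet.
COURSES = [
    {"title": "Bringing out Your Voice", "skills": {"Writing"}},
    {"title": "Use Plain Language", "skills": {"Writing"}},
    {"title": "Business Writing Basics", "skills": {"Writing"}},
    {"title": "Running Effective Meetings", "skills": {"Facilitation"}},
]


def title_search(courses, word):
    """Naive keyword match against course titles (case-insensitive substring)."""
    return [c["title"] for c in courses if word.lower() in c["title"].lower()]


def concept_search(courses, concept):
    """Match on the taxonomy concept tagged to each course, regardless of title wording."""
    return [c["title"] for c in courses if concept in c["skills"]]


print(title_search(COURSES, "writing"))    # misses courses whose titles lack the word
print(concept_search(COURSES, "Writing"))  # finds all courses tagged with the concept
```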

Faceted taxonomies for training content

Sample faceted taxonomy for training content in PoolParty

For the complexities of training content, a single topical taxonomy is not enough. There could be ambiguity as to the skill level or between training topic and training format. For example, the topic of “Manager training” is not clear as to whether it is for new managers or all managers. The topic of “Presentation slides” is not clear as to whether it is training on how to create presentation slides or if presentation slides is the training format/medium. This is where a faceted taxonomy can help. Facets are different aspects of content which can be combined as search filters.

Training content is especially well suited for facets. Examples of possible facets for training content are: Content type, Level, Role, Skill, Training program, and Topic. An example of a taxonomy term in each facet follows:
•   Content type: Video training
•   Level: Intermediate
•   Role: Customer support
•   Skill: Written communication
•   Training program: Upskilling
•   Topic: Timeliness

It’s important to keep in mind that facets should be mutually exclusive, so the same concept, such as “Customer support,” cannot exist in both the Role and the Skill facets. Distinguishing a role from a skill can sometimes be difficult. It is important to separate out Role, though, because then it becomes possible to recommend training courses based on one’s role.
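Separating Role from Skill is what makes role-based recommendation possible. As a minimal sketch, assuming each Role concept is linked to Skill concepts (in an ontology, such a link could be a custom semantic relationship) and that courses are tagged with Skill and Level concepts, recommendation reduces to a set intersection. All roles, skills, and course titles here are hypothetical apart from the facet examples listed above.

```python
# Hypothetical links between Role concepts and Skill concepts (kept in separate facets).
ROLE_SKILLS = {
    "Customer support": {"Written communication", "Active listening"},
    "People manager": {"Coaching", "Written communication"},
}

# Hypothetical courses tagged with concepts from the Skill and Level facets.
COURSES = [
    {"title": "Clear Email Replies", "skills": {"Written communication"}, "level": "Beginner"},
    {"title": "Listening to Customers", "skills": {"Active listening"}, "level": "Intermediate"},
    {"title": "Coaching Conversations", "skills": {"Coaching"}, "level": "Intermediate"},
]


def recommend(role, courses, role_skills, level=None):
    """Recommend courses whose Skill tags overlap the skills linked to a Role, optionally by Level."""
    wanted = role_skills.get(role, set())
    return [
        c["title"] for c in courses
        if c["skills"] & wanted and (level is None or c["level"] == level)
    ]


print(recommend("Customer support", COURSES, ROLE_SKILLS))
print(recommend("Customer support", COURSES, ROLE_SKILLS, level="Intermediate"))
```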

Taxonomy facets are based on metadata properties, but there likely exist many more metadata properties than the end user needs to filter training content searches. Additional administrative metadata properties should not be implemented on the front end for course searches. These might include Organizational unit, Original source, Region, Access level, etc.

Skills taxonomy sources and challenges

Developing a skills taxonomy facet has its own challenges. First of all, skills taxonomies have multiple goals. Enabling employees or their managers to find appropriate training is just one goal. Other purposes may be to describe job openings so that they can be found by candidates with matching skills, to find an expert with a desired skill to ask questions of or to work on a project, or to map roles and skills to identify gaps and improve human resources strategies and professional development programs.

There are also varied sources for skills taxonomies. Managers and subject matter experts would list certain skills, which might differ from a list of skills proposed by human resources staff. A taxonomist, metadata specialist, or information architect working on a taxonomy would come up with a slightly different list of skills, probably not as detailed. Finally, there are external sources, but these might not be appropriate to a specific organization. The largest, best known published taxonomy of skills is ESCO (European Skills, Competences, Qualifications, and Occupations), but with 13,890 skills, it is much too large and detailed for any one organization. It might be best to start with any skills list that the HR department has and build it out further with recommendations from managers, but not as detailed as some subject matter experts might suggest. External sources could be consulted to fill in some gaps.

There is the potential to get too detailed in creating a hierarchy of skills, and some of the narrower concepts may end up being specific topics and not exactly skills. For example, a skill of project management could get narrower concepts for different project management methodologies and then various components of each methodology. This would not be appropriate for a skills taxonomy, although, if important, these narrower concepts could be included in a Topics facet instead.

Presentations on taxonomies for corporate training content

My most recent conference presentation and my next conference presentation are both about taxonomies for corporate training content. On October 16, I presented at the LavaCon content strategy conference in San Diego “Leveraging Semantics to Provide Targeted Training Content: A Case Study,” which was jointly presented with PoolParty software proof-of-concept project customer Esther Yoon of Google gTech. In addition to some of the issues described in this blog post, I also discussed how facets can be customized and how roles and skills can be linked for recommendation, and Esther presented how the POC improved the discovery of training content for those in roles related to customer support.

On November 6, at the Taxonomy Boot Camp conference in Washington, DC, I will present “Challenges in Creating Taxonomies for Learning & Development,” jointly with Amber Simpson of Walmart’s Walmart Academy, also a PoolParty software customer. In addition to the issues described here, I will also provide specific examples of challenges in creating a Skills taxonomy facet. The slides will also be made available afterwards.