Monday, September 30, 2024

Topical Taxonomies for Filtering Searches

[Image: PoolParty GraphSearch]
We taxonomists have long advocated that a taxonomy of disambiguated concepts tagged to content retrieves more accurate results than search algorithms alone. But if users prefer simply entering text strings into a search box rather than browsing taxonomies, it can be a challenge to determine how best to support them with a taxonomy.

A faceted taxonomy, whose facets serve as filters for refining search results, has become a common taxonomy solution, especially for intranets, partner portals, and knowledge bases. For these purposes, certain facets, such as Content type, Product/Service, Location, and Department, are common and logical. When it comes to designating “Topics,” however, it’s not so easy.

Specific Terms Gathered from Analysis

When gathering information and sources for terms, most sources will yield highly specific terms. These include terms arising from search log analysis, brainstorming sessions with sample users, automated text analytics term extraction from a large corpus of content, and manual review of a representative sample of documents/pages. These are all standard methods for taxonomy design, which I conduct as a consultant.

The difficulty is that there are often so many specific topics that the new topical taxonomy could potentially have many hundreds of terms. Some may be relevant to only one or two documents or may have occurred in only a couple of searches out of thousands. Such terms would not serve the purpose of refining searches.
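As a rough illustration of the “only one or two documents” problem, here is a minimal Python sketch that counts how many documents each candidate term is tagged to and drops terms below a threshold. The data shape, the sample terms, and the `min_docs=3` cutoff are hypothetical assumptions; the same counting could be applied to search-log frequencies instead. Pruning like this only shrinks the list of specific terms. It does not produce the broader categories discussed in the following sections, which still require human judgment.

```python
from collections import Counter

def prune_rare_terms(doc_tags, min_docs=3):
    """Return candidate terms tagged to at least `min_docs` documents.

    `doc_tags` maps a document ID to the list of candidate terms tagged to it.
    Results are ordered by document count (highest first), then alphabetically.
    """
    counts = Counter()
    for tags in doc_tags.values():
        # Count each term at most once per document, even if tagged twice.
        counts.update(set(tags))
    kept = [term for term, n in counts.items() if n >= min_docs]
    return sorted(kept, key=lambda term: (-counts[term], term))

# Hypothetical sample: a handful of documents tagged with extracted terms.
doc_tags = {
    "doc-001": ["onboarding", "benefits enrollment"],
    "doc-002": ["onboarding", "expense reports"],
    "doc-003": ["onboarding", "benefits enrollment"],
    "doc-004": ["expense reports", "benefits enrollment", "onboarding"],
    "doc-005": ["mileage rates for Vermont"],
}

print(prune_rare_terms(doc_tags, min_docs=3))
# ['onboarding', 'benefits enrollment']
```

The threshold is a blunt instrument: a term that survives it may still be too specific to work as a filter, and a useful broader category may never appear verbatim in the tags at all.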

Another problem is that many of the terms suggested by these methods are not even topical. Often, the top searches in enterprise/intranet search logs are for commonly used named tools, platforms, or services.

The main issue, however, in deriving terms for a topical facet/filter from search terms is that the objective of the topical facet, like that of all facets, is to limit searches, not to duplicate them. What is really needed in the topical facet are topical categories that are broader than the search terms. Identifying these broader topical categories can be more challenging.

Identifying Broader Topical Categories

Identifying broader terms or categories for topic filters is neither as simple as identifying specific search terms nor as straightforward as identifying the set of facets. The typical methods of obtaining candidate terms, from both users and the content, still need to be carried out, but with a focus on identifying broader terms or categories.

Categories from Stakeholder Engagement

Engaging stakeholders or other sample users in activities to brainstorm taxonomy terms will result in a mix of specific and broad terms. It is then the task of the taxonomist-facilitator to guide the participants in identifying which terms are broader and which are narrower within the same topical facet. Involving stakeholders/sample users is important, because if a single taxonomist or an external consulting team tries to do this alone, their designated broader terms, while hierarchically correct, might not suit the intended users. The taxonomist-facilitator may suggest broader terms and then obtain immediate validation from the participants on the appropriateness of those suggestions.

Categories from Content Analysis

Analyzing content for broad topics is more effectively done manually than with automated methods. Manual content analysis will yield both specific and potentially broader concepts. A taxonomist or content strategist experienced in content analysis for identifying meaning will be able to determine the main concept for a piece of content.

Automated methods, based on text analytics technologies, tend to focus on term extraction and will extract terms that are even more specific, and less useful, than search log results. However, if a list of derived terms is large enough (as search logs and automated term extraction lists tend to be), another, newer option is to use large language model (LLM) and generative AI technologies to categorize the specific terms and thus generate broader terms. The LLMs should be trained on the same or similar content, that is, internal enterprise content rather than the public web, to provide the correct context. Even then, the identified broader terms or categories will not always be correct and will require review by an experienced taxonomist.

Other Topical Facets

Topical terms, however, do not all have to be in a single “Topics” facet. Depending on the use case, there could be other topical facets beyond the usual named entities, departments, locations, or product/service types. These could be for Function, Activity, Issue Type, Technology, Research Field/Discipline, etc. Whether and how to break out these facets can be a challenge and should involve extensive discussions or other research with stakeholders and user representatives.

Finally, a topical facet for filtering search results could even be based on the top levels of the existing navigation menu, especially on an intranet or in an enterprise content management system. Facets as filters are available only for refining searches, but if users choose instead to navigate the site menu, they have no option to use other facets/aspects to help restrict what they are looking for. By duplicating the navigation menu’s top one or two levels in a facet, perhaps called “Topic Area,” users can limit a search to the categories for the areas with which they are familiar, and they can also restrict the search further by filtering on terms selected from any of the other facets.
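A minimal Python sketch may make this combination concrete. It assumes a search engine has already returned a result list, and shows a hypothetical “Topic Area” facet, mirroring the top level of an imagined intranet menu, being combined with other facets to narrow that list. The `Doc` class, the facet values, and the rule of AND across facets and OR within a facet are illustrative assumptions (a common faceted-search convention), not the behavior of any particular search product.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    title: str
    # Facet name -> set of taxonomy values tagged to this document.
    facets: dict = field(default_factory=dict)

def filter_results(results, selections):
    """Narrow an existing result list by facet selections.

    A document is kept only if, for every facet with a non-empty selection,
    it is tagged with at least one selected value
    (AND across facets, OR within a facet).
    """
    return [
        doc for doc in results
        if all(
            not values or doc.facets.get(facet, set()) & values
            for facet, values in selections.items()
        )
    ]

# Hypothetical results already returned by a text search for "expense".
results = [
    Doc("Travel expense policy", {"Topic Area": {"Finance"}, "Content type": {"Policy"}, "Location": {"US"}}),
    Doc("Expense report how-to", {"Topic Area": {"Finance"}, "Content type": {"How-to"}, "Location": {"US", "UK"}}),
    Doc("Expense tool access request", {"Topic Area": {"IT"}, "Content type": {"Form"}}),
    Doc("UK expense FAQ", {"Topic Area": {"Finance"}, "Content type": {"FAQ"}, "Location": {"UK"}}),
]

narrowed = filter_results(results, {"Topic Area": {"Finance"}, "Location": {"UK"}})
print([doc.title for doc in narrowed])
# ['Expense report how-to', 'UK expense FAQ']
```

Because the facets only narrow the list the text search already produced, they limit the search rather than duplicate it, which is the role the post assigns to a topical facet.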

I will be discussing the wider activity of coming up with terms for a taxonomy in my upcoming Taxonomy Boot Camp presentation, “The Complete Guide to Sourcing Terms,” on November 18 in Washington, DC.

