Monday, May 20, 2024

Tagging with a New Taxonomy


The benefits to information users of having content tagged with a taxonomy are great. They include increased accuracy and comprehensiveness of search results, speed and efficiency in obtaining results, the ability to filter search results, the opportunity to explore and discover related information, greater confidence in the completeness of results, and an overall better user experience. These benefits are worth the challenges of creating a taxonomy, and they should be worth the challenges of properly tagging with a taxonomy as well.


Often the greatest challenge to taxonomy adoption is the ability to tag all of the content with the taxonomy terms as intended. Issues include allocating resources for tagging, implementing a new content management workflow, establishing criteria and quality control for tagging, and tagging a large volume of legacy untagged content.

Tagging Resources

While taxonomy development has one-time project expenses (such as the hours of a consultant or contractor), ongoing tagging with a taxonomy requires an annual budget on top of some startup expenses, whether tagging is manual or automated. Manual tagging requires budgeting for the working hours, while auto-tagging typically requires an annual software license. Automated tagging also requires some human involvement for quality checks and refinements of tagging parameters.

Which method to choose, manual or automated, depends on the volume and speed of tagging required, the nature of the content, and the need for accuracy. Automated methods are more cost-effective for large volumes of content and can tag more quickly. Automated (AI) methods can tag text or images, but the same tool/technology does not do both, so for mixed content, manual tagging may be a more practical and affordable option. Automated methods are also better for content of a consistent type (e.g., all resumes, all news, all technical support articles), whereas a diversity of content (e.g., everything on the intranet or on the public website) can be tagged more accurately if done manually. Manual tagging may not be as consistent as automated methods, but unlike automated tagging, it is rarely wrong. If a mis-tagging rate of 10-15% cannot be tolerated, then manual tagging may be preferred.

Automated tagging is not free from manual labor. If tagging is done by machine learning, then the machine needs to learn from examples, and sample tagged content may need to be prepared and submitted to the system as such examples. If tagging is done by rules, then rules need to be written for most of the taxonomy concepts. Prebuilt starter taxonomies may be pre-trained or have tagging rules included, but they will likely need refinement. In fact, any auto-tagging needs to be tuned and refined as the content and the taxonomy evolve.
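To make the rules-based approach concrete, here is a minimal sketch of what tagging rules could look like. It assumes each taxonomy concept is triggered by a short list of phrases; the concept names, the phrases, and the auto_tag function are all hypothetical, and commercial auto-tagging tools have their own, much richer rule syntax.

```python
import re

# Hypothetical rules: each taxonomy concept maps to its trigger phrases.
RULES = {
    "Machine Learning": ["machine learning", "neural network"],
    "Taxonomies": ["taxonomy", "taxonomies", "controlled vocabulary"],
}


def auto_tag(text, rules=RULES, min_hits=1):
    """Return the concepts whose trigger phrases occur at least min_hits times."""
    lowered = text.lower()
    tags = []
    for concept, phrases in rules.items():
        # Count whole-word, case-insensitive matches across all phrases.
        hits = sum(
            len(re.findall(r"\b" + re.escape(phrase) + r"\b", lowered))
            for phrase in phrases
        )
        if hits >= min_hits:
            tags.append(concept)
    return sorted(tags)
```

Raising min_hits is one simple way to tune such rules toward fewer but more confident tags, which is the kind of refinement that continues as the content and the taxonomy change.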

Tagging Workflow

Whether manual or automated, tagging content requires setting up new content management workflows. It needs to be determined who does the tagging: the author, the editor, or someone else. Unless trained professional indexers tag the content, tagging review by an editor may be desired.

While manual tagging can be done within the same system (some kind of content management system) where the content is stored, these systems usually don’t have auto-tagging functionality built in. Automated tagging is typically done by integrating the auto-tagging tool (which may be a module of a taxonomy management system) with the content management system and setting up a data “pipeline” for the tagging tool. Setting this up may require some additionally billed services from the software vendor.

The tagging workflow should also include a method for taggers, or those who review automated tagging, to suggest new terms to add to the taxonomy as they see new concepts in the content.

Tagging Standards

Establishing criteria and quality control for tagging begins with setting tagging policy and guidelines. This includes setting the policy regarding the level of detail to tag, how many terms of each type may be tagged to a single piece of content, whether a certain taxonomy term type is required for tagging, and whether the tagging of certain terms should trigger the additional tagging of another term (such as a broader term). These policies can be set as parameters for auto-tagging. For manual tagging, some of the tagging policies can be system-enforced, but others cannot be.

Tagging has both policies (rules) and guidelines (best practices/recommendations). A policy, for example, would be the minimum and maximum number of tags permitted, whereas a guideline would be a suggested narrower range of tags.
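The distinction between policies and guidelines can be sketched in code. In the example below, policy violations are returned as errors and guideline deviations as warnings. The TaggingPolicy fields, the default numbers, and the small TAXONOMY fragment are illustrative assumptions, not settings from any particular tagging tool.

```python
from dataclasses import dataclass, field


@dataclass
class TaggingPolicy:
    min_tags: int = 1           # policy: hard lower limit (assumed value)
    max_tags: int = 10          # policy: hard upper limit (assumed value)
    required_types: set = field(default_factory=lambda: {"Topic"})
    suggested_range: tuple = (3, 6)  # guideline: recommended narrower range


# Hypothetical taxonomy fragment: term -> (term type, broader term or None)
TAXONOMY = {
    "Science": ("Topic", None),
    "Physics": ("Topic", "Science"),
    "Report": ("Content Type", None),
}


def add_broader_terms(tags, taxonomy=TAXONOMY):
    """Tagging a term also triggers tagging of all of its broader terms."""
    result = set(tags)
    for tag in tags:
        broader = taxonomy[tag][1]
        while broader is not None:
            result.add(broader)
            broader = taxonomy[broader][1]
    return result


def check_tags(tags, policy, taxonomy=TAXONOMY):
    """Return (errors, warnings): policy violations and guideline deviations."""
    errors, warnings = [], []
    if not policy.min_tags <= len(tags) <= policy.max_tags:
        errors.append(f"tag count {len(tags)} is outside the permitted range")
    types_present = {taxonomy[tag][0] for tag in tags}
    for missing in sorted(policy.required_types - types_present):
        errors.append(f"missing required term type: {missing}")
    low, high = policy.suggested_range
    if not low <= len(tags) <= high:
        warnings.append(f"tag count {len(tags)} is outside the suggested range")
    return errors, warnings
```

In a manual tagging interface, the errors would be the part the system enforces, for example by blocking the save, while the warnings would only be shown to the tagger.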

Whether manual or automated, tagging should be checked for accuracy from time to time, as a periodic quality control function. Based on the results, the taxonomy, the tagging policies, or the tagging guidelines may need to be revised.
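A periodic check of this kind could be as simple as drawing a random sample of content, having a reviewer approve the correct tags for each item, and comparing those with the tags that were actually assigned. The following is a rough sketch under that assumption. The function names are hypothetical, and precision and recall are standard measures that I am applying here, not metrics prescribed by any tagging tool.

```python
import random


def sample_for_review(item_ids, sample_size, seed=0):
    """Pick a reproducible random sample of content items to review."""
    rng = random.Random(seed)
    ordered = sorted(item_ids)
    return rng.sample(ordered, min(sample_size, len(ordered)))


def tagging_accuracy(assigned, approved):
    """Compare assigned tags with reviewer-approved tags.

    Both arguments map an item ID to a set of tags. Returns
    (precision, recall): the share of assigned tags that were correct,
    and the share of approved tags that were actually assigned.
    """
    correct = wrong = missed = 0
    for item_id, approved_tags in approved.items():
        tags = assigned.get(item_id, set())
        correct += len(tags & approved_tags)
        wrong += len(tags - approved_tags)
        missed += len(approved_tags - tags)
    precision = correct / (correct + wrong) if correct + wrong else 0.0
    recall = correct / (correct + missed) if correct + missed else 0.0
    return precision, recall
```

The mis-tagging rate is then 1 minus precision. Low precision points to tagging rules or guidelines that are too loose, while low recall may point to missing terms in the taxonomy or to tags being overlooked.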

Legacy Content Tagging

Even if there is an established workflow for tagging newly added content, there is the challenge of tagging all the legacy content that is already in the system. It’s rare that a taxonomy is implemented before any content has been collected and made available for searching.

Automated tagging may be a good way to handle the backlog of untagged content. However, software is intended to be licensed for at least a year and be a part of the regular workflow, rather than for a one-time backlog tagging project. So, the long-term use of auto-tagging software needs to be considered.

If manual tagging alone will be the selected method for the long term, then you should consider the tagging services of a freelancer, contractor, temp, or intern (library science student) to take care of tagging the initial backlog of content. Freelance indexers can be found through the American Society for Indexing and indexing societies in other countries. They prefer to call the activity “indexing,” rather than “tagging.”

While taxonomy creation is a project, taxonomy management and maintenance are an ongoing program, and it’s the same with tagging. Backlog tagging will be a project, but ongoing tagging is a program, and it should be tied to taxonomy management and maintenance. Tagging should be an important part of an information and content management strategy and not an afterthought.
