The Accidental Taxonomist

Thursday, September 30, 2021

Taxonomies for Human Resources

I just attended the HR Technology Conference this week, my first time at an industry or functional specialty conference, so it was interesting to learn how taxonomies could be positioned within this specialized sector. I usually speak or write about taxonomies as useful in general knowledge and information management, with ecommerce being the only specialization discussed.

Human resources technology is a broad category, which includes software for such functions as benefits, compensation, engagement and recognition, learning management, onboarding, payroll, recruitment, screening, time and attendance, wellness, etc. Taxonomies are not particularly relevant for most of these areas, but are for some, such as talent management systems, job boards, and intranets. Performance management and training management may also benefit from taxonomies.

In the conference opening keynote “HR Technology Reinvented: The Big Shift Towards Work Tech” presented by Josh Bersin, I was pleased to hear that this HR technology industry analyst had as #3 among his industry trends: “Skills taxonomies are the next big thing,” and he had a slide illustrating how a “taxonomy is more complex than you think.” Reasons Bersin gave for the complexity: a skill is not well defined, skills differ even in the same industry, and companies cannot trust black box skills.

Taxonomies may also be implemented as part of a knowledge graph solution that links data in multiple applications, systems, and repositories, which is a typical scenario for HR technology, despite the existence of some degree of integration of functions within HR management systems (HRMS) or human capital management (HCM) software.

Another point that Bersin made was that the talent marketplace has become a category. It’s become more important to recruit and hire internally, so an internal marketplace for employees and jobs can be created. I find this also an interesting application for taxonomies. Taxonomies in business and industry are well established and known for ecommerce, which is B2C, but more recently taxonomies have been implemented in B2B and C2C marketplaces, such as Etsy. In an employee-job marketplace, taxonomies can be used to tag employee skills, interests, and locations, along with the job openings.
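As a rough illustration of that last point, here is a minimal Python sketch of matching employees to job openings that share taxonomy tags. The class name, the function name, and the scoring (a count of shared skill terms, with at least one shared location required) are my own assumptions for illustration, not how any particular talent marketplace product works.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Tagged:
    """An employee or a job opening, tagged with taxonomy terms (hypothetical model)."""
    name: str
    skills: set[str] = field(default_factory=set)
    locations: set[str] = field(default_factory=set)


def rank_jobs(employee: Tagged, jobs: list[Tagged]) -> list[str]:
    """Return job names ordered by number of skill terms shared with the employee.

    Assumption: a job is only considered if it shares at least one location
    term and at least one skill term with the employee.
    """
    scored = []
    for job in jobs:
        if not (employee.locations & job.locations):
            continue  # no location in common
        shared = employee.skills & job.skills
        if shared:
            scored.append((len(shared), job.name))
    # Most shared skills first; ties broken alphabetically by job name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]
```

Because both sides are tagged from the same controlled vocabulary, the match is a simple set intersection rather than a fuzzy comparison of free-text skill descriptions.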

The talent marketplace was also discussed by the second day’s keynote speaker Ravin Jesuthan, who additionally explained how the internal talent marketplace can connect workers to projects, assignments, and tasks, rather than simply job openings. He also referred to a market relationship and to matchmaking. On the subject of matchmaking, I found one vendor’s platform for matching employees to coaches or mentors to be an interesting use case for a taxonomy.

Another trend is that employee learning or training has a more important role in the flow of people to work. There is also a potential for taxonomies to support this endeavor. Depending on the volume of training materials, their findability could benefit from tagging with terms from a taxonomy. A taxonomy can also support the recommendation of appropriate training courses to employees.

Finally, there is a lot of emphasis placed on employee experience, which was the number one trend in Bersin’s keynote. One way to improve the employee experience, which was not mentioned in the keynote, is to have a single user-interface that, with a single, consistent taxonomy, links content and data in different systems. So, the users have only a single place to go to find answers to all of their employment-related questions.

Tuesday, August 31, 2021

Knowledge Engineering and Taxonomies

My next conference workshop (at SEMANTiCS September 7) on taxonomies and ontologies has in its title “knowledge engineering.” I figured this may resonate more with the audience of computer scientists, data scientists, and semantic technology and AI experts. People come (often accidentally) to the field of designing taxonomies, ontologies, and knowledge organization systems in general from different backgrounds, and may work in different disciplines or departments. They may have very different training, job titles, and job descriptions.

Also, my job title now, at Semantic Web Company, is knowledge engineer, although there is not much agreement on what that job title means. I had once before, over 10 years ago, applied for a position with the job title knowledge engineer, and the role focused on writing rules for rules-based auto-classification. This involves using a taxonomy with logic rules and regular expressions for each taxonomy term to support automated indexing, rather than using training sets and machine learning. My current job, however, involves designing taxonomies and ontologies, often in combination.
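To give a sense of what rules-based auto-classification looks like, here is a minimal Python sketch. The taxonomy terms, the regular expressions, and the include/exclude rule format are hypothetical examples of my own, not the syntax of any actual auto-classification product.

```python
import re

# Hypothetical rules: each taxonomy term has a regular expression that must
# match, and optionally one that must NOT match (simple Boolean logic).
RULES = {
    "Payroll": (r"\bpay(roll|check|slip)s?\b", None),
    "Benefits": (r"\bbenefits?\b", r"\bcost-benefit\b"),
    "Onboarding": (r"\bon-?boarding\b", None),
}


def classify(text: str) -> list[str]:
    """Return the taxonomy terms whose rules fire on the text, sorted by label."""
    tags = []
    for term, (include, exclude) in RULES.items():
        matched = re.search(include, text, re.IGNORECASE)
        blocked = exclude and re.search(exclude, text, re.IGNORECASE)
        if matched and not blocked:
            tags.append(term)
    return sorted(tags)
```

The exclusion pattern on "Benefits" shows why the rules have to be written per term: the word "benefit" in "cost-benefit analysis" is not about employee benefits, and only a hand-written rule captures that without a training set.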

Creating a taxonomy or thesaurus alone is not knowledge engineering. This is because a taxonomy does not describe all aspects of a knowledge domain, just the concepts and their hierarchical relationships, or in the case of a thesaurus, some additional nonspecific related (See also) relationships. Furthermore, there already exist published guidelines/standards for taxonomies and thesauri, such as ANSI/NISO Z39.19 and ISO 25964-1, that specify best practices for design.

It makes more sense to call ontology design a form of knowledge engineering. Ontologies have a much higher level of semantics or expressiveness, which needs to be defined by the ontologist or knowledge engineer. There are customized, semantic relationships (such as “is located in” and “contains”), which are to be applied between designated classes (such as organizations and places), and any number of customized attributes (such as address or latitude/longitude) that can be specified for a class. Standards for ontologies, such as OWL, which are from the World Wide Web Consortium (W3C), are only for machine readability and interoperability, but not for best practices, so there is more room for interpretation and innovation when it comes to designing an ontology than there is for a taxonomy or thesaurus.
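The idea of relationships that apply only between designated classes can be sketched in a few lines of Python. This is a simplified illustration and not OWL; the entity names are placeholders, and I have assumed that “contains” links a place to a place.

```python
# Hypothetical ontology model: each relationship names the class it starts
# from (domain) and the class it points to (range).
RELATIONS = {
    "is located in": ("Organization", "Place"),
    "contains": ("Place", "Place"),  # assumption made for this sketch
}

# Placeholder instances, each assigned to a class.
ENTITY_CLASS = {
    "Example Corp": "Organization",
    "Example City": "Place",
    "Example Country": "Place",
}

# Custom attributes specified for an instance of the Place class
# (placeholder values).
ATTRIBUTES = {
    "Example City": {"latitude": 0.0, "longitude": 0.0},
}


def add_statement(graph: set, subject: str, relation: str, obj: str) -> None:
    """Add a (subject, relation, object) statement if it fits the model."""
    domain, range_ = RELATIONS[relation]
    if ENTITY_CLASS[subject] != domain or ENTITY_CLASS[obj] != range_:
        raise ValueError(f"'{relation}' must link {domain} to {range_}")
    graph.add((subject, relation, obj))


graph = set()
add_statement(graph, "Example Corp", "is located in", "Example City")
add_statement(graph, "Example Country", "contains", "Example City")
```

Deciding which relationships exist and which classes they may connect is exactly the part that the standards leave to the ontologist.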

Knowledge engineering may involve more than designing an ontology and may include all the various kinds of controlled vocabularies for the content and data of an organization. This includes determining what kinds of vocabularies are needed and how they are related to each other.

Knowledge engineering is also very similar to knowledge modeling, which I blogged about before in the post "Knowledge Modeling." Knowledge engineering is a more general function, whereas knowledge modeling is a more specific activity.

Knowledge engineering also goes beyond taxonomy/ontology design and creation to include the follow-through application, which is namely the management of tagging or classification of content with the taxonomy. This is, after all, how a useful knowledge base is created, with content tagged and available for retrieval. Definitions of knowledge engineering sometimes refer to it as a field within artificial intelligence (AI) to build knowledge bases. While I might not agree that this is always part of the definition of knowledge engineering, AI is used for automated tagging of content with a taxonomy.

It's probably better to define knowledge engineering more broadly as methods to support the development and transmission of knowledge, specifically by transforming data to information and information to knowledge, as the frequently depicted pyramid on the right suggests. This transformation is specifically done by designing and creating links between data, which is supported by taxonomies and ontologies.

Saturday, July 31, 2021

Taxonomies and Sitemaps

I was recently asked if the sitemap of a company’s website could serve as the start of a taxonomy for an organization. The sitemap, after all, includes all the relevant topics pertaining to an organization’s business offerings, and they are arranged in a hierarchy. I have previously blogged on the subject of why a website’s navigation is not a taxonomy in Navigation Schemes and Taxonomies. A sitemap is similar to a website’s navigation, but it goes deeper by including the titles or topics of web pages which are not included in the website’s menu, and it is not necessarily intended for user browsing. A sitemap may go five or six levels deep, whereas website navigation menus are usually only two levels. Therefore, a sitemap may seem as if it’s a taxonomy. However, just because a sitemap is as large and detailed as a taxonomy needs to be does not make it suitable as a taxonomy.

Different purposes

We need to understand what a taxonomy is for. It’s to aid users in locating desired content by topic-terms, which reflect both the terminology use of the users and of the content. Taxonomy terms are tagged/indexed to content that is relevant to the term. The starting point when creating a taxonomy is to identify the topics of the content and identify the topics of user interest or search, and then merge those topics into a taxonomy by bringing together different names for the same concept. The concepts are then structurally arranged to show the relationships between the terms, especially hierarchical relationships. The primary purpose of the hierarchy of terms in a taxonomy is to aid the users in finding the appropriate term. When browsing the taxonomy, they may find a broader term or narrower term that better describes their search goals. Then they can select that term to retrieve content that was tagged with the term.

A sitemap, on the other hand, lists all or most pages of a website, usually by page title and organized in the hierarchical structure of the website. The hierarchical structure of the website was designed to organize information in a logical manner for users to browse and explore, as considered by the information architect who designed the website. The sitemap thus reflects pages, which are often topics but not always. A page may have multiple topics of interest that a user might want to look up. A page is sometimes for performing a function or activity and not necessarily just a topic of information.

A sitemap is typically automatically generated from the page titles, and its primary purpose is not for users but for machines: sitemaps tell search engines about pages that are available for crawling on websites and can thus support search engine optimization (SEO). Sitemaps are also useful in planning the further development or organizational improvement of a website. Whether a sitemap should even be displayed to end users as a tool to find information on a website is questionable. If automatically generated, it's not designed for that purpose, but users could find it helpful, especially users who understand that it is merely the aggregation of page titles organized in the file structure of the website. Some websites make it available, and some do not. Some websites instead display a simplified sitemap that is designed to be a guide for users, but then it does not include all pages.

Different labels

The title names of pages and thus of sitemap entries often do not correspond to taxonomy terms. They could start out with a verb for an activity, they could be commands or questions, or they could be complete sentences. Taxonomy terms are topics or names represented only by nouns, noun phrases, or proper nouns. Examples of sitemap entries that are not good taxonomy terms may include:

How to use…
Get started with…
Help with…
Pay a bill
Shop for…

As with navigation, the entries of a sitemap reflect pages in a one-to-one relationship, in contrast to taxonomy terms, each of which may retrieve multiple pages or content sources, and each page or content item can be tagged with multiple taxonomy terms. As such, entries in a sitemap may actually be more specific than would be needed in a taxonomy. The user’s selection of multiple taxonomy terms in combination, through filters/refinements, achieves the result of obtaining an appropriate list of relevant content.
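The many-to-many relationship between taxonomy terms and content, and the effect of combining terms as filters, can be shown with a small Python sketch. The page titles and taxonomy terms below are made up for illustration, and the "all selected terms must be present" logic is one common way to combine filters, not the only one.

```python
# Hypothetical content items, each tagged with multiple taxonomy terms.
CONTENT_TAGS = {
    "Pay a bill": {"Billing", "Online services"},
    "How to use autopay": {"Billing", "Online services", "Help"},
    "Billing FAQ": {"Billing", "Help"},
    "Store locations": {"Locations"},
}


def filter_content(content_tags: dict, selected_terms: list) -> list:
    """Return titles of items tagged with ALL of the selected taxonomy terms."""
    selected = set(selected_terms)
    return sorted(
        title for title, tags in content_tags.items() if selected <= tags
    )


# One term retrieves several pages; adding a second term narrows the result.
broad = filter_content(CONTENT_TAGS, ["Billing"])
narrow = filter_content(CONTENT_TAGS, ["Billing", "Help"])
```

Here the single term "Billing" retrieves three pages, and no taxonomy term needs to be as specific as an individual page title, because the combination of terms does the narrowing.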

Conclusions

Sitemaps should not be used as taxonomies, but their topics (not their labels) may be considered as a good source for a taxonomy. Sitemaps might not even be suitable as a basis or starting point for a taxonomy, but only as a source for developing taxonomy terms. It is recommended that a taxonomy be created separately from a sitemap, based on a review of content, search log data, and stakeholder and user interviews, with the sitemap as yet one more source for consideration when developing taxonomy terms. The hierarchy of the sitemap should also not be too closely followed, although parts of its hierarchical structure may be taken into consideration for creating taxonomy relationships.

Wednesday, June 30, 2021

Taxonomy Management

As taxonomies become more common for information management and retrieval in all kinds of organizations and in various applications, the task of creating new taxonomies from scratch is less needed than the task of managing existing taxonomies. What is required for taxonomy management, however, might not be completely clear. I’ve written several posts on this blog which I tagged with the topic “Taxonomy maintenance,” but none tagged with “Taxonomy management.” That needs to be corrected. Taxonomy maintenance is part of the larger responsibility of taxonomy management.

Taxonomy management includes the following:

Taxonomy maintenance: adding concepts, merging concepts, editing select labels, adding alternative labels, adding relationships, etc. on an individual concept basis, to keep the taxonomy up to date, as new content and new concepts are introduced and terminology changes. These changes may arise from suggestions from those doing tagging, proactive review of new content and new trends, periodic review of search logs, and periodic text analytics of content. This is an ongoing task that can be done by one or more taxonomy editors, including those who are subject matter experts. In such cases, the taxonomy-editing work of non-taxonomists should be reviewed by a taxonomist.

Taxonomy governance: developing taxonomy maintenance policies and documentation. This comprises documenting the taxonomy type, features, purpose, ownership, use, etc., and documenting how the taxonomy should be updated to keep its style consistent, including the criteria for adding new concepts to the taxonomy. Taxonomies should be documented when they are created, but sometimes they are not and need to be. Documentation may need to be updated from time to time.

Taxonomy tagging management: developing and updating tagging rules or policies, ensuring tagging quality (comprehensiveness and correctness), and updating or improving the taxonomy if tagging issues indicate it. Tagging can be manual, automated, or automated with human review. Periodic review of the tagging is a necessary task. Even when managing tagging is another individual’s responsibility, managing taxonomies is not completely separate from managing tagging, and this is an ongoing responsibility of the taxonomist who manages the taxonomy.

Taxonomy integration with end-user applications: including websites and web content management systems (CMSs), enterprise content management systems, digital asset management systems, search software, and other custom applications such as recommendation, personalization, and question answering. A taxonomy may be managed within an application, such as a specific CMS or SharePoint, but then it is usable only for that single application. As organizations increase the number of their information management systems, it eventually becomes clear that separate siloed taxonomies are not a good idea, and a single taxonomy should be centrally managed and ported or synced with the taxonomy management components of each tool. Taxonomy application integration involves both technical aspects, such as integrations with APIs, and nontechnical aspects related to user experience, such as considering how the taxonomy displays to the end-users and how they interact with it. Often, an existing taxonomy needs to be adapted to a new application.

Taxonomy review and revision: reviewing a taxonomy for quality standards and against best practices guidelines and checklists, and making general widespread improvements, such as: ensuring that concepts and their labels are clear and unambiguous and that concepts are sufficiently distinct in their meaning, adding sufficient alternative labels (synonyms), ensuring that hierarchical relationships always follow the standards, adding polyhierarchy and associative relationships, changing the capitalization and plural style, and ensuring that the hierarchy is not too detailed and deep in some areas. This task is undertaken by a taxonomist or taxonomy consultant only occasionally, especially if the taxonomy will undergo an extension or will be migrated to a new system.

Taxonomy extension: merging redundant taxonomies, integrating complementary taxonomies, mapping/linking taxonomies or other vocabularies in the same domain to extend their use, or translating taxonomies to add additional languages. This could include merging or linking a taxonomy and a glossary or terminology or linking the custom taxonomy to an industry standard classification scheme that is familiar to users. Taxonomy extension could also involve adding semantics of an ontology model with custom relationships and attributes. This task is also undertaken by a taxonomist or taxonomy consultant only occasionally.

The inclusion of all of these tasks of taxonomy management requires a dedicated taxonomy/thesaurus management tool, as spreadsheets are insufficient, and the taxonomy editing module of a single application not only tends to lack certain taxonomy management features but will not serve the needs of enterprise-wide taxonomy management.

I will discuss all of this in more detail in an upcoming PoolParty webinar “Taxonomy Management 101” on August 4.

Sunday, May 30, 2021

Taxonomy Design Research

I recently wrote an article “Taxonomies: Connecting Users to Content” for an online publication, Boxes and Arrows, on information architecture (IA) and user experience (UX). As I was working with the editors on the section of gathering information from users, I realized that IA and UX have very formalized researcher roles. There is a job title for “UX Researcher” with career guides and resources on what skills are needed, and many more jobs on job board sites posted for “UX researcher” than for “taxonomist.” Meanwhile, there is no such job as a “taxonomy researcher.” But designing and developing taxonomies, which are often part of information architecture or UX, does require research, including user research.

Taxonomy research is not as formalized and does not involve standard tools, as UX research does, but it is still important. There is not nearly as much published about taxonomy research as there is for UX research. However, certain research practices, I have found, are common in the taxonomy consulting industry. It’s a matter of best practices. Even when taxonomies are designed internally and not with an external taxonomy consultant’s assistance, research is still part of the process. The type of research may vary based on the background and experience of the person leading the effort.

Taxonomy design research includes:

Interviewing sample users and other stakeholders
Gathering input from brainstorming sessions
Analyzing content to be tagged
Analyzing existing vocabularies of all kinds
Analyzing any search log reports
Taxonomy testing

While UX research is a form of user research, taxonomy research involves both user research and content research (or content analysis), because a taxonomy needs to consider both user needs and content suitability.

Interviewing stakeholders

The primary method of gaining user input on a taxonomy is through interviews and questionnaires, ideally both in combination, where a conversation follows up on a list of questions sent to the person being interviewed. It’s important to ask different kinds of questions tailored to the different kinds of users, questions dealing with tagging vs. questions dealing with retrieval of content. The input gathered from users in these interviews and questionnaires can be used to better design the taxonomy and its user interface, to obtain use cases to later test the taxonomy, to identify possible facets for a faceted taxonomy, and also to collect some concepts for the taxonomy.

Brainstorming sessions

Another method of obtaining input from users is through a brainstorming session. This method is particularly useful for internal enterprise taxonomies. Representative users from different departments can contribute their ideas by suggesting sample terms, which are written down on a white board, flipchart, or sticky notes. Then, working with a facilitator, the brainstorming group can remove outliers, bring together synonyms and similar terms, and come up with categories or facets to group the terms. PoolParty is the only taxonomy management software that has an integrated brainstorming module called CardSorting.

Analyzing content

After determining the scope of content inclusion, content analysis should be performed on a representative sample of content of each of the different types and subject areas of content that will be tagged and retrieved, to identify topics and named entities relevant to the content. This form of content analysis is similar to indexing without a controlled vocabulary. The taxonomist assumes the role of an indexer or someone tagging the content and notes what index terms or tags would best describe the content.

Automatic term extraction involves using text analytics software (which may be incorporated into taxonomy management software, such as in PoolParty) to extract candidate taxonomy terms based on their frequency and relevancy within a body (corpus) of text content. The suggested terms need to be analyzed for the context of their usage before determining whether they should be added to the taxonomy.
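For a sense of what frequency-based extraction does, here is a deliberately simple Python sketch. It counts only single words and two-word phrases and uses a tiny hand-made stopword list, all of which are my own simplifying assumptions; real text analytics software also weighs relevancy and handles grammar, which this sketch does not.

```python
import re
from collections import Counter

# A tiny stopword list assumed for this sketch; real tools use far larger ones.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "is", "are", "on", "with"}


def candidate_terms(documents, min_count=2):
    """Return (term, count) pairs for words and two-word phrases seen at least min_count times."""
    counts = Counter()
    for doc in documents:
        words = re.findall(r"[a-z]+", doc.lower())
        for i, word in enumerate(words):
            if word in STOPWORDS:
                continue
            counts[word] += 1
            # Also count the two-word phrase starting here, if the next word
            # is not a stopword.
            if i + 1 < len(words) and words[i + 1] not in STOPWORDS:
                counts[word + " " + words[i + 1]] += 1
    return [(term, n) for term, n in counts.most_common() if n >= min_count]
```

The output is only a list of candidates. A phrase such as "talent management" may surface because it recurs across documents, but a taxonomist still has to check how it is used before adding it as a concept.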

Analyzing existing vocabularies

If an organization already has some controlled vocabularies (taxonomies, thesauri, term lists, terminologies, glossaries, etc.), whether currently in use or not, these should be analyzed as sources of terms for incorporation into the new taxonomy. Assuming the project is to create a new taxonomy, any existing controlled vocabularies may have been for a different purpose, so only some of the terms would be relevant. Glossaries tend to have too many detailed terms that are not needed for information retrieval, but these and any other vocabularies are good sources for synonyms/alternative labels.

Analyzing any search log reports

When creating or editing a taxonomy, it’s always useful to look at search logs, which indicate what users have been typing into the search box. Search log reports can be sorted by search string frequency, so that the most frequently used search strings are considered for inclusion into the taxonomy. The search strings should be edited to conform with taxonomy style and policy, but the exact search strings should be included as synonyms/alternative labels to support future searches.
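If the search tool only provides a raw export of search strings rather than a sorted report, a frequency report is easy to produce. Below is a minimal Python sketch; the function name and the normalization (lowercasing and collapsing whitespace) are my own assumptions about what counts as "the same" search string.

```python
from collections import Counter


def search_string_report(queries, top_n=10):
    """Return the top_n most frequent search strings with their counts."""
    # Normalize case and whitespace so that "PTO" and "pto " count together.
    normalized = (" ".join(q.lower().split()) for q in queries)
    return Counter(q for q in normalized if q).most_common(top_n)
```

For example, given the hypothetical log entries "PTO", "pto ", "vacation policy", "pto", "Vacation  Policy", and "401k", the report would list "pto" three times and "vacation policy" twice, which makes both good candidates for review as concepts or alternative labels.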

Taxonomy testing

Near the completion of a taxonomy project, there should be some activity of taxonomy testing. Taxonomy use testing should test a taxonomy’s suitability for tagging content by manually test-tagging sample documents and determining if the desired terms are available in the taxonomy. Taxonomy use testing should also test the retrieval capabilities of the taxonomy. This is done by attempting to retrieve pre-identified documents with searches conducted by sample users with the search terms of their choice.
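The test-tagging half of this can be tracked in something as simple as a spreadsheet, but as a sketch, here is how the gap check might look in Python. The function name and data layout are hypothetical: the inputs are the list of taxonomy labels (preferred and alternative) and, for each sample document, the terms a tester wanted to use.

```python
def find_gaps(taxonomy_labels, desired_tags_by_doc):
    """Return, per document, the desired tags that have no matching taxonomy label."""
    # Compare case-insensitively; alternative labels count as available.
    known = {label.casefold() for label in taxonomy_labels}
    gaps = {}
    for doc, desired in desired_tags_by_doc.items():
        missing = sorted(tag for tag in desired if tag.casefold() not in known)
        if missing:
            gaps[doc] = missing
    return gaps
```

Documents that come back with missing terms point to concepts or alternative labels that may need to be added before the taxonomy is released.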

Other tests on taxonomies, such as card sorting and A-B testing, which are also used in UX navigation testing, may be used in taxonomy development to test the preferences of the top two levels of a hierarchical taxonomy, but such tests are less suitable for multiple-level hierarchical taxonomies or for faceted taxonomies. More details are in my previous blog post on Testing Taxonomies.

Conclusions

Creating a taxonomy involves many research-related tasks, which can take up as much time or more than actually creating terms in a taxonomy. While there is a creative aspect to developing a taxonomy, a taxonomy also has to be based on research and analysis, with the emphasis on analysis. The research is more qualitative than quantitative, though.
