The Accidental Taxonomist

Monday, May 5, 2025

Taxonomies and Attribute Data

In the past (such as my 2021 blog post "Attributes in Taxonomies"), I have explained that “attributes” serve as filters to refine search results on content, results that have already been narrowed by a hierarchical taxonomy concept or category. As such, the attributes available for filtering can vary based on the taxonomy concept or category that has been selected. To the end user, high-level taxonomy facets and attributes both function similarly as filters, and the distinction between facets and attributes may not be apparent. If the distinction is not noticeable to end users, then facets and attributes may be confused. It’s best to describe attributes for what they are, and not merely by what they can do. That’s what this blog post aims to do.

Attributes

Data is information in the form of specific values that are relevant to something such as an asset, object, product, person, event, or transaction. Since data is relevant to something else, we can refer to data as an “attribute” of something. When attributes are standardized and used in information/data management, then attributes are metadata. Metadata schema are structures to organize data.

Examples of attribute metadata are:

for people: birth date, gender, occupation, nationality, phone number
for products: brand, price, color, size, SKU number
for documents: title, author, publication date, language, word count, publication status, file type

Almost all metadata, both descriptive and administrative, are attributes of something. (Only structural metadata, that which is used to mark up text, would not be an attribute.) Attributes, as metadata, can serve various purposes, including identification, comparison, sorting, filtering, and finding something based on its attributes.

Attribute values may be of different types: text, numbers, dates, or yes/no (also called “Boolean”). As text strings, attribute values may be uncontrolled free text or terms from a controlled list.

Taxonomies

Taxonomies are structures of concepts, which are used primarily for tagging and retrieval of content, although there are secondary uses. The concepts include subjects and named entities. In all cases, the concepts are of controlled vocabularies. The structures may be primarily hierarchical or primarily faceted, although a combination, such as limited hierarchies within a facet, is also possible. The structure of the taxonomy provides context for tagging and supports interaction by users.

When a taxonomy is structured into facets, typically each facet serves also as a metadata property. A hierarchical topical taxonomy can also provide values for a metadata property. Taxonomies are structures to organize controlled vocabulary concepts.

Examples of taxonomy facets include:

Topics
Activities
Industries
Product/service types
Brand names
Companies
Organizations
Names of people
Types of people/Roles
Events/Occasions

Thus, the types of things that are facets are usually not the same types of things that are considered attributes.

Metadata schema are structures to organize data, whereas taxonomies are structures to organize controlled vocabulary concepts that can populate metadata properties.

Where Attributes and Taxonomies Overlap

Considering again the examples of different types of attributes for different things, there are some attributes that could be managed in a “taxonomy” instead of merely as “attributes”:

For people: Name
For products: Product type/category
For documents: Subject/topic

Technically, each of these characteristics is also an attribute, but it is usually more practical to manage them as taxonomies so that they can support the implemented benefits of a taxonomy, such as semantic tagging, searching (including type-ahead search suggest), and browsing.

Thus, when we talk about “attributes” in the context of taxonomies, we mean those characteristics of something that are better managed as attributes and not managed as taxonomies. The decision is one of knowledge modeling.

For example, to support the refinement of searches, a taxonomy of expert people for an organization may have the following taxonomy facets:

Name
Subject of expertise
Organizational unit
Location

Then in addition to the facets, the taxonomy may have the following attributes associated with each record of a person:

Job title
Academic degree
Email address
Phone number
URL of headshot image

This is selected data of interest, but not values that are used in initial search or browsing for finding and retrieving content. Attributes are metadata, and taxonomy facets are also metadata, but that does not mean that they are the same, because different metadata can have different functions or purposes.
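The split between facets and attributes in the expert-finder example can be sketched in a few lines of Python. This is only an illustration under assumptions: the `ExpertRecord` class, the `filter_by_facets` function, and all sample names and values are hypothetical and not taken from any real system. Facet values come from controlled vocabularies and drive filtering, while attributes are simply carried on the record for display.

```python
from dataclasses import dataclass, field


@dataclass
class ExpertRecord:
    # Taxonomy facets: controlled-vocabulary values used for search and filtering.
    facets: dict
    # Attributes: data of interest shown on the record, not used for initial retrieval.
    attributes: dict = field(default_factory=dict)


def filter_by_facets(records, selections):
    """Return the records whose facet values include every selected value."""
    return [
        record
        for record in records
        if all(
            value in record.facets.get(facet, set())
            for facet, value in selections.items()
        )
    ]


# Hypothetical sample data, for illustration only.
experts = [
    ExpertRecord(
        facets={
            "Name": {"Ana Silva"},
            "Subject of expertise": {"Taxonomies", "Metadata"},
            "Organizational unit": {"Research"},
            "Location": {"Boston"},
        },
        attributes={
            "Job title": "Senior Analyst",
            "Email address": "ana.silva@example.com",
        },
    ),
    ExpertRecord(
        facets={
            "Name": {"Raj Mehta"},
            "Subject of expertise": {"Ontologies"},
            "Organizational unit": {"Engineering"},
            "Location": {"Boston"},
        },
        attributes={
            "Job title": "Ontologist",
            "Email address": "raj.mehta@example.com",
        },
    ),
]

# The user refines by facets; attributes are only read from the matching records.
matches = filter_by_facets(
    experts, {"Subject of expertise": "Taxonomies", "Location": "Boston"}
)
for match in matches:
    print(match.attributes["Job title"], match.attributes["Email address"])
```

Nothing in the sketch prevents an attribute such as job title from being promoted to a facet later. As noted above, that is a knowledge modeling decision.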

Ontologies: Bridging Taxonomies and Attributes

When we enrich a taxonomy with features of an ontology, not only can we add semantic relationships, but we can also add attributes to taxonomy concepts. Usually, when taxonomists first learn about ontologies, they think primarily of the addition of customized relationships between concepts, and they might not be aware of the importance of the addition of attributes.

In ontologies, semantic relationships are formally called “object properties,” and attributes are called “datatype properties.” Both are equally important. Meanwhile, the feature of “classes” in an ontology typically corresponds to taxonomy concept schemes or facets.

To add attributes to a taxonomy, the best way to do it is through adding an ontology, which can be very simple and not even include semantic relationships. As the availability of different attributes may vary based on a hierarchy branch of concepts, this can be managed by creating classes, which are assigned to hierarchical branches, facets, or concept schemes. Then, attributes (datatype properties) are applied and used with concepts based on the class the concept belongs to.
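As a rough sketch of this idea, the following Python models classes assigned to hierarchy branches and looks up which attributes a concept may carry based on its class. All names here (`CLASS_ATTRIBUTES`, `BRANCH_CLASS`, `BROADER`, and the sample concepts) are hypothetical, and a real implementation would normally rely on taxonomy/ontology management software rather than plain dictionaries.

```python
# Class -> allowed attributes (datatype properties) and the expected value type.
CLASS_ATTRIBUTES = {
    "Person": {"Job title": str, "Email address": str},
    "Product": {"Price": float, "SKU number": str},
}

# A class is assigned to the top concept of a hierarchical branch (or a facet).
BRANCH_CLASS = {"Experts": "Person", "Product types": "Product"}

# Concept -> its broader concept (None at the top of a branch).
BROADER = {
    "Experts": None,
    "Ana Silva": "Experts",
    "Product types": None,
    "Laptops": "Product types",
    "Ultrabooks": "Laptops",
}


def class_of(concept):
    """Walk up the hierarchy until a concept with an assigned class is found."""
    current = concept
    while current is not None:
        if current in BRANCH_CLASS:
            return BRANCH_CLASS[current]
        current = BROADER.get(current)
    return None


def allowed_attributes(concept):
    """Return the attributes available to a concept, based on its class."""
    return CLASS_ATTRIBUTES.get(class_of(concept), {})


def is_valid(concept, attribute, value):
    """Check that the attribute applies to the concept and the value has the right type."""
    expected_type = allowed_attributes(concept).get(attribute)
    return expected_type is not None and isinstance(value, expected_type)


print(class_of("Ultrabooks"))
print(sorted(allowed_attributes("Ana Silva")))
print(is_valid("Ultrabooks", "Price", 999.0))
print(is_valid("Ana Silva", "Price", 999.0))
```

The point of the design is that attributes are never attached concept by concept. A concept inherits its available attributes from the class of the branch it sits in.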

Conclusion

The following table summarizes the differences between taxonomy facets and attributes.

Taxonomy Facets	Attributes
Basic structure of many taxonomies	Additional data added to taxonomies
Controlled vocabularies	Controlled or uncontrolled terms, text, numbers, dates, Boolean options, etc.
Concepts as nouns or noun phrases	If text, any kind of text string
Top organizational level of a taxonomy	Values relevant to any taxonomy concept
Concept Schemes in SKOS, or Classes in an OWL ontology	Metadata on a concept, or datatype properties in an OWL ontology

Monday, March 31, 2025

Customizing Taxonomy Hierarchies

Taxonomies need to be custom-created for their purposes to be most effective. Basically, a taxonomy comprises the concepts or terms that reflect the subject domain of the content that will be tagged and retrieved with the aid of that taxonomy. Taxonomies must also be customized to the requirements (or limitations) of the implemented search technology and the user interface, and ideally the taxonomy is also customized to the needs and preferences of the users. This includes taxonomy design aspects of size, degree of detail, use of synonyms/variants, use of hierarchy, and implementation as facets.

Taxonomy customization usually focuses on the concepts/terms/labels and not so much on the exact hierarchy of grouping narrower concepts under broader concepts, other than perhaps limiting the number of hierarchical levels. While the selection and definition of concepts depends on the context of the content, the hierarchical relationships between concepts are typically independent of any specific content and are usually dependent only on the context of the taxonomy itself. Such a context-independent hierarchy is what enables a single taxonomy to be used for multiple different content items of different content creators. This is also the approach used in designing classification systems, which are intended for broad, generic use.

Why Customize Hierarchy

However, a customized taxonomy may be designed for a rather specific body of content, and then the hierarchy may depend on the context of that overall body of content, if not the specific content items. For example, the concept “Piano” is often considered narrower to “Musical instruments”, but in certain contexts it may be narrower to “Furniture,” such as for the contexts of interior design, furnishing a bar or restaurant, or for moving and storage services. Furthermore, I would not always recommend that “Piano” be narrower to both broader concepts in the same taxonomy (a taxonomy feature known as “polyhierarchy”), because the same taxonomy might not be used for both contexts. It depends.

When structuring a taxonomy hierarchy, the use and purpose of the hierarchy needs to be considered. A hierarchy is not created simply because it’s a taxonomy and thus traditionally has hierarchy. Possible uses of hierarchy include:

Supporting browsing and navigation to guide users to the desired concept.
Providing context for concepts to support tagging, whether manual or automated.
Enabling “recursive” or “rolled up” retrieval, so that a user’s selection of a concept retrieves not only what has been tagged to that concept but also what has been tagged to all of its narrower concepts, too.
Enabling expansion of a search, so that if there are too few or no results for a specific concept, the retrieval set can be expanded to content tagged with the broader concept and/or other narrower concepts of it.
Instructing users on the appropriate classification and organization of information.

Usually, the same hierarchy can support all of the above goals, although occasionally there are conflicting needs.
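Two of the uses listed above, rolled-up retrieval and search expansion, can be sketched in Python as follows. This is a minimal illustration and not how any particular search system implements them: the `NARROWER` and `TAGGED` structures, the top concept “Events,” the document IDs, and the `minimum` threshold are all assumptions made for the example.

```python
# Broader concept -> its narrower concepts (a small hypothetical hierarchy).
NARROWER = {
    "Events": ["Music events", "Training"],
    "Music events": ["Music rehearsal", "Music performance"],
    "Training": ["Technical training"],
}

# Concept -> IDs of content items tagged directly with that concept.
TAGGED = {
    "Music events": {"doc1"},
    "Music rehearsal": {"doc2"},
    "Music performance": {"doc3", "doc4"},
    "Technical training": {"doc5"},
}


def descendants(concept):
    """Collect all narrower concepts at every level below the given concept."""
    found = []
    stack = list(NARROWER.get(concept, []))
    while stack:
        current = stack.pop()
        found.append(current)
        stack.extend(NARROWER.get(current, []))
    return found


def rolled_up(concept):
    """Retrieve content tagged with the concept or with any of its narrower concepts."""
    items = set(TAGGED.get(concept, set()))
    for narrower in descendants(concept):
        items |= TAGGED.get(narrower, set())
    return items


def broader_of(concept):
    """Return the broader concept, or None for a top concept."""
    for parent, children in NARROWER.items():
        if concept in children:
            return parent
    return None


def expanded(concept, minimum=2):
    """If there are too few results, expand retrieval to the broader concept."""
    results = rolled_up(concept)
    parent = broader_of(concept)
    if len(results) < minimum and parent is not None:
        results = rolled_up(parent)
    return results


print(sorted(rolled_up("Music events")))
print(sorted(expanded("Music rehearsal")))
```

Both functions depend entirely on how the hierarchy is drawn: moving a concept to a different broader concept changes what a roll-up or an expansion returns.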

Customizing Hierarchy Example

The need for customizing hierarchy became especially clear to me in a recent taxonomy consulting project I did for the business of event venue space rentals. Types of spaces (structures, rooms, etc.) were grouped under broader concepts by their potential use, rather than by structural type. To a lesser extent, events or activities for spaces were also sometimes grouped by the type of space that might be suitable. For example, a generic taxonomy might include “Dance class” and “Technical training” both under the same broader concept for “Classes/training,” but because these different types of classes need different kinds of spaces, in this taxonomy they were put in different parts of the taxonomy hierarchy. “Dance class” was made narrower to “Dance event,” and “Technical training” was made narrower to “Training.”

The hierarchy of concepts used in a taxonomy to tag images may also be structured differently than a taxonomy for tagging text content. In this case, for example, broader concepts for grouping others had been created of “Small meeting” and “Large event,” which may not seem logically needed when the range in number of guests was an additional search attribute/filter. However, these concepts are quite useful for tagging images that may depict a small or large event but do not utilize counts of people. Another example is grouping the activities of music rehearsals/practices along with music performance events under the same broader concept of “Music events.” Although the activities of organizing rehearsals and organizing performances are quite different from each other, the venues that are suitable for each and their images are similar.

Despite their similarities in scope and concepts, a taxonomy for venue rentals should not be the same as a taxonomy for real estate of long-term lease or sale of properties (focusing on the space but agnostic to the use), nor for events management (focusing on the details of events and less so on space), nor equipment sales and rentals (focusing on the equipment and less on the use). Even when the concepts are the same, the hierarchy may differ. While the inclusion of concepts and their labels should consider the content, the design of the hierarchy should consider the taxonomy’s use.

Thursday, February 20, 2025

Getting Work as a Taxonomist

Occasionally, people whom I don't know ask me for career advice in the field of taxonomies, but this is not easy to answer. For taxonomy work, career paths and prior experiences vary, employers span all industries and organization types, job titles and descriptions are not named consistently, and remote jobs are very competitive.

Two chapters in my book, The Accidental Taxonomist, 3rd ed., can help answer career questions: Chapter 2 “Who are Taxonomists” and Chapter 13 “Taxonomy Work and Profession.” However, I have some additional thoughts, which I am sharing here.

Varied taxonomy career paths

When someone asks me for advice on getting into taxonomy work, especially based on my own experience, I am somewhat dismissive, since no one will repeat my career path. I got into controlled vocabulary/taxonomy management work starting out as an indexer using the controlled vocabularies at a periodical article publisher. Not only is such a company rare and the industry unusual, but now there are extremely few manual periodical/database indexers, since the task is increasingly done automatically (auto-tagging, auto-classification, text analytics, AI, etc.).

The following are some of the common paths towards taxonomy careers I have seen, and there are many others that are less common.

Library/information science > cataloging > metadata
Arts, photography, film, media > digital asset management > asset metadata
Technical writing > technical content management > content strategy
Marketing > web content management > content strategy
Languages > linguistics > natural language processing > auto-tagging
Languages > translation > terminology management
Business management > knowledge management

Of course, in any of the above career paths, one does not have to change careers to become a taxonomist but could merely add taxonomy tasks to an existing job or career. This is especially the case of the following career backgrounds, in which people may add taxonomy work/projects to an existing technical role:

Science/engineering > technical terminology and glossary management
Computer science/data science > ontologies
Information technology > content management system/SharePoint administration

Taxonomy job search challenges

It’s typical to search for taxonomy jobs on the major job search websites, such as LinkedIn and Indeed. But not all taxonomy jobs have “taxonomist” or “taxonomy” in the job title. They could have job titles instead for ontology/ontologist, information architecture/architect, metadata, content management/manager, data governance, etc. So, then a search could be on “taxonomy” in the job description rather than limited to the job title, but this results in many more irrelevant jobs that merely touch on taxonomies but don’t involve developing/managing taxonomies.

Taxonomist jobs are relatively rare compared to traditional jobs. Limiting a job search to a specific metropolitan area will yield few, if any, relevant results. The exceptions, where taxonomist jobs are more frequent, tend to be Seattle, the San Francisco Bay Area, Austin, New York, and Washington, DC. Taxonomist jobs in other countries exist but are less common than in the United States. Expanding a job search to all jobs mentioning “taxonomy” in the description, not just the job title, and expanding it to all of the United States will retrieve too many results, but this is a good approach to take in other countries. There is the added complication that “taxonomist” job searches can retrieve job postings for biologist-taxonomists.

Fortunately, many taxonomist jobs are remote. The downside to this, though, is that fully remote taxonomist job postings attract a high number of applicants, so the competition for such jobs is great. Where LinkedIn indicates the number of people who click on an application link on a job post, remote taxonomist jobs have received over 100 applicant clicks in just a couple of days.

A significant number of taxonomist jobs are temporary contracts, which are hired through recruiting firms. This is an option for someone not currently employed, but, obviously, it's not a good idea to leave a permanent job for a temporary one.

Networking, which is always important for job searching, is especially valuable in the unusual field of taxonomies. Joining professional associations, attending conferences and meetups, and developing a large network and posting on LinkedIn are all recommended.

Taxonomy skills and skills acquisition

There is not a standard set of skills for a taxonomist, other than prior taxonomy experience. Positions may ask for additional skills in varied areas:

experience with content management systems, digital asset management systems, or product information management systems
familiarity with AI, machine learning, natural language processing, auto-classification, etc.
experience working with large datasets
experience designing ontologies and working with knowledge graphs
technical skills with using SPARQL, SQL, and Python

Furthermore, positions may also ask for experience with specific taxonomy management software or specific subject domain knowledge (e.g. finance or healthcare). As a result, it’s rare for one applicant to meet all the experience and skills required. Applicants understand this and may apply anyway.

Taxonomy jobs and the skills expected in such jobs vary. Thus, to become a highly competent taxonomist generally requires experience from multiple different employers. I have learned a great deal having done different kinds of taxonomy work for different companies. It can be difficult to get the first taxonomist job, though. The best approach is to obtain taxonomy work, such as through a project, while in a role that is not a dedicated taxonomist. A lot of taxonomy work is done as part of a job that has other duties.

However, a single taxonomy project as part of a job is often not enough experience to jump to a dedicated taxonomist position. Some training to round out one’s knowledge and to fill in the gaps is highly beneficial. In addition to the information in my book, The Accidental Taxonomist, I teach various taxonomy training workshops.

Coming up next, I will teach a full-day in-person workshop “Connecting Users to Content through Taxonomies: An Introduction to Taxonomy Design & Creation” on Tuesday, April 29, 2025, 9:00 am - 5:00 pm in Philadelphia, as a pre-conference workshop to the Information Architecture Conference (with separate registration, not requiring full conference attendance).

Wednesday, January 29, 2025

Talking about Taxonomies in India

I was thrilled to bring together my passions of my taxonomy profession, connecting with people, and international travel on my visit to India this month, my first time to this fascinating country.

I travel to speak about taxonomies at conferences and other events. I like to travel: to meet colleagues in this specialized field, in which I don’t have regular in-person interactions, and to see and learn about new places. Usually for me business travel is the primary purpose and seeing new places (museums or a walking tour of parts of a city) is secondary. However, for January 2025, I decided to choose a new country destination, India, primarily as a tourist, and then to add on some professional events.

Why visit India

India is now the most populous country of the world, and I have met many Indians living and working in the U.S. and in Europe, especially in technology roles. So, I wanted to understand the country and culture better. India also has a long rich history and impressive historical structures to visit, tasty food, and different religions and traditions to learn about.

I have many professional connections in India, especially through LinkedIn, more than any other country outside North America and Europe. A few are taxonomists, some have taken my course, some have bought my book, and many have a significant number of shared contacts in my field. I had also made contacts through conferences.

Finally, the use of the English language in professional activities makes it easier for me to participate in events in India: giving presentations and listening to the presentations of others. I cannot simply give a presentation in English in any country.

Multiple presentations and meetings

Taxonomies are relevant to multiple disciplines: library and information science, content and document management, information architecture, knowledge management, and ontologies. To interact with professionals in these different fields, I had to arrange multiple presentations or meetups.

Library and information science students

I have occasionally been asked to give guest lectures on taxonomies to library/information science school classes. Close to two years ago, a graduate student of library and information science in Bengaluru (Bangalore), Soumyakanta Barik, who had read my book, asked if I would give a guest lecture (remote) to his class of master’s degree students, which I did. Afterwards, I informed Soumyakanta that I was thinking of coming to India, so perhaps I might present again in person. Even though Soumyakanta had since graduated, he facilitated the contacts to make such a lecture possible, so I gave an update of my prior presentation “Tidbits of Taxonomies.”

Heather Hedden with LIS master's degree students at the Documentation Research and Training Centre of the Indian Statistical Institute, Bangalore

It turned out that this school of library and information science, the Documentation Research and Training Centre at the Indian Statistical Institute, Bangalore, had been founded by Dr. S. R. Ranganathan, the developer of the first major faceted classification system in the world (whom I mention in my book and in a prior blog post on faceted classification) and the father of library science in India.

Taxonomists and ontologists

On LinkedIn, I had over 25 connections with the keyword “taxonomy” and 15 with “ontology” in their profiles located in Bengaluru, India, so I didn’t want to limit my presentation in that city to just current students. At my request, the Documentation Research and Training Centre organized a second presentation for me to give later the same day to be open to the public. I presented on a slightly more advanced topic, “From Taxonomy to Ontology,” based on a recent presentation that I gave at the Henry Stewart Semantic Data conference. Although the day I chose to present turned out to be a (minor) holiday, I still had a good audience of close to 30 people.

While I did not give that presentation again in Delhi, I did meet two ontologists two days later in the Delhi area (Noida), Dr. Sanju Tiwari, who had been involved in the Knowledge Graph Conference, and Harish Betrabet, an ontologist at Bechtel.

Knowledge managers

Taxonomy work often falls under knowledge management, especially in the area of consulting. I had noticed that one of my prominent LinkedIn contacts in India (with over 140 shared connections) was a leading knowledge management professional, Ved Prakash. Ved met with me and Soumyakanta for lunch my very first day in India. Ved and I have both been involved in Stan Garfield’s SIKM group of knowledge managers, and Ved invited me to join the KMGN (Knowledge Management Global Network) group on LinkedIn, which he leads. Knowledge management in India is more mature than the smaller field of taxonomies.

Academic librarians

Heather Hedden with Nabi Hasan and others at the Indian Institute of Technology, Delhi

I interact with librarians through my membership in the Special Libraries Association (SLA), which has an active Taxonomy Community. At last year's annual SLA conference at the University of Rhode Island, several academic librarians from India, who have been very involved in SLA, participated in the conference and also celebrated the 25th anniversary of the SLA Asia chapter with an event, which I attended. The director of the Central Library of the Indian Institute of Technology, Delhi, Nabi Hasan, invited me to give a presentation, and then organized a full-day “International Workshop on Open Accessing Publishing” at IIT Delhi around my schedule. To tie taxonomies into the theme, I gave a new presentation “Semantic Standards and Methods for Information Linking.” The audience was not familiar with Semantic Web technologies, so I was pleased to present something new to them, which I hope they will take advantage of.

Former SLA president Seema Rampersad (working at the British Library in London) introduced me, at my request, to another library science professor at the University of Rajasthan in Jaipur, with whom I met on short notice the evening I was visiting that city as a tourist, and we discussed the state of library/information science study.

Technical writers and content managers

Heather Hedden presenting at the STC India event in Bengaluru

With the growth of technology industries and applications of technology in other manufacturing sectors in India, there are now many technical writers along with content/document managers. The Society for Technical Communication (STC) (of which I had previously been a member) has an active chapter in India, so I contacted STC India about organizing a speaking event for me, and I was very pleased that the STC volunteers organized events in both Bengaluru and the greater Delhi area (Noida) to fit my schedule.

Heather Hedden and other speakers and organizers of the STC India event in Noida

The events also each included additional different speakers. I gave the presentation “Indexes, Search, and Taxonomies: Path to Findability,” which I had presented as an STC webinar (not in a suitable time zone for India) in 2023. Taxonomies and indexing are new concepts to many technical writers, whether in the U.S. or India. (My STC contact, Manisha Sardana, will be happy to arrange an event for other visitors to Delhi who want to give an educational presentation.)

Finally, I even met a freelance indexer, a member of the American Society for Indexing, another organization I have belonged to, who attended the STC event in Noida at my invitation.

Summary

I gave more presentations than I initially intended on this trip, but that is partly due to the fact that taxonomies cross over into multiple fields. I then got to meet more people, build and strengthen relationships, and reflect on the field and applications of taxonomies more. The professional activities took three days, while sightseeing took 10 days of my two-week trip. I hope to add on a professional speaking event on future international tourist trips, although I cannot imagine any other country besides India that would offer so many opportunities.

Thursday, December 19, 2024

Ontologies vs. Knowledge Graphs

At the Connected Data London (CDL) conference I attended last week, ontologies were humorously referred to as the “O” word. The thought was that, until recently, experts preferred not to mention “ontology,” lest they alienate their audience, customers, or stakeholders. The word comes across as too technical. It is a term from philosophy, after all, and it does not help that it sounds very similar to “oncology” (as “taxonomy” has been confused with “taxidermy”). The term “knowledge graph,” on the other hand, is more user friendly, and even if it is not perfectly understood, its general meaning can be guessed. Thus, people would refer to knowledge graphs regardless of whether they meant a knowledge graph or an ontology.

At the conference, however, it was discussed that there is a growing acceptance of the word “ontology,” not just among experts but also among varied stakeholders who need to implement them. This was noted by several conference speakers, especially in the wrap-up panel session for the Data Modeling track, which was titled “The ‘O’ Word: How Ontologies Drive Interoperable Data and Business Innovation.” The panel moderator Katariina Kari explained that this recent shift has happened because of LLMs, explaining: “We need a reliable natural language repository. LLMs works on a network of mimicking language, LLMs are primed for language.” So, now use of the word ontology can even help a startup get funding from venture capitalists, she observed.

However, there remains some confusion over what an ontology is. At one end there is the difference between ontologies and taxonomies, and at the other end the difference between ontologies and knowledge graphs. I clarified the distinction between taxonomies and ontologies in a prior blog post, “Taxonomies vs. Ontologies” (January 2023). While knowledge graphs are a relatively new concept, and ontologies have existed for much longer, it is the varied understanding of ontologies that has given rise to confusion.

An ontology is defined as a model of a domain of knowledge, which comprises classes (sets of things), attributes (types of characteristics of things) and relationships between classes. According to this definition, an ontology is a somewhat generic model of a domain, and it does not include all of the individual members or instances of each class (such as the names of individual companies in the class called Company) nor the specific attributes of each attribute type (such as the address of each specific company for the attribute type called Address).

However, the W3C recommendation for ontologies, OWL (Web Ontology Language) includes the designation “individuals,” and ontology software tools, such as Protégé, support the inclusion of individuals and their specific attributes. Thus, it is easy to think that an ontology, by definition, includes all specific individuals. But just because OWL covers the recommendation for how to include instances of a class, and software supports the inclusion of instances of classes does not necessarily mean that the instances or individuals are actually a component of an ontology. The ontology experts on this CDL conference panel confirmed that an ontology is the upper-level semantic model.

Then, what do we call an ontology plus all of the individual members (instances) of classes and their specific attributes? That is essentially what a knowledge graph is. This is especially true when individuals are specific to an organization or enterprise, such as names of individual customers, products, employees, etc., and we call that an “enterprise knowledge graph.”
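To make the distinction concrete, here is a minimal Python sketch in which the ontology holds only the model (classes, attribute types, and relationships between classes), and the knowledge graph adds individuals and their specific attribute values while conforming to that model. The `Ontology` and `KnowledgeGraph` classes, the `works_for` relationship, and the sample individuals are hypothetical illustrations, not OWL or the API of any ontology tool.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """The knowledge model: no individuals, only classes, attribute types, relationships."""
    classes: set
    attribute_types: dict  # class name -> set of attribute type names
    relationships: set     # (subject class, relationship name, object class)


@dataclass
class KnowledgeGraph:
    """An ontology plus individuals, their attribute values, and links between them."""
    ontology: Ontology
    individuals: dict = field(default_factory=dict)  # name -> (class, attribute values)
    links: set = field(default_factory=set)          # (subject, relationship, object)

    def add_individual(self, name, cls, **attributes):
        if cls not in self.ontology.classes:
            raise ValueError(f"Unknown class: {cls}")
        unknown = set(attributes) - self.ontology.attribute_types.get(cls, set())
        if unknown:
            raise ValueError(f"Attributes not defined for {cls}: {sorted(unknown)}")
        self.individuals[name] = (cls, attributes)

    def link(self, subject, relationship, obj):
        subject_class = self.individuals[subject][0]
        object_class = self.individuals[obj][0]
        if (subject_class, relationship, object_class) not in self.ontology.relationships:
            raise ValueError(f"Relationship not allowed by the ontology: {relationship}")
        self.links.add((subject, relationship, obj))


# The ontology alone: a generic model of the domain.
model = Ontology(
    classes={"Company", "Person"},
    attribute_types={"Company": {"address"}, "Person": {"email"}},
    relationships={("Person", "works_for", "Company")},
)

# The knowledge graph: the same model, instantiated with (hypothetical) individuals.
kg = KnowledgeGraph(ontology=model)
kg.add_individual("Acme Ltd", "Company", address="1 Main Street")
kg.add_individual("Jo Doe", "Person", email="jo.doe@example.com")
kg.link("Jo Doe", "works_for", "Acme Ltd")
print(len(kg.individuals), len(kg.links))
```

The `Ontology` object stays the same size no matter how many companies or people are added; only the knowledge graph grows.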

The first applications of ontologies in information/data science were in biomedicine, in which individuals included such things as names of organisms (including bacteria and viruses) and chemicals, etc. Thus, the notion of an individual in science is not quite the same as in business, which has also been a source of confusion over what an individual is and the inclusion of individuals in an ontology. In enterprise knowledge graphs, the instances can be very numerous and specific, including individual “events,” such as interactions or transactions.

In conclusion, an ontology is typically a defining feature and component of a knowledge graph, but it is not all of what goes into a knowledge graph. A knowledge graph also includes individuals, which may be named entity instances or they may be specific taxonomy concepts (abstract things that are not unique named entities, such as the concepts “Data ethics” or “Performance measurement”), and a knowledge graph also includes specific attributes of individuals. It may be said that a knowledge graph is the instantiation of an ontology, and an ontology is the knowledge model. Katariina further explained: “knowledge graphs that actually follow an ontology will have an LLM perform better than just a KG that is unharmonized, not yet adhering to a clear ontology.”