Thursday, February 20, 2025

Getting Work as a Taxonomist

Occasionally, people whom I don't know ask me for career advice in the field of taxonomies, but this is not easy to answer. For taxonomy work, career paths and prior experiences vary, employers span all industries and organization types, job titles and descriptions are not named consistently, and remote jobs are very competitive.

Two chapters in my book, The Accidental Taxonomist, 3rd ed., can help answer career questions: Chapter 2, “Who are Taxonomists,” and Chapter 13, “Taxonomy Work and Profession.” However, I have some additional thoughts, which I am sharing here.

Varied taxonomy career paths

When someone asks me for advice on getting into taxonomy work, especially based on my own experience, I am somewhat dismissive, since no one will repeat my career path. I got into controlled vocabulary/taxonomy management work starting out as an indexer using the controlled vocabularies at a periodical article publisher. Not only are such companies rare and the industry unusual, but there are now extremely few manual periodical/database indexers, since the task is increasingly done automatically (auto-tagging, auto-classification, text analytics, AI, etc.).

The following are some of the common paths towards taxonomy careers I have seen, and there are many others that are less common.

  • Library/information science > cataloging > metadata
  • Arts, photography, film, media > digital asset management > asset metadata
  • Technical writing > technical content management > content strategy
  • Marketing > web content management > content strategy
  • Languages > linguistics > natural language processing > auto-tagging
  • Languages > translation > terminology management
  • Business management > knowledge management

Of course, in any of the above career paths, one does not have to change careers to become a taxonomist but could merely add taxonomy tasks to an existing job or career. This is especially the case for the following career backgrounds, in which people may add taxonomy work/projects to an existing technical role:

  • Science/engineering > technical terminology and glossary management
  • Computer science/data science > ontologies
  • Information technology > content management system/SharePoint administration

Taxonomy job search challenges

It’s typical to search for taxonomy jobs on the major job search websites, such as LinkedIn and Indeed. But not all taxonomy jobs have “taxonomist” or “taxonomy” in the job title. They could instead have job titles for ontology/ontologist, information architecture/architect, metadata, content management/manager, data governance, etc. A search could then be on “taxonomy” in the job description rather than limited to the job title, but this results in many more irrelevant jobs that merely touch on taxonomies but don’t involve developing/managing taxonomies.

Taxonomist jobs are relatively rare compared to traditional jobs. Limiting a job search to a specific metropolitan area will yield few, if any, relevant results. The exceptions, where taxonomist jobs are more frequent, tend to be Seattle, the San Francisco Bay Area, Austin, New York, and Washington, DC. Taxonomist jobs in other countries exist but are less common than in the United States. Expanding a job search to all jobs mentioning “taxonomy” in the description, not just the job title, and expanding it to all of the United States will retrieve too many results, but this is a good approach to take in other countries. There is the added complication that “taxonomist” job searches can retrieve job postings for biologist-taxonomists.

Fortunately, many taxonomist jobs are remote. The downside, though, is that fully remote taxonomist job postings attract a high number of applicants, so the competition for such jobs is great. On LinkedIn, which indicates the number of people who click on the application link in a job post, remote taxonomist jobs have received over 100 applicant clicks in just a couple of days.

A significant number of taxonomist jobs are temporary contracts, which are filled through recruiting firms. This is an option for someone not currently employed, but obviously it's not a good idea to leave a permanent job for a temporary one.

Networking, which is always important for job searching, is especially valuable in the unusual field of taxonomies. Joining professional associations, attending conferences and meetups, developing a large network, and posting on LinkedIn are all recommended.

Taxonomy skills and skills acquisition

There is not a standard set of skills for a taxonomist, other than prior taxonomy experience. Positions may ask for additional skills in varied areas:  

  • experience with content management systems, digital asset management systems, or product information management systems
  • familiarity with AI, machine learning, natural language processing, auto-classification, etc.
  • experience working with large datasets
  • experience designing ontologies and working with knowledge graphs
  • technical skills in SPARQL, SQL, and Python

Furthermore, positions may also ask for experience with specific taxonomy management software or specific subject domain knowledge (e.g., finance or healthcare). As a result, it’s rare for one applicant to meet all the experience and skills required. Applicants understand this and may apply anyway.

Taxonomy jobs and the skills expected in such jobs vary. Thus, becoming a highly competent taxonomist generally requires experience from multiple different employers. I have learned a great deal having done different kinds of taxonomy work for different companies. It can be difficult to get the first taxonomist job, though. The best approach is to obtain taxonomy work, such as through a project, while in a role that is not a dedicated taxonomist position. A lot of taxonomy work is done as part of a job that has other duties.

However, a single taxonomy project as part of a job is often not enough experience to jump to a dedicated taxonomist position. Some training to round out one’s knowledge and to fill in the gaps is highly beneficial. In addition to the information in my book, The Accidental Taxonomist, I teach various taxonomy training workshops.

Coming up next, I will teach a full-day in-person workshop “Connecting Users to Content through Taxonomies: An Introduction to Taxonomy Design & Creation” on Tuesday, April 29, 2025, 9:00 am - 5:00 pm in Philadelphia, as a pre-conference workshop to the Information Architecture Conference (with separate registration, not requiring full conference attendance).


Wednesday, January 29, 2025

Talking about Taxonomies in India

I was thrilled to bring together my taxonomy profession and my passions for connecting with people and for international travel on my visit to India this month, my first time in this fascinating country.

I travel to speak about taxonomies at conferences and other events. I like to travel to meet colleagues in this specialized field, with whom I don’t have regular in-person interactions, and to see and learn about new places. For me, business travel is usually the primary purpose, and seeing new places (museums or a walking tour of parts of a city) is secondary. For January 2025, however, I decided to choose a new country destination, India, primarily as a tourist, and then to add on some professional events.

Why visit India

Heather Hedden at the Taj Mahal

India is now the most populous country in the world, and I have met many Indians living and working in the U.S. and in Europe, especially in technology roles, so I wanted to understand the country and culture better. India also has a long, rich history, impressive historical structures to visit, tasty food, and different religions and traditions to learn about.

I have many professional connections in India, especially through LinkedIn, more than any other country outside North America and Europe. A few are taxonomists, some have taken my course, some have bought my book, and many have a significant number of shared contacts in my field. I had also made contacts through conferences.

Finally, the use of the English language in professional activities makes it easier for me to participate in events in India, both giving presentations and listening to the presentations of others. That is not the case everywhere; I cannot simply give a presentation in English in any country.

Multiple presentations and meetings

Taxonomies are relevant to multiple disciplines: library and information science, content and document management, information architecture, knowledge management, and ontologies. To interact with professionals in these different fields, I had to arrange multiple presentations or meetups.

Library and information science students

I have occasionally been asked to give guest lectures on taxonomies to library/information science school classes. Close to two years ago, a graduate student of library and information science in Bengaluru (Bangalore), Soumyakanta Barik, who had read my book, asked if I would give a guest lecture (remote) to his class of master’s degree students, which I did. Afterwards, I informed Soumyakanta that I was thinking of coming to India, so perhaps I might present again in person. Even though Soumyakanta had since graduated, he facilitated the contacts to make such a lecture possible, so I gave an updated version of my prior presentation “Tidbits of Taxonomies.”

Heather Hedden with LIS master's degree students at the Documentation Research and Training Centre of the Indian Statistical Institute, Bangalore

It turned out that this school of library and information science, the Documentation Research and Training Centre at the Indian Statistical Institute, Bangalore, had been founded by Dr. S. R. Ranganathan, the developer of the first major faceted classification system in the world (whom I mention in my book and in a prior blog post on faceted classification) and the father of library science in India.

Taxonomists and ontologists

On LinkedIn, I had over 25 connections with the keyword “taxonomy” and 15 with “ontology” in their profiles located in Bengaluru, India, so I didn’t want to limit my presentation in that city to just current students. At my request, the Documentation Research and Training Centre organized a second presentation for me to give later the same day, open to the public. I presented on a slightly more advanced topic, “From Taxonomy to Ontology,” based on a recent presentation that I gave at the Henry Stewart Semantic Data conference. Although the day I chose to present turned out to be a (minor) holiday, I still had a good audience of close to 30 people.

Heather Hedden with Harish Betrabet and Dr. Sanju Tiwari in Noida

While I did not give that presentation again in Delhi, I did meet two ontologists two days later in the Delhi area (Noida): Dr. Sanju Tiwari, who had been involved in the Knowledge Graph Conference, and Harish Betrabet, an ontologist at Bechtel.

Knowledge managers

Taxonomy work often falls under knowledge management, especially in the area of consulting.

Heather Hedden with Soumyakanta Barik and Ved Prakash in Bengaluru

I had noticed that one of my prominent LinkedIn contacts in India (with over 140 shared connections) was a leading knowledge management professional, Ved Prakash. Ved met with me and Soumyakanta for lunch on my very first day in India. Ved and I have both been involved in Stan Garfield’s SIKM group of knowledge managers, and Ved invited me to join the KMGN (Knowledge Management Global Network) group on LinkedIn, which he leads. Knowledge management in India is more mature than the smaller field of taxonomies.

Academic librarians

Heather Hedden with Nabi Hasan and others at the Indian Institute of Technology, Delhi

I interact with librarians through my membership in the Special Libraries Association (SLA), which has an active Taxonomy Community. At last year's annual SLA conference at the University of Rhode Island, several academic librarians from India, who have been very involved in SLA, participated in the conference and also celebrated the 25th anniversary of the SLA Asia chapter with an event, which I attended. The director of the Central Library of the Indian Institute of Technology, Delhi, Nabi Hasan, invited me to give a presentation and then organized a full-day “International Workshop on Open Access Publishing” at IIT Delhi around my schedule. To tie taxonomies into the theme, I gave a new presentation, “Semantic Standards and Methods for Information Linking.” The audience was not familiar with Semantic Web technologies, so I was pleased to present something new to them, which I hope they will take advantage of.

Former SLA president Seema Rampersad (working at the British Library in London) introduced me, at my request, to another library science professor, at the University of Rajasthan in Jaipur, whom I met on short notice the evening I was visiting that city as a tourist, and we discussed the state of library/information science study.

Technical writers and content managers

Heather Hedden presenting at the STC India event in Bengaluru

With the growth of technology industries and applications of technology in other manufacturing sectors in India, there are now many technical writers along with content/document managers. The Society for Technical Communication (STC) (of which I had previously been a member) has an active chapter in India, so I contacted STC India about organizing a speaking event for me, and I was very pleased that the STC volunteers organized events in both Bengaluru and the greater Delhi area (Noida) to fit my schedule.

Heather Hedden and other speakers and organizers of the STC India event in Noida

Each event also included other speakers. I gave the presentation “Indexes, Search, and Taxonomies: Path to Findability,” which I had presented as an STC webinar in 2023 (not in a suitable time zone for India). Taxonomies and indexing are new concepts to many technical writers, whether in the U.S. or India. (My STC contact, Manisha Sardana, will be happy to arrange an event for other visitors to Delhi who want to give an educational presentation.)

Finally, I even met a freelance indexer, a member of the American Society for Indexing, another organization I have belonged to, who attended the STC event in Noida at my invitation.

Summary

I gave more presentations on this trip than I initially intended, partly because taxonomies cross over into multiple fields. As a result, I got to meet more people, build and strengthen relationships, and reflect more on the field and applications of taxonomies. The professional activities took three days, while sightseeing took 10 days of my two-week trip. I hope to add a professional speaking event to future international tourist trips, although I cannot imagine any other country besides India that would offer so many opportunities.

 

Thursday, December 19, 2024

Ontologies vs. Knowledge Graphs

At the Connected Data London (CDL) conference I attended last week, ontologies were humorously referred to as the “O” word. The thought was that, until recently, experts preferred not to mention “ontology,” lest they alienate their audience, customers, or stakeholders. The word comes across as too technical. It is a term from philosophy, after all, and it does not help that it sounds very similar to “oncology” (as “taxonomy” has been confused with “taxidermy”). The term “knowledge graph,” on the other hand, is more user-friendly, and even if it is not perfectly understood, its general meaning can be guessed. Thus, people would refer to knowledge graphs regardless of whether they meant a knowledge graph or an ontology.

At the conference, however, it was discussed that there is a growing acceptance of the word “ontology,” not just among experts but also among varied stakeholders who need to implement ontologies. This was noted by several conference speakers, especially in the wrap-up panel session for the Data Modeling track, which was titled “The ‘O’ Word: How Ontologies Drive Interoperable Data and Business Innovation.” The panel moderator, Katariina Kari, explained that this recent shift has happened because of LLMs: “We need a reliable natural language repository. LLMs works on a network of mimicking language, LLMs are primed for language.” So now, use of the word “ontology” can even help a startup get funding from venture capitalists, she observed.

However, there remains some confusion over what an ontology is. At one end there is the difference between ontologies and taxonomies, and at the other end the difference between ontologies and knowledge graphs. I clarified the distinction between taxonomies and ontologies in a prior blog post, “Taxonomies vs. Ontologies” (January 2023). While knowledge graphs are a relatively new concept, and ontologies have existed for much longer, it is the varied understanding of ontologies that has given rise to confusion.

An ontology is defined as a model of a domain of knowledge, which comprises classes (sets of things), attributes (types of characteristics of things), and relationships between classes. According to this definition, an ontology is a somewhat generic model of a domain, and it does not include all of the individual members or instances of each class (such as the names of individual companies in the class called Company) nor the specific values of each attribute type (such as the address of each specific company for the attribute type called Address).

However, the W3C recommendation for ontologies, OWL (Web Ontology Language), includes the designation “individuals,” and ontology software tools, such as Protégé, support the inclusion of individuals and their specific attributes. Thus, it is easy to think that an ontology, by definition, includes all specific individuals. But the fact that OWL specifies how to include instances of a class, and that software supports including them, does not necessarily mean that the instances or individuals are actually a component of an ontology. The ontology experts on this CDL conference panel confirmed that an ontology is the upper-level semantic model.

Then, what do we call an ontology plus all of the individual members (instances) of classes and their specific attributes? That is essentially what a knowledge graph is. This is especially true when individuals are specific to an organization or enterprise, such as names of individual customers, products, employees, etc., and we call that an “enterprise knowledge graph.”
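To make the distinction concrete, here is a minimal sketch in Python (standard library only), under the assumption that both the ontology and the instance data are held as simple subject-predicate-object triples. The class names, property names, and individuals (`Company`, `makes`, `AcmeCorp`, etc.) are hypothetical, and the `rdf:`/`rdfs:`/`owl:` prefixes are used only as plain string labels borrowed from the RDF and OWL vocabularies, not through an RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical ontology: the schema-level model of classes, attribute types,
# and relationships. It names no individual companies or products.
ONTOLOGY = {
    Triple("Company", "rdf:type", "owl:Class"),
    Triple("Product", "rdf:type", "owl:Class"),
    Triple("hasAddress", "rdf:type", "owl:DatatypeProperty"),  # attribute type
    Triple("hasAddress", "rdfs:domain", "Company"),
    Triple("makes", "rdf:type", "owl:ObjectProperty"),  # relationship
    Triple("makes", "rdfs:domain", "Company"),
    Triple("makes", "rdfs:range", "Product"),
}

# Hypothetical instance data: individuals and their specific attribute values.
INSTANCES = {
    Triple("AcmeCorp", "rdf:type", "Company"),
    Triple("AcmeCorp", "hasAddress", "1 Main Street"),
    Triple("Widget9", "rdf:type", "Product"),
    Triple("AcmeCorp", "makes", "Widget9"),
}


def build_knowledge_graph(ontology: set, instances: set) -> set:
    """In this sketch, a knowledge graph is the ontology plus its instances."""
    return ontology | instances


def individuals_of(graph: set, cls: str) -> list:
    """List the subjects typed as members of the given class."""
    return sorted(t.subject for t in graph if t.predicate == "rdf:type" and t.obj == cls)


KG = build_knowledge_graph(ONTOLOGY, INSTANCES)
```

Here `individuals_of(ONTOLOGY, "Company")` returns an empty list, while `individuals_of(KG, "Company")` returns `["AcmeCorp"]`: the ontology alone defines what a Company is, and only the knowledge graph says which companies exist. A real implementation would more likely use an RDF triple store queried with SPARQL.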

The first applications of ontologies in information/data science were in biomedicine, in which individuals included such things as names of organisms (including bacteria and viruses) and chemicals, etc. Thus, the notion of an individual in science is not quite the same as in business, which has also been a source of confusion over what an individual is and the inclusion of individuals in an ontology. In enterprise knowledge graphs, the instances can be very numerous and specific, including individual “events,” such as interactions or transactions.

In conclusion, an ontology is typically a defining feature and component of a knowledge graph, but it is not all of what goes into a knowledge graph. A knowledge graph also includes individuals, which may be named entity instances or they may be specific taxonomy concepts (abstract things that are not unique named entities, such as the concepts “Data ethics” or “Performance measurement”), and a knowledge graph also includes specific attributes of individuals. It may be said that a knowledge graph is the instantiation of an ontology, and an ontology is the knowledge model. Katariina further explained: “knowledge graphs that actually follow an ontology will have an LLM perform better than just a KG that is unharmonized, not yet adhering to a clear ontology.”
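The idea of a knowledge graph that “actually follows an ontology” can be illustrated with a small check. This is a hypothetical sketch in plain Python, not a feature of any particular tool: it assumes the ontology is reduced to a set of declared classes and a table of properties with a domain and range, and it reports instance statements that use undeclared classes or properties or that connect the wrong kinds of things. In practice, such validation would more typically be done with a standard such as SHACL.

```python
# Hypothetical ontology: declared classes, plus each property's domain (the class
# of its subject) and range (the class of its object, or None for a literal value).
CLASSES = {"Company", "Product", "Topic"}
PROPERTIES = {
    "hasAddress": ("Company", None),
    "makes": ("Company", "Product"),
    "about": ("Product", "Topic"),
}


def check_conformance(class_of: dict, statements: list) -> list:
    """Report where instance data departs from the ontology.

    class_of maps each individual to its class; statements are
    (subject, property, value) tuples about those individuals.
    """
    problems = []
    for individual, cls in class_of.items():
        if cls not in CLASSES:
            problems.append(f"{individual}: undeclared class {cls}")
    for subj, prop, value in statements:
        if prop not in PROPERTIES:
            problems.append(f"{subj}: undeclared property {prop}")
            continue
        domain, rng = PROPERTIES[prop]
        if class_of.get(subj) != domain:
            problems.append(f"{subj}: {prop} expects a subject of class {domain}")
        if rng is not None and class_of.get(value) != rng:
            problems.append(f"{value}: {prop} expects an object of class {rng}")
    return problems


# Individuals may be named entities or taxonomy concepts such as "Data ethics".
CLASS_OF = {"AcmeCorp": "Company", "Widget9": "Product", "Data ethics": "Topic"}
harmonized = [("AcmeCorp", "makes", "Widget9"), ("Widget9", "about", "Data ethics")]
unharmonized = [("AcmeCorp", "sells", "Widget9"), ("Widget9", "makes", "AcmeCorp")]
```

With these sample data, `check_conformance(CLASS_OF, harmonized)` returns an empty list, while the unharmonized statements are flagged for an undeclared property and for a relationship used in the wrong direction.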

Thursday, October 31, 2024

The Semantic Data Conference

I was honored to be accepted to speak at the first “Semantic Data” conference in New York, a one-day event held on October 23, following the inaugural event held in London on June 27. Semantic Data, organized by Henry Stewart (HS) Events, is co-located with its better-known DAM (Digital Asset Management) conference, which has been running for over 20 years in New York, London, and Los Angeles.

The full name of the conference was “Semantic Data: Taxonomy, Ontology, and Knowledge Graphs,” so the conference was less focused on data than on what you can do with data and content when combined with the semantics of taxonomies and ontologies. There was no presentation dedicated to knowledge graphs this time, given the limited number of sessions in the single-day, single-track event. Less of a focus on knowledge graphs was fine, since the Knowledge Graph Conference, held in New York in May, covers that topic very thoroughly over multiple days. The emphasis on “semantics,” though, is welcome, since there is no conference dedicated to that subject in the United States. (There is the SEMANTiCS conference in Europe, but it is semi-academic.)

 

Presentations at Semantic Data, New York

The topics of the sessions at “Semantic Data” included: securing buy-in for taxonomy and ontology strategy, why and how to connect taxonomies and ontologies, use of MS Copilot in taxonomy development, a use case of leveraging LLMs for content integration and a consumer-based semantic layer, and how to apply semantic models (taxonomies and ontologies) to reduce biases, especially in machine learning models. The opening keynote by Lulit Tesfaye was on realizing the semantic layer, and the closing keynote by Gary Carlson and Bram Wessel of the lead sponsor, Factor, was on building an organizational semantic mindset. Additional sponsored talks covered how ontologies accelerate innovation in the life sciences, as done by the sponsor SciBite, and how semantics enhances modern data platforms, such as those of the sponsor Datavid.

I presented “Taxonomies to Ontologies: How, When, and Why to Connect or Extend.” I summarized the benefits of taxonomies and ontologies, including what you could or could not do with each alone, and what you could do with both combined. The fact that both taxonomies and ontologies are now based on compatible Semantic Web standards, which are supported by many tools, makes it easy to combine or extend them. Whether you are “combining” a taxonomy with an ontology or “extending” a taxonomy into an ontology depends merely on your starting point and definition of ontology. Now that I am again vendor neutral, I included screenshots from four different commercial tools for combined taxonomy/ontology management.

About the Semantic Data Conference 2024

Semantic Data New York was similar to Semantic Data Europe (London) in its format and organization. Both provided a combination of session types: instructional talks, industry use cases, round table participant discussions, and thought leadership panels. Both events were chaired by Madi Weland Solomon and featured the same keynote presentation by Lulit Tesfaye on the subject of the semantic layer. The rest of the speakers differed between the two events, and each event had different sponsors, based on geographic location. While there were only three sponsors of Semantic Data in New York and only two in London, they shared the same exhibit hall with the main DAM (digital asset management) conference and thus reached a wider audience.

The London and New York events each had a similar number of registrants, about 50. Although the larger co-located DAM conference had separate registration, some DAM conference registrants were also seen in Semantic Data sessions. Registrants of Semantic Data represented diverse industries, including financial services, healthcare, software/technology, media, entertainment, publishing, travel and tourism, education, government, and consulting. Roles were also diverse, including company leadership, project and program managers, IT, and content/DAM/taxonomy/information architecture practitioner roles.

I find that the roles and activities of taxonomists, ontologists, information architects, digital asset managers, etc. overlap, so a conference dedicated to semantics brings them together for knowledge sharing. This way, their projects can also be broadened and shared within their organizations. I hope the Semantic Data conference can grow in the future to fill this need, and I look forward to next year.

Monday, September 30, 2024

Topical Taxonomies for Filtering Searches

PoolParty GraphSearch
We taxonomists have long been advocating how a taxonomy of disambiguated concepts tagged to content retrieves more accurate results than search algorithms alone. But if users prefer simply entering text strings into a search box and not browsing taxonomies, how best to support users with a taxonomy can be a challenge.

A faceted taxonomy, with taxonomy aspects as filters for refining search results, has become a common taxonomy solution, especially for intranets, partner portals, and knowledge bases. For these purposes, certain facets, such as Content type, Product/Service, Location, and Department, are common and logical. When it comes to designating “Topics,” however, it’s not so easy.

Specific Terms Gathered from Analysis

When gathering information and sources for terms, most sources will yield highly specific terms. These include terms arising from search log analysis, brainstorming sessions with sample users, automated text analytics term extraction from a large corpus of content, and manual review of a representative sample of documents/pages. These are all standard methods for taxonomy design, which I conduct as a consultant.

The difficulty is that there are often so many specific topics that the new topical taxonomy could potentially have many hundreds of terms. Some may be relevant to only one or two documents or occur in only a couple of searches out of thousands. Such terms would not serve the purpose of refining searches.
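As a rough illustration of the frequency problem, the following sketch counts normalized search strings from a search log and separates those that recur from those that appear only once or twice. The sample log and the `min_count` threshold are hypothetical; a realistic threshold would depend on the size of the log, and frequency alone does not say whether a term is topical.

```python
from collections import Counter


def split_by_frequency(search_log: list, min_count: int) -> tuple:
    """Split normalized search strings into frequent and rare groups."""
    # Normalize case and surrounding whitespace, and skip empty searches.
    counts = Counter(q.strip().lower() for q in search_log if q.strip())
    frequent = {term: n for term, n in counts.items() if n >= min_count}
    rare = {term: n for term, n in counts.items() if n < min_count}
    return frequent, rare


# Hypothetical search log entries.
SAMPLE_LOG = [
    "expense report", "Expense Report ", "travel policy", "expense report",
    "vpn", "VPN", "parking permit renewal", "",
]
frequent, rare = split_by_frequency(SAMPLE_LOG, min_count=2)
```

With this sample, the frequent group is `{"expense report": 3, "vpn": 2}`. Even so, "vpn" is a named tool rather than a topic, so a frequency count can narrow the candidates but cannot decide which ones belong in a topical facet.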

Another problem is that many of the terms suggested from these methods are not even topical. Often, the top searches found in search logs of enterprise/intranet searches are for commonly used named tools, platforms, or services.

The main issue, however, in deriving terms for a topical facet/filter based on search terms is that the objective of the topical facet, like all facets, is to limit searches, not to duplicate searches. What is really needed in the topical facet are topical categories that are broader than the search terms. How to identify these broader topical categories can be more challenging.
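One way to picture the shift from search terms to broader categories is as a roll-up: each specific term is assigned to a broader topical category, and only the categories become facet values. The mapping below is hypothetical and would need to come from stakeholder engagement or content analysis; the code only applies such a mapping and sets aside unmapped terms for a taxonomist to review.

```python
from collections import defaultdict

# Hypothetical mapping of specific search terms to broader topical categories.
BROADER = {
    "expense report": "Finance",
    "per diem rates": "Finance",
    "travel policy": "Travel",
    "flight booking": "Travel",
    "parental leave": "Benefits",
    "dental plan": "Benefits",
}


def roll_up(terms: list, broader: dict) -> tuple:
    """Group specific terms under broader categories.

    Returns (groups, unmapped): groups maps each category to its specific
    terms; unmapped lists terms with no category, for manual review.
    """
    groups = defaultdict(list)
    unmapped = []
    for term in terms:
        category = broader.get(term)
        if category is None:
            unmapped.append(term)
        else:
            groups[category].append(term)
    return dict(groups), unmapped


groups, unmapped = roll_up(
    ["expense report", "flight booking", "dental plan", "per diem rates", "vpn"], BROADER
)
```

Here five specific search terms become three facet values (Finance, Travel, Benefits), which limit a search rather than duplicate it, and "vpn" is left unmapped for review.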

Identifying Broader Topical Categories

Identifying broader terms or categories for topic filters is not as simple as identifying specific search terms, nor as straightforward as identifying the set of facets. Typical methods of obtaining candidate terms from both users and from the content need to be done, but with a focus on identifying broader terms or categories.

Categories from Stakeholder Engagement

Engaging stakeholders or other sample users in activities to brainstorm taxonomy terms will result in a mix of specific and broad terms. It is then the task of the taxonomist-facilitator to help guide the participants to identify which terms are broader and which are narrower within the same topical facet. Involving stakeholders/sample users is important, because if a single taxonomist or an external consulting team tries to do this on their own, their designated broader terms, while hierarchically correct, might not suit the intended users. The taxonomist-facilitator may suggest broader terms and then obtain immediate validation from the participants of the appropriateness of those suggestions.

Categories from Content Analysis

Analyzing content for broad topics is more effectively done manually than with automated methods. Manual content analysis will yield both specific and potentially broader concepts. A taxonomist or content strategist experienced in content analysis for identifying meaning will be able to determine the main concept for a piece of content.

Automated methods, based on text analytics technologies, tend to focus on term extraction and will extract terms that are even more specific and less useful than search log results. However, if a list of derived search terms is large enough (as many search logs and automated term extraction lists tend to be), another, newer option is to use LLM and generative AI technologies to categorize the specific terms and thus generate broader terms. The LLMs should be trained on the same or similar content, which is internal enterprise content, not the public web, to provide the correct context. Even then, the identified broader terms or categories will not always be correct and will require an experienced taxonomist to review.

Other Topical Facets

Topical terms, however, do not all have to be in a single “Topics” facet. Depending on the use case, there could be other topical facets, which are not the usual named entities, departments, locations, or product/service types. These could be for Function, Activity, Issue Type, Technology, Research Field/Discipline, etc. If and how to break out these facets can be a challenge and should involve extensive discussions or other research with stakeholders and user representatives.

Finally, a topical facet for filtering search results could even be based on the existing navigation menu’s top levels, especially on an intranet or an enterprise content management system. Facets as filters are available to refine searches only, but if users choose instead to navigate the site menu, then they have no options to use other facets/aspects to help restrict what they are looking for. By duplicating the navigation menu’s one or two top levels into a facet, perhaps called “Topic Area,” users can limit a search with the categories for the areas with which they are familiar, and they can also restrict the search further by filtering on terms selected from any of the other facets.
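The combination described above, a text search followed by filters from a “Topic Area” facet and other facets, can be sketched as follows. Everything here is hypothetical: the documents, the facet names and values, and the naive title match standing in for a real search engine. The filter logic assumes the common convention of matching any selected value within a facet and all selected facets together.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    title: str
    # Facet name -> set of taxonomy values tagged to this document.
    facets: dict = field(default_factory=dict)


def text_search(docs: list, query: str) -> list:
    """Naive case-insensitive title match, standing in for a real search engine."""
    q = query.lower()
    return [d for d in docs if q in d.title.lower()]


def apply_filters(results: list, selections: dict) -> list:
    """Keep documents matching every selected facet (OR within a facet, AND across facets)."""
    return [
        d for d in results
        if all(d.facets.get(name, set()) & values for name, values in selections.items())
    ]


# "Topic Area" values copy the top level of a hypothetical intranet menu.
DOCS = [
    Document("Travel expense policy", {"Topic Area": {"Finance"}, "Content type": {"Policy"}}),
    Document("Expense report template", {"Topic Area": {"Finance"}, "Content type": {"Template"}}),
    Document("Expense rules for EU offices",
             {"Topic Area": {"Finance"}, "Content type": {"Policy"}, "Location": {"EU"}}),
    Document("Onboarding expense FAQ", {"Topic Area": {"Human Resources"}, "Content type": {"Policy"}}),
]

results = text_search(DOCS, "expense")
filtered = apply_filters(results, {"Topic Area": {"Finance"}, "Content type": {"Policy"}})
```

The search for "expense" matches all four documents; filtering on the familiar menu area (Finance) together with a Content type (Policy) leaves two, and a further Location filter could narrow the results again.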

I will be discussing the wider activity of coming up with terms for a taxonomy in my upcoming Taxonomy Boot Camp presentation, “The Complete Guide to Sourcing Terms,” on November 18 in Washington, DC.