Wednesday, January 29, 2025

Talking about Taxonomies in India

On my visit to India this month, my first time in this fascinating country, I was thrilled to bring together three of my passions: my taxonomy profession, connecting with people, and international travel.

I travel to speak about taxonomies at conferences and other events. I like to travel, both to meet colleagues in this specialized field, with whom I don’t have regular in-person interactions, and to see and learn about new places. Usually, business is the primary purpose of my travel, and seeing new places (museums or a walking tour of parts of a city) is secondary. For January 2025, however, I chose a new country, India, to visit primarily as a tourist, and then added some professional events.

Why visit India

Heather Hedden at the Taj Mahal

India is now the most populous country in the world, and I have met many Indians living and working in the U.S. and in Europe, especially in technology roles, so I wanted to understand the country and culture better. India also has a long, rich history, impressive historical structures to visit, tasty food, and different religions and traditions to learn about.

I have many professional connections in India, especially through LinkedIn, more than any other country outside North America and Europe. A few are taxonomists, some have taken my course, some have bought my book, and many have a significant number of shared contacts in my field. I had also made contacts through conferences.

Finally, the use of English in professional activities makes it easier for me to participate in events in India, both giving presentations and listening to the presentations of others. I cannot simply give a presentation in English in just any country.

Multiple presentations and meetings

Taxonomies are relevant to multiple disciplines: library and information science, content and document management, information architecture, knowledge management, and ontologies. To interact with professionals in these different fields, I had to arrange multiple presentations or meetups.

Library and information science students

I have occasionally been asked to give guest lectures on taxonomies to library/information science school classes. Close to two years ago, a graduate student of library and information science in Bengaluru (Bangalore), Soumyakanta Barik, who had read my book, asked if I would give a guest lecture (remote) to his class of master’s degree students, which I did. Afterwards, I informed Soumyakanta that I was thinking of coming to India, so perhaps I might present again in person. Even though Soumyakanta had since graduated, he facilitated the contacts to make such a lecture possible, so I gave an updated version of my prior presentation, “Tidbits of Taxonomies.”

Heather Hedden with LIS master's degree students at the Documentation Research and Training Centre of the Indian Statistical Institute, Bangalore

It turned out that this school of library and information science, the Documentation Research and Training Centre at the Indian Statistical Institute, Bangalore, had been founded by Dr. S. R. Ranganathan, the developer of the first major faceted classification system in the world (whom I mention in my book and in a prior blog post on faceted classification) and the father of library science in India.

Taxonomists and ontologists

On LinkedIn, I had over 25 connections located in Bengaluru, India, with the keyword “taxonomy” in their profiles and 15 with “ontology,” so I didn’t want to limit my presentation in that city to just current students. At my request, the Documentation Research and Training Centre organized a second presentation, open to the public, for me to give later the same day. I presented on a slightly more advanced topic, “From Taxonomy to Ontology,” based on a recent presentation that I gave at the Henry Stewart Semantic Data conference. Although the day I chose to present turned out to be a (minor) holiday, I still had a good audience of close to 30 people.

Heather Hedden with Harish Betrabet and Dr. Sanju Tiwari in Noida

While I did not give that presentation again in Delhi, two days later I did meet two ontologists in the Delhi area (Noida): Dr. Sanju Tiwari, who had been involved in the Knowledge Graph Conference, and Harish Betrabet, an ontologist at Bechtel.

Knowledge managers

Taxonomy work often falls under knowledge management, especially in the area of consulting.

Heather Hedden with Soumyakanta Barik and Ved Prakash in Bengaluru

I had noticed that one of my prominent LinkedIn contacts in India (with over 140 shared connections) was a leading knowledge management professional, Ved Prakash. Ved met Soumyakanta and me for lunch on my very first day in India. Ved and I have both been involved in Stan Garfield’s SIKM group of knowledge managers, and Ved invited me to join the KMGN (Knowledge Management Global Network) group on LinkedIn, which he leads. Knowledge management in India is more mature than the smaller field of taxonomies.

Academic librarians

Heather Hedden with Nabi Hasan and others at the Indian Institute of Technology, Delhi

I interact with librarians through my membership in the Special Libraries Association (SLA), which has an active Taxonomy Community. At last year's annual SLA conference at the University of Rhode Island, several academic librarians from India, who have been very involved in SLA, participated in the conference and also celebrated the 25th anniversary of the SLA Asia chapter with an event, which I attended. The director of the Central Library of the Indian Institute of Technology, Delhi, Nabi Hasan, invited me to give a presentation and then organized a full-day “International Workshop on Open Access Publishing” at IIT Delhi around my schedule. To tie taxonomies into the theme, I gave a new presentation, “Semantic Standards and Methods for Information Linking.” The audience was not familiar with Semantic Web technologies, so I was pleased to present something new to them, which I hope they will take advantage of.

Former SLA president Seema Rampersad (working at the British Library in London) introduced me, at my request, to a library science professor at the University of Rajasthan in Jaipur. We met on short notice on the evening I was visiting that city as a tourist and discussed the state of library/information science study.

Technical writers and content managers

Heather Hedden presenting at the STC India event in Bengaluru

With the growth of technology industries and applications of technology in other manufacturing sectors in India, there are now many technical writers along with content/document managers. The Society for Technical Communication (STC) (of which I had previously been a member) has an active chapter in India, so I contacted STC India about organizing a speaking event for me, and I was very pleased that the STC volunteers organized events in both Bengaluru and the greater Delhi area (Noida) to fit my schedule.

Heather Hedden and other speakers and organizers of the STC India event in Noida
Each event also included other speakers. I gave the presentation “Indexes, Search, and Taxonomies: Path to Findability,” which I had presented as an STC webinar in 2023 (at a time not suitable for India’s time zone). Taxonomies and indexing are new concepts to many technical writers, whether in the U.S. or India. (My STC contact, Manisha Sardana, will be happy to arrange an event for other visitors to Delhi who want to give an educational presentation.)

Finally, I even met a freelance indexer, a member of the American Society for Indexing, another organization I have belonged to, who attended the STC event in Noida at my invitation.

Summary

I gave more presentations on this trip than I initially intended, partly because taxonomies cross over into multiple fields. As a result, I got to meet more people, build and strengthen relationships, and reflect more on the field and applications of taxonomies. The professional activities took three days, while sightseeing took 10 days of my two-week trip. I hope to add a professional speaking event to future international tourist trips, although I cannot imagine any other country besides India that would offer so many opportunities.

 

Thursday, December 19, 2024

Ontologies vs. Knowledge Graphs

At the Connected Data London (CDL) conference I attended last week, ontologies were humorously referred to as the “O” word. The thought was that, until recently, experts preferred not to mention “ontology,” lest they alienate their audience, customers, or stakeholders. The word comes across as too technical. It is a term from philosophy, after all, and it does not help that it sounds very similar to “oncology” (as “taxonomy” has been confused with “taxidermy”). The term “knowledge graph,” on the other hand, is more user friendly, and even if it is not perfectly understood, its general meaning can be guessed. Thus, people would refer to knowledge graphs regardless of whether they meant a knowledge graph or an ontology.

At the conference, however, it was discussed that there is a growing acceptance of the word “ontology,” not just among experts but also among varied stakeholders who need to implement ontologies. This was noted by several conference speakers, especially in the wrap-up panel session for the Data Modeling track, which was titled “The ‘O’ Word: How Ontologies Drive Interoperable Data and Business Innovation.” The panel moderator, Katariina Kari, attributed this recent shift to LLMs, explaining: “We need a reliable natural language repository. LLMs works on a network of mimicking language, LLMs are primed for language.” So now, she observed, use of the word “ontology” can even help a startup get funding from venture capitalists.

However, there remains some confusion over what an ontology is. At one end there is the difference between ontologies and taxonomies, and at the other end the difference between ontologies and knowledge graphs. I clarified the distinction between taxonomies and ontologies in a prior blog post, “Taxonomies vs. Ontologies” (January 2023). While knowledge graphs are a relatively new concept, and ontologies have existed for much longer, it is the varied understanding of ontologies that has given rise to confusion.

An ontology is defined as a model of a domain of knowledge, which comprises classes (sets of things), attributes (types of characteristics of things) and relationships between classes. According to this definition, an ontology is a somewhat generic model of a domain, and it does not include all of the individual members or instances of each class (such as the names of individual companies in the class called Company) nor the specific attributes of each attribute type (such as the address of each specific company for the attribute type called Address).
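To make this definition concrete, here is a minimal Python sketch of an ontology as a model only: classes, attribute types, and relationships between classes, with no individuals and no attribute values. The class names (Company, City, Product), the Address attribute, and the relationship names are hypothetical examples, and a real ontology would normally be expressed in OWL rather than in Python structures.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Ontology:
    """A domain model: classes, attribute types, and relationships between classes.

    It deliberately holds no individuals (instances) and no attribute values.
    """
    classes: set[str] = field(default_factory=set)
    # attribute type -> the class it describes
    attributes: dict[str, str] = field(default_factory=dict)
    # relationship name -> (domain class, range class)
    relationships: dict[str, tuple[str, str]] = field(default_factory=dict)

    def add_class(self, name: str) -> None:
        self.classes.add(name)

    def add_attribute(self, name: str, on_class: str) -> None:
        if on_class not in self.classes:
            raise ValueError(f"unknown class: {on_class}")
        self.attributes[name] = on_class

    def add_relationship(self, name: str, domain: str, range_: str) -> None:
        # Both ends of a relationship must be classes already in the model
        for cls in (domain, range_):
            if cls not in self.classes:
                raise ValueError(f"unknown class: {cls}")
        self.relationships[name] = (domain, range_)


# Hypothetical domain model: no specific companies, cities, or addresses
onto = Ontology()
for cls in ("Company", "City", "Product"):
    onto.add_class(cls)
onto.add_attribute("Address", on_class="Company")
onto.add_relationship("is located in", domain="Company", range_="City")
onto.add_relationship("sells product", domain="Company", range_="Product")
```

Note that the model says a Company can have an Address and can be located in a City, but it names no particular company or address.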

However, the W3C recommendation for ontologies, OWL (Web Ontology Language), includes the designation “individuals,” and ontology software tools, such as Protégé, support the inclusion of individuals and their specific attributes. Thus, it is easy to think that an ontology, by definition, includes all specific individuals. But the fact that OWL specifies how to include instances of a class, and that software supports them, does not necessarily mean that instances or individuals are actually a component of an ontology. The ontology experts on this CDL conference panel confirmed that an ontology is the upper-level semantic model.

Then, what do we call an ontology plus all of the individual members (instances) of classes and their specific attributes? That is essentially what a knowledge graph is. This is especially true when individuals are specific to an organization or enterprise, such as names of individual customers, products, employees, etc., and we call that an “enterprise knowledge graph.”
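As a self-contained illustration (all names hypothetical), a knowledge graph can be sketched as individuals, their specific attribute values, and links between them, each checked against the ontology it instantiates. This is a simplified sketch of the concept, not how any particular graph database works.

```python
from __future__ import annotations

# Hypothetical ontology: the model only (classes, attribute types, relationships)
ONTOLOGY = {
    "classes": {"Company", "City"},
    "attributes": {"Address": "Company"},                      # attribute type -> class
    "relationships": {"is located in": ("Company", "City")},   # name -> (domain, range)
}


class KnowledgeGraph:
    """Instantiates an ontology: individuals, their specific attribute values, and links."""

    def __init__(self, ontology: dict) -> None:
        self.ontology = ontology
        self.types: dict[str, str] = {}                 # individual -> class
        self.values: dict[tuple[str, str], str] = {}    # (individual, attribute) -> value
        self.links: set[tuple[str, str, str]] = set()   # (subject, relationship, object)

    def add_individual(self, name: str, cls: str) -> None:
        if cls not in self.ontology["classes"]:
            raise ValueError(f"{cls} is not a class in the ontology")
        self.types[name] = cls

    def set_attribute(self, name: str, attribute: str, value: str) -> None:
        # The attribute type must exist and apply to the individual's class
        expected = self.ontology["attributes"].get(attribute)
        if expected is None or self.types.get(name) != expected:
            raise ValueError(f"{attribute} does not apply to {name}")
        self.values[(name, attribute)] = value

    def link(self, subj: str, rel: str, obj: str) -> None:
        if rel not in self.ontology["relationships"]:
            raise ValueError(f"unknown relationship: {rel}")
        domain, range_ = self.ontology["relationships"][rel]
        if self.types.get(subj) != domain or self.types.get(obj) != range_:
            raise ValueError(f"'{subj} {rel} {obj}' does not fit the ontology")
        self.links.add((subj, rel, obj))


# Hypothetical individuals and specific attribute values
kg = KnowledgeGraph(ONTOLOGY)
kg.add_individual("Acme Corp", "Company")
kg.add_individual("Boston", "City")
kg.set_attribute("Acme Corp", "Address", "1 Main St")
kg.link("Acme Corp", "is located in", "Boston")
```

Reversing the link (“Boston is located in Acme Corp”) raises an error, because the ontology says only a Company can be located in a City.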

The first applications of ontologies in information/data science were in biomedicine, in which individuals included such things as names of organisms (including bacteria and viruses) and chemicals. Thus, the notion of an individual in science is not quite the same as in business, which has also been a source of confusion over what an individual is and the inclusion of individuals in an ontology. In enterprise knowledge graphs, the instances can be very numerous and specific, including individual “events,” such as interactions or transactions.

In conclusion, an ontology is typically a defining feature and component of a knowledge graph, but it is not all of what goes into a knowledge graph. A knowledge graph also includes individuals, which may be named entity instances or they may be specific taxonomy concepts (abstract things that are not unique named entities, such as the concepts “Data ethics” or “Performance measurement”), and a knowledge graph also includes specific attributes of individuals. It may be said that a knowledge graph is the instantiation of an ontology, and an ontology is the knowledge model. Katariina further explained: “knowledge graphs that actually follow an ontology will have an LLM perform better than just a KG that is unharmonized, not yet adhering to a clear ontology.”

Thursday, October 31, 2024

The Semantic Data Conference

I was honored to be accepted to speak at the first “Semantic Data” conference in New York, a one-day event held on October 23, following the inaugural event held in London on June 27. Semantic Data, organized by Henry Stewart (HS) Events, is co-located with its better-known DAM (Digital Asset Management) conference, which has been running for over 20 years in New York, London, and Los Angeles.

The full name of the conference was “Semantic Data: Taxonomy, Ontology, and Knowledge Graphs,” so the conference was less focused on data than on what you can do with data and content when combined with the semantics of taxonomies and ontologies. There was no presentation dedicated to knowledge graphs this time, given the limited number of sessions in the single-day, single-track event. Less of a focus on knowledge graphs was fine, since the Knowledge Graph Conference, held in New York in May, covers that topic very thoroughly over multiple days. The emphasis on “semantics,” though, is welcome, since there is no conference dedicated to that subject in the United States. (There is the SEMANTiCS conference in Europe, but it is semi-academic.)

 

Presentations at Semantic Data, New York

The topics of the sessions for “Semantic Data” included: securing buy-in for taxonomy and ontology strategy, why and how to connect taxonomies and ontologies, use of MS Copilot in taxonomy development, a use case in leveraging an LLM-based approach for content integration and a consumer-based semantic layer, and how to apply semantic models (taxonomies and ontologies) that reduce biases, especially for machine learning models. The opening keynote by Lulit Tesfaye was on realizing the semantic layer, and the closing keynote by Gary Carlison and Bramm Wessel of the lead sponsor, Factor, was on building an organizational semantic mindset. Additional sponsored talks covered how ontologies accelerate innovation in the life sciences, as done by the sponsor SciBite, and how semantics enhances modern data platforms, such as those of the sponsor Datavid.

I presented “Taxonomies to Ontologies: How When and Why to Connect or Extend.” I summarized the benefits of taxonomies and ontologies, including what you can and cannot do with each alone and what you can do with both combined. The fact that both taxonomies and ontologies are now based on compatible Semantic Web standards, which are supported by many tools, makes it easy to combine or extend them. Whether you are “combining” a taxonomy with an ontology or “extending” a taxonomy into an ontology depends merely on your starting point and definition of ontology. Now that I am again vendor neutral, I included screenshots from four different commercial tools for combined taxonomy/ontology management.

About the Semantic Data Conference 2024

Semantic Data New York was similar to Semantic Data Europe (London) in its format and organization. Both provided a combination of session types: instructional talks, industry use cases, round table participant discussions, and thought leadership panels. Both events were chaired by Madi Weland Solomon and featured the same keynote presentation by Lulit Tesfaye on the subject of the semantic layer. The other speakers differed between the two events, and each event had different sponsors, based on geographic location. While there were only three sponsors of Semantic Data in New York and only two in London, they shared the same exhibit hall with the main DAM (digital asset management) conference and thus reached a wider audience.

The London and New York events each had a similar number of registrants, about 50. Although the larger co-located DAM conference had separate registration, some registrants of the DAM conference were also seen in Semantic Data sessions. Registrants of Semantic Data represented diverse industries, including financial services, healthcare, software/technology, media, entertainment, publishing, travel and tourism, education, government, and consulting. Roles were also diverse, including company leadership, project and program managers, IT, and content/DAM/taxonomy/information architecture practitioner roles.

I find that the roles and activities of taxonomists, ontologists, information architects, digital asset managers, etc. overlap, so a conference dedicated to semantics brings them together for knowledge sharing. This way, their projects can also be broadened and shared within their organizations. I hope the Semantic Data conference can grow in the future to fill this need, and I look forward to next year.

Monday, September 30, 2024

Topical Taxonomies for Filtering Searches

PoolParty GraphSearch
We taxonomists have long been advocating how a taxonomy of disambiguated concepts tagged to content retrieves more accurate results than search algorithms alone. But if users prefer simply entering text strings into a search box and not browsing taxonomies, how best to support users with a taxonomy can be a challenge.

A faceted taxonomy with taxonomy aspects as filters for refining search results has become a common taxonomy solution, especially for intranets, partner portals, and knowledge bases. For these purposes, certain facets, such as Content type, Product/Service, Location, and Department, are common and logical. When it comes to designating “Topics,” however, it’s not so easy.
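A minimal Python sketch of how facet filters narrow a set of search results, assuming hypothetical results that are already tagged with facet values. It uses a common convention, AND across facets and OR within a facet; that convention is an assumption here, and search platforms may combine filters differently.

```python
from collections import Counter

# Hypothetical search results, each tagged with values from several facets
RESULTS = [
    {"title": "Expense policy", "Content type": {"Policy"},
     "Department": {"Finance"}, "Location": {"Global"}},
    {"title": "Travel FAQ", "Content type": {"FAQ"},
     "Department": {"Finance", "HR"}, "Location": {"US"}},
    {"title": "Onboarding guide", "Content type": {"Guide"},
     "Department": {"HR"}, "Location": {"US"}},
]


def apply_filters(results, selections):
    """Keep results that match every selected facet (AND across facets)
    and any selected value within a facet (OR within a facet)."""
    return [
        r for r in results
        if all(r.get(facet, set()) & values for facet, values in selections.items())
    ]


def facet_counts(results, facet):
    """Count how many results carry each value of a facet, for display as filter options."""
    return Counter(value for r in results for value in r.get(facet, set()))


hits = apply_filters(RESULTS, {"Department": {"HR"}, "Location": {"US"}})
print([r["title"] for r in hits])  # ['Travel FAQ', 'Onboarding guide']
print(facet_counts(hits, "Content type"))
```

The facet counts are what a search interface would typically show next to each remaining filter option.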

Specific Terms Gathered from Analysis

When gathering information and sources for terms, most sources will yield highly specific terms. These include terms arising from search log analysis, brainstorming sessions with sample users, automated text analytics term extraction from a large corpus of content, and manual review of a representative sample of documents/pages. These are all standard methods for taxonomy design, which I conduct as a consultant.

The difficulty is that there are often so many specific topics that the new topical taxonomy could potentially have many hundreds of terms. Some may be relevant to only one or two documents or may have occurred in only a couple of searches out of thousands. Such terms would not serve the purpose of refining searches.

Another problem is that many of the terms suggested from these methods are not even topical. Often, the top searches found in search logs of enterprise/intranet searches are for commonly used named tools, platforms, or services.

The main issue, however, in deriving terms for a topical facet/filter based on search terms is that the objective of the topical facet, like all facets, is to limit searches, not to duplicate searches. What is really needed in the topical facet are topical categories that are broader than the search terms. How to identify these broader topical categories can be more challenging.
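A minimal sketch of the roll-up idea: once specific search terms are mapped to broader topical categories, the categories, not the specific terms, become the filter values, and search volume can be aggregated under them. The mapping and the search log counts below are hypothetical; in practice, creating the mapping is the hard part.

```python
from collections import Counter

# Hypothetical mapping from specific terms (e.g., from search logs) to broader topics,
# as a taxonomist might record it after analysis and stakeholder review
BROADER = {
    "expense report": "Finance",
    "travel reimbursement": "Finance",
    "401k": "Benefits",
    "dental plan": "Benefits",
    "vpn setup": "IT support",
}

# Hypothetical search log: (query, number of times searched)
SEARCH_LOG = [("expense report", 120), ("vpn setup", 95), ("401k", 40),
              ("travel reimbursement", 30), ("dental plan", 12), ("parking", 3)]


def roll_up(log, broader):
    """Aggregate search counts under broader categories; unmapped terms are set aside for review."""
    totals, unmapped = Counter(), []
    for term, count in log:
        category = broader.get(term)
        if category is None:
            unmapped.append(term)
        else:
            totals[category] += count
    return totals, unmapped


totals, unmapped = roll_up(SEARCH_LOG, BROADER)
print(totals.most_common())  # [('Finance', 150), ('IT support', 95), ('Benefits', 52)]
print(unmapped)              # ['parking']
```

Three broad categories cover five specific terms here, which is the kind of reduction a topical facet needs.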

Identifying Broader Topical Categories

Identifying broader terms or categories for topic filters is not as simple as identifying specific search terms, nor as straightforward as identifying the set of facets. Typical methods of obtaining candidate terms from both users and from the content need to be done, but with a focus on identifying broader terms or categories.

Categories from Stakeholder Engagement

Engaging stakeholders or other sample users in activities to brainstorm taxonomy terms will result in a mix of specific and broad terms. It is then the task of the taxonomist-facilitator to help guide the participants to identify which terms are broader and which are narrower within the same topical facet. Involving stakeholders/sample users is important, because if a single taxonomist or an external consulting team tries to do this on their own, their designated broader terms, while hierarchically correct, might not suit the intended users. The taxonomist-facilitator may suggest broader terms and then obtain immediate validation from the participants of the appropriateness of those suggestions.

Categories from Content Analysis

Analyzing content for broad topics is more effectively done manually than with automated methods. Manual content analysis will yield both specific and potentially broader concepts. A taxonomist or content strategist experienced in content analysis for identifying meaning will be able to determine the main concept for a piece of content.

Automated methods, based on text analytics technologies, tend to focus on term extraction and will extract terms that are even more specific and less useful than search log results. However, if a list of derived search terms is large enough (as search logs and automated term extraction lists tend to be), another, newer option is to make use of LLM and generative AI technologies to categorize the specific terms and thus generate broader terms. The LLMs should be trained on the same or similar content, which is internal enterprise content, not the public web, to provide the correct context. Even then, the identified broader terms or categories will not always be correct and will require an experienced taxonomist to review.

Other Topical Facets

Topical terms, however, do not all have to be in a single “Topics” facet. Depending on the use case, there could be other topical facets, which are not the usual named entities, departments, locations, or product/service types. These could be for Function, Activity, Issue Type, Technology, Research Field/Discipline, etc. If and how to break out these facets can be a challenge and should involve extensive discussions or other research with stakeholders and user representatives.

Finally, a topical facet for filtering search results could even be based on the existing navigation menu’s top levels, especially on an intranet or an enterprise content management system. Facets as filters are available to refine searches only, but if users choose instead to navigate the site menu, then they have no options to use other facets/aspects to help restrict what they are looking for. By duplicating the navigation menu’s one or two top levels into a facet, perhaps called “Topic Area,” users can limit a search with the categories for the areas with which they are familiar, and they can also restrict the search further by filtering on terms selected from any of the other facets.
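A minimal sketch of duplicating the top level of a navigation menu into a “Topic Area” facet, assuming each page records its (hypothetical) navigation path as a slash-separated string. Once the facet value is derived, it can be combined with any other facet.

```python
# Hypothetical intranet pages with their location in the navigation menu
PAGES = [
    {"title": "Holiday calendar", "nav_path": "HR/Time off/Holidays", "Content type": "Calendar"},
    {"title": "Leave request form", "nav_path": "HR/Time off", "Content type": "Form"},
    {"title": "Purchase request form", "nav_path": "Finance/Procurement", "Content type": "Form"},
]


def topic_area(nav_path: str, levels: int = 1) -> str:
    """Duplicate the top level(s) of the navigation menu as a facet value."""
    return "/".join(nav_path.split("/")[:levels])


for page in PAGES:
    page["Topic Area"] = topic_area(page["nav_path"])

# A search can now be limited by Topic Area and further restricted by another facet
forms_in_hr = [p["title"] for p in PAGES
               if p["Topic Area"] == "HR" and p["Content type"] == "Form"]
print(forms_in_hr)  # ['Leave request form']
```

Setting `levels=2` would duplicate the top two menu levels instead, for example “HR/Time off”.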

I will be discussing the wider activity of coming up with terms for a taxonomy in my upcoming Taxonomy Boot Camp presentation, “The Complete Guide to Sourcing Terms,” on November 18, in Washington, DC.


Sunday, August 18, 2024

Taxonomies and Ontologies as Semantic Models

In describing what taxonomies and ontologies are and what they can do, we are hearing the word “semantics” more often. “Semantics” means “meaning,” which is nothing new, and taxonomies and ontologies are not new. What is new is that taxonomies and ontologies are now combined more, and we need a way to describe them together, which is where the word “semantic” comes in. Furthermore, taxonomies and ontologies are being implemented in new and expanded applications, where the word semantic(s) has significance.

Semantics in Taxonomies and Ontologies

Taxonomies have semantics in their concepts. A taxonomy is not just a term base or a term list, but rather is an organized set of concepts, each with its own unambiguous meaning. The concepts bring together different labels, like “synonyms” for the same thing, and their meaning and usage is further clarified by their arrangement in a hierarchy. It’s often said that a taxonomy comprises “things” (concepts), not mere “strings” (of text).
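As a rough illustration of “things, not strings,” here is a Python sketch of concepts that each carry one preferred label and several alternative labels, so that any label resolves to the same concept. The concepts are hypothetical, and this simple index assumes each label belongs to only one concept; ambiguous labels would need qualifiers.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    """One meaning: a preferred label, any alternative labels, an optional broader concept."""
    id: str
    pref_label: str
    alt_labels: list[str] = field(default_factory=list)
    broader: Optional[str] = None  # id of the broader concept, if any


class Taxonomy:
    def __init__(self, concepts: list[Concept]) -> None:
        self.concepts = {c.id: c for c in concepts}
        # Index every label, case-insensitively, to the concept it names
        self.label_index: dict[str, str] = {}
        for c in concepts:
            for label in [c.pref_label, *c.alt_labels]:
                self.label_index[label.casefold()] = c.id

    def resolve(self, text: str) -> Optional[Concept]:
        """Map a string to a concept (a "thing"), or None if no label matches."""
        concept_id = self.label_index.get(text.strip().casefold())
        return self.concepts[concept_id] if concept_id is not None else None


# Hypothetical concepts
tax = Taxonomy([
    Concept("c1", "Transportation"),
    Concept("c2", "Automobiles", ["Cars", "Motor vehicles"], broader="c1"),
])
```

Both `tax.resolve("cars")` and `tax.resolve("Motor Vehicles")` return the same Automobiles concept, and its place in the hierarchy (broader: Transportation) further clarifies its meaning.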

Ontologies have a higher level of semantics than taxonomies. Even if they don’t contain synonyms, the relationships between concepts (entities) and sets (classes) of entities have additional semantics. The relationships in an ontology convey meanings beyond mere hierarchy or a generic “related term.” For example, relationships between entities may be “is located in,” “has customer,” and “sells product.” Furthermore, entities in an ontology may have various types of attributes, such as contact information for offices and people, which is another application of semantic data.
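A minimal sketch of why named ontology relationships carry more meaning than a generic related-term link, using hypothetical entities stored as (subject, relationship, object) triples plus a separate attribute value.

```python
# Hypothetical facts, stored as (subject, relationship, object) triples
TRIPLES = {
    ("Acme Corp", "is located in", "Boston"),
    ("Acme Corp", "has customer", "Globex"),
    ("Acme Corp", "sells product", "Widget X"),
    ("Globex", "is located in", "Chicago"),
}

# Hypothetical attribute values: (entity, attribute type) -> value
ATTRIBUTES = {("Acme Corp", "address"): "1 Main St, Boston"}


def related(subject, relationship):
    """Objects linked to a subject by one specific, named relationship."""
    return sorted(o for s, r, o in TRIPLES if s == subject and r == relationship)


def generic_related(subject):
    """What a generic 'related term' link conveys: that a link exists, not what it means."""
    return sorted(o for s, _, o in TRIPLES if s == subject)


print(related("Acme Corp", "has customer"))  # ['Globex']
print(generic_related("Acme Corp"))          # ['Boston', 'Globex', 'Widget X']
```

With only generic links, you cannot tell which of the three is the customer, the location, or the product.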

Bringing Together Taxonomies and Ontologies

Taxonomies and ontologies have different origins, but now they are increasingly based on shared Semantic Web data models and guidelines, which enables them to be integrated seamlessly. Taxonomies have their origins in library science structures, including thesauri, subject headings, and classification schemes. Ontologies have their origins in computer science and data science with a focus on data models.

Combining them brings the benefits of both: the linguistic aspect of controlled terminology and their synonyms with hierarchical structure in taxonomies and the custom semantic relationships and other additional properties provided by ontologies. This allows users to search for concepts/things, not just text strings, while also linking to other things related in a specific way and creating complex multi-step queries.
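The following Python sketch, over a hypothetical set of triples, shows a simple multi-step query that uses both a taxonomy hierarchy (“broader”) and ontology relationships (“sells product,” “is located in”). The data and relationship names are assumptions for illustration; a production system would more typically run a SPARQL query against an RDF store.

```python
# Hypothetical combined graph: taxonomy hierarchy plus ontology relationships
TRIPLES = [
    ("Widget X", "broader", "Widgets"),          # taxonomy hierarchy
    ("Widget Z", "broader", "Widgets"),
    ("Acme Corp", "sells product", "Widget X"),  # ontology relationships
    ("Acme Corp", "is located in", "Boston"),
    ("Initech", "sells product", "Gadget Y"),
    ("Initech", "is located in", "Boston"),
    ("Globex", "sells product", "Widget Z"),
    ("Globex", "is located in", "Chicago"),
]


def subjects(rel, obj):
    """All subjects linked to obj by rel."""
    return {s for s, r, o in TRIPLES if r == rel and o == obj}


def objects(subj, rel):
    """All objects that subj is linked to by rel."""
    return {o for s, r, o in TRIPLES if s == subj and r == rel}


# Multi-step query: which widgets are sold by companies located in Boston?
boston_companies = subjects("is located in", "Boston")                          # step 1
products = {p for c in boston_companies for p in objects(c, "sells product")}  # step 2
widgets = {p for p in products if "Widgets" in objects(p, "broader")}          # step 3
print(sorted(widgets))  # ['Widget X']
```

Each step follows a different kind of link, which a search over text strings alone could not do.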

Taxonomies are considered a kind of “controlled vocabulary” or “knowledge organization system.” Ontologies are considered a kind of “knowledge model,” or a knowledge representation system rather than a knowledge organization system. When we combine taxonomies and ontologies or speak of them collectively, it’s logical to use the word “semantic,” whether as semantic structures or semantic models, because they both involve semantics and both are usually based on Semantic Web guidelines.

Taxonomies are increasingly based on the Semantic Web recommendation (published by the World Wide Web Consortium) of SKOS (Simple Knowledge Organization System), which is based on RDF (Resource Description Framework). Most ontologies are based on RDF-Schema, an extension of RDF, and OWL (Web Ontology Language), another Semantic Web recommendation. The data models of SKOS, RDF, RDF-S, and OWL may all be integrated into the same knowledge model for a combined taxonomy-ontology. Most software for dedicated taxonomy-ontology management uses these data models.
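As a sketch of how these data models can sit in one knowledge model, the Python below writes a few triples in the N-Triples syntax, mixing SKOS terms for the taxonomy with RDFS and OWL terms for the ontology. The W3C namespace IRIs are the standard ones; everything under http://example.org/ is hypothetical. Because the output is plain N-Triples, any RDF toolkit that reads that syntax should be able to load it.

```python
# Standard W3C namespace IRIs
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/"  # hypothetical namespace for this example


def iri(value):
    return f"<{value}>"


def literal(text, lang="en"):
    # Escape backslashes and quotes, then add a language tag
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


triples = [
    # Taxonomy side (SKOS): a concept, its labels, and its place in the hierarchy
    (iri(EX + "Automobiles"), iri(RDF + "type"), iri(SKOS + "Concept")),
    (iri(EX + "Automobiles"), iri(SKOS + "prefLabel"), literal("Automobiles")),
    (iri(EX + "Automobiles"), iri(SKOS + "altLabel"), literal("Cars")),
    (iri(EX + "Automobiles"), iri(SKOS + "broader"), iri(EX + "Vehicles")),
    # Ontology side (RDFS/OWL): a class and a custom relationship that points to concepts
    (iri(EX + "Manufacturer"), iri(RDF + "type"), iri(OWL + "Class")),
    (iri(EX + "Manufacturer"), iri(RDFS + "subClassOf"), iri(EX + "Company")),
    (iri(EX + "manufactures"), iri(RDF + "type"), iri(OWL + "ObjectProperty")),
    (iri(EX + "manufactures"), iri(RDFS + "domain"), iri(EX + "Manufacturer")),
    (iri(EX + "manufactures"), iri(RDFS + "range"), iri(SKOS + "Concept")),
]

ntriples = "\n".join(f"{s} {p} {o} ." for s, p, o in triples)
print(ntriples)
```

Pointing the range of a custom ontology property at SKOS concepts is one way to let ontology relationships reach into the taxonomy.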

Semantic Search and Semantic Tagging


Taxonomies support semantic search and tagging. “Semantic search” is the third-ranked autocomplete suggested search phrase in a Google search I did recently on “semantic,” so this is clearly a popular application of semantics. Semantic search refers to search that focuses on concepts and meaning rather than just strings of text. This is not new, but since search that is based on text strings and statistical algorithms is so common, improving search results with the focus on semantics is getting more attention.

Semantic search is best enabled with the tagging of taxonomy concepts, which we may call “semantic tagging” (which I first heard of when asked to write an article on it in 2008). Advanced text analytics technologies, going beyond entity recognition and natural language processing to include natural language understanding so as to analyze sentence structure, syntax, and sentiment, can also yield search results based somewhat on meaning and not just words.
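A minimal sketch contrasting string matching with search on tagged concepts, assuming hypothetical documents that have already been tagged with taxonomy concept IDs (the semantic tagging step itself is not shown).

```python
# Hypothetical taxonomy: concept ID -> all of its labels (preferred + alternative)
LABELS = {
    "C_AUTO": {"automobiles", "cars", "motor vehicles"},
    "C_JAGUAR_CAT": {"jaguars (animals)", "panthera onca"},
}

# Hypothetical documents, each tagged with taxonomy concepts (semantic tagging)
DOCS = {
    "doc1": {"text": "Motor vehicles emissions report", "tags": {"C_AUTO"}},
    "doc2": {"text": "Used car buying tips", "tags": {"C_AUTO"}},
    "doc3": {"text": "Jaguar habitats in the Amazon", "tags": {"C_JAGUAR_CAT"}},
}


def string_search(query):
    """Plain text matching: finds only documents containing the typed string."""
    q = query.casefold()
    return sorted(d for d, doc in DOCS.items() if q in doc["text"].casefold())


def semantic_search(query):
    """Resolve the query to concepts via their labels, then return documents tagged with them."""
    q = query.casefold()
    concepts = {c for c, labels in LABELS.items() if q in labels}
    return sorted(d for d, doc in DOCS.items() if doc["tags"] & concepts)


print(string_search("cars"))    # []
print(semantic_search("cars"))  # ['doc1', 'doc2']
```

Neither document contains the string “cars,” but both are about the concept that “cars” names.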

Semantic Data

Taxonomies are traditionally for tagging and retrieving content, whereas ontologies are traditionally for exploring and retrieving data. The combination of a taxonomy and an ontology enables users to retrieve both content and data that are related to each other. Semantics for content is a given, because content (whether text, image, or other media), by its very nature, has meaning. Data by itself may not have much meaning, unless it is related to other data and that relationship has meaning, too. Thus, “semantic data” is significant. We hear reference to “semantic data” much more often than to “semantic content.”

You don’t need to add a taxonomy to content to make it “semantic” and understood (rather a taxonomy helps you find the content). However, depending on how data is presented, you may need to add an ontology or at least a semantic data model (a method to describe objects in a database and their relationship to one another) to make data “semantic.” Experts can analyze raw data, but the data is more valuable if non-experts can understand it, too, and that’s why “semantic data” is important. There is also a lot of attention on “semantic data models.”

Semantic Layer

The idea of a “semantic layer” as a framework or approach to make an organization’s information, both data and content, more structured, findable, and actionable, has been gaining popularity recently. Whether the “semantic layer” is new or just a new way of describing something is arguable.

A semantic layer is a standardized framework that organizes and abstracts organizational data and serves as a connector for all knowledge assets. It’s a method to bridge content and data silos through a structured and consistent approach that connects data rather than consolidating it, as data warehouses do. The idea of a “layer” is that it is part of an enterprise-wide architecture of information (data and content) that connects horizontally across siloed content and data repositories. Taxonomies and ontologies, in addition to potentially other knowledge organization systems, such as a business glossary, are key components of a semantic layer.
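A minimal sketch of the connecting principle, assuming two hypothetical silos (a content repository and a CRM table) that stay where they are. The “layer” here is only a shared set of concept IDs plus a mapping from one silo’s local product codes to those IDs; a query follows the mappings into each silo instead of copying data into a central store. Real semantic layers involve much more (governance, APIs, graph databases), so treat this as an illustration of the idea only.

```python
# Two hypothetical silos that stay where they are: a content repository and a CRM table
CONTENT_REPO = [
    {"doc": "Widget X user manual", "concepts": {"ex:WidgetX"}},
    {"doc": "Q3 pricing memo", "concepts": {"ex:WidgetX", "ex:Pricing"}},
]
CRM_ROWS = [
    {"customer": "Globex", "product_code": "WX-1"},
    {"customer": "Initech", "product_code": "GY-2"},
]

# The hypothetical semantic layer: shared concepts plus mappings from local keys to them
SEMANTIC_LAYER = {
    "labels": {"ex:WidgetX": "Widget X", "ex:Pricing": "Pricing"},
    "crm_product_map": {"WX-1": "ex:WidgetX"},  # CRM product code -> shared concept
}


def everything_about(concept):
    """Query each silo in place through the shared concept; nothing is copied into a warehouse."""
    docs = [d["doc"] for d in CONTENT_REPO if concept in d["concepts"]]
    mapping = SEMANTIC_LAYER["crm_product_map"]
    customers = [r["customer"] for r in CRM_ROWS if mapping.get(r["product_code"]) == concept]
    return {"label": SEMANTIC_LAYER["labels"][concept], "documents": docs, "customers": customers}


print(everything_about("ex:WidgetX"))
# {'label': 'Widget X', 'documents': ['Widget X user manual', 'Q3 pricing memo'], 'customers': ['Globex']}
```

The content and the customer data never move; only the shared concept connects them.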

More Talk of Semantics with Taxonomies and Ontologies

I’ve definitely been hearing of “semantics” more in the world of taxonomies and ontologies, and now I am bringing the word more into my own presentations. Following are some past and future examples.