The Accidental Taxonomist

Saturday, September 30, 2023

SEMANTiCS Conference 2023: Taxonomies, Knowledge Graphs, and LLMs

The most recent conference I participated in was SEMANTiCS, September 20-22, in Leipzig, Germany. This was the 19th year of this European conference focused on the application of semantic technologies and systems. This was also my fourth year presenting a workshop/tutorial on taxonomies and ontologies at the conference. The widespread value of taxonomies across different areas of specialization is indicated by the fact that taxonomy workshops are repeatedly a part of conferences on various subjects, including semantics, knowledge management, library and information science, information architecture, content strategy, and digital asset management.

Semantics and taxonomies

Semantics means “meaning,” so semantic systems utilize standards to support the encoding of meaning of things/resources and their relations, making the semantics machine-readable. Various standards, guidelines, and data models for semantic systems were developed for what is called the Semantic Web. The Semantic Web goes beyond the simple hyperlinks of the World Wide Web to label shared metadata and specify the kinds of relations. This supports linked data, and the linking of taxonomies to other taxonomies and ontologies and their tagged content or data, which are stored on different servers.

Just as World Wide Web protocols have been adapted within enterprises (“behind the firewall”), so have Semantic Web standards. You don’t have to share your data publicly to reap the benefits of the Semantic Web: open standards to enable the migration of taxonomies and related data between systems, sharing of data with partners, extracting and transforming data from within silos across the enterprise into a standard format, and the ability to link to data on the Web to bring in new content even if not sharing content out on the Web.

Taxonomies, as controlled vocabularies, have always been about concepts, each with unique understood meaning, not just words or strings of text. So, using taxonomies is using semantics. The Semantic Web standard SKOS (Simple Knowledge Organization System) specifies a data model to make taxonomies and other knowledge organization systems (thesauri, classification systems, etc.) machine-readable and interchangeable on the Web. Semantic Web standards also cover ontologies with RDF-Schema and OWL. By following Semantic Web Standards, taxonomies can easily be linked to and extended with ontologies, and then by linking to data stored in a graph database, knowledge graphs can be built.
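As a rough illustration of what the SKOS data model captures, the sketch below writes two concepts with preferred labels, an alternative label, and a broader relation in Turtle syntax. The `ex:` namespace, the `Concept` class, and the `to_turtle` helper are assumptions made up for this example; a taxonomy management tool would normally produce such output for you.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# The skos: namespace is the W3C one; the ex: namespace is a made-up placeholder.
PREFIXES = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix ex: <http://example.org/taxonomy/> .\n"
)


@dataclass
class Concept:
    """A hypothetical minimal concept: one preferred label, optional synonyms and parent."""
    local_id: str
    pref_label: str
    alt_labels: List[str] = field(default_factory=list)
    broader: Optional[str] = None  # local_id of the parent concept, if any


def to_turtle(concepts: List[Concept]) -> str:
    """Serialize concepts as SKOS in Turtle. Labels are assumed to contain no quote characters."""
    blocks = [PREFIXES]
    for c in concepts:
        statements = [
            f"ex:{c.local_id} a skos:Concept",
            f'    skos:prefLabel "{c.pref_label}"@en',
        ]
        statements += [f'    skos:altLabel "{alt}"@en' for alt in c.alt_labels]
        if c.broader:
            statements.append(f"    skos:broader ex:{c.broader}")
        # Statements about one subject are joined with ";" and closed with "."
        blocks.append(" ;\n".join(statements) + " .\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_turtle([
        Concept("plants", "Plants", alt_labels=["Botanical"]),
        Concept("trees", "Trees", broader="plants"),
    ]))
```

Because the output uses only standard SKOS properties, any SKOS-aware tool should be able to read it, which is the interchange benefit described above.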

The SEMANTiCS conference

The SEMANTiCS conference is unusual in being semi-academic and semi-industry. It has separate academic track and industry track chairs and additional tutorials and workshops. It’s good to bring academia and industry together in a field like this, where research topics can be applied and partnerships can be developed. The location of the conference varies, and it partners with a local higher education institution for logistical support, with graduate students volunteering to help in exchange for access to sessions.

This was the second year that SEMANTiCS combined its conferences with the Language Technology Industry Association, which organized a Language Intelligence track, dealing with technologies for the management of terminology, multilingual content, and machine translation. The conference also includes a one-day track focused on DBpedia, which is not the same first day as the tutorials and workshops. The entire conference lasts three full days, and has a social event one evening, and a dinner on the second evening.

The conference has industry vendor sponsors, about eight of which were exhibiting, and a few more which did not exhibit. There are also slightly more organizations which are “partners,” including DBpedia, The Alan Turing Institute, and a number of institutes of higher education in Europe which have programs in semantic technologies. Additional organizers include Semantic Web Company, Institut für Angewandte Informatik, and the Vrije Universiteit Amsterdam, representing the three countries where SEMANTiCS has been taking place: Austria, Germany, and the Netherlands.

SEMANTiCS 2023

The 2023 conference was held September 20-22 in Leipzig, Germany, under the leadership of a new chair Sahar Vahdati of Technical University Dresden. There were about 285 participants in person and about one-third as many online. The conference has been hybrid since 2021. There were often six simultaneous sessions. Themed tracks or sessions of multiple speakers included Knowledge Graphs, Reasoning & Recommendation, Natural Language Processing and Large Language Models, Legal & Data Governance, Ontologies Data Management, and Environmental-Social-Governance (ESG). While there was not a life sciences track like last year, there was a themed subject track on cultural heritage. LLMs and ESG were both new topics this year. Poster presentations also covered the range of topics.

Knowledge graphs are a regular theme at this conference, but this time there was the addition of LLMs. The opening keynote was “Generations of Knowledge Graphs: The Crazy Ideas and the Business” presented by Xin Luna Dong of Meta. She spoke of three generations of knowledge graphs: entity-based knowledge graphs, text-rich knowledge graphs, and dual neural knowledge graphs, using an ontology and LLMs. The second day’s keynote was “Knowledge Graphs in the Age of Large Language Models,” presented by Aidan Hogan of the University of Chile. LLMs and AI topics were also presented in the Knowledge Graphs track, such as in Andreas Blumauer’s talk “Responsible AI and LLMs.” Finally, the moderated closing panel was “Large Language Models and Knowledge Graphs: Status Quo - Risks - Opportunities.” The panelists, Andreas Blumauer and Jochen Hummel from software vendors and Kristina Podnar, a digital policy consultant, were not completely in agreement.

In addition to my 3-hour tutorial, “Knowledge Engineering of Taxonomies and Ontologies,” only slightly updated from last year, I also contributed, along with Lutz Krüger, to Andreas Blumauer’s new 3-hour tutorial “The Key to Sustainable Enterprises: ESG, Knowledge Graphs, and Digitalization.” Adopting an ESG program and complying with upcoming ESG directives requires connecting a lot of information and data and aligning it with requirements and disclosure categories, and this is where a knowledge graph can be extremely helpful. Other tutorials and workshops dealt with data spaces, ontology reasoning, healthcare NLP, NLP for knowledge graph construction, and FAIR ontologies.

Past and future

Semantic technologies were very new when the conference was first launched in 2005 by Semantic Web Company, even before launching its product PoolParty Semantic Suite. But it’s never been a vendor product-based conference. The main purpose was and still is to promote the understanding and advancement of semantic technologies. Competitor software vendors sponsor and exhibit, and Semantic Web Company has stepped back from a lead organizational role. The conference is not one where sponsors come to sell their products or services, but rather one for raising awareness, making and reinforcing partnerships, exchanging ideas, and general networking, including looking for work. It is more of a community conference than anything else, but it is an open, welcoming community, with new people coming every year.

The next SEMANTiCS, celebrating its 20th year, will be September 16 - 18, 2024, in Amsterdam.

Thursday, August 24, 2023

Taxonomies for Digital Asset Management (DAM)

Taxonomies, with their origin in thesauri and library subject heading systems, have traditionally been associated with the tagging and retrieving of text content. The management and retrieval of multimedia content (images, video, audio, or other graphics files), on the other hand, has traditionally been served by metadata schema, reflecting the various attributes of the content, including digital rights.
Metadata for text content has become increasingly important to make it “structured” and easier to manage. Meanwhile, taxonomies, with their richness in topical detail, hierarchical structure, and synonyms, have become increasingly important in making multimedia content, especially digital assets, easier to identify and retrieve.

However, the features and uses of taxonomies and descriptive metadata have somewhat converged, now that faceted taxonomies have become common. A facet is an aspect or attribute, by which the user may limit, filter, or refine a search or initiate a search selection. (Several of my past blog posts discuss facets, including "Customizing Taxonomy Facets.")

Why taxonomies for multimedia content and digital assets

There is considerable overlap between multimedia content and digital assets, although they are not identical. A digital asset is something that is created and stored in a digital form that has value. The word “asset” implies it has value. So, not everything that is in digital form is an asset. Creative works in digital form, whether by in-house producers or licensed, are considered digital assets. Multimedia content tends to have value, so it tends to be considered as digital assets. If it needs to be managed and made available for retrieval and reuse, it can probably be considered a digital asset. If it needs to be managed and made available for retrieval and reuse, then assigning metadata and taxonomy terms is probably important.

1. Growing volume of digital assets

The main reason to move beyond simple controlled lists of terms/values in metadata properties (such as Type, Location name, Location type, Event/Occasion, Person type, Season, etc.) and include relatively large topical taxonomies for digital assets is to provide the ability to better limit search results in large volumes of content. The number of digital assets owned or managed by organizations has grown immensely, as varied media sources have become more common, not just for brand content but also for marketing, instructional, and technical content. Limiting search results from only a few broad topic categories is often not sufficient, and too many digital assets are retrieved.

A taxonomy provides further granularity of subjects which a digital asset depicts or describes. A granular hierarchical taxonomy could provide the terms for a single metadata property, such as “Subject,” or there could be detailed taxonomies in more than one metadata property, to also include “Activity,” “Product category,” or “Occasion,” depending on the use case.

2. Varied audience for digital assets and the use of synonyms

Another reason to use taxonomies for digital assets is to better suit a varied audience of users. While it is digital asset managers who rely on metadata to manage the digital assets, various other users need to find the same assets: product and brand managers, web content editors, art designers, partnership and licensing specialists, and perhaps even customers. Assets are most valuable when they have wider uses, and a detailed taxonomy helps them be found and reused by different people and departments.

A taxonomy is not only more detailed than a list of a few categories, but it is also usually enriched with synonyms (also called alternative labels or variant terms). This way, different people who may describe the same thing by different names will find the same concept and its tagged content. For example, synonyms could be “Bridal” and “Wedding”; “Infant” and “Baby”; “Botanical” and “Plants”; “DIY” and “How to.” Internal users and external users often have different preferred names for things.
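The synonym mechanism can be sketched in a few lines: every label, preferred or alternative, points to the same concept, so users who search with different words land on the same tagged content. Which label of each pair is the preferred one is an assumption here, and `build_label_index` and `resolve` are hypothetical helper names.

```python
# Preferred label -> alternative labels. The pairs come from the examples above;
# the choice of which one is "preferred" is an assumption for illustration.
CONCEPTS = {
    "Wedding": ["Bridal"],
    "Baby": ["Infant"],
    "Plants": ["Botanical"],
    "How to": ["DIY"],
}


def build_label_index(concepts):
    """Map every label (preferred or alternative), case-insensitively, to its preferred label."""
    index = {}
    for pref, alts in concepts.items():
        for label in [pref, *alts]:
            index[label.casefold()] = pref
    return index


def resolve(term, index):
    """Return the preferred label for a user's search term, or None if it is not in the taxonomy."""
    return index.get(term.strip().casefold())


if __name__ == "__main__":
    index = build_label_index(CONCEPTS)
    for term in ["bridal", "Infant", "DIY", "Gardening"]:
        print(term, "->", resolve(term, index))
```

A real search application would typically add stemming or fuzzy matching on top of this exact-match lookup.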

3. Connecting both text and multimedia content across the enterprise

Applying a taxonomy to tag digital assets can also allow digital assets to be retrieved along with other content, such as text content, in other content management systems (CMSs). This would require that the taxonomy be a centrally managed enterprise taxonomy, and not just a siloed taxonomy within a single DAM system, and that the systems are connected to each other (such as through APIs or integrations) or that a dedicated front-end enterprise search application is linked to content in its source repositories.

While users often look only for digital assets that they know are located within a specific DAM system, other times users want to conduct a more exhaustive search on a subject. While most images and videos are expected to be in the DAM, along with some PDF files, other PDF files, presentations, and documents, and even some images and videos from other sources may be located in other systems. Taxonomies that can be linked to each other or a single master taxonomy managed centrally in a dedicated taxonomy management system, such as PoolParty, serving as "middleware," connected to the content in each of the systems, can enable comprehensive search and retrieval across the organization, especially if all the data is managed in a knowledge graph (explained in my last blog post "Knowledge Graphs and Taxonomies").

Tagging or keywording multimedia content and digital assets

Finally, there is the tagging component of taxonomies, which is often called keywording with respect to images. Digital asset managers must assign descriptive metadata to the assets they manage, which is not difficult if the controlled lists of available values are short. A taxonomy, however, may be large, so it can be a challenge to determine which subject terms to tag.

For text-only content, the technologies of text analytics, including entity extraction and natural language processing, can be applied to enable auto-tagging. Image, video, and audio content had previously been considered unsuitable for auto-tagging, and thus less suitable for large taxonomies, but this is no longer the case.

There are new technologies and methods to enable auto-tagging of digital assets. Audio-to-text technologies enable transcripts to be created from audio and video files, and these texts can be automatically analyzed and tagged. Improvements in image recognition technology can enable images to be auto-tagged for their subjects. Human review of auto-tagging is still recommended, but that’s easier than tagging from scratch.

Taxonomy is what powers DAM

DAM systems do support taxonomies, so you should not hold back from creating a suitable taxonomy for your DAM content. To learn more about creating taxonomies for digital assets, attend the session “Taxonomy is What Powers DAM” on September 14, 2023, at the HS Events DAM New York conference. I will join three other panelists to discuss taxonomies for digital asset management: what taxonomies are, how to develop a taxonomy, how to do research for a taxonomy, and how to manage a taxonomy, especially for DAM applications. Register with the code SPEAKER100 for $100 off.

Monday, July 31, 2023

Knowledge Graphs and Taxonomies

Knowledge graphs have recently emerged as an additional and growing use of taxonomies. A knowledge graph comprises data extracted and stored typically in a graph database with an ontology to semantically link types of data, but usually a knowledge graph also includes a taxonomy, thesaurus, or set of controlled vocabularies to provide consistent labeling. As a result of this combination, people involved in knowledge graphs are taking an interest in taxonomies, and people involved in taxonomies are taking an interest in knowledge graphs.

The traditional and still primary use of taxonomies is to consistently and comprehensively tag and retrieve content, whereas the focus of knowledge graphs is to access and make connections among disparate data. Content tagged and retrieved with taxonomies includes pages in websites, intranets, content management systems; documents in document management systems; and images and video files in digital asset management systems. Knowledge graphs link together data which includes records in databases, customer relationship management systems, product information management systems, and other enterprise systems, and the values in cells in spreadsheets, referenced by their row and column headers. By integrating a taxonomy into a knowledge graph, users can then retrieve both content and data on the same subject together.

What is a knowledge graph? The first definition that comes up today in a Google search that is neither sponsored nor from a vendor is from the Alan Turing Institute, the U.K. national institute for data science and artificial intelligence, which provides the following explanation on its Knowledge graphs interest group page:

Knowledge graphs (KGs) organise data from multiple sources, capture information about entities of interest in a given domain or task (like people, places or events), and forge connections between them. In data science and AI, knowledge graphs are commonly used to:
Facilitate access to and integration of data sources;
Add context and depth to other, more data-driven AI techniques such as machine learning; and
Serve as bridges between humans and systems, such as generating human-readable explanations, or, on a bigger scale, enabling intelligent systems for scientists and engineers.

From the taxonomy perspective, a knowledge graph is a combination of controlled vocabularies or a taxonomy with the semantic layer of an ontology, which adds custom semantic relations and attributes, plus specific instance data, which is stored in a graph database. A knowledge graph thus extends the use of a taxonomy beyond content to also include data. From the graph data perspective, a knowledge graph is the gathering of disparate data, which has been extracted, transformed, and loaded (ETL) into a graph database, where it is linked with semantic relations provided by an ontology and described by terms in a taxonomy, and it can be queried and analyzed all in one place.
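To make the combination concrete, here is a minimal sketch that stores a taxonomy relation, a custom ontology relation, tagged content, and a data record as subject-predicate-object triples, and then retrieves content and data about the same concept together. All identifiers (`ex:`, `doc:`, `row:`) and the `match` and `about` helpers are invented for this example; a production knowledge graph would live in a graph database and be queried with a language such as SPARQL.

```python
# Each triple is (subject, predicate, object). All names are hypothetical.
TRIPLES = {
    # Taxonomy: hierarchical relation between concepts
    ("ex:SolarPanels", "skos:broader", "ex:RenewableEnergy"),
    # Ontology: a custom semantic relation between entity types
    ("ex:AcmeSolar", "ex:manufactures", "ex:SolarPanels"),
    # Content: a document tagged with a taxonomy concept
    ("doc:install-guide", "ex:hasSubject", "ex:SolarPanels"),
    # Data: a database record linked to the same concept and to an entity
    ("row:sales-2023-17", "ex:hasSubject", "ex:SolarPanels"),
    ("row:sales-2023-17", "ex:soldBy", "ex:AcmeSolar"),
}


def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern, where None acts as a wildcard."""
    return sorted(
        t for t in triples
        if s in (None, t[0]) and p in (None, t[1]) and o in (None, t[2])
    )


def about(triples, concept):
    """Everything, content or data, that has the given concept as its subject tag."""
    return [s for s, _, _ in match(triples, p="ex:hasSubject", o=concept)]


if __name__ == "__main__":
    print(about(TRIPLES, "ex:SolarPanels"))
    print(match(TRIPLES, s="row:sales-2023-17"))
```

The single `about` query returns both the document and the sales record, which is the "content and data together" benefit that a taxonomy adds to a knowledge graph.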

GraphViews of SWC ESG Knowledge Graph

It is important for the definition of a knowledge graph to include its purpose and not just its components. The purposes include providing a unified view of data, easy availability of information, easy integration of new data, secure interoperability, visualization of entities and relations, the possibility of discovery and insights through semantic relations, and the support for complex multi-part queries with quick results. With inclusion of a taxonomy, a knowledge graph can bring together both data and content in an organization.

With such lofty goals, knowledge graphs should be an area of interest not just of data scientists and ontologists, but also of information professionals (including taxonomists) and knowledge managers. This is gradually becoming the case. Knowledge graphs emerged in the 2010s, and became popularized with the Google Knowledge Graph introduced in 2012. Knowledge graphs were first introduced at the KMWorld (Knowledge Management) conferences in 2017 as “semantic knowledge graphs,” and were also first mentioned at the Taxonomy Boot Camp conference that year. This November, the KMWorld conference has more talks on knowledge graphs than before. When I proposed multiple topics for this spring’s Information Architecture Conference, the conference chair chose the presentation on an introduction to knowledge graphs. I also delivered a similar presentation this year to the joint Special Libraries Association and Medical Libraries Association conference.

I will be giving an updated version of those talks, “Knowledge Graphs for Information Professionals” as a free PoolParty webinar on Thursday, August 17, 11:00 – 12:00 EDT, after which the recording will also be available.

Friday, June 30, 2023

Taxonomies for Technical Documentation

Taxonomies are primarily for tagging content for what it is about so that precise content can easily be found by users, who browse or search on the taxonomy terms. The types of content tagged and implementations of taxonomies are numerous. One growing area of taxonomy use is technical documentation.

Technical documentation describes and explains the use or design of products or services. We refer to “documentation,” rather than “documents,” because the format can vary, including book-length manuals, multi-page PDF files such as white papers, content for printed product inserts or brochures, public website pages, and internal content management system pages. Technical documentation has existed for a long time. It used to be published only in print, especially as manuals, like books, so the tools of information findability were the table of contents and the index at the back of the manual. Now that technical documentation is most often consumed online and always managed digitally, an alphabetical browsable index is not practical to create, maintain, or use. Furthermore, indexes cannot serve multiple-use (multi-channel) content well.

Taxonomies for content tagging and retrieval

In contrast to creating an alphabetical index of terms referencing page numbers or linked to content sections, tagging content with a taxonomy has several benefits.

Taxonomies provide a better user experience than indexes. While an index requires the user to browse a long alphabetical list of terms until the desired term is found, the browsing of taxonomies does not require the user to already know the name of the desired term. Taxonomies that are arranged in hierarchical trees allow the user to drill down from broad categories to a specific topic. Taxonomies that are arranged as facets allow the user to select displayed terms (often listed by frequency of tagged usage) grouped by various facets (aspects) to limit the search results.

Facets for technical documentation could be:

User audience
Content type
Product (name or module)
Feature or function
Topic
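A small sketch can show how such facets limit search results. The documents, facet values, and the `filter_docs` and `facet_counts` helpers below are invented for illustration; a real implementation would rely on the search engine or CCMS to do this.

```python
from collections import Counter

# Hypothetical technical documents, each tagged with one value per facet.
DOCS = [
    {"title": "Installation manual", "audience": "Administrator",
     "content_type": "Manual", "product": "Product A"},
    {"title": "Quick start", "audience": "End user",
     "content_type": "Tutorial", "product": "Product A"},
    {"title": "API reference", "audience": "Developer",
     "content_type": "Reference", "product": "Product B"},
    {"title": "Upgrade notes", "audience": "Administrator",
     "content_type": "Release notes", "product": "Product B"},
]


def filter_docs(docs, **selected):
    """Keep only documents that match every selected facet value."""
    return [d for d in docs if all(d.get(f) == v for f, v in selected.items())]


def facet_counts(docs, facet):
    """List the values of one facet with the number of tagged documents, most frequent first."""
    return Counter(d[facet] for d in docs if facet in d).most_common()


if __name__ == "__main__":
    print(facet_counts(DOCS, "audience"))
    hits = filter_docs(DOCS, audience="Administrator", product="Product B")
    print([d["title"] for d in hits])
```

Selecting a value in a second facet narrows the result set further, and the counts correspond to the "frequency of tagged usage" ordering mentioned above.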

The process of tagging with a taxonomy or other controlled vocabulary is also simpler than creating an index. Creating a back-of-the-book index involves not only determining important concepts, but also giving them names as terms, determining subentries if any, and creating cross-references. Only trained indexers can do this well. Tagging with a taxonomy, especially if the taxonomy is already well-designed, is not so challenging. Since the terms and their synonyms or cross-references have already been established, it’s just a matter of looking up the term that describes the concept. Technical content now tends to be managed in component content management systems (CCMSs), so the unit of content to be tagged is already designated as a component. (See my April blog post.) Thus, content managers, editors, and writers can competently do tagging themselves. Tagging with a taxonomy can also be automated.

An index is tied to a specific document or collection. The same taxonomy, on the other hand, can be used for more than just technical documentation but across the enterprise, such as for website and other marketing content, product information, and research and development. Consistent terms support more efficient and comprehensive information gathering, sharing, and analysis.

Taxonomies to serve technical documentation’s diverse users

Taxonomies are a useful information finding tool when content is being used by different kinds of users. The same, or parts of the same, technical documentation often have diverse users: product customers, prospective customers, technical support agents, consultant staff, product managers, engineers, etc.

Taxonomy concepts have synonyms or alternative labels to reflect the preferred wording of different groups of users. Matches to even these synonyms can be displayed after a search string is entered into a search box.
https://help.poolparty.biz documentation search on taxonomy concepts
The same taxonomy can be adapted to different user groups with different user interfaces, for example, by exposing more metadata in an “advanced search” or by displaying just a subset of a larger set of facets.
Taxonomy concepts can be managed with labels in multiple languages, supporting the tagging and retrieval of multilingual content for users of different languages.

Events on taxonomies in technical documentation

I have found increasing interest in taxonomies at technical documentation events. While I have been writing and speaking about taxonomies for a long time, in the past year I have been invited to talk about taxonomies at several events and programs more focused on technical documentation.

Recent past events focusing on technical documentation, at which I spoke, with recordings available:

“Indexes, Search, and Taxonomies: Paths to Findability” Society for Technical Communication webinar, June 2023 (recording available for purchase in late July)
“Taxonomy For Delivering Targeted Technical Content” BrightTALK webinar, April 2023
“From Document Search to Document Understanding” presented by Helmut Nagy, ConVEx, April 2023 (The recording of my presentation on knowledge hubs is only available for conference registrants.)

Upcoming presentations of mine focusing on taxonomies and technical documentation:

“Taxonomy Creation for Content Tagging” online workshop, Society for Technical Communication Tuesdays, July 18, July 25, and August 1, 4:00 – 5:30 EDT (Registration is still open.)
Taxonomy panel, ConVEx Ideas online conference, July 19, 12:00 – 1:30 pm EDT
“Leveraging Semantics to Provide Targeted Training Content: A Case Study” LavaCon content strategy conference, San Diego and hybrid online, October 16, 1:30 – 3:00 pm PDT

Monday, May 29, 2023

Taxonomies and ChatGPT

ChatGPT, generative AI, and large language models (LLMs) are hot topics of interest in the fields of data, information, and knowledge management. LLMs dominated the keynote presentations and the networking conversations at the Knowledge Graph Conference in New York and were also discussed in presentations and panels of this conference and Data Summit in Boston, both of which I attended this month. The technology is relevant to taxonomies as well.

ChatGPT is the user interface application on top of GPT (Generative Pre-Trained Transformer), a publicly available LLM developed by OpenAI, which is now in version 4. ChatGPT is thus a form of generative AI, in how it generates answers. There are many other LLMs (Neural network-based AI, trained with deep learning on very large volumes of text), including those which are proprietary, restricted, or for non-commercial research, but only some have generative AI user interfaces. Although we may think of generative AI for providing answers to questions, it can do a lot more, including tasks related to taxonomies.

Organizing terms into hierarchies

Building a taxonomy is a combination of top-down design (identifying the top concepts or facets) and bottom-up building (identifying specific concepts from content analysis). The top level of a taxonomy is designed to serve user needs and thus should be based on stakeholder interviews, surveys, and brainstorming workshops, which is not something ChatGPT can do. The bottom-up building of a taxonomy, based on terms extracted from content or search log terms, may benefit from some AI involvement.

I have made a few test requests of ChatGPT for “Put the following list of terms into a hierarchical taxonomy…,” and the results are bulleted lists with indented narrower concepts. ChatGPT can also generate a taxonomy in a machine-readable SKOS in a requested RDF serialization format, as Bob DuCharme explained in his May 20 blog post “Getting ChatGPT to turn a flat vocabulary list into a hierarchical taxonomy.”

As with card sorting exercises, you can specify the top categories/concepts (like a “closed card sort”), or you can let ChatGPT create the top categories (like an “open card sort”). In either case, results are better with context, of course, so you should also tell ChatGPT what the subject domain or context is. Asking for a hierarchical taxonomy results in a third level of hierarchy sometimes, and not just a single level of grouping. Near duplicates usually appear next to each other in the list, and the taxonomist can then decide if and how to merge them into a single concept.

It is particularly for long lists of terms that automated methods can save the taxonomist’s time. If a taxonomist comes up with terms based on manual content analysis, stakeholder interviews, or submitted lists from subject matter experts, the term lists tend not to be very long, and even the process of coming up with the terms tends to include some thoughts toward categorization at the same time. Longer term lists (such as several hundred) are derived from automated term extraction (using text analytics technologies) across a corpus of dozens or hundreds of documents and from search log reports. ChatGPT is practical for putting these long lists of terms into draft hierarchies. There are inevitably some taxonomic errors in the results, which should be obvious to any taxonomist. For example, I have seen duplicated terms on different levels of the hierarchy.

In both lists of extracted terms and search log lists, terms occur that are not suitable as concepts for a taxonomy, such as verbs and adjectives or vague words. ChatGPT understands grammatical rules, so my prompt also says “Include in the taxonomy only nouns and noun phrases and omit the other terms.”
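The workflow described in this section can be sketched without calling any API: assemble the prompt from the term list, then parse the indented bulleted reply and flag terms that appear more than once for review. The function names, the two-space indentation, and the sample reply are assumptions for illustration; actual ChatGPT output varies and may need a more forgiving parser.

```python
from collections import Counter


def build_prompt(terms, domain, top_categories=None):
    """Assemble a prompt using the wording quoted above, plus domain context."""
    lines = [f"The subject domain is {domain}."]
    if top_categories:  # like a "closed card sort"
        lines.append("Use these top categories: " + ", ".join(top_categories) + ".")
    lines.append("Put the following list of terms into a hierarchical taxonomy.")
    lines.append("Include in the taxonomy only nouns and noun phrases and omit the other terms.")
    lines.extend(f"- {t}" for t in terms)
    return "\n".join(lines)


def parse_indented(text, indent=2):
    """Turn an indented bullet list into (parent, term) pairs; parent is None at the top level."""
    pairs, stack = [], []  # stack[i] holds the latest term seen at depth i
    for line in text.splitlines():
        if not line.strip():
            continue
        stripped = line.lstrip(" ")
        depth = (len(line) - len(stripped)) // indent
        term = stripped.lstrip("-* ").strip()
        del stack[depth:]  # drop terms at this depth or deeper
        pairs.append((stack[-1] if stack else None, term))
        stack.append(term)
    return pairs


def duplicated_terms(pairs):
    """Terms that occur more than once anywhere in the hierarchy, for taxonomist review."""
    counts = Counter(term.casefold() for _, term in pairs)
    return sorted(t for t, n in counts.items() if n > 1)


if __name__ == "__main__":
    print(build_prompt(["Solar panels", "Wind"], "energy", ["Energy"]))
    reply = "- Energy\n  - Solar\n    - Solar panels\n  - Wind\n- Solar\n"  # made-up reply
    pairs = parse_indented(reply)
    print(pairs)
    print(duplicated_terms(pairs))
```

The duplicate check only catches exact repeats; near duplicates still need the taxonomist's judgment.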

Generating alternative labels (“synonyms”) for concepts

Asking ChatGPT to “provide a list of synonyms for…” a given term can also be helpful for coming up with alternative labels for taxonomy concepts. Alternative labels should be customized for the context of the content and users, so alternative labels for a concept will vary from one taxonomy to another. An external source, such as ChatGPT, should not be relied upon as the only source for alternative labels, but merely as a supplemental source of suggestions to be considered.

Again, context can help and should be provided. I asked “Provide a list of synonyms for ‘healthcare’” and got 20 terms. But then when I asked “Provide a list of synonyms for health care, meaning the industry,” I received a slightly more focused list of 15 terms. Interestingly, the two-word variant “health care” was not on the list, so “synonyms” is understood by ChatGPT to mean different words with the same meaning and not orthographic variations. Nevertheless, even 15 terms are too many, and the taxonomist should select from the list of suggestions. It might be a good idea to then test search the suggested alternative labels in the content and system being used.

Although by strict definition a “synonym” is a single word with the same meaning as another word, ChatGPT provides acceptable synonyms for terms which are multi-word phrases, or synonymous multi-word phrases, such as “Chemical manufacturing and distribution” provided as a synonym for “chemical industry.”

Other taxonomy-related uses of ChatGPT

Getting help in designing an ontology (a more complex, yet high-level semantic model with defined classes of concepts, customized relationships, and attributes) is also possible with ChatGPT or other LLMs. Again, submitting the request multiple times with slight variations will yield multiple different responses for the ontologist to consider and select ideas from. Ontologies are not expressed in simple text, though, so the prompt should specify the output format, such as RDF Turtle (TTL). Dean Allemang, author of Semantic Web for the Working Ontologist, has written multiple articles (medium.com/@dallemang) recently on ChatGPT and ontologies/knowledge graphs.

ChatGPT can also be used for comparing lists of terms, data conversion, and basic coding, which may be useful for taxonomists who lack coding skills. It can convert taxonomy or ontology data from one data format to another (although taxonomy/ontology management software also imports/exports in multiple formats). Taxonomies and ontologies in their raw data format are most commonly expressed in the RDF (Resource Description Framework) data model, which has various serialization formats: RDF/XML, JSON, JSON-LD, .ttl (Turtle), etc., and ChatGPT can convert data from one to another. Data extraction can also be done with ChatGPT. For example, knowledge management professional Camille Mathieu recently shared in a LinkedIn post how she used ChatGPT to write a Python script to extract text & metadata from PDFs.

Perhaps what is most intriguing as a future implementation of taxonomies and ChatGPT is to go in the other direction and have knowledge organization systems, such as taxonomies, support the creation and use of queries (also called “prompts”) for generative AI, to obtain better results. This requires some back-end development, though, and is not merely a matter of putting a taxonomy into a prompt. Since a taxonomy is created for a specific subject domain, the questions need to be confined to the domain of the taxonomy. Semantic Web Company has developed a simple publicly accessible demo “PoolParty Meets Chat GPT,” whereby you can compare the results of questions you ask in the subject area of ESG (Environmental, Social, and Governance) that are submitted directly to ChatGPT with those which are filtered through an ESG taxonomy and knowledge graph (managed in PoolParty software) so that the questions are enriched before being sent to ChatGPT. The semantically enriched questions generate answers that have more detail, better accuracy, and even web links to definitions and other articles.
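The general idea of enriching a question before it reaches an LLM can be illustrated with a toy example. This is not how the PoolParty demo is implemented; the tiny taxonomy, the `find_concepts` and `enrich` functions, and the simple substring matching are all assumptions chosen to keep the sketch short.

```python
# A hypothetical fragment of an ESG taxonomy.
TAXONOMY = [
    {"pref": "Greenhouse gas emissions",
     "alts": ["GHG emissions", "carbon emissions"],
     "broader": "Environmental"},
    {"pref": "Board diversity", "alts": [], "broader": "Governance"},
]


def find_concepts(question, taxonomy):
    """Find concepts whose preferred or alternative label occurs in the question (naive substring match)."""
    q = question.casefold()
    return [
        c for c in taxonomy
        if any(label.casefold() in q for label in [c["pref"], *c["alts"]])
    ]


def enrich(question, taxonomy):
    """Prepend taxonomy context to the question; return it unchanged if nothing matches."""
    concepts = find_concepts(question, taxonomy)
    if not concepts:
        return question
    lines = ["Context from the taxonomy:"]
    for c in concepts:
        alts = ", ".join(c["alts"]) or "none"
        lines.append(
            f'- {c["pref"]} (also known as: {alts}; broader concept: {c["broader"]})'
        )
    lines.append(f"Question: {question}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(enrich("How do companies report carbon emissions?", TAXONOMY))
```

The enriched text, rather than the raw question, would then be sent to the LLM, and a question outside the taxonomy's domain passes through unchanged.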

Conclusions

While it’s arguable whether ChatGPT alone is a good way to obtain “facts,” there is no doubt that it is a good way to get suggestions and ideas. These suggestions can support the work of taxonomists and ontologists, and taxonomies and ontologies in turn can support the results of ChatGPT and other LLMs. Because there will be errors from ChatGPT, it should not be used to generate taxonomies by those who are not already knowledgeable with taxonomy requirements and best practices, nor should it be used as a substitute for the expertise of taxonomists.

I hope to experiment more with ChatGPT for taxonomies and share additional details in future blog posts.
