Sunday, December 31, 2023

IT and Taxonomies

Taxonomies are related to many fields of work, including knowledge management, information architecture, website design, website marketing and SEO, document management, terminology management, publishing, product management (for information products), content management and strategy, digital asset management, machine learning for classification, natural language processing for auto-tagging, data management, library and information management, and information technology. Information technology is relevant to the implementation of all taxonomies.

Why is IT involved in taxonomies?

Taxonomies link users to content (and taxonomies extended into ontologies also link users to data), but this linking relies on technology. The technology could be a kind of software, such as a content management system that supports the tagging and retrieval of content by taxonomies along with the feature of taxonomy management. Often, however, additional technology is needed to link multiple software systems together, with APIs, and to move data across systems, with extract-transform-load (ETL) tools. Taxonomies are increasingly built in the SKOS (Simple Knowledge Organization System) standard/data model, which enables taxonomies and other knowledge organization systems to be machine-readable and not just human-readable.

Taxonomies are a concern of information technology professionals as they are the owners of, and often also the developers of, the systems in which taxonomies are implemented. The systems could be completely internally developed, or they could be licensed software that typically requires some customization or integration with other systems. In my experience as a taxonomy consultant, I have typically engaged in conversations with those in IT as key stakeholders of the taxonomy. However, the degree of the involvement of IT professionals in the taxonomy itself can vary.

In custom taxonomy implementations, such as in an information service/product or in an ecommerce business, IT professionals are usually not involved in the actual design of the taxonomy, but taxonomists or others who create that taxonomy need to collaborate with IT professionals to understand the system’s capabilities and limitations and may impose requirements. Taxonomists are concerned with how the taxonomy will be displayed to the users, how the users can interact with the taxonomy, how tagging is done, and how the search functions. Custom software development has great flexibility in how it supports a taxonomy.
In implementations of taxonomies in licensed software, there may still be some development work for the IT professionals, but there are limits to what can be done or changed.

Commercial content management systems (CMS) that allow for the custom development of the user interface, referred to as “headless” CMSs, however, are becoming more common. The user interface in particular is very significant to how a taxonomy is designed and how it functions.

Who in IT is involved in taxonomies?

Those who work in IT departments with involvement in taxonomies could be in roles doing development or support for systems that manage and consume taxonomies, or they could be in systems integration roles. Additionally, there are taxonomy/metadata/ontology specialists who work within the IT department of an enterprise, especially if a knowledge/information management department does not exist in the organization.

In a survey of taxonomists I conducted in January 2022 for the 3rd edition of The Accidental Taxonomist book, of 162 people who do taxonomy work for their employers, which are not consultancies creating taxonomies for others, a multiple-choice question asked what area they work in. Information technology ranked 4th out of 11 choices, with 17% of the responses, following the areas of knowledge management, content management/strategy, and product development/management, yet ahead of the specialties of library, user experience, marketing, and others.

The survey also asked all respondents to provide their job titles, and some of those working in taxonomies have job titles that are closely associated with information technology. These included titles of IT Data Analyst, Data and Technology Platform Products, SharePoint Product Owner, Senior Solutions Consultant, Implementation Project Manager, Data Architect, Senior Manager - Graph Solutions, Enterprise Architect, Staff Engineer - Systems, Information Governance Engineer, Head of Technical Services, and Director of Solutions Delivery.

What does IT do with taxonomies?

From my experience as a taxonomy consultant, I have observed that those working in IT, in their efforts to facilitate the adoption of new software and features that make use of taxonomies, may include starter taxonomies within the tool, whether selected from the offerings of the software vendor or created by the IT staff themselves. For example, IT professionals might create simple controlled vocabularies in the SharePoint term store, such as for document types, departments, and locations, so that users can start using the search refinements right away. These vocabularies also serve as an example of the functionality of a taxonomy, which can be improved upon and expanded by someone else later.

Then there is enterprise taxonomy/ontology management software, which should be connected to search systems, content management systems, and tagging systems (if not using a tagging module of the taxonomy management system). In my experience working for a taxonomy software vendor, the IT department was often involved in the software purchasing process, if not actually leading the decision-making. Representatives from the IT department attend pre-sales demos of the tool, ask questions, and compile and compare system requirements when requesting a proposal.

That taxonomy is actually an area of concern for IT was also made clear when I saw that taxonomies were mentioned in a section within a chapter on knowledge management-related systems in my son’s introductory Management Information Systems textbook for a required course for his B.S. in Information Technology.

In sum, IT professionals who support enterprise knowledge or information management systems need to have a basic understanding of taxonomy principles, standards, benefits, and uses. My website contains various taxonomy resources. Some IT professionals may even want to go further and design and create small taxonomies (lacking the time to create large taxonomies), and they may want to read my book or attend my workshops or online courses.

Thursday, November 30, 2023

Generative AI at Taxonomy Boot Camp Conference

Generative AI and large language models (LLMs), the technology behind ChatGPT, have been topics of presentations, keynotes, and attendees’ conversations at all the varied conferences I had the fortune to attend this year, including the Taxonomy Boot Camp conference held November 6-7, in Washington, DC. Taxonomy Boot Camp is the only conference dedicated to taxonomies.

Opening and Keynotes


Right from the beginning in the opening welcome, the conference chair Stephanie Lemieux mentioned uses of ChatGPT for taxonomy creation, such as asking prompts: What is a category for a following list of terms?, What label for a concept might be better for scientists, or better for parents?, and What are alternative labels for a specific content? It has become clear that generative AI is a tool to assist taxonomists with specific tasks of a project but is not appropriate for automating the entire creation of a taxonomy. Thus, the Taxonomy Boot Camp theme this year, “Humans in the Loop,” was quite apt for the new era of generative AI, even if not specific to it.


The Taxonomy Boot Camp opening keynote, “Ontologies in the New Age of AI,” by Dean Allemang, was on this subject. Dean is more of an ontologist than a taxonomist, hence the title, but he discussed both taxonomies and ontologies. Allemang made the statement that generative AI “understands” why we need a taxonomy (even if managers do not). He explained that RDF has been put on many websites, which ChatGPT “reads.” Allemang has found that ChatGPT also performs perfectly on queries in SPARQL, the query language for data, including taxonomies, that is in RDF. Allemang gave ChatGPT query examples, such as “Return all the claims we have by claim number, open date, and close date,” and “What is the total loss of each policy where loss is the sum of loss payment, loss reserve, expense, payment, and expense reserve amount?” Allemang advised taxonomists to identify uses for taxonomies that have not been fully delivered on and use generative AI to deliver on them, and if people argue that generative AI does not understand their language, taxonomists should build in a link to the taxonomy that makes generative AI understand it.


On the second day, Taxonomy Boot Camp registrants  attend the same shared keynote presentations with all of the KMWorld co-located conferences, and this year these mostly dealt with generative AI, including the opening keynote by Dion Hinchcliffe “Tech-Driven Enterprise Thrills & Chills: The Future of Work.” 

Regular Sessions

In addition to being mentioned in various talks, generative AI was also the subject of a session, “ChatGPT, Taxonomist: Opportunities & Challenges in AI-Assisted Taxonomy Development,”  which comprised two separate presentations.

In this session, Xia Lin presented “ChatGPT and Generative AI for Taxonomy Development,” in which he discussed the steps involved in using ChatGPT in two case studies. In one, a taxonomy for data analytics projects of a small business was developed by providing ChatGPT with the scope of the first level of the taxonomy and then asking ChatGPT to expand individual categories by adding subcategories and then to add definitions of terms and categories. The results were reviewed and revised by experts. But Lin did not stop there. He showed the results of asking ChatGPT to provide stakeholder interview questions around a category, and (for those more technically inclined) how to create a ChatGPT plug-in for various defined functions of taxonomy creation, using ChatGPT’s APIs.

Also in “ChatGPT and Generative AI for Taxonomy Development” Marjorie Hlava and Heather Kotula jointly presented on issues of the use of ChatGPT to create taxonomies and in general. They explained the risks of bias, plagiarism, ethics, data quality, matching the generated taxonomy to the content, and the amplification of errors upon repeating a prompt. Regarding plagiarism, for example, if you ask ChatGPT to return a complete taxonomy on a subject domain, it may return a copyrighted taxonomy that cannot be reused without a license.

Generative AI also impacts the topics of other presentations. For example, in the presentation “In Taxonomy We Trust: Building Buy-In for Taxonomy Projects,” Bonnie Griffin mentioned the importance of “continually re-introducing the value of taxonomy, as generative AI captures attention.” It was also the subject of a debate question in the somewhat humorous closing session “Taxonomy Showdown—Point/Counterpoint With Taxonomy Experts.”


More on Taxonomies and AI

Of course, there is more to AI than just generative AI. Other sessions dealt with machine learning for auto-categorization. These included presentations by Bob Kasenchak and Rachael Maddison in the session “Machine Learning Is Coming for Your Taxonomy” and Wytze Vlietstra’s presentation of “Vision for Modular Taxonomy Product at Elsevier,” in which the program included “shared infrastructure supported by AI-based decision support tools.” In fact, AI has been a theme of Taxonomy Boot Camp in the past, in 2018. It is generative AI based on large language models that is new.

For some more details on how this technology may be used for taxonomy development, see my prior blog post from this spring, “Taxonomies and ChatGPT.” To get another perspective on this conference, check out the recent blog post by Taxonomy Boot Camp speaker Mary Katherine Barnes, “Integrating AI: Insights from KMWorld 2023.”

Tuesday, October 31, 2023

Taxonomies for Learning and Training Content

Taxonomies are primarily for tagging digital content to make it more easily found when users search or browse on taxonomy concepts. Content can be of various kinds: articles and research reports, policies and procedures, technical documentation, product information, contracts and other legal documents, marketing content, etc. A growing area of digital content is instructional or training content, especially corporate training for employees.

The need for taxonomies for training content

When an organization offers its employees a large number of training courses, it can be difficult for employees to find desired training. Having the training content tagged with controlled terms from a taxonomy makes it easier to find.

The training content may come from different sources and thus may come with different, inconsistent metadata already applied to it. An organization may have generic training (such as on diversity and information security) produced by a corporate training company, industry-specific training (such as anti-money laundering for financial services and retail industries) produced by a different training company, and company-specific training which is internally produced. An organization may also subscribe to an offering of business skills and technical skills training offered by one or more third parties, such as LinkedIn Learning. It may be very difficult to search across all these different sources.

Furthermore, simply searching on words in training course titles might not be effective, if topics are broad or the course titles are vague. For example, a search on “communication” may yield far too many results to sort through. A search on “writing” might miss a training course with a title of “Bringing out Your Voice” or “Use Plain Language.” Tagged with the concept of “Writing,” these courses can then be found.
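The difference between searching on title words and searching on tagged concepts can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the first two course titles come from the example above, while "Business Writing Basics," the communication course, and all of the tag assignments are hypothetical.

```python
# Hypothetical course records. The tag assignments are assumed for illustration.
COURSES = [
    {"title": "Bringing out Your Voice", "tags": {"Writing"}},
    {"title": "Use Plain Language", "tags": {"Writing"}},
    {"title": "Business Writing Basics", "tags": {"Writing"}},
    {"title": "Communication for New Managers", "tags": {"Verbal communication"}},
]


def search_titles(keyword):
    """Return titles that contain the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [c["title"] for c in COURSES if needle in c["title"].lower()]


def search_tags(concept):
    """Return titles of courses tagged with the given taxonomy concept."""
    return [c["title"] for c in COURSES if concept in c["tags"]]


title_hits = search_titles("writing")  # only finds titles containing the word
tag_hits = search_tags("Writing")      # finds every course tagged with the concept
print(title_hits)
print(tag_hits)
```

The title search returns only the one course with "Writing" in its title, while the tag search also returns the two courses whose titles never mention the word.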

Faceted taxonomies for training content

Sample faceted taxonomy for training content in PoolParty

For the complexities of training content, a single topical taxonomy is not enough. There could be ambiguity as to the skill level or between training topic and training format. For example, the topic of “Manager training” is not clear as to whether it is for new managers or all managers. The topic of “Presentation slides” is not clear as to whether it is training on how to create presentation slides or if presentation slides is the training format/medium. This is where a faceted taxonomy can help. Facets are different aspects of content which can be combined as search filters.

Training content is especially well suited for facets. Examples of possible facets for training content are: Content type, Level, Role, Skill, Training program, and Topic. Examples of taxonomy terms in each facet are as follows:
•    Content type: Video training
•    Level: Intermediate
•    Role: Customer support
•    Skill: Written communication
•    Training program: Upskilling
•    Topic: Timeliness
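To show how facets combine as search filters, here is a minimal Python sketch. It assumes, for simplicity, that each course has exactly one term per facet; real content often carries several terms in a facet. The first record uses the example terms listed above, and the course titles and the other two records are hypothetical.

```python
# Hypothetical course records: each key is a facet, each value a taxonomy term.
COURSES = [
    {
        "title": "Answering Support Emails on Time",
        "Content type": "Video training",
        "Level": "Intermediate",
        "Role": "Customer support",
        "Skill": "Written communication",
        "Training program": "Upskilling",
        "Topic": "Timeliness",
    },
    {
        "title": "Support Email Basics",
        "Content type": "Video training",
        "Level": "Beginner",
        "Role": "Customer support",
        "Skill": "Written communication",
        "Training program": "Onboarding",
        "Topic": "Tone",
    },
    {
        "title": "Leading Your First Team",
        "Content type": "Presentation slides",
        "Level": "Beginner",
        "Role": "Manager",
        "Skill": "Coaching",
        "Training program": "Onboarding",
        "Topic": "Feedback",
    },
]


def filter_by_facets(courses, selections):
    """Keep courses whose term in every selected facet equals the chosen term."""
    return [
        course
        for course in courses
        if all(course.get(facet) == term for facet, term in selections.items())
    ]


# Selecting one facet term, then narrowing further with a second facet.
support_courses = filter_by_facets(COURSES, {"Role": "Customer support"})
narrowed = filter_by_facets(
    COURSES, {"Role": "Customer support", "Level": "Intermediate"}
)
print([c["title"] for c in support_courses])
print([c["title"] for c in narrowed])
```

Each additional facet selection acts as a logical AND, which is what lets a user narrow a large course catalog step by step.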

It’s important to keep in mind that facets should be mutually exclusive, so the same concept, such as “Customer support,” cannot exist in both the Role and the Skill facets. Distinguishing a role and a skill can sometimes be difficult. It is important to separate out Role, though, because then there is the possibility to recommend training courses based on one’s Role.
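A mutual-exclusivity check of this kind is easy to automate. The following Python sketch, which is only one possible approach, compares every pair of facets and reports concepts whose labels appear in both. The facet contents are hypothetical, with "Customer support" deliberately placed in two facets to trigger the check.

```python
from itertools import combinations


def find_overlaps(facets):
    """Return {(facet_a, facet_b): shared labels} for facet pairs sharing a concept.

    Labels are compared case-insensitively, so shared labels are reported
    in case-folded form.
    """
    overlaps = {}
    for (name_a, terms_a), (name_b, terms_b) in combinations(facets.items(), 2):
        shared = {t.casefold() for t in terms_a} & {t.casefold() for t in terms_b}
        if shared:
            overlaps[(name_a, name_b)] = shared
    return overlaps


# Hypothetical facet contents; "Customer support" is wrongly in two facets.
facets = {
    "Role": {"Customer support", "Sales"},
    "Skill": {"Written communication", "Customer support"},
    "Topic": {"Timeliness"},
}

overlaps = find_overlaps(facets)
print(overlaps)
```

A label-based comparison only catches identical names. Deciding whether a concept is really a role or a skill remains a judgment call for the taxonomist.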

Taxonomy facets are based on metadata properties, but there likely exist many more metadata properties than needed for the end-user to filter training content searches. Additional, administrative metadata properties should not be implemented on the front-end for course searches. These might include Organizational unit, Original source, Region, Access Level, etc.

Skills taxonomy sources and challenges

Developing a skills taxonomy facet has its own challenges. First of all, there are multiple goals of skills taxonomies. Enabling employees or their managers to find appropriate training is just one goal. Other purposes may be to describe job openings to be found by candidates with matching skills, to find an expert with a desired skill to ask questions of or have work on a project, or to map roles and skills to identify gaps and improve human resources strategies and professional development programs.

There are also varied sources for skills taxonomies. Managers and subject matter experts would list certain skills, which might differ from a list of skills proposed by human resources staff. A taxonomist, metadata specialist, or information architect working on a taxonomy would come up with a slightly different list of skills, probably not as detailed. Finally, there are external sources, but these might not be appropriate to a specific organization. The largest, best known published taxonomy of skills is ESCO (European Skills, Competences, Qualifications, and Occupations), but with 13,890 skills, it is much too large and detailed for any one organization. It might be best to start with any skills list that the HR department has and build it out further with recommendations from managers, but not as detailed as some subject matter experts might suggest. External sources could be consulted to fill in some gaps.

There is the potential to get too detailed in creating a hierarchy of skills, and some of the narrower concepts may end up being specific topics and not exactly skills. For example, a skill of project management could get narrower concepts for different project management methodologies and then various components of each methodology. This would not be appropriate for a skills taxonomy, although, if important, these narrower concepts could be included in a Topics facet instead.

Presentations on taxonomies for corporate training content

My most recent conference presentation and my next conference presentation are both about taxonomies for corporate training content.  On October 16, I presented at the LavaCon content strategy conference in San Diego “Leveraging Semantics to Provide Targeted Training Content: A Case Study,” which was jointly presented with PoolParty software proof-of-concept project customer Esther Yoon of Google gTech. In addition to some of the issues described in this blog post, I also discussed how facets can be customized and how roles and skills can be linked for recommendation, and Esther presented how the POC improved the discovery of training content for those in roles related to customer support.

On November 6, at the Taxonomy Boot Camp conference in Washington, DC, I will present “Challenges in Creating Taxonomies for Learning & Development,” which will be jointly presented with Amber Simpson of Walmart’s Walmart Academy, also a PoolParty software customer. In addition to issues described here, I will also provide specific examples of challenges in creating a Skills taxonomy facet. The slides will also be made available afterwards.

Saturday, September 30, 2023

SEMANTiCS Conference 2023: Taxonomies, Knowledge Graphs, and LLMs

The most recent conference I participated in was SEMANTiCS, September 20-22, in Leipzig, Germany. This was the 19th year of this European conference focused on the application of semantic technologies and systems. This was also my fourth year presenting a workshop/tutorial on taxonomies and ontologies at the conference. The widespread value of taxonomies across different areas of specialization is indicated by the fact that taxonomy workshops are repeatedly a part of conferences on various subjects, including semantics, knowledge management, library and information science, information architecture, content strategy, and digital asset management.

Semantics and taxonomies

Semantics means “meaning,” so semantic systems utilize standards to support the encoding of meaning of things/resources and their relations, making the semantics machine-readable. Various standards, guidelines, and data models for semantic systems were developed for what is called the Semantic Web. The Semantic Web goes beyond the simple hyperlinks of the World Wide Web to label shared metadata and specify the kinds of relations. This supports linked data, and the linking of taxonomies to other taxonomies and ontologies and their tagged content or data, which are stored on different servers.

Just as World Wide Web protocols have been adapted within enterprises (“behind the firewall”), so have Semantic Web standards. You don’t have to share your data publicly to reap the benefits of the Semantic Web: open standards to enable the migration of taxonomies and related data between systems, sharing of data with partners, extracting and transforming data from within silos across the enterprise into a standard format, and the ability to link to data on the Web to bring in new content even if not sharing content out on the Web.

Taxonomies, as controlled vocabularies, have always been about concepts, each with unique understood meaning, not just words or strings of text. So, using taxonomies is using semantics. The Semantic Web standard SKOS (Simple Knowledge Organization System) specifies a data model to make taxonomies and other knowledge organization systems (thesauri, classification systems, etc.) machine-readable and interchangeable on the Web. Semantic Web standards also cover ontologies with RDF-Schema and OWL. By following Semantic Web Standards, taxonomies can easily be linked to and extended with ontologies, and then by linking to data stored in a graph database, knowledge graphs can be built.
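To make "machine-readable" more concrete, here is a minimal Python sketch that writes one taxonomy concept in SKOS using the Turtle syntax for RDF. The classes and properties used (skos:Concept, skos:prefLabel, skos:altLabel, skos:broader) are part of the SKOS standard, but the `ex:` namespace, the concept identifiers, and the labels are placeholders chosen for illustration. In practice a taxonomy management tool or an RDF library would generate this, and labels containing quotation marks would need escaping, which this sketch does not handle.

```python
BASE = "https://example.org/taxonomy/"  # placeholder namespace, not a real vocabulary

PREFIXES = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    f"@prefix ex: <{BASE}> .\n"
)


def concept_to_turtle(concept_id, pref_label, alt_labels=(), broader=None):
    """Serialize one concept as a Turtle statement with English labels."""
    lines = [
        f"ex:{concept_id} a skos:Concept ;",
        f'    skos:prefLabel "{pref_label}"@en',
    ]
    for alt in alt_labels:
        lines[-1] += " ;"  # continue the statement with another property
        lines.append(f'    skos:altLabel "{alt}"@en')
    if broader:
        lines[-1] += " ;"
        lines.append(f"    skos:broader ex:{broader}")
    lines[-1] += " ."  # a Turtle statement ends with a period
    return "\n".join(lines)


turtle = PREFIXES + "\n" + concept_to_turtle(
    "writtenCommunication",
    "Written communication",
    alt_labels=["Writing"],
    broader="communication",
)
print(turtle)
```

Because the concept has an identifier separate from its labels, other systems can link to the concept itself rather than to a string of text.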

The SEMANTiCS conference

The SEMANTiCS conference is somewhat unique by being semi-academic and semi-industry. It has separate academic track and industry track chairs and additional tutorials and workshops. It’s good to bring academia and industry together in a field like this, where research topics can be applied and partnerships can be developed. The location of the conference varies, and it partners with a local higher education institution for logistical support, with graduate students volunteering to help in exchange for getting access to sessions.

This was the second year that SEMANTiCS combined its conferences with the Language Technology Industry Association, which organized a Language Intelligence track, dealing with technologies for the management of terminology, multilingual content, and machine translation. The conference also includes a one-day track focused on DBpedia, which is not the same first day as the tutorials and workshops. The entire conference lasts three full days, and has a social event one evening, and a dinner on the second evening.  

The conference has industry vendor sponsors, about eight of which were exhibiting, and a few more which did not exhibit. There are also slightly more organizations which are “partners,” including DBpedia, The Alan Turing Institute, and a number of institutes of higher education in Europe which have programs in semantic technologies. Additional organizers include Semantic Web Company, Institut für Angewandte Informatik, and the Vrije Universiteit Amsterdam, representing the three countries where SEMANTiCS has been taking place: Austria, Germany, and the Netherlands.


The 2023 conference was held September 20-22 in Leipzig, Germany, under the leadership of a new chair Sahar Vahdati of Technical University Dresden. There were about 285 participants in person and about one-third as many online. The conference has been hybrid since 2021. There were often six simultaneous sessions. Themed tracks or sessions of multiple speakers included Knowledge Graphs, Reasoning & Recommendation, Natural Language Processing and Large Language Models, Legal & Data Governance, Ontologies Data Management, and Environmental-Social-Governance (ESG). While there was not a life sciences track like last year, there was a themed subject track on cultural heritage. LLMs and ESG were both new topics this year. Poster presentations also covered the range of topics. 

Knowledge graphs are a regular theme at this conference, but this time there was the addition of LLMs. The opening keynote was “Generations of Knowledge Graphs: The Crazy Ideas and the Business” presented by Xin Luna Dong of Meta. She spoke of three generations of knowledge graphs: entity-based knowledge graphs, text-rich knowledge graphs, and dual neural knowledge graphs, using an ontology and LLMs. The second day’s keynote was “Knowledge Graphs in the Age of Large Language Models,” presented by Aidan Hogan of the University of Chile. LLMs and AI topics were also presented in the Knowledge Graphs track, such as in Andreas Blumauer’s talk “Responsible AI and LLMs.” Finally, the moderated closing panel was “Large Language Models and Knowledge Graphs: Status Quo - Risks - Opportunities” with panelists Andreas Blumauer and Jochen Hummel, from software vendors, and Kristina Podnar, a digital policy consultant, who were not completely in agreement.

In addition to my 3-hour tutorial, “Knowledge Engineering of Taxonomies and Ontologies,” only slightly updated from last year, I also contributed, along with Lutz Krüger, to Andreas Blumauer’s new 3-hour tutorial “The Key to Sustainable Enterprises: ESG, Knowledge Graphs, and Digitalization.” Adopting an ESG program and complying with upcoming ESG directives requires connecting a lot of information and data and aligning it with requirements and disclosure categories, and this is where a knowledge graph can be extremely helpful. Other tutorials and workshops dealt with data spaces, ontology reasoning, healthcare NLP, NLP for knowledge graph construction, and FAIR ontologies.

Past and future

Semantic technologies were very new when the conference was first launched in 2005 by Semantic Web Company, even before launching its product PoolParty Semantic Suite. But it’s never been a vendor product-based conference. The main purpose was and still is to promote the understanding and advancement of semantic technologies. Competitor software vendors sponsor and exhibit, and Semantic Web Company has stepped back from a lead organizational role. The conference is not one where sponsors do business by selling their products or services, but rather one for raising awareness, making and reinforcing partnerships, exchanging ideas, and general networking, including looking for work. It is more of a community conference than anything else, but it is an open, welcoming community, with new people coming every year.

The next SEMANTiCS, celebrating its 20th year, will be September 16 - 18, 2024, in Amsterdam.

Thursday, August 24, 2023

Taxonomies for Digital Asset Management (DAM)

Icons for file types

Taxonomies, with their origin in thesauri and library subject heading systems, have traditionally been associated with the tagging and retrieving of text content. The management and retrieval of multimedia content (images, video, audio, or other graphics files), on the other hand, has traditionally been served by metadata schema, reflecting the various attributes of the content, including digital rights. 
Metadata for text content has become increasingly important to make it “structured” and easier to manage. Meanwhile, taxonomies, with their richness in topical detail, hierarchical structure, and synonyms, have become increasingly important in making multimedia content, especially digital assets, easier to identify and retrieve.

However, the features and uses of taxonomies and descriptive metadata have somewhat converged, now that faceted taxonomies have become common. A facet is an aspect or attribute, by which the user may limit, filter, or refine a search or initiate a search selection. (Several of my past blog posts discuss facets, including "Customizing Taxonomy Facets.") 

Why taxonomies for multimedia content and digital assets

There is considerable overlap between multimedia content and digital assets, although they are not identical. A digital asset is something that is created and stored in a digital form that has value. The word “asset” implies it has value. So, not everything that is in digital form is an asset. Creative works in digital form, whether by in-house producers or licensed, are considered digital assets. Multimedia content tends to have value, so it tends to be considered as digital assets. If it needs to be managed and made available for retrieval and reuse, it can probably be considered a digital asset. If it needs to be managed and made available for retrieval and reuse, then assigning metadata and taxonomy terms is probably important.

1. Growing volume of digital assets

The main reason to move beyond simple controlled lists of terms/values in metadata properties (such as Type, Location name, Location type, Event/Occasion, Person type, Season, etc.) and include relatively large topical taxonomies for digital assets is to provide the ability to better limit search results in large volumes of content. The number of digital assets owned or managed by organizations has grown immensely, as varied media sources have become more common, not just for brand content but also for marketing, instructional, and technical content. Limiting search results from only a few broad topic categories is often not sufficient, and too many digital assets are retrieved.

A taxonomy provides further granularity of subjects which a digital asset depicts or describes. A granular hierarchical taxonomy could provide the terms for a single metadata property, such as “Subject,” or there could be detailed taxonomies in more than one metadata property, to also include “Activity,” “Product category,” or “Occasion,” depending on the use case.

2. Varied audience for digital assets and the use of synonyms

Another reason to use taxonomies for digital assets is to better suit a varied audience of users. While it is digital asset managers who rely on metadata to manage the digital assets, various other users need to find the same assets: product and brand managers, web content editors, art designers, partnership and licensing specialists, and perhaps even customers. Assets are most valuable when they have wider uses, but in order to be reused by different people and departments, a detailed taxonomy helps.

A taxonomy is not only more detailed than a list of a few categories, but it is also usually enriched with synonyms (also called alternative labels or variant terms). This way, different people who may describe the same thing by different names will find the same concept and its tagged content. For example, synonyms could be “Bridal” and “Wedding”; “Infant” and “Baby”; “Botanical” and “Plants”; “DIY” and “How to.” Internal users and external users often have different preferred names for things.
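How synonyms lead different users to the same tagged content can be sketched in Python as a simple label lookup. The label pairs come from the examples above, but which label of each pair is the preferred one is an assumption here, and the asset file names are hypothetical.

```python
# Preferred labels with their alternative labels (synonyms).
# Which label of each pair is preferred is assumed for illustration.
CONCEPTS = {
    "Wedding": ["Bridal"],
    "Baby": ["Infant"],
    "Plants": ["Botanical"],
    "How to": ["DIY"],
}

# Hypothetical assets, tagged with the preferred label of a concept.
TAGGED_ASSETS = {
    "Wedding": ["bouquet.jpg", "ceremony.mp4"],
    "Baby": ["nursery.jpg"],
}


def build_label_index(concepts):
    """Map every label, preferred or alternative, to its preferred label."""
    index = {}
    for preferred, alternatives in concepts.items():
        for label in [preferred, *alternatives]:
            index[label.casefold()] = preferred
    return index


LABEL_INDEX = build_label_index(CONCEPTS)


def find_assets(search_term):
    """Resolve the search term to a concept, then return the tagged assets."""
    concept = LABEL_INDEX.get(search_term.strip().casefold())
    return TAGGED_ASSETS.get(concept, []) if concept else []


print(find_assets("Bridal"))
print(find_assets("wedding"))
```

Both searches resolve to the same concept, so a user who thinks in terms of "Bridal" and a user who thinks in terms of "Wedding" retrieve the same assets.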

3. Connecting both text and multimedia content across the enterprise

Applying a taxonomy to tag digital assets can also allow digital assets to be retrieved along with other content, namely text content, in other content management systems (CMSs). This would require that the taxonomy be a centrally managed enterprise taxonomy, and not just a siloed taxonomy within a single DAM system, and that the systems be connected to each other (such as through APIs or integrations) or that a dedicated front-end enterprise search application be linked to content in their source repositories.

While users often look only for digital assets that they know are located within a specific DAM system, other times users want to conduct a more exhaustive search on a subject. While most images and videos are expected to be in the DAM, along with some PDF files, other PDF files, presentations, and documents, and even some images and videos from other sources may be located in other systems. Taxonomies that can be linked to each other or a single master taxonomy managed centrally in a dedicated taxonomy management system, such as PoolParty, serving as "middleware," connected to the content in each of the systems, can enable comprehensive search and retrieval across the organization, especially if all the data is managed in a knowledge graph (explained in my last blog post "Knowledge Graphs and Taxonomies").
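The idea of a centrally managed taxonomy connecting content across systems can be illustrated with a small Python sketch. It assumes that each system keeps an index from shared concept identifiers to the items tagged with them. The system names, concept identifiers, and file names are all hypothetical, and a real integration would go through each system's API rather than in-memory dictionaries.

```python
# Hypothetical per-system indexes keyed by concept IDs from one central taxonomy.
SYSTEMS = {
    "DAM": {
        "concept:wedding": ["bouquet.jpg", "ceremony.mp4"],
    },
    "CMS": {
        "concept:wedding": ["wedding-planning-guide.html"],
        "concept:baby": ["nursery-checklist.html"],
    },
    "Document library": {
        "concept:wedding": ["venue-contract-template.pdf"],
    },
}


def search_all_systems(concept_id, systems):
    """Return, per system, the items tagged with the shared concept ID."""
    return {name: index.get(concept_id, []) for name, index in systems.items()}


results = search_all_systems("concept:wedding", SYSTEMS)
for system_name, items in results.items():
    print(system_name, items)
```

The search works across systems only because every system tags with the same concept identifiers, which is the role the central taxonomy plays.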

Tagging or keywording multimedia content and digital assets

Finally, there is the tagging component of taxonomies, which is often called keywording with respect to images. Digital asset managers must assign descriptive metadata to the assets they manage, which is not difficult if the controlled lists of available values are short. A taxonomy, however, may be large, so it can be a challenge to determine which subject terms to apply as tags.

For text-only content, the technologies of text analytics, including entity extraction and natural language processing, can be applied to enable auto-tagging. Image, video, and audio content had previously been considered unsuitable for auto-tagging, and thus less suitable for large taxonomies, but this is no longer the case.

There are new technologies and methods to enable auto-tagging of digital assets. Audio-to-text technologies enable transcripts to be created from audio and video files, and these texts can be automatically analyzed and tagged. Improvements in image recognition technology can enable images to be auto-tagged for their subjects. Human review of auto-tagging is still recommended, but that’s easier than tagging from scratch.
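
As a rough illustration of tagging a transcript, the sketch below suggests taxonomy concepts by matching their labels as whole words in the text. This is a deliberately naive stand-in that I am assuming for the example; actual auto-tagging tools rely on text analytics such as entity extraction and natural language processing. The `TAXONOMY` contents and the `suggest_tags` function are hypothetical.

```python
import re

# Hypothetical taxonomy: concept name mapped to all of its labels (synonyms included).
TAXONOMY = {
    "Wedding": ["Wedding", "Bridal"],
    "Baby": ["Baby", "Infant"],
}


def suggest_tags(text, taxonomy):
    """Suggest concepts whose labels appear as whole words in the text."""
    suggestions = []
    for concept, labels in taxonomy.items():
        for label in labels:
            pattern = r"\b" + re.escape(label) + r"\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                suggestions.append(concept)
                break  # one matching label is enough for this concept
    return suggestions


transcript = "This video shows bridal bouquets and infant clothing."
print(suggest_tags(transcript, TAXONOMY))
```

The output is a list of suggested concepts, which a human reviewer can then accept or reject rather than tagging from scratch.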

Taxonomy is what powers DAM

DAM systems do support taxonomies, so you should not hold back from creating a suitable taxonomy for your DAM content. To learn more about creating taxonomies for digital assets, attend the session “Taxonomy is What Powers DAM” on September 14, 2023, at the HS Events DAM New York conference. I will join three other panelists to discuss taxonomies for digital asset management: what taxonomies are, how to develop a taxonomy, how to do research for a taxonomy, and how to manage a taxonomy, especially for DAM applications. Register with the code SPEAKER100 for $100 off.


Monday, July 31, 2023

Knowledge Graphs and Taxonomies

Knowledge graphs have recently emerged as an additional and growing use of taxonomies. A knowledge graph comprises data extracted and stored typically in a graph database with an ontology to semantically link types of data, but usually a knowledge graph also includes a taxonomy, thesaurus, or set of controlled vocabularies to provide consistent labeling. As a result of this combination, people involved in knowledge graphs are taking an interest in taxonomies, and people involved in taxonomies are taking an interest in knowledge graphs.

The traditional and still primary use of taxonomies is to consistently and comprehensively tag and retrieve content, whereas the focus of knowledge graphs is to access and make connections among disparate data. Content tagged and retrieved with taxonomies includes pages in websites, intranets, content management systems; documents in document management systems; and images and video files in digital asset management systems. Knowledge graphs link together data which includes records in databases, customer relationship management systems, product information management systems, and other enterprise systems, and the values in cells in spreadsheets, referenced by their row and column headers. By integrating a taxonomy into a knowledge graph, users can then retrieve both content and data on the same subject together.

What is a knowledge graph? The first definition that comes up today in a Google search that is neither sponsored nor from a vendor is from the Alan Turing Institute, the U.K. national institute for data science and artificial intelligence, which provides the following explanation on its Knowledge graphs interest group page:

Knowledge graphs (KGs) organise data from multiple sources, capture information about entities of interest in a given domain or task (like people, places or events), and forge connections between them. In data science and AI, knowledge graphs are commonly used to:

  • Facilitate access to and integration of data sources;
  • Add context and depth to other, more data-driven AI techniques such as machine learning; and
  • Serve as bridges between humans and systems, such as generating human-readable explanations, or, on a bigger scale, enabling intelligent systems for scientists and engineers.

From the taxonomy perspective, a knowledge graph is a combination of controlled vocabularies or a taxonomy with the semantic layer of an ontology, which adds custom semantic relations and attributes, plus specific instance data, which is stored in a graph database.  A knowledge graph thus extends the use of a taxonomy beyond content to also include data. From the graph data perspective, a knowledge graph is the gathering of disparate data, which has been extracted, transformed, and loaded (ETL) into a graph database, where it is linked with semantic relations provided by an ontology and described by terms in a taxonomy, and it can be queried and analyzed all in one place. 
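
To make the combination of taxonomy, ontology, and instance data more concrete, here is a minimal sketch that represents a tiny graph as subject-predicate-object triples. It is only an in-memory stand-in that I am assuming for illustration; a real knowledge graph would be stored in a graph database and queried with a graph query language. All identifiers, such as `company:ExampleCo` and `tagged_with`, are made up.

```python
# Each fact is a (subject, predicate, object) triple. All names are hypothetical.
TRIPLES = [
    # Taxonomy: a concept and its broader concept
    ("concept:SolarPower", "broader", "concept:RenewableEnergy"),
    # Ontology classes applied to instance data
    ("company:ExampleCo", "type", "class:Company"),
    ("doc:annual-report", "type", "class:Document"),
    # Semantic relation linking data to a taxonomy concept
    ("company:ExampleCo", "invests_in", "concept:SolarPower"),
    # Content tagged with the same taxonomy concept
    ("doc:annual-report", "tagged_with", "concept:SolarPower"),
]


def query(subject=None, predicate=None, obj=None):
    """Return triples matching the given parts; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in TRIPLES
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


def linked_to(concept):
    """List everything, data or content, that points at a taxonomy concept."""
    return sorted(s for (s, _, _) in query(obj=concept))


print(linked_to("concept:SolarPower"))
```

The single lookup returns both a data record (the company) and a piece of content (the document), because both are connected to the same taxonomy concept.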

GraphViews of SWC ESG Knowledge Graph
It is important for the definition of a knowledge graph to include its purpose and not just its components. The purposes include providing a unified view of data, easy availability of information, easy integration of new data, secure interoperability, visualization of entities and relations, the possibility of discovery and insights through semantic relations, and support for complex multi-part queries with quick results. With the inclusion of a taxonomy, a knowledge graph can bring together both data and content in an organization.

With such lofty goals, knowledge graphs should be an area of interest not just of data scientists and ontologists, but also of information professionals (including taxonomists) and knowledge managers. This is gradually becoming the case. Knowledge graphs emerged in the 2010s and became popularized with the Google Knowledge Graph, introduced in 2012. Knowledge graphs were first introduced at the KMWorld (Knowledge Management) conferences in 2017 as “semantic knowledge graphs,” and were also first mentioned at the Taxonomy Boot Camp conference that year. This November, the KMWorld conference has more talks on knowledge graphs than before. When I proposed multiple topics for this spring’s Information Architecture Conference, the conference chair chose the presentation introducing knowledge graphs. I also delivered a similar presentation this year to the joint Special Libraries Association and Medical Libraries Association conference.

I will be giving an updated version of those talks, “Knowledge Graphs for Information Professionals,” as a free PoolParty webinar on Thursday, August 17, 11:00 – 12:00 EDT, after which the recording will also be available.

Friday, June 30, 2023

Taxonomies for Technical Documentation

Taxonomies are primarily for tagging content for what it is about so that precise content can easily be found by users, who browse or search on the taxonomy terms. The types of content tagged and implementations of taxonomies are numerous. One growing area of taxonomy use is technical documentation.

Technical documentation describes and explains the use or design of products or services. We refer to “documentation,” rather than “documents,” because the format can vary, including book-length manuals, multi-page PDF files such as white papers, content for printed product inserts or brochures, public website pages, and internal content management system pages. Technical documentation has existed for a long time. It used to be published only in print, especially as manuals, like books, so the tools of information findability were the table of contents and the index at the back of the manual. Now that technical documentation is most often consumed online and always managed digitally, an alphabetical browsable index is not practical to create, maintain, or use. Furthermore, indexes cannot serve multiple-use (multi-channel) content well.

Taxonomies for content tagging and retrieval

In contrast to creating an alphabetical index of terms referencing page numbers or linked to content sections, tagging content with a taxonomy has several benefits.

Taxonomies provide a better user experience than indexes. While an index requires the user to browse a long alphabetical list of terms until the desired term is found, the browsing of taxonomies does not require the user to already know the name of the desired term. Taxonomies that are arranged in hierarchical trees allow the user to drill down from broad categories to a specific topic. Taxonomies that are arranged as facets allow the user to select displayed terms (often listed by frequency of tagged usage) grouped by various facets (aspects) to limit the search results. 
PoolParty help documentation facets
Facets for technical documentation could be:

  • User audience
  • Content type
  • Product (name or module)
  • Feature or function
  • Topic
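
The way facets limit search results can be sketched as follows. The sample documents, their facet values, and the function names `apply_facets` and `facet_counts` are my own assumptions for illustration, not the behavior of any specific search product.

```python
from collections import Counter

# Hypothetical documentation pages, each tagged with one value per facet.
DOCS = [
    {"title": "Install guide", "User audience": "Administrator",
     "Content type": "Guide", "Product": "Server"},
    {"title": "Release notes", "User audience": "Administrator",
     "Content type": "Release notes", "Product": "Server"},
    {"title": "Getting started", "User audience": "End user",
     "Content type": "Guide", "Product": "Desktop app"},
    {"title": "API reference", "User audience": "Developer",
     "Content type": "Reference", "Product": "Server"},
]


def apply_facets(docs, selections):
    """Keep only the documents that match every selected facet value."""
    return [
        doc for doc in docs
        if all(doc.get(facet) == value for facet, value in selections.items())
    ]


def facet_counts(docs, facet):
    """Count how many documents carry each value of a facet, most frequent first."""
    return Counter(doc[facet] for doc in docs if facet in doc).most_common()


selected = apply_facets(DOCS, {"Product": "Server", "Content type": "Guide"})
print([doc["title"] for doc in selected])
print(facet_counts(DOCS, "Product"))
```

Each additional facet selection narrows the result set, and the counts show which terms a user interface might list first within a facet.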

The process of tagging with a taxonomy or other controlled vocabulary is also simpler than creating an index. Creating a back-of-the-book index involves not only determining important concepts, but also giving them names as terms, determining subentries if any, and creating cross-references. Only trained indexers can do this well. Tagging with a taxonomy, especially if the taxonomy is already well-designed, is not so challenging. Since the terms and their synonyms or cross-references have already been established, it’s just a matter of looking up the term that describes the concept. Technical content now tends to be managed in component content management systems (CCMSs), so the unit of content to be tagged is already designated as a component. (See my April blog post.) Thus, content managers, editors, and writers can competently do tagging themselves. Tagging with a taxonomy can also be automated.

An index is tied to a specific document or collection. The same taxonomy, on the other hand, can be used not just for technical documentation but across the enterprise, such as for website and other marketing content, product information, and research and development. Consistent terms support more efficient and comprehensive information gathering, sharing, and analysis.

Taxonomies to serve technical documentation’s diverse users

Taxonomies are a useful information-finding tool when content is being used by different kinds of users. The same technical documentation, or parts of it, often has diverse users: product customers, prospective customers, technical support agents, consultant staff, product managers, engineers, etc.

  • Taxonomy concepts have synonyms or alternative labels to reflect the preferred wording of different groups of users. Matches to even these synonyms can be displayed after a search string is entered into a search box.
    documentation search on taxonomy concepts

  • The same taxonomy can be adapted to different user groups with different user interfaces. For example, one interface could expose more metadata in an “advanced search,” while another displays just a subset of a larger set of facets.
  • Taxonomy concepts can be managed with labels in multiple languages, supporting the tagging and retrieval of multilingual content for users of different languages.
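
The points above about synonyms in the search box and labels in multiple languages can be sketched together. The concepts, their English and German labels, and the `suggest` function are hypothetical examples that I am assuming for illustration.

```python
# Hypothetical concepts with preferred and alternative labels per language.
CONCEPTS = [
    {
        "id": "c1",
        "pref": {"en": "User account", "de": "Benutzerkonto"},
        "alt": {"en": ["Login", "Profile"], "de": ["Konto"]},
    },
    {
        "id": "c2",
        "pref": {"en": "Invoice", "de": "Rechnung"},
        "alt": {"en": ["Bill"]},
    },
]


def suggest(search_string, lang):
    """Return preferred labels of concepts with any label in `lang` starting with the string."""
    needle = search_string.strip().casefold()
    if not needle:
        return []
    results = []
    for concept in CONCEPTS:
        preferred = concept["pref"].get(lang)
        if preferred is None:
            continue  # concept has no label in this language
        labels = [preferred, *concept["alt"].get(lang, [])]
        if any(label.casefold().startswith(needle) for label in labels):
            results.append(preferred)
    return results


print(suggest("log", "en"))
print(suggest("rech", "de"))
```

A user who types a synonym is shown the concept under its preferred label in their own language, while the tagging behind the scenes refers to the same concept ID regardless of language.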

Events on taxonomies in technical documentation

I have found increasing interest in taxonomies at technical documentation events. While I have been writing and speaking about taxonomies for a long time, in the past year I have been invited to talk about taxonomies at several events and programs more focused on technical documentation.

Recent past events focusing on technical documentation, at which I spoke, with recordings available:
Upcoming presentations of mine focusing on taxonomies and technical documentation: