The Accidental Taxonomist: Mapping taxonomies

Showing posts with label Mapping taxonomies. Show all posts

Sunday, November 27, 2022

Taxonomies to Bridge Silos

There is increasing interest in organizations to “break down silos” of content and data. Silos may be different software applications, distinct web or intranet content, or merely different computer drives and folders. The goal is to enable search and retrieval across content that is stored in different content/document management systems and shared folders and the analysis and comparison of data stored in different kinds of database management systems, records management systems, and spreadsheets. This results in better, more complete information to enable more informed decisions and knowledge discovery, along with improved user satisfaction, while also saving time. Breaking down or bridging such silos was a theme of my two most recent conferences.

LavaCon: Connecting Content Silos

The 20^th annual LavaCon conference on content strategy, held October 23-26 in New Orleans, had the theme this year of “Connecting content silos across the Enterprise.” The conference had a number of presentations tied to the theme, 10 of which had “silos” in their titles. Two presentations I especially enjoyed were by leading content strategy consultants about how to connect silos.

Sarah O’Keefe of Scriptorium, in her presentation “From Silo Busting to CaaStle Building,” with a fairy tale castle metaphor, explained that completely unified content cannot be achieved, because CMSs are tuned to specific content domains, corporate websites accommodate different goals of different groups, content silos have their own delivery pipelines, and silos often match the organizational structure. Her solution was to provide Content as a Service (CaaS), or a “CaaStle in the cloud(s).” Silos are kept, allowing for unique requirements, and perhaps reduced in number, but are connected were needed.

Val Swisher of Content Rules, in her presentation “Creating a Unified (Siloed) Content Experience: The Importance of Terminology and Taxonomy,” explained that siloed content results in different user experiences for each silo. But silos are not going away, because there is no single toolset, particular content has its owners, and certain content may be considered special. Therefore, the user experience should be improved to “ensure that all content looks like it comes from the same company” and to “eliminate the confusion that users experience when they consume content created by various silos.” This is done by standardizing the content, the search, page layout, navigation, content types, terminology, and taxonomy.

At LavaCon, I presented a pre-conference workshop with the title “Using Taxonomies and Tagging to Connect Content Across the Enterprise.” While most of my workshop addressed the general principles and best practice for taxonomy creation, along with the basics of tagging, I did discuss a how centrally managed taxonomy, external from but linked to various content management systems and other applications or repositories of content, can bridge silos. Taxonomy management software positioned as “middleware” such as PoolParty, connects to these different content applications and repositories, and then the taxonomy is presented to the user in a single user interface.

Taxonomy Boot Camp: Taxonomy Breaking Down Silos

At the annual Taxonomy Boot Camp conference, held November 7-8 in Washington, DC, and co-located with the KM World conference, I spoke in a two-presentation session titled “Taxonomy Breaking Down Silos.” The idea is that taxonomies provide the connections to break down barriers between different systems and teams. I presented on taxonomy linking jointly with Donna Popky, Senior Taxonomy & Information Architecture Specialist, Harvard Business School. I explained the principles of taxonomy project linking, and Donna presented a case study of taxonomy linking using a hub and spoke method to link separate taxonomies managed by different business units with separate content repositories for different purposes at Harvard Business School. So, this was a case of creating a hub taxonomy linked to the various business unit spoke taxonomies.

The other speaker in the session, Rachael Maddison, Content Infrastructure Architect & Taxonomy Product Manager for Adobe Digital Media Experience and Engagement, presented on taxonomy adoption across corporate silos and not merely content silos. Collaboration plays a role in wider taxonomy adoption, and as Rachael stated: “Mapping or merging can’t happen until there is stakeholder buy-in.”

Over the years, my list of the benefits of taxonomies has grown. Linking data, content, and corporate silos are additional benefits. This can be done with a single, enterprise taxonomy or with multiple linked taxonomies. In either case, the taxonomy needs to be managed externally from any individual siloed application in a dedicated taxonomy management system. Taxonomies can then break down corporate silos and connect content and data silos.

Thursday, August 22, 2019

Taxonomy Mapping

As more taxonomies get created, we see a growing need to “map” taxonomies to each other, which is linking between individual terms or concepts in each taxonomy so that the taxonomies may be used in some combination. Mapping is not new, but as it has become more frequent it is now reflected in newer standards and in taxonomy management software features.

Mapping taxonomies

Reasons or use cases for mapping include:

Selected content with an enterprise taxonomy is made available on a public web site with a different public-facing taxonomy.
A provider of scientific/technical/medical content with a technical thesaurus creates a simpler taxonomy aimed at laypeople.
Content will be made available in a different language region, and a comparable taxonomy already exists in the other language.
A knowledge graph is built to aggregate data from multiple repositories, each with its own taxonomy.
An enterprise search is based on “federated search” and different areas have different search-support thesauri.
Terms from search engine logs are mapped to a taxonomy to add alternative labels.
Terms from an open source or licensed vocabulary are mapped to a taxonomy to enrich it.

I’ve worked on occasional taxonomy mapping projects since the late 1990s, and I discuss mapping in a section of my book, The Accidental Taxonomist (2^nd edition, pp. 369-73) and in an earlier blog post. I’ve also presented in conferences before on mapping taxonomies, as early as 2009, but only briefly and in the wider in the context of related activities of merging taxonomies and creating multilingual taxonomies. My next conference presentation (not including a pre-conference workshop), “Mapping Taxonomies, Thesauri, and Ontologies” (SEMANTiCS 2019 in Karlsruhe, Germany), will be dedicated to subject of mapping.

In talking recently with more people about mapping, both clients and software vendors, I’ve learned that my previous view of mapping was somewhat narrow. I had considered mapping to be only one-way directional from terms in a tagged taxonomy to terms in a retrieval taxonomy.

One-way directional taxonomy mapping

I still think this model applies to the majority of use cases, but mapping has a broader meaning in the standards and in taxonomy management software capabilities.

Standards for Taxonomy Mapping

The SKOS (Simple Knowledge Organization System) W3C standard adopted in 2009 for a controlled vocabulary model and interchangeable format specifies not only the familiar thesaurus relationships of broader, narrower, and related, but what are called mapping relationships comprising exactMatch, closeMatch, broadmatch, narrowMatch, and relatedMatch. How these different mapping relationship types are to be used is really up to the taxonomy owner. The broadMatch and narrowMatch are directional, but reciprocal, so using these permits bidirectional mapping. However, there is no reason why you cannot use just one mapping relationship type if you are mapping in only a single direction. Or you could use just two, such as exactMatch and broadMatch.

The international standard ISO 25964-2 Thesaurus and Interoperability with Other Vocabularies – Part 2: Interoperability with Other Vocabularies (published in 2013) is substantially about mapping. Interoperability is not synonymous with mapping but covers more, including using a standard format such as SKOS. However, the ISO standard discusses mapping in more detail than any other form of interoperability. The introduction states that “inter-vocabulary mapping will be the principal focus of this part of ISO 25964.” (The slightly older American standard, ANSI/NISO Z.39.19-2005 is comparable with ISO 25964 Part 1, which is all about thesauri, and lacks any explanation of mapping.) While SKOS provides standardized labels, useful for porting and linking vocabularies between different systems and the web, ISO 25964-2 provides guidance on the theory and practice of various types of mappings.

ISO 25964-2 defines mapping broadly as the “process of establishing relationships between the concepts of one vocabulary and those of another.” Like SKOS, it also covers different kinds of mapping relationships, although it describes more types: equivalence, compound equivalence, hierarchical, associative, exact, inexact, and partial equivalence. It also discusses mapping on the high level between pairs or multiple vocabularies and in what kind of direction/arrangement. The standard also includes examples. There is really a lot to consider, and I’ll definitely re-read ISO 25964-2 in detail before embarking on my next mapping project.

Software for Taxonomy Mapping

When I first did taxonomy mapping, Excel files of each vocabulary were compared with either the features of Excel or through scripting. Now, mapping can be also done within taxonomy management software, once both vocabularies are in the software, usually requiring that at least one be imported.

As most commercial taxonomy/thesaurus/ontology management software now supports the SKOS standard, such software also supports the SKOS mapping relationships between vocabularies. The leading vendors, PoolParty, Smartlogic and Synaptica additionally include an auto-mapping tool that uses “smart” or “fuzzy” match techniques, including some stemming, to automatically match equivalences or near-matches between concepts in two different vocabularies, which can then be manually reviewed and approved or rejected. To be done correctly, a taxonomist should perform this review. Automated mapping also takes alternative labels (nonpreferred terms) into consideration and creates a propose match if an alternative label in one vocabulary matches a preferred label in another.

The software’s mapping feature is agnostic to your intentions and direction of mapping, so it’s important to plan the mapping so that it supports mapping in the direction you want. In addition to terms with equivalent meaning, it is also acceptable to map from a narrower to a broader concept as the narrower is an example of the broader and can be used for it, but the mapping won’t work in the other direction. It is also acceptable to map from a term that is a preferred label to a concept where that term is an alternative label/nonpreferred term, and that mapping also won’t work in the other direction.

If planning your mapping project seems daunting, the software vendors, PoolParty, Smartlogic, Synaptica, and Access Innovations (vendor of Data Harmony Thesaurus Master) will provide assistance or the full service of mapping. In fact, Access Innovations has not included an auto-mapping feature in DH Thesaurus Master, because customized results may be better achieved through offline mapping.

Mapping is not just between taxonomies, but can be between taxonomies and thesauri, thesauri and ontologies, or other controlled vocabularies, something else that ISO 25964-2 covers. If you need assistance with mapping, I'd be happy to help.

Thursday, January 19, 2012

Taxonomy Merging or Mapping

Yesterday I gave a webinar presentation for members of Taxonomy Division of the SLA professional association, entitled “Taxonomy Updating, Combining, and Translating.” It was not the first time I presented on these topics and on the topic of taxonomy combining (mapping and merging) in particular. What was different this time is that I am currently involved in a project that involves taxonomy merging. But since I’m right in the middle of the project, I had not had the time to reflect on it and include takeaways from this project in yesterday’s presentation. Now I shall.

Merging and mapping are not the same thing. Merging brings together two taxonomies on the same subject, eliminating duplicate terms, supplementing each other with terms from one or the other taxonomy. The end result is a new and improved taxonomy taking the best of both of the legacy taxonomies. Mapping matches one taxonomy against another, so that terms in one taxonomy may be used for terms in another, such as a user interface taxonomy matching to another taxonomy that had been used to index the content. The end result is that one taxonomy can now retrieve more content.

The project I am involved in was described by the client as “mapping,” but then it became apparent that it was really the merging of taxonomies, not mapping. A second “mapping” component of the project turned out to be more about matching taxonomy to content. While in general it is good practice for the consultant to continue use the client’s own terminology, referring to “merging” as “mapping” was initially confusing. The distinction is important when it comes to activity of term-by-term comparisons, which is done in both merging and mapping.

Equivalent concepts, whether with the same terms or slightly different terms (such as Cars and Automobiles), are easily mapped or merged. So, the distinction between merging and mapping is not that important. In the case of merging you just have to decide which term to use as the preferred term.

The main difference in merging and mapping is how to deal with cases of a term in one taxonomy having a broader meaning that includes a term in the other taxonomy which lacks an equivalent in the first taxonomy. For example, Taxonomy A has the term “Precipitation,” and Taxonomy B has the terms “Snow” and “Rain” but does not have the term “Precipitation.” When you merge taxonomies, you take both terms and establish a hierarchical relationship between them so the merged taxonomy will have Precipitation and two narrower terms of Snow and Rain. As for mapping the taxonomies, you first need to know in which direction you are mapping. If you map from Taxonomy A to Taxonomy B, you cannot map these terms together. You cannot use Snow for Precipitation and you cannot use Rain for Precipitation, because the latter is broader in its meaning. (You could map Precipitation to Weather, if it existed in Taxonomy B, whereas Snow and Rain are left unmapped.) If you map from Taxonomy B to Taxonomy A, then these terms do map. Precipitation can be used for Snow and for Rain, since it includes both in its meaning.

Mapping needs to be precise, because it is relied upon to retrieve pre-indexed content. Merging, on the other hand does not need to be so precise, as a new taxonomy is the result. Therefore, initially mistaking a merging project for a mapping project, as I first did in this case, is not as bad as mistaking a mapping project for a merging project. I don’t think the latter, is likely, though.

Of course, you may have a project that involves merging two taxonomies and then mapping the resulting new taxonomy back to each of the two legacy taxonomies. This is actually not as much work as two consecutive projects. Adding an additional column in a spreadsheet, you can track merging and mapping at the same time. In fact, my current project involves that.