Thursday, August 22, 2019

Taxonomy Mapping


As more taxonomies get created, we see a growing need to “map” taxonomies to each other, which is linking between individual terms or concepts in each taxonomy so that the taxonomies may be used in some combination. Mapping is not new, but as it has become more frequent it is now reflected in newer standards and in taxonomy management software features.

[Figure: Mapping taxonomies]

Reasons or use cases for mapping include:
  • Selected content with an enterprise taxonomy is made available on a public web site with a different public-facing taxonomy.
  • A provider of scientific/technical/medical content with a technical thesaurus creates a simpler taxonomy aimed at laypeople.
  • Content will be made available in a different language region, and a comparable taxonomy already exists in the other language.
  • A knowledge graph is built to aggregate data from multiple repositories, each with its own taxonomy.
  • An enterprise search is based on “federated search” and different areas have different search-support thesauri.
  • Terms from search engine logs are mapped to a taxonomy to add alternative labels.
  • Terms from an open source or licensed vocabulary are mapped to a taxonomy to enrich it.

I’ve worked on occasional taxonomy mapping projects since the late 1990s, and I discuss mapping in a section of my book, The Accidental Taxonomist (2nd edition, pp. 369-73), and in an earlier blog post. I’ve also presented on mapping taxonomies at conferences, as early as 2009, but only briefly and in the wider context of the related activities of merging taxonomies and creating multilingual taxonomies. My next conference presentation (not including a pre-conference workshop), “Mapping Taxonomies, Thesauri, and Ontologies” (SEMANTiCS 2019 in Karlsruhe, Germany), will be dedicated to the subject of mapping.

In talking recently with more people about mapping, both clients and software vendors, I’ve learned that my previous view of mapping was somewhat narrow. I had considered mapping to be only one-way directional, from terms in a taxonomy used for tagging to terms in a taxonomy used for retrieval.

[Figure: One-way directional taxonomy mapping]

I still think this model applies to the majority of use cases, but mapping has a broader meaning in the standards and in taxonomy management software capabilities.

Standards for Taxonomy Mapping


The SKOS (Simple Knowledge Organization System) W3C standard, adopted in 2009 as a controlled vocabulary model and interchange format, specifies not only the familiar thesaurus relationships of broader, narrower, and related, but also what are called mapping relationships: exactMatch, closeMatch, broadMatch, narrowMatch, and relatedMatch. How these different mapping relationship types are to be used is really up to the taxonomy owner. The broadMatch and narrowMatch relationships are directional but reciprocal, so using them permits bidirectional mapping. However, there is no reason why you cannot use just one mapping relationship type if you are mapping in only a single direction. Or you could use just two, such as exactMatch and broadMatch.
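
To illustrate the reciprocal behavior of these mapping relationships, here is a minimal Python sketch that stores mappings as plain (source, relation, target) triples and derives the reciprocal statement implied by each one. The concept identifiers (ent:Laptops, pub:Computers, and so on) are hypothetical, and a real project would more likely rely on an RDF library or the taxonomy management tool itself. The inverse and symmetric pairings follow the SKOS definitions.

```python
# Reciprocal of each SKOS mapping property: broadMatch and narrowMatch are
# inverses of each other; the other three are symmetric.
INVERSE = {
    "skos:exactMatch": "skos:exactMatch",
    "skos:closeMatch": "skos:closeMatch",
    "skos:relatedMatch": "skos:relatedMatch",
    "skos:broadMatch": "skos:narrowMatch",
    "skos:narrowMatch": "skos:broadMatch",
}


def with_reciprocals(mappings):
    """Return the mappings plus the reciprocal statement implied by each one."""
    result = set(mappings)
    for source, relation, target in mappings:
        result.add((target, INVERSE[relation], source))
    return result


# Hypothetical concepts from an enterprise ("ent:") and a public ("pub:") taxonomy.
mappings = {
    ("ent:Laptops", "skos:broadMatch", "pub:Computers"),
    ("ent:Printers", "skos:exactMatch", "pub:Printers"),
}

for triple in sorted(with_reciprocals(mappings)):
    print(*triple)
```

If you map in only one direction, you would simply store and use the original triples and ignore the derived ones.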

The international standard ISO 25964-2 Thesauri and Interoperability with Other Vocabularies – Part 2: Interoperability with Other Vocabularies (published in 2013) is substantially about mapping. Interoperability is not synonymous with mapping but covers more, including using a standard format such as SKOS. However, the ISO standard discusses mapping in more detail than any other form of interoperability. The introduction states that “inter-vocabulary mapping will be the principal focus of this part of ISO 25964.” (The slightly older American standard, ANSI/NISO Z39.19-2005, is comparable with ISO 25964 Part 1, which is all about thesauri, and lacks any explanation of mapping.) While SKOS provides standardized labels, useful for porting and linking vocabularies between different systems and the web, ISO 25964-2 provides guidance on the theory and practice of various types of mappings.

ISO 25964-2 defines mapping broadly as the “process of establishing relationships between the concepts of one vocabulary and those of another.” Like SKOS, it covers different kinds of mapping relationships, although it describes more types: equivalence, compound equivalence, hierarchical, associative, exact, inexact, and partial equivalence. It also discusses mapping at a high level: between pairs of vocabularies or among multiple vocabularies, and in what direction or arrangement. The standard also includes examples. There is really a lot to consider, and I’ll definitely re-read ISO 25964-2 in detail before embarking on my next mapping project.

Software for Taxonomy Mapping


When I first did taxonomy mapping, Excel files of each vocabulary were compared either with the features of Excel or through scripting. Now, mapping can also be done within taxonomy management software, once both vocabularies are in the software, which usually requires that at least one be imported.

As most commercial taxonomy/thesaurus/ontology management software now supports the SKOS standard, such software also supports the SKOS mapping relationships between vocabularies. The leading vendors, PoolParty, Smartlogic, and Synaptica, additionally include an auto-mapping tool that uses “smart” or “fuzzy” match techniques, including some stemming, to automatically match equivalences or near-matches between concepts in two different vocabularies, which can then be manually reviewed and approved or rejected. For the mapping to be done correctly, a taxonomist should perform this review. Automated mapping also takes alternative labels (nonpreferred terms) into consideration and creates a proposed match if an alternative label in one vocabulary matches a preferred label in another.
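
To give a feel for what an auto-mapping pass involves, here is a rough Python sketch that proposes candidate matches for a taxonomist to review. It is not how any of the vendors’ tools actually work: the data layout ("pref" and "alt" keys), the crude plural stripping that stands in for stemming, and the 0.85 threshold are all assumptions made for illustration. It uses only the standard library’s difflib for the fuzzy comparison.

```python
from difflib import SequenceMatcher


def normalize(label):
    """Lowercase and strip a trailing plural 's' as a crude stand-in for stemming."""
    words = label.lower().split()
    return " ".join(w[:-1] if w.endswith("s") and len(w) > 3 else w for w in words)


def propose_matches(source, target, threshold=0.85):
    """Propose (source_id, target_id, score) candidates for human review.

    `source` and `target` map concept IDs to a dict with a preferred label
    ("pref") and a list of alternative labels ("alt"). Every label of one
    concept is compared with every label of the other, so an alternative
    label that matches a preferred label still yields a candidate.
    """
    proposals = []
    for s_id, s in source.items():
        s_labels = [normalize(label) for label in [s["pref"], *s["alt"]]]
        for t_id, t in target.items():
            t_labels = [normalize(label) for label in [t["pref"], *t["alt"]]]
            score = max(
                SequenceMatcher(None, a, b).ratio()
                for a in s_labels
                for b in t_labels
            )
            if score >= threshold:
                proposals.append((s_id, t_id, round(score, 2)))
    return sorted(proposals, key=lambda p: -p[2])


# Hypothetical vocabularies.
source = {
    "A1": {"pref": "Automobiles", "alt": ["Cars"]},
    "A2": {"pref": "Heart attacks", "alt": []},
}
target = {
    "B1": {"pref": "Car", "alt": []},
    "B2": {"pref": "Myocardial infarction", "alt": ["Heart attack"]},
    "B3": {"pref": "Trucks", "alt": []},
}

for proposal in propose_matches(source, target):
    print(proposal)
```

In this sample, both proposals come from an alternative label on one side matching a label on the other, which is the case described above. The output is only a list of candidates; approving or rejecting each one remains a manual step.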

The software’s mapping feature is agnostic to your intentions and direction of mapping, so it’s important to plan the mapping so that it supports mapping in the direction you want. In addition to terms with equivalent meaning, it is also acceptable to map from a narrower to a broader concept as the narrower is an example of the broader and can be used for it, but the mapping won’t work in the other direction. It is also acceptable to map from a term that is a preferred label to a concept where that term is an alternative label/nonpreferred term, and that mapping also won’t work in the other direction.
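
The point about direction can be sketched in a few lines of Python. This is only an illustration under assumed names: the concept IDs are hypothetical, and treating exactMatch and broadMatch as the only usable relationships is one possible policy, not a rule from the standards or from any tool.

```python
# Relations that are safe to follow when translating a tag from the tagging
# taxonomy to the retrieval taxonomy: equivalents, and narrower-to-broader.
USABLE = {"exactMatch", "broadMatch"}


def translate_tags(tags, mappings):
    """Translate source concept IDs to target concept IDs.

    `mappings` is a list of (source_id, relation, target_id) tuples that read
    "source has <relation> target". Other relations, such as narrowMatch, are
    skipped because a broader source concept is not an example of a narrower
    target concept.
    """
    translated = set()
    for source_id, relation, target_id in mappings:
        if source_id in tags and relation in USABLE:
            translated.add(target_id)
    return translated


# Hypothetical mappings from a tagging taxonomy ("src:") to a retrieval taxonomy ("tgt:").
mappings = [
    ("src:Laptops", "broadMatch", "tgt:Computers"),
    ("src:Hardware", "narrowMatch", "tgt:Computers"),
    ("src:Printers", "exactMatch", "tgt:Printers"),
]

print(translate_tags({"src:Laptops"}, mappings))   # {'tgt:Computers'}
print(translate_tags({"src:Hardware"}, mappings))  # set()
```

Content tagged with the narrower Laptops can safely be retrieved under Computers, but content tagged with the broader Hardware is not assumed to be about Computers.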

If planning your mapping project seems daunting, the software vendors, PoolParty, Smartlogic, Synaptica, and Access Innovations (vendor of Data Harmony Thesaurus Master) will provide assistance or the full service of mapping. In fact, Access Innovations has not included an auto-mapping feature in DH Thesaurus Master, because customized results may be better achieved through offline mapping.

Mapping is not just between taxonomies, but can be between taxonomies and thesauri, thesauri and ontologies, or other controlled vocabularies, something else that ISO 25964-2 covers. If you need assistance with mapping, I'd be happy to help.

6 comments:

  1. The ISO standard referenced costs $200 to access.

    1. ... unfortunately - but a large part of the documentation is freely available on the NISO website https://www.niso.org/schemas/iso25964

  2. Yes, unfortunately it is expensive. I had gotten a free review copy when I wrote a review article (http://www.hedden-information.com/wp-content/uploads/2019/07/IS0_25964-2_Review_2013.pdf). Thus, instead of 25964-part 1, I recommend obtaining the free ANSI/NISO Z39.19. But for 25964-part 2 on thesaurus/taxonomy mapping, there is no alternative. Actually $200 is not that much in comparison to the costs of a mapping project.

  3. Heather, thank you. It is a lot of money if you just want to see it :)

  4. There is an Open Source web application designed for taxonomy mapping: https://coli-conc.gbv.de/cocoda/ just file a github issue if you miss a feature!

  5. A PDF of the slides of my conference presentation “Mapping Taxonomies, Thesauri, and Ontologies” presented September 11, 2019 at the SEMANTiCS conference in Karlsruhe, Germany is now on my website page of past presentations.
    https://www.hedden-information.com/presentations/
