As more taxonomies get created, we see a growing need to
“map” taxonomies to each other, that is, to link individual terms or concepts
in one taxonomy to those in another so that the taxonomies can be used in combination.
Mapping is not new, but as it has become more frequent, it is now reflected in
newer standards and in the features of taxonomy management software.
[Figure: Mapping taxonomies]
Reasons or use cases for mapping include:
- Selected content tagged with an enterprise taxonomy is made available on a public website that uses a different, public-facing taxonomy.
- A provider of scientific/technical/medical content with a technical thesaurus creates a simpler taxonomy aimed at laypeople.
- Content will be made available in a different language region, and a comparable taxonomy already exists in the other language.
- A knowledge graph is built to aggregate data from multiple repositories, each with its own taxonomy.
- An enterprise search uses “federated search,” and different areas have different search-support thesauri.
- Terms from search engine logs are mapped to a taxonomy to add alternative labels.
- Terms from an open source or licensed vocabulary are mapped to a taxonomy to enrich it.
I’ve worked on occasional taxonomy mapping projects since
the late 1990s, and I discuss mapping in a section of my book, The Accidental
Taxonomist (2nd edition, pp. 369-73), and in an earlier blog post. I’ve also presented
on mapping taxonomies at conferences, as early as 2009, but only briefly and in the
wider context of the related activities of merging taxonomies and creating
multilingual taxonomies. My next conference presentation (not including a
pre-conference workshop), “Mapping Taxonomies, Thesauri, and Ontologies”
(SEMANTiCS 2019 in Karlsruhe, Germany), will be dedicated to the subject of
mapping.
In talking recently with more people about mapping, both
clients and software vendors, I’ve learned that my previous view of mapping was
somewhat narrow. I had considered mapping to be only one-way: from terms in the
taxonomy used to tag content to terms in the taxonomy used for retrieval.
[Figure: One-way directional taxonomy mapping]
I still think this
model applies to the majority of use cases, but mapping has a broader meaning
in the standards and in taxonomy management software capabilities.
Standards for Taxonomy Mapping
The SKOS (Simple Knowledge Organization System) standard, adopted by the W3C
in 2009 as a model and interchange format for controlled vocabularies,
specifies not only the familiar thesaurus relationships of broader, narrower,
and related, but also what are called mapping relationships: exactMatch,
closeMatch, broadMatch, narrowMatch, and relatedMatch. How these different
mapping relationship types are used is really up to the taxonomy owner. The
broadMatch and narrowMatch relationships are directional but reciprocal (each is
the inverse of the other), so using them permits bidirectional mapping. However,
there is no reason why you cannot use just one mapping relationship type if you
are mapping in only a single direction. Or you could use just two, such as
exactMatch and broadMatch.
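For readers who work with SKOS data directly, here is a minimal Python sketch of what that reciprocity means in practice. It treats the five SKOS mapping properties as plain strings rather than using an RDF library, and the concept identifiers (ex:, pub:) are hypothetical. It relies on the SKOS Reference's definitions: broadMatch and narrowMatch are inverses of each other, while exactMatch, closeMatch, and relatedMatch are symmetric, so a set of one-way mapping statements implies a matching set of statements in the other direction.

```python
# SKOS mapping properties as plain strings (a sketch, not an RDF library).
# broadMatch/narrowMatch are inverses; the other three are symmetric.
INVERSE = {
    "broadMatch": "narrowMatch",
    "narrowMatch": "broadMatch",
    "exactMatch": "exactMatch",
    "closeMatch": "closeMatch",
    "relatedMatch": "relatedMatch",
}


def reverse_mappings(mappings):
    """Given (source, relation, target) triples mapping vocabulary A to B,
    derive the triples that hold from B back to A."""
    return [(target, INVERSE[relation], source)
            for source, relation, target in mappings]


# Hypothetical concepts: "ex:" is vocabulary A, "pub:" is vocabulary B.
a_to_b = [
    ("ex:Automobiles", "exactMatch", "pub:Cars"),
    # pub:Cars is broader than ex:SportsCars
    ("ex:SportsCars", "broadMatch", "pub:Cars"),
]

for triple in reverse_mappings(a_to_b):
    print(triple)
```

If you map in only one direction and never need the reverse, you can simply ignore the derived statements; the point is that SKOS gives them a defined meaning if you do.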
The international standard ISO 25964-2, Thesauri and Interoperability with Other Vocabularies – Part 2: Interoperability with Other Vocabularies (published
in 2013), is substantially about mapping. Interoperability is not synonymous
with mapping; it covers more, including
the use of a standard format such as SKOS. However, the ISO standard discusses
mapping in more detail than any other form of interoperability. Its introduction
states that “inter-vocabulary mapping will be the principal focus of this part
of ISO 25964.” (The slightly older American standard, ANSI/NISO Z39.19-2005, is
comparable to ISO 25964 Part 1, which is all about thesauri; Z39.19 lacks any
explanation of mapping.) While SKOS provides standardized labels, useful for
porting and linking vocabularies between different systems and the web, ISO
25964-2 provides guidance on the theory and practice of various types of
mappings.
ISO 25964-2 defines mapping broadly as the “process of
establishing relationships between the concepts of one vocabulary and those of
another.” Like SKOS, it covers different kinds of mapping relationships,
although it describes more types: equivalence, compound equivalence,
hierarchical, associative, exact, inexact, and partial equivalence. It also
discusses, at a high level, mapping between pairs of vocabularies or among
multiple vocabularies, and the directions and arrangements such mappings can
take. The standard also includes examples. There is really a lot to consider,
and I’ll definitely re-read ISO 25964-2 in detail before embarking on my next
mapping project.
Software for Taxonomy Mapping
When I first did taxonomy mapping,
Excel files of each vocabulary were compared using either Excel’s own features
or scripting. Now, mapping can also be done within taxonomy management software
once both vocabularies are in the software, which usually requires importing at
least one of them.
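As an illustration of the scripting approach, here is a minimal Python sketch that compares two vocabularies exported as CSV text. The column names (id, prefLabel) and the sample terms are hypothetical. It finds only preferred labels that are identical after trimming and case-folding; anything subtler still has to be found by a person or by fuzzier matching.

```python
import csv
import io

# Hypothetical exports: one concept per row, with an ID and a preferred label.
VOCAB_A = """id,prefLabel
a1,Automobiles
a2,Bicycles
a3,Public transport
"""
VOCAB_B = """id,prefLabel
b1,automobiles
b2,Trains
b3,Public Transport
"""


def load_labels(text):
    """Map normalized (trimmed, case-folded) preferred labels to concept IDs."""
    return {row["prefLabel"].strip().casefold(): row["id"]
            for row in csv.DictReader(io.StringIO(text))}


def exact_matches(a_text, b_text):
    """Return (A id, B id) pairs whose preferred labels are identical after normalization."""
    a, b = load_labels(a_text), load_labels(b_text)
    return sorted((a[label], b[label]) for label in a.keys() & b.keys())


print(exact_matches(VOCAB_A, VOCAB_B))
```

Because labels are used as dictionary keys, a vocabulary with duplicate preferred labels (homographs) would need a different structure, such as a dictionary of lists.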
As most commercial taxonomy/thesaurus/ontology management software now supports the SKOS standard, such software also supports the SKOS mapping relationships between vocabularies. The leading vendors PoolParty, Smartlogic, and Synaptica additionally include an auto-mapping tool that uses “smart” or “fuzzy” matching techniques, including some stemming, to automatically match equivalents or near-matches between concepts in two different vocabularies. The proposed matches can then be manually reviewed and approved or rejected, and for correct results a taxonomist should perform this review. Automated mapping also takes alternative labels (nonpreferred terms) into consideration and proposes a match if an alternative label in one vocabulary matches a preferred label in another.
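The vendors’ matching algorithms are their own and are not described here. As a rough illustration of the general idea only, the following Python sketch swaps in the standard library’s difflib.SequenceMatcher for similarity scoring and a deliberately crude, hypothetical plural stripper in place of a real stemmer. It compares preferred and alternative labels on both sides and returns proposals, noting which kinds of labels matched, for a taxonomist to approve or reject. The 0.85 threshold is an arbitrary assumption.

```python
from difflib import SequenceMatcher


def crude_stem(word):
    """Toy plural stripper (hypothetical; not a real stemmer)."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def normalize(label):
    """Case-fold, split on spaces and hyphens, and stem each word."""
    words = label.casefold().replace("-", " ").split()
    return " ".join(crude_stem(w) for w in words)


def labels(concept):
    """All labels of a concept, tagged as preferred or alternative."""
    return [("pref", concept["pref"])] + [("alt", a) for a in concept.get("alt", [])]


def propose_matches(source, target, threshold=0.85):
    """Score every source/target concept pair by its best-matching label pair.
    Returns (source id, target id, score, label kinds) proposals, best first;
    these are candidates for human review, not approved mappings."""
    proposals = []
    for s_id, s in source.items():
        for t_id, t in target.items():
            score, s_kind, t_kind = max(
                (SequenceMatcher(None, normalize(sl), normalize(tl)).ratio(), sk, tk)
                for sk, sl in labels(s)
                for tk, tl in labels(t)
            )
            if score >= threshold:
                proposals.append((s_id, t_id, round(score, 2), f"{s_kind}->{t_kind}"))
    return sorted(proposals, key=lambda p: -p[2])


# Hypothetical sample vocabularies.
source = {
    "a1": {"pref": "Automobiles", "alt": ["Cars"]},
    "a2": {"pref": "Bicycles"},
    "a3": {"pref": "Colour"},
}
target = {
    "b1": {"pref": "Car"},
    "b2": {"pref": "Bicycle", "alt": ["Bikes"]},
    "b3": {"pref": "Boats"},
    "b4": {"pref": "Color"},
}

for proposal in propose_matches(source, target):
    print(proposal)
```

Here “Automobiles” is proposed for “Car” only because its alternative label “Cars” matches, and “Colour”/“Color” is proposed as a near-match; whether each proposal is an exact, close, or broader match is exactly what the reviewing taxonomist decides.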
The software’s mapping feature is agnostic to your intentions
and to the direction of mapping, so it’s important to plan the mapping so that it
supports the direction you want. In addition to mapping terms with equivalent
meanings, it is acceptable to map from a narrower concept to a broader one, because
the narrower concept is an example of the broader one and can be used for it; the
mapping won’t work in the other direction, though. It is also acceptable to map from a
term that is a preferred label to a concept where that term is an alternative
label (nonpreferred term), and that mapping also won’t work in the other
direction.
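To make the directional point concrete, here is a minimal Python sketch of a one-way mapping table from a tagging vocabulary to a retrieval vocabulary. The mapping kinds (exact, narrower_to_broader, pref_to_alt) and the terms are hypothetical labels for the three cases just described, not SKOS or ISO terminology. Lookups in the planned direction use every mapping, while reverse lookups reuse only exact equivalents.

```python
# Hypothetical one-way mappings from a tagging vocabulary (left)
# to a retrieval vocabulary (right), each recording why it holds.
MAPPINGS = [
    ("Automobiles", "exact", "Cars"),                # same meaning
    ("Sports cars", "narrower_to_broader", "Cars"),  # left is narrower than right
    ("Sedans", "pref_to_alt", "Cars"),               # "Sedans" is an alt label of "Cars"
]

# Only equivalence is safe to follow in the reverse direction.
REVERSIBLE = {"exact"}


def translate(term, mappings, reverse=False):
    """Return the concepts a term maps to, honoring each mapping's direction."""
    if not reverse:
        return [right for left, _, right in mappings if left == term]
    return [left for left, kind, right in mappings
            if right == term and kind in REVERSIBLE]


print(translate("Sports cars", MAPPINGS))
print(translate("Cars", MAPPINGS, reverse=True))
```

Following the narrower-to-broader or alternative-label mappings backward would send everything about “Cars” to “Sports cars” or “Sedans”, which is the error that planning the direction avoids.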
If planning your mapping project seems daunting, the software
vendors PoolParty, Smartlogic, Synaptica, and Access Innovations (vendor of
Data Harmony Thesaurus Master) will provide assistance or a full mapping service.
In fact, Access Innovations has not included an auto-mapping feature in DH
Thesaurus Master, because customized results may be better achieved through
offline mapping.
Mapping is not just between taxonomies; it can also be between
taxonomies and thesauri, thesauri and ontologies, or other controlled vocabularies,
something else that ISO 25964-2 covers. If you need assistance with mapping, I'd be happy to help.
The ISO standard referenced costs $200 to access.
... unfortunately, but a large part of the documentation is freely available on the NISO website: https://www.niso.org/schemas/iso25964
Yes, unfortunately it is expensive. I had gotten a free review copy when I wrote a review article (http://www.hedden-information.com/wp-content/uploads/2019/07/IS0_25964-2_Review_2013.pdf). So instead of ISO 25964-1, I recommend obtaining the free ANSI/NISO Z39.19. But for ISO 25964-2 on thesaurus/taxonomy mapping, there is no alternative. Actually, $200 is not that much compared to the cost of a mapping project.
Heather, thank you. It is a lot of money if you just want to see it :)
There is an open source web application designed for taxonomy mapping: https://coli-conc.gbv.de/cocoda/ Just file a GitHub issue if a feature you need is missing!
A PDF of the slides of my conference presentation “Mapping Taxonomies, Thesauri, and Ontologies,” presented September 11, 2019 at the SEMANTiCS conference in Karlsruhe, Germany, is now on my website page of past presentations: https://www.hedden-information.com/presentations/