Showing posts with label Taxonomy software. Show all posts

Thursday, August 22, 2019

Taxonomy Mapping


As more taxonomies get created, we see a growing need to “map” taxonomies to each other, which is linking between individual terms or concepts in each taxonomy so that the taxonomies may be used in some combination. Mapping is not new, but as it has become more frequent it is now reflected in newer standards and in taxonomy management software features.

Diagram of mapping taxonomies
Mapping taxonomies

Reasons or use cases for mapping include:
  • Selected content with an enterprise taxonomy is made available on a public web site with a different public-facing taxonomy.
  • A provider of scientific/technical/medical content with a technical thesaurus creates a simpler taxonomy aimed at laypeople.
  • Content will be made available in a different language region, and a comparable taxonomy already exists in the other language.
  • A knowledge graph is built to aggregate data from multiple repositories, each with its own taxonomy.
  • An enterprise search is based on “federated search” and different areas have different search-support thesauri.
  • Terms from search engine logs are mapped to a taxonomy to add alternative labels.
  • Terms from an open source or licensed vocabulary are mapped to a taxonomy to enrich it.

I’ve worked on occasional taxonomy mapping projects since the late 1990s, and I discuss mapping in a section of my book, The Accidental Taxonomist (2nd edition, pp. 369-73), and in an earlier blog post. I’ve also presented at conferences on mapping taxonomies, as early as 2009, but only briefly and in the wider context of the related activities of merging taxonomies and creating multilingual taxonomies. My next conference presentation (not including a pre-conference workshop), “Mapping Taxonomies, Thesauri, and Ontologies” (SEMANTiCS 2019 in Karlsruhe, Germany), will be dedicated to the subject of mapping.

In talking recently with more people about mapping, both clients and software vendors, I’ve learned that my previous view of mapping was somewhat narrow. I had considered mapping to be one-way only, directed from terms in a tagged taxonomy to terms in a retrieval taxonomy.

Diagram of one-way taxonomy mapping
One-way directional taxonomy mapping
I still think this model applies to the majority of use cases, but mapping has a broader meaning in the standards and in taxonomy management software capabilities.

Standards for Taxonomy Mapping


The SKOS (Simple Knowledge Organization System) W3C standard, adopted in 2009 as a controlled vocabulary model and interchange format, specifies not only the familiar thesaurus relationships of broader, narrower, and related, but also what are called mapping relationships, comprising exactMatch, closeMatch, broadMatch, narrowMatch, and relatedMatch. How these different mapping relationship types are to be used is really up to the taxonomy owner. The broadMatch and narrowMatch relationships are directional but reciprocal, so using them permits bidirectional mapping. However, there is no reason why you cannot use just one mapping relationship type if you are mapping in only a single direction. Or you could use just two, such as exactMatch and broadMatch.
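To make the five mapping relationships more concrete, here is a minimal Python sketch of how a single mapping, with or without its reciprocal, might be written out as RDF statements. The SKOS namespace and property names come from the W3C recommendation; the function names and the example.org concept URIs are my own assumptions for illustration and do not reflect how any particular tool stores mappings.

```python
# Namespace of the SKOS vocabulary, as published by the W3C.
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Each mapping property and the property that holds in the opposite direction.
# exactMatch, closeMatch, and relatedMatch are symmetric; broadMatch and
# narrowMatch are inverses of each other.
INVERSE = {
    "exactMatch": "exactMatch",
    "closeMatch": "closeMatch",
    "relatedMatch": "relatedMatch",
    "broadMatch": "narrowMatch",
    "narrowMatch": "broadMatch",
}


def mapping_triples(source, prop, target, reciprocal=False):
    """Return (subject, predicate, object) triples for one mapping.

    source and target are concept URIs (hypothetical in the usage below).
    With reciprocal=True the statement for the reverse direction is added.
    """
    if prop not in INVERSE:
        raise ValueError(f"not a SKOS mapping property: {prop}")
    triples = [(source, SKOS + prop, target)]
    if reciprocal:
        triples.append((target, SKOS + INVERSE[prop], source))
    return triples


def to_ntriples(triples):
    """Serialize the triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


# Hypothetical concepts in two vocabularies: a narrower concept in vocabulary A
# is mapped to a broader concept in vocabulary B, and back again.
example = mapping_triples(
    "http://example.org/vocab-a/heart-attack",
    "broadMatch",
    "http://example.org/vocab-b/heart-disease",
    reciprocal=True,
)
print(to_ntriples(example))
```

Leaving reciprocal at its default of False corresponds to mapping in a single direction with just one relationship type.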

The international standard ISO 25964-2 Thesauri and Interoperability with Other Vocabularies – Part 2: Interoperability with Other Vocabularies (published in 2013) is substantially about mapping. Interoperability is not synonymous with mapping but covers more, including using a standard format such as SKOS. However, the ISO standard discusses mapping in more detail than any other form of interoperability. The introduction states that “inter-vocabulary mapping will be the principal focus of this part of ISO 25964.” (The slightly older American standard, ANSI/NISO Z39.19-2005, is comparable with ISO 25964 Part 1, which is all about thesauri, and lacks any explanation of mapping.) While SKOS provides standardized labels, useful for porting and linking vocabularies between different systems and the web, ISO 25964-2 provides guidance on the theory and practice of various types of mappings.

ISO 25964-2 defines mapping broadly as the “process of establishing relationships between the concepts of one vocabulary and those of another.” Like SKOS, it also covers different kinds of mapping relationships, although it describes more types: equivalence, compound equivalence, hierarchical, associative, exact, inexact, and partial equivalence. It also discusses mapping on the high level between pairs or multiple vocabularies and in what kind of direction/arrangement. The standard also includes examples. There is really a lot to consider, and I’ll definitely re-read ISO 25964-2 in detail before embarking on my next mapping project.

Software for Taxonomy Mapping


When I first did taxonomy mapping, Excel files of each vocabulary were compared with either the features of Excel or through scripting. Now, mapping can also be done within taxonomy management software, once both vocabularies are in the software, usually requiring that at least one be imported.

As most commercial taxonomy/thesaurus/ontology management software now supports the SKOS standard, such software also supports the SKOS mapping relationships between vocabularies. The leading vendors, PoolParty, Smartlogic, and Synaptica, additionally include an auto-mapping tool that uses “smart” or “fuzzy” match techniques, including some stemming, to automatically match equivalences or near-matches between concepts in two different vocabularies, which can then be manually reviewed and approved or rejected. For the mapping to be done correctly, a taxonomist should perform this review. Automated mapping also takes alternative labels (nonpreferred terms) into consideration and creates a proposed match if an alternative label in one vocabulary matches a preferred label in another.
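The general idea of auto-mapping can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the algorithm of any of the vendors named above: it uses the standard library's difflib for the fuzzy comparison, a crude plural-stripping rule as a stand-in for stemming, and a made-up data structure in which each concept has a "pref" label and an optional list of "alt" labels.

```python
from difflib import SequenceMatcher


def normalize(label):
    """Lowercase a label and strip a trailing "s" from longer words.

    This is a deliberately crude stand-in for real stemming.
    """
    words = label.lower().split()
    return " ".join(w[:-1] if w.endswith("s") and len(w) > 3 else w for w in words)


def labels(concept):
    """All labels of a concept: the preferred label plus any alternative labels."""
    return [concept["pref"]] + concept.get("alt", [])


def propose_matches(source, target, threshold=0.9):
    """Propose (source_id, target_id, score) candidates for manual review.

    A pair is proposed when any label of the source concept is similar enough
    to any label of the target concept, so an alternative label in one
    vocabulary can match a preferred label in the other.
    """
    proposals = []
    for s_id, s_concept in source.items():
        for t_id, t_concept in target.items():
            best = max(
                SequenceMatcher(None, normalize(a), normalize(b)).ratio()
                for a in labels(s_concept)
                for b in labels(t_concept)
            )
            if best >= threshold:
                proposals.append((s_id, t_id, round(best, 2)))
    # Highest-scoring candidates first, ready for a taxonomist to approve or reject.
    return sorted(proposals, key=lambda p: -p[2])


# Hypothetical vocabularies for illustration only.
vocab_a = {
    "a1": {"pref": "Automobiles", "alt": ["Cars"]},
    "a2": {"pref": "Heart attack"},
}
vocab_b = {
    "b1": {"pref": "Car"},
    "b2": {"pref": "Myocardial infarction", "alt": ["Heart attacks"]},
}

for proposal in propose_matches(vocab_a, vocab_b):
    print(proposal)
```

The output is a list of proposals rather than final mappings, which reflects the review step: the script narrows down the candidates, and a person decides which to accept and which relationship type each one should get.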

The software’s mapping feature is agnostic to your intentions and direction of mapping, so it’s important to plan the mapping so that it supports mapping in the direction you want. In addition to terms with equivalent meaning, it is also acceptable to map from a narrower to a broader concept as the narrower is an example of the broader and can be used for it, but the mapping won’t work in the other direction. It is also acceptable to map from a term that is a preferred label to a concept where that term is an alternative label/nonpreferred term, and that mapping also won’t work in the other direction.
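The point about direction can be expressed as a small rule. The following Python sketch covers only the equivalent and narrower-to-broader cases described above, not the alternative-label case. The function names, the sample terms, and the decision to leave closeMatch and relatedMatch to manual judgment are my assumptions for illustration.

```python
def usable(relation, reverse=False):
    """Decide whether a mapping "source relation target" can be followed.

    reverse=False means going from source to target;
    reverse=True means going from target back to source.
    """
    if relation == "exactMatch":
        return True  # equivalent concepts work in both directions
    if relation == "broadMatch":
        return not reverse  # target is broader: only narrower -> broader works
    if relation == "narrowMatch":
        return reverse  # target is narrower: only the way back works
    return False  # closeMatch and relatedMatch are left to manual judgment here


def retrieval_concepts(tag, mappings):
    """Retrieval-taxonomy concepts under which content tagged with tag can be found."""
    return {target for source, relation, target in mappings
            if source == tag and usable(relation)}


# Hypothetical mappings from a tagging taxonomy to a retrieval taxonomy.
mappings = [
    ("Poodles", "broadMatch", "Dogs"),
    ("Cats", "exactMatch", "Felines"),
    ("Pets", "narrowMatch", "Dogs"),
]

print(retrieval_concepts("Poodles", mappings))
print(retrieval_concepts("Pets", mappings))
```

Content tagged with the narrower "Poodles" can safely be retrieved under "Dogs", but content tagged with the broader "Pets" is not necessarily about dogs, so that mapping is not followed.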

If planning your mapping project seems daunting, the software vendors, PoolParty, Smartlogic, Synaptica, and Access Innovations (vendor of Data Harmony Thesaurus Master) will provide assistance or the full service of mapping. In fact, Access Innovations has not included an auto-mapping feature in DH Thesaurus Master, because customized results may be better achieved through offline mapping.

Mapping is not just between taxonomies, but can be between taxonomies and thesauri, thesauri and ontologies, or other controlled vocabularies, something else that ISO 25964-2 covers. If you need assistance with mapping, I'd be happy to help.

Tuesday, April 30, 2019

Taxonomy Software Trends: Convergence and Visualizations


I recently looked more closely into current offerings of taxonomy software to prepare for an upcoming presentation at the SLA conference in Cleveland in June: “Taxonomy Tools and Tool Evaluation.” I will speak about the tools, and my co-presenter, Marti Heyman, will speak about how to evaluate them. I had last contacted various software vendors in 2015 when I was writing the second edition for my book, The Accidental Taxonomist. I had previously blogged on Taxonomy Software Trends in January 2015 and observed that, since researching software for my first edition in 2009, there is more cloud/web-based software, more SKOS/RDF/Semantic web framework software, and more plugins to SharePoint, content management systems, and search engines. Those trends continue. Now that I look into taxonomy software again, the additional trends I see are taxonomy, thesaurus, and ontology tool convergence and graphical vocabulary visualization.

Taxonomy, thesaurus, and ontology software convergence


Originally there was thesaurus management software (also used for any taxonomies), such as MultiTes, Data Harmony Thesaurus Master, Synaptica KMS, and other products that no longer exist; and ontology management software, such as TopBraid Composer, Protégé, and others. The two kinds of software were very distinct, from different vendors, based on completely different standards and models, with different features, used by different users, for different purposes.

Now, we don’t hear as much about “thesaurus software” as before, but rather vocabulary/taxonomy/knowledge organization system (KOS)/ontology software, where the same software tool supports thesaurus standards (ANSI/NISO Z39.19 or ISO 25964) and ontology standards (OWL and RDF), and especially the SKOS (Simple Knowledge Organization System) model for any kind of controlled vocabulary. This makes sense, because an organization often has needs for more than one kind of controlled vocabulary. Newer software offerings have combined taxonomy, thesaurus, and ontology software into one. These include Smartlogic Semaphore, PoolParty, Synaptica Graphite, TopBraid Enterprise Data Governance’s Vocabulary Manager, Mondeca Intelligent Topic Manager, and VocBench.

Visualizations of taxonomies, thesauri, and ontologies


Interactive visualization charts/graphs of taxonomies (what I shall call all controlled vocabularies here) are not something I had paid much attention to, because the feature is not considered so important by a professional taxonomist for creating taxonomies. However, while taxonomists are the primary users of taxonomy management software, other stakeholders in taxonomies are important secondary users. These people include content managers, content strategists, project managers, knowledge managers, information product managers, user interface/experience designers, and subject matter experts. Rather than creating taxonomies, these various stakeholders need to view draft taxonomies and provide feedback on them. Viewing the taxonomy in the user interface used by the taxonomist is often not practical or intuitive. However, viewing the taxonomy as the end-user will view it may not be possible, because the taxonomy has not yet been implemented into its final system or product. Therefore, a taxonomy visualization feature of taxonomy management software can be quite useful for stakeholder review and input.

Visualizations are especially useful for ontologies with their semantic relationships, but they are also helpful for taxonomies and thesauri. With the convergence of taxonomy, thesaurus, and ontology-creation capabilities in the same software, vocabulary visualization has become a more common feature. However, they are not the same in all vocabulary management software products. Following are some varied examples of visualizations. In many cases, they are interactive, whereby the user can drag and reposition the nodes.

Data Harmony Thesaurus Master offers a “sunburst” visualization for hierarchical taxonomies, as an alternative to the inverted tree display, which is available in the editing interface of the software.

Taxonomy visualization from Data Harmony Thesaurus Master
Visualization from Data Harmony Thesaurus Master

Synaptica KMS has a node and link relationship display for taxonomies and thesauri, where relationships do not need to be defined. Synaptica Graphite will have a new directed-graph visualizer feature added later this year.

Thesaurus visualization from Synaptica KMS
Visualization from Synaptica KMS


Semaphore, Mondeca, and TopBraid EDG Vocabulary Management each have a node and link relationship display for ontologies that additionally describes the types of relationships.



Ontology visualization from Smartlogic Semaphore
Visualization from Semaphore



Ontology visualization from Mondeca ITM
Visualization from Mondeca ITM




Visualization from TopBraid EDG Vocabulary Management



PoolParty offers a different type of visualization, focusing on the relationships of a selected concept, with each type color-coded. 
Visualization of a taxonomy concept from PoolParty
Visualization from PoolParty



In combination with other graph database tools, both Synaptica Graphite and PoolParty can support interactive nonhierarchical visualizations and graph analytics. This brings us to our next topic, knowledge graphs, which I will discuss in my next blog post.

Monday, February 29, 2016

Free Taxonomy Management Software

There is always an interest in free taxonomy or thesaurus management software. Many people who create taxonomies try to save money on purchasing taxonomy management software by simply not using any taxonomy management software but something else they already have, such as Excel. Those who are developing either very large taxonomies or more complex thesauri, however, realize that a dedicated taxonomy/thesaurus management system will save a lot of time and headache in the long term.

Various free thesaurus management software offerings have been available since the early 1990s. They tend to have their origins in academic projects in computer science, information science, or library science at universities, and others have been government projects. Some free software of the previous decade is no longer available, though. Discontinued software is still listed for posterity on the web directory of "Software for building and editing thesauri," started by Leonard Will and now managed on the Taxobank website. For example, two free software products listed were for MS-DOS and one no later than Windows 3.1.

The first free thesaurus software I was familiar with was TheW, a simple thesaurus management program developed by Tim Craven, a professor of information science at the University of Western Ontario, since retired. I actually ran across it because I was at the time exploring another software program of Prof. Craven’s for creating website indexes. TheW32, which is available for Windows XP, Vista, and 8 and for Java, is no longer maintained. It was last updated for Windows in 2007 and for Java in 2009. At this point, I would no longer recommend it.

Protégé Ontology Editor is an established free and open-source ontology editor from Stanford University. It is quite robust, has an active user community and support groups, and continues to be upgraded (with version 5.0.0 recently released in beta). The issue with Protégé is that it is a native ontology management tool, not a thesaurus management program (or even ontology “lite” as some thesaurus management software can manage semantic relationships and classes). Thus, it takes a very different approach to modeling and building vocabularies, which is not intuitive to taxonomists, such as myself, and, although I downloaded it, I never found it worth the difficulty to learn. If you can truly consider yourself an ontologist, though, then great, this might just be the solution for you.

I had explored some other free software offerings when writing my book, The Accidental Taxonomist, six years ago and came across TemaTres and ThManager. At the time I did not find them adequately enforcing valid relationships between terms, so I was somewhat dismissive about the software. Recently I revisited these products.

TemaTres, which has its origins in the Library and the University of Buenos Aires, Argentina, still does allow creating duplicate terms, which was my initial cause for concern, but since then the user interface of the latest version (2.1) offers a new configuration option for quality policies, to enable or disallow duplicate terms. Thus, TemaTres is a suitable free thesaurus software product if used by a knowledgeable and experienced taxonomist who knows to set the options and understands the alerts. TemaTres is being supported, and its latest version was released just this winter, 2016. The software is web-based, which means that it requires a PHP, MySQL, and HTTP web server, so it may not be the configuration that any independent taxonomist would set up and install in a small/home office. Otherwise, TemaTres is worth looking into.

ThManager is from the University of Zaragoza and GeoSpatiumLab S.L., both in Zaragoza, Spain. ThManager supports the SKOS standard rather than ANSI/NISO Z39.19 or ISO 25964, which means it does not by default enforce all rules of the latter standards. But I have since found this to be a trend of new vocabulary management software: compliance with SKOS and support for ANSI/NISO Z39.19 or ISO 25964, as configurable rather than by default. Thus, I am no longer complaining if it does not support ANSI/NISO Z39.19 by default. The main problem with ThManager, though, is that it is not kept so well up to date. It was last significantly updated in 2006. The installation for even Windows 7 requires a “portable” version due to an installation bug.

More recently I discovered another free thesaurus management software, VocBench. It was developed originally for the management of the AGROVOC thesaurus of the Food and Agriculture Organization (FAO) of the United Nations as a joint project of FAO, which is based in Rome, Italy, and the Artificial Intelligence Research group at the University of Rome Tor Vergata. VocBench, like TemaTres, is SKOS-compliant, rather than ANSI/NISO Z39.19 compliant. VocBench is web-based, with web server requirements of Apache Tomcat, MySQL, and OWLIM installed on a Sesame2 server.

In addition to being free, these applications tend to have the advantage of being able to run on multiple platforms and yet can be installed and used by a single user. The editing features may be a little less standard and thus less intuitive, and documentation and support tend to be less extensive than for commercial software. Yet, they are worth considering for long-term experimentation (with no time limit as in commercial demo software), for use in nonprofit or low-budget projects, or by anyone with a strong interest in working with open source software.

Saturday, January 31, 2015

Taxonomy Software Trends



I reviewed various taxonomy/thesaurus management software offerings recently, in preparation for the last of my 3-part webinar series, Practical Taxonomy Creation, and I noticed some trends since I last looked into software in such detail for my book over 5 years ago: more cloud/web-based software, more SKOS/RDF/Semantic web framework software, and more plugins to SharePoint, content management systems, and search engines.

The number of commercial vendors selling taxonomy/thesaurus management software is not significantly different, as some have left the market, and others have entered, and the rest have continued with updates and improvements. There are fewer commercial low-end, inexpensive, single-user desktop offerings, however. Products I have reviewed in the past and that have gone away include Webchoir TCS-10 Personal and Term Tree 2000. The Mac OS program Cognatrix has been unavailable for the past year, although the vendor intends to release it again as an Apple App Store program following the release of the next major version of Mac OS.

Subscription, web-based software


Synaptica pioneered web-based thesaurus management software when it introduced its product in 1995, when the Web was still young, but now other vendors also offer web-based subscription software. Data Harmony Thesaurus Master from Access Innovations was originally only available in a Java-based multi-platform client-server installation. For the past several years a web version has also been available, and Access Innovations president Marjorie Hlava said in an email: “Increasingly our customers use the cloud version of the software.” Newer thesaurus management software products to the market have also been solely cloud-based. These include PoolParty, introduced by the Semantic Web Company in 2009, and TopBraid Enterprise Vocabulary Net (EVN), released by TopQuadrant in 2010. Meanwhile Synaptica began offering Synaptica Express, a cloud-computing solution for individuals and smaller businesses. Finally, the long affordable mainstay MultiTes Pro, a Windows-based desktop program that has been available since 1983 in a single-user version and then also for multiple users, introduced a multi-user cloud version about six years ago, which in 2013 was updated and renamed as MultiTes Online.

The cloud-based software offerings are, of course, priced on annual (and in one case, monthly) subscription fees, instead of one-time license costs with lesser priced updates. Hopefully this means that more organizations will try out developing a taxonomy in the appropriate tool with the reduced commitment of cost for a shorter time.

SKOS/RDF/OWL Semantic web framework software


Supporting linked data and interoperability with Semantic Web content has become more important. Therefore, World Wide Web Consortium (W3C) recommendations, such as the SKOS (Simple Knowledge Organization System) framework, RDF (Resource Description Framework) specifications, and OWL (Web Ontology Language), are being adopted by newer thesaurus/taxonomy software. The newer products, PoolParty and TopBraid EVN, are both built around SKOS models. Synaptica and Data Harmony Thesaurus Master have been able to export to a SKOS and OWL schema for a long time, but it was only in 2013 that Data Harmony added user-defined fields to the SKOS export to include all fields in a term record. Additionally, in 2011 Synaptica introduced an Ontology Publishing Suite to publish an ontology or part of an ontology to the Web.

My first criterion for thesaurus management software is that it enforces relationship rules in accordance with the ANSI/NISO Z39.19-2005 standard. SKOS is not an alternative standard, but rather a framework that can be followed in addition to ANSI/NISO Z39.19-2005. Ideally a software product complies with both, and some now do.

Plug-ins and connectors for search and content management


The most common software for internal content management (even though it is not really a content management system) is SharePoint. Prior to 2010, SharePoint handled controlled vocabulary metadata in such a simple way (not even in hierarchies) that there was no point in trying to use taxonomies. Starting with SharePoint 2010, with its Managed Metadata Services, taxonomies can now be utilized in its Term Store. However, despite Term Store improvements from SharePoint 2010 to 2013, it is still far from having the features and capabilities of a dedicated thesaurus management software product. Thus, ideally you create the taxonomy in the dedicated tool and port it over to SharePoint, and now almost all enterprise-level thesaurus management software products have methods to connect to SharePoint, whether through APIs, plug-ins, or dedicated “connector” modules.

There are also increasing numbers of content management systems and search software products being supported by thesaurus management connections. For example, SmartLogic Semaphore Ontology Manager has integrations with a greater number of applications than in the past, including SharePoint, Google Search Appliance, Apache Solr, OpenText, MarkLogic, and IBM Watson. PoolParty has a WordPress plugin, in addition to integrations with SharePoint and Drupal. Surely more such connections will be added, as I have recently heard of requests for taxonomy imports into Drupal.