Friday, September 13, 2019

SEMANTiCS conference

I attended the 15th annual SEMANTiCS conference this week for the first time. Semantics means “meaning” in language, and in the context of taxonomies and other controlled vocabularies (knowledge organization systems) semantics is a given. We taxonomists don’t concentrate on the topic of semantics that much, because it’s a basic characteristic of knowledge organization systems, which focus on concepts and their meanings, rather than just words. Tagging/indexing with a taxonomy or other kind of knowledge organization system may even be called “semantic enrichment.” Semantics is not a given, however, in related areas of information technology and data science. Still, awareness of and interest in how technology and semantics can support each other, for better utilization of information, are growing, as this conference demonstrates. This may involve technologies and standards of the Semantic Web, but uses go beyond the Web to include various internal enterprise applications.

SEMANTiCS Karlsruhe 2019 conference

SEMANTiCS is a European conference that rotates among different cities. This year the conference was in Karlsruhe, Germany, for the first time, which turns out to be somewhat of a technology center. Before I went, someone told me to expect European conferences which are not merely spinoffs of American conferences to be different, with perhaps less intermingling, socializing, and networking. That was certainly not the case. I found the attendees, whether German or from other European countries, to be very friendly and open to speaking with and connecting with new colleagues, whether myself or others. So, it was definitely a good networking opportunity.

The SEMANTiCS conference is more in the area of information technology and data science than in the fields of content/knowledge management, where we taxonomists tend to be. Of course, it was not just about technology, but rather about the added “semantic layer.” What I liked is that it brought together taxonomists (I was not the only one) with those who work in technology (software developers, solutions architects, computer scientists, data scientists, etc.). The theme of the conference was knowledge graphs and AI, which have also become themes of the Taxonomy Boot Camp conferences recently. Ontologies, another specialty that bridges the work of taxonomists and computer scientists, were also a focus of this conference. Other topics included machine learning, data governance, and knowledge management.

Heather Hedden presenting at SEMANTiCS 2019 Karlsruhe
The SEMANTiCS conference is unusual in how it bridges industry and academia. It has both industry presentations and academic papers, each with separate conference chairs/review committees, and with the academic papers to be published as conference proceedings. Yet the presentations were not in separate tracks: industry and academic presentations were combined into the same sessions by theme. Session themes included knowledge graphs, natural language processing, semantic information management, knowledge discovery & semantic search, knowledge extraction, data integration, and also thesaurus & ontology management (in which I presented). There were also subject-themed tracks on legal technology and on digital humanities/cultural heritage. In each time slot there were five concurrent sessions.

SEMANTiCS is not put on by an event company, but is rather a collaborative effort of several organizations, companies and educational institutions, with some variation, depending on the location. The Semantic Web Company has been a consistent organizer/sponsor. Others this year included FIZ Karlsruhe and several European universities.

By the numbers, the conference had 472 registered attendees and 25 sponsors, of which 15 were also exhibitors. There were 37 industry presentations, 28 academic paper presentations, 5 keynote/plenary presentations, 2 invited talks, 1 panel discussion, 31 posters, and 9 preconference workshops/tutorials. This was the largest SEMANTiCS conference to date.

SEMANTiCS Karlsruhe 2019 conference gala dinner
Attendees gather for the conference gala dinner
Particularly exciting was the announcement that, in addition to next September’s conference in Amsterdam, for the first time SEMANTiCS will come to the United States, scheduled for April 21-23 in Austin, Texas: SEMANTiCS Austin 2020. (Call for proposals due November 8.) Lead organizers are the Semantic Web Company and Enterprise Knowledge. The conference won’t be identical to the European version, as it will not have academic papers, but it promises to be very interesting and informative, and I plan to be there.

Thursday, August 22, 2019

Taxonomy Mapping

As more taxonomies get created, we see a growing need to “map” taxonomies to each other, that is, to link individual terms or concepts in one taxonomy to those in another so that the taxonomies may be used in some combination. Mapping is not new, but as it has become more frequent it is now reflected in newer standards and in taxonomy management software features.

Diagram of mapping taxonomies
Mapping taxonomies

Reasons or use cases for mapping include:
  • Selected content with an enterprise taxonomy is made available on a public web site with a different public-facing taxonomy.
  • A provider of scientific/technical/medical content with a technical thesaurus creates a simpler taxonomy aimed at laypeople.
  • Content will be made available in a different language region, and a comparable taxonomy already exists in the other language.
  • A knowledge graph is built to aggregate data from multiple repositories, each with its own taxonomy.
  • An enterprise search is based on “federated search” and different areas have different search-support thesauri.
  • Terms from search engine logs are mapped to a taxonomy to add alternative labels.
  • Terms from an open source or licensed vocabulary are mapped to a taxonomy to enrich it.

I’ve worked on occasional taxonomy mapping projects since the late 1990s, and I discuss mapping in a section of my book, The Accidental Taxonomist (2nd edition, pp. 369-73) and in an earlier blog post. I’ve also presented at conferences before on mapping taxonomies, as early as 2009, but only briefly and in the wider context of related activities of merging taxonomies and creating multilingual taxonomies. My next conference presentation (not including a pre-conference workshop), “Mapping Taxonomies, Thesauri, and Ontologies” (SEMANTiCS 2019 in Karlsruhe, Germany), will be dedicated to the subject of mapping.

In talking recently with more people about mapping, both clients and software vendors, I’ve learned that my previous view of mapping was somewhat narrow. I had considered mapping to be only one-way directional from terms in a tagged taxonomy to terms in a retrieval taxonomy. 

Diagram of one-way taxonomy mapping
One-way directional taxonomy mapping
I still think this model applies to the majority of use cases, but mapping has a broader meaning in the standards and in taxonomy management software capabilities.

Standards for Taxonomy Mapping

The SKOS (Simple Knowledge Organization System) W3C standard, adopted in 2009 as a controlled vocabulary model and interchange format, specifies not only the familiar thesaurus relationships of broader, narrower, and related, but also what are called mapping relationships, comprising exactMatch, closeMatch, broadMatch, narrowMatch, and relatedMatch. How these different mapping relationship types are to be used is really up to the taxonomy owner. The broadMatch and narrowMatch relationships are directional but reciprocal, so using these permits bidirectional mapping. However, there is no reason why you cannot use just one mapping relationship type if you are mapping in only a single direction. Or you could use just two, such as exactMatch and broadMatch.
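To illustrate the reciprocal nature of broadMatch and narrowMatch, here is a minimal Python sketch. It treats a mapping as a simple (subject, property, object) triple; the concept identifiers such as "a:Ultrabooks" and the function names are hypothetical, invented only for this illustration, and a real project would more likely rely on the taxonomy management software or an RDF library.

```python
# Inverse of each SKOS mapping property. exactMatch, closeMatch, and
# relatedMatch are symmetric; broadMatch and narrowMatch invert each other.
INVERSE = {
    "skos:exactMatch": "skos:exactMatch",
    "skos:closeMatch": "skos:closeMatch",
    "skos:relatedMatch": "skos:relatedMatch",
    "skos:broadMatch": "skos:narrowMatch",
    "skos:narrowMatch": "skos:broadMatch",
}


def reciprocal(mapping):
    """Return the same mapping as seen from the other vocabulary."""
    subject, prop, obj = mapping
    return (obj, INVERSE[prop], subject)


def with_reciprocals(mappings):
    """Return the mappings plus their reciprocals (bidirectional mapping)."""
    result = set(mappings)
    for mapping in mappings:
        result.add(reciprocal(mapping))
    return result


# Hypothetical concepts: "a:Ultrabooks" has the broader match "b:Laptops".
one_way = [("a:Ultrabooks", "skos:broadMatch", "b:Laptops")]
two_way = with_reciprocals(one_way)
```

If you are mapping in only one direction, you would simply keep the `one_way` list and never generate the reciprocals.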

The international standard ISO 25964-2 Thesauri and Interoperability with Other Vocabularies – Part 2: Interoperability with Other Vocabularies (published in 2013) is substantially about mapping. Interoperability is not synonymous with mapping but covers more, including using a standard format such as SKOS. However, the ISO standard discusses mapping in more detail than any other form of interoperability. The introduction states that “inter-vocabulary mapping will be the principal focus of this part of ISO 25964.” (The slightly older American standard, ANSI/NISO Z39.19-2005, is comparable with ISO 25964 Part 1, which is all about thesauri, and lacks any explanation of mapping.) While SKOS provides standardized labels, useful for porting and linking vocabularies between different systems and the web, ISO 25964-2 provides guidance on the theory and practice of various types of mappings.

ISO 25964-2 defines mapping broadly as the “process of establishing relationships between the concepts of one vocabulary and those of another.” Like SKOS, it also covers different kinds of mapping relationships, although it describes more types: equivalence, compound equivalence, hierarchical, associative, exact, inexact, and partial equivalence. It also discusses mapping at a high level, between pairs of vocabularies or among multiple vocabularies, and in what kind of direction/arrangement. The standard also includes examples. There is really a lot to consider, and I’ll definitely re-read ISO 25964-2 in detail before embarking on my next mapping project.

Software for Taxonomy Mapping

When I first did taxonomy mapping, Excel files of each vocabulary were compared with either the features of Excel or through scripting. Now, mapping can also be done within taxonomy management software, once both vocabularies are in the software, usually requiring that at least one be imported.

As most commercial taxonomy/thesaurus/ontology management software now supports the SKOS standard, such software also supports the SKOS mapping relationships between vocabularies. The leading vendors, PoolParty, Smartlogic, and Synaptica, additionally include an auto-mapping tool that uses “smart” or “fuzzy” match techniques, including some stemming, to automatically match equivalences or near-matches between concepts in two different vocabularies, which can then be manually reviewed and approved or rejected. For the mapping to be done correctly, a taxonomist should perform this review. Automated mapping also takes alternative labels (nonpreferred terms) into consideration and creates a proposed match if an alternative label in one vocabulary matches a preferred label in another.
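The general idea of auto-mapping can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any of the vendors’ tools actually work: it uses the standard library’s difflib for fuzzy matching, a deliberately crude plural-stripping step in place of real stemming, and an arbitrary threshold of 0.85. The concept identifiers and labels are made up, and the output is a list of proposed matches that a taxonomist would still need to review.

```python
from difflib import SequenceMatcher


def normalize(label):
    """Lowercase the label and crudely stem each word by dropping a final 's'."""
    words = label.lower().split()
    return " ".join(w[:-1] if w.endswith("s") and len(w) > 3 else w for w in words)


def propose_matches(source, target, threshold=0.85):
    """Propose candidate matches between two vocabularies.

    source and target map a concept id to its list of labels
    (preferred label first, then alternative labels).
    Returns (source id, target id, score) tuples for manual review.
    """
    proposals = []
    for s_id, s_labels in source.items():
        for t_id, t_labels in target.items():
            # Compare every label pair, so an alternative label in one
            # vocabulary can match a preferred label in the other.
            best = max(
                SequenceMatcher(None, normalize(a), normalize(b)).ratio()
                for a in s_labels
                for b in t_labels
            )
            if best >= threshold:
                proposals.append((s_id, t_id, round(best, 2)))
    return proposals


# Hypothetical vocabularies: "Cars" is only an alternative label in the source.
source_vocab = {"a:1": ["Automobiles", "Cars"], "a:2": ["Bicycles"]}
target_vocab = {"b:7": ["Cars"], "b:8": ["Motorcycles"]}
candidates = propose_matches(source_vocab, target_vocab)
```

Here the only candidate is the pair a:1 and b:7, found through the alternative label “Cars.” Lowering the threshold would surface more near-matches, at the cost of more proposals to reject during review.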

The software’s mapping feature is agnostic to your intentions and direction of mapping, so it’s important to plan the mapping so that it supports mapping in the direction you want. In addition to terms with equivalent meaning, it is also acceptable to map from a narrower to a broader concept as the narrower is an example of the broader and can be used for it, but the mapping won’t work in the other direction. It is also acceptable to map from a term that is a preferred label to a concept where that term is an alternative label/nonpreferred term, and that mapping also won’t work in the other direction.
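The directionality described above can be made concrete with a small Python sketch. It assumes one-way mappings from a tagging taxonomy to a retrieval taxonomy, expressed with SKOS-style relationship names, and it only follows exactMatch and broadMatch (narrower to broader) when retrieving content. All identifiers, document names, and the function name are hypothetical.

```python
# Relationship types that are safe to follow from the tagging taxonomy to
# the retrieval taxonomy: equivalent concepts, or a narrower concept mapped
# to a broader one (the narrower is an example of the broader).
SAFE_FOR_RETRIEVAL = {"exactMatch", "broadMatch"}


def retrieve(retrieval_concept, mappings, tagged_content):
    """Return documents retrievable under a concept of the retrieval taxonomy.

    mappings: list of (tagging concept, relationship, retrieval concept).
    tagged_content: dict of tagging concept -> list of document ids.
    """
    docs = []
    for source, relationship, target in mappings:
        if target == retrieval_concept and relationship in SAFE_FOR_RETRIEVAL:
            docs.extend(tagged_content.get(source, []))
    return sorted(docs)


# Hypothetical data for illustration only.
example_mappings = [
    ("tag:Laptops", "exactMatch", "ret:Laptops"),
    ("tag:Ultrabooks", "broadMatch", "ret:Laptops"),  # narrower -> broader
    ("tag:Computers", "narrowMatch", "ret:Laptops"),  # broader -> narrower
]
example_content = {
    "tag:Laptops": ["doc1"],
    "tag:Ultrabooks": ["doc2"],
    "tag:Computers": ["doc3"],
}
laptop_docs = retrieve("ret:Laptops", example_mappings, example_content)
```

In this sketch the document tagged with the broader concept “Computers” is left out of the results for “Laptops,” because content about computers in general is not necessarily about laptops.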

If planning your mapping project seems daunting, the software vendors PoolParty, Smartlogic, Synaptica, and Access Innovations (vendor of Data Harmony Thesaurus Master) will provide assistance or the full service of mapping. In fact, Access Innovations has not included an auto-mapping feature in DH Thesaurus Master, because customized results may be better achieved through offline mapping.

Mapping is not just between taxonomies, but can be between taxonomies and thesauri, thesauri and ontologies, or other controlled vocabularies, something else that ISO 25964-2 covers. If you need assistance with mapping, I'd be happy to help.

Friday, July 19, 2019

Onsite Corporate Taxonomy Training

I enjoy teaching about taxonomies. The feedback I get from my students or workshop participants helps me improve my methods of communication, teaching, and consulting, and I learn about the varied implementations of taxonomies. The courses evolve and improve over time. I teach online courses, conference workshops, and corporate onsite workshops. I’ve been making enhancements to the last of these offerings and this week led a two-day onsite workshop at a major company on the West Coast.
Heather Hedden leading an onsite corporate training workshop in taxonomy design and creation.

Accommodating a varied audience


The participants in my “introductory” workshops, whether at conferences or at their corporate offices, have varied knowledge and experience with taxonomies. Some are complete beginners and are curious to learn about taxonomies and what they can do. Others have been tasked to build a taxonomy with little instruction and are looking for best practices and guidelines. Some have read my book but have not had the opportunity to put what they have read into practice, so the workshop’s exercises are very helpful. Finally, some participants are experienced taxonomists seeking to fill in the gaps in their knowledge.

The absolute beginners may feel overwhelmed at the amount of information on taxonomies presented in one of my workshops, but I feel it’s important to provide enough instruction to enable people to actually create basic taxonomies (while ideally still getting feedback from someone more experienced). Also, I expect people to combine instruction from my workshop with other methods of learning taxonomies, such as reading my book, taking my online course, attending conference sessions on taxonomies, or getting advice from a taxonomist in their organization. While I would like to offer more advanced workshops, it’s difficult to find enough experienced practicing taxonomists at the same location. (At a conference it is possible, but sometimes conference organizers equate advanced taxonomy topics with ontologies.)

Interactive exercises

Workshop participants doing a card-sorting exercise

Participants like interactive or hands-on exercises. One of the learning benefits of my onsite workshops is that they include interactive exercises that involve the entire group or class. My online course includes exercises or assignments, so that students learn from the practice and from the feedback I provide, but only the onsite workshops offer the opportunity to work on assignments with others and thus learn from them. Creating taxonomies, like designing websites or software user interfaces, needs to consider different views and is somewhat subjective, which is why learning from others in a classroom setting is so valuable.

Small-group exercises are the best for this kind of learning. My full-length workshops include small-group exercises for designing a set of facets and for doing a card-sorting exercise to categorize topics. Groups may comprise from three to six participants, depending on the total number. In addition to hearing ideas from their group members, participants then share the resulting taxonomy outline with the larger class, and I provide comments. Even exercises that do not involve small groups, but are assignments to consider and shout out answers, are beneficial, because we obtain, discuss, and evaluate various answers beyond the answers that any one individual might consider.

Remote participation is also possible, especially if the remote participants are co-located in the same office. They can form their own small group for the small group exercises, and they can do the card-sorting exercise online. This was the case in my latest corporate workshop.

Customizing corporate workshops


Heather Hedden leading a corporate onsite training workshop in taxonomy design and creation
To what extent I should customize the workshops for a specific organization was a question when I first offered corporate workshops. It’s not necessary, nor worth the time, to customize every example of taxonomy terms in the workshop presentation with something from the client’s domain of content. Rather, I found that it is sufficient yet instructive to customize just a few slides, such as those with examples of content types and use cases.

Another way I customize the workshops is by the outline and topics included. While all workshops include the basics (taxonomy types, definitions, uses and benefits, standards, structural design, best practices for creating terms and relationships, and governance), optional topics include: user interface display options, metadata and taxonomies, testing taxonomies, tagging, mapping taxonomies, multilingual taxonomies, integration with search, and taxonomy management software.

Finally, I customize the group exercises so that the choices for topics for facets would be applicable, and the card-sorting exercise may take an actual example, especially if the client has a public taxonomy I can use as a basis for the exercise. I also include discussion questions, so that the participants can share and discuss the taxonomy issues as pertinent to their organization. In any case, I sign an NDA, so the client can comfortably share information with me which I may use in the workshop.

Continuous improvement 


I found that by asking the client for some input on possible customization, I can also generalize the issues to enhance the workshop presentation for future use. In other words, the client input on “customization” is not always that, but rather leads to a general improvement. The result has been to make the workshop presentation based more on real-world scenarios and less theoretical than my previous conference presentations. I actually did not consider my conference presentations to be that theoretical in the first place (since, after all, my knowledge of taxonomies is based on my work experience, not on studies for a degree in library/information science). But now I have made the workshops even more practical.

Input from the client can also lead to topics for clarification, such as differing use of terminology. For example, a client wanted me to discuss taxonomy “mapping,” which we taxonomists understand to mean the creation of equivalence links between terms in one taxonomy and another, so that one taxonomy may be used to retrieve content that was tagged in the other taxonomy. However, what my client meant by “mapping” was a kind of “see also” related-term relationship between terms in two different taxonomies. Now I know to clarify and discuss both kinds of links between taxonomies.

Just as I am an accidental taxonomist and then an accidental consultant, so am I now also an accidental trainer. Details of my corporate training offerings are on my website.