Sunday, November 17, 2019

Taxonomy Boot Camp Conferences 2019

Taxonomies may seem like a very niche specialization, but interest in them keeps growing, as indicated by participation in the conferences dedicated to taxonomies, Taxonomy Boot Camp in Washington, DC (TBC) and Taxonomy Boot Camp London (TBCL). TBC, now in its 14th year, was held November 4 and 5, and TBCL, now in its 4th year, was held October 15 and 16. New people continue to attend: by a show of hands, a large majority of TBC attendees, perhaps 75%, were there for the first time, and more than half of the TBCL attendees were also first-timers. TBCL also increased the number of its preconference workshops to four this year. While I didn't get official numbers, overall attendance also seems to be rising.

Taxonomy Boot Camp London sessions


TBCL’s theme, "Anything is possible," while not exactly a unifying theme, emphasized the diversity of applications of taxonomies. Sessions which may be considered related to this theme included those on knowledge graphs, search, blockchain, automatic tagging, taxonomy interoperability, and machine learning. Case study presentations included BBC content tagging, maintaining large complex taxonomies at CAB International and SAGE Publishing, healthcare taxonomies of Elsevier and NHS Digital, and public sector taxonomies. Practical sessions from experienced taxonomists included presentations on taxonomy software selection, taxonomies in SharePoint, validating a taxonomy with stakeholders, and selling the value of taxonomies.

TBCL sessions this year that I found particularly interesting included Maura Moran's presentation on how to sell your organization on the value of taxonomy, get agreement, and start organizing information silos. I found her advice on working with stakeholders relevant to my work. Patrick Lambe's presentation on the capabilities that taxonomists need to acquire was also good. Agnes Molnar gave an informative presentation, "Extending SharePoint Taxonomy," which explained a method, using third-party tools and technology, to overcome the various shortcomings of SharePoint in supporting robust taxonomy features.

TBCL had taxonomy-related talks for the keynotes on both mornings. Tuesday's keynote by Emma Chittendon dealt with the topic of term labels, and Wednesday's keynote by Nick Poole dealt with the ethics of structured information.

Taxonomy Boot Camp (DC) sessions


Three weeks later, TBC's theme was "Building Strong Foundations," which is what taxonomies are basically for. Taxonomy is like infrastructure, and, as one speaker said, as such it goes unnoticed until there is something wrong with it. Presentations that fit into this theme of foundations included the opening taxonomy workshop (1.75 hours in the Basics track the first morning), defining the business case for a taxonomy, managing stakeholder input, taxonomy governance, tagging with a taxonomy, and content models. Case studies included improving content quality, reuse, and reporting at Intel, a taxonomy and metadata enrichment initiative scaled with AI at Sony Pictures Entertainment, the alignment of siloed taxonomies at Travelers Insurance, ambiguities in a retail taxonomy at Zappos, and tagging that supports personalization at Salesforce.com.

TBC sessions that I found particularly useful included Erica Chao's presentation "5 Essential Components of Taxonomy Governance," Michele Ann Jenkins' presentation "Managing Stakeholder Input," and Carrie Hanes' presentation "Content Models and Taxonomies."

Distinct conferences


As similar as TBC and TBCL are in their subject scope and detail and in their diverse audiences, the two conferences maintain their own distinct character, largely due to the vision and leadership of their respective conference chairs, consultants Stephanie Lemieux of Dovecot Studio for TBC and Helen Lippell for TBCL.

Helen summarized this year's TBCL to me: "I was really thrilled with the energy and passion of the audience at Taxonomy Boot Camp London 2019. We always try to put together a programme that offers something for everyone, whether they're total beginners, or expert practitioners pushing the boundaries. When I wasn't running around and could actually sit in the talks, I thoroughly enjoyed every single one."

Stephanie shared her thoughts with remarks at the TBC opening: "One of the main things I love about this event: the diversity of experience that it brings together.... What we all have in common, regardless of where you are in the journey, is that we are all architects and custodians of incredibly important foundational pieces of any information ecosystem."

So, if you're just getting started with taxonomies, then either conference, whichever is more convenient, is appropriate. If taxonomies are your profession, then you should try to attend each conference at least once. It’s worth the trip.

Thursday, October 31, 2019

Managing Tagging with a Taxonomy



A lot of work can be put into designing and creating a taxonomy, but if it’s not implemented or used properly for tagging or indexing, then that work can be wasted. As the volume of content has grown, many organizations have invested in auto-tagging/auto-categorization solutions utilizing text analytics technologies. However, there remain many situations where manual tagging is still more practical. So, support for correct and efficient manual tagging needs to be considered. This is the topic of my upcoming presentation at the Taxonomy Boot Camp conference, in Washington, DC, on November 4.



A taxonomy can be designed to support manual tagging by including alternative labels (synonyms), hierarchical and associative relationships between terms, and term notes, to guide those doing the tagging to the most appropriate terms, even if these taxonomy features are not fully available to end-users in their user interface. It may be easier to make these features available in a customized manual tagging/indexing tool than in the end-user application. A taxonomy has more than one set of users, and the users who tag need the full benefits a taxonomy can offer.
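To make the idea concrete, here is a minimal Python sketch of how a customized tagging tool might use alternative labels to steer a tagger to the preferred term, while also showing the broader term, related terms, and term note so the tagger can confirm the choice. The mini-taxonomy, field names, and function names are invented for illustration and do not come from any particular product.

```python
# Hypothetical mini-taxonomy for illustration only; terms and notes are invented.
TAXONOMY = {
    "Automobiles": {
        "alt_labels": ["Cars", "Autos", "Motor vehicles"],
        "broader": "Vehicles",
        "related": ["Driving"],
        "note": "Use for passenger vehicles; for trucks use Trucks.",
    },
    "Trucks": {"alt_labels": ["Lorries"], "broader": "Vehicles", "related": [], "note": ""},
    "Vehicles": {"alt_labels": [], "broader": None, "related": [], "note": ""},
    "Driving": {"alt_labels": [], "broader": None, "related": ["Automobiles"], "note": ""},
}


def build_lookup(taxonomy):
    """Map every preferred and alternative label (case-insensitive) to its preferred term."""
    lookup = {}
    for preferred, data in taxonomy.items():
        lookup[preferred.casefold()] = preferred
        for alt in data["alt_labels"]:
            lookup[alt.casefold()] = preferred
    return lookup


def suggest(entered, taxonomy, lookup):
    """Return the preferred term for what a tagger typed, plus context, or None if unknown."""
    preferred = lookup.get(entered.strip().casefold())
    if preferred is None:
        return None
    data = taxonomy[preferred]
    return {"use": preferred, "broader": data["broader"],
            "related": data["related"], "note": data["note"]}


lookup = build_lookup(TAXONOMY)
print(suggest("lorries", TAXONOMY, lookup))
```

A tagger who types "lorries" is pointed to the preferred term Trucks, and an unknown entry returns None, which a tool could treat as a prompt to ask the taxonomist for a new term.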

It’s very important to develop a customized policy for tagging with a taxonomy, so that it is used correctly and consistently. Any policy for tagging or indexing should include both rules and recommended guidelines. Examples of policy topics include:

  • Criteria for determining topic or name relevancy for tagging
  • Depth and level of detail of tagging
  • Comprehensiveness of aspects (what, who, where, when, how, why, etc.)
  • Required term types/facets (and any dependencies)
  • Number of terms (of each type) to tag
  • Tagging of certain terms in combination (e.g., a parent/broader term in addition to its narrower/child term)
  • Other types of metadata that must be entered
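Some of these policy topics can be checked automatically before tagged content is saved. The Python sketch below assumes a hypothetical policy (required facets, a maximum number of terms per facet, and an optional rule to tag a parent term along with its child) and a hypothetical hierarchy. A real policy would be set for each organization and project, and criteria such as relevancy and depth still require human judgment.

```python
from dataclasses import dataclass


# Hypothetical policy settings; real values come from the organization's tagging policy.
@dataclass
class TaggingPolicy:
    required_facets: set[str]
    max_terms_per_facet: dict[str, int]
    tag_parent_with_child: bool = False  # also require the broader term when a child is tagged


# Hypothetical hierarchy: child term -> parent term.
PARENT = {"Solar power": "Renewable energy", "Wind power": "Renewable energy"}


def check_tags(tags, policy):
    """tags maps facet name -> list of terms; returns a list of policy problems found."""
    problems = []
    for facet in sorted(policy.required_facets):
        if not tags.get(facet):
            problems.append(f"missing required facet: {facet}")
    for facet, terms in tags.items():
        limit = policy.max_terms_per_facet.get(facet)
        if limit is not None and len(terms) > limit:
            problems.append(f"too many {facet} terms: {len(terms)} > {limit}")
        if policy.tag_parent_with_child:
            # This sketch only looks for the parent within the same facet.
            for term in terms:
                parent = PARENT.get(term)
                if parent and parent not in terms:
                    problems.append(f"{term} tagged without its parent {parent}")
    return problems


POLICY = TaggingPolicy(required_facets={"Topic", "Region"},
                       max_terms_per_facet={"Topic": 3},
                       tag_parent_with_child=True)
print(check_tags({"Topic": ["Solar power"]}, POLICY))
```

Such checks catch omissions, but they do not replace review of whether the chosen terms are the right ones.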

It’s often not enough to just provide people with a policy document. Some degree of training on proper tagging can be very beneficial. In a current SharePoint taxonomy project, one of the users who tags uploaded documents said to me, “The problem is that we have not been trained. We are guessing.” Policy and guidelines should initially be delivered as a presentation (live or web meeting) to allow for questions and answers.

With large volume tagging, the initial tagging should be reviewed and feedback should be provided. This is the case for both new and experienced indexers. Even experienced indexers need to become familiar with the content and learn the policies and guidelines that are particular to the organization and project. In a recent taxonomy project that involved indexing hundreds of articles by a professional indexer, even the professional indexer’s initial indexing was reviewed to make sure it was as thorough and accurate as required.

Finally, there needs to be a method of communication and feedback between those doing the tagging and the person (taxonomist) who is managing the taxonomy, which is a controlled vocabulary, after all. The taxonomist should inform those tagging of new terms and changed terms, especially if they are high-profile terms, and may also provide tips for tagging new and trending topics. Meanwhile, those doing tagging need a method to contact the taxonomist to request clarifications or the addition of new terms. This could be by email, but collaboration workspaces may also work well. While I, as a consultant, do not stay on as tagging continues, I like to be available at the start of tagging with a new taxonomy to answer indexing questions, something I did just this past month on my most recent consulting project.
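If each concept has a stable ID, a taxonomist could generate the list of new and changed terms to send to taggers by comparing two versions of the taxonomy. This Python sketch assumes each version is a simple dictionary of concept ID to preferred label. The data and function names are hypothetical, and taxonomy management software may offer its own change reports.

```python
def diff_versions(old, new):
    """Compare two taxonomy versions, each a dict of concept ID -> preferred label."""
    added = {cid: new[cid] for cid in new.keys() - old.keys()}
    removed = {cid: old[cid] for cid in old.keys() - new.keys()}
    relabeled = {cid: (old[cid], new[cid])
                 for cid in old.keys() & new.keys()
                 if old[cid] != new[cid]}
    return added, removed, relabeled


def announcement(old, new):
    """Build a plain-text notice of taxonomy changes for the people doing the tagging."""
    added, removed, relabeled = diff_versions(old, new)
    lines = [f"New term: {label}" for label in sorted(added.values())]
    lines += [f"Changed: {before} -> {after}" for before, after in sorted(relabeled.values())]
    lines += [f"Removed: {label}" for label in sorted(removed.values())]
    return "\n".join(lines)


# Hypothetical versions of a small taxonomy, keyed by stable concept IDs.
OLD = {"c1": "Cell phones", "c2": "Fax machines"}
NEW = {"c1": "Mobile phones", "c3": "Smart watches"}
print(announcement(OLD, NEW))
```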



Friday, September 13, 2019

SEMANTiCS conference


I attended the 15th annual SEMANTiCS conference this week for the first time. Semantics means “meaning” in language, and in the context of taxonomies and other controlled vocabularies (knowledge organization systems), semantics is a given. We taxonomists don’t concentrate on the topic of semantics that much, because it’s a basic characteristic of knowledge organization systems, which focus on concepts and their meanings, rather than just words. Tagging/indexing with a taxonomy or other kind of knowledge organization system may even be called “semantic enrichment.” Semantics is not a given, however, in related areas of information technology and data science. Even so, awareness of and interest in how technology and semantics can support each other, for better utilization of information, are growing, as this conference demonstrates. The relevant technologies may include those of the Semantic Web and its standards, but their uses go beyond the Web to include various internal enterprise applications.

SEMANTiCS Karlsruhe 2019 conference


SEMANTiCS is a European conference that rotates among different cities. This year the conference was in Karlsruhe, Germany, for the first time, and Karlsruhe turns out to be something of a technology center. Before I went, someone told me to expect European conferences that are not merely spinoffs of American conferences to be different, with perhaps less intermingling, socializing, and networking. That was certainly not the case. I found the attendees, whether German or from other European countries, to be very friendly and open to speaking with and connecting with new colleagues, whether myself or others. So, it was definitely a good networking opportunity.

The SEMANTiCS conference is more in the area of information technology and data science than in the fields of content/knowledge management, where we taxonomists tend to be. Of course, it was not just about technology, but rather about the added “semantic layer.” What I liked is that it brought together taxonomists (I was not the only one) with those who work in technology (software developers, solutions architects, computer scientists, data scientists, etc.). The theme of the conference was knowledge graphs and AI, which have also recently become themes of the Taxonomy Boot Camp conferences. Ontologies, another specialty that bridges the work of taxonomists and computer scientists, were also a focus of this conference. Other topics included machine learning, data governance, and knowledge management.

Heather Hedden presenting at SEMANTiCS 2019 Karlsruhe
The SEMANTiCS conference is somewhat unique in how it bridges industry and academia. It has both industry presentations and academic papers, each with separate conference chairs and review committees, and the academic papers are to be published as conference proceedings. Yet the presentations were not in separate tracks; industry and academic presentations were combined into the same sessions by theme. Session themes included knowledge graphs, natural language processing, semantic information management, knowledge discovery & semantic search, knowledge extraction, data integration, and also thesaurus & ontology management (in which I presented). There were also subject-themed tracks on legal technology and on digital humanities/cultural heritage. Each time slot had five concurrent sessions.

SEMANTiCS is not put on by an event company, but is rather a collaborative effort of several organizations, companies and educational institutions, with some variation, depending on the location. The Semantic Web Company has been a consistent organizer/sponsor. Others this year included FIZ Karlsruhe and several European universities.

By the numbers, the conference had 472 registered attendees and 25 sponsors, of which 15 were also exhibitors. There were 37 industry presentations, 28 academic paper presentations, 5 keynote/plenary presentations, 2 invited talks, 1 panel discussion, 31 posters, and 9 preconference workshops/tutorials. This was the largest SEMANTiCS conference to date.

Attendees gather for the conference gala dinner
Particularly exciting was the announcement that, in addition to next September’s conference in Amsterdam, SEMANTiCS will come to the United States for the first time, scheduled for April 21-23 in Austin, Texas: SEMANTiCS Austin 2020. (Call for proposals due November 8.) Lead organizers are the Semantic Web Company and Enterprise Knowledge. The conference won’t be identical to the European version, as it will not have academic papers, but it promises to be very interesting and informative, and I plan to be there.

Thursday, August 22, 2019

Taxonomy Mapping


As more taxonomies get created, we see a growing need to “map” taxonomies to each other, that is, to link individual terms or concepts in each taxonomy so that the taxonomies may be used in some combination. Mapping is not new, but as it has become more frequent, it is now reflected in newer standards and in taxonomy management software features.

Mapping taxonomies

Reasons or use cases for mapping include:
  • Selected content with an enterprise taxonomy is made available on a public web site with a different public-facing taxonomy.
  • A provider of scientific/technical/medical content with a technical thesaurus creates a simpler taxonomy aimed at laypeople.
  • Content will be made available in a different language region, and a comparable taxonomy already exists in the other language.
  • A knowledge graph is built to aggregate data from multiple repositories, each with its own taxonomy.
  • An enterprise search is based on “federated search” and different areas have different search-support thesauri.
  • Terms from search engine logs are mapped to a taxonomy to add alternative labels.
  • Terms from an open source or licensed vocabulary are mapped to a taxonomy to enrich it.

I’ve worked on occasional taxonomy mapping projects since the late 1990s, and I discuss mapping in a section of my book, The Accidental Taxonomist (2nd edition, pp. 369-73), and in an earlier blog post. I’ve also presented on mapping taxonomies at conferences before, as early as 2009, but only briefly and in the wider context of the related activities of merging taxonomies and creating multilingual taxonomies. My next conference presentation (not including a pre-conference workshop), “Mapping Taxonomies, Thesauri, and Ontologies” (SEMANTiCS 2019 in Karlsruhe, Germany), will be dedicated to the subject of mapping.

In talking recently with more people about mapping, both clients and software vendors, I’ve learned that my previous view of mapping was somewhat narrow. I had considered mapping to be only one-way, from terms in a tagged taxonomy to terms in a retrieval taxonomy.

One-way directional taxonomy mapping
I still think this model applies to the majority of use cases, but mapping has a broader meaning in the standards and in taxonomy management software capabilities.
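As an illustration of the one-way model, the following Python sketch translates a document's tags from a technical tagging taxonomy into a simpler retrieval taxonomy. The mapping table and terms are invented. Several source terms can map to the same broader target term, nothing is mapped back in the other direction, and unmapped tags are reported so a taxonomist can decide whether to add mappings.

```python
# Hypothetical one-way mapping from a technical tagging taxonomy to a
# public-facing retrieval taxonomy (source term -> target term).
ENTERPRISE_TO_PUBLIC = {
    "Hypertension": "High blood pressure",     # equivalent meaning
    "Myocardial infarction": "Heart attack",   # equivalent meaning
    "Angina pectoris": "Heart disease",        # narrower -> broader
    "Cardiomyopathy": "Heart disease",         # narrower -> broader
}


def retag(source_tags, mapping):
    """Translate a document's tags; return (target tags, source tags with no mapping)."""
    targets, unmapped = [], []
    for tag in source_tags:
        target = mapping.get(tag)
        if target is None:
            unmapped.append(tag)
        elif target not in targets:  # avoid duplicate target tags
            targets.append(target)
    return targets, unmapped


print(retag(["Angina pectoris", "Cardiomyopathy", "Statins"], ENTERPRISE_TO_PUBLIC))
```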

Standards for Taxonomy Mapping


The SKOS (Simple Knowledge Organization System) standard, adopted by the W3C in 2009 as a data model and interchange format for controlled vocabularies, specifies not only the familiar thesaurus relationships of broader, narrower, and related, but also what are called mapping relationships: exactMatch, closeMatch, broadMatch, narrowMatch, and relatedMatch. How these different mapping relationship types are used is really up to the taxonomy owner. The broadMatch and narrowMatch relationships are directional but reciprocal, so using them permits bidirectional mapping. However, there is no reason why you cannot use just one mapping relationship type if you are mapping in only a single direction. Or you could use just two, such as exactMatch and broadMatch.
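To show what these relationships look like as data, here is a small Python sketch (standard library only) that stores mapping statements as subject-property-object triples, adds the reciprocal statement for each, and writes them as simple Turtle. In SKOS, broadMatch and narrowMatch are inverses of each other, while exactMatch, closeMatch, and relatedMatch are symmetric. The src: and tgt: namespaces and the concept names are hypothetical; in practice an RDF library or the taxonomy management software's own export would more likely be used.

```python
# In SKOS, "X skos:broadMatch Y" means Y is broader than X. broadMatch and
# narrowMatch are inverses; exactMatch, closeMatch, and relatedMatch are symmetric.
INVERSE = {
    "skos:exactMatch": "skos:exactMatch",
    "skos:closeMatch": "skos:closeMatch",
    "skos:relatedMatch": "skos:relatedMatch",
    "skos:broadMatch": "skos:narrowMatch",
    "skos:narrowMatch": "skos:broadMatch",
}

# Hypothetical namespaces for the two vocabularies being mapped.
PREFIXES = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix src: <http://example.org/source/> .\n"
    "@prefix tgt: <http://example.org/target/> .\n"
)


def with_inverses(mappings):
    """Return the mapping triples plus the reciprocal statement for each, sorted."""
    result = set(mappings)
    for subject, prop, obj in mappings:
        result.add((obj, INVERSE[prop], subject))
    return sorted(result)


def to_turtle(triples):
    """Serialize (subject, property, object) prefixed names as Turtle statements."""
    return PREFIXES + "".join(f"{s} {p} {o} .\n" for s, p, o in triples)


mappings = [
    ("src:Sedans", "skos:broadMatch", "tgt:Cars"),  # tgt:Cars is broader than src:Sedans
    ("src:Autos", "skos:exactMatch", "tgt:Cars"),
]
print(to_turtle(with_inverses(mappings)))
```

Adding the reciprocal statements is what makes the mapping usable from either vocabulary; a one-way mapping project could simply skip that step.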

The international standard ISO 25964-2 Thesauri and Interoperability with Other Vocabularies – Part 2: Interoperability with Other Vocabularies (published in 2013) is substantially about mapping. Interoperability is not synonymous with mapping but covers more, including using a standard format such as SKOS. However, the ISO standard discusses mapping in more detail than any other form of interoperability. The introduction states that “inter-vocabulary mapping will be the principal focus of this part of ISO 25964.” (The slightly older American standard, ANSI/NISO Z39.19-2005, which lacks any explanation of mapping, is comparable to ISO 25964 Part 1, which is all about thesauri.) While SKOS provides standardized names for the mapping relationships, useful for porting and linking vocabularies between different systems and the web, ISO 25964-2 provides guidance on the theory and practice of various types of mappings.

ISO 25964-2 defines mapping broadly as the “process of establishing relationships between the concepts of one vocabulary and those of another.” Like SKOS, it also covers different kinds of mapping relationships, although it describes more types: equivalence, compound equivalence, hierarchical, associative, exact, inexact, and partial equivalence. It also discusses mapping at a high level, between pairs of vocabularies or among multiple vocabularies, and the possible directions and arrangements of such mappings. The standard also includes examples. There is really a lot to consider, and I’ll definitely re-read ISO 25964-2 in detail before embarking on my next mapping project.

Software for Taxonomy Mapping


When I first did taxonomy mapping, Excel files of each vocabulary were compared either with the features of Excel or through scripting. Now, mapping can also be done within taxonomy management software, once both vocabularies are in the software, which usually requires that at least one be imported.
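For those who still compare spreadsheet exports, here is a minimal Python sketch of the scripting approach. It reads two vocabularies exported as CSV (assumed columns id and label) and pairs concepts whose preferred labels match exactly after ignoring case and extra spaces. The column names and sample data are invented for illustration.

```python
import csv
import io


def load_labels(csv_text):
    """Read a vocabulary exported as CSV with hypothetical columns 'id' and 'label'."""
    return {row["id"]: row["label"] for row in csv.DictReader(io.StringIO(csv_text))}


def normalize(label):
    """Ignore case and collapse runs of whitespace."""
    return " ".join(label.casefold().split())


def exact_matches(vocab_a, vocab_b):
    """Pair concept IDs whose preferred labels are identical after normalization."""
    index_b = {}
    for cid, label in vocab_b.items():
        index_b.setdefault(normalize(label), []).append(cid)
    pairs = []
    for cid_a, label in vocab_a.items():
        for cid_b in index_b.get(normalize(label), []):
            pairs.append((cid_a, cid_b))
    return pairs


# Hypothetical exports of two vocabularies.
VOCAB_A = "id,label\nA1,Climate change\nA2,Renewable  energy\nA3,Oceans\n"
VOCAB_B = "id,label\nB7,climate change\nB8,Renewable Energy\nB9,Seas\n"
print(exact_matches(load_labels(VOCAB_A), load_labels(VOCAB_B)))
```

Exact matching like this finds only the easy cases; concepts such as Oceans and Seas still need a person to decide whether and how they relate.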

As most commercial taxonomy/thesaurus/ontology management software now supports the SKOS standard, such software also supports the SKOS mapping relationships between vocabularies. The leading vendors, PoolParty, Smartlogic, and Synaptica, additionally include an auto-mapping tool that uses “smart” or “fuzzy” match techniques, including some stemming, to automatically match equivalences or near-matches between concepts in two different vocabularies, which can then be manually reviewed and approved or rejected. To be done correctly, this review should be performed by a taxonomist. Automated mapping also takes alternative labels (nonpreferred terms) into consideration and creates a proposed match if an alternative label in one vocabulary matches a preferred label in another.
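The vendors' matching algorithms are their own, so the following is only a rough Python sketch of the general idea, not a description of how any of these products work. Labels are normalized with a crude plural-stripping rule standing in for a real stemmer, compared with the similarity ratio from Python's difflib, and checked against both preferred and alternative labels. The result is a list of proposals for a taxonomist to approve or reject. The 0.85 threshold, the stemming rule, and the sample vocabularies are arbitrary assumptions.

```python
from difflib import SequenceMatcher


def crude_stem(word):
    """Very rough suffix stripping, a stand-in for a real stemmer (assumption)."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def key(label):
    return " ".join(crude_stem(w) for w in label.casefold().split())


def propose_matches(source, target, threshold=0.85):
    """source/target map preferred label -> list of alternative labels.

    Returns (source term, target term, score, basis) proposals for human review."""
    target_keys = []
    for pref, alts in target.items():
        target_keys.append((key(pref), pref, "preferred"))
        target_keys += [(key(alt), pref, "alternative") for alt in alts]
    proposals = []
    for pref, alts in source.items():
        best = None
        for label in [pref, *alts]:
            k = key(label)
            for tkey, tpref, kind in target_keys:
                score = SequenceMatcher(None, k, tkey).ratio()
                if score >= threshold and (best is None or score > best[2]):
                    best = (pref, tpref, round(score, 2),
                            f"{label!r} ~ {kind} label of {tpref!r}")
        if best:
            proposals.append(best)
    return proposals


SOURCE = {"Automobiles": ["Cars"], "Bicycle": [], "Oceans": []}
TARGET = {"Car": ["Motorcar"], "Bicycles": ["Bikes"], "Mountains": []}
for proposal in propose_matches(SOURCE, TARGET):
    print(proposal)
```

Here Automobiles is proposed for Car only because its alternative label Cars matches, illustrating why alternative labels matter to auto-mapping.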

The software’s mapping feature is agnostic to your intentions and direction of mapping, so it’s important to plan the mapping so that it supports mapping in the direction you want. In addition to terms with equivalent meaning, it is also acceptable to map from a narrower to a broader concept as the narrower is an example of the broader and can be used for it, but the mapping won’t work in the other direction. It is also acceptable to map from a term that is a preferred label to a concept where that term is an alternative label/nonpreferred term, and that mapping also won’t work in the other direction.
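One way to plan for direction is to record each mapping with its SKOS relationship and then filter by the direction you intend to use. In this Python sketch, a statement (subject, relationship, object) read in the forward direction means "content tagged with the subject can be retrieved with the object." The rules for exact and narrower-to-broader mappings follow the reasoning above; treating closeMatch as needing review and relatedMatch as unusable for substitution are assumptions made for this sketch, and the concept names are invented.

```python
# (forward, reverse) usability per SKOS mapping property. "X skos:broadMatch Y"
# means Y is broader than X, so narrower -> broader works only forward.
# closeMatch -> None (needs review) and relatedMatch -> False are assumptions.
RULES = {
    "skos:exactMatch": (True, True),
    "skos:broadMatch": (True, False),
    "skos:narrowMatch": (False, True),
    "skos:closeMatch": (None, None),
    "skos:relatedMatch": (False, False),
}


def usable(relation, direction):
    """Return True, False, or None (taxonomist decides) for 'forward' or 'reverse' use."""
    forward, reverse = RULES[relation]
    return forward if direction == "forward" else reverse


def plan(mappings, direction="forward"):
    """Split (subject, relation, object) mappings into keep, review, and reject lists."""
    keep, review, reject = [], [], []
    for mapping in mappings:
        verdict = usable(mapping[1], direction)
        if verdict is None:
            review.append(mapping)
        elif verdict:
            keep.append(mapping)
        else:
            reject.append(mapping)
    return keep, review, reject


# Hypothetical mappings between a source and a target vocabulary.
MAPPINGS = [
    ("src:Sedans", "skos:broadMatch", "tgt:Cars"),
    ("src:Vehicles", "skos:narrowMatch", "tgt:Trucks"),
    ("src:Autos", "skos:closeMatch", "tgt:Cars"),
]
print(plan(MAPPINGS, "forward"))
```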

If planning your mapping project seems daunting, the software vendors, PoolParty, Smartlogic, Synaptica, and Access Innovations (vendor of Data Harmony Thesaurus Master) will provide assistance or the full service of mapping. In fact, Access Innovations has not included an auto-mapping feature in DH Thesaurus Master, because customized results may be better achieved through offline mapping.

Mapping is not just between taxonomies, but can be between taxonomies and thesauri, thesauri and ontologies, or other controlled vocabularies, something else that ISO 25964-2 covers. If you need assistance with mapping, I'd be happy to help.