The Accidental Taxonomist: 2019

Saturday, December 28, 2019

Taxonomy Licensing Interest

Just over a year ago I blogged on the topic of Taxonomy Licensing. I explained that a customized taxonomy is usually best, but licensing a taxonomy is occasionally an option worth considering: as a starting point to then modify, to serve as a single facet in a faceted taxonomy, or to index content from various external sources on a defined topic area for which a good taxonomy exists. There are issues, though, such as whether the right kind of taxonomy exists and whether the license permits modification of the taxonomy.

Various organizations, companies, and even individuals have created taxonomies or other controlled vocabularies, which they have made available for license. Whether it’s worthwhile for them to promote these taxonomies is uncertain. So, a year ago I created an online survey of taxonomy (or more broadly, any controlled vocabulary) licensing interest, which I announced not only on this blog, but also on the blogs of taxonomy software vendors and at various conferences. The survey stayed open for about 6 months, and there were over 60 responses to most questions. Now it is time to share those results. Although the responses are in the context of licensing controlled vocabularies, some of the questions and responses (about the taxonomy purpose, type, or subject area of interest) might reflect general interest in taxonomies. (Percentages have been rounded.)

The first question asked about interest in licensing taxonomies or other controlled vocabularies. More than half of the respondents (61%) have considered licensing taxonomies, but most have not gone any further in identifying appropriate taxonomies to license. The leading reasons given by those respondents who said they would not be likely to license a taxonomy (22 of 66 respondents) were:

Custom-created taxonomies would best serve my purposes: 59%
Licensed taxonomies that are modifiable and permit commercial reuse are too expensive: 14%

The leading concerns regarding licensing a taxonomy, ranked in order, were the following:

Difficulty finding or lack of a suitable taxonomy
Difficulty integrating a licensed taxonomy into an existing taxonomy or taxonomy set
Effort to modify, adapt, and/or expand a licensed taxonomy
Licensing fee cost
Features of the licensed taxonomy missing
File format and implementation issues

The types of controlled vocabularies that respondents are most interested in licensing (allowing multiple responses) were:

Hierarchical taxonomy: 56%
Controlled vocabulary for part of a faceted taxonomy: 55%
Ontology: 40%
Thesaurus: 35%
Name authority file (companies, places, organizations, person names, etc.): 17%
Classification scheme (such as with alpha-numeric codes): 10%

The subject areas of controlled vocabularies that respondents are most interested in licensing (allowing multiple responses) were:

Business/management/enterprise functions: 36%
Information technology/computing: 30%
Industries: 28%
Company or organization names: 26%
Products/services: 23%
Health/medicine: 21%
Geographic places: 21%
Engineering & design: 20%
Law & policy: 20%
Science & math: 18%
Humanities & social sciences: 13%
Occupations or job titles: 13%

Finance was a popular write-in option under “Other.”

The purposes that respondents said a licensed controlled vocabulary would serve (allowing multiple responses) were:

Internal content management and search & retrieval: 82%
Business intelligence/market research/competitive intelligence/data analysis: 32%
Expertise identification: 24%
Public/website content findability – commercial: 21%
Education/research: 19%
Ecommerce or B2B: 18%
Public/website content findability – nonprofit: 15%
Public/website content findability – government: 8%

The size ranges of a controlled vocabulary that respondents said they would be interested in licensing (allowing multiple responses) were:

1,000 - 5,000 concepts: 33%
More than 10,000 concepts: 26%
500 - 1,000 concepts: 21%
5,000 - 10,000 concepts: 21%
Less than 100 concepts: 17%
100 - 500 concepts: 14%

The formats of a controlled vocabulary that respondents said they would be interested in licensing (allowing multiple responses, especially since some of these formats are not mutually exclusive) were:

XML: 44%
Unsure: 39%
Excel (xls or xlsx): 34%
SKOS: 32%
CSV: 26%
RDF: 24%
JSON: 24%
OWL: 16%
Turtle: 11%
Zthes: 8%

The leading industries of respondents were:

Consulting/professional services: 18% (Perhaps taxonomy consultants, like me?)
Nongovernmental/nonprofit: 18% (Perhaps because licensing restrictions for commercial re-use are not an issue.)
Software/Hardware/IT: 13%
Manufacturing/Construction/Engineering: 10%

Additionally, 10 other industries were each indicated by only 2-3 respondents.

Conclusions from the survey include:

Concerns about licensing are varied, and no single concern dominates.
Hierarchical taxonomies and vocabularies for facets of faceted taxonomies are the types of most interest.
The subject area of greatest interest is business/management/enterprise functions.
Internal content management and search is the leading purpose for licensing a controlled vocabulary.
Interest spans all vocabulary sizes, but the mid-range dominates.
Industries interested in vocabulary licensing vary, and none dominates.
XML and CSV/Excel are the formats of greatest interest, but a significant number of respondents are unsure of the format desired.

Sunday, November 17, 2019

Taxonomy Boot Camp Conferences 2019

Taxonomies may seem like a very niche specialization, but interest keeps growing, as indicated by participation in the conferences dedicated to taxonomies, Taxonomy Boot Camp in Washington, DC (TBC) and Taxonomy Boot Camp London (TBCL). TBC, now in its 14th year, was held November 4 and 5, and TBCL, now in its 4th year, was held October 15 and 16. New people continue to attend the conferences. By a show of hands, a large majority, perhaps 75%, of the attendees of TBC were there for the first time, and more than half of the attendees of TBCL were also first-timers. TBCL also increased the number of its preconference workshops to four this year. While I didn't get official numbers, overall attendance also seems to be rising.

Taxonomy Boot Camp London sessions

TBCL’s theme, "Anything is possible," while not exactly a unifying theme, emphasized the diversity of applications of taxonomies. Sessions which may be considered related to this theme included those on knowledge graphs, search, blockchain, automatic tagging, taxonomy interoperability, and machine learning. Case study presentations included BBC content tagging, maintaining large complex taxonomies at CAB International and SAGE Publishing, healthcare taxonomies of Elsevier and NHS Digital, and public sector taxonomies. Practical sessions from experienced taxonomists included presentations on taxonomy software selection, taxonomies in SharePoint, validating a taxonomy with stakeholders, and selling the value of taxonomies.

TBCL sessions this year that I found particularly interesting included Maura Moran's presentation on how to sell your organization on the value of taxonomy, get agreement, and start organizing information silos. I found her advice on working with stakeholders relevant to my work. Patrick Lambe's presentation on the capabilities that taxonomists need to acquire was also good. Agnes Molnar gave an informative presentation, "Extending SharePoint Taxonomy," which explained a method, using third-party tools and technology, to overcome the various shortcomings SharePoint has in supporting robust taxonomy features.

TBCL had taxonomy-related talks for the keynotes on both mornings. Tuesday's keynote by Emma Chittendon dealt with the topic of term labels, and Wednesday's keynote by Nick Poole dealt with the ethics of structured information.

Taxonomy Boot Camp (DC) sessions

Three weeks later, TBC's theme was "Building Strong Foundations," which is what taxonomies are basically for. Taxonomy is like infrastructure, and, as one speaker said, as such it goes unnoticed until there is something wrong with it. Presentations that fit into this theme of foundations included the opening taxonomy workshop (1.75 hours in the Basics track the first morning), defining the business case for a taxonomy, managing stakeholder input, taxonomy governance, tagging with a taxonomy, and content models. There were also case studies, which included those on improving content quality, reuse, and reporting at Intel, a taxonomy and metadata enrichment initiative scaled with AI at Sony Pictures Entertainment, the alignment of siloed taxonomies at Travelers Insurance, ambiguities in a retail taxonomy at Zappos, and tagging that supports personalization at Salesforce.com.

TBC sessions that I found particularly useful included Erica Chao's presentation "5 Essential Components of Taxonomy Governance," Michele Ann Jenkins' presentation "Managing Stakeholder Input," and Carrie Hanes' presentation "Content Models and Taxonomies."

Distinct conferences

As similar as TBC and TBCL are in their subject scope and detail and in their diverse audiences, the two conferences maintain their own distinct character, largely due to the vision and leadership of each of their respective conference chairs, consultants Stephanie Lemieux of Dovecot Studio for TBC and Helen Lippell for TBCL.

Helen summarized this year's TBCL to me: "I was really thrilled with the energy and passion of the audience at Taxonomy Boot Camp London 2019. We always try to put together a programme that offers something for everyone, whether they're total beginners, or expert practitioners pushing the boundaries. When I wasn't running around and could actually sit in the talks, I thoroughly enjoyed every single one."

Stephanie shared her thoughts with remarks at the TBC opening: "One of the main things I love about this event: the diversity of experience that it brings together.... What we all have in common, regardless of where you are in the journey, is that we are all architects and custodians of incredibly important foundational pieces of any information ecosystem."

So, if you're just getting started with taxonomies, then either conference, whichever is more convenient, is appropriate. If taxonomies are your profession, then you should try to attend each conference at least once. It’s worth the trip.

Thursday, October 31, 2019

Managing Tagging with a Taxonomy

A lot of work can be put into designing and creating a taxonomy, but if it’s not implemented or used properly for tagging or indexing, then that work can be wasted. As the volume of content has grown, many organizations have invested in auto-tagging/auto-categorization solutions utilizing text analytics technologies. However, there remain many situations where manual tagging is still more practical. So, support for correct and efficient manual tagging needs to be considered. This is the topic of my upcoming presentation at the Taxonomy Boot Camp conference, in Washington, DC, on November 4.

A taxonomy can be designed to support manual tagging by including alternative labels (synonyms), hierarchical and associative relationships between terms, and term notes, to guide those doing the tagging to the most appropriate terms, even if these taxonomy features are not fully available to end-users in their user interface. It may be easier to have these features available in a customized manual tagging/indexing tool than it is to make them available in the end-user application. A taxonomy has more than one set of users, and the tagging-users need the full benefits a taxonomy can offer.
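To make the idea concrete, here is a minimal Python sketch of how a customized tagging tool might use these features: every alternative label points to its preferred term, and the hint shown to the person tagging includes the broader terms, related terms, and term note. The Term fields, the build_lookup and suggest functions, and the sample "Automobiles" term are all hypothetical illustrations, not the structure of any particular taxonomy management system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Term:
    """A hypothetical taxonomy concept carrying the features that guide taggers."""
    pref_label: str
    alt_labels: List[str] = field(default_factory=list)  # synonyms (nonpreferred terms)
    broader: List[str] = field(default_factory=list)     # parent terms, by preferred label
    related: List[str] = field(default_factory=list)     # associative relationships
    scope_note: str = ""                                 # guidance on when to use the term


def build_lookup(terms: List[Term]) -> Dict[str, Term]:
    """Map every preferred and alternative label (case-insensitive) to its term."""
    lookup: Dict[str, Term] = {}
    for term in terms:
        for label in [term.pref_label, *term.alt_labels]:
            lookup[label.casefold()] = term
    return lookup


def suggest(lookup: Dict[str, Term], typed: str) -> Optional[str]:
    """Return a tagging hint for the text a tagger typed, or None if nothing matches."""
    term = lookup.get(typed.strip().casefold())
    if term is None:
        return None
    hint = f"Use: {term.pref_label}"
    if term.broader:
        hint += f" (broader: {', '.join(term.broader)})"
    if term.related:
        hint += f"; see also: {', '.join(term.related)}"
    if term.scope_note:
        hint += f"; note: {term.scope_note}"
    return hint


# Hypothetical sample term, for illustration only.
terms = [
    Term(
        "Automobiles",
        alt_labels=["Cars", "Motor vehicles"],
        broader=["Vehicles"],
        related=["Automotive industry"],
        scope_note="For passenger vehicles; use Trucks for commercial vehicles.",
    )
]
lookup = build_lookup(terms)
print(suggest(lookup, "cars"))
```

Even if the end-user interface shows only preferred terms, a tagging tool built along these lines lets the person tagging start from whatever word comes to mind and still land on the correct term.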

It’s very important to develop a customized policy for tagging with a taxonomy, so that it is used correctly and consistently. Any policy for tagging or indexing should include both rules and recommended guidelines. Examples of policy topics include:

Criteria for determining topic or name relevancy for tagging
Depth and level of detail of tagging
Comprehensiveness of aspects (what, who, where, when, how, why, etc.)
Required term types/facets (and any dependencies)
Number of terms (of each type) to tag
Tagging of certain terms in combination (e.g.: a parent/broader term in addition to its narrower/child term)
Other types of metadata that must be entered
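Some of these policy topics are rules that software can check automatically before tagged content is saved, while others (such as relevancy and depth) still require human judgment. The Python sketch below is a hypothetical illustration of checking a few rule-type policies: required facets, a maximum number of terms per facet, tagging a broader term along with its narrower term, and required metadata. The TaggingPolicy fields, the check_tagging function, and the facet and term names in the example are assumptions made for illustration, not features of any specific tagging tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class TaggingPolicy:
    """Hypothetical rule-type parts of a tagging policy."""
    required_facets: Set[str]                              # every item needs a term from these
    max_terms_per_facet: Dict[str, int]                    # upper limit on terms, by facet
    require_broader_with_narrower: bool                    # also tag a child term's parent
    broader_of: Dict[str, str] = field(default_factory=dict)  # narrower term -> broader term
    required_metadata: Set[str] = field(default_factory=set)


def check_tagging(policy: TaggingPolicy,
                  tags: Dict[str, List[str]],
                  metadata: Dict[str, str]) -> List[str]:
    """Return a list of policy violations (an empty list means the item passes)."""
    problems: List[str] = []
    for facet in sorted(policy.required_facets):
        if not tags.get(facet):
            problems.append(f"Missing required facet: {facet}")
    for facet, limit in policy.max_terms_per_facet.items():
        count = len(tags.get(facet, []))
        if count > limit:
            problems.append(f"Too many {facet} terms: {count} > {limit}")
    if policy.require_broader_with_narrower:
        all_terms = {t for facet_terms in tags.values() for t in facet_terms}
        for term in sorted(all_terms):
            parent = policy.broader_of.get(term)
            if parent and parent not in all_terms:
                problems.append(f"{term} is tagged without its broader term {parent}")
    for name in sorted(policy.required_metadata):
        if not metadata.get(name):
            problems.append(f"Missing metadata: {name}")
    return problems


# Hypothetical policy and tagged item, for illustration only.
policy = TaggingPolicy(
    required_facets={"Topic", "Document type"},
    max_terms_per_facet={"Topic": 3},
    require_broader_with_narrower=True,
    broader_of={"Automobiles": "Vehicles"},
    required_metadata={"Author"},
)
problems = check_tagging(policy, {"Topic": ["Automobiles"]}, {"Author": "J. Smith"})
for problem in problems:
    print(problem)
```

Checks like these catch omissions, but they complement rather than replace training and review of the initial tagging.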

It’s often not enough to just provide people with a policy document. Some degree of training on proper tagging can be very beneficial. In a current SharePoint taxonomy project, one of the users who tags uploaded documents said to me, “The problem is that we have not been trained. We are guessing.” Policy and guidelines should initially be delivered as a presentation (live or web meeting) to allow for questions and answers.

With large-volume tagging, the initial tagging should be reviewed and feedback should be provided. This is the case for both new and experienced indexers. Even experienced indexers need to become familiar with the content and learn the policies and guidelines that are particular to the organization and project. In a recent taxonomy project that involved indexing hundreds of articles by a professional indexer, even the professional indexer’s initial indexing was reviewed to make sure it was as thorough and accurate as required.

Finally, there needs to be a method of communication and feedback between those doing the tagging and the person (taxonomist) who is managing the taxonomy, which is a controlled vocabulary, after all. The taxonomist should inform those tagging of new terms and changed terms, especially if they are high-profile terms, and may also provide tips for tagging new and trending topics. Meanwhile, those doing tagging need a method to contact the taxonomist to request clarifications or the addition of new terms. This could be by email, but collaboration workspaces may also work well. While I, as a consultant, do not stay on as tagging continues, I like to be available at the start of tagging with a new taxonomy to answer indexing questions, something I did just this past month on my most recent consulting project.