Tuesday, December 4, 2018

Taxonomy Licensing


As a taxonomist who designs and creates taxonomies, I have always advocated creating a customized taxonomy for each implementation, one that takes into consideration the particular set of content and type of users. Nevertheless, there are situations when licensing a taxonomy (or any kind of controlled vocabulary) created by a third party may be desirable, such as to provide a starting point for a taxonomy that is then modified, to supply a single facet of a faceted taxonomy, or to tag multi-source research content.

Taking an existing taxonomy created by a third party, without modification, can pose several problems. Its scope may be narrower than needed, or it might not be as detailed, so needed concepts would be missing. Its scope may be broader than needed, or it may be more detailed than needed, so it’s cumbersome and not user friendly, and indexing with it would be inconsistent. Its language style might not suit the new users, so users cannot find what they are looking for. Its terms, and even their alternative labels (synonyms), may not match the language of the content, so content may not get indexed properly. Finally, it might not even have the desired structure; a thesaurus, for example, is structured differently from a hierarchical taxonomy.

Taxonomy Licensing Uses


Licensing a taxonomy can be done as a starting point, whereby the taxonomy is then modified as needed for its new use. Modifications include removing concepts that are out of scope or not needed, adding missing concepts and their relationships, creating additional alternative labels for existing or new concepts, and changing the wording of selected preferred labels to conform to users’ preferences. If only a fraction of the concepts need changing, and it’s more a matter of adding new concepts, then licensing can be a good way to get a taxonomy up and running more quickly than starting from scratch.
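To make those four kinds of modification concrete, here is a minimal sketch in Python. It assumes a deliberately simple, hypothetical in-memory model: each concept has a preferred label, a set of alternative labels, and at most one broader (parent) concept. The `Concept` class, the `adapt` and `with_descendants` functions, and the sample "finance" concepts are all illustrative inventions, not the data model of any particular taxonomy management tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    pref_label: str
    alt_labels: set[str] = field(default_factory=set)
    broader: str | None = None  # parent concept ID; None for a top concept


def with_descendants(taxonomy: dict[str, Concept], ids) -> set[str]:
    """Return the given concept IDs plus every concept narrower than them."""
    found = set(ids)
    changed = True
    while changed:
        changed = False
        for cid, concept in taxonomy.items():
            if concept.broader in found and cid not in found:
                found.add(cid)
                changed = True
    return found


def adapt(taxonomy, remove=(), add=None, alt_labels=None, relabel=None):
    """Return a modified copy of a licensed taxonomy; the original is untouched."""
    # 1. Remove out-of-scope concepts, together with everything beneath them.
    dropped = with_descendants(taxonomy, remove)
    result = {cid: Concept(c.pref_label, set(c.alt_labels), c.broader)
              for cid, c in taxonomy.items() if cid not in dropped}
    # 2. Add missing concepts (their broader links are their relationships).
    result.update(add or {})
    # 3. Add alternative labels (synonyms) to existing or new concepts.
    for cid, labels in (alt_labels or {}).items():
        result[cid].alt_labels.update(labels)
    # 4. Reword preferred labels, keeping the old wording as a synonym.
    for cid, new_label in (relabel or {}).items():
        result[cid].alt_labels.add(result[cid].pref_label)
        result[cid].pref_label = new_label
    return result


# Hypothetical licensed "finance" taxonomy, keyed by concept ID.
licensed = {
    "fin": Concept("Finance"),
    "corp": Concept("Corporate finance", broader="fin"),
    "pub": Concept("Public finance", broader="fin"),
    "pers": Concept("Personal finance", broader="fin"),
    "mort": Concept("Mortgages", broader="pers"),
}

adapted = adapt(
    licensed,
    remove={"pers"},  # out of scope; "Mortgages" goes with it
    add={"treas": Concept("Treasury management", broader="corp")},
    alt_labels={"pub": {"Government finance"}},
    relabel={"corp": "Business finance"},
)
print(sorted(adapted))  # ['corp', 'fin', 'pub', 'treas']
```

Keeping a replaced preferred label as an alternative label is one reasonable design choice: content or users that still use the licensor’s wording continue to match the concept.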

Licensing a controlled vocabulary to serve as just one or two facets or metadata properties of a larger taxonomy set may also be a practical option. A faceted taxonomy enables users to filter or limit search results by a combination of concepts selected from multiple facets/filters. For example, for images these could be: geographic place, location type, occasion, person type, time of year, activity, and object. It might be desirable to license a vocabulary for geographic place or person type and create the other vocabularies. Other examples of a single-facet taxonomy that might be of interest for licensing include product types and industries. A facet may contain a hierarchical structure or a flat list.
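The idea of filtering by a combination of concepts from several facets can be sketched in a few lines of Python. The image records and their tags below are hypothetical, and the matching rule (a record must satisfy every facet the user selected, but any one selected concept within a facet is enough) is a common convention for faceted search, assumed here rather than taken from any specific product. In practice, the values of one facet, such as geographic place, could come from a licensed vocabulary while the others are built in-house.

```python
# Hypothetical image records tagged with concepts from several facets.
images = [
    {"id": "img1", "geographic place": {"Paris"}, "occasion": {"Wedding"}, "time of year": {"Summer"}},
    {"id": "img2", "geographic place": {"Paris"}, "occasion": {"Birthday"}, "time of year": {"Winter"}},
    {"id": "img3", "geographic place": {"Rome"}, "occasion": {"Wedding"}, "time of year": {"Summer"}},
]


def facet_filter(records, selections):
    """Keep records matching every selected facet (AND across facets);
    within one facet, any selected concept is enough (OR within a facet)."""
    return [r for r in records
            if all(r.get(facet, set()) & wanted for facet, wanted in selections.items())]


print([r["id"] for r in facet_filter(images, {"geographic place": {"Paris"}, "time of year": {"Summer"}})])
# ['img1']
print([r["id"] for r in facet_filter(images, {"occasion": {"Wedding", "Birthday"}})])
# ['img1', 'img2', 'img3']
```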

Licensing a taxonomy as is, with little or no modification, is sometimes appropriate if the original purpose and the new purpose are the same and the type of user is the same. This would not be the case for internally created content, but if the content comes from multiple external sources, such as published articles, and the users are conducting external research, then a third-party taxonomy in the desired discipline or industry might be appropriate. Fields such as medicine, pharmaceuticals, engineering, and the sciences in general may be suitable for licensing a taxonomy with little modification.

Taxonomy Licensing Issues


The licensed taxonomy not only needs to be in the appropriate subject area but also needs to have been created for a similar audience and purpose, which can be determined by contacting the original creator/publisher of the taxonomy. For example, a subject area of “finance” will have somewhat different concepts depending on whether it was created for academic/research use or for internal enterprise content management use.

The licensed controlled vocabulary should be of the desired type: classification system, taxonomy, thesaurus, ontology, etc. This is not always obvious, since the distinctions between taxonomies, thesauri, and ontologies can be blurred, and the term “taxonomy” is sometimes applied loosely to many different kinds of vocabularies. So, it’s important to ask the taxonomy publisher specific questions, such as how many top terms there are, what kinds of relationships there are between concepts, and whether there are classes or categories assigned to concepts.

If modification is going to be done, which is often the case, the license needs to permit modification. An open source and free taxonomy may restrict modification and require attribution to the source of the unaltered taxonomy. An open source and free taxonomy usually prohibits commercial reuse as well. A paid license, on the other hand, typically permits modification, the use of the terms to create a new taxonomy (as a “derivative work”), and commercial use.

A taxonomy that is available for license typically comes in a standard interchange format, such as CSV, XML, RDF, SKOS, etc., so it can be imported into taxonomy/thesaurus/ontology management software, where it can be further modified. An understanding of the formats is needed to select the most desirable one when multiple formats are supported.
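When a sample file is available, some of the questions about structure (how many top terms, which kinds of relationships) can also be checked directly. The Python sketch below profiles a small SKOS vocabulary serialized as RDF/XML, using only the standard library. It assumes that each concept is written as a `skos:Concept` element directly under `rdf:RDF`, which is only one of several valid RDF/XML layouts, and it treats a concept as "top" if it is marked `skos:topConceptOf` or has no `skos:broader`, which is a heuristic. The sample URIs under `example.org` are made up. A real import would be better served by a full RDF library such as rdflib, which understands every serialization.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
RELATIONS = ("broader", "narrower", "related")


def profile_skos(xml_text):
    """Summarize concepts, apparent top concepts, relationships, and synonyms."""
    root = ET.fromstring(xml_text)
    concepts = root.findall("skos:Concept", NS)  # assumes typed-node layout
    top, relations, alt_labels = [], Counter(), 0
    for concept in concepts:
        marked_top = concept.find("skos:topConceptOf", NS) is not None
        has_broader = concept.find("skos:broader", NS) is not None
        if marked_top or not has_broader:  # heuristic for "top term"
            top.append(concept.get(RDF_ABOUT))
        for rel in RELATIONS:
            relations[rel] += len(concept.findall(f"skos:{rel}", NS))
        alt_labels += len(concept.findall("skos:altLabel", NS))
    return {"concepts": len(concepts), "top_concepts": top,
            "relations": dict(relations), "alt_labels": alt_labels}


# Hypothetical three-concept sample in SKOS RDF/XML.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/tax/finance">
    <skos:prefLabel xml:lang="en">Finance</skos:prefLabel>
    <skos:narrower rdf:resource="http://example.org/tax/corporate-finance"/>
    <skos:narrower rdf:resource="http://example.org/tax/accounting"/>
  </skos:Concept>
  <skos:Concept rdf:about="http://example.org/tax/corporate-finance">
    <skos:prefLabel xml:lang="en">Corporate finance</skos:prefLabel>
    <skos:altLabel xml:lang="en">Business finance</skos:altLabel>
    <skos:broader rdf:resource="http://example.org/tax/finance"/>
    <skos:related rdf:resource="http://example.org/tax/accounting"/>
  </skos:Concept>
  <skos:Concept rdf:about="http://example.org/tax/accounting">
    <skos:prefLabel xml:lang="en">Accounting</skos:prefLabel>
    <skos:broader rdf:resource="http://example.org/tax/finance"/>
    <skos:related rdf:resource="http://example.org/tax/corporate-finance"/>
  </skos:Concept>
</rdf:RDF>"""

print(profile_skos(SAMPLE))
```

A profile like this only reflects what is in the file; it does not replace asking the publisher about the vocabulary’s intended audience and purpose.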

Taxonomy Licensing Sources


Finding the right taxonomy is important. A good source of taxonomies and other vocabularies for license is Taxonomy Warehouse, where you can search or browse for taxonomies by subject. Taxonomy Warehouse contains over 760 vocabularies of all kinds in all subject areas in various formats from 330 organizations. It’s the largest available listing of proprietary vocabularies offered under commercial-use licenses.

There is also a larger, more international resource, developed and maintained by the University of Basel Library: the Basel Register of Thesauri, Ontologies & Classifications (BARTOC). Because it is a “register,” not all of its 2,878 indexed vocabularies are available for license. Each vocabulary is classified and assigned metadata for subject, category, vocabulary type, file format, language, and license type, among other classifications. It’s quite comprehensive for open source/free vocabularies and includes some commercially licensed vocabularies, though it is not yet as inclusive of them; its coverage is growing.

Some major information publishers who have developed extensive thesauri or taxonomies to index their published content do offer the vocabularies for license. They do not promote this, however, so it is little known, and they reserve the right not to license vocabularies to a party they consider a competitor. Examples include the Gale Subject Thesaurus and the Associated Press’ News Taxonomy.

Taxonomy Licensing Trends: A Survey


So, to what extent do organizations seek to license a taxonomy as part of their knowledge or content management strategy? That’s a good question. To find out, I have created a short multiple-choice questionnaire, the results of which will be posted in a future blog post and may become a conference presentation topic as well. Please take a few minutes (an estimated 4 minutes) to fill out my short Taxonomy Licensing Interest Survey.