As a taxonomist who designs and creates taxonomies, I have
always advocated creating a customized taxonomy for each implementation, which
takes into consideration the particular set of content and type of users. Nevertheless,
there are situations when licensing a taxonomy (or any kind of controlled vocabulary)
created by a third party may be desirable, such as for the starting point of a taxonomy
that is then modified, for a single facet of a faceted taxonomy, or for tagging
multi-source research content.
Using an existing taxonomy created by a third party, without modification, can cause several problems. Its scope may be narrower than needed, or it might not be as detailed, so needed concepts would be missing. Its scope may be broader than needed, or it may be more detailed than needed, so it’s cumbersome and not user friendly, and indexing with it would be inconsistent. Its language style might not suit the new users, so users cannot find what they are looking for. Its terms, and even their alternative labels (synonyms), may not match the language of the content, so content may not get indexed properly. Finally, it might not even have the desired structure: it may be a thesaurus, for example, when a hierarchical taxonomy is needed.
Taxonomy Licensing Uses
Licensing a taxonomy can be done as a starting point, whereby
the taxonomy can then be sufficiently modified for its new use. Modifications
include removing concepts out of scope and not needed, adding missing concepts
and their relationships, creating additional alternative labels to existing or
new concepts, and changing the wording of selected preferred labels to conform
with the preference of the users. If only a fraction of concepts need changing,
and it’s more a matter of adding new concepts, then licensing can be a good way
to get a taxonomy up and running more quickly than starting from scratch.
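The four kinds of modification listed above can be sketched in a few lines of Python. This is only an illustration under assumptions: the dictionary layout, the concept IDs, and the `customize` function are hypothetical, and real taxonomy management software handles these edits through its own interface.

```python
from copy import deepcopy

# Hypothetical licensed taxonomy: concept ID -> labels and broader concept.
LICENSED = {
    "c1": {"pref": "Finance", "alt": [], "broader": None},
    "c2": {"pref": "Banking", "alt": ["Banks"], "broader": "c1"},
    "c3": {"pref": "Numismatics", "alt": [], "broader": "c1"},
}


def customize(taxonomy, remove=(), add=None, extra_alt=None, relabel=None):
    """Return a modified copy; the licensed original is left untouched."""
    result = deepcopy(taxonomy)
    # 1. Remove concepts that are out of scope (children are not re-parented here).
    for concept_id in remove:
        result.pop(concept_id, None)
    # 2. Add missing concepts together with their broader relationship.
    for concept_id, concept in (add or {}).items():
        result[concept_id] = deepcopy(concept)
    # 3. Create additional alternative labels.
    for concept_id, labels in (extra_alt or {}).items():
        result[concept_id]["alt"].extend(labels)
    # 4. Reword preferred labels, keeping the old wording as an alternative label.
    for concept_id, label in (relabel or {}).items():
        old_label = result[concept_id]["pref"]
        result[concept_id]["pref"] = label
        if old_label not in result[concept_id]["alt"]:
            result[concept_id]["alt"].append(old_label)
    return result


custom = customize(
    LICENSED,
    remove=["c3"],
    add={"c4": {"pref": "Insurance", "alt": [], "broader": "c1"}},
    extra_alt={"c2": ["Retail banking"]},
    relabel={"c1": "Financial services"},
)
print(custom["c1"])
```

Keeping the replaced preferred label as an alternative label is a design choice in this sketch, so that content or queries using the original wording still match.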
Licensing a controlled vocabulary to serve for just one or
two facets or metadata properties of a larger taxonomy set may also be a
practical option. A faceted taxonomy enables users to filter or limit search
results by a combination of concepts selected from multiple facets/filters. For
example, for images these could be: geographic place, location type, occasion,
person type, time of year, activity, and object. It might be desirable to
license a vocabulary for geographic place or person type and create the other
vocabularies. Other examples of a
single-facet taxonomy that might be of interest for licensing include product
types and industries. A facet may
contain a hierarchical structure or a flat list.
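To make the idea of filtering by a combination of facets concrete, here is a minimal sketch in Python. The image records, their tag values, and the `filter_by_facets` function are invented for illustration; only the facet names come from the example above.

```python
# Hypothetical image records, each tagged with one concept per facet.
IMAGES = [
    {"id": "img1", "geographic place": "Italy", "occasion": "wedding", "activity": "dancing"},
    {"id": "img2", "geographic place": "Italy", "occasion": "holiday", "activity": "hiking"},
    {"id": "img3", "geographic place": "Canada", "occasion": "wedding", "activity": "dancing"},
]


def filter_by_facets(records, selections):
    """Return the records that match every selected facet value."""
    return [
        record
        for record in records
        if all(record.get(facet) == value for facet, value in selections.items())
    ]


# The user selects one concept in each of two facets.
matches = filter_by_facets(IMAGES, {"geographic place": "Italy", "occasion": "wedding"})
print([record["id"] for record in matches])
```

Because each facet is an independent vocabulary, the values for "geographic place" could come from a licensed vocabulary while the other facets are built in house, without changing the filtering logic.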
Licensing a taxonomy as is, with little or no modification,
is sometimes appropriate if the original purpose and the new purpose are the same
and the type of user is the same. This would not be the case for internally
created content, but if the content comes from multiple external sources, such
as published articles, and the users are conducting external research, then a
third-party created taxonomy in the desired discipline or industry might be
appropriate. Fields such as medicine, pharmaceuticals, engineering, and the
sciences in general may be suitable for licensing a taxonomy with little
modification.
Taxonomy Licensing Issues
The licensed taxonomy not only needs to be in the
appropriate subject area but needs to have been initially created for a similar
audience and purpose, which can be determined by contacting the original
creator/publisher of the taxonomy. For example, a subject area of “finance”
will have somewhat different concepts depending on whether it was created for
academic/research use or for internal enterprise content management use.
The licensed controlled vocabulary should be of the desired
type: classification system, taxonomy, thesaurus, ontology, etc. This is not
always obvious, since the distinctions between taxonomies, thesauri, and
ontologies can be blurred, and the term “taxonomy” is sometimes used for many
different kinds. So, it’s important to ask the taxonomy publisher specific
questions, such as how many top terms there are, what kinds of relationships
there are between concepts, and whether there are classes or categories
assigned to concepts.
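If the publisher can provide a sample extract, the same questions can also be checked directly. The sketch below is a rough illustration and assumes the sample has already been reduced to simple (concept, relationship, target) statements; the statement format, the relationship names, and the `profile` function are hypothetical.

```python
from collections import Counter

# Hypothetical sample: (concept, relationship, target) statements.
SAMPLE = [
    ("Banking", "broader", "Finance"),
    ("Insurance", "broader", "Finance"),
    ("Banking", "related", "Insurance"),
    ("Finance", "class", "Topic"),
]


def profile(statements):
    """Return the top terms and a count of each kind of relationship."""
    relationship_counts = Counter(rel for _, rel, _ in statements)
    # Concepts appear as subjects, or as targets of concept-to-concept relationships.
    concepts = {subject for subject, _, _ in statements}
    concepts |= {target for _, rel, target in statements if rel in ("broader", "related")}
    # A top term is a concept that has no broader concept.
    has_broader = {subject for subject, rel, _ in statements if rel == "broader"}
    return sorted(concepts - has_broader), relationship_counts


top_terms, relationship_counts = profile(SAMPLE)
print(top_terms)
print(dict(relationship_counts))
```

In this sketch, the presence of "related" statements would suggest a thesaurus rather than a simple hierarchy, and "class" statements would suggest something closer to an ontology.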
If modification is going to be done, which is often the
case, the license needs to permit modification. A freely available
taxonomy may restrict modification and require attribution to the source of the
unaltered taxonomy. Its license often prohibits
commercial reuse as well. A paid license, on the other hand, typically permits
modification, the use of the terms to create a new taxonomy (as a “derivative
work”), and commercial use.
A taxonomy that is available for license typically comes in
a standard interchange format, such as CSV, XML, RDF, or SKOS, so it can be imported into
taxonomy/thesaurus/ontology management software, where it can be further modified.
An understanding of the formats is needed to select the most desirable one,
when multiple formats are supported.
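As an illustration of what one of these formats looks like in practice, the sketch below reads a tiny SKOS file in RDF/XML using only Python's standard library. The sample data and the `parse_skos` function are made up for this example, and the parser assumes concepts are written as `skos:Concept` elements; real SKOS exports can be serialized in other ways, which a dedicated RDF library or taxonomy management tool would handle.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
RDF_ABOUT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}about"
RDF_RESOURCE = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}resource"

# Invented two-concept sample in SKOS RDF/XML.
SAMPLE = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/c/finance">
    <skos:prefLabel xml:lang="en">Finance</skos:prefLabel>
  </skos:Concept>
  <skos:Concept rdf:about="http://example.org/c/banking">
    <skos:prefLabel xml:lang="en">Banking</skos:prefLabel>
    <skos:altLabel xml:lang="en">Banks</skos:altLabel>
    <skos:broader rdf:resource="http://example.org/c/finance"/>
  </skos:Concept>
</rdf:RDF>
"""


def parse_skos(xml_text):
    """Map each concept URI to its preferred label, alternative labels, and broader concepts."""
    root = ET.fromstring(xml_text)
    concepts = {}
    for node in root.findall("skos:Concept", NS):
        concepts[node.get(RDF_ABOUT)] = {
            "pref": node.findtext("skos:prefLabel", default="", namespaces=NS),
            "alt": [label.text for label in node.findall("skos:altLabel", NS)],
            "broader": [ref.get(RDF_RESOURCE) for ref in node.findall("skos:broader", NS)],
        }
    return concepts


concepts = parse_skos(SAMPLE)
for uri, concept in concepts.items():
    print(uri, concept["pref"], concept["alt"], concept["broader"])
```

A CSV export of the same taxonomy would be easier to open in a spreadsheet, but it has no standard way to express alternative labels and broader relationships, which is one reason the choice of format matters.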
Taxonomy Licensing Sources
Finding the right taxonomy is important. A good source of taxonomies and other vocabularies for
license is Taxonomy Warehouse, where you can search or browse for taxonomies by subject. Taxonomy Warehouse contains over 760 vocabularies of all kinds in all subject areas in various formats from 330 organizations. It’s the largest available listing of proprietary vocabularies
offered for commercial-use licenses.
There is also a larger, more international resource,
developed and maintained by the University of Basel Library, the Basel Register of Thesauri,
Ontologies & Classifications (BARTOC). Because it is a “register,” not all of the
2,878 indexed vocabularies are available for license. Each vocabulary is
classified and assigned metadata for subject, category, vocabulary type, file
format, language, and license type, among other classifications. It’s quite comprehensive for open source/free
vocabularies. Its coverage of commercially licensed vocabularies is less
complete so far, but it’s growing.
Some major information publishers who have developed extensive
thesauri or taxonomies to index their published content do offer the
vocabularies for license, but they do not promote it, so this is little known,
and they reserve the right not to license vocabularies to a party considered a
competitor. Examples include the Gale
Subject Thesaurus and the Associated Press’ News Taxonomy.
Taxonomy Licensing Trends: A Survey
So, to what extent do organizations seek to license a
taxonomy as part of their knowledge or content management strategy? To find
out, I have created a short multiple-choice questionnaire, the
results of which will be posted in a future blog post and may perhaps become a
conference presentation topic as well. Please take a few minutes (estimated
4 minutes) to fill out my short Taxonomy Licensing Interest
Survey.