Showing posts with label Licensing taxonomies.

Saturday, December 28, 2019

Taxonomy Licensing Interest


Just over a year ago I blogged on the topic of Taxonomy Licensing. I explained that a customized taxonomy is usually best, but that licensing a taxonomy is an option worth considering in certain circumstances: as a starting point to then modify, to serve as a single facet in a faceted taxonomy, or to index content from various external sources on a defined topic area for which a good taxonomy exists. There are issues, though, such as whether the right kind of taxonomy exists and whether the license permits modification of the taxonomy.

Various organizations, companies, and even individuals have created taxonomies or other controlled vocabularies, which they have made available for license. Whether it’s worthwhile for them to promote these taxonomies is uncertain. So, a year ago I created an online survey of interest in licensing taxonomies (or, more broadly, any controlled vocabularies), which I announced not only on this blog, but also on the blogs of taxonomy software vendors and at various conferences. The survey stayed open for about 6 months, and most questions received over 60 responses. Now it is time to share those results. Although the responses are in the context of licensing controlled vocabularies, some of the questions and responses (about the taxonomy purpose, type, or subject area of interest) might reflect general interest in taxonomies. (Percentages have been rounded.)

The first question asked about interest in licensing taxonomies or other controlled vocabularies. More than half of the respondents (61%) have considered licensing taxonomies, but most have not gone any further in identifying appropriate taxonomies to license. Among the respondents who said they would not likely license a taxonomy (22 out of 66), the leading reasons given were:

  1. Custom-created taxonomies would best serve my purposes: 59%
  2. Licensed taxonomies that are modifiable and permit commercial reuse are too expensive: 14%

The leading concerns regarding licensing a taxonomy, ranked in order, were the following:
  1. Difficulty finding or lack of a suitable taxonomy
  2. Difficulty integrating a licensed taxonomy into an existing taxonomy or taxonomy set
  3. Effort to modify, adapt, and/or expand a licensed taxonomy
  4. Licensing fee cost
  5. Missing features in the licensed taxonomy
  6. File format and implementation issues
  6. File format and implementation issues


The types of controlled vocabularies that respondents are most interested in licensing (allowing multiple responses) were:

  1. Hierarchical taxonomy: 56%
  2. Controlled vocabulary for part of a faceted taxonomy: 55%
  3. Ontology: 40%
  4. Thesaurus: 35%
  5. Name authority file (companies, places, organizations, person names, etc.): 17%
  6. Classification scheme (such as with alpha-numeric codes): 10%


The subject areas of controlled vocabularies that respondents are most interested in licensing (allowing multiple responses) were:

  1. Business/management/enterprise functions: 36%
  2. Information technology/computing: 30%
  3. Industries: 28%
  4. Company or organization names: 26%
  5. Products/services: 23%
  6. Health/medicine: 21%
  7. Geographic places: 21%
  8. Engineering & design: 20%
  9. Law & policy: 20%
  10. Science & math: 18%
  11. Humanities & social sciences: 13%
  12. Occupations or job titles: 13%
Finance was a popular write-in option under “Other.”


 
The purposes that respondents said a licensed controlled vocabulary would serve (allowing multiple responses) were:

  1. Internal content management and search & retrieval: 82%
  2. Business intelligence/market research/competitive intelligence/data analysis: 32%
  3. Expertise identification: 24%
  4. Public/website content findability – commercial: 21%
  5. Education/research: 19%
  6. Ecommerce or B2B: 18%
  7. Public/website content findability – nonprofit: 15%
  8. Public/website content findability – government: 8%


The size ranges of a controlled vocabulary that respondents said they would be interested in licensing (allowing multiple responses) were:

  1. 1,000 - 5,000 concepts: 33%
  2. More than 10,000 concepts: 26%
  3. 500 - 1,000 concepts: 21%
  4. 5,000 - 10,000 concepts: 21%
  5. Less than 100 concepts: 17%
  6. 100 - 500 concepts: 14%
 
The formats of a controlled vocabulary that respondents said they would be interested in licensing (allowing multiple responses, especially since some of these formats are not mutually exclusive) were:
  1. XML: 44%
  2. Unsure: 39%
  3. Excel (xls or xlsx): 34%
  4. SKOS: 32%
  5. CSV: 26%
  6. RDF: 24%
  7. JSON: 24%
  8. OWL: 16%
  9. Turtle: 11%
  10. Zthes: 8%


The leading industries of respondents were:

  1. Consulting/professional services: 18% (Perhaps taxonomy consultants, like me?)
  2. Nongovernmental/nonprofit: 18% (Perhaps because licensing restrictions for commercial re-use are not an issue.)
  3. Software/Hardware/IT: 13%
  4. Manufacturing/Construction/Engineering: 10%

Additionally, 10 other industries were each indicated by only 2-3 respondents.

Conclusions from the survey include:

  • Concerns about licensing are varied, and no single concern dominates.
  • Hierarchical taxonomies and vocabularies for individual facets of faceted taxonomies are the types of greatest interest.
  • The subject area of greatest interest is business/management/enterprise functions.
  • Internal content management and search & retrieval is the leading purpose for licensing a controlled vocabulary.
  • Interest spans all vocabulary sizes, but the mid-range dominates.
  • The industries of respondents interested in vocabulary licensing vary, and none dominates.
  • XML and CSV/Excel are the formats of greatest interest, but a significant number of respondents are unsure which format they want.

 


Tuesday, December 4, 2018

Taxonomy Licensing


As a taxonomist who designs and creates taxonomies, I have always advocated creating a customized taxonomy for each implementation, one that takes into consideration the particular set of content and type of users. Nevertheless, there are situations when licensing a taxonomy (or any kind of controlled vocabulary) created by a third party may be desirable, such as to serve as the starting point for a taxonomy that is then modified, to provide a single facet of a faceted taxonomy, or to tag multi-source research content.

Taking an existing taxonomy created by a third party, without modification, can have several problems. Its scope may be narrower than needed, or it might not be detailed enough, so needed concepts would be missing. Its scope may be broader than needed, or it may be more detailed than needed, so it’s cumbersome and not user friendly, and indexing with it would be inconsistent. Its language style might not suit the new users, so users cannot find what they are looking for. Its terms, and even their alternative labels (synonyms), may not match the language of the content, so content may not get indexed properly. Finally, it might not have the desired structure; a thesaurus and a hierarchical taxonomy, for example, are structured differently.
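One rough way to check for the scope and language mismatches described above, before committing to a license, is to compare a candidate taxonomy’s preferred and alternative labels against terms drawn from your own content. The following Python sketch assumes a simple dictionary representation of concepts and a hand-collected list of content terms; the data and the function names are hypothetical, and exact, case-insensitive string matching is a deliberate simplification.

```python
# Hypothetical sketch: what share of terms from your own content is covered
# by a candidate taxonomy's labels (preferred + alternative)?


def label_index(concepts):
    """Map each lowercased preferred or alternative label to its concept ID."""
    index = {}
    for concept_id, labels in concepts.items():
        for label in [labels["pref"], *labels.get("alt", [])]:
            index[label.lower()] = concept_id
    return index


def coverage(concepts, content_terms):
    """Return (share of content terms matched, list of unmatched terms)."""
    index = label_index(concepts)
    unmatched = [t for t in content_terms if t.lower() not in index]
    return 1 - len(unmatched) / len(content_terms), unmatched


# Invented example data, for illustration only.
candidate = {
    "c1": {"pref": "Automobiles", "alt": ["Cars", "Motor vehicles"]},
    "c2": {"pref": "Bicycles", "alt": ["Bikes"]},
}
terms = ["cars", "bikes", "e-scooters", "trucks"]

share, missing = coverage(candidate, terms)
print(f"{share:.0%} of content terms matched; missing: {missing}")
```

A low share, or a long list of unmatched terms, hints that the taxonomy’s scope or wording does not fit the content; it is a screening signal, not a substitute for reviewing the taxonomy itself.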

Taxonomy Licensing Uses


Licensing a taxonomy can be done as a starting point, whereby the taxonomy is then sufficiently modified for its new use. Modifications include removing concepts that are out of scope or not needed, adding missing concepts and their relationships, creating additional alternative labels for existing or new concepts, and changing the wording of selected preferred labels to conform with the preferences of the users. If only a fraction of the existing concepts need changing, and it’s more a matter of adding new concepts, then licensing can be a good way to get a taxonomy up and running more quickly than starting from scratch.
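The four kinds of modification listed above can be pictured as operations on a simple concept data structure. This is a minimal Python sketch under assumptions of my own: each concept has a single broader concept (no polyhierarchy), the `Concept` class and function names are hypothetical, and in practice these edits would be made in taxonomy management software rather than in code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Concept:
    pref_label: str
    alt_labels: List[str] = field(default_factory=list)
    broader: Optional[str] = None  # ID of the parent concept, if any


Taxonomy = Dict[str, Concept]  # concept ID -> Concept


def remove_concept(tax: Taxonomy, cid: str) -> None:
    """Remove an out-of-scope concept together with all of its narrower concepts."""
    for child in [k for k, c in tax.items() if c.broader == cid]:
        remove_concept(tax, child)
    del tax[cid]


def add_concept(tax: Taxonomy, cid: str, pref: str,
                broader: Optional[str] = None,
                alt: Optional[List[str]] = None) -> None:
    """Add a missing concept, optionally under an existing broader concept."""
    if broader is not None and broader not in tax:
        raise KeyError(f"unknown broader concept: {broader}")
    tax[cid] = Concept(pref, list(alt or []), broader)


def add_alt_label(tax: Taxonomy, cid: str, label: str) -> None:
    """Add an alternative label (synonym) if it is not already present."""
    if label not in tax[cid].alt_labels:
        tax[cid].alt_labels.append(label)


def relabel(tax: Taxonomy, cid: str, new_pref: str) -> None:
    """Change the preferred label, keeping the old one as an alternative label."""
    old = tax[cid].pref_label
    tax[cid].pref_label = new_pref
    add_alt_label(tax, cid, old)


# Invented starting taxonomy, as if licensed from a third party.
tax: Taxonomy = {
    "finance": Concept("Finance"),
    "banking": Concept("Banking", broader="finance"),
    "retail-banking": Concept("Retail banking", broader="banking"),
    "insurance": Concept("Insurance", broader="finance"),
}
remove_concept(tax, "banking")  # out of scope: also drops "retail-banking"
add_concept(tax, "fintech", "Financial technology", broader="finance")
add_alt_label(tax, "fintech", "Fintech")
relabel(tax, "insurance", "Insurance services")
```

Keeping a replaced preferred label as an alternative label is a design choice in this sketch: content and users that still use the licensor’s original wording can continue to match the concept.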

Licensing a controlled vocabulary to serve for just one or two facets or metadata properties of a larger taxonomy set may also be a practical option. A faceted taxonomy enables users to filter or limit search results by a combination of concepts selected from multiple facets/filters. For example, for images these could be: geographic place, location type, occasion, person type, time of year, activity, and object. It might be desirable to license a vocabulary for geographic place or person type and create the other vocabularies. Other examples of a single-facet taxonomy that might be of interest for licensing include product types and industries. A facet may contain a hierarchical structure or a flat list.
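To make the faceted filtering idea concrete, here is a small Python sketch using the image example. The records and facet values are invented; imagine the `place` values coming from a licensed geographic vocabulary and the other facets from in-house vocabularies. The matching rule (any selected concept within a facet, all selected facets together) is a common convention that I am assuming here, not one every search system follows.

```python
# Invented image records; each facet holds a set of concepts from that facet's vocabulary.
images = [
    {"id": "img1", "place": {"Paris"}, "occasion": {"Wedding"}, "time_of_year": {"Summer"}},
    {"id": "img2", "place": {"Paris"}, "occasion": {"Festival"}, "time_of_year": {"Winter"}},
    {"id": "img3", "place": {"Kyoto"}, "occasion": {"Wedding"}, "time_of_year": {"Spring"}},
]


def facet_filter(records, selections):
    """Return records matching every selected facet.

    Within one facet, matching any selected concept is enough (OR);
    across facets, every facet with a selection must match (AND).
    """
    return [
        r for r in records
        if all(r.get(facet, set()) & chosen for facet, chosen in selections.items())
    ]


hits = facet_filter(images, {"place": {"Paris"}, "occasion": {"Wedding", "Festival"}})
print([r["id"] for r in hits])
```

Because each facet is filtered independently, swapping in a licensed vocabulary for one facet does not affect how the other facets work.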

Licensing a taxonomy as is, with little or no modification, is sometimes appropriate if the original purpose and the new purpose are the same and the type of user is the same. This would not be the case for internally created content, but if the content comes from multiple external sources, such as published articles, and the users are conducting external research, then a third-party created taxonomy in the desired discipline or industry might be appropriate. Fields such as medicine, pharmaceuticals, engineering, and the sciences in general may be suitable for licensing a taxonomy with little modification.

Taxonomy Licensing Issues


The licensed taxonomy not only needs to be in the appropriate subject area but needs to have been initially created for a similar audience and purpose, which can be determined by contacting the original creator/publisher of the taxonomy. For example, a subject area of “finance” will have somewhat different concepts depending on whether it was created for academic/research use or for internal enterprise content management use.

The licensed controlled vocabulary should be of the desired type: classification system, taxonomy, thesaurus, ontology, etc. This is not always obvious, since the distinctions between taxonomies, thesauri, and ontologies can be blurred, and the term “taxonomy” is sometimes used for many different kinds. So, it’s important to ask the taxonomy publisher specific questions, such as how many top terms there are, what kinds of relationships there are between concepts, and whether there are classes or categories assigned to concepts.
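If the publisher can supply a sample export, the same questions (number of top terms, kinds of relationships, presence of classes) can also be answered by inspecting the data directly. This Python sketch assumes a hypothetical dictionary layout with optional `broader`, `related`, and `classes` keys per concept. As a rough heuristic only: many associative (related) links suggest a thesaurus, assigned classes suggest an ontology, and concepts with more than one broader concept indicate a polyhierarchy.

```python
def profile(concepts):
    """Summarize the structure of a vocabulary.

    `concepts` maps a concept ID to a dict with optional keys:
    "broader" (parent IDs), "related" (associated IDs), "classes" (class names).
    """
    broader = {cid: c.get("broader", []) for cid, c in concepts.items()}
    return {
        "concepts": len(concepts),
        "top_terms": sum(1 for parents in broader.values() if not parents),
        "broader_links": sum(len(p) for p in broader.values()),
        "related_links": sum(len(c.get("related", [])) for c in concepts.values()),
        "polyhierarchical": sum(1 for parents in broader.values() if len(parents) > 1),
        "has_classes": any(c.get("classes") for c in concepts.values()),
    }


# Invented sample vocabulary.
sample = {
    "animals": {},
    "foods": {},
    "dogs": {"broader": ["animals"], "related": ["pets"]},
    "pets": {"broader": ["animals"], "related": ["dogs"]},
    "fish": {"broader": ["animals", "foods"]},
}
print(profile(sample))
```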

If modification is going to be done, which is often the case, the license needs to permit modification. An open source and free taxonomy may restrict modification and require attribution to the source of the unaltered taxonomy. An open source and free taxonomy usually prohibits commercial reuse as well. A paid license, on the other hand, typically permits modification, the use of the terms to create a new taxonomy (as a “derivative work”), and commercial use.

A taxonomy that is available for license typically comes in a standard interchange format, such as CSV, XML, RDF, or SKOS, so it can be imported into taxonomy/thesaurus/ontology management software, where it can be further modified. An understanding of these formats is needed to select the most suitable one when multiple formats are supported.
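As an illustration of how these formats relate, the Python sketch below converts a simple CSV export into SKOS, written in the Turtle syntax for RDF. The column names (`id`, `prefLabel`, `broader`, `altLabels`), the `|` separator for alternative labels, and the `http://example.org/taxonomy/` namespace are all assumptions for the example; a licensed taxonomy’s actual CSV layout will differ from publisher to publisher.

```python
import csv
import io

# Hypothetical namespace; a real project would use its own base URI.
PREFIXES = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix ex: <http://example.org/taxonomy/> .\n"
)


def turtle_literal(text: str, lang: str = "en") -> str:
    """Quote a label as a Turtle string literal with a language tag."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"@{lang}'


def csv_to_skos(csv_text: str) -> str:
    """Convert rows with id, prefLabel, broader, altLabels columns to SKOS Turtle.

    Assumes IDs are simple tokens usable as Turtle local names and that
    alternative labels are separated by "|".
    """
    blocks = [PREFIXES]
    for row in csv.DictReader(io.StringIO(csv_text)):
        statements = ["a skos:Concept",
                      f"skos:prefLabel {turtle_literal(row['prefLabel'])}"]
        for alt in (a.strip() for a in row["altLabels"].split("|")):
            if alt:
                statements.append(f"skos:altLabel {turtle_literal(alt)}")
        if row["broader"]:
            statements.append(f"skos:broader ex:{row['broader']}")
        blocks.append(f"ex:{row['id']} " + " ;\n    ".join(statements) + " .\n")
    return "\n".join(blocks)


sample_csv = """id,prefLabel,broader,altLabels
finance,Finance,,
banking,Banking,finance,Banks|Banking services
"""
print(csv_to_skos(sample_csv))
```

A complete SKOS export would usually also declare a `skos:ConceptScheme` and link each concept to it with `skos:inScheme`, which this sketch omits for brevity.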

Taxonomy Licensing Sources


Finding the right taxonomy is important. A good source of taxonomies and other vocabularies for license is Taxonomy Warehouse, where you can search or browse for taxonomies by subject. Taxonomy Warehouse contains over 760 vocabularies of all kinds, in all subject areas and various formats, from 330 organizations. It’s the largest available listing of proprietary vocabularies offered under commercial-use licenses.

There is also a larger, more international resource, developed and maintained by the University of Basel Library: the Basel Register of Thesauri, Ontologies & Classifications (BARTOC). As a “register,” it indexes 2,878 vocabularies, not all of which are available for license. Each vocabulary is classified and assigned metadata for subject, category, vocabulary type, file format, language, and license type, among other classifications. It’s quite comprehensive for open source/free vocabularies; it includes some commercially licensed vocabularies but is not yet as inclusive of them, although it’s growing.

Some major information publishers who have developed extensive thesauri or taxonomies to index their published content do offer the vocabularies for license, but they do not promote this, so it is little known, and they reserve the right not to license vocabularies to a party they consider a competitor. Examples include the Gale Subject Thesaurus and the Associated Press’ News Taxonomy.

Taxonomy Licensing Trends: A Survey


So, to what extent do organizations seek to license a taxonomy as part of their knowledge or content management strategy? That’s a good question. Thus, I have created a short multiple-choice questionnaire, the results of which will be posted in a future blog post and may perhaps become a conference presentation topic as well. Please take a few minutes (estimated 4 minutes) to fill out my short Taxonomy Licensing Interest Survey.