Just over a year ago I blogged on the topic of Taxonomy Licensing. I explained that a customized taxonomy is usually best, but that licensing a taxonomy is an option worth considering in certain circumstances: as a starting point to then modify, to serve as a single facet in a faceted taxonomy, or to index content from various external sources on a defined topic area for which a good taxonomy exists. There are issues, though, such as whether the right kind of taxonomy exists and whether the license permits modification of the taxonomy.
Various organizations, companies, and even individuals have created taxonomies or other controlled vocabularies, which they have made available for license. Whether it’s worthwhile for them to promote taxonomies that are for license is uncertain. So, a year ago I created an online survey of taxonomy (or more broadly, any controlled vocabulary) licensing interest, which I announced not only on this blog, but also on the blogs of taxonomy software vendors and at various conferences. The survey stayed open for about 6 months, and there were over 60 responses to most questions. Now it is time to share those results. Although the responses are in the context of licensing controlled vocabularies, some of the questions and responses (about the taxonomy purpose, type, or subject area of interest) might reflect general interest in taxonomies. (Percentages have been rounded.)
The first question asked about interest in licensing taxonomies or other controlled vocabularies. More than half of the respondents (61%) have considered licensing taxonomies, but most have not gone any further in identifying appropriate taxonomies to license. The leading reasons given by those respondents who said they would not be likely to license a taxonomy (22 respondents out of 66) were:
- Custom-created taxonomies would best serve my purposes: 59%
- Licensed taxonomies that are modifiable and permit commercial reuse are too expensive: 14%
The leading concerns regarding licensing a taxonomy, ranked in order, were the following:
- Difficulty finding or lack of a suitable taxonomy
- Difficulty integrating a licensed taxonomy into an existing taxonomy or taxonomy set
- Effort to modify, adapt, and/or expand a licensed taxonomy
- Licensing fee cost
- Features of the licensed taxonomy missing
- File format and implementation issues
The types of controlled vocabularies that respondents are most interested in licensing (allowing multiple responses) were:
- Hierarchical taxonomy: 56%
- Controlled vocabulary for part of a faceted taxonomy: 55%
- Ontology: 40%
- Thesaurus: 35%
- Name authority file (companies, places, organizations, person names, etc.): 17%
- Classification scheme (such as with alpha-numeric codes): 10%
The subject areas of controlled vocabularies that respondents are most interested in licensing (allowing multiple responses) were:
- Business/management/enterprise functions: 36%
- Information technology/computing: 30%
- Industries: 28%
- Company or organization names: 26%
- Products/services: 23%
- Health/medicine: 21%
- Geographic places: 21%
- Engineering & design: 20%
- Law & policy: 20%
- Science & math: 18%
- Humanities & social sciences: 13%
- Occupations or job titles: 13%
Finance was a popular write-in option under “Other.”
The purposes that respondents said a licensed controlled vocabulary would serve (allowing multiple responses) were:
- Internal content management and search & retrieval: 82%
- Business intelligence/market research/competitive intelligence/data analysis: 32%
- Expertise identification: 24%
- Public/website content findability – commercial: 21%
- Education/research: 19%
- Ecommerce or B2B: 18%
- Public/website content findability – nonprofit: 15%
- Public/website content findability – government: 8%
The size ranges of a controlled vocabulary that respondents said they would be interested in licensing (allowing multiple responses) were:
- 1,000 - 5,000 concepts: 33%
- More than 10,000 concepts: 26%
- 500 - 1,000 concepts: 21%
- 5,000 - 10,000 concepts: 21%
- Less than 100 concepts: 17%
- 100 - 500 concepts: 14%
The formats of a controlled vocabulary that respondents said they would be interested in licensing (allowing multiple responses, especially since some of these formats are not mutually exclusive) were:
- XML: 44%
- Excel (xls or xlsx): 34%
- SKOS: 32%
- CSV: 26%
- RDF: 24%
- JSON: 24%
- OWL: 16%
- Turtle: 11%
- Zthes: 8%
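To illustrate why some of these formats are not mutually exclusive, here is a minimal Python sketch that writes the same small hierarchy once as a flat CSV table and once as SKOS serialized in Turtle. The three concepts, their IDs, and the `http://example.org/taxonomy/` namespace are made up for illustration and do not come from the survey or from any actual licensed vocabulary. The function names `to_csv` and `to_skos_turtle` are likewise hypothetical.

```python
import csv
import io

# Hypothetical concepts: (id, preferred label, parent id or None)
CONCEPTS = [
    ("c1", "Finance", None),
    ("c2", "Accounting", "c1"),
    ("c3", "Taxation", "c1"),
]

BASE = "http://example.org/taxonomy/"  # placeholder namespace, an assumption


def to_csv(concepts):
    """Return a flat table: one row per concept with its parent id."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["id", "label", "parent_id"])
    for cid, label, parent in concepts:
        writer.writerow([cid, label, parent or ""])
    return buf.getvalue()


def to_skos_turtle(concepts):
    """Return the same concepts as SKOS, written in Turtle syntax."""
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        f"@prefix ex: <{BASE}> .",
        "",
    ]
    for cid, label, parent in concepts:
        # Escape backslashes and double quotes for a Turtle string literal.
        escaped = label.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f"ex:{cid} a skos:Concept ;")
        if parent:
            lines.append(f'    skos:prefLabel "{escaped}"@en ;')
            lines.append(f"    skos:broader ex:{parent} .")
        else:
            lines.append(f'    skos:prefLabel "{escaped}"@en .')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_csv(CONCEPTS))
    print(to_skos_turtle(CONCEPTS))
```

The CSV output captures the hierarchy only through a parent column, while the SKOS output states it explicitly with `skos:broader`. This may be part of why respondents selected several formats at once: a single vocabulary can be delivered as SKOS, as RDF, and in Turtle all at the same time.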
The leading industries of respondents were:
- Consulting/professional services: 18% (Perhaps taxonomy consultants, like me?)
- Nongovernmental/nonprofit: 18% (Perhaps because licensing restrictions for commercial re-use are not an issue.)
- Software/Hardware/IT: 13%
- Manufacturing/Construction/Engineering: 10%
Additionally, 10 other individual industries were indicated with only 2-3 individual responses each.
Conclusions from the survey include:
- Concerns around licensing are shared, and there is no dominant single concern.
- Hierarchical taxonomies and vocabularies for facets of faceted taxonomies are the types most of interest.
- The subject area of greatest interest is business/management/enterprise functions.
- Internal content management and search & retrieval is the leading purpose for controlled vocabulary licensing.
- Vocabularies of all sizes are of interest, but the mid-range dominates.
- Industries interested in vocabulary licensing vary, and none dominates.
- XML and CSV/Excel are the formats of greatest interest, but a significant number are unsure of the format desired.