I have usually spoken or written only of creating controlled vocabularies, or more specifically taxonomies, rather than of creating knowledge models. Now, I am beginning to think in terms of knowledge models and knowledge modeling.
A knowledge model is not just a fancy buzzword for a controlled vocabulary. It’s more complex than that. A knowledge model is more similar to a knowledge organization system, which I defined in an earlier blog post. As a system or a model, it comprises not only the concepts, their labels and attributes, and their relationships, but also rules or policies for their use. Furthermore, a knowledge model is either a complex type of knowledge organization system, such as a thesaurus or an ontology, or a set of multiple controlled vocabularies, such as facets, that are used in combination for the same content set. It is not a simple single controlled vocabulary. The designation of “model” is also what is used for RDF, SKOS, and OWL-based systems. These are often called semantic models.
The activity of “knowledge modeling” is also slightly different from, and more complex than, mere “taxonomy creation.” Taxonomy creation involves identifying concepts through obtaining input from stakeholders/users and from surveying the content, possibly with some additional external resources, but the extent of obtaining user input may vary. It is possible to build a taxonomy, especially one for external users, with no user input and just input from some other stakeholders. Knowledge modeling also involves inputs of people and content, but more emphasis is on stakeholder/user input. Content contains information, but people contain knowledge, so knowledge modeling requires the input of various people, with the input gathered in a comprehensive and systematic way, such as through interactive brainstorming workshops and interviews. Furthermore, knowledge modeling does not look merely at content, but starts out by considering the body of “knowledge” that can be derived from the content.
Knowledge modeling may also involve a slightly different way of thinking on the part of the taxonomist or knowledge modeler. Instead of thinking of what terms are needed for indexing and retrieval of a set of content, the knowledge modeler thinks of what the possible classes, facets, or concept schemes are to describe a domain of knowledge, and what the various user activities and use cases are that could be supported. From there, specific concepts are then created. Taxonomy creation involves a combination of top-down and bottom-up approaches to the hierarchy of concepts, but knowledge modeling puts more emphasis on the top-down approach.
Knowledge modeling is a very apt description for what is involved in designing and creating ontologies, which are knowledge organization systems that describe a domain of knowledge through concepts, classes of concepts, and customized semantic relationships between concepts of different classes. (Ontologies, by definition, should also follow the OWL standards of the World Wide Web Consortium for data representation.) There are knowledge organization systems which are not ontologies yet make use of some semantic relationships, and designing these also involves the activity of knowledge modeling. Determining what additional semantic relationships are desired, how specific they should be, and what they should be named in both directions is very much a knowledge modeling task.
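To make the idea of naming a relationship “in both directions” concrete, here is a minimal sketch in Python. The class names, concept labels, and relationship names are all hypothetical, chosen only for illustration. It is not how any particular ontology tool stores relationships.

```python
from collections import defaultdict

# Hypothetical relationship types: each name maps to the name of its inverse.
INVERSES = {
    "manufactures": "is manufactured by",
    "is manufactured by": "manufactures",
}


class KnowledgeModel:
    def __init__(self):
        self.concept_class = {}             # concept label -> class name
        self.relations = defaultdict(list)  # concept label -> [(relationship, target)]

    def add_concept(self, label, class_name):
        self.concept_class[label] = class_name

    def relate(self, source, relationship, target):
        """Store a relationship together with its named inverse."""
        if source not in self.concept_class or target not in self.concept_class:
            raise KeyError("both concepts must be added before they are related")
        self.relations[source].append((relationship, target))
        self.relations[target].append((INVERSES[relationship], source))


model = KnowledgeModel()
model.add_concept("Acme Corp", "Organization")
model.add_concept("Widgets", "Product")
model.relate("Acme Corp", "manufactures", "Widgets")
print(model.relations["Widgets"])
```

Deciding that the inverse of “manufactures” should read “is manufactured by,” rather than something vaguer such as “related to,” is exactly the kind of naming decision described above.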
Knowledge modeling also suggests that it is an activity of knowledge management and not merely information management. Knowledge management is defined as “the process of capturing, distributing, and effectively using knowledge” (Tom Davenport, 1994), which goes beyond the mere support of search, discovery, and retrieval. Knowledge management is especially for internal enterprise-level knowledge.
I think knowledge modeling is more challenging than mere taxonomy creation, but I am up for the challenge.
Topics related to information management taxonomies posted by the author of the book, The Accidental Taxonomist.
Friday, March 29, 2019
Thursday, February 28, 2019
Taxonomy Building Steps
What are the steps to take when building a taxonomy? This
question was posted not long ago to a discussion group of which I am a member. I
referred the person asking to slides of one of my past presentations, "Everything You Need to Know to Start a Taxonomy from Scratch."
That presentation, however, is more
about what to consider in a project of creating a new taxonomy, rather than
actual steps to take. So, I’ll summarize the steps here.
The main steps in developing a taxonomy are information
gathering, draft taxonomy design and building, taxonomy review/testing/validation
and revision, and taxonomy governance/maintenance plan drafting. The steps may
overlap slightly.
Information gathering for a taxonomy
Information gathering involves the two sides of the
taxonomy: the content to which it will be tagged and the users who will utilize
the taxonomy in browsing, searching, filtering, etc.
Information gathering about the content involves looking at
a large representative sample of content (documents, intranet or web pages,
database records, digital assets, etc.) and determining how they would be
classified and what they are about. Determining
how they would be classified is on the higher level of content types or
document types. Determining what they are about is on the more specific level
of indexing terms. As a former indexer, I approach the task as if I were going to
index the documents with index terms of my choosing. These terms are then
gathered and organized into the taxonomy. Any existing term lists or sets of metadata should also be gathered and analyzed.
Information gathering about the needs of the users involves
conducting interviews or using questionnaires to learn about the
information-seeking needs and behaviors of the primary users of the future
taxonomy. Some of the users of the taxonomy won’t be those looking for content
but rather those who will be publishing or uploading content and they will use
the taxonomy to select terms for tagging. Those users should also be interviewed
or asked questions on questionnaires, but they are asked different questions
than of those who perform information-seeking.
Draft taxonomy designing and building
Creating the taxonomy may begin with an initial high-level
taxonomy design and metadata specification, based on the information gathered
from users and some of the content. It is at this stage that the taxonomy type
(hierarchical, faceted, a combination), any larger metadata schema, and the top
terms are determined. Depending on the situation, the taxonomy project owner or
other key stakeholders should provide their feedback on the high-level design
before detailed taxonomy building begins.
Building out the taxonomy involves approaching the structure
from both directions: top down and bottom up. The top-down design and some
building comes primarily from the information gathered in speaking with the
users and other stakeholders. The bottom-up building comes from the index terms
discerned when analyzing sample content. The taxonomy needs to be well designed
from both ends and integrate well in the middle. Terms at both ends may be
revised in the process.
A well-designed taxonomy not only suits the needs of the
users and represents the range of content, but it also needs to follow best
practices for taxonomies so that the format of terms and the relationships
between terms conform to standards, and thus the taxonomy is logical and
intuitive to use.
Taxonomy review/testing/validation and revision
At one or more points in the process, the taxonomy should be
reviewed and tested. Testing should ideally involve both uses of the taxonomy:
finding terms to tag content and finding desired content by means of taxonomy
terms. This testing can be done with an offline sample of content and taxonomy
terms, if the taxonomy has not yet been implemented. Testing may be based on
use cases that came out of the initial user interviews. In this process, concepts that are missing from the
taxonomy can be identified and added, and concepts whose meaning is unclear can be clarified.
Testing that is done when the taxonomy is nearly finished and expected to be in
good shape might be called “validation.”
Taxonomy governance/maintenance plan drafting
Documenting the policy for the taxonomy and its usage does
not come merely at the end of the project but gets started as the taxonomy is
built and tested. As issues come up and get resolved, they get documented.
Taxonomy governance includes the taxonomy editorial policy/guidelines, the
taxonomy use/tagging policy, and policies and procedures for updating and maintaining
the taxonomy. A taxonomy is expected to change and require updating.
Conclusions
Those with skills in creating index terms need to broaden
their skills to include requirements gathering, stakeholder interviewing, and
governance planning, if they want to design and build a taxonomy. Those with
skills in information project management may need to deepen their skills in
best practices for creating taxonomy terms and relationships. If you would like to develop those skills, I
am offering full-day workshops in taxonomy design and creation in Rome, Italy, on March 25, 2019, and in Cleveland, Ohio, on June 15, 2019. I also offer a
self-paced online taxonomy course that can be started any time.
Thursday, January 31, 2019
Indexes and Faceted Taxonomies
I recently completed a project of creating an index for a book. I had done quite a bit of freelance back-of-the-book indexing from 2005 to 2013 but had not indexed a book in over four years. Since I also do taxonomy work, whenever I do indexing, I draw comparisons between index creation and taxonomy creation. This time I drew some new comparisons.
It is back-of-the-book indexing, rather than the kind of indexing of content items that is done with a taxonomy, that has some similarities with taxonomy creation. That is because they both involve creating taxonomy terms, naming them, coming up with variant names, and relating them to each other. I have written a detailed article “Creating Indexes and Thesauri: Similarities and Differences” published in the journal The Indexer.
During my most recent index project, I thought of comparisons not with thesauri, but with faceted taxonomies. Faceted taxonomies are an increasingly common form of taxonomies or controlled vocabularies. Different aspects/dimensions/refinements/filter types of a content item and of a query to find it are considered in creating a set of facets from which terms are used in combination. There can be a facet for each of such things as named persons, places, person types, events, activities, things, etc. The set of facets, ideally around 4-7, is customized to the set of content. Each facet may contain just a few or hundreds of terms.
An index, of course, is quite unlike a faceted taxonomy, because a single index includes all kinds of terms: named persons, places, person types, events, activities, things, etc. Some books, however, have separate Name and Subject indexes, so that could be like having two facets. Whether it’s a single index or a set of two, however, the user is only looking up one term at a time, unlike a faceted taxonomy, which allows the user to select multiple terms from multiple facets and combine them to limit the search results.
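The difference from a single-term lookup can be sketched in a few lines of Python. The items, facet names, and terms below are hypothetical. The matching rule (any selected term within a facet, all selected facets together) is a common convention that I am assuming here, not a rule every faceted search system follows.

```python
# Hypothetical content items, each tagged with terms from several facets.
ITEMS = [
    {"title": "Item A", "tags": {"place": {"Chicago"}, "activity": {"Recording"}}},
    {"title": "Item B", "tags": {"place": {"Chicago"}, "activity": {"Touring"}}},
    {"title": "Item C", "tags": {"place": {"New York"}, "activity": {"Recording"}}},
]


def filter_items(items, selections):
    """Return the items that match every selected facet.

    Within one facet, any selected term may match (OR);
    across facets, every selected facet must match (AND).
    """
    results = []
    for item in items:
        tags = item["tags"]
        if all(tags.get(facet, set()) & terms for facet, terms in selections.items()):
            results.append(item)
    return results


matches = filter_items(ITEMS, {"place": {"Chicago"}, "activity": {"Recording"}})
print([item["title"] for item in matches])  # ['Item A']
```

Selecting only the place facet would return two items; adding the activity facet narrows the results to one, which is the combining behavior an index lookup cannot offer.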
What is significant is that a good index should include all the aspects/dimensions/types of terms. Thus, the intellectual activity of creating a good back-of-the-book index is similar to creating a good faceted taxonomy, because a full set of aspects needs to be considered and created.
The book I recently indexed was a biography of a jazz saxophonist. As I indexed, focusing on the content at the level of a paragraph or a couple of consecutive paragraphs, I found myself making sure I created index terms that covered the different aspects or term types. In this case they tended to be: named persons, named places, person types (different kinds of musicians, music producers, etc.), place types, activities, music groups, music genres, record label companies, names of songs or albums, and music-related topics.
Of course, it is rare that a single paragraph would have more than a couple of distinct index term concepts (not counting synonyms, what in indexes is called “double posts”); a full set of facets is not expected. Rather, though, as I was indexing, after I selected an initial, obvious index term for the paragraph(s), I would then pause to think if there was a different aspect that could also apply as an index term from among potential facet-like categories, as listed above. I felt that being “facet aware” I was able to create a very comprehensive index.
The resulting index is simply an alphabetical arrangement of terms, with the larger concepts further broken down with subentries. It does not appear faceted. However, all the potential facets are included. The variants or synonyms, as “double posts” in the index, help guide different users who think of different words for the same thing to find the text passage of the desired topic. Additionally, the terms of the different aspects, like facets, help guide different users in another way, by serving those who are thinking about different aspects of the book’s content and narrative.
Tuesday, December 4, 2018
Taxonomy Licensing
As a taxonomist who designs and creates taxonomies, I have
always advocated creating a customized taxonomy for each implementation, which
takes into consideration the particular set of content and type of users. Nevertheless,
there are situations when licensing a taxonomy (or any kind of controlled vocabulary)
created by a third party may be desirable, such as for a start of a taxonomy
that is then modified, for a single facet of a faceted taxonomy, or for tagging
multi-source research content.
Taking an existing taxonomy created by a third party, without modification, can have several problems. Its scope may be narrower than needed, or it might not be as detailed, so needed concepts would be missing. Its scope may be broader than needed, or it may be more detailed than needed, so it’s cumbersome and not user friendly, and indexing with it would be inconsistent. Its language style might not suit the new users, so users cannot find what they are looking for. Its terms, and even their alternative labels (synonyms), may not match the language of the content, so content may not get indexed properly. Finally, it might not even have the desired structure, such as a thesaurus when a hierarchical taxonomy is wanted.
Taxonomy Licensing Uses
Licensing a taxonomy can be done as a starting point, whereby
the taxonomy can then be sufficiently modified for its new use. Modifications
include removing concepts out of scope and not needed, adding missing concepts
and their relationships, creating additional alternative labels to existing or
new concepts, and changing the wording of selected preferred labels to conform
with the preference of the users. If only a fraction of concepts need changing,
and it’s more a matter of adding new concepts, then licensing can be a good way
to get a taxonomy up and running more quickly than starting from scratch.
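The kinds of modification listed above can be sketched as one small Python function. The data layout (a dictionary of concept IDs with "pref", "alt", and "broader" keys) and the sample concepts are assumptions made for illustration; real taxonomy management software has its own data model.

```python
import copy


def adapt_taxonomy(licensed, remove=(), add=None, alt_labels=None, relabel=None):
    """Return a modified copy of a licensed taxonomy.

    `licensed` maps a concept ID to
    {"pref": str, "alt": [str], "broader": concept ID or None}.
    """
    taxonomy = copy.deepcopy(licensed)  # leave the licensed original untouched
    for concept_id in remove:           # remove out-of-scope concepts
        taxonomy.pop(concept_id, None)
    taxonomy.update(copy.deepcopy(add or {}))  # add missing concepts
    for concept_id, labels in (alt_labels or {}).items():
        taxonomy[concept_id]["alt"].extend(labels)  # extra alternative labels
    for concept_id, new_pref in (relabel or {}).items():
        concept = taxonomy[concept_id]
        concept["alt"].append(concept["pref"])  # keep the old wording findable
        concept["pref"] = new_pref
    # Simplification: concepts whose parent was removed become top concepts.
    for concept in taxonomy.values():
        if concept["broader"] not in taxonomy:
            concept["broader"] = None
    return taxonomy


licensed = {
    "c1": {"pref": "Automobiles", "alt": [], "broader": None},
    "c2": {"pref": "Trucks", "alt": [], "broader": "c1"},
    "c3": {"pref": "Aircraft", "alt": [], "broader": None},
}
adapted = adapt_taxonomy(
    licensed,
    remove=["c3"],
    add={"c4": {"pref": "Vans", "alt": [], "broader": "c1"}},
    alt_labels={"c2": ["Lorries"]},
    relabel={"c1": "Cars"},
)
print(sorted(concept["pref"] for concept in adapted.values()))
```

Keeping the former preferred label as an alternative label when rewording is a design choice of this sketch, so that users who knew the old wording can still find the concept.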
Licensing a controlled vocabulary to serve for just one or
two facets or metadata properties of a larger taxonomy set may also be a
practical option. A faceted taxonomy enables users to filter or limit search
results by a combination of concepts selected from multiple facets/filters. For
example, for images these could be: geographic place, location type, occasion,
person type, time of year, activity, and object. It might be desirable to
license a vocabulary for geographic place or person type and create the other
vocabularies. Other examples of a
single-facet taxonomy that might be of interest for licensing include product
types and industries. A facet may
contain a hierarchical structure or a flat list.
Licensing a taxonomy as is, with little or no modification,
is sometimes appropriate if the original purpose and the new purpose are the same
and the type of user is the same. This would not be the case for internally
created content, but if the content comes from multiple external sources, such
as published articles, and the users are conducting external research, then a
third-party created taxonomy in the desired discipline or industry might be
appropriate. Fields such as medicine, pharmaceuticals, engineering, and the
sciences in general may be suitable for licensing a taxonomy with little
modification.
Taxonomy Licensing Issues
The licensed taxonomy not only needs to be in the
appropriate subject area but needs to have been initially created for a similar
audience and purpose, which can be determined by contacting the original
creator/publisher of the taxonomy. For example, a subject area of “finance”
will have somewhat different concepts depending on whether it was created for
academic/research use or for internal enterprise content management use.
The licensed controlled vocabulary should be of the desired
type: classification system, taxonomy, thesaurus, ontology, etc. This is not
always obvious, since the distinctions between taxonomies, thesauri, and
ontologies can be blurred, and the term “taxonomy” is sometimes used for many
different kinds. So, it’s important to ask the taxonomy publisher specific
questions, such as how many top terms there are, what kinds of relationships
there are between concepts, and whether there are classes or categories
assigned to concepts.
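If the publisher provides a sample file, some of these questions can also be answered by inspecting the data. The following Python sketch assumes a hypothetical sample expressed as (source, relationship type, target) triples; the concept names and relationship types are made up for illustration.

```python
from collections import Counter

# Hypothetical sample: (source concept, relationship type, target concept).
CONCEPTS = {"Finance", "Banking", "Investing"}
RELATIONS = [
    ("Banking", "broader", "Finance"),
    ("Investing", "broader", "Finance"),
    ("Banking", "related", "Investing"),
]


def profile(concepts, relations):
    """Return the top terms and a count of each relationship type."""
    has_broader = {source for source, rel, _ in relations if rel == "broader"}
    top_terms = sorted(concepts - has_broader)  # concepts with no broader concept
    relationship_counts = Counter(rel for _, rel, _ in relations)
    return top_terms, relationship_counts


top_terms, relationship_counts = profile(CONCEPTS, RELATIONS)
print(top_terms)
print(dict(relationship_counts))
```

A vocabulary with only "broader" relationships is probably a hierarchical taxonomy, while one that also has "related" relationships behaves more like a thesaurus.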
If modification is going to be done, which is often the
case, the license needs to permit modification. A free or open-source
taxonomy may restrict modification and require attribution to the source of the
unaltered taxonomy. Depending on its license terms, a free taxonomy may also prohibit
commercial reuse. A paid license, on the other hand, typically permits
modification, the use of the terms to create a new taxonomy (as a “derivative
work”), and commercial use.
A taxonomy that is available for license typically comes in a
standard interchange format, such as CSV, XML, or SKOS expressed in RDF, so it can be imported into
taxonomy/thesaurus/ontology management software, where it can be further modified.
An understanding of the formats is needed to select the most desirable one,
when multiple formats are supported.
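As a rough illustration of what working with one of these formats involves, here is a minimal Python sketch that reads a CSV export into a hierarchy. The column names (id, label, parent_id) and the sample rows are hypothetical; actual CSV layouts differ from one publisher to another.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export with one row per concept; real column names vary.
SAMPLE_CSV = """id,label,parent_id
1,Finance,
2,Banking,1
3,Investing,1
"""


def load_hierarchy(csv_text):
    """Return (labels by concept ID, child IDs by parent ID)."""
    labels = {}
    children = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        labels[row["id"]] = row["label"]
        if row["parent_id"]:  # an empty parent_id marks a top term
            children[row["parent_id"]].append(row["id"])
    return labels, children


def print_tree(labels, children, concept_id, depth=0):
    """Print a concept and its narrower concepts as an indented outline."""
    print("  " * depth + labels[concept_id])
    for child_id in children.get(concept_id, []):
        print_tree(labels, children, child_id, depth + 1)


labels, children = load_hierarchy(SAMPLE_CSV)
print_tree(labels, children, "1")
```

A flat CSV like this can carry a hierarchy and labels, but richer information, such as multiple relationship types or concept classes, is generally easier to exchange in SKOS or another RDF-based format.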
Taxonomy Licensing Sources
Finding the right taxonomy is important. A good source of taxonomies and other vocabularies for
license is Taxonomy Warehouse, where you can search or browse for taxonomies by subject. Taxonomy Warehouse contains over 760 vocabularies of all kinds in all subject areas in various formats from 330 organizations. It’s the largest listing available of proprietary vocabularies
available for commercial-use licenses.
There is also a larger, more international resource,
developed and maintained by the University of Basel Library, the Basel Register of Thesauri,
Ontologies & Classifications (BARTOC). As a “register,” not all the
2,878 indexed vocabularies are available for license. Each vocabulary is
classified and assigned metadata for subject, category, vocabulary type, file
format, language, and license type, among other classifications. It’s quite comprehensive for open source/free
vocabularies. Its coverage of commercially licensed vocabularies is not yet as
inclusive, but it’s growing.
Some major information publishers who have developed extensive
thesauri or taxonomies to index their published content do offer the
vocabularies for license, but they do not promote it, so this is little known,
and they reserve the right not to license vocabularies to a party considered a
competitor. Examples include the Gale
Subject Thesaurus and the Associated Press’ News Taxonomy.
Taxonomy Licensing Trends: A Survey
So, to what extent do organizations seek to license a
taxonomy as part of their knowledge or content management strategy? That’s a
good question. To find out, I have created a short multiple-choice questionnaire, the
results of which will be posted in a future blog post and may perhaps become a
conference presentation topic as well. Please take a few minutes (estimated
4 minutes) to fill out my short Taxonomy Licensing Interest
Survey.