Terms in a taxonomy are generally nouns or noun phrases, but this does not mean that a taxonomy cannot comprise adjectives or verbs instead. There may be differences of opinion on this, though.
A thesaurus, another kind of controlled vocabulary, by contrast, is expected to follow standards (ANSI/NISO Z39.19 or ISO 25964), which dictate that the terms be only nouns or noun phrases. Since a thesaurus is more structured than a taxonomy, it might be assumed that a thesaurus is a kind of taxonomy with additional features (nonpreferred terms, associative relationships, scope notes, etc.), but that the basic format of the terms is the same. In general, this is true. Terms in the vast majority of taxonomies follow the same format as terms in thesauri, and the differences between these two knowledge organization systems lie rather in their use of term relationships and additional attributes.
Taxonomists should attempt to follow the thesaurus standards when creating taxonomies, to the extent that is practical or relevant. Reflecting the content and serving the users are always the first priorities for taxonomies. So, there may be cases when terms as adjectives or verbs are practical.
Taxonomies vary more than thesauri do, though. While the structure of a thesaurus is consistent, taxonomies can be based on hierarchies or on facets or a combination of both. Facets are lists of terms to describe certain attributes, aspects, limit-by/filter-by categories, or metadata fields. Facets could include types such as color, size, speed, etc., in which the terms in these facets are adjectives, for example the names of individual colors.
Taxonomies with terms that are verbs are even less common than taxonomies with terms that are adjectives. Taxonomy terms of verbs (not merely verbal nouns ending in -ing) are found in only very special-purpose taxonomies. As with taxonomies with adjectives, the verb terms would not comprise or be scattered throughout an entire hierarchical taxonomy, but would rather serve as shorter term lists or facets. A good example is Bloom’s taxonomy of educational outcomes, which is just the short list of the following verbs in this order: Remember, Understand, Apply, Analyze, Evaluate, and Create. Taxonomists might dismiss Bloom’s as not really a “taxonomy,” but it is very common to use Bloom’s terms in a facet within a faceted taxonomy for educational content.
Sets of longer verb phrases may stretch the definition of taxonomy or controlled vocabulary, but they still serve the same purpose of a controlled list within a metadata property used to tag content. This is the case for learning objectives used to tag educational content. An example of a learning objective is: “Classify costs as direct versus indirect.” Learning objectives can even be put into a hierarchy, like other taxonomies.
Metadata of phrases that begin with verbs could also be used to describe processes or procedures. I was once asked to design a “taxonomy” for the steps and options of statements/questions to be made by sales representatives as they go through the process of achieving a sale. These “terms” would have been verbal statements as complex as learning objectives. The issue I had with calling it a taxonomy is that the statements would not be arranged hierarchically as broader/narrower, but rather in a flow-chart procedure format. Indeed, this would have violated the definition of a taxonomy, which has to have some hierarchy. However, it would have resembled an ontology with its semantic relationships. So, such a procedure system would still be a kind of knowledge organization system.
Topics related to information management taxonomies posted by the author of the book, The Accidental Taxonomist.
Wednesday, May 24, 2017
Sunday, April 23, 2017
Taxonomy Term Specificity
One of the challenges in creating or editing taxonomies is
determining how specific the terms should be. This is a key issue in making a taxonomy customized for a certain
implementation, which involves a unique set of content to be tagged/indexed and
a certain set of users. Highly specific
terms tend to be the consequence of deeper hierarchies. So, the decision of how
specific the terms should be is also related to the decision of how many
hierarchical levels of depth the taxonomy should be. Taxonomies that are
organized into multiple facets, on the other hand, tend to have more limited
hierarchy, if any, and terms that are not so specific.
Having taxonomy terms that are more specific than necessary
inevitably means that there are more taxonomy terms than necessary. The larger
taxonomy is more difficult to maintain both in currency and consistency. Terms
that are more specific than necessary are also likely to be more specific than
expected by the users and might get overlooked and not even used. If the
taxonomy is searched, the users will not likely search for such highly specific
terms. If the taxonomy is browsed, the users might stop at a higher-level
broader term and be satisfied with that. Furthermore, users like to retrieve
multiple results (content items or references) for a single search term, so
that they can browse the list and evaluate what they want. Highly specific
terms will match fewer content items, so retrieved results could comprise only
one or two items per taxonomy term, which may not satisfy most users. Having a
greater number of more specific terms can also lead to more inconsistency in
the indexing/tagging, whether manual or automated.
Having taxonomy terms that are not specific enough means
that each taxonomy term is indexed to a relatively large number of content
items, and the users may have to scroll through multiple screens of returned
results and look at multiple items to find what they really want. The availability
of additional filters or facets can help limit the results, though. Having
terms that are not specific enough also makes it more difficult for users to
“discover” potential related topics of interest, whether the terms have “related-term”/“see
also” relationships between them or whether “related” terms are suggested by
shared tagged occurrence among content items.
Taxonomists sometimes refer to term specificity as “granularity”
or a taxonomy being “granular.” There is the irony that, although the scope and
meaning of specific terms are granular/narrow/small, the terms themselves are
not small. The “granular” terms tend to be longer, more complex, multi-word
terms. When they combine multiple concepts into a single term, such terms might also be called "pre-coordinated" terms. Following are examples of specific, granular taxonomy terms from
different specialized taxonomies:
- Possessed object access systems (in an information technology taxonomy)
- Fingerstick blood sugar testing (in a health care taxonomy)
- Standard manufacturing overhead cost (in a business taxonomy)
The taxonomist typically creates specific/granular terms,
based on the concepts of sample content to be tagged. There may be a document
with the phrase in the title, an image with the phrase in its caption, a
product with this description as its type, a department with the phrase in its
name, etc. Obviously, source phrases would need to be edited to become
well-formed taxonomy terms, but they may still be multi-word, complex terms.
Creating a taxonomy from scratch usually involves a combination of a top-down
and bottom-up approach in the development of terms and hierarchical
relationships. The specific/granular terms are the result of the bottom-up component
of taxonomy development.
Taxonomies available for license might be appropriate in
their subject area and scope, but chances are that their terms are either too
specific or not specific enough for a given implementation. Thus, if you
choose to license a taxonomy, make sure your license allows you to customize the
taxonomy so that you can either delete terms that are too specific or add more
specific terms, as narrower terms to existing terms, to suit your
content.
Creating or deleting specific terms is also part of periodic
taxonomy maintenance. If a term, which has no narrower terms, is heavily used
in indexing, it might be time to “break it up” by creating a few more specific,
narrower terms so that the large content set is indexed and retrieved with more
specific terms for more manageable result numbers. If, over a period of time, a
specific term has been applied in indexing very few times, or not at all, it
should probably be deleted. The deleted term can be changed to a
variant/nonpreferred term/alternative label for an existing broader concept. The
specificity of a taxonomy should match the specificity of the content being
tagged with it, and this can change over time.
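The usage-based review described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `review_term_usage`, the two thresholds, and the sample usage counts are all hypothetical, and suitable thresholds will depend on the size of the content set being indexed.

```python
def review_term_usage(usage_counts, has_narrower, split_above=500, retire_below=3):
    """Suggest maintenance actions for terms that have no narrower terms.

    usage_counts: dict mapping term -> number of content items indexed with it
    has_narrower: set of terms that already have narrower terms
    split_above, retire_below: hypothetical thresholds, to be tuned per project
    """
    suggestions = {}
    for term, count in usage_counts.items():
        if term in has_narrower:
            continue  # only terms without narrower terms are reviewed here
        if count > split_above:
            suggestions[term] = "heavily used: consider adding narrower terms"
        elif count < retire_below:
            suggestions[term] = "rarely used: consider making it a nonpreferred term"
    return suggestions


# Hypothetical usage counts, for illustration only.
usage = {
    "Costs": 3000,
    "Overhead costs": 1200,
    "Direct costs": 40,
    "Standard manufacturing overhead cost": 1,
}
suggestions = review_term_usage(usage, has_narrower={"Costs"})
for term, action in suggestions.items():
    print(f"{term}: {action}")
```

The output is a list of candidates for a taxonomist to review, not a list of changes to apply automatically.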
Friday, March 17, 2017
Taxonomies as Knowledge Organization Systems
Controlled vocabularies comprise simple term lists, synonym
rings (search thesauri), authority files, taxonomies, and thesauri. Knowledge
organization systems comprise all of these, plus categorization schemes,
classification schemes, dictionaries, gazetteers, glossaries, ontologies,
semantic networks, subject heading schemes, and terminologies. As such,
knowledge organization systems can be considered to be broader than controlled
vocabularies, including all kinds of controlled vocabularies and more.
Yet, it’s not simply a matter of more types that distinguishes
knowledge organization systems. Knowledge organization systems include
“schemes” that go beyond how the terms are organized and related to each other.
Categorization schemes, classification schemes, semantic networks, and ontologies
present not only terms and relationships but also models of how
information/knowledge can be managed and organized. These typically involve
additional specifications and documentation on how they are to be used. There
is indeed something to the name “knowledge organization system.” A “system” is
more than just terms and their relationships.
As such, there is more discourse around knowledge
organization systems than controlled vocabularies, per se (separate from
discussions specifically about taxonomies or thesauri). Conference sessions of
the Association for Information Science & Technology (ASIS&T) more
often have “knowledge organization systems” in their titles than “controlled
vocabularies.” There is even a professional association dedicated to knowledge
organization systems, the International Society for Knowledge Organization (ISKO). There is no comparable organization for controlled
vocabularies or just taxonomies or thesauri. ISKO holds conferences with
sessions around the various issues of knowledge organization systems, including
taxonomies. Recognizing that taxonomies are an important kind of knowledge
organization system, the ISKO UK chapter co-sponsors the Taxonomy Boot Camp
London conference.
Taxonomies are not only included within knowledge
organization systems, but they are also a part of the field of knowledge
management. As a consultant, I worked with clients who managed taxonomies
within their knowledge management services, headed by a manager or director of
knowledge management. Also, at a consultancy where I previously worked,
taxonomy consulting was part of the larger knowledge management consulting
practice.
I used to describe taxonomies as only a kind of controlled
vocabulary, but now I will start referring to them as knowledge organization
systems as well.
Sunday, February 19, 2017
Avoiding Mistakes in Taxonomy Hierarchical Relationships
Perhaps the most important issue in
designing a hierarchical taxonomy is creating hierarchical relationships
between terms correctly. This makes the taxonomy intuitively easy to understand
and navigate by all kinds of users, regardless of whether they have had any
training on using a taxonomy.
The basic principles of the hierarchical
relationship are described in the ANSI/NISO Z39.19 and ISO 25964-1 standards for thesauri.
As a quick summary, the relationship is created between terms in the following
circumstances:
- a broader term which is generic and a narrower term which is a more specific type of the generic broader term,
- a broader term which is generic and the narrower term is a named instance (proper noun) of the generic broader term,
- a broader term which is a whole entity and a narrower term which is an integral part.
It is the first, generic-specific type
that is most common, but is also most prone to errors by those not experienced
in creating taxonomies. Typical errors include confusing refinements with
narrower terms, too closely reflecting the source content hierarchy, and
creating narrower terms that are applications, uses, or examples of a broader
term.
Confusing Refinements with Narrower Terms
We envision users browsing a
hierarchical taxonomy from top down, from broad topic to more specific topic. A
more specific topic is a narrower term (NT) of a broader topic. However,
instead of providing more specific topics, the creator of a taxonomy might
mistakenly provide refinements of the broader topic, which are aspects of the
topic, but not actually narrower terms. A term that is an aspect or refinement
is not a unique stand-alone term/concept, but rather it is meant to be used in
combination with its parent term.
An example of such an erroneous
hierarchy would be:
Eye diseases
--Diagnosis
Diagnosis is an aspect or refinement of
Eye diseases (and of other disease-type terms), and not a narrower term. A narrower
term would be a specific type of eye disease:
Eye diseases
NT: Glaucoma
A refinement term might not be as
obvious as it is in the above example. If the same term, however, appears duplicated
as a narrower term to different broader terms, but with a different implied/contextual
meaning in each case, this should be a red flag that the duplicated narrower term
is really a refinement term. For example, the duplication of the term Waiver in
a legal taxonomy as:
Objections to evidence
--Waiver
Right to jury trial
--Waiver
In this case, the duplicate narrower
term should be changed to be specific in each case, such as: Objections to
evidence waiver and Right to jury trial waiver.
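The red-flag check described above, the same label appearing as a narrower term under different broader terms, can be sketched in Python. The function name `find_possible_refinements` and the representation of the taxonomy as (broader, narrower) pairs are assumptions made for this illustration. The script can only report duplicated labels. Deciding whether each one is a refinement or a legitimate term with more than one broader term remains a human judgment.

```python
from collections import defaultdict


def find_possible_refinements(broader_narrower_pairs):
    """Return labels that appear as a narrower term under more than one broader term."""
    parents_by_label = defaultdict(set)
    for broader, narrower in broader_narrower_pairs:
        parents_by_label[narrower].add(broader)
    return {
        label: sorted(parents)
        for label, parents in parents_by_label.items()
        if len(parents) > 1
    }


# (broader term, narrower term) pairs taken from the examples in this post.
pairs = [
    ("Objections to evidence", "Waiver"),
    ("Right to jury trial", "Waiver"),
    ("Eye diseases", "Glaucoma"),
]
flagged = find_possible_refinements(pairs)
print(flagged)
```

Here only Waiver is reported, together with the two broader terms it appears under.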
Novice taxonomists might create such
incorrect broader term-narrower term relationships because they have seen them formed
as such elsewhere, such as Library of Congress Subject Headings plus
Subdivisions or back-of-the-book index main entries plus subentries. A
subheading or a subentry is not the same as a narrower term, because a
subheading or a subentry only has usage and meaning in the context of the main
heading it is associated with (appears under). A taxonomy narrower term, on the
other hand, is not a different kind of term, but is rather a description of a
relationship between terms. The meaning of a term in a taxonomy is constant and
not dependent on its location in the taxonomy.
Too Closely Reflecting the Source Content Hierarchy
Some taxonomies are based heavily on
certain text sources, such as the table of contents of one or a limited number
of books or manuals, where the text is structured into units, chapters, main
heading sections, subheading sections, etc. It is thus natural to make use of
the structure of the text as a basis for the structure of the hierarchy. But
there can be issues.
In the following example of a chapter
and its headings from a textbook, greater hierarchical structure is needed for
the corresponding taxonomy terms, and one of the topics (Units of Measure) does
not belong within this hierarchy.
Microbiology Laboratory
--Microbiology Lab Personnel
--Introduction to the Microscope
--Units of Measure
--Types of Microscopes
--Laboratory Staining Methods
--Culture Media
--Serology
These concepts may appear in a taxonomy arranged hierarchically as follows:
Medical laboratory technology
NT: Laboratory equipment and supplies
NT: Culture media
NT: Microscopes
NT: Microscope types
NT: Laboratory personnel
NT: Microscope use
NT: Microscopy stains
NT: Serology
Procedures
NT: Measurements and calculations
NT: Units of measure
Another issue is that, even when the
hierarchy from the source is acceptable, the subheading-based terms are short,
generic, and without context. An example is as follows:
Eye Medications
--Anti-infectives
--Anti-inflammatory Agents
--Antiglaucoma Agents
--Local Anesthetics
The only correct narrower term above is
Antiglaucoma Agents, as the other terms are not specific to eye medications. They
could be linked as related terms instead.
Applications, Uses, or Example-Type Terms
Relying too much on certain text sources
for the taxonomy may also result in erroneously creating narrower terms for the
applications, uses, or examples of the broader term concept, because the text
presents content that way.
Following are several examples:
Web Applications
--Tourism and Travel
--Publishing
--Higher Education
--Employment
--Financial Institutions
--Software Distribution
--Health Care
Decision making issues
--Complexity
--Ethical conflicts
--Information sources
--Intraorganizational conflicts
--Social influences
Globalization challenges
--Cultural differences
--Economic risk
--Political risk
--Managerial limitations
Each of these so-called narrower terms
is merely an example within the context of the broader term. All "narrower
terms" could have other uses beyond the context of the broader term. To
make the hierarchy correct, either:
1) the relationship should be changed
from narrower-term (NT) to related-term (RT). This would be the case, if these
terms can logically exist elsewhere in the taxonomy. Also, indexing of the
concepts may require a pair of terms (such as Globalization challenges AND
Economic risk),
or
2) the narrower terms should be
modified and clarified, such as Cultural challenges to globalization, Economic
risk challenges to globalization, Political challenges to globalization, and
Managerial challenges to globalization. This would be the case, if these terms
did not exist elsewhere in the taxonomy.
In conclusion, hierarchical
relationships need to be constructed independent of any sources for terms, and they
need to be universal and not subject to certain contexts.
Friday, January 20, 2017
Orphan Terms in a Taxonomy
A taxonomy has hierarchical relationships between all of its terms, so one of the quality control checks on a taxonomy is to ensure that there are no “orphan” terms, which are terms that lack hierarchical relationships. One of the purposes of a taxonomy is for users to be able to navigate it (whether it is fully displayed or whether the links between only the selected terms are displayed), in order to find terms of interest. An orphan term, thus, cannot be found by browsing, only by searching.
Taxonomy/thesaurus management software can generate orphan term reports. However, as there are different kinds or definitions of taxonomies or thesauri, there are also different kinds or definitions of orphan terms. Certain definitions of orphans may be permitted, other kinds of orphans may be permitted in only certain kinds of controlled vocabularies, and some kinds of orphans are never permitted in any taxonomy or thesaurus.
Differences between taxonomies and thesauri
There are two main differences between strictly defined taxonomies and thesauri that have an impact on orphan terms.
- A taxonomy has only hierarchical (broader-narrower) relationships between its terms, whereas a thesaurus has both hierarchical and associative (related-term) relationships between terms.
- In a taxonomy, all terms belong to a single or limited number of hierarchies, each with a designated, broad-meaning “top term,” whereas in a thesaurus hierarchical relationships are created between terms merely as appropriate, without regard to any larger hierarchies or top terms. A taxonomy thus has a top-down inverted tree structure, whereas a thesaurus does not necessarily have an over-arching hierarchical structure.
Different kinds of orphan terms
The loosest and easiest to remember definition of an orphan term is a term which lacks a “parent”. In other words, the term has no broader term, but it may have other kinds of relationships to terms. A “top term” report of taxonomy/thesaurus management software will get this result, since all top terms are, by this definition, orphans.
An orphan term could also be defined as a term that has no hierarchical relationships, whether broader or narrower. In a thesaurus, such terms could have associative relationships only. In a taxonomy (lacking associative relationships), these terms then would have no relationships to other terms in the taxonomy.
At the strictest definition, an orphan term is defined as a term which lacks any relationships to any other term. This would be the same in a taxonomy or a thesaurus.
Finally, taxonomy/thesaurus management software may have the feature to allow you to define your own orphans, that is to designate a relationship type and then generate a list of terms that lack that relationship type to any other terms.
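The three definitions above can be compared side by side with a small Python sketch. It is not how any particular taxonomy management tool builds its orphan reports. The function name `find_orphans`, the data structures, and the sample terms Climate and Weather balloons are assumptions made for illustration.

```python
def find_orphans(terms, broader, related_pairs):
    """Classify terms under the three orphan definitions.

    terms: iterable of term labels
    broader: dict mapping term -> set of its broader terms
    related_pairs: list of (term, term) associative links; empty for a taxonomy
    Returns three sets: (no broader term, no hierarchical relationship,
    no relationship of any kind).
    """
    # Terms that have at least one narrower term, derived from the broader links.
    has_narrower = {parent for parents in broader.values() for parent in parents}
    # Associative relationships are reciprocal, so both ends count as related.
    has_related = {term for pair in related_pairs for term in pair}

    no_parent, no_hierarchy, no_relations = set(), set(), set()
    for term in terms:
        has_bt = bool(broader.get(term))
        has_nt = term in has_narrower
        if not has_bt:
            no_parent.add(term)
        if not has_bt and not has_nt:
            no_hierarchy.add(term)
            if term not in has_related:
                no_relations.add(term)
    return no_parent, no_hierarchy, no_relations


# Hypothetical sample data for illustration.
terms = ["Atmosphere", "Atmospheric composition", "Climate", "Weather balloons"]
broader = {"Atmospheric composition": {"Atmosphere"}}
related_pairs = [("Climate", "Atmosphere")]

no_parent, no_hierarchy, no_relations = find_orphans(terms, broader, related_pairs)
print(sorted(no_parent))
print(sorted(no_hierarchy))
print(sorted(no_relations))
```

Each successive set is a subset of the one before it, which reflects how the definitions go from loosest to strictest.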
Which kind of orphans to avoid
Orphans defined merely as those lacking broader terms are not necessarily a problem, since every taxonomy or thesaurus has top terms. For quality control, you would want to ensure that these parent-less “orphans” are indeed the top terms that you want. For a taxonomy, there are strict criteria for top terms. They must be broad-meaning categories under which are extensive hierarchical trees, perhaps even of a similar depth and breadth for each top term. For thesauri, the requirements for top terms are usually not strict, but it is still a good idea to review the top terms to ensure that there really is no appropriate broader term to move them under.
An orphan report of the kind that indicates terms that lack any hierarchical relationship (narrower or broader) but may have associative (related-term) relationships is quite helpful when editing thesauri. It will depend on the thesaurus owner whether the policy should permit such “hierarchical orphans.” Generally, such orphans should at least be avoided and perhaps permitted in only exceptional circumstances.
Orphans defined as terms that lack any relationships to other terms in the taxonomy should not be permitted in any circumstance. They don’t serve the navigation feature of a taxonomy, as there is no way to find them without search. If a suitable broader term within the taxonomy cannot be found, then they may be out of scope of the taxonomy/thesaurus. Usually, though, such orphan terms are the results of taxonomist error. If the taxonomy management software permits duplicate terms, these orphans could be duplicates of synonyms/nonpreferred terms/alternative labels.
Resolving orphan terms
In the case of orphan terms that lack broader terms but are not obviously top terms, the taxonomist should search the taxonomy/thesaurus for a suitable broader term. If one cannot be found, careful consideration should be given to whether a new term should be added that would both serve as a broader term for the orphan term and have a suitable broader term of its own already in the taxonomy/thesaurus. If dealing with a thesaurus rather than a taxonomy, then it may be OK to leave the term without a broader term, but then the related-term relationships should be checked and possibly enhanced so that there are multiple related-term relationships.
Sometimes stretching the thesaurus rules for hierarchical relationships may be desired to provide a broader term to an orphan. This is generally acceptable in a taxonomy but not in a thesaurus. Following are examples of former orphan terms whose candidate broader terms are not 100% correct broader terms (the narrower term is not a kind of or a part of its broader term), but they are close, so these relationships could be made, even in a thesaurus. What follows in parentheses are theoretical broader terms which are not practical terms to create.
- College applications BT College admissions (and not a BT of Applications)
- Behavior problems BT Behavior (and not a BT of Problems)
- Atmospheric composition BT Atmosphere (and not a BT of Composition)
- Conflict termination (Military science) BT Wars (and not a BT of Termination)
Orphans that lack any relationships are usually the result of taxonomist error. Perhaps the taxonomist got interrupted and did not complete the process of relating a term and then forgot. In many cases these orphans should have been made as synonyms/nonpreferred terms/alternative labels. The taxonomist should run orphan reports frequently enough to remember whether the orphan term was intended to be a preferred or a nonpreferred name.
More examples of how to resolve orphan terms are in a PDF of a PowerPoint presentation “Managing Mature Taxonomies: Resolving Orphan Terms” I gave as an SLA Taxonomy Division webinar in December 2016.