In considering best practices for developing taxonomy term labels or names, there is the question about the use of the word “and” within taxonomy terms. My previous two blog posts were called “Tags and Categories” and “Card Sorting and Taxonomies,” which demonstrate how common it is to have the word “and” in titles, headings, or other labels. By extension, does it work in taxonomy terms?
The standards for taxonomies, ANSI/NISO Z39.19 Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies and ISO 25964-1 Thesauri and Interoperability with Other Vocabularies, make no mention of terms with the word “and.” While such terms are not explicitly prohibited, neither are they mentioned as an acceptable form in the rather exhaustive list of term format types. Even the section on compound terms makes no mention of terms with the word “and.” So, one might conclude that terms should not have the word “and” within them. Yet such terms are not uncommon, especially in larger, more specialized taxonomies and thesauri.
The simple little word “and” can actually have two different meanings:
1) the intersection of two concepts, to include only that which belongs to both, which is the Boolean operator AND
2) the combination or union of two concepts, to include any of either, which is actually the Boolean operator OR
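To make the two readings concrete, here is a minimal Python sketch that treats each concept as the set of documents tagged with it. The document IDs and tags are hypothetical toy data; under that assumption, the AND reading is set intersection and the OR reading is set union.

```python
from typing import Set

# Hypothetical toy index: document ID -> set of concepts it is tagged with.
index = {
    "doc1": {"Children"},
    "doc2": {"Television"},
    "doc3": {"Children", "Television"},
    "doc4": {"Poverty"},
}

def docs_tagged(concept: str) -> Set[str]:
    """Return the IDs of documents tagged with the given concept."""
    return {doc for doc, tags in index.items() if concept in tags}

def and_meaning(a: str, b: str) -> Set[str]:
    """'and' as Boolean AND: only documents about both concepts (intersection)."""
    return docs_tagged(a) & docs_tagged(b)

def or_meaning(a: str, b: str) -> Set[str]:
    """'and' as Boolean OR: documents about either concept (union)."""
    return docs_tagged(a) | docs_tagged(b)

print(sorted(and_meaning("Children", "Television")))  # ['doc3']
print(sorted(or_meaning("Children", "Television")))   # ['doc1', 'doc2', 'doc3']
```

The same pair of words yields a narrower result set under AND and a broader one under OR, which is why the intended reading matters for tagging.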
When it comes to taxonomy terms, the word “and” could have either of the above two usages, and it’s very important to know which meaning applies in each case.
“And” meaning AND
My blog post title “Card Sorting and Taxonomies” involves the first meaning, the intersection of both concepts, which in this case is the use and suitability of card sorting specifically for taxonomies. “Card Sorting and Taxonomies” is more concise than saying “the suitability of card sorting for taxonomies,” and taxonomy terms need to be concise. Examples of the use of “and” in this (Boolean AND) meaning in taxonomy terms that I have run across include:
Children and Television
Gender and Poverty
The choice of using “and” is significant. It means any intersection/relation of these two concepts. “Children and Television” covers topics such as children’s television shows, the impact of television (not just children’s programming) on children, and the depiction of children on television. Similarly, “Gender and Poverty” covers various issues, such as data on poverty rates by gender, how poverty affects the genders differently, and reasons why more women are poor in developing countries.
It is easy to identify this meaning of the word “and” when the two concepts linked by the conjunction are quite distinct. In many taxonomies, the preferred policy is to avoid creating such terms, lest the taxonomy become too large and complex.
“And” meaning OR
My blog post title “Tags and Categories” involves the second meaning, the combination of both concepts. I described what tags were and what categories were and compared them. Examples of the use of “and” in this (Boolean OR) meaning in taxonomy terms that I have run across include:
Measurement and Analysis
Laws and Regulations
Roads and Highways
Maintenance and Repair
An additional example is the title of the online course I teach: “Taxonomies and Controlled Vocabularies.”
The main reason to create such terms is that, while some content deals with one or the other of the two linked words, a significant amount of content really has to do with both. Users probably don’t care to make the distinction either, so it’s better to have just a single concept in the taxonomy. But one word is not equivalent to the other, so a taxonomy term cannot be created from just one word with the other designated as its nonpreferred term/synonym. Another situation for these types of taxonomy terms is a small browsable taxonomy that does not utilize/support synonyms. An additional reason to create them is that they can boost SEO (search engine optimization) in website labels by giving more words prominence. Finally, the combined terms can also appease competing stakeholders who each want their preferred label as part of the term name.
The difference in a taxonomy
If you have taxonomy terms with the word “and” in them, it needs to be clear which of these two Boolean meanings is intended, not only to ensure accurate content tagging, but also to ensure the proper relationship of the term to other terms in the taxonomy. Recently I was reviewing a taxonomy with the term “Investment and Trade,” and from the term alone I could not determine whether it meant the intersection or the combination of these two words, so I did not know how it should be related to the terms “Investment” and “Trade.”
A term with the Boolean AND is narrower than the terms for both of its component parts, an arrangement known as polyhierarchy. “Children and Television” is narrower than both “Children” and “Television.” When a term uses the Boolean OR, such as “Measurement and Analysis,” the component words are expected not to exist as preferred terms in the taxonomy. Rather, each word, “Measurement” and “Analysis,” could be a nonpreferred term/synonym for “Measurement and Analysis.”
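The two placements can be sketched as a small data model. This is a hypothetical Python structure, not the data model of Z39.19, ISO 25964, or any taxonomy tool: an AND term records both components as broader terms (polyhierarchy), while an OR term records its component words only as nonpreferred terms. The rule that `add_or_term` refuses a component already present as a preferred term is an assumption drawn from the paragraph above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

@dataclass
class Term:
    """Hypothetical minimal thesaurus entry for a preferred term."""
    label: str
    broader: Set[str] = field(default_factory=set)       # broader terms (BT)
    nonpreferred: Set[str] = field(default_factory=set)  # nonpreferred terms/synonyms (UF)

Thesaurus = Dict[str, Term]  # preferred label -> entry

def add_and_term(thesaurus: Thesaurus, a: str, b: str) -> Term:
    """Boolean AND: the compound is narrower than both components (polyhierarchy)."""
    for part in (a, b):
        thesaurus.setdefault(part, Term(part))  # components remain preferred terms
    term = Term(f"{a} and {b}", broader={a, b})
    thesaurus[term.label] = term
    return term

def add_or_term(thesaurus: Thesaurus, a: str, b: str) -> Term:
    """Boolean OR: components become nonpreferred terms of the compound."""
    for part in (a, b):
        if part in thesaurus:
            raise ValueError(f"{part!r} is already a preferred term")
    term = Term(f"{a} and {b}", nonpreferred={a, b})
    thesaurus[term.label] = term
    return term

def preferred_for(thesaurus: Thesaurus, label: str) -> Optional[str]:
    """Resolve a label to its preferred term, following nonpreferred links."""
    if label in thesaurus:
        return label
    for term in thesaurus.values():
        if label in term.nonpreferred:
            return term.label
    return None

thesaurus: Thesaurus = {}
add_and_term(thesaurus, "Children", "Television")
add_or_term(thesaurus, "Measurement", "Analysis")
print(sorted(thesaurus["Children and Television"].broader))  # ['Children', 'Television']
print(preferred_for(thesaurus, "Analysis"))  # Measurement and Analysis
```

Applied to a term like “Investment and Trade,” the sketch forces the choice the post describes: the AND reading keeps “Investment” and “Trade” as preferred broader terms, while the OR reading turns them into entry points for the combined term.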
Good point. The distinction is relevant in classification too. In UDC each "and" has a separate symbol:
A and B = A:B
A or B = A+B
While A and B is narrower than A, A or B is broader than A, so it should be listed before A in browsable menus (something needing additional scripts in interfaces).
There are also some weird implications for relevance: a resource indexed as A:B is more likely to be retrieved than one indexed just as A, as it will be a result of both a search for A and a search for B. This also means that the more tags you add to a document, the more likely it is to be retrieved. This is paradoxical, as in principle narrower subjects should be less often relevant than broader subjects (a fruit encyclopedia is also relevant to a search for apples).
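A simplified Python sketch of both points in this comment, assuming toy notations made of single-letter concepts joined by ':' (A and B) or '+' (A or B). Real UDC notation is far richer; the parsing, the hypothetical `browse_key` ordering, and the resource IDs are illustrations only.

```python
from collections import Counter
from typing import List, Set, Tuple

# Hypothetical resources indexed with toy notations.
index = {
    "r1": "A",    # indexed just as A
    "r2": "A:B",  # A and B (relation): narrower than A
    "r3": "A+B",  # A or B (combination): broader than A
}

def components(notation: str) -> Set[str]:
    """Split a toy notation into its concepts on ':' and '+'."""
    return set(notation.replace("+", ":").split(":"))

def search(concept: str) -> List[str]:
    """Return resources whose notation contains the concept."""
    return sorted(r for r, notation in index.items() if concept in components(notation))

def browse_key(notation: str) -> Tuple[str, int]:
    """Order A+B before A, and A:B after A, within the same leading concept."""
    head = notation.replace("+", ":").split(":")[0]
    if "+" in notation:
        rank = 0  # broader than the plain concept
    elif ":" in notation:
        rank = 2  # narrower than the plain concept
    else:
        rank = 1
    return (head, rank)

# Count how many single-concept searches retrieve each resource.
hits = Counter(r for concept in ("A", "B") for r in search(concept))
print(hits["r1"], hits["r2"], hits["r3"])      # 1 2 2
print(sorted(index.values(), key=browse_key))  # ['A+B', 'A', 'A:B']
```

Here r2 (A:B) turns up in more searches than r1 (A), even though A:B is the narrower subject, which is the paradox described in the comment.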
I think that part of the problem is the use of the woolly term "taxonomy", which I dislike because it means so many different things to different people.
If we are talking about a thesaurus, then we define individual concepts and give them unique labels. If the concept is broad, it might be necessary to use a multi-word label, such as "laws, rules and regulations", as you suggest; this is still a label for a single concept, defined in its scope note. The multiple words in the label all indicate concepts of the same fundamental category, i.e. they are all from the same facet.
If we are talking about a classification scheme, then we can combine concepts from different facets, like "children AND television" to indicate the intersection of these concepts.
A difficulty arises if we use combinations like "farms and farmers" or "animals and zoos". These cannot be built into a proper hierarchical structure, because "farms and farmers" is not a type of organization or a type of person, so it cannot have either of these as broader terms. Such a term should not be admitted in a thesaurus, though it can be expressed in a classification scheme by the A+B notation that Claudio refers to.
Thanks, Leonard, for bringing up the examples of problematic terms that attempt to combine different kinds/classes of concepts, such as "farms and farmers."