Tuesday, March 22, 2022

Taxonomy Quotes

Taxonomies are very valuable, but not always easy to define, and they are described in various ways. They are also interdisciplinary, as taxonomies are developed by people in different fields for slightly different, yet similar purposes. I have heard various comments about taxonomies over the decades.

 

In the earlier years of the Taxonomy Community of Practice discussion group, a Yahoo group, which was the precursor of the current Taxonomy and Ontology Community of Practice LinkedIn group, the group’s moderator, Seth Earley, put out a call to the group’s members for a motto for the group. The winning quote, which became the group’s motto, was: “Taxonomies: That’s classified information,” by Jordan Cassel.

 

 

There were over a dozen other good suggestions for the motto which were posted in the group in January 2009. That turned out to be shortly before I wrote the first edition of my book, The Accidental Taxonomist, so, with permission, I took additional motto-quotes as opening headers to each of the 12 chapters of my book. The same quotes continued in publication of my second edition in 2016.

 

As I now am preparing a third edition (expected out in late fall 2022), I decided to refresh the chapter head quotes. Last month I put out a call for quotes in both the Taxonomy and Ontology Community of Practice LinkedIn group and in my own network. Some quotes were lengthier than before, as they were no longer submissions for a motto. I received far more submissions than I have chapters, and I have also decided to keep some of the original quotes (including the first one). Yet many of these quotes are quite thoughtful and/or clever, so I would like to share these new quotes here.

 

In true taxonomist fashion, I have categorized these quotes as about taxonomies, about taxonomy creation, about ontologies as compared to taxonomies, about taxonomists, and a few particularly witty quotes at the end.

 

About taxonomies

 

Taxonomies: organizing the disorganized.
—June Tsang

 

Without Taxonomies; entropy!

—Hakan Strom

 

Ambiguity is the thief of Knowledge.

—Robert Vane

 

Good taxonomy is a love letter to the future.

—Gary Carlson

 

Taxonomies - organised, effective tagging. 

—Alison Jones

 

Taxonomy: Levels in the Playing Field

—Merridy Cox (Bradley)

 

Knowledge organisation, search, and use combine to enable us to navigate the workplace.

—Bill Proudfit

 

Your Taxonomy, like all metadata, is an expression of what's important to you and to the collection.

—Peter Krogh

 

Taxonomies are, first of all, an act of self discovery on how we understand the world.

—Andrea Splendiani

 

 

About taxonomy creation

 

Taxonomy: generalize or specify, that is the question.

—Fabiola Aparecida Vizentim

 

Taxonomy: The perfect mix of art and science.

—Mollee Marcus

 

Taxonomies: Normalizing to help you find, report and aggregate across data & content

—Rita M. Benitez

 

Regardless of domain, taxonomy is the science of sorting and labelling information so it can be retrieved for future use.

—Leah B.

 

Do your best to ignore even your most strongly held convictions. If you want to create a user-friendly taxonomy/ontology system, follow the data, not your heart.
—Rebecca B. Weiss

 

Taxonomy is such a great battleground to focus consistently on improving the user experience; it’s a first key activity to drive the user experience.

—Vellaichamy Shunmugavel

 

To ontologize or not to ontologize, that is the question you should ask yourself in the first place.

—Erick Antezana

 

 

 

About ontologies (or ontologies compared with taxonomies)

 

Taxonomies tell stories, ontologies create worlds.

—Fran Alexander

 

Taxonomies classify; ontologies reify.

—Beatrice Larentis

 

Ontology: generating knowledge by connecting the dots.

Taxonomy: is like a drawer organizer for kitchen cutlery.
—Brigita Perchutkaite Vollstedt

 

If a taxonomy is an elevator, an ontology is a Wonkavator!

—Caroline Coward

(Referencing Willy Wonka and the Chocolate Factory: like an elevator but also can go sideways and in all directions.)

 

Ontologies make the implications explicit.

—Michele Ann Jenkins

 

A good ontology maps the way out of chaosville.

—Mark Atkins

 

Ontologies: organizational substrate for your data, information, and know-how enzymes.

—Heather Fox

 

 

About taxonomists

 

I wanted to figure out my place in the world, so I hired a taxonomist.

—Meg Morrissey

 

Only when one’s data is all over the place is it discovered that a taxonomist is necessary.

—Rebecca Custis

 

Be the Taxonomy you want to see in the World!
—Elaine Chu

 

I say this categorically, taxonomists are an organized bunch.

—Jordan Casell

 

Taxonomies: now you're where you belong.

—Alan S. Michaels

 


And the especially witty ones 😉

 

Ontology, Category, Property - Happy user will be! Try me, Find me, Surprise me :)

—Dorothee Balas

 

Year Make Model Engine Transmission Leather Navi Owners Accidents Miles Color: = my used-Taxi Taxonomy.

—Tony Mariella

 

Taxonomy is taxidermy for data -- mounted on a framework and stuffed for the purpose of display and study.

—Phil Taylor

 

Ontology: One graph to rule them all, one graph to find them, one graph to bring them all and in the semantic web bind them.
—Xeni Kechagioglou


I never metadata I didn't like

—Paul Belfanti

 

Taxonomy? Taxonoyou!

—Ron Cascella

 

Friday, February 4, 2022

Defining a Taxonomy’s Scope

In planning a taxonomy, I have often said that it is important at the beginning to define the taxonomy’s scope, specifically the subject area scope of the taxonomy’s terms, but without going into more detail. Recently I was asked by a client how to define a taxonomy’s scope. This is a good question. The taxonomy should be suited to the subject area scope of the content that will be tagged with the taxonomy and to the scope of the user’s expectations. Terms or topics only marginal to the subject scope, however, could occur in the content, and whether they should also be included in the taxonomy is a question. Ultimately, that should depend on whether user expectations justify it, as the needs of users should also be a factor in creating a taxonomy. A taxonomy should suit both its content and its users.

Sources for Taxonomy Terms

For content as a source of taxonomy terms, a combination of manual and automated approaches is recommended. By manually reviewing sample individual documents or content items, you can discern the main ideas and main topics, which should form the start and basic structure of the taxonomy and also help define its scope. Automated methods of extracting terms, through text analytics technologies, can bring in many additional terms from a much larger corpus of documents more quickly, picking up terms that a limited manual review would miss. Even though automated text analytics extracts terms based on relevancy and frequency of occurrence, such terms could be out of scope of the subject domain. That’s why it’s important to start first with a manual review of content to define the subject scope.  Then, when you enrich the taxonomy with automated extraction, you can approve terms that appear to be in scope or at least closely relevant and reject others. But should you reject all that are out of scope, even if they appear with sufficient frequency and relevancy? My advice is to try to assume the role of the user. Ask yourself: Might a user want to search for content on this term in this content collection?
 
For user needs and expectations as a contributing source of taxonomy terms, obtaining this information can be very direct, such as by creating a user questionnaire (at least for your internal users) that asks what the topics of importance are, how those users would define the scope, and what “marginal” topics would be acceptable for them to include. You could also request sample challenging (not expected, basic, typical) queries that the users would make.  Another good way to obtain input from the user side is to look at search query logs that list search strings that users have entered over a period of time, ranked by frequency. If a search phrase that is slightly out of scope of the subject occurs frequently, then the term should still be considered for inclusion in the taxonomy.
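
The search log review described above can be sketched in a few lines of Python. This is only an illustrative sketch: the log format (one search string per line), the function names, the sample data, and the frequency threshold are all assumptions for the example, not part of any particular search platform.

```python
from collections import Counter

def rank_queries(log_lines):
    """Count search strings (case-insensitive), most frequent first."""
    counts = Counter(line.strip().lower() for line in log_lines if line.strip())
    return counts.most_common()

def candidate_terms(log_lines, taxonomy_terms, min_count=3):
    """Return frequent search strings that are not yet taxonomy terms.

    min_count is an arbitrary, hypothetical threshold; choose one that
    fits the size of your own log.
    """
    known = {term.lower() for term in taxonomy_terms}
    return [(query, n) for query, n in rank_queries(log_lines)
            if n >= min_count and query not in known]

# Hypothetical sample log for a recipe collection
sample_log = [
    "nutrition facts", "Nutrition Facts", "soups", "soups",
    "baking", "nutrition facts ", "wine pairing",
]
candidates = candidate_terms(sample_log, ["Soups", "Baking"], min_count=2)
```

The output is only a list of candidates. Each one still needs a human decision about whether it belongs in the taxonomy.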

In either case, the scope of the subject gets better defined as the taxonomy is created. For example, a taxonomy for recipes may initially be scoped to comprise terms for the names of dishes, ingredients, and cooking methods. But then a different term shows up with significant frequency, “Nutrition Facts.” If it occurs in both the content and the user research, then it likely should be included. If it shows up in the content only, but is not validated in user research, then it is more questionable.

Taxonomy Structure

The initial taxonomy structure itself tends to impose limits on scope. Taxonomies tend to be hierarchical with a limited number of top terms. If a candidate term appears in the content that does not seem to belong anywhere in the current taxonomic hierarchy, you might be inclined to exclude it. Factors of user needs (they might want to look up this term in this content), however, should take precedence. For example, the term “COVID-19” might be marginal but still of interest to be included in many taxonomies on varied subjects, but there would exist no broader term for diseases in those taxonomies. Then adjustments need to be made, such as renaming or adding broader terms, or perhaps, more likely, the proposed term should be modified to fit the context of the taxonomy, such as becoming “COVID-19 impacts.”

Another thing to consider is adopting more of a thesaurus structure than a taxonomy structure, at least for the facet or concept scheme of the taxonomy that is for miscellaneous “topics.” One characteristic of thesauri is to not rely so heavily on extensive hierarchical trees. What this means is that you could decide that it is acceptable that not all terms have broader terms and thus it’s OK to have a very large number of top terms, with the more specific terms linked to other terms only by related-term relationships, another feature of thesauri, if not by broader/narrower-term relationships. Abandoning the full hierarchical tree structure should only be considered if this hierarchy is not displayed as a navigation to the end users.
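
To make the thesaurus-style option concrete, here is a minimal Python sketch of terms that may have no broader term and are connected by reciprocal related-term links instead. The class, the function names, and the sample terms are hypothetical and for illustration only; a real thesaurus management tool would manage these relationships for you.

```python
from dataclasses import dataclass, field

@dataclass
class Term:
    label: str
    broader: set = field(default_factory=set)  # labels of broader terms (may be empty)
    related: set = field(default_factory=set)  # labels of related terms

def relate(a: Term, b: Term) -> None:
    """Related-term links are reciprocal, so record both directions."""
    a.related.add(b.label)
    b.related.add(a.label)

def top_terms(terms):
    """Terms with no broader term; in a thesaurus there may be many."""
    return sorted(t.label for t in terms if not t.broader)

# Hypothetical terms from a recipe vocabulary
ingredients = Term("Ingredients")
flour = Term("Flour", broader={"Ingredients"})
nutrition = Term("Nutrition facts")  # no broader term, which is acceptable here
relate(nutrition, ingredients)
topics = [ingredients, flour, nutrition]
```

In this sketch “Nutrition facts” is a top term simply because it has no broader term, and users can still reach it from “Ingredients” through the related-term link.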

Documenting Policy

In any case, you need to define policies regarding what kinds of terms can be added and what kinds should not. This will evolve out of the activity of building the taxonomy, especially from evaluating what extracted terms to approve and what search log terms to approve. Whoever is doing this task (hopefully more than one person) should document each instance of uncertainty. While many term approvals and rejections will be obvious, there will be a gray area. These uncertain cases should be collected and discussed together, and then a policy can emerge.

Tuesday, January 11, 2022

Taxonomist Survey

In keeping with the title of this blog, it’s time to check in again to learn more about who taxonomists are and what they are doing. I conducted a survey of taxonomists (promoted through discussion lists, groups, and social media) in 2009 to gather information for my book, The Accidental Taxonomist, and again in 2015 for its second edition. I compared the results over those six years in a prior blog post, Taxonomist Trends. Now I have republished the identical taxonomist survey from 2015 on the SurveyMonkey platform at the start of this month, January 2022, and have already gathered more responses than the 150 who responded in May 2015. So, I can provide a peek at preliminary results of a few questions, although the survey will remain open until January 28.


 

Preliminary responses

Following are the preliminary responses from questions 1, 4, and 5.

1. To what extent do you create and/or maintain taxonomies or other controlled vocabularies? 

Responses (percentage of respondents, then count):

  • My primary job responsibility: 55.48% (86)
  • One of my job responsibilities, but secondary: 16.77% (26)
  • Manage taxonomists or taxonomy projects, while also doing at least some taxonomy review work: 11.61% (18)
  • A special project, not in my job description or an originally expected responsibility: 7.74% (12)
  • Work done as contract/freelance often: 4.52% (7)
  • Work done as contract/freelance only occasionally: 3.87% (6)

Answered: 155

4. What is your current employment situation? 

Responses (percentage of respondents, then count):

  • Employee of an organization that uses taxonomies primarily internally, for its website, or in ecommerce: 62.75% (96)
  • Employee of an organization that incorporates taxonomies into an information product or information service, which it sells/offers: 15.69% (24)
  • Employee of a company or agency that provides taxonomy services or custom taxonomies to clients: 5.88% (9)
  • Independent contractor or freelancer (obtaining work primarily through subcontracting, agencies, other third parties, or as a temp employee): 9.80% (15)
  • Consultant or business owner/partner (obtaining work primarily from direct clients): 5.88% (9)

Answered: 153

5. If you selected either the first or second response in question #4 (if you are an employee but not in consulting), where do you fit into your organization?

Responses (percentage of respondents, then count):

  • Content management/content strategy: 19.44% (21)
  • Documentation/technical writing: 1.85% (2)
  • Editorial: 1.85% (2)
  • IT: 8.33% (9)
  • Knowledge management: 25.93% (28)
  • Library: 4.63% (5)
  • Marketing: 4.63% (5)
  • Operations: 3.70% (4)
  • Product development/product management: 19.44% (21)
  • Search: 2.78% (3)
  • User experience: 7.41% (8)
  • Other (please specify): 16

Answered: 108


Survey Questions

Following are the rest of the questions.

2. How long have you been doing work on taxonomies or other controlled vocabularies?

  • Less than 1 year
  • 1-2 years
  • 2-4 years
  • 4-6 years
  • 6-8 years
  • 8-10 years
  • 10-15 years
  • 15-20 years
  • Over 20 years

3. How long have you been doing work specifically called “taxonomy”?

  • Less than 1 year
  • 1-2 years
  • 2-4 years
  • 4-6 years
  • 6-8 years
  • 8-10 years
  • 10-15 years
  • 15-20 years
  • Over 20 years

6. What is your job title?

7. What degree(s) do you hold?

  • Less than a BA/BS
  • BA only (4-year college)
  • BS only (4-year college)
  • MA
  • MS/M Eng.
  • MLS/MLIS
  • MBA
  • PhD/doctorate
  • Other advanced degree

8. What is your study or training specifically in the field of taxonomy or classification?

  • Concentration/specialty within a degree program
  • Two or three college/university credit courses (but not a specialization)
  • One college/university credit course
  • Continuing education course or workshop
  • Conference or professional seminar workshop
  • On the job formal training
  • On the job informal learning and experience
  • Self-taught through reading

9. Prior to your work in taxonomies, which best describes your professional background?

  • Content management/Web content/Content strategy
  • Database design, development, or administration
  • Document management
  • Indexing
  • Knowledge management
  • Librarian
  • Marketing/Sales
  • Project management
  • Records management
  • Software/IT
  • User experience/Information architecture
  • Writing, editing, or publishing
  • None/Student
  • Other (please specify)

10. In your current position, what are your primary taxonomy-related activities?

  • Design/model new taxonomies or other vocabularies, determining structure type and policies
  • Based on an established model, develop and build out new taxonomies or other vocabularies
  • Edit, update, or maintain taxonomies or other vocabularies
  • Map (such as crosswalks), merge, integrate, or restructure existing taxonomies or other vocabularies
  • Write auto-categorization rules for taxonomies or other vocabularies

11. What software do you primarily use to work on taxonomies or other controlled vocabularies?

  • Commercial, dedicated thesaurus/taxonomy/ontology management software
  • Open-source, dedicated thesaurus/taxonomy/ontology management software
  • Commercial software, of which taxonomy management is a feature, module, or component
  • An internally developed thesaurus/taxonomy management system
  • Other commercial software not intended for taxonomies (such as a word processor, spreadsheet, or database management software)

12. Which of the following describes the implementation and use of taxonomies or vocabularies you work on?

  • For content organization, search/findability, and retrieval by internal users (employees)
  • For content search/findability and retrieval by external users (customers, subscribers, members, partners, prospects, patrons, the public)
  • For both internal users and external users

13. What is the size of the controlled vocabularies you typically work on?

  • Under 50 concepts per vocabulary
  • 50-100 concepts
  • 100-500 concepts
  • 500-1500 concepts
  • 1500-5000 concepts
  • 5000-10,000 concepts
  • Over 10,000 concepts

14. How are your current taxonomies/vocabularies linked to content?

  • By manual tagging or indexing
  • By auto-categorization/auto-indexing
  • Some of each
  • Don’t know

15. Are you familiar with and generally try to follow any of the following national or international standards: ANSI/NISO Z39.19 (2005) Guidelines for Construction, Format, and Management of Monolingual Controlled Vocabularies and ISO 25964 Information and documentation—Thesauri and interoperability with other vocabularies?

  • Don’t know these standards and thus don’t follow them.
  • Have read at least some of these standards, but don’t follow them.
  • Generally, keep these standards in mind and apply what is relevant, but not strictly.
  • Attempt to follow these standards closely and refer to them as needed.

16. What do you enjoy about taxonomy work?

17. What are pain points or challenges in your taxonomy work?

18. How did you first get started doing taxonomy work?

More results may appear in future blog posts, but the full results will be published in The Accidental Taxonomist, 3rd edition.

Friday, December 17, 2021

Named Entities in Taxonomies

I have long felt that there is some uncertainty as to where named entities (names of specific people, places, organizations, products, etc.) fit into taxonomies. Standards suggest one way, and practice tends to follow a different way in dealing with these proper nouns. As taxonomy trends evolve, so does the position on these named entities. The fact that taxonomies are not well-defined leaves it open to question as to whether taxonomies should have any named entities in them, or if taxonomies should comprise only topics.

Historical trends

A historical perspective is needed. Modern, digital information retrieval taxonomies evolved out of thesauri. Thesauri, which originally came out in print format, first appeared in the 1960s and then were formalized by various standards published in the 1970s. The thesaurus standards state clearly that the relationship between a named instance and its type is one of the three kinds of hierarchical relationships permitted and supported in thesauri (the other two being generic-specific and whole-part). While taxonomies may omit the associative (related term) relationship of thesauri, they tend to follow the hierarchical standards of thesauri. Thus, named entities could be included in the taxonomy as the narrowest terms, narrower to a term for whatever “type” they are. But should it always be this way?

Then faceted taxonomies started being implemented in the early 2000s, first in ecommerce and then by the end of the decade in intranets, content management systems, digital asset management systems, and various content-rich websites. Once facets became adopted in information retrieval applications (aside from ecommerce), it became obvious from a user design perspective that named entities belonged in a different facet than the subjects. Facets are for refining a complex search query by different aspects. Sometimes these aspects follow the types of questions: What? Who? Where? When? “What” is usually for a subject, but “who,” “where,” and “when” (for taxonomy terms naming events, not date ranges) refer to named entities. Sometimes people start a query about a subject, and sometimes people start a query about a named entity, and facets allow people to start off searching any way they wish.

Then in 2009 the World Wide Web Consortium published the Simple Knowledge Organization System (SKOS) recommendation for taxonomies, thesauri, and other controlled vocabularies, which over the following decade became adopted as the standard model for building machine-readable taxonomies. One of the elements described in SKOS is that of the concept scheme, which is defined merely as “an aggregation of one or more SKOS concepts.” There is nothing comparable in the thesaurus standards. While a taxonomist may choose what to do with an “aggregation” of concepts, it has proven practical to separate out different kinds of named entities into concept schemes separate from concept schemes for topics. Thus, the widespread adoption of SKOS has contributed to the trend of separating different named entity sets, which had already started with faceted taxonomies.

My initial, and longest, experience in the domain of taxonomies and controlled vocabularies was as a controlled vocabulary editor at the library database vendor Gale. At Gale (and its predecessor company), named entity controlled vocabularies ("name authorities") have been separate from the subjects, but there were reasons for this. The named entity types (named persons, companies, organizations and agencies, named works, products, laws, events, and fictional characters) have each had different sets of attributes and rules for maintenance. Some even have different customized relationships with other controlled vocabularies. Interestingly, it was not always this way. Before I joined in the mid-1990s, some of these named entities (agencies, organizations, works, geographics, and events) were mixed in with the “descriptors” in a Subject MegaFile. But eventually specific attributes and relations, not to mention the growing number of terms and a new vocabulary management system, combined to make it more logical to split off each of the named entity vocabularies. The Events were the last to be split out of the Subjects. So, it was not because the controlled vocabularies were named entities per se, but rather their growing specialized maintenance needs due to an increase in specific attributes that led to managing them as separate controlled vocabularies. Attributes include, for example, birth date and place for a person, latitude and longitude for a location, and website URL and address for companies and organizations, among many more.

Taxonomies and ontologies

This feature of attributes brings us to the most recent trend in taxonomies, which is the occasional, but growing, convergence of taxonomies and ontologies. Ontologies divide up a knowledge domain into classes, and each class (like the Gale named-entity controlled vocabularies) has its own set of attributes and customized relationships with other classes. Ontologies, according to the Web Ontology Language (OWL) standard, however, have a different perspective on named entities. Ontologies comprise classes and subclasses, in hierarchies, which, in turn, contain “instances” or “individuals,” which are unique named entities. The relationship between an instance and a class (or subclass) is not, however, considered hierarchical, but rather of a “member” type. Thus, while thesauri make no distinction for named entities, and taxonomies separate out named entities when it’s practical, ontologies make a strict distinction.

Furthermore, for ontologies, which originated in the domains of philosophy and computer science, a named entity as a proper noun is not what matters. Rather, it’s the fact that the instance is unique, and there is only one. This is true for people, companies/organizations, and places. It is not true for brand name products, though. A named product is a proper noun, such as MacBook Pro or Honda Accord, but it is not a unique instance, because there are millions of individual MacBook Pros and Honda Accords in existence. It’s a similar matter for named works, such as books, where one title has millions of copies. “Named entities” or “proper nouns” are grammatical or linguistic designations, which are OK for taxonomies and thesauri, but are not a feature of ontologies, with their philosophical origins.

Fortunately, you don’t have to worry about this philosophical problem if you choose to follow the approach of applying a high-level ontology model to an existing taxonomy or set of controlled vocabularies to extend the ontology with specific terms and named entities (or, from the other direction, to extend the taxonomy with semantic relations and attributes). The OWL-based ontology then may comprise only as many classes and subclasses as needed to designate the usage of distinct custom relations and attributes. With this approach, a different ontology class is mapped to each subset or hierarchy or SKOS concept scheme of a larger taxonomy. Each named entity type would typically correspond to a different ontology class, based on the named entity’s own attributes and relations. So, each named entity type would be in its own controlled vocabulary or SKOS concept scheme.
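
As a rough illustration of this mapping, the Python sketch below pairs each concept scheme with one ontology class and lets the class determine which attributes apply. The scheme names, class names, and function name are hypothetical; the attributes are the examples mentioned earlier in this post. It is a plain data-structure sketch, not OWL or SKOS syntax.

```python
# Hypothetical mapping: one ontology class per concept scheme
SCHEME_TO_CLASS = {
    "People": "Person",
    "Places": "Place",
    "Organizations": "Organization",
    "Topics": "Topic",
}

# Attributes are defined once per ontology class (example attributes only)
CLASS_ATTRIBUTES = {
    "Person": ["birth date", "birth place"],
    "Place": ["latitude", "longitude"],
    "Organization": ["website URL", "address"],
    "Topic": [],
}

def attributes_for_scheme(scheme: str) -> list:
    """Look up the attributes that apply to every concept in a scheme."""
    return CLASS_ATTRIBUTES[SCHEME_TO_CLASS[scheme]]
```

For example, attributes_for_scheme("Places") gives the latitude and longitude attributes, while the "Topics" scheme carries no special attributes in this sketch.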

Just because OWL ontologies may include named instances as members of a subclass does not mean you have to set up your knowledge model that way. This is similar to the idea of the thesaurus standard, which permits named entities to be narrower terms to generic subjects, but you don’t have to set it up that way. Omitting an option described in the thesaurus or ontology standards does not mean you are not in compliance with those standards.

So, in conclusion, while some things about taxonomies have remained constant, other things, such as where to put named entities, have changed over time.