Saturday, April 30, 2022

Polyhierarchy in Taxonomies

A defining characteristic of taxonomies is that terms/concepts are arranged in broader-narrower hierarchies, which may resemble tree structures. A limited number of top concepts each have narrower concepts, which in turn may have narrower concepts, etc., and the narrowest concepts at the bottom of the hierarchy are sometimes referred to as leaf nodes, as “leaf” extends the metaphor of “tree.” The tree model has its limits, though, because taxonomies may also have occasional cases of “polyhierarchy,” whereby a concept may have two or more broader concepts.

 

People who are new to taxonomies, however, might not consider polyhierarchies, because they tend to think of taxonomies as classification systems. Hierarchical information taxonomies have their origin in classification systems, such as the Linnean taxonomy of organisms, library classification systems, and industry classification systems. Classification systems, however, do not allow polyhierarchy within the system. Originally, classification systems were for physical things, such as books, which can belong in only one place, so there could be no polyhierarchy. Standard classification systems, such as industry classification systems, were developed by governmental, international, or nongovernmental organizations with a primary purpose of gathering and organizing statistical data about classes, and thus polyhierarchy is not permitted, as it would lead to double-counting of members of a class.

 

The primary purpose of hierarchy in a taxonomy is to provide guided browsing of topics to end-users, who may start out looking at broad categories and then drill down to find the narrowest concept of interest. Thus, polyhierarchy serves the same purpose. The idea is that different people will start at different points at the top of the hierarchy to arrive at the same concept of interest, which is tagged to the same content set. A polyhierarchy should be implemented if the concept’s relationship is correctly and inherently hierarchical in both of its cases. An example of a polyhierarchy is Educational software, which has both Software and Educational products as broader concepts. Educational software is a kind of software, fully included within Software, and Educational software is a kind of educational product, fully included within Educational products.
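To make the idea concrete, here is a minimal Python sketch of the Educational software example. The dictionary layout and the function name `paths_to_top` are assumptions made for illustration only, not part of any taxonomy tool or standard. It shows how one concept with two broader concepts can be reached by two different browse paths.

```python
# Hypothetical in-memory taxonomy: each concept maps to its broader concepts.
BROADER = {
    "Software": [],
    "Educational products": [],
    "Educational software": ["Software", "Educational products"],
}


def paths_to_top(concept, broader=BROADER):
    """Return every browse path from a top concept down to `concept`."""
    parents = broader.get(concept, [])
    if not parents:
        # A concept without broader concepts is a top concept.
        return [[concept]]
    paths = []
    for parent in parents:
        for path in paths_to_top(parent, broader):
            paths.append(path + [concept])
    return paths


for path in paths_to_top("Educational software"):
    print(" > ".join(path))
```

Each printed line is one route a user could take, starting from a different top concept and arriving at the same concept.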

 



 

Taxonomy standards and polyhierarchy issues

 

Taxonomy/thesaurus standards (ANSI/NISO Z39.19 and ISO 25964) describe three kinds of hierarchical relationships (generic-specific, generic-instance, and whole-part), and polyhierarchy may exist within any of these types. Polyhierarchy that combines different hierarchical types, however, can be problematic, so it is best to avoid mixing hierarchical relationship types. For example, the following polyhierarchy mixes different types:

 

Washington, DC

Broader: United States (whole-part)

Broader: Capital cities (generic-instance)

 

The reason to avoid creating a mixed-type polyhierarchy is simply that the experience of browsing the hierarchy can become compromised and potentially confusing. Extensive hierarchies with large numbers of narrower concept relationships would result. A hierarchical taxonomy tree should have a dominant hierarchy design. An exception is a thesaurus, which is designed not so much for top-down browsing as for browsing from term to term. Mixing hierarchical types within a thesaurus is thus acceptable.
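A check for mixed-type polyhierarchies could be sketched as follows. This assumes the taxonomy can be exported as simple (narrower, broader, relationship type) records; that record layout and the function name are hypothetical, and real taxonomy management tools store relationship types in their own ways.

```python
# Hypothetical records: (narrower concept, broader concept, relationship type).
RELATIONS = [
    ("Washington, DC", "United States", "whole-part"),
    ("Washington, DC", "Capital cities", "generic-instance"),
    ("Educational software", "Software", "generic-specific"),
    ("Educational software", "Educational products", "generic-specific"),
]


def mixed_type_polyhierarchies(relations):
    """Map each concept whose broader relationships mix types to those types."""
    types_by_concept = {}
    for narrower, _broader, rel_type in relations:
        types_by_concept.setdefault(narrower, set()).add(rel_type)
    # Keep only concepts with more than one distinct relationship type.
    return {c: sorted(t) for c, t in types_by_concept.items() if len(t) > 1}


print(mixed_type_polyhierarchies(RELATIONS))
```

Here only Washington, DC is reported, because Educational software has two broader concepts of the same generic-specific type.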

 

It is also recommended to avoid creating hierarchical relationships across different facets in a faceted taxonomy. This is because facets are designed to be mutually exclusive, so that concepts from multiple facets can be used in combination to limit/filter/refine a search. As such, facets are designed to be distinct aspects. An occasional polyhierarchy across facets may be an acceptable exception, but more than two or three such polyhierarchies across an entire faceted taxonomy should be a cause for review.

 

With the wider adoption of the SKOS (Simple Knowledge Organization System) model for taxonomies and in taxonomy management systems, taxonomies are more commonly organized into concept schemes. A concept scheme can be represented as a facet in a faceted taxonomy, but it is not limited to use as a facet. Utilizing concept schemes, it makes sense to have separate concept schemes with different hierarchical types: some for generic-specific (types, categories, topics), one or more for whole-part (geography, organizational structures), and some containing lists of instances (named entities). In this model, Washington, DC, would be narrower only to the United States in the whole-part hierarchical concept scheme for geographic places. It could also be linked to Capital cities, which is in a different concept scheme for place types, with a different kind of relationship (“related” or perhaps a semantic relationship from an ontology).

 

Although SKOS permits hierarchical relationships across different concept schemes, it is best practice not to do this but rather to create hierarchical relationships and polyhierarchies confined within a concept scheme, just as it is recommended not to have polyhierarchy across facets.
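This practice can be checked with a small script. The following Python sketch assumes, as a simplification, that each concept belongs to exactly one concept scheme (SKOS itself allows a concept to be in more than one), and the scheme names and data are hypothetical.

```python
# Hypothetical data: the single concept scheme each concept belongs to.
SCHEME_OF = {
    "United States": "Geographic places",
    "Washington, DC": "Geographic places",
    "Capital cities": "Place types",
}

# Hypothetical (narrower, broader) pairs, as skos:broader would express them.
BROADER_PAIRS = [
    ("Washington, DC", "United States"),
    ("Washington, DC", "Capital cities"),
]


def cross_scheme_links(pairs, scheme_of):
    """Return the hierarchical pairs whose two concepts are in different schemes."""
    return [(n, b) for n, b in pairs if scheme_of[n] != scheme_of[b]]


print(cross_scheme_links(BROADER_PAIRS, SCHEME_OF))
```

Any pair this reports is a candidate for conversion to a non-hierarchical relationship, such as “related.”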

 

Additional polyhierarchy considerations

Polyhierarchy concerns concepts in the taxonomy, and it is not about objects, items, or assets that get tagged with taxonomy concepts, such as an individual publication, document, image, product record, etc. Each of these may get tagged with multiple taxonomy concepts, and as such may have multiple “classifications” and thus can appear as if they are in a polyhierarchy, if a frontend application displays tagged items as if they are leaf nodes in a taxonomy.

A polyhierarchy usually involves only two broader concepts, not more. Having more than two broader concepts is extremely rare. If you find yourself creating polyhierarchies of three or more multiple times in a taxonomy, check to make sure you are not doing something wrong with the hierarchy design.
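One rough way to spot such cases is to count the broader concepts per concept. This is only a sketch: the pair list, the placeholder concept names, and the default threshold of three are assumptions for illustration.

```python
from collections import Counter

# Hypothetical (narrower concept, broader concept) pairs with placeholder names.
BROADER_PAIRS = [
    ("Educational software", "Software"),
    ("Educational software", "Educational products"),
    ("Concept X", "Top concept A"),
    ("Concept X", "Top concept B"),
    ("Concept X", "Top concept C"),
]


def concepts_to_review(pairs, threshold=3):
    """Return concepts that have at least `threshold` broader concepts."""
    counts = Counter(narrower for narrower, _broader in pairs)
    return sorted(c for c, n in counts.items() if n >= threshold)


print(concepts_to_review(BROADER_PAIRS))
```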

Some content management systems that have built-in taxonomy management and tagging features do not support polyhierarchy. The best known is SharePoint, with taxonomies managed in its Term Store feature. Taxonomy terms may be “reused” across Term Sets, but reuse is not permitted within a Term Set, where polyhierarchy is most suitable. See my past post, Polyhierarchy in the SharePoint Term Store, for more details.

Tuesday, March 22, 2022

Taxonomy Quotes

Taxonomies are very valuable, but not always easy to define, and they are described in various ways. They are also interdisciplinary, as taxonomies are developed by people in different fields for slightly different, yet similar purposes. I have heard various comments about taxonomies over the decades.

 

In the earlier years of the Taxonomy Community of Practice discussion group, a Yahoo group, which was the precursor of the current Taxonomy and Ontology Community of Practice LinkedIn group, the group’s moderator, Seth Earley, put out a call to the group’s members for a motto for the group. The winning quote, which became the group’s motto, was: “Taxonomies: That’s classified information,” by Jordan Cassel.

 

 

There were over a dozen other good suggestions for the motto which were posted in the group in January 2009. That turned out to be shortly before I wrote the first edition of my book, The Accidental Taxonomist, so, with permission, I took additional motto-quotes as opening headers to each of the 12 chapters of my book. The same quotes continued in publication of my second edition in 2016.

 

As I now am preparing a third edition (expected out in late fall 2022), I decided to refresh the chapter head quotes. Last month I put out a call for quotes in both the Taxonomy and Ontology Community of Practice LinkedIn group and in my own network. Some quotes were lengthier than before, as they were no longer submissions for a motto. I received far more submissions than I have chapters, and I have also decided to keep some of the original quotes (including the first one). Yet many of these quotes are quite thoughtful and/or clever, so I would like to share these new quotes here.

 

In true taxonomist fashion, I have categorized these quotes as about taxonomies, about taxonomy creation, about ontologies as compared to taxonomies, and about taxonomists, followed by a few particularly witty quotes at the end.

 

About taxonomies

 

Taxonomies: organizing the disorganized.
—June Tsang

 

Without Taxonomies; entropy!

—Hakan Strom

 

Ambiguity is the thief of Knowledge.

—Robert Vane

 

Good taxonomy is a love letter to the future.

—Gary Carlson

 

Taxonomies - organised, effective tagging. 

—Alison Jones

 

Taxonomy: Levels in the Playing Field

—Merridy Cox (Bradley)

 

Knowledge organisation, search, and use combine to enable us to navigate the workplace.

—Bill Proudfit

 

Your Taxonomy, like all metadata, is an expression of what's important to you and to the collection.

—Peter Krogh

 

Taxonomies are, first of all, an act of self discovery on how we understand the world.

—Andrea Splendiani

 

 

About taxonomy creation

 

Taxonomy: generalize or specify, that is the question.

—Fabiola Aparecida Vizentim

 

Taxonomy: The perfect mix of art and science.

—Mollee Marcus

 

Taxonomies: Normalizing to help you find, report and aggregate across data & content

—Rita M. Benitez

 

Regardless of domain, taxonomy is the science of sorting and labelling information so it can be retrieved for future use.

—Leah B.

 

Do your best to ignore even your most strongly held convictions. If you want to create a user-friendly taxonomy/ontology system, follow the data, not your heart.
—Rebecca B. Weiss

 

Successful data management requires a model-based architecture for operational efficiency, usability, and governance. Taxonomies extend these benefits to information and content.
—Vanessa Vavra-Laughlin

 

Taxonomy is such a great battleground to focus consistently on improving the user experience; it’s a first key activity to drive the user experience.

—Vellaichamy Shunmugavel

 

To ontologize or not to ontologize, that is the question you should ask yourself in the first place.

—Erick Antezana

 

 

 

About ontologies (or ontologies compared with taxonomies)

 

Taxonomies tell stories, ontologies create worlds.

—Fran Alexander

 

Taxonomies classify; ontologies reify.

—Beatrice Larentis

 

Ontology: generating knowledge by connecting the dots.

Taxonomy: is like a drawer organizer for kitchen cutlery.
—Brigita Perchutkaite Vollstedt

 

If a taxonomy is an elevator, an ontology is a Wonkavator!

—Caroline Coward

(Referencing Willy Wonka and the Chocolate Factory: like an elevator but also can go sideways and in all directions.)

 

Ontologies make the implications explicit.

—Michele Ann Jenkins

 

A good ontology maps the way out of chaosville.

—Mark Atkins

 

Ontologies: organizational substrate for your data, information, and know-how enzymes.

—Heather Fox

 

 

About taxonomists

 

I wanted to figure out my place in the world, so I hired a taxonomist.

—Meg Morrissey

 

Only when one’s data is all over the place is it discovered that a taxonomist is necessary.

—Rebecca Custis

 

Be the Taxonomy you want to see in the World!
—Elaine Chu

 

I say this categorically, taxonomists are an organized bunch.

—Jordan Casell

 

Taxonomies: now you're where you belong.

—Alan S. Michaels

 


And the especially witty ones 😉

 

Ontology, Category, Property - Happy user will be! Try me, Find me, Surprise me :)

—Dorothee Balas

 

Year Make Model Engine Transmission Leather Navi Owners Accidents Miles Color: = my used-Taxi Taxonomy.

—Tony Mariella

 

Taxonomy is taxidermy for data -- mounted on a framework and stuffed for the purpose of display and study.

—Phil Taylor

 

Ontology: One graph to rule them all, one graph to find them, one graph to bring them all and in the semantic web bind them.
—Xeni Kechagioglou


I never metadata I didn't like

—Paul Belfanti

 

Taxonomy? Taxonoyou!

—Ron Cascella

 

Friday, February 4, 2022

Defining a Taxonomy’s Scope

In planning a taxonomy, I have often said that it is important at the beginning to define the taxonomy’s scope, specifically the subject area scope of the taxonomy’s terms, but without going into more detail. Recently I was asked by a client how to define a taxonomy’s scope. This is a good question. The taxonomy should be suited to the subject area scope of the content that will be tagged with the taxonomy and to the scope of the user’s expectations. Terms or topics only marginal to the subject scope, however, could occur in the content, and whether they should also be included in the taxonomy is a question. Ultimately, that should depend on whether user expectations justify it, as the needs of users should also be a factor in creating a taxonomy. A taxonomy should suit both its content and its users.

Sources for Taxonomy Terms

For content as a source of taxonomy terms, a combination of manual and automated approaches is recommended. By manually reviewing sample individual documents or content items, you can discern the main ideas and main topics, which should form the start and basic structure of the taxonomy and also help define its scope. Automated methods of extracting terms, through text analytics technologies, can bring in many additional terms from a much larger corpus of documents more quickly, picking up terms that a limited manual review would miss. Even though automated text analytics extracts terms based on relevancy and frequency of occurrence, such terms could be out of scope of the subject domain. That’s why it’s important to start first with a manual review of content to define the subject scope.  Then, when you enrich the taxonomy with automated extraction, you can approve terms that appear to be in scope or at least closely relevant and reject others. But should you reject all that are out of scope, even if they appear with sufficient frequency and relevancy? My advice is to try to assume the role of the user. Ask yourself: Might a user want to search for content on this term in this content collection?
 
For user needs and expectations as a contributing source of taxonomy terms, obtaining this information can be very direct, such as by creating a user questionnaire (at least for your internal users) that asks what the topics of importance are, how those users would define the scope, and what “marginal” topics would be acceptable for them to include. You could also request sample challenging (not expected, basic, typical) queries that the users would make.  Another good way to obtain input from the user side is to look at search query logs that list search strings that users have entered over a period of time, ranked by frequency. If a search phrase that is slightly out of scope of the subject occurs frequently, then the term should still be considered for inclusion in the taxonomy.

In either case, the scope of the subject gets better defined as the taxonomy is created. For example, a taxonomy for recipes may initially be scoped to comprise terms for the names of dishes, ingredients, and cooking methods. But then a different term shows up with significant frequency: “Nutrition Facts.” If it occurs in both the content and the user research, then it likely should be included. If it shows up in the content only, but is not validated in user research, then it is more questionable.
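The search-log review described above could be sketched in Python roughly as follows. The query strings, the taxonomy labels, and the minimum count of two are all invented for illustration; a real log would need more careful normalization than trimming and lowercasing.

```python
from collections import Counter

# Hypothetical search log entries and existing taxonomy labels for a recipe site.
QUERIES = [
    "nutrition facts",
    "Chicken soup",
    "nutrition facts ",
    "baking",
    "Nutrition Facts",
    "chicken soup",
]
TAXONOMY_LABELS = {"chicken soup", "baking"}


def candidate_terms(queries, labels, min_count=2):
    """Rank frequent search strings that are not yet taxonomy labels."""
    counts = Counter(q.strip().lower() for q in queries)
    return [
        (query, n)
        for query, n in counts.most_common()
        if n >= min_count and query not in labels
    ]


print(candidate_terms(QUERIES, TAXONOMY_LABELS))
```

The result is only a list of candidates. Whether a frequent but marginal term belongs in the taxonomy remains a judgment call.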

Taxonomy Structure

The initial taxonomy structure itself tends to impose limits on scope. Taxonomies tend to be hierarchical with a limited number of top terms. If a candidate term appears in the content that does not seem to belong anywhere in the current taxonomic hierarchy, you might be inclined to exclude it. Factors of user needs (they might want to look up this term in this content), however, should take precedence. For example, the term “COVID-19” might be marginal but still of enough interest to be included in many taxonomies on varied subjects, yet there would be no broader term for diseases in those taxonomies. Then adjustments need to be made, such as renaming or adding broader terms, or, perhaps more likely, the proposed term should be modified to fit the context of the taxonomy, such as becoming “COVID-19 impacts.”

Another thing to consider is adopting more of a thesaurus structure than a taxonomy structure, at least for the facet or concept scheme of the taxonomy that is for miscellaneous “topics.” One characteristic of thesauri is that they do not rely so heavily on extensive hierarchical trees. This means you could decide that it is acceptable for some terms to have no broader terms, and thus it’s OK to have a very large number of top terms, with the more specific terms linked to other terms by related-term relationships (another feature of thesauri) where broader/narrower-term relationships do not apply. Abandoning the full hierarchical tree structure should only be considered if this hierarchy is not displayed as navigation to the end users.

Documenting Policy

In any case, you need to define policies regarding what kinds of terms can be added and what kinds should not. This will evolve out of the activity of building the taxonomy, especially from evaluating which extracted terms and which search log terms to approve. Whoever is doing this task (hopefully more than one person) should document each instance of uncertainty. While many term approvals and rejections will be obvious, there will be a gray area. These uncertain cases should be collected and discussed together, and then a policy can emerge.

Tuesday, January 11, 2022

Taxonomist Survey

In keeping with the title of this blog, it’s time to check in again to learn more about who taxonomists are and what they are doing. I conducted a survey of taxonomists (promoted through discussion lists, groups, and social media) in 2009 to gather information for my book, The Accidental Taxonomist, and again in 2015 for its second edition. I compared the results over those 6 years in a prior blog post, Taxonomist Trends. Now I have republished the identical taxonomist survey from 2015 on the SurveyMonkey platform at the start of this month, January 2022, and have already gathered more responses than the 150 who responded in May 2015. So, I can provide a peek at preliminary results of a few questions, although the survey will remain open until January 28.


 

Preliminary responses

Following are the preliminary responses from questions 1, 4, and 5.

1. To what extent do you create and/or maintain taxonomies or other controlled vocabularies? 

Responses (percentage of respondents, count):

  • My primary job responsibility: 55.48% (86)
  • One of my job responsibilities, but secondary: 16.77% (26)
  • Manage taxonomists or taxonomy projects, while also doing at least some taxonomy review work: 11.61% (18)
  • A special project, not in my job description or an originally expected responsibility: 7.74% (12)
  • Work done as contract/freelance often: 4.52% (7)
  • Work done as contract/freelance only occasionally: 3.87% (6)

Answered: 155

4. What is your current employment situation? 

Responses (percentage of respondents, count):

  • Employee of an organization that uses taxonomies primarily internally, for its website, or in ecommerce: 62.75% (96)
  • Employee of an organization that incorporates taxonomies into an information product or information service, which it sells/offers: 15.69% (24)
  • Employee of a company or agency that provides taxonomy services or custom taxonomies to clients: 5.88% (9)
  • Independent contractor or freelancer (obtaining work primarily through subcontracting, agencies, other third parties, or as a temp employee): 9.80% (15)
  • Consultant or business owner/partner (obtaining work primarily from direct clients): 5.88% (9)

Answered: 153

5. If you selected either the first or second response in question #4 (if you are an employee but not in consulting), where do you fit into your organization?

Responses (percentage of respondents, count):

  • Content management/content strategy: 19.44% (21)
  • Documentation/technical writing: 1.85% (2)
  • Editorial: 1.85% (2)
  • IT: 8.33% (9)
  • Knowledge management: 25.93% (28)
  • Library: 4.63% (5)
  • Marketing: 4.63% (5)
  • Operations: 3.70% (4)
  • Product development/product management: 19.44% (21)
  • Search: 2.78% (3)
  • User experience: 7.41% (8)
  • Other (please specify): 16

Answered: 108


Survey Questions

Following are the rest of the questions:

2. How long have you been doing work on taxonomies or other controlled vocabularies?

  • Less than 1 year
  • 1-2 years
  • 2-4 years
  • 4-6 years
  • 6-8 years
  • 8-10 years
  • 10-15 years
  • 15-20 years
  • Over 20 years

3. How long have you been doing work specifically called “taxonomy”?

  • Less than 1 year
  • 1-2 years
  • 2-4 years
  • 4-6 years
  • 6-8 years
  • 8-10 years
  • 10-15 years
  • 15-20 years
  • Over 20 years

6. What is your job title?

7. What degree(s) do you hold?

  • Less than a BA/BS
  • BA only (4-year college)
  • BS only (4-year college)
  • MA
  • MS/M Eng.
  • MLS/MLIS
  • MBA
  • PhD/doctorate
  • Other advanced degree

8. What is your study or training specifically in the field of taxonomy or classification?

  • Concentration/specialty within a degree program
  • Two or three college/university credit courses (but not a specialization)
  • One college/university credit course
  • Continuing education course or workshop
  • Conference or professional seminar workshop
  • On the job formal training
  • On the job informal learning and experience
  • Self-taught through reading

9. Prior to your work in taxonomies, which best describes your professional background?

  • Content management/Web content/Content strategy
  • Database design, development, or administration
  • Document management
  • Indexing
  • Knowledge management
  • Librarian
  • Marketing/Sales
  • Project management
  • Records management
  • Software/IT
  • User experience/Information architecture
  • Writing, editing, or publishing
  • None/Student
  • Other (please specify)

10. In your current position, what are your primary taxonomy-related activities?

  • Design/model new taxonomies or other vocabularies, determining structure type and policies
  • Based on an established model, develop and build out new taxonomies or other vocabularies
  • Edit, update, or maintain taxonomies or other vocabularies
  • Map (such as crosswalks), merge, integrate, or restructure existing taxonomies or other vocabularies
  • Write auto-categorization rules for taxonomies or other vocabularies

11. What software do you primarily use to work on taxonomies or other controlled vocabularies?

  • Commercial, dedicated thesaurus/taxonomy/ontology management software
  • Open-source, dedicated thesaurus/taxonomy/ontology management software
  • Commercial software, of which taxonomy management is a feature, module, or component
  • An internally developed thesaurus/taxonomy management system
  • Other commercial software not intended for taxonomies (such as a word processor, spreadsheet, or database management software)

12. Which of the following describes the implementation and use of taxonomies or vocabularies you work on?

  • For content organization, search/findability, and retrieval by internal users (employees)
  • For content search/findability and retrieval by external users (customers, subscribers, members, partners, prospects, patrons, the public)
  • For both internal users and external users

13. What is the size of the controlled vocabularies you typically work on?

  • Under 50 concepts per vocabulary
  • 50-100 concepts
  • 100-500 concepts
  • 500-1500 concepts
  • 1500-5000 concepts
  • 5000-10,000 concepts
  • Over 10,000 concepts

14. How are your current taxonomies/vocabularies linked to content?

  • By manual tagging or indexing
  • By auto-categorization/auto-indexing
  • Some of each
  • Don’t know

15. Are you familiar with and generally try to follow any of the following national or international standards: ANSI/NISO Z39.19 (2005) Guidelines for Construction, Format, and Management of Monolingual Controlled Vocabularies and ISO 25964 Information and documentation—Thesauri and interoperability with other vocabularies?

  • Don’t know these standards and thus don’t follow them.
  • Have read at least some of these standards, but don’t follow them.
  • Generally, keep these standards in mind and apply what is relevant, but not strictly.
  • Attempt to follow these standards closely and refer to them as needed.

16. What do you enjoy about taxonomy work?

17. What are pain points or challenges in your taxonomy work?

18. How did you first get started doing taxonomy work?

More results may appear in future blog posts, but the full results will be published in The Accidental Taxonomist, 3rd edition.