The Accidental Taxonomist

Sunday, February 26, 2012

Business Taxonomies

It’s difficult enough for professionals to come to a consensus on the definition of “taxonomy.” As for “business taxonomy,” it’s even worse. There are varying ideas of taxonomy, varying ideas of “business,” and varying ideas on what the connection should be, in addition to the scope and purpose. Is it a taxonomy used by a for-profit enterprise? Is it a taxonomy of business processes for use in any enterprise? Is it the same as an “enterprise taxonomy”?

Just as the term “taxonomy” has both a specific and generalized meaning, so does the term “business taxonomy.” The specific meaning of a taxonomy is a controlled vocabulary of concepts (terms) that are organized into a hierarchy, based on hierarchical relationships (broader/narrower, parent/child, group/member, superordinate/subordinate) between the terms. The generalized meaning of taxonomy is any kind of controlled vocabulary or set of controlled vocabularies (whether structured as lists, hierarchies, facets, thesauri, etc.) to support the organization and findability of content. The specific meaning of a business taxonomy is a taxonomy that is specific for business use by dealing with business functions and processes. The generalized meaning of a business taxonomy is any taxonomy used by a business/enterprise, as opposed to a scientific discipline, to organize and manage its content.
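To make the specific meaning concrete, here is a minimal sketch of a hierarchical taxonomy as a data structure: a set of terms plus broader/narrower links between them. The class name, method names, and the sample business-function terms are all assumptions made up for illustration; they do not come from any particular taxonomy tool.

```python
from collections import defaultdict


class Taxonomy:
    """A controlled vocabulary whose terms are linked by broader/narrower relationships."""

    def __init__(self):
        self.terms = set()
        self.narrower = defaultdict(set)  # term -> its narrower (child) terms
        self.broader = defaultdict(set)   # term -> its broader (parent) terms

    def add_term(self, term, broader=None):
        """Add a term, optionally placing it under a broader term."""
        self.terms.add(term)
        if broader is not None:
            self.terms.add(broader)
            self.narrower[broader].add(term)
            self.broader[term].add(broader)

    def ancestors(self, term):
        """Return every term that is broader than `term`, at any level."""
        found = set()
        stack = list(self.broader[term])
        while stack:
            parent = stack.pop()
            if parent not in found:
                found.add(parent)
                stack.extend(self.broader[parent])
        return found


# Hypothetical sample terms for a business functions/processes taxonomy.
business = Taxonomy()
business.add_term("Business functions")
business.add_term("Finance", broader="Business functions")
business.add_term("Accounts payable", broader="Finance")
business.add_term("Human resources", broader="Business functions")

print(sorted(business.ancestors("Accounts payable")))
```

A flat controlled vocabulary, in the generalized sense, would be the same structure with no broader/narrower links at all.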

I would caution that a taxonomy designed to define and describe business processes and functions may not have the same objectives as the more common taxonomies whose purpose is to support the organization and findability of indexed content (documents, files, digital assets, etc.). In fact, even the term “taxonomy” in its purest sense does not mean that it has to be used for content management. The original taxonomies, such as the Linnean taxonomy of animals, plants and other organisms, were not designed for indexing and searching content associated with each concept in the taxonomy. Similarly, Bloom’s Taxonomy of educational concepts is not for indexing educational content but rather to define the scope of educational objectives. Thus, a taxonomy could be just for classifying its terms/concepts/members for the sake of better understanding of its members and their relationships. In this way, a business taxonomy, in its more specific meaning, with the focus on functions and processes, could serve the purpose of creating a better structure of an organization and improving business processes. The users of this kind of business taxonomy are the officers and managers of an organization with a goal of improving overall management, rather than all content users.

Furthermore, the business functions/process taxonomy can be more generic, and the same taxonomy, such as a Sales, General & Administrative (SG&A) taxonomy, with modifications, could be used by different organizations. In contrast, a taxonomy for content management and retrieval, especially when it is product/service-focused, should be custom-designed and developed to reflect the nature of the content and the goals of its users. The larger an enterprise is, the more unique its particular business mix and content are. That’s why the largest enterprises tend to have taxonomists on staff.

Yes, the more generic “business taxonomy” and “enterprise taxonomy” are terms often used interchangeably. However, I prefer it when the term “enterprise taxonomy” is used to mean specifically a taxonomy (or set of inter-related taxonomies) that is intended for use enterprise-wide. This is an important designation, because within an enterprise, taxonomies are often siloed. Integrating them and designing a unified taxonomy that cuts across all departments to support the broadest sharing of content across the enterprise is an important goal of an “enterprise taxonomy.”

The term “taxonomy” might sound too technical or scientific for business owners and managers who don’t understand exactly what it is or what it can do. Calling it a “business taxonomy” is sometimes a sort of marketing technique used by taxonomy consultants to suggest that a taxonomy is something standard for businesses and something the business needs. It often works, but ultimately the term “business taxonomy” has resulted in confusion as well.

Friday, February 3, 2012

Taxonomy Training Workshops

I give a workshop in creating taxonomies in two formats, full-day in person and online. The question sometimes comes up from prospective participants as to the differences. Since a full-day onsite workshop is coming up soon, this would be a good time to address the similarities and differences.

Both workshops cover essentially the same content with a similar outline. Some of the examples are the same, and the participant exercises are the same, too. The workshops address the same diverse audience, ranging from the quick-learning beginner who has at least a background in information science to someone who is already experienced in creating taxonomies, but within a limited context, and seeks to broaden those skills to more applications. In both kinds of workshops, the audience is also diverse in its professional backgrounds: librarians, corporate content managers and knowledge managers, indexers, web usability professionals, and information architects, drawn from industry, academia, nonprofits, and independent practice. With such a wide diversity of backgrounds, the online workshop seems to resonate a little better with participants, none of whom then feels like a minority in a classroom of other types.

There is an organizational difference, whereby the outline of the onsite PowerPoint-based workshop has 10 topics, and the online workshop comprises 5 weekly lessons: (1) an introduction of examples and applications, (2) software for creating taxonomies, (3) hierarchical and associative relationships, (4) preferred term wording and nonpreferred terms, and (5) miscellaneous topics of project processes, governance, folksonomies, and taxonomy jobs. Two onsite workshop topics may be covered in one weekly online lesson, although the onsite workshop does have the additional topics of the sources for terms and the comparison of hierarchical taxonomies with alphabetical indexes (when presented as a pre-conference workshop for the American Society for Indexing). The order of topics is also different. The online workshop introduces software earlier on, so students have the option of using trial software to apply principles learned in later lessons.

The use of software is a significant difference between the two workshops. In the onsite workshop, I give demos of Synaptica and Data Harmony Thesaurus Master, both web-based, and the PC software MultiTes. In the online workshop, participants access the demo software themselves, with the additional option to download the trial Mac software of Cognatrix (which I don’t demonstrate in my onsite workshop, since I don’t use a Mac). Obviously, you can learn more when you try out the software yourself. Trial versions of MultiTes and Cognatrix are available to the public, but trials of Synaptica and Data Harmony are not, and they are made available by special arrangement for students of the workshop.

Q&A is more dynamic and engaging in the classroom setting. Although the online workshop has discussion forums, there is no simultaneous chat. The technology is there, but the problem is that a continuing education workshop comes in addition to everyone’s full-time job and personal life. With participants spread out over different time zones too, it would be too difficult to find an agreeable time of day to chat. In the classroom it’s easier and less inhibiting to raise a question or make a comment. Online, it’s in writing, permanent for the duration of the course, and your name is attached to it. Thus, the online discussion of the workshop has usually been less than optimal.

Then there are the obvious differences. Some people learn better by listening to a speaker, and some people learn better by reading texts on their own. Convenience of location and timing will also make a difference. The onsite workshop is usually offered only once a year (although a customized corporate onsite version is an option), whereas the online workshop is offered every other month and is accessible by Internet globally. However, the latter tends to fill up 2-3 months in advance, and the onsite workshop usually has room for same-day registrations (at a higher cost).

Thursday, January 19, 2012

Taxonomy Merging or Mapping

Yesterday I gave a webinar presentation for members of the Taxonomy Division of the SLA professional association, entitled “Taxonomy Updating, Combining, and Translating.” It was not the first time I presented on these topics and on the topic of taxonomy combining (mapping and merging) in particular. What was different this time is that I am currently involved in a project that involves taxonomy merging. But since I’m right in the middle of the project, I had not had the time to reflect on it and include takeaways from this project in yesterday’s presentation. Now I shall.

Merging and mapping are not the same thing. Merging brings together two taxonomies on the same subject, eliminating duplicate terms, supplementing each other with terms from one or the other taxonomy. The end result is a new and improved taxonomy taking the best of both of the legacy taxonomies. Mapping matches one taxonomy against another, so that terms in one taxonomy may be used for terms in another, such as a user interface taxonomy matching to another taxonomy that had been used to index the content. The end result is that one taxonomy can now retrieve more content.

The project I am involved in was described by the client as “mapping,” but then it became apparent that it was really the merging of taxonomies, not mapping. A second “mapping” component of the project turned out to be more about matching taxonomy to content. While in general it is good practice for the consultant to continue to use the client’s own terminology, referring to “merging” as “mapping” was initially confusing. The distinction is important when it comes to the activity of term-by-term comparisons, which is done in both merging and mapping.

Equivalent concepts, whether with the same terms or slightly different terms (such as Cars and Automobiles), are easily mapped or merged. For these, the distinction between merging and mapping is not that important. In the case of merging, you just have to decide which term to use as the preferred term.

The main difference between merging and mapping is how to deal with cases of a term in one taxonomy having a broader meaning that includes a term in the other taxonomy, when neither term has an equivalent in the opposite taxonomy. For example, Taxonomy A has the term “Precipitation,” and Taxonomy B has the terms “Snow” and “Rain” but does not have the term “Precipitation.” When you merge taxonomies, you take all of these terms and establish hierarchical relationships between them, so the merged taxonomy will have Precipitation and two narrower terms of Snow and Rain. As for mapping the taxonomies, you first need to know in which direction you are mapping. If you map from Taxonomy A to Taxonomy B, you cannot map these terms together. You cannot use Snow for Precipitation and you cannot use Rain for Precipitation, because the latter is broader in its meaning. (You could map Precipitation to Weather, if it existed in Taxonomy B, whereas Snow and Rain are left unmapped.) If you map from Taxonomy B to Taxonomy A, then these terms do map. Precipitation can be used for Snow and for Rain, since it includes both in its meaning.
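The Precipitation example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `merge` and `map_terms` and the `broader_of` lookup are hypothetical, and the lookup stands in for the editorial judgment a taxonomist makes in the term-by-term comparison.

```python
# Taxonomy A has only the broad term; Taxonomy B has only the narrow terms.
taxonomy_a = {"Precipitation"}
taxonomy_b = {"Snow", "Rain"}

# Hypothetical editorial judgment: narrower term -> its broader term.
broader_of = {"Snow": "Precipitation", "Rain": "Precipitation"}


def merge(a, b, broader_of):
    """Merging keeps every term and records the hierarchy between them."""
    merged = {term: set() for term in a | b}  # term -> its narrower terms
    for narrow, broad in broader_of.items():
        if narrow in merged and broad in merged:
            merged[broad].add(narrow)
    return merged


def map_terms(source, target, broader_of):
    """Map each source term to a target term that may be used for it.

    A target term qualifies only if it is the same as the source term or
    broader than it. A narrower target term is never used, so the source
    term is left unmapped (None) in that case.
    """
    mapping = {}
    for term in source:
        if term in target:
            mapping[term] = term
        elif broader_of.get(term) in target:
            mapping[term] = broader_of[term]
        else:
            mapping[term] = None
    return mapping


merged = merge(taxonomy_a, taxonomy_b, broader_of)
a_to_b = map_terms(taxonomy_a, taxonomy_b, broader_of)
b_to_a = map_terms(taxonomy_b, taxonomy_a, broader_of)

print(merged)
print("A to B:", a_to_b)
print("B to A:", b_to_a)
```

The merge result is the same whichever taxonomy you start from, while the two mapping results differ: Precipitation stays unmapped going from A to B, but Snow and Rain both map to Precipitation going from B to A.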

Mapping needs to be precise, because it is relied upon to retrieve pre-indexed content. Merging, on the other hand, does not need to be so precise, as a new taxonomy is the result. Therefore, initially mistaking a merging project for a mapping project, as I first did in this case, is not as bad as mistaking a mapping project for a merging project. I don’t think the latter is likely, though.

Of course, you may have a project that involves merging two taxonomies and then mapping the resulting new taxonomy back to each of the two legacy taxonomies. This is actually not as much work as two consecutive projects. Adding an additional column in a spreadsheet, you can track merging and mapping at the same time. In fact, my current project involves that.

Thursday, December 29, 2011

From Folders to Facets

A recent taxonomy project I completed involved creating a new taxonomy for a financial services client who was migrating its internal content from shared drive folders to a SharePoint-based intranet, which also included automated indexing and a search engine (FAST). The new taxonomy will help support the search functionality, and taxonomy terms will also display in the left-hand margin (called the Refinement Panel), so that users can refine/narrow their initial search results by selecting terms from several attributes/filters/facets. The client had already made an attempt at the start of a taxonomy by the time I had become involved. Not surprisingly, the client-created taxonomy followed the structure of the existing folder names quite closely. After all, the folder structure was their only reference point. It became apparent that a taxonomy for folders and a taxonomy for facets, even for the same content, should be designed quite differently.

A hierarchy of nested folders has the following characteristics:

It is designed to gather and group similar documents together.
It is usually designed and created by a person who is uploading/storing documents with the frame of mind of “where can I put these so that I might find them later.”
A document can go into only one folder and thus under only one category.
A folder can be located within only one parent folder.
The hierarchy of nested folders thus may become quite deep, such as six or seven levels.
Folder names at deeper levels can become long and complex to describe a combination of criteria (a taxonomy design characteristic called pre-coordination).

A faceted taxonomy for search refinement has the following characteristics:

It is designed to refine and narrow a search by specific criteria.
It is designed to help all members of an enterprise find documents, including documents uploaded by different people in different departments.
A document can be assigned multiple taxonomy terms, even terms from within the same facet/broad category.
A taxonomy term may display “under” more than one parent taxonomy term, as long as it is a logical hierarchy. (This feature is called “polyhierarchy.”)
The displayed hierarchy of terms is not so deep, usually only three levels.
Taxonomy term names stay simple, since they are intended to be used in combination (a taxonomy design characteristic known as post-coordination).

With this many differences between hierarchical folders and refinement facets, it’s inevitable that the taxonomy for each will differ, even if the content/documents and the users remain the same. Actually, a nested folder structure may or may not even constitute a “taxonomy.” It depends on whether the folder system was designed with a consistent structure and folder names or whether it just grew ad hoc.
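The contrast between pre-coordination and post-coordination in the two lists above can be sketched with plain Python dictionaries. This is not SharePoint or FAST code; the folder paths, facet names, document names, and the `refine` function are all hypothetical and serve only to show the two designs side by side.

```python
# Pre-coordinated: each document lives in exactly one deep folder, and the
# folder name at the bottom combines several criteria at once.
folders = {
    "Finance/Reports/2011/Quarterly Reports - Retail Banking": ["q3_retail.pdf"],
    "Finance/Reports/2011/Quarterly Reports - Lending": ["q3_lending.pdf"],
}

# Post-coordinated: each document carries simple terms from several facets,
# and may carry more than one term from the same facet.
documents = {
    "q3_retail.pdf": {
        "Document type": {"Quarterly report"},
        "Business line": {"Retail banking"},
        "Year": {"2011"},
    },
    "q3_lending.pdf": {
        "Document type": {"Quarterly report"},
        "Business line": {"Lending", "Retail banking"},
        "Year": {"2011"},
    },
}


def refine(docs, selections):
    """Return the names of documents that match every selected facet term."""
    return sorted(
        name
        for name, facets in docs.items()
        if all(term in facets.get(facet, set()) for facet, term in selections.items())
    )


all_quarterly = refine(documents, {"Document type": "Quarterly report"})
lending_only = refine(
    documents, {"Document type": "Quarterly report", "Business line": "Lending"}
)

print(all_quarterly)
print(lending_only)
```

In the folder version, finding every quarterly report means knowing which folders to open. In the faceted version, the user combines simple terms at search time, and a document tagged with two business lines turns up under both.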

A year and a half ago I was involved with a similar taxonomy project for the wind energy company First Wind. In addition to designing a faceted taxonomy for the Refinement Panel to support search in SharePoint, I was also tasked with improving the nested folder structure and folder names already in use in SharePoint, which were not going to go away. I remember being asked then if I could just create a single taxonomy for both purposes. The answer was no, not entirely. There would be overlap, but there would also be differences. To the stakeholders, that seemed like a lot of additional work, but to me, the taxonomist, that’s simply the nature of my work, and I enjoy the diversity of building different kinds of taxonomies. In the end, more work put in by the taxonomist means less work needed by the users.

Monday, November 28, 2011

Multilingual Taxonomies

We know that taxonomies help information-seekers browse or search for desired documents/information. Taxonomies provide the bridge between the user’s choice of words and the wording within the desired documents. But what if the user actually speaks a different language than that of the content? Documents can be translated (automatically if it’s just to get the general meaning or by human translators when accuracy is important), but that’s only done after the document is found. To support the findability of foreign language documents what is needed is a bilingual or multilingual taxonomy (“bilingual” meaning in two languages, and “multilingual” meaning in three or more languages).

This Thursday, December 1, I will be presenting on the topic of multilingual taxonomies at the Gilbane Conference in Boston, where the focus is web and enterprise content management. This session, which will be shared with the co-speaker Ross Lehrer of WAND, appears to be the only one in the conference dedicated to taxonomies and the only presentation with the word “multilingual” in its name. The topic will be of interest both to those concerned with multilingual content but with no experience with taxonomies and to those with an interest in taxonomies but no experience with multilingual content.

The description of the session (which I did not write) on the conference website says: “Multilingual content dramatically expands the potential market for your products, and multilingual taxonomies often need to be part of your multilingual strategy.” This description applies better to my colleague’s presentation, especially since the taxonomies that his company builds are product taxonomies. My presentation, on the other hand, addresses taxonomies for more than just websites of products, such as taxonomies for retrieving articles written in different languages.

The issue is whether the multilingual content is created and managed internally or externally to your organization. If your multilingual content is what your organization creates, such as additional language versions of a public website for a global market, then it is likely that the content in the different languages is managed internally but separately, by separate language teams. The content is similar but not identical in each language, and the taxonomies that support search and browse may also be created and managed separately. Having taxonomies in different languages, however, is not exactly the same as a “multilingual taxonomy.”

A good analogy would be a translated book. The book’s index should not simply be translated; rather a new index is created by an indexer, who is a native-language speaker of the translated language, based on the newly translated text. Consulting the original language index is fine, but directly translating it will have less than ideal results. Similarly, if you have a website translated into another language, and the website has a taxonomy for browsing for specific content pages, that taxonomy should not simply be translated, but rather a new second-language taxonomy should be created, consulting the first taxonomy, of course.

By contrast, a truly multilingual taxonomy connects users who speak one language to content that is in another language. There needs to be a one-to-one correspondence between terms across the languages, and the different language versions need to be managed together. It’s somewhat complicated to design and create, but software tools are available for this, and the result is a powerful aid to searching and browsing across languages. What is important is to match your multilingual taxonomy design to the specific goals, either (1) service in different language markets, each with its own language content; or (2) users being able to access content in a language which they don’t speak.
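One way to picture the one-to-one correspondence is to store each concept once, under a language-neutral identifier, with exactly one preferred term per language. The sketch below is a minimal illustration under that assumption. The class names, concept IDs, and sample terms are hypothetical and are not taken from any of the software tools mentioned.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    concept_id: str
    labels: dict = field(default_factory=dict)  # language code -> preferred term


class MultilingualTaxonomy:
    """Concepts managed together, each with one preferred term per language."""

    def __init__(self, languages):
        self.languages = tuple(languages)
        self.concepts = {}

    def add(self, concept_id, **labels):
        # Enforce the one-to-one correspondence: every concept must have
        # a term in every language the taxonomy supports.
        missing = [lang for lang in self.languages if lang not in labels]
        if missing:
            raise ValueError(f"{concept_id} lacks labels for: {missing}")
        self.concepts[concept_id] = Concept(concept_id, dict(labels))

    def find_concept(self, term, lang):
        """Look up a concept by its preferred term in one language."""
        for concept in self.concepts.values():
            if concept.labels[lang].lower() == term.lower():
                return concept
        return None

    def translate(self, term, from_lang, to_lang):
        """Return the corresponding term in another language, if any."""
        concept = self.find_concept(term, from_lang)
        return concept.labels[to_lang] if concept else None


# Hypothetical sample concepts in English and French.
taxonomy = MultilingualTaxonomy(["en", "fr"])
taxonomy.add("c1", en="Wind energy", fr="Énergie éolienne")
taxonomy.add("c2", en="Solar energy", fr="Énergie solaire")

french_term = taxonomy.translate("wind energy", "en", "fr")
print(french_term)
```

Because content in either language is indexed to the concept rather than to a language-specific term, a user searching in English can retrieve documents that were indexed with the French term. Separately managed taxonomies in each language, as in goal (1), would not need this shared identifier.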