The Accidental Taxonomist

Sunday, June 24, 2012

Enterprise Taxonomies vs. Traditional Taxonomies

A book that I have been reading (Structures for Organizing Knowledge: Exploring Taxonomies, Ontologies, and Other Schemas, by June Abbas, 2010) got me thinking about the comparison between corporate/enterprise taxonomies and other “traditional taxonomies”. I found it intriguing that Abbas presents corporate or “professional” taxonomies in the same chapter as personal information structures. Thus, a corporate taxonomy could more aptly be considered an extension of a personal knowledge organization system, rather than the customization of a standard taxonomy or controlled vocabulary. So, how are corporate taxonomies or enterprise taxonomies (corporate taxonomies that are specifically for use enterprise-wide) different from traditional (library science type) taxonomies or thesauri?

There are, in fact, multiple ways in which a corporate or enterprise taxonomy differs from the traditional taxonomies or controlled vocabularies used in libraries or in particular subject disciplines. Enterprise taxonomies in particular are:

1. Relatively small in size

2. Multifaceted

3. Customized to an enterprise’s content

4. Customized to an enterprise’s users

5. Relatively informal

Size
An enterprise taxonomy tends to be relatively small in size with respect to the number of terms and depth of term levels. The size will depend largely on the complexity of an enterprise’s business (number of lines of business, for example), but a range of 1000-2000 terms is typical in a taxonomy for an enterprise that has a single line of business. An organization may certainly supplement this enterprise taxonomy with additional subject-specialized controlled vocabularies, particularly in the areas of research & development or product catalogs.

Faceted Nature
An enterprise taxonomy deals with a variety of content which is differentiated in more than one way, not just by subject matter. Content is typically organized and searched not merely for what it is “about” but also what its purpose is, what its source is, what type of content it is, and perhaps also for what market or customer type it is relevant. Thus, an enterprise taxonomy is usually organized into several facets to support faceted search or faceted browse (see my April 2012 post), which include: document type, file format, department or functional area, line of business or product/service category, geographical region, and market segment, in addition to a topical facet.

Content Customized
A corporate or enterprise taxonomy should be highly customized to an enterprise’s own unique content. While two companies in the same industry may have nearly identical products and services, their customer or member base could vary slightly, and they probably do not have identical organizational structures, procedures, and workflows. Thus, no two companies or organizations would have identical content. Organizations also differ in the quantity of different kinds of content they own and in the importance they assign to different types of content.

User Customized

Just as important as content-customization is user-customization. Corporate or enterprise taxonomies are designed to help an organization’s users (employees, and often also partners and customers) find content. Users include both those who upload/publish content to the intranet or content management system, often manually tagging it, and those who are looking for content. These are sometimes the same people and sometimes not. The users’ workflows or business rules may also need to be taken into account. Thus, the process of designing an enterprise or corporate taxonomy involves gathering input from users, via interviews and workshops. For this reason, Abbas has placed corporate taxonomies in the same chapter as personal taxonomies, because both are highly user-centered.

Informal

Traditional discipline taxonomies (such as for living organisms), thesauri, book cataloging and classification systems follow industry standards for their design and construction, which can be quite rigid and formal. For general-purpose controlled vocabularies, there are the ANSI/NISO Z39.19 guidelines and ISO 25964-1 standard (see my March 2012 post), which allow more flexibility than library cataloging rules. The design of corporate or enterprise taxonomies should adhere to ANSI/NISO or ISO standards at a high level, but in practice, other practicalities and user needs and expectations should take precedence over a strict following of every detail of the standards.

Monday, May 28, 2012

Digital Asset Management and Taxonomies

Earlier this month I attended a conference on digital asset management (DAM) for the first time: Henry Stewart DAM in New York, May 10-11. It revealed to me that the field of digital asset management is definitely an area where taxonomies are being applied and could be even more extensively utilized.

“Digital assets” refers to digitized content, generally images, video, and sound recordings, but could also include the copyrighted text of publishers. As one speaker mentioned, digital assets are the intellectual property of certain enterprises, and hence the designation “assets.” The typical industries concerned with DAM are publishers, broadcasters, advertising (creative) agencies, and other media companies, which manage vast collections of media files. Additionally, large enterprises in any industry whose corporate communications departments manage sizeable collections of image or multimedia files are also concerned with DAM. The New York venue of this conference drew heavily on representatives of local media and advertising industries, but the annual fall venue of the same conference in Chicago, I am told, has a more diversified participation. The field is additionally defined and driven by the vendors of digital asset management software products.

DAM is also a growing field. The 2012 Henry Stewart DAM conference in New York, its ninth year, drew an attendance of approximately 500, up from 400 the previous year. Last year, a new professional association was founded, the Digital Asset Management Foundation. A new quarterly journal from Henry Stewart Publications, Journal of Digital Media Management, just published its first issue this month. Also this month, the DAM Foundation and independent analyst firm, The Real Story Group, released a DAM Maturity Model, which provides a structured framework to address DAM implementation challenges.

As to where taxonomies fit into DAM, it’s not difficult to see. Digital assets tend to be structured content with various metadata fields (subject, purpose, format, location, copyright), which DAM software supports. Taxonomies (or more correctly, any controlled vocabularies) enable the consistent application of descriptive metadata. DAM software supports the inclusion of controlled vocabularies, but the tools and especially the know-how to build the best controlled vocabularies/taxonomies are often lacking. Meanwhile, standard text search does not work on the non-text content that is typical of digital assets, so tagging and controlled vocabularies are all the more important.
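To illustrate how a controlled vocabulary enforces consistent descriptive metadata, here is a minimal Python sketch. It is not taken from any particular DAM product: the vocabulary, its variant terms, and the function names are all hypothetical. It maps free-text tags to preferred terms and sets aside anything the vocabulary does not recognize.

```python
# Hypothetical controlled vocabulary for a "format" metadata field:
# each preferred term maps to the variant entry terms that should resolve to it.
CONTROLLED_VOCABULARY = {
    "Photographs": {"photo", "photos", "photograph", "pictures"},
    "Video recordings": {"video", "videos", "clips"},
    "Sound recordings": {"audio", "sound", "podcasts"},
}


def build_lookup(vocabulary):
    """Map every preferred term and variant (lowercased) to its preferred term."""
    lookup = {}
    for preferred, variants in vocabulary.items():
        lookup[preferred.lower()] = preferred
        for variant in variants:
            lookup[variant.lower()] = preferred
    return lookup


def normalize_tags(raw_tags, lookup):
    """Return (accepted preferred terms, rejected tags) for a list of free-text tags."""
    accepted, rejected = [], []
    for tag in raw_tags:
        preferred = lookup.get(tag.strip().lower())
        if preferred is None:
            rejected.append(tag)          # not in the vocabulary: needs review
        elif preferred not in accepted:
            accepted.append(preferred)    # avoid duplicate preferred terms
    return accepted, rejected


LOOKUP = build_lookup(CONTROLLED_VOCABULARY)
accepted, rejected = normalize_tags(["Photos", "pictures", "clips", "sketch"], LOOKUP)
print(accepted)  # ['Photographs', 'Video recordings']
print(rejected)  # ['sketch']
```

Two people tagging the same image as “Photos” and “pictures” end up with the same preferred term, which is the consistency that free tagging cannot guarantee.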

DAM experts and consultants are not necessarily experts in taxonomies, and taxonomy experts may not be familiar with DAMs, so there is some learning for all of us. DAM systems, like other content management systems, often need to be configured, integrated, and customized for a specific enterprise’s use, with expertise and time spent first on system integration, pushing taxonomy design out to perhaps only an afterthought.

Taxonomies have various applications. The taxonomies I have been involved in tend to be either (1) external facing, to allow customers or clients to search for content published by an organization, whether for research or for e-commerce, or (2) internal, as an enterprise or business taxonomy, to allow employees to find content within an intranet or enterprise content management system. A digital asset management system can manage content for either internal or external users, or often both at once. As such, designing DAM taxonomies often needs to take into consideration more varied users of the content. This is certainly an exciting growth area for taxonomies, and I hope to be more involved in DAM taxonomy projects in the future.

Thursday, April 12, 2012

Faceted Search vs. Faceted Browse

If you have considered different kinds of taxonomies, you have undoubtedly come across the faceted type. You can remember what a facet is by thinking of “face,” as in a multi-faceted diamond. Other names for facet include dimension, aspect, or attribute. It could be the set of characteristics that describe a product (category, size, color, price, intended user, etc.), an image (thing, persons, location, occasion, etc.), or a document (document type, topic, author, source, etc.). In a business or enterprise taxonomy, facets for content management may include content type, product or service line, department or function, and topic. Named entities, such as person names, company names, agency names, and names of laws might also each be a facet. Facets allow users to limit, restrict, or filter results by chosen criteria, one from each facet, that are combined in any order.
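As a rough illustration of how facets filter content by chosen criteria that can be combined in any order, consider the following Python sketch. The content items, facet names, and function names are invented for the example and do not come from any specific system.

```python
from collections import Counter

# Hypothetical content items, each tagged with one value per facet.
ASSETS = [
    {"title": "Q1 sales deck", "content_type": "Presentation",
     "department": "Sales", "topic": "Pricing"},
    {"title": "Pricing policy", "content_type": "Policy",
     "department": "Finance", "topic": "Pricing"},
    {"title": "Onboarding guide", "content_type": "Guide",
     "department": "Human Resources", "topic": "Training"},
    {"title": "Sales training slides", "content_type": "Presentation",
     "department": "Sales", "topic": "Training"},
]


def filter_by_facets(items, selections):
    """Keep items that match every selected facet value.

    Facets with no selection are simply ignored, so the user may
    choose from as many or as few facets as desired.
    """
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in selections.items())
    ]


def facet_counts(items, facet):
    """Count how many items carry each value of one facet (for display next to terms)."""
    return Counter(item[facet] for item in items if facet in item)


# The order in which the facet values are chosen does not change the result.
first = filter_by_facets(ASSETS, {"department": "Sales", "content_type": "Presentation"})
second = filter_by_facets(ASSETS, {"content_type": "Presentation", "department": "Sales"})
print([item["title"] for item in first])   # ['Q1 sales deck', 'Sales training slides']
print(first == second)                      # True
print(facet_counts(first, "topic"))         # one item each for Pricing and Training
```

The counts per value are what many interfaces show in parentheses beside each term, so that users can see how far a further selection would narrow the results.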

Are “faceted browse” and “faceted search” the same? These designations are often used interchangeably, and until recently I had not considered a difference, preferring to use the terminology of my client. Yet “browse” and “search” are clearly not the same thing. To browse is to skim or scan a displayed list of taxonomy terms, whether arranged alphabetically, hierarchically, or a combination. To search is to enter search terms into a search box (which may then be matched against a controlled vocabulary for more accurate results). The implementations of facets in a user interface vary greatly, so perhaps the different designations of “faceted browse” and “faceted search” should reflect these different implementations.

One implementation of facets is to allow the user to dynamically restrict, filter, or limit a data set, based on selecting values from each of multiple facets that are displayed, typically in the left-hand margin, while references to the data or content are displayed in the main screen area. Under each named facet are displayed the names of values (taxonomy terms) within the facet. Facets may need to be expanded to display all values under each, or there may be scroll bars of terms. This implementation of facets can be considered “browse” because the user browses the displayed facets and the displayed terms within each facet.

The data set that is filtered by the facets could be the entire set of content, but more likely it is a subset, based on a prior execution of either a category selection or a search. If the user’s first step was to initiate a search to obtain search results, and the user then applies facets to limit those results, this might be called “faceted search”: even though the user browses the facets, the facets are introduced as a second step following a search. If, however, the user’s first step was to browse subject categories and select a category to obtain the initial data set, then the use of facets in the second step would more likely be called “faceted browse.” I would consider it better practice to call the process “faceted browse” in either case, regardless of how the initial data set was obtained. However, if it’s less confusing to the users, I will defer to those who prefer to call this process “faceted search.”

Another implementation of facets is to allow the user to select among limiting criteria from the beginning, without first selecting a subject by browse or search. In order to achieve usable results (result sets that are not too large), the facets need to contain relatively large taxonomies: a large number and deep set of terms. While it is certainly possible to display a large taxonomy for browsing, it may be difficult to display multiple large, browsable taxonomies, one for each facet. Therefore, if facets are made available to the user from the start (without first requiring the user to select a limited data set based on a search or browse selection), it is more likely that not all the facets will display their terms to the user. The user must then execute a search within a facet. This would correctly be called “faceted search.” It is also known as “fielded search” or “advanced search,” as a search field/box is made available for each facet “field.”

The distinction between faceted browse and faceted search is lost, however, where the distinction between browsing and searching is becoming blurred. Newer user interface implementations of taxonomies are combining search and browse, so that the difference is no longer as obvious. For example, I have seen cases where there is a search box, and as the user types in something, a type-ahead feature matches the search string against controlled vocabulary terms, which are displayed in a short list under the box, and the user can browse the list to select a term. I have also seen a case where a user may be presented with a search box to enter search terms, and there is a button next to the search box, which the user may optionally click, and then the search box becomes a scroll box to view and browse the entire controlled vocabulary for that field. When these kinds of advanced taxonomy-enhanced search boxes correspond to facets, the distinction between “faceted search” and “faceted browse” truly no longer exists.
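The type-ahead behavior described above can be sketched in a few lines of Python. This is only one plausible matching rule, assumed for illustration (a match on the start of any word in a term); real implementations vary, and the term list and function name here are hypothetical.

```python
def type_ahead(prefix, vocabulary, limit=5):
    """Return up to `limit` vocabulary terms containing a word that starts with `prefix`.

    Matching is case-insensitive; results are sorted alphabetically so the
    user can browse the short list shown under the search box.
    """
    needle = prefix.strip().lower()
    if not needle:
        return []  # nothing typed yet, so nothing to suggest
    matches = [
        term for term in vocabulary
        if any(word.startswith(needle) for word in term.lower().split())
    ]
    return sorted(matches, key=str.lower)[:limit]


# Hypothetical controlled vocabulary terms for a topical facet.
TERMS = [
    "Market research",
    "Marketing plans",
    "Product marketing",
    "Mergers and acquisitions",
    "Training materials",
]

print(type_ahead("mar", TERMS))
# ['Market research', 'Marketing plans', 'Product marketing']
```

The user searches by typing and then browses the suggestions, which is exactly why the search/browse distinction blurs in this kind of interface.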

Friday, March 16, 2012

Taxonomy Standards

I’ve written book reviews before, but recently a journal asked me to review a standard. It was ISO 25964-1 Thesauri and interoperability with other vocabularies, Part 1: Thesauri for information retrieval, which was published in 2011 by the International Organization for Standardization. I was pleased to have the opportunity, because this way I obtained a copy, which otherwise costs about US$260 (or whatever the current exchange-rate equivalent of 238 Swiss francs is). Most taxonomists in the United States and beyond have some familiarity with the U.S. standard ANSI/NISO Z39.19 Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies, not merely because it is American, but because the PDF document is freely available from the National Information Standards Organization (NISO).

So, how do the two standards compare? They are very similar in style, format, level of detail, use of illustrative examples, suitability as a reference, etc. Although explanations are not identical, it is clear that there was some degree of cooperation, consultation, or at least communication among the author teams of each. The differences between the two standards are in their scope, and that is obvious from their titles. The ISO standard covers both monolingual and multilingual thesauri in a single standard, whereas the ANSI/NISO standard takes up multilingual vocabularies in a separate document. Additionally, the ISO standard focuses on thesauri, leaving other types of vocabularies to the yet-to-be-published part 2 document, whereas the ANSI/NISO standard covers all kinds of controlled vocabularies within a single standard publication.

There are implications with these differences. By combining guidance on multilingual in addition to monolingual thesauri in a single document, monolingual taxonomists who read the ISO standard will broaden their awareness of the uses and possibilities of multilingual taxonomies, and that’s a good thing. On the other hand, a standard that appears from its title to be just about “thesauri” is likely to be overlooked by taxonomists who work with other kinds of controlled vocabularies.

The importance of the standards should not be overlooked. Taxonomies are only useful if they are well constructed, and decades of experience, practice, and use have indicated the conventions by which the most usable and useful taxonomies should be built. In addition to prescribing what works, the standards also encourage consistency. Consistently designed taxonomies thus become familiar to users, who then know how to use them with minimal training. Users don’t have to be told what a narrower term is and where to find it, or what a related term is and what its purpose is.

Taxonomy or thesaurus standards are a particularly useful resource to taxonomists. Other information management standards (such as for cataloging, indexing, bibliographic citations, etc.) have been reproduced, republished, disseminated, etc. by numerous professional organizations, nongovernmental institutes, educational institutions, and in numerous books. There is no need for the average information professional to look up the original, primary source standard. Taxonomy construction, however, is not such an established discipline or activity. In the field of taxonomies, professional membership organizations are lacking (except for divisions or special interest groups of larger organizations), academic courses are merely nonstandard electives, and books are fewer. The nature of the free-for-all style of the web, which is the platform for most taxonomies today, also poses challenges to conformity in style. Therefore, there is in fact a greater need for the average taxonomist to consult the original, primary source of standards.

For most individual taxonomists, I would suggest that the ANSI/NISO standard is sufficient, and there is no need to also read the ISO standard. However, for an organization or enterprise engaged in taxonomy building and implementation, the additional ISO standard is probably a good investment. Finally, any taxonomist involved in teaching or consulting would also find the ISO standard a valuable additional resource.

Sunday, February 26, 2012

Business Taxonomies

It’s difficult enough for professionals to come to a consensus on the definition of “taxonomy.” As for “business taxonomy,” it’s even worse. There are varying ideas of taxonomy, varying ideas of “business,” and varying ideas on what the connection should be, in addition to the scope and purpose. Is it a taxonomy used by a for-profit enterprise? Is it a taxonomy of business processes for use in any enterprise? Is it the same as an “enterprise taxonomy”?

Just as the term “taxonomy” has both a specific and generalized meaning, so does the term “business taxonomy.” The specific meaning of a taxonomy is a controlled vocabulary of concepts (terms) that are organized into a hierarchy, based on hierarchical relationships (broader/narrower, parent/child, group/member, superordinate/subordinate) between the terms. The generalized meaning of taxonomy is any kind of controlled vocabulary or sets of controlled vocabularies (whether structured as lists, hierarchies, facets, thesauri, etc.) to support the organization and findability of content. The specific meaning of a business taxonomy is a taxonomy that is specifically for business use, dealing with business functions and processes. The generalized meaning of a business taxonomy is any taxonomy used by a business/enterprise, as opposed to a scientific discipline, to organize and manage its content.
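The specific, hierarchical meaning of “taxonomy” can be pictured with a small Python sketch of broader/narrower relationships. The class, its method names, and the sample terms are hypothetical, and the sketch assumes each term has at most one broader term, which real taxonomies (polyhierarchies) do not always require.

```python
from collections import defaultdict


class Taxonomy:
    """A minimal hierarchy of terms linked by broader/narrower relationships."""

    def __init__(self):
        self.broader = {}                   # term -> its broader (parent) term, or None
        self.narrower = defaultdict(list)   # term -> its narrower (child) terms

    def add(self, term, broader=None):
        """Add a term, optionally placing it under a broader term."""
        self.broader[term] = broader
        if broader is not None:
            self.broader.setdefault(broader, None)  # make sure the parent is known
            self.narrower[broader].append(term)

    def ancestors(self, term):
        """Return the chain of broader terms, nearest first."""
        path = []
        parent = self.broader.get(term)
        while parent is not None:
            path.append(parent)
            parent = self.broader.get(parent)
        return path

    def descendants(self, term):
        """Return all narrower terms beneath a term, depth-first."""
        found = []
        for child in self.narrower.get(term, []):
            found.append(child)
            found.extend(self.descendants(child))
        return found


taxonomy = Taxonomy()
taxonomy.add("Finance")
taxonomy.add("Accounting", broader="Finance")
taxonomy.add("Accounts payable", broader="Accounting")
taxonomy.add("Budgeting", broader="Finance")

print(taxonomy.ancestors("Accounts payable"))  # ['Accounting', 'Finance']
print(taxonomy.descendants("Finance"))
# ['Accounting', 'Accounts payable', 'Budgeting']
```

A flat list or a set of facets, by contrast, would need no `broader` links at all, which is the difference between the specific and generalized meanings.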

I would caution that a taxonomy designed to define and describe business processes and functions may not have the same objectives as the more common taxonomies whose purpose is to support the organization and findability of indexed content (documents, files, digital assets, etc.). In fact, even the term “taxonomy” in its purest sense does not mean that it has to be used for content management. The original taxonomies, such as the Linnean taxonomy of animals, plants and other organisms, were not designed for indexing and searching content associated with each concept in the taxonomy. Similarly, Bloom’s Taxonomy of educational concepts is not for indexing educational content but rather to define the scope of educational objectives. Thus, a taxonomy could be just for classifying its terms/concepts/members for the sake of better understanding its members and their relationships. In this way, a business taxonomy, in its more specific meaning, with the focus on functions and processes, could serve the purpose of creating a better structure for an organization and improving business processes. The users of this kind of business taxonomy are the officers and managers of an organization with a goal of improving overall management, rather than all content users.

Furthermore, the business functions/process taxonomy can be more generic, and the same taxonomy, such as a Sales, General & Administrative (SG&A) taxonomy, with modifications, could be used by different organizations. In contrast, a taxonomy for content management and retrieval, especially when it is product/service-focused, should be custom-designed and developed to reflect the nature of the content and the goals of its users. The larger an enterprise is, the more unique its particular business mix and content are. That’s why the largest enterprises tend to have taxonomists on staff.

Yes, the more generic “business taxonomy” and “enterprise taxonomy” are terms often used interchangeably. However, I prefer it when the term “enterprise taxonomy” is used to mean specifically a taxonomy (or set of inter-related taxonomies) that is intended for use enterprise-wide. This is an important designation, because within an enterprise, taxonomies are often siloed. Integrating them and designing a unified taxonomy that cuts across all departments to support the broadest sharing of content across the enterprise is an important goal of an “enterprise taxonomy.”

The term “taxonomy” might sound too technical or scientific for business owners and managers who don’t understand exactly what it is or what it can do. Calling it a “business taxonomy” is sometimes a sort of marketing technique of taxonomy consultants to suggest that a taxonomy is something standard for businesses and something the business needs. It often works, but ultimately the term “business taxonomy” has resulted in confusion as well.