The Accidental Taxonomist

Saturday, July 28, 2012

The Accidental Taxonomy Consultant

It’s well known that most taxonomists become taxonomists by accident, as the title of my book attests. As I look back on my career, I see this progression continuing one step further in accidentally becoming a taxonomy consultant.

Not all consultants are accidental, though. Bright college graduates in the social sciences with strong analytical skills are often attracted to entry level jobs at consulting firms. They then pick up technical consulting skills by practice over time, and these could even involve taxonomy work. As such, they are not accidental consultants, but they may become accidental taxonomy consultants.

Those who are already taxonomists, like me, often end up as consultants, because that’s where they find the work. Full-time taxonomist jobs are still relatively rare and are often not in one’s geographical location. So, if an experienced taxonomist loses a job due to a layoff or relocation and cannot find another conveniently located taxonomist job, consulting becomes an option. Employers of full-time taxonomists tend to be limited either to certain industries (publishing, media, ecommerce, etc.) or to very large companies in any industry with large internal content management needs, but then the taxonomist job is only at the headquarters location. However, medium to large companies in all industries have taxonomy needs and can often afford a taxonomy consultant for a temporary project, if not a full-time staff member. Thus, taxonomy consultants are in greater demand than full-time employed taxonomists.

In seeking to contract a taxonomy consultant, you may wonder whether it is better to hire a consultant-turned-taxonomist or a taxonomist-turned-consultant. If you hire a skilled taxonomist who is less experienced in consulting, you ought to get a good taxonomy, although the process might not be that smooth. More likely, though, the experienced taxonomist who is inexperienced in consulting will not make as good a first impression or sell the services as well as a professional consultant. The professional consultant-turned-taxonomist will provide a better project experience, although the end-result taxonomy may not be as good. If you can plan and manage the project yourself, then it is the experienced taxonomist you want, but if you want the entire project managed by a consultant, you need a good consultant.

You might not have to compromise, though. A senior enough consultant could be sufficiently skilled in both consulting and taxonomies that the career sequence does not matter. If you can afford to hire a firm or partnership, or even a consultant with subcontractors, you may not need to make the choice of experience either, because you can hopefully get some of each on the consultant team serving you. That’s why you should look at the resume of each member of a consulting team, to ensure that at least one member has very solid taxonomy experience, while at least one other member has considerable consulting and project management experience.

Among the things I have learned about consulting is that it helps to have standard consulting processes and procedures, including standard questions that the consultant should ask the client at the very beginning of a project to clarify the scope and understand the context. Consulting firms may additionally have standard deliverables, reports, etc. But in the particular field of taxonomy consulting, the variables are too great, and standard deliverables rarely fit.

There are a lot of books on consulting, but none about taxonomy consulting. When I came across a potential title, Information Consulting: Guide to Good Practice (Chandos Publishing, 2011), I found that even this book addressed consulting more generally, and when it occasionally discussed “information consulting” it was more about the work of independent research librarians. So, accidental taxonomy consultants lack written guidance that is just for them.

This is my story. I became a taxonomist by accident. Then after getting laid off, more than once, I became a taxonomy consultant by accident. Then I joined a consulting company of intentional consultants, some turned taxonomy-consultant by accident, but I did not feel I fit in with them or their choice of projects, since I was a taxonomist first. So, I recently chose to go out on my own again, working as an independent consultant or partnering with another consultant on a case-by-case basis.

Saturday, July 7, 2012

Deviating from Taxonomy Standards

In my last blog post, I suggested that enterprise taxonomies need not follow the standards for controlled vocabularies and thesauri (ANSI/NISO Z39.19 guidelines and ISO 25964-1) to the same extent as “traditional” discipline taxonomies and thesauri. I say this cautiously, though. Standards should not be ignored for any taxonomy, but rather followed in general, and any deviations made should be for good reason. Enterprise taxonomies (taxonomies custom-designed for the content and users of a specific enterprise, and for the entire enterprise) and also ecommerce taxonomies (taxonomies of products for sale) often have good reasons to deviate from standards in certain areas.

Hierarchical Relationships
An important part of the taxonomy standards is the set of criteria for creating hierarchical relationships. Hierarchical relationships should be one of three types: generic-specific, generic-instance, or whole-part. Any other relationship among posted/displayed terms is not hierarchical, but rather associative. A “good reason” to relate terms hierarchically even when they do not exactly meet the criteria is when the pair of terms is clearly related, but the taxonomy does not include any associative relationships. Enterprise and ecommerce taxonomies are often simple hierarchical taxonomies and do not support the associative relationships common in standard thesauri. For example, the following two hierarchies are not correct by the standards, but the first may be acceptable in an enterprise taxonomy and the second in an ecommerce taxonomy:

Information Technology
> Telecommunications
> > Cell phones

Cameras
> Camera accessories
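
As a rough illustration of the point above, the following Python sketch stores each broader/narrower pair together with a relationship type and reports the pairs that fall outside the three standard hierarchical types. The "related" label, the "Digital cameras" term, and the way each sample pair is classified are assumptions made for this example, not something stated in the standards.

```python
from dataclasses import dataclass

# The three hierarchical relationship types named in the standards.
STANDARD_TYPES = {"generic-specific", "generic-instance", "whole-part"}


@dataclass(frozen=True)
class Relation:
    broader: str
    narrower: str
    rel_type: str  # one of STANDARD_TYPES, or "related" (hypothetical label)


def nonstandard(relations):
    """Return the relations that are not one of the three standard types."""
    return [r for r in relations if r.rel_type not in STANDARD_TYPES]


# Sample pairs from the two hierarchies above. Marking them "related" is an
# assumption; "Digital cameras" is a made-up term added for contrast.
relations = [
    Relation("Information Technology", "Telecommunications", "related"),
    Relation("Telecommunications", "Cell phones", "related"),
    Relation("Cameras", "Camera accessories", "related"),
    Relation("Cameras", "Digital cameras", "generic-specific"),
]

for r in nonstandard(relations):
    print(f"Deviation to justify: {r.broader} > {r.narrower}")
```

A list like this could serve as a review checklist: each flagged pair is a deliberate deviation that should have a stated reason.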

Plural/Singular
The standard is to use plural for terms that are countable nouns. The idea is that when users select a term they will find multiple documents, records, or digital assets (in plural) indexed with or categorized by the term. Enterprise and ecommerce taxonomies, however, tend to be composed of multiple taxonomy facets, whereby the user selects terms from a combination of facets. Taxonomy terms within facets then appear to the user to be filters, scopes, aspects, or attributes, rather than simply a category of plural objects. For example, a document type facet might have terms describing the type of document: Article, Report, Form, Application, Interview, etc., all in the singular to answer the question “what kind of document.” The names of the facets themselves may also be in the singular, rather than the plural, so as to “limit by” a facet, such as: Document type, Location, Topic, Department, etc.

Compound Terms
The standards present criteria to consider in retaining or breaking apart compound terms. For example, “A compound term should be split when its focus refers to a property or part, and its modifier represents the whole or possessor of that property or part.” (ANSI/NISO Z39.19-2005 section 7.6.2.1) While such guidelines are useful and certainly within the scope of taxonomy design, the highly customized nature of enterprise or ecommerce taxonomies obviates following such guidelines for compound terms. ANSI/NISO gives the example of aircraft + engines rather than aircraft engines, but aircraft engines, or other such compound terms, would be perfectly acceptable in an enterprise or ecommerce taxonomy. It is worth noting that both the ANSI/NISO and ISO standards state that these criteria are just guidelines and do not have to be strictly followed.

An enterprise or ecommerce taxonomy can be a challenge to create. The fact that adherence to taxonomy standards may be less strict for a corporate or retail taxonomy than for a subject/discipline taxonomy should not suggest that it is easier to design or that untrained taxonomists can design it. Only with a good understanding of the standards would one know when and where it is acceptable not to adhere to a specific guideline.

Sunday, June 24, 2012

Enterprise Taxonomies vs. Traditional Taxonomies

A book that I have been reading (Structures for Organizing Knowledge: Exploring Taxonomies, Ontologies, and Other Schemas, by June Abbas, 2010) got me thinking about the comparison between corporate/enterprise taxonomies and other “traditional taxonomies”. I found it intriguing that Abbas presents corporate or “professional” taxonomies in the same chapter as personal information structures. Thus, a corporate taxonomy could more aptly be an extension of a personal knowledge organization system, rather than the customization of a standard taxonomy or controlled vocabulary. So, how are corporate taxonomies or enterprise taxonomies (corporate taxonomies that are specifically for use enterprise-wide) different from traditional (library science type) taxonomies or thesauri?

There are, in fact, multiple ways in which a corporate or enterprise taxonomy differs from the traditional taxonomies or controlled vocabularies used in libraries or in particular subject disciplines. Enterprise taxonomies in particular are:

1. Relatively small in size

2. Multifaceted

3. Customized to an enterprise’s content

4. Customized to an enterprise’s users

5. Relatively informal

Size
An enterprise taxonomy tends to be relatively small in size with respect to the number of terms and depth of term levels. The size will depend largely on the complexity of an enterprise’s business (number of lines of business, for example), but a range of 1000-2000 terms is typical in a taxonomy for an enterprise that has a single line of business. An organization may certainly supplement this enterprise taxonomy with additional subject-specialized controlled vocabularies, particularly in the areas of research & development or product catalogs.

Faceted Nature
An enterprise taxonomy deals with a variety of content which is differentiated in more than one way, not just by subject matter. Content is typically organized and searched not merely for what it is “about” but also what its purpose is, what its source is, what type of content it is, and perhaps also for what market or customer type it is relevant. Thus, an enterprise taxonomy is usually organized into several facets to support faceted search or faceted browse (see my April 2012 post), which include: document type, file format, department or functional area, line of business or product/service category, geographical region, and market segment, in addition to a topical facet.
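
To make the faceted structure concrete, here is a minimal Python sketch in which each facet is a small controlled vocabulary and a document's tags are checked against it. The facet names follow the list above, but the terms inside each facet, the `doc_tags` example, and the `invalid_tags` helper are all made up for illustration.

```python
# Facet names follow the post; the terms in each facet are made-up examples.
FACETS = {
    "Document type": {"Article", "Report", "Form"},
    "File format": {"PDF", "DOCX"},
    "Department": {"Marketing", "Legal"},
    "Region": {"North America", "Europe"},
    "Topic": {"Pricing", "Compliance"},
}


def invalid_tags(tags):
    """Return (facet, term) pairs that are not in the controlled vocabulary."""
    bad = []
    for facet, term in tags.items():
        # An unknown facet name is treated as having no valid terms.
        if term not in FACETS.get(facet, set()):
            bad.append((facet, term))
    return bad


# A hypothetical document tagged with one term from each of three facets.
doc_tags = {"Document type": "Report", "Department": "Legal", "Topic": "Taxes"}
print(invalid_tags(doc_tags))  # [('Topic', 'Taxes')]
```

Keeping each facet as its own flat set reflects the idea that facets are independent dimensions; a real topical facet would more likely be hierarchical.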

Content Customized
A corporate or enterprise taxonomy should be highly customized to an enterprise’s own unique content. While two companies in the same industry may have nearly identical products and services, their customer or member base could vary slightly, and they probably do not have identical organizational structures, procedures, and workflows. Thus, no two companies or organizations would have identical content. Organizations also differ in the quantity of different kinds of content they own and in the importance they assign to different types of content.

User Customized
Just as important as content-customization is user-customization. Corporate or enterprise taxonomies are designed to help an organization’s users (employees, and often also partners and customers) find content. Users include both those who upload/publish content to the intranet or content management system, often manually tagging it, and users who are looking for content. These are sometimes the same people and sometimes not. With respect to users, there may also be a workflow or business rule aspect to take into consideration. Thus, the process of designing an enterprise or corporate taxonomy involves gathering input from users, via interviews and workshops. For this reason, the author Abbas has combined corporate taxonomies into the same chapter as personal taxonomies, because they are both highly user-centered.

Informal
Traditional discipline taxonomies (such as for living organisms), thesauri, book cataloging and classification systems follow industry standards for their design and construction, which can be quite rigid and formal. For general-purpose controlled vocabularies, there are the ANSI/NISO Z39.19 guidelines and ISO 25964-1 standard (see my March 2012 post), which allow more flexibility than library cataloging rules. The design of corporate or enterprise taxonomies should adhere to ANSI/NISO or ISO standards at a high level, but in practice, other practicalities and user needs and expectations should take precedence over a strict following of every detail of the standards.

Monday, May 28, 2012

Digital Asset Management and Taxonomies

Earlier this month I attended a conference on digital asset management (DAM) for the first time: Henry Stewart DAM in New York, May 10-11. It revealed to me that the field of digital asset management is definitely an area where taxonomies are being applied and could be even more extensively utilized.

“Digital assets” refers to digitized content, generally images, video, and sound recordings, but it could also include the copyrighted text of publishers. As one speaker mentioned, digital assets are the intellectual property of certain enterprises, and hence the designation “assets.” The typical industries concerned with DAM are publishers, broadcasters, advertising (creative) agencies, and other media companies, which manage vast collections of media files. Additionally, large enterprises in any industry whose corporate communications departments manage sizeable collections of image or multimedia files are also concerned with DAM. The New York venue of this conference drew heavily on representatives of local media and advertising industries, but the annual fall venue of the same conference in Chicago, I am told, has a more diversified participation. The field is additionally defined and driven by the vendors of digital asset management software products.

DAM is also a growing field. The 2012 Henry Stewart DAM conference in New York, in its ninth year, drew an attendance of approximately 500, up from 400 the previous year. Last year, a new professional association was founded, the Digital Asset Management Foundation. A new quarterly journal from Henry Stewart Publications, Journal of Digital Media Management, just published its first issue this month. Also this month, the DAM Foundation and the independent analyst firm The Real Story Group released a DAM Maturity Model, which provides a structured framework to address DAM implementation challenges.

As to where taxonomies fit into DAM, it’s not difficult to see. Digital assets tend to be structured content with various metadata fields (subject, purpose, format, location, copyright), which DAM software supports. Taxonomies (or more correctly, any controlled vocabularies) enable the consistent application of descriptive metadata. DAM software supports the inclusion of controlled vocabularies, but the tools, and especially the know-how, to build the best controlled vocabularies/taxonomies are often lacking. Meanwhile, standard text search does not work on the non-text content that is typical of digital assets, so tagging and controlled vocabularies are all the more important.
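
As one way to picture how a controlled vocabulary keeps descriptive metadata consistent, the Python sketch below maps whatever a tagger types to a single preferred term. The preferred terms, their variants, and the `normalize` helper are hypothetical examples and do not describe any particular DAM product.

```python
# Hypothetical preferred terms and their variant (non-preferred) forms.
SYNONYMS = {
    "Photographs": ["photo", "photos", "pictures"],
    "Video recordings": ["video", "videos", "clips"],
}

# Build a case-insensitive lookup from every form to its preferred term.
LOOKUP = {}
for preferred, variants in SYNONYMS.items():
    LOOKUP[preferred.lower()] = preferred
    for variant in variants:
        LOOKUP[variant.lower()] = preferred


def normalize(tag):
    """Return the preferred term for a tag, or None if it is not in the vocabulary."""
    return LOOKUP.get(tag.strip().lower())


print(normalize("Photos"))   # Photographs
print(normalize(" clips "))  # Video recordings
print(normalize("audio"))    # None
```

Tags that come back as None are candidates either for rejection or for review as possible new terms.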

DAM experts and consultants are not necessarily experts in taxonomies, and taxonomy experts may not be familiar with DAMs, so there is some learning for all of us. DAM systems, like other content management systems, often need to be configured, integrated, and customized for a specific enterprise’s use, with expertise and time spent first on system integration, pushing taxonomy design out to perhaps only an afterthought.

Taxonomies have various applications. I have been involved in taxonomies that tend to be either (1) external facing, to allow customers or clients to search for content published by an organization, whether for research or for e-commerce, or (2) internal, as an enterprise or business taxonomy, to allow employees to find content within an intranet or enterprise content management system. A digital asset management system can manage content for either internal or external users, or often both at once. As such, the design of DAM taxonomies often needs to take into consideration more varied users of the content. This is certainly an exciting growth area for taxonomies, and I hope to be more involved in DAM taxonomy projects in the future.

Thursday, April 12, 2012

Faceted Search vs. Faceted Browse

If you have considered different kinds of taxonomies, you have undoubtedly come across the faceted type. You can remember what a facet is by thinking of “face,” as in a multi-faceted diamond. Other names for facet include dimension, aspect, or attribute. It could be the set of characteristics that describe a product (category, size, color, price, intended user, etc.), an image (thing, persons, location, occasion, etc.), or a document (document type, topic, author, source, etc.). In a business or enterprise taxonomy, facets for content management may include content type, product or service line, department or function, and topic. Named entities, such as person names, company names, agency names, and names of laws might also each be a facet. Facets allow users to limit, restrict, or filter results by chosen criteria, one from each facet, that are combined in any order.

Are “faceted browse” and “faceted search” the same? These designations are often used interchangeably, and until recently I had not considered a difference, preferring to use the terminology of my client. Yet “browse” and “search” are clearly not the same thing. To browse is to skim or scan a displayed list of taxonomy terms, whether arranged alphabetically, hierarchically, or a combination. To search is to enter search terms into a search box (which may then be matched against a controlled vocabulary for more accurate results). The implementations of facets in a user interface vary greatly, so perhaps the different designations of “faceted browse” and “faceted search” should reflect these different implementations.

One implementation of facets is to allow the user to dynamically restrict, filter, or limit a data set, based on selecting values from each of multiple facets that are displayed, typically in the left-hand margin, while references to the data or content are displayed in the main screen area. Under each named facet are displayed the names of values (taxonomy terms) within the facet. Facets may need to be expanded to display all values under each, or there may be scroll bars of terms. This implementation of facets can be considered “browse” because the user browses the displayed facets and the displayed terms within each facet.
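
The filtering behavior described above can be sketched in a few lines of Python: the user's selections narrow the data set, and the remaining values in another facet are counted so they can be listed in the margin. The sample items, the facet values, and the function names are assumptions made for the example.

```python
from collections import Counter

# Made-up content items, each tagged with one value per facet.
ITEMS = [
    {"title": "Q1 pricing report", "Document type": "Report", "Department": "Marketing"},
    {"title": "Vendor contract form", "Document type": "Form", "Department": "Legal"},
    {"title": "Compliance report", "Document type": "Report", "Department": "Legal"},
]


def apply_facets(items, selections):
    """Keep only the items that match every selected facet value."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in selections.items())
    ]


def facet_counts(items, facet):
    """Count how many of the given items carry each value of a facet."""
    return Counter(item[facet] for item in items if facet in item)


# The user selects "Report" in the Document type facet.
results = apply_facets(ITEMS, {"Document type": "Report"})
print([item["title"] for item in results])
print(facet_counts(results, "Department"))
```

Recomputing the counts after every selection is what lets the interface show only values that still lead to results.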

The data set that is filtered by the facets could be the entire set of content, but more likely it is a subset, based on a prior execution of either a category selection or a search. If the user’s first step is to run a search and the user then uses facets to limit the search results, this might be called “faceted search”: even though the user browses the facets, they are introduced as a second step following a search. If, however, the user’s first step is to browse subject categories and select a category to obtain the initial data set, then the use of facets in the second step would more likely be called “faceted browse.” I would consider it better practice to call the process “faceted browse” in either case, regardless of how the initial data set was obtained. However, if it’s less confusing to the users, I will defer to those who prefer to call this process “faceted search.”

Another implementation of facets is to allow the user to select among limiting criteria from the beginning, without first selecting a subject by browse or search. In order to achieve usable results (result sets that are not too large), the facets need to contain relatively large taxonomies: a large number and deep set of terms. While it is certainly possible to display a large taxonomy for browsing, it may be difficult to display multiple large, browsable taxonomies, one for each facet. Therefore, if facets are made available to the user from the start (without first requiring the user to select a limited data set based on a search or browse selection), it is more likely that not all the facets will display their terms to the user. The user must then execute a search within a facet. This would correctly be called “faceted search.” It is also known as “fielded search” or “advanced search,” as a search field/box is made available for each facet “field.”
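
A minimal sketch of this kind of fielded search, in Python, might look like the following: each facet has its own search box, and the query is matched only against that facet's terms. The facet names, term lists, and the simple substring matching are assumptions; a real implementation would likely also match synonyms.

```python
# Hypothetical facets whose term lists are too long to display in full.
FACET_TERMS = {
    "Location": ["Boston", "Chicago", "New York"],
    "Topic": ["Taxonomy standards", "Faceted search", "Digital asset management"],
}


def search_facet(facet, query):
    """Return the facet's terms that contain the query, ignoring case."""
    q = query.strip().lower()
    if not q:
        return []
    return [term for term in FACET_TERMS.get(facet, []) if q in term.lower()]


print(search_facet("Topic", "facet"))     # ['Faceted search']
print(search_facet("Location", "york"))   # ['New York']
print(search_facet("Location", "facet"))  # []
```

The last call shows the point of a per-facet field: a query only ever matches terms from the facet it was entered in.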

The distinction between faceted browse and faceted search is lost, however, where the distinction between browsing and searching is becoming blurred. Newer user interface implementations of taxonomies are combining search and browse, so that the difference is no longer as obvious. For example, I have seen cases where there is a search box, and as the user types in something, a type-ahead feature matches the search string against controlled vocabulary terms, which are displayed in a short list under the box, and the user can browse the list to select a term. I have also seen a case where a user may be presented with a search box to enter search terms, and there is a button next to the search box, which the user may optionally click, and then the search box becomes a scroll box to view and browse the entire controlled vocabulary for that field. When these kinds of advanced taxonomy-enhanced search boxes correspond to facets, the distinction between “faceted search” and “faceted browse” truly no longer exists.
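
The type-ahead behavior mentioned above can be approximated with a simple prefix match, sketched here in Python. The vocabulary, the `limit` parameter, and the choice of prefix matching (rather than matching anywhere in the term) are assumptions for illustration, not a description of any specific product.

```python
def type_ahead(vocabulary, typed, limit=5):
    """Return up to `limit` vocabulary terms that start with what was typed."""
    prefix = typed.strip().lower()
    if not prefix:
        return []
    # Sort alphabetically, ignoring case, so the short list is easy to browse.
    ordered = sorted(vocabulary, key=str.lower)
    matches = [term for term in ordered if term.lower().startswith(prefix)]
    return matches[:limit]


# A made-up controlled vocabulary for one field.
VOCAB = ["Taxonomies", "Thesauri", "Tagging", "Telecommunications", "Metadata"]

print(type_ahead(VOCAB, "ta"))          # ['Tagging', 'Taxonomies']
print(type_ahead(VOCAB, "t", limit=2))  # ['Tagging', 'Taxonomies']
```

The user types, which is search, and then picks from the displayed list, which is browse, so the two steps happen within the same control.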