Monday, November 28, 2011

Multilingual Taxonomies


We know that taxonomies help information-seekers browse or search for desired documents/information. Taxonomies provide the bridge between the user’s choice of words and the wording within the desired documents. But what if the user actually speaks a different language than that of the content? Documents can be translated (automatically if it’s just to get the general meaning, or by human translators when accuracy is important), but that’s only done after the document is found. To support the findability of foreign-language documents, what is needed is a bilingual or multilingual taxonomy (“bilingual” meaning in two languages, and “multilingual” meaning in three or more languages).

This Thursday, December 1, I will be presenting on the topic of multilingual taxonomies at the Gilbane Conference in Boston, where the focus is web and enterprise content management. This session, which I will share with co-speaker Ross Lehrer of WAND, appears to be the only one at the conference dedicated to taxonomies and the only presentation with the word “multilingual” in its name. The topic will be of interest both to those concerned with multilingual content but with no experience with taxonomies and to those with an interest in taxonomies but no experience with multilingual content.

The description of the session (which I did not write) on the conference website says: “Multilingual content dramatically expands the potential market for your products, and multilingual taxonomies often need to be part of your multilingual strategy.” This description applies better to my colleague’s presentation, especially since the taxonomies that his company builds are product taxonomies. My presentation, on the other hand, addresses taxonomies for more than just websites of products, such as taxonomies for retrieving articles written in different languages.

The issue is whether the multilingual content is created and managed internally or externally to your organization. If your multilingual content is what your organization creates, such as additional language versions of a public website for a global market, then it is likely that the content in the different languages is managed internally but separately, by separate language teams. The content is similar but not identical in each language, and the taxonomies that support search and browse may also be created and managed separately. Having taxonomies in different languages, however, is not exactly the same as a “multilingual taxonomy.”

A good analogy would be a translated book. The book’s index should not simply be translated; rather a new index is created by an indexer, who is a native-language speaker of the translated language, based on the newly translated text. Consulting the original language index is fine, but directly translating it will have less than ideal results. Similarly, if you have a website translated into another language, and the website has a taxonomy for browsing for specific content pages, that taxonomy should not simply be translated, but rather a new second-language taxonomy should be created, consulting the first taxonomy, of course.

By contrast, a truly multilingual taxonomy connects users who speak one language to content that is in another language. There needs to be a one-to-one correspondence between terms across both languages, and the different language versions need to be managed together. It’s somewhat complicated to design and create, but software tools are available for this, and the result is a powerful aid to searching and browsing across languages. What is important is to match your multilingual taxonomy design to the specific goals, either (1) service in different language markets, each with their own language content; or (2) users being able to access content in a language which they don’t speak.
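To make the one-to-one correspondence concrete, here is a minimal sketch in Python of how such a taxonomy might be modeled. It is only an illustration under my own assumptions, not how any particular taxonomy tool works: the names Concept, MultilingualTaxonomy, translate, and find_documents are hypothetical, as are the sample terms and document names. Each concept carries exactly one preferred term per language, so a search term in one language resolves to a language-neutral concept, and that concept retrieves documents regardless of the language in which they were written.

```python
from dataclasses import dataclass


@dataclass
class Concept:
    """A taxonomy concept with one preferred term per language (hypothetical model)."""
    concept_id: str
    labels: dict  # language code -> preferred term in that language


class MultilingualTaxonomy:
    """Keeps all language versions together and enforces one term per language."""

    def __init__(self, languages):
        self.languages = set(languages)
        self.concepts = {}   # concept_id -> Concept
        self._index = {}     # (language, casefolded term) -> concept_id

    def add(self, concept):
        # Every concept must have a term in every language, and no extras.
        if set(concept.labels) != self.languages:
            raise ValueError(
                f"{concept.concept_id} must have exactly one term for each of "
                f"{sorted(self.languages)}"
            )
        # A term may stand for only one concept within a language.
        for lang, term in concept.labels.items():
            if (lang, term.casefold()) in self._index:
                raise ValueError(f"'{term}' is already used in language '{lang}'")
        for lang, term in concept.labels.items():
            self._index[(lang, term.casefold())] = concept.concept_id
        self.concepts[concept.concept_id] = concept

    def translate(self, term, source, target):
        """Return the equivalent term in the target language."""
        concept_id = self._index[(source, term.casefold())]
        return self.concepts[concept_id].labels[target]

    def find_documents(self, term, language, tagged_docs):
        """Return documents tagged with the concept behind the term, in any language."""
        concept_id = self._index.get((language, term.casefold()))
        if concept_id is None:
            return []
        return [doc for doc, ids in tagged_docs.items() if concept_id in ids]


# Sample data (invented for illustration).
taxonomy = MultilingualTaxonomy(["en", "de", "fr"])
taxonomy.add(Concept("c1", {"en": "Dogs", "de": "Hunde", "fr": "Chiens"}))
taxonomy.add(Concept("c2", {"en": "Cats", "de": "Katzen", "fr": "Chats"}))

# Documents are tagged with concept IDs, not with language-specific terms.
documents = {
    "dog-care-guide-en.pdf": {"c1"},
    "katzenpflege-de.pdf": {"c2"},
    "animaux-de-compagnie-fr.pdf": {"c1", "c2"},
}

german_term = taxonomy.translate("Dogs", "en", "de")
hits_for_german_user = taxonomy.find_documents("Hunde", "de", documents)
```

The design choice that matters here is that documents are tagged with concept identifiers rather than with terms, so a German-speaking user searching on “Hunde” finds the English and French documents as well. Separately managed taxonomies in each language, as in goal (1), would not need the strict one-term-per-language check.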

Sunday, November 20, 2011

Taxonomies: Not New, but Growing

 
What’s new in the field of taxonomies? I am asked this question following my attendance at the two-day Taxonomy Boot Camp conference (October 31 – November 1, Washington, DC), the only conference dedicated to information management taxonomies. There is actually not a lot that is new in taxonomies, which is OK. Rather, taxonomies are new in an increasing number of applications, organizations, and implementations, and that is more significant.

We actually don’t want anything significantly new in taxonomy design, because taxonomies serve users with predictable, standard methods of navigation. For example, the untrained user should be able to understand a display of broader and narrower terms. Taxonomies have actually been around a lot longer than most people realize (and I don’t mean the Linnaean taxonomy of living organisms). Taxonomies (known as controlled vocabularies) have been around since the late 1800s for cataloging books and other library materials, as with the Library of Congress Subject Headings, and for indexing journal articles, as in the Readers’ Guide to Periodical Literature published by the H.W. Wilson Company. For generations, library science students have been able to take courses in designing and using thesauri for indexing periodical literature.

Taxonomies now, however, are showing up in more and more places. These include public websites that contain numerous data records, such as ecommerce sites that list all their products or databases of movies, music, or recipes; a proliferation of new niche subscription database vendors; business-to-business databases; and, most significantly, the growing content and document management repositories of any medium-to-large enterprise. Taxonomy consultants such as myself are increasingly finding that taxonomy projects are not merely those of building new taxonomies from scratch, but also revising, improving, integrating, and repurposing existing taxonomies that have been created in the past 5-10 years.

It was good to have a presentation at Taxonomy Boot Camp, perhaps its first, that dealt with taxonomies for managing image files (known as digital asset management or DAM), rather than just text-based documents. Additional applications of taxonomies would be welcome topics at future TBC conferences.

Taxonomy Boot Camp was rescheduled from following its co-located conferences (KM World, Enterprise Search Summit, and SharePoint Symposium) to preceding them, which seems to signify a shift in perspective, too. Taxonomies are no longer seen as just an add-on specialization, but rather as a basic system that information professionals need to understand as a component of knowledge management, search, and SharePoint implementations.

Finally, the spread in adoption of taxonomies is indicated by the fact that Taxonomy Boot Camp for the first time included both a basic and a “Beyond the Basics” track for one of its two days. Enough taxonomies are now in place, and enough people are experienced in taxonomies, that more advanced topics can have their own audience. Despite compelling speakers in the concurrent basic track, the advanced sessions were well attended. I look forward to hearing more about what taxonomies can do at the next Taxonomy Boot Camp conference, October 16-17, 2012.

Saturday, November 19, 2011

Introduction to a New Blog on Taxonomies


I have written a number of blog posts on taxonomy topics, but until now those posts have not been on a blog of my own. They have appeared elsewhere: on the blog of my employer, Project Performance Corporation; on The Taxonomy Blog of my colleague Marlene Rockmore; and on the blog of Earley & Associates, where I did some contracting work.

At first I was not certain that I had enough to say to start my own taxonomy blog. Upon completing my book, The Accidental Taxonomist, at the end of 2009, I certainly did not have much more to say on the subject after writing over 400 pages. In the meantime, I have been gaining additional experience with taxonomies and attending more conferences and other events, so I finally feel that there are indeed more new ideas I can share about taxonomies, and also more than I could post on my employer’s blog. (I have to give my co-workers turns to post, too!) I do not plan to write another entire book on taxonomies (maybe just a chapter somewhere), so I don’t have to keep the thoughts to myself for later.

Where will my new blog post ideas come from?

As a consultant, I am constantly engaging in new taxonomy projects with new experiences, new lessons to be learned, and new insights into the field. My client names should be kept confidential, so writing complete case studies may not be feasible, but the short informal nature of a blog post is quite appropriate to share some thoughts.

I also attend a number of conferences during the course of a year, and there are always new ideas coming out of these events. Some of my blog posts will be based on my own presentation topics, but they will not repeat the slide bullets. Instead I will provide some commentary about the presentation topic, such as why it is significant, timely, or of interest, or what my concerns are. Other posts will be my observations and ideas gleaned from what others presented.

I may decide to revisit a topic in my book for a blog post. But I could also explore some new direction of topics related to taxonomies, such as content management, information architecture, search, or digital asset management.