Friday, October 19, 2012

Taxonomies for Multiple Kinds of Users

This week I again attended Taxonomy Boot Camp, the annual conference held in Washington, DC, and the only conference dedicated to taxonomies. The main theme I came away with this year is that taxonomies serve diverse audiences and users.

The theme of different users was best exemplified in a session dedicated to comparing taxonomies for internal and external use. Representatives from Johnson Space Center (JSC), AstraZeneca, the Associated Press (AP), and Sears gave examples in the panel “Representing Internal and External Taxonomy Requirements in a Taxonomy Model,” moderated by Gary Carlson. Although they remain connected, internal and external taxonomies may not only use different terms for the same concept but also have different structures. According to Joel Summerlin of AP, internal taxonomies can be more specialized and complex than external taxonomies, and internal taxonomies need to support greater precision in retrieval results, whereas external taxonomies need to support greater recall.

Even within a taxonomy's internal users, or within its external users, there is great variety. But unlike the case of internal and external taxonomies, where separate taxonomies can be linked together, here a single taxonomy must serve a diverse audience. Taxonomy features such as polyhierarchy and nonpreferred (aka synonym) terms can help users with different vocabularies, perspectives, and approaches find their way to the desired content.
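
To make these two features concrete, here is a minimal Python sketch, assuming a simple in-memory model rather than any particular taxonomy management tool. The `Term` class, the helper functions, and the appliance terms are all hypothetical. A nonpreferred term such as “Fridges” leads a user to the preferred term, and polyhierarchy gives that same term more than one browse path.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Term:
    """A preferred term in a hypothetical in-memory taxonomy."""
    label: str
    broader: list[str] = field(default_factory=list)       # more than one entry = polyhierarchy
    nonpreferred: list[str] = field(default_factory=list)  # synonyms that lead to this term


def build_taxonomy(terms: list[Term]) -> dict[str, Term]:
    """Index terms by their preferred label."""
    return {term.label: term for term in terms}


def build_entry_index(taxonomy: dict[str, Term]) -> dict[str, str]:
    """Map every preferred and nonpreferred label (case-folded) to its preferred label."""
    index: dict[str, str] = {}
    for term in taxonomy.values():
        index[term.label.casefold()] = term.label
        for synonym in term.nonpreferred:
            index[synonym.casefold()] = term.label
    return index


def resolve(user_input: str, index: dict[str, str]) -> Optional[str]:
    """Return the preferred term for whatever word the user typed, if any."""
    return index.get(user_input.strip().casefold())


def paths_to_top(taxonomy: dict[str, Term], label: str) -> list[list[str]]:
    """List every browse path from a top term down to `label`."""
    term = taxonomy[label]
    if not term.broader:
        return [[label]]
    paths: list[list[str]] = []
    for parent in term.broader:
        for path in paths_to_top(taxonomy, parent):
            paths.append(path + [label])
    return paths


# Hypothetical sample terms, loosely inspired by a retail site.
TAXONOMY = build_taxonomy([
    Term("Appliances"),
    Term("Kitchen"),
    Term("Large appliances", broader=["Appliances"]),
    Term("Refrigerators", broader=["Large appliances", "Kitchen"],
         nonpreferred=["Fridges", "Iceboxes"]),
])
ENTRY_INDEX = build_entry_index(TAXONOMY)

if __name__ == "__main__":
    print(resolve("fridges", ENTRY_INDEX))
    print(paths_to_top(TAXONOMY, "Refrigerators"))
```

Run as a script, this prints `Refrigerators` and two browse paths, one under “Appliances” and one under “Kitchen,” so a shopper who starts from either direction, or who simply types “fridges,” reaches the same content.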

In the session on internal and external taxonomies, Sarah Berndt mentioned the diversity of internal users as a characteristic of JSC. In another session, Helen Clegg described the process of building an enterprise taxonomy at the consulting firm AT Kearney, which has employees in different countries and different industry specialties. As for external users, Jenny Benevento of Sears described how the customers of its retail website range widely, from repeat shoppers for clothing to those making one-time purchases of engagement rings to those buying large appliances. From the audience, Paula McCoy of ProQuest commented on the importance of knowing, before planning the indexing, who the users of its different database products are.

Other sessions, such as “Taxonomy & Information Architecture,” also addressed the multiple uses and users of taxonomies. Panelist Gary Carlson explained how personas are used in designing websites and how the kinds of things each persona seeks or needs can then become taxonomies or facets.

Overall, the conference sessions described a great diversity of taxonomy types, and thus of taxonomy users. These included:
  • Enterprise taxonomies for internal users, with a set of three presentations under the title “Enterprise Taxonomies in Action.”
  • Public website taxonomies, as in the case study of the Consumer Product Safety Commission and additional examples in the keynote.
  • Retail ecommerce taxonomies, as in the example of Sears and additional mentions of Target and REI in other presentations.
  • Taxonomies used for article indexing and subsequent retrieval by library patrons of periodical and reference databases, as described in a presentation about ProQuest.

The same taxonomy may serve not only different users at once but also different users over time. In the closing keynote, Patrick Lambe observed that taxonomies can add further value when we make them available for reuse.

Finally, the conference itself attracted a diverse audience: taxonomists, information architects, data warehouse managers, search specialists, knowledge managers, and others; people from corporations in all industries, government, and nonprofits; and both newcomers to taxonomies and experienced practitioners. It is rare to find such a diverse audience at a professional conference. What unites them is the need to make information findable and an understanding of the value of taxonomies in making that happen.

Tuesday, October 9, 2012

Text Analytics and Taxonomies

What does text analytics have to do with taxonomies? Not much, I had previously assumed, other than serving a similar objective of information retrieval. After all, text analytics is known as a natural language processing technology designed to obtain meaning from text without the traditional process of indexing to a taxonomy. At the recent Text Analytics World conference in Boston on October 3 and 4, however, I learned that text analytics is much more than that and that the ties between text analytics and taxonomies are greater than I had assumed.

The term text analytics is used more broadly than I had realized. As defined in the opening keynote by conference chair Tom Reamy, it encompasses:
  • Text mining, based on natural language processing, statistics, and machine learning
  • Entity extraction, a semantic technology that enables “fact extraction”
  • Sentiment analysis, comprising various methods to look for positive and negative words
  • Auto-categorization, which is often rules-based

I was a presenter at this conference, and since I always talk about what I know, which is taxonomies, I endeavored to make a connection between taxonomies and text analytics. To my surprise, however, I was not the only one talking about taxonomies at Text Analytics World. Two other presentations featured “taxonomies” in their titles, so together with mine they made up a half-afternoon “Text Analytics and Taxonomies” track. Furthermore, taxonomies were central to four other presentations and mentioned in a couple of others.

My presentation, “Taxonomies for Text Analytics and Auto-Indexing,” described how text analytics can be used with auto-categorization and taxonomies to achieve relatively high-quality automated indexing results. Auto-categorization is a type of automated indexing that tends to make use of taxonomies, since categorization requires categories (taxonomy terms). Text analytics can generate meaningful terms from texts, which in turn can be used to auto-categorize content against a pre-existing taxonomy. Auto-categorization typically relies either on complex rules that match terms or on algorithms and machine learning. In either case, the terms picked up in auto-categorization would be more meaningful if they were first extracted with text analytics technologies based on natural language processing.
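
As a rough illustration of the rules-based approach, here is a minimal Python sketch, assuming each taxonomy term carries hand-written include and exclude phrases. The `Rule` class, the phrases, and the categories are hypothetical, and the `normalize` helper is only a crude stand-in for real NLP-based term extraction; this is not how any particular product implements auto-categorization.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Hypothetical categorization rule for one taxonomy term."""
    category: str                                     # preferred term in the taxonomy
    include: list[str]                                # phrases that suggest the term
    exclude: list[str] = field(default_factory=list)  # phrases that veto it
    min_hits: int = 1                                 # include phrases required to fire


def normalize(text: str) -> str:
    """Crude stand-in for NLP term extraction: lowercase, replace punctuation
    with spaces, and pad with spaces so phrases match on word boundaries."""
    return " " + re.sub(r"[^a-z0-9]+", " ", text.lower()).strip() + " "


def categorize(text: str, rules: list[Rule]) -> list[str]:
    """Return the taxonomy terms whose rules fire for `text`."""
    doc = normalize(text)
    assigned = []
    for rule in rules:
        hits = sum(1 for phrase in rule.include if normalize(phrase) in doc)
        vetoed = any(normalize(phrase) in doc for phrase in rule.exclude)
        if hits >= rule.min_hits and not vetoed:
            assigned.append(rule.category)
    return assigned


# Hypothetical rules against a hypothetical pre-existing taxonomy.
RULES = [
    Rule("Interest rates", include=["interest rate", "federal funds rate"]),
    Rule("Banks", include=["bank", "lender"], exclude=["river bank"]),
]

if __name__ == "__main__":
    print(categorize("The central bank raised its benchmark interest rate.", RULES))
    print(categorize("Erosion is threatening the river bank.", RULES))
    print(categorize("Mortgage lenders cut their rates.", RULES))
```

The first sentence is assigned both “Interest rates” and “Banks,” and the exclude phrase keeps the river bank out of “Banks.” The third sentence gets no terms at all, because plain string matching misses “lenders” and “rates”; that is the kind of gap where terms first extracted by linguistic processing (for example, lemmatization) could make such rules more reliable.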

Another presentation looked at a different side of the relationship between taxonomies and text analytics: text analytics can also be used to build taxonomies in the first place, by suggesting terms that a taxonomist can then edit. Edee Edwards and Rena Morse of Silverchair Information Systems presented a case study on using text analytics to generate terms for taxonomy development, a process that required multiple iterations and refinements.
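
The post does not say which tools were used in that case study, so the following is only a generic illustration, not the presenters' method: a minimal Python sketch, assuming plain word n-gram counting as a stand-in for a text analytics engine, of how candidate terms might be suggested to a taxonomist. The function name, stopword list, thresholds, and sample documents are hypothetical.

```python
import re
from collections import Counter

# A deliberately tiny, hypothetical stopword list; real projects would use a fuller one.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}


def candidate_terms(documents: list[str], max_n: int = 2, min_docs: int = 2) -> list[tuple[str, int]]:
    """Suggest candidate terms: word n-grams (up to `max_n` words) that do not
    start or end with a stopword, ranked by how many documents contain them."""
    doc_freq = Counter()
    for doc in documents:
        tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", doc.lower())
        seen = set()  # count each n-gram at most once per document
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                seen.add(" ".join(gram))
        doc_freq.update(seen)
    # Sort by descending document frequency, then alphabetically for stable output.
    ranked = sorted(doc_freq.items(), key=lambda item: (-item[1], item[0]))
    return [(term, count) for term, count in ranked if count >= min_docs]


# Hypothetical sample abstracts.
DOCS = [
    "Chronic kidney disease and blood pressure control.",
    "Blood pressure targets in chronic kidney disease.",
    "Diet and blood pressure in older adults.",
]

if __name__ == "__main__":
    for term, count in candidate_terms(DOCS):
        print(count, term)
```

The suggestions include fragments such as “chronic kidney” alongside “kidney disease”; a taxonomist would merge, rename, or reject them (raising `max_n` to 3 would surface “chronic kidney disease” directly), which hints at why this kind of process can take several rounds of refinement.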

Other presenters on the subject of taxonomies and text analytics included the following:
  • Heather Edwards of the Associated Press explained how AP classifies the news using a custom-built taxonomy and a rule-based auto-classification system.
  • Evelyn Kent of MCT SmartContent also presented how news items are classified using a “context-based language” (taxonomy), and even demonstrated how the taxonomy is managed in a taxonomy tool (SmartLogic Semaphore Ontology Manager).
  • Anna Divoli of Pingar presented survey results of taxonomy user interface preferences from cases that involved automatically generated hierarchical and faceted taxonomies.
  • Alyona Medelyan, also of Pingar, discussed “controlled indexing” in her case study, which compared human and automated indexing (using machine learning and training sets) with the same taxonomy (the Agrovoc agriculture thesaurus of the FAO).
  • Sarah Ann Berndt of the Johnson Space Center spoke about “automatic generation of semantic markup” in a presentation that turned out to be mostly about the application of a taxonomy.

The subject of taxonomies had also come up in the opening keynote. Tom Reamy described three themes in text analytics: big data, sentiment analysis of social media, and enterprise text analytics, and he mentioned taxonomies in all three. In text mining and big data, text analytics can serve as a means of semi-automated taxonomy development. In sentiment analysis, new kinds of taxonomies are being developed for emotional sentiments. In enterprise search, text analytics bridges the gap between taxonomies and documents.

Although text analytics and taxonomies are combined in different ways, the common lesson is that combining techniques, tools, and technologies achieves better results in more challenging situations. Techniques, tools, and technologies in this field need not compete; they can complement each other.