The Accidental Taxonomist

Monday, November 26, 2012

E-Commerce Taxonomies

Happy Cyber-Monday! Coincidentally, this week, which is cyber-week for some retailers, I am giving a conference presentation, at Gilbane in Boston on November 29, on “Taxonomies for E-Commerce.”

As online shopping grows, the organization of products for sale on e-commerce websites becomes increasingly important, and there is also more standardization. Websites present the option to either search (used by customers who know what they want and what to call it) or browse (used by customers who are not sure about what they want or what to call it). For holiday gift shopping, browsing tends to be more common than usual, so displayed taxonomies take on a particularly high visibility at this time.

For browsing, e-commerce websites typically organize their products into hierarchical categories, which are then narrowed by the use of facets. Top-level categories correspond to “departments” and could be as few as 2-3 for a specialty retailer or as many as 12-17 for a general/mass merchandise retailer. Usually the hierarchy extends one or two more levels deeper, although a very large retailer may find the need for an occasional fourth level.

At the lower levels of the hierarchy, the customer may then refine the set of products by use of facets (also known as attributes, filters, refinements, dimensions, “limit by,” or “narrow by”). The facets are for characteristics that cut across multiple categories. Facets may be for size, color, price range, material, brand, style, special features, and perhaps even customer rating. These facets will vary depending on the department or broader category type. The terms within a facet, known as “facet values” or “attribute values,” are usually in a flat list. The user selects a value from each of multiple facets in combination. In some cases, if check boxes are provided, the user is permitted to select more than one value from within the same facet.
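
As a rough illustration of how categories and facets combine, here is a minimal Python sketch. The product data, the `Product` class, and the `refine` function are all hypothetical names invented for this example, not any retailer's actual implementation. It assumes that values checked within one facet are ORed together while separate facets are ANDed.

```python
from dataclasses import dataclass, field


@dataclass
class Product:
    name: str
    category_path: tuple            # e.g. ("Clothing", "Men", "Shirts")
    facets: dict = field(default_factory=dict)


def refine(products, category_path, selections):
    """Return products under category_path that match every selected facet.

    selections maps a facet name to a set of accepted values. Several
    values in one facet are ORed (check boxes); facets are ANDed.
    """
    path = tuple(category_path)
    results = []
    for p in products:
        in_category = p.category_path[:len(path)] == path
        matches = all(p.facets.get(f) in values for f, values in selections.items())
        if in_category and matches:
            results.append(p)
    return results


# Illustrative catalog only.
CATALOG = [
    Product("Oxford shirt", ("Clothing", "Men", "Shirts"), {"color": "blue", "size": "M"}),
    Product("Flannel shirt", ("Clothing", "Men", "Shirts"), {"color": "red", "size": "L"}),
    Product("Polo shirt", ("Clothing", "Men", "Shirts"), {"color": "white", "size": "M"}),
    Product("Chinos", ("Clothing", "Men", "Pants"), {"color": "blue", "size": "M"}),
]

shirts = refine(
    CATALOG,
    ("Clothing", "Men", "Shirts"),
    {"color": {"blue", "red"}, "size": {"M"}},
)
print([p.name for p in shirts])  # ['Oxford shirt']
```

The customer first browses to a category, and only then do the facet selections narrow the set within it.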

Typically retailers are more concerned about the selection and implementation of technology than about the design of the taxonomy. After all, a hierarchical taxonomy of products would appear simple to design, and even the facets are not too challenging to develop, especially with lots of competitor e-commerce websites to analyze and compare. However, my experience working as a taxonomy consultant on e-commerce taxonomies has led me to realize that creating and editing e-commerce taxonomies is not as easy as it seems.

My conference presentation discusses seven challenges:

1. Distinguishing a subcategory from a facet value
At the higher levels, categories are obvious. Standard facets (size, color, price range, etc.) are also obvious. But the distinction between the most specific subcategories and specialized facets can get blurred. Can “type” be a facet? Is a “plaid shirt” a subcategory of shirts, or is plaid a value in a “pattern/type” facet? Are gas and electric stoves subcategories of stoves, or is “energy source” a facet of stoves? Factors to consider in making these decisions include user perceptions and the number of existing levels of subcategories and numbers of facets.

2. Different categorization options
There are often product categories that are difficult to classify. For example, do video games belong in “Toys and Games” or in “Electronics”? Does Home Theater belong in the “Television/Video” or the “Audio/Stereo” department? Having the category in both locations, as the polyhierarchy feature of a taxonomy, is possible. But a breadcrumb trail might follow only a single path, not both, and too many polyhierarchies can be confusing to users.
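
To make the polyhierarchy and breadcrumb issue concrete, here is a small Python sketch. The `PARENTS` table and both function names are assumptions made for illustration. It lists every path to a category and then picks one to display, using the arbitrary rule of taking the first listed parent.

```python
# Each category maps to its list of parent categories; more than one parent
# means the category is polyhierarchical. Names are illustrative only.
PARENTS = {
    "Toys and Games": [],
    "Electronics": [],
    "Video Games": ["Toys and Games", "Electronics"],
    "Game Consoles": ["Video Games"],
}


def breadcrumb_paths(category, parents=PARENTS):
    """Return every root-to-category path as a list of category names."""
    ups = parents.get(category, [])
    if not ups:
        return [[category]]
    paths = []
    for parent in ups:
        for path in breadcrumb_paths(parent, parents):
            paths.append(path + [category])
    return paths


def primary_breadcrumb(category, parents=PARENTS):
    """Pick a single path for display: here, simply the first parent listed."""
    return breadcrumb_paths(category, parents)[0]


for path in breadcrumb_paths("Game Consoles"):
    print(" > ".join(path))
print("Displayed:", " > ".join(primary_breadcrumb("Game Consoles")))
```

A real site might instead show the path the customer actually browsed. The sketch only shows that a category with two parents has two valid trails while the display typically shows one.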

3. Related items
E-commerce taxonomies are hierarchical and generally do not have associative/non-hierarchical relationships between categories. Such relationships are not needed in most cases, but accessories to products and related services (installation, repair, etc.) are clearly related to specific product categories. Taxonomic standards might have to be ignored if making such categories narrower to their main product is the only option. But other, creative display options might be possible.

4. Sort order options
Generally a long list of terms, over a dozen, is easier to scan if alphabetized, whereas a short list of under a dozen terms is better suited to some other prescribed “logical” order. Sort order inconsistency will result, however, if the number of subcategories fluctuates. Determining the “logical” order is also a challenge and often centers around what is most important or popular.
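
The sort-order rule of thumb, and the inconsistency it can cause, can be sketched in a few lines of Python. The threshold of 12 and the function name `display_order` are assumptions for this example. The text says only "over a dozen" and "under a dozen," so the treatment of exactly twelve terms is an arbitrary choice here.

```python
ALPHABETIZE_THRESHOLD = 12  # "a dozen"; an assumed cutoff


def display_order(terms, logical_order=None, threshold=ALPHABETIZE_THRESHOLD):
    """Alphabetize long lists; use the prescribed logical order for short ones."""
    if len(terms) > threshold or not logical_order:
        return sorted(terms, key=str.casefold)
    rank = {term: i for i, term in enumerate(logical_order)}
    # Terms missing from the logical order go last, alphabetically.
    return sorted(terms, key=lambda t: (rank.get(t, len(rank)), t.casefold()))


logical = ["Shirts", "Pants", "Shoes"]
print(display_order(["Pants", "Shoes", "Shirts"], logical))
# ['Shirts', 'Pants', 'Shoes']
```

If the same category later grows past the threshold, the same call silently switches to alphabetical order, which is the inconsistency described above.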

5. Competitor website comparisons
For e-commerce taxonomies (unlike enterprise taxonomies), it’s great to be able to compare with competitors. However, often a retailer is somewhat unique, and no single competitor has exactly the same product categories. Furthermore, it’s important to distinguish category and content comparison from design comparison. Design may be an extension of a retailer’s overall unique brand graphic design.

6. Web site vs. physical store organization
Physical (“brick and mortar”) stores have their own organization for products that might not work online, but there may be pressure to mimic physical store organization to provide a consistent user experience. While it may make sense to have the biggest sellers up front or at the top of the list, product size (a factor in physical store organization) should not necessarily be a factor in online organization.

7. Business needs vs. taxonomy best practices
Online merchants might want to make certain product categories more prominent, by changing the sort order, adding polyhierarchy locations, or even moving a subcategory up a level. It’s important to keep the integrity of the taxonomy intact, though, so that it remains intuitive for the customers to use.

In sum, product taxonomies are not as simple to create as might be expected. Taxonomy design may be under constraints, and business needs can challenge taxonomy standards. Creative solutions may be needed, and customer perspectives need to be considered through creating personas and/or through user testing.

Thursday, November 1, 2012

From Taxonomies to Ontologies: Customized and Semantic Relationships

At this year’s Taxonomy Boot Camp conference, I was invited to present on the panel giving 5-minute “Pecha Kucha” lightning talks, for which this year’s theme was ontology. Just as there are different understandings and usages of “taxonomy,” so are there different understandings and usages of “ontology.” You can come to it from different angles. If you come to ontologies from the experience of taxonomies and the field of information management, then, most simply, an ontology is a more complex type of taxonomy that contains richer information.

In my brief presentation, “From Accidental Taxonomist to Accidental Ontologist,” I summed up the differences between taxonomies and ontologies as follows:

Relationships: Taxonomies have hierarchical relationships and sometimes a simple “related term” associative relationship, but ontologies have semantic relationships, which are custom-created.
Term Attributes: Taxonomies generally don’t have term attributes, but ontologies do.
Term Classes: Taxonomies generally don’t have classes for terms, unless you consider facets as classes, but ontologies do.
Guidelines/Standards: Taxonomies should follow the ANSI/NISO Z39.19 (2005) or ISO 25964, whereas ontologies are expected to follow the Web Ontology Language (OWL) guidelines and make use of the Resource Description Framework (RDF).
Purposes: Taxonomies support indexing/tagging, categorization, and/or classification of content, and in turn information findability and retrieval. The primary purpose of an ontology is to describe a domain of knowledge, and support of indexing/tagging, categorization, classification, findability, and retrieval can be secondary.
Tools: Some software supports the creation of only taxonomies, some software is for ontologies, and some software can do both quite well. Additionally, some taxonomy/thesaurus software can support most, if not all, features of ontologies.

Coming at ontologies from taxonomies, the biggest distinguishing feature of ontologies is the semantic nature of the relationships.

In a taxonomy or thesaurus, you may have generic relationships, such as:

     Automobile industry RT (related term) Cars, and
     Cars RT (related term) Automobile industry

     Ford Motor Company NT (narrower term) Lincoln Division, and
     Lincoln Division BT (broader term) Ford Motor Company

In an ontology, you may have customized, semantic relationships, such as:

     Automobile industry MAN (manufactures) Cars, and
     Cars IND (manufactured by the industry) Automobile industry

     Ford Motor Company SUB (has subsidiary or division) Lincoln Division, and
     Lincoln Division PAR (has parent) Ford Motor Company

If you can customize the relationships, does this change a taxonomy into an ontology? No. Customized relationships are just one feature of an ontology, although perhaps the most important feature. In my online course on taxonomies, although I don’t teach how to create ontologies, I do provide a lesson on customized/semantic relationships. It is often desirable to create a more complex taxonomy without necessarily meeting all the requirements of an ontology.

Furthermore, a customized relationship might not be fully semantic. In the example above, the second set of relationships are customized, because they are designated by the ontologist for the particular case. The relationships are also “semantic” because they contain specific meaning. (Semantic means “has meaning.”) It is possible to customize relationships while still not making them fully semantic. You may decide to simply rename the standard relationships for your particular application and audience. For example, you might rename broader term (BT)/narrower term (NT) as “parent/child,” or rename Related Term as “see also.” If your taxonomy/thesaurus software is more sophisticated, it will allow you to specify any number of customized relationships, and thus you can add more nuances of meaning.

A key component of truly semantic relationships as expected in ontologies is the ability to create directional relationships that are distinct in each direction, with reciprocity. Most of these semantic relationships will be variants of “related term” (RT), rather than variants of the hierarchical relationship. The generic RT relationship, however, is singularly bidirectional. If you simply customized it by renaming it, it would have to be the same in both directions, such as “has partner.” To create a semantic relationship pair, such as MAN (manufactures) and IND (manufactured by the industry), you need a tool that supports ontological relationships and not just “customized” relationships.

If your tool supports customized relationships but not the ability to create distinct pairs of directional relationships that are associative rather than hierarchical, the results can still be very useful. You may have a “near ontology” if not a strictly defined ontology. For example, you could rename the singular “related term” (RT) as “Manufacturer-Product” with an abbreviation such as MAN-PRO. (Credit to Alice Redmond-Neal of Access Innovations, Inc. for the example.) Thus, the relationship is the same in either direction:

     Automobile industry MAN-PRO Cars, and
     Cars MAN-PRO Automobile industry

It is not completely semantic, with the directional details missing, but this may be good enough for your purposes. After all, it should be obvious which is the manufacturer and which is the product. Therefore, taxonomy/thesaurus software that provides most, if not all, features of an ontology may be sufficient, too.
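
The difference between a reciprocal pair of directional relationships and a renamed symmetric relationship can be sketched in Python. This is an illustrative data structure only, not how any particular taxonomy or ontology tool stores relationships. The `Vocabulary` class and `INVERSE` table are hypothetical names, and the relationship codes are the ones used in the examples above.

```python
# Relationship type -> the type asserted in the opposite direction.
# A symmetric type (RT, MAN-PRO) is its own inverse.
INVERSE = {
    "BT": "NT", "NT": "BT",
    "RT": "RT",
    "MAN": "IND", "IND": "MAN",
    "SUB": "PAR", "PAR": "SUB",
    "MAN-PRO": "MAN-PRO",
}


class Vocabulary:
    def __init__(self):
        self.relations = set()  # (subject, type, object) triples

    def relate(self, subject, rel_type, obj):
        """Store a relationship and its reciprocal in one step."""
        self.relations.add((subject, rel_type, obj))
        self.relations.add((obj, INVERSE[rel_type], subject))

    def related(self, subject, rel_type):
        """Return the terms linked from subject by rel_type, alphabetically."""
        return sorted(o for s, t, o in self.relations if s == subject and t == rel_type)


vocab = Vocabulary()
vocab.relate("Automobile industry", "MAN", "Cars")           # directional pair
vocab.relate("Ford Motor Company", "SUB", "Lincoln Division")
print(vocab.related("Cars", "IND"))              # ['Automobile industry']
print(vocab.related("Lincoln Division", "PAR"))  # ['Ford Motor Company']

near = Vocabulary()
near.relate("Automobile industry", "MAN-PRO", "Cars")        # symmetric rename
print(near.related("Cars", "MAN-PRO"))           # ['Automobile industry']
```

With MAN-PRO the stored label is identical in both directions, so the data alone no longer says which term is the manufacturer and which is the product.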

What matters is serving your needs. Rather than calling it an “ontology” when it does not meet all the definitions of an ontology (and causing confusion or disagreement), it may be safer to say your sophisticated taxonomy “has features of an ontology.”

Friday, October 19, 2012

Taxonomies for Multiple Kinds of Users

This week, I again attended the annual Taxonomy Boot Camp conference held in Washington, DC, the only conference dedicated to taxonomies. The main theme I came away with this year is that taxonomies serve diverse audiences and users.

The theme of different users was best exemplified in a session dedicated to comparing taxonomies for internal and external use. Representatives from Johnson Space Center (JSC), Astra-Zeneca, the Associated Press (AP), and Sears gave examples in the panel “Representing Internal and External Taxonomy Requirements in a Taxonomy Model,” moderated by Gary Carlson. While still remaining connected, internal and external taxonomies not only have different terms for the same concept but may also have different structures. According to Joel Summerlin of AP, internal taxonomies can be more specialized and complex than external taxonomies, and internal taxonomies need to support greater precision in retrieval results, whereas external taxonomies need to support greater recall.
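
For readers less familiar with the two retrieval measures, here is a small worked example in Python using the standard definitions: precision is the share of retrieved items that are relevant, and recall is the share of relevant items that were retrieved. The document IDs are made up, and the function is a generic illustration rather than anything presented in the session.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for one search from two collections of IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


relevant = {"d1", "d2", "d3", "d4"}

# A narrow, specialized term retrieves few items, all of them relevant.
print(precision_recall({"d1", "d2"}, relevant))  # (1.0, 0.5)

# A broad term retrieves everything relevant plus some noise.
print(precision_recall({"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"}, relevant))  # (0.5, 1.0)
```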

Even within either the internal or external users of a taxonomy, there is great variety. But unlike the situation of internal and external taxonomies, where you can have different taxonomies linked together, you will have a single taxonomy serving a diverse audience. The use of taxonomy features of polyhierarchy and nonpreferred (aka synonym) terms can help diverse users with different vocabularies, perspectives, and approaches find their way to the desired content.
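
As a simple illustration of how nonpreferred terms lead users with different vocabularies to the same concept, consider this Python sketch. The terms, the two lookup tables, and the `resolve` function are all invented for the example.

```python
# Preferred terms and the nonpreferred (synonym) terms that point to them.
# All terms are illustrative.
PREFERRED = {"Couches", "Mobile phones"}
NONPREFERRED = {
    "sofas": "Couches",
    "settees": "Couches",
    "cell phones": "Mobile phones",
    "cellular phones": "Mobile phones",
}


def resolve(query):
    """Map whatever the user typed to a preferred term, or None if unknown."""
    key = query.strip().casefold()
    for term in PREFERRED:
        if term.casefold() == key:
            return term
    return NONPREFERRED.get(key)


print(resolve("Sofas"))        # Couches
print(resolve("cell phones"))  # Mobile phones
print(resolve("couches"))      # Couches
```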

In the session on internal and external taxonomies, the diversity of internal users was mentioned by Sarah Berndt as a characteristic of JSC. In another session, Helen Clegg described the process of building an enterprise taxonomy at the consulting firm AT Kearney, which has employees in different countries and in different industry specialties. As for external users, Jenny Benevento of Sears described how the customers of its retail website range widely, from repeat shoppers of clothing to those making one-time purchases of engagement rings to those buying large appliances. From the audience, Paula McCoy of ProQuest commented on the importance of knowing, before planning the indexing, who the users are of its different database products.

Other sessions, such as “Taxonomy & Information Architecture,” also addressed the multiple uses and users of taxonomies. Panelist Gary Carlson explained how different personas are used in designing websites, and that the kinds of things that the user-persona seeks or needs can then become taxonomies or facets.

Overall in various sessions of the conference there was a great diversity of taxonomy types, and thus taxonomy users, described. These included:

Enterprise taxonomies for internal users, with a set of three presentations under the title of “Enterprise Taxonomies in Action”
Public web site taxonomies, as in the case study example of the Consumer Product Safety Commission and additional examples from the keynote.
Retail ecommerce taxonomies, as in the example of Sears and additional mentions of Target and REI in other presentations.
Taxonomies used for article indexing and then retrieval by library patrons of periodical/reference databases, as described in a presentation about ProQuest.

The same taxonomy may be targeted not only at different users at once but also at different users over time. In the closing keynote, Patrick Lamb observed that taxonomies can further add value when we make them available for re-use.

Finally, the conference itself attracted a diverse audience: taxonomists, information architects, data warehouse managers, search specialists, knowledge managers, and others; those from corporations in all industries, government, and nonprofits; and those both new to and experienced with taxonomies. In fact, it’s rare that you would find such a diverse audience at a professional conference. They are united in their need to make information findable, and they understand the value of taxonomies to make that happen.

Tuesday, October 9, 2012

Text Analytics and Taxonomies

What does text analytics have to do with taxonomies? Not so much, I had previously assumed, other than serving a similar objective of information retrieval. After all, text analytics is known as a natural language processing technology designed to obtain meaning from text without the traditional process of indexing to a taxonomy. At the recent Text Analytics World conference in Boston on October 3 and 4, however, I learned that text analytics is much more and that the ties between text analytics and taxonomies are greater than I assumed.

The concept of text analytics is used more broadly than I realized, and, as defined in the opening keynote given by conference chair Tom Reamy, encompasses:

Text mining, based on natural language processing, statistics, and machine learning
Entity extraction, semantic technology that enables “fact extraction”
Sentiment analysis, comprising various methods to look for positive and negative words
Auto-categorization, which is often rules-based

I was a presenter at this conference, and since I always talk about what I know, which is taxonomies, I endeavored to make a connection between taxonomies and text analytics. But to my surprise I was not the only one talking about taxonomies at Text Analytics World. Two other presentations featured “taxonomies” in their titles, and together with mine they made up a half-afternoon “Text Analytics and Taxonomies” track. Furthermore, the subject of taxonomies was central to four other presentations and mentioned in a couple of others.

My presentation, "Taxonomies for Text Analytics and Auto-Indexing," described how text analytics can be used with auto-categorization and taxonomies to achieve relatively high quality automated indexing results. Auto-categorization is a type of automated indexing that tends to make use of taxonomies, as categorization requires categories (taxonomy terms). Text analytics can be used as a technology to generate meaningful terms from texts, which in turn can be used to auto-categorize content against a pre-existing taxonomy. Auto-categorization typically involves technologies of either complex rules to match terms or algorithms and machine learning. In either case, the terms picked up in auto-categorization would be more meaningful if they were first extracted with text analytics technologies based on natural language processing.
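
To give a sense of the rules-based side of auto-categorization, here is a deliberately simple Python sketch. It uses plain phrase matching against a tiny, made-up taxonomy. That is a stand-in for the far richer term extraction that natural language processing would provide, and it is not the method of any presenter or product mentioned here. The `RULES` table and `auto_categorize` function are hypothetical.

```python
import re

# Taxonomy term -> trigger phrases that count as evidence for it.
# Both the terms and the phrases are illustrative.
RULES = {
    "Automobile industry": ["automaker", "car manufacturer", "auto industry"],
    "Elections": ["election", "ballot", "voter"],
}


def auto_categorize(text, rules=RULES, min_hits=1):
    """Return taxonomy terms whose trigger phrases occur in the text.

    Terms are ranked by number of matches, then alphabetically.
    """
    lowered = text.casefold()
    scores = {}
    for category, phrases in rules.items():
        # \w* lets a phrase also match simple suffixed forms such as plurals.
        hits = sum(
            len(re.findall(r"\b" + re.escape(p) + r"\w*", lowered))
            for p in phrases
        )
        if hits >= min_hits:
            scores[category] = hits
    return sorted(scores, key=lambda c: (-scores[c], c))


print(auto_categorize("Automakers reported record sales as the auto industry recovered."))
# ['Automobile industry']
```

Raising `min_hits` trades recall for precision, which is the kind of tuning a rules-based system tends to need.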

Another presentation looked at a different side of the relationship between taxonomies and text analytics. Text analytics is also used as a means to build taxonomies in the first place, by providing suggested terms that a taxonomist can then edit. Edee Edwards and Rena Morse of Silverchair Information Systems presented a case study on using text analytics to generate terms for taxonomy development. It required multiple iterations and refinements.

Other presenters on the subject of taxonomies and text analytics included the following:

Heather Edwards of the Associated Press explained how AP classifies the news using a custom-built taxonomy and rule-based auto-classification system.
Evelyn Kent of MCT SmartContent also presented how news items are classified using a “context-based language” (taxonomy), and even demonstrated how the taxonomy is managed in the taxonomy tool (SmartLogic Semaphore Ontology Manager).
Anna Divoli of Pingar presented survey results of taxonomy user interface preferences from cases that involved automatically generated hierarchical and faceted taxonomies.
Alyona Medelyan also of Pingar discussed “controlled indexing” in her case study, which featured results of comparing human versus automated indexing (using machine learning and training sets) using the same taxonomy (the Agrovoc agriculture thesaurus of the FAO).
Sarah Ann Berndt of the Johnson Space Center spoke about “automatic generation of semantic markup” in a presentation that turned out to be mostly about the application of a taxonomy.

The subject of taxonomies had also come up in the opening keynote. Tom Reamy described three themes in text analytics: big data, sentiment analysis of social media, and enterprise text analytics. In all three areas he mentioned taxonomies. In the area of text mining and big data, text analytics can serve as a means of semi-automated taxonomy development. In sentiment analysis, new kinds of taxonomies are being developed for emotional sentiments. In enterprise search, text analytics bridges the gap between taxonomies and documents.

Even if text analytics and taxonomies are combined in different ways, what is common is that combining techniques, tools, and technologies in more challenging situations achieves better results. Techniques, tools, and technologies in this field do not have to compete, but can complement each other.

Wednesday, September 12, 2012

Mentoring Taxonomist Program

In my last blog post, I discussed the need for mentoring taxonomists and mentioned that I had volunteered to lead the new mentoring committee of the Taxonomy Division of SLA (Special Libraries Association) and establish its mentoring program (http://taxonomy.sla.org/get-involved/mentor). While some of the mentoring activities are available to members only, other mentoring services can involve anyone, so I will describe them here.

Frequently Asked Question Resources

In many cases, those new to taxonomies simply have questions about the taxonomy field. Therefore, the initial and primary activity of the SLA Taxonomy Division’s Mentoring Committee has been to develop a detailed list of Frequently Asked Questions (FAQs) and answers, which total 35 to date.

The issue as to whether the answers should be a service to Taxonomy Division members only or to the public was resolved by having short answers of 1-3 sentences for the public, and longer answers of 150-250 words on separate web pages accessible to members only with their login. (Members also have the ability to submit additional questions to the FAQs.) The FAQs with the short answers are available under the Mentoring section of the public website: http://taxonomy.sla.org/get-involved/mentor/taxonomy-faqs

Mentor and Protégé Directories

Connecting aspiring taxonomists (whom we are calling protégés) with experienced taxonomists, who volunteer to be mentors, is another objective. While it is neither practical nor feasible for the Taxonomy Division to provide direct individual mentoring services or to match mentors to protégés, it can act as a clearinghouse in providing directories on its web site of both willing mentors and interested protégés. In the past few months, I have set up both a Mentor Directory and a Protégé Directory, and it is not required that people be listed in one directory in order to contact those listed in the other directory.

Mentor Directory

Access to mentors is, as expected, a membership benefit. Thus, the Mentor Directory is accessible by membership login only. Mentors are SLA Taxonomy Division members who have considerable experience in some aspect of taxonomies and are willing to volunteer limited time in mentoring for the benefit of their professional growth and prestige. Mentors listed in the Mentor Directory:

should be available for answering specific individual questions about the taxonomy field, education/training, and job prospects, which the general FAQs cannot answer.
probably could help out a protégé who brings his/her own project
most likely do not have projects to offer in an internship type of relationship (but might)

Protégé Directory

Taxonomy Division members who have had at least some training or exposure to taxonomies and would like to gain the benefits of mentoring may list their names in the Protégé Directory, which is displayed on the website:
http://taxonomy.sla.org/get-involved/mentor/directory-of-proteges

Protégés may seek a mentoring relationship for taxonomy projects in either of the following two scenarios:

The protégé is looking for a temporary internship or training arrangement, expecting lower than average pay or no pay in exchange for (1) the opportunity to work without prior experience, (2) useful feedback from the supervisor-mentor, and (3) the ability to use the supervisor-mentor as a future work reference.
The protégé has a pending or existing taxonomy project (whether at work, a freelance project, or a volunteer project) and is seeking advice on aspects of the taxonomy design and/or feedback on initial taxonomy work.

Responses to either of these two kinds of mentoring possibilities are still expected to be relatively low, so the Taxonomy Division is permitting nonmembers who can mentor to contact listed protégés. In the case of the first scenario in particular, many qualified taxonomists who are willing to mentor simply don’t have suitable projects or company legal permission to bring on temporary interns or subcontractors at below-market rates. Non-profit organizations, though, are more likely to have arrangements for volunteers.

Therefore, if you are looking for a taxonomist intern whom you are willing to mentor, check out the Protégé Directory. If you are looking to be mentored, then join SLA and its Taxonomy Division and list yourself in the directory.