Saturday, January 31, 2015

Taxonomy Software Trends



I reviewed various taxonomy/thesaurus management software offerings recently, in preparation for the last of my 3-part webinar series, Practical Taxonomy Creation, and I noticed some trends since I last looked into software in such detail for my book over 5 years ago: more cloud/web-based software, more SKOS/RDF/Semantic web framework software, and more plugins to SharePoint, content management systems, and search engines.

The number of commercial vendors selling taxonomy/thesaurus management software has not changed significantly: some have left the market, others have entered, and the rest have continued with updates and improvements. There are fewer low-end, inexpensive, single-user commercial desktop offerings, however. Products I have reviewed in the past that have since gone away include Webchoir TCS-10 Personal and Term Tree 2000. The Mac OS program Cognatrix has been unavailable for the past year, although the vendor intends to release it again as an Apple App Store program following the release of the next major version of Mac OS.

Subscription, web-based software


Synaptica pioneered web-based thesaurus management software with the introduction of its product in 1995, when the Web was still young, and now other vendors also offer web-based subscription software. Data Harmony Thesaurus Master from Access Innovations was originally available only as a Java-based, multi-platform client-server installation. For the past several years a web version has also been available, and Access Innovations president Marjorie Hlava said in an email: “Increasingly our customers use the cloud version of the software.” Newer thesaurus management products on the market have been solely cloud-based. These include PoolParty, introduced by the Semantic Web Company in 2009, and TopBraid Enterprise Vocabulary Net (EVN), released by TopQuadrant in 2010. Meanwhile, Synaptica began offering Synaptica Express, a cloud-computing solution for individuals and smaller businesses. Finally, the long-affordable mainstay MultiTes Pro, a Windows-based desktop program that has been available since 1983, first in a single-user version and then also for multiple users, introduced a multi-user cloud version about six years ago, which in 2013 was updated and renamed MultiTes Online.

The cloud-based software offerings are, of course, priced on annual (and in one case, monthly) subscription fees, instead of one-time license costs with lower-priced updates. Hopefully this means that more organizations will try developing a taxonomy in an appropriate tool, since the cost commitment is smaller and covers a shorter time.

SKOS/RDF/OWL Semantic web framework software


Supporting linked data and interoperability with Semantic Web content has become more important. Therefore, World Wide Web Consortium (W3C) recommendations, such as SKOS (Simple Knowledge Organization System), RDF (Resource Description Framework), and OWL (Web Ontology Language), are being adopted by newer thesaurus/taxonomy software. The newer products, PoolParty and TopBraid EVN, are both built around SKOS models. Synaptica and Data Harmony Thesaurus Master have been able to export to SKOS and OWL schemas for a long time, but it was only in 2013 that Data Harmony added user-defined fields to the SKOS export to include all fields in a term record. Additionally, in 2011 Synaptica introduced an Ontology Publishing Suite to publish an ontology, or part of an ontology, to the Web.
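To make the SKOS connection more concrete: the thesaurus relationships in a term record line up with SKOS properties roughly as follows. The preferred term becomes skos:prefLabel, UF (nonpreferred) terms become skos:altLabel, BT becomes skos:broader, NT becomes skos:narrower, and RT becomes skos:related. The Python sketch below writes a few term records out as SKOS in Turtle syntax using only the standard library. The base URI, the dictionary layout of the records, and the function names are all hypothetical, chosen for illustration; the products named above each have their own export formats and options.

```python
# Sketch: serialize thesaurus term records as SKOS concepts in Turtle.
# BASE, the record layout, and the function names are hypothetical.
BASE = "http://example.org/taxonomy/"

# Thesaurus relationship codes mapped to the SKOS properties they correspond to.
SKOS_PREDICATES = {"BT": "skos:broader", "NT": "skos:narrower", "RT": "skos:related"}


def concept_id(label: str) -> str:
    """Derive a simple local name from a preferred label (non-alphanumerics become '_')."""
    return "".join(ch if ch.isalnum() else "_" for ch in label.lower())


def escape(text: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_skos_turtle(terms: dict[str, dict[str, list[str]]], lang: str = "en") -> str:
    """Return Turtle text with one skos:Concept per preferred term.

    Preferred term -> skos:prefLabel, UF terms -> skos:altLabel,
    BT/NT/RT -> skos:broader / skos:narrower / skos:related.
    """
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        f"@prefix ex: <{BASE}> .",
        "",
    ]
    for label, record in sorted(terms.items()):
        props = [f'skos:prefLabel "{escape(label)}"@{lang}']
        props += [f'skos:altLabel "{escape(uf)}"@{lang}' for uf in record.get("UF", [])]
        for code, predicate in SKOS_PREDICATES.items():
            props += [f"{predicate} ex:{concept_id(t)}" for t in record.get(code, [])]
        body = " ;\n    ".join(props)
        lines.append(f"ex:{concept_id(label)} a skos:Concept ;\n    {body} .")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = {
        "Treatments": {"NT": ["Physical therapy"]},
        "Physical therapy": {"BT": ["Treatments"], "UF": ["Physiotherapy"]},
    }
    print(to_skos_turtle(sample))
```

Note that this only covers the core relationships; exporting "all fields in a term record," as mentioned above, would require additional user-defined properties beyond SKOS itself.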

My first criterion for thesaurus management software is that it enforces relationship rules in accordance with the ANSI/NISO Z39.19-2005 standard. SKOS is not an alternative standard, but rather a framework that can be followed in addition to ANSI/NISO Z39.19-2005. Ideally a software product complies with both, and some now do.
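One basic example of enforcing relationship rules is reciprocity: in a Z39.19-style thesaurus, adding a BT creates the matching NT, RT is symmetric, and USE pairs with UF. Here is a minimal sketch of how a tool might enforce that, assuming a hypothetical in-memory `Thesaurus` class. It is not how any particular product works; commercial tools implement this, and many more checks, in their own ways.

```python
from collections import defaultdict

# Reciprocal pairs for the standard thesaurus relationship types:
# broader/narrower, related/related, and USE/UF (Used For).
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT", "USE": "UF", "UF": "USE"}


class Thesaurus:
    """Hypothetical toy term store that always records both directions of a relationship."""

    def __init__(self) -> None:
        # term -> relationship code -> set of terms at the other end
        self.rels = defaultdict(lambda: defaultdict(set))

    @staticmethod
    def _check(term: str, code: str, other: str) -> None:
        if code not in RECIPROCAL:
            raise ValueError(f"unknown relationship type: {code}")
        if term == other:
            raise ValueError("a term cannot be related to itself")

    def relate(self, term: str, code: str, other: str) -> None:
        """Add a relationship and its reciprocal in one step."""
        self._check(term, code, other)
        self.rels[term][code].add(other)
        self.rels[other][RECIPROCAL[code]].add(term)

    def unrelate(self, term: str, code: str, other: str) -> None:
        """Remove a relationship and its reciprocal together."""
        self._check(term, code, other)
        self.rels[term][code].discard(other)
        self.rels[other][RECIPROCAL[code]].discard(term)

    def related(self, term: str, code: str) -> set[str]:
        """Return a copy of the terms linked to `term` by relationship `code`."""
        return set(self.rels[term][code])


if __name__ == "__main__":
    t = Thesaurus()
    t.relate("Treatments", "NT", "Physical therapy")
    t.relate("Physiotherapy", "USE", "Physical therapy")
    print(t.related("Physical therapy", "BT"), t.related("Physical therapy", "UF"))
```

Because the editor never writes one direction without the other, the thesaurus cannot end up with an NT that lacks its BT.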

Plug-ins and connectors for search and content management


The most common software for internal content management (even though it is not really a content management system) is SharePoint. Prior to 2010, SharePoint handled controlled vocabulary metadata in such a simple way (not even in hierarchies) that there was no point in trying to use taxonomies. Starting with SharePoint 2010 and its Managed Metadata Service, taxonomies can now be utilized in its Term Store. However, despite Term Store improvements from SharePoint 2010 to 2013, it is still far from having the features and capabilities of a dedicated thesaurus management software product. Thus, ideally you create the taxonomy in the dedicated tool and port it over to SharePoint, and now almost all enterprise-level thesaurus management software products have methods to connect to SharePoint, whether through APIs, plug-ins, or dedicated “connector” modules.

There are also increasing numbers of content management systems and search software products being supported by thesaurus management connections. For example, SmartLogic Semaphore Ontology Manager has integrations with a greater number of applications than in the past, including SharePoint, Google Search Appliance, Apache Solr, OpenText, MarkLogic, and IBM Watson. PoolParty has a WordPress plugin, in addition to integrations with SharePoint and Drupal. Surely more such connections will be added, as I have recently heard of requests for taxonomy imports into Drupal.

Monday, December 8, 2014

Taxonomy Courses

Note: Since this blog post was written, Simmons College discontinued its continuing education program. I am now offering the 5-week online workshop, previously offered through Simmons, independently through Hedden Information Management. Information is at: http://www.hedden-information.com/courses-workshops/taxonomy-course/

__________________________________________________________________________

I have been teaching workshops on how to create taxonomies for over seven years. Coming up in the winter and spring of 2015 I am offering more kinds of workshops and learning options than ever before. I had offered customized corporate onsite workshops in the past, but since I don’t have the time for that any more, it makes sense to accept opportunities to offer general training in taxonomies. I don’t intend to offer directly competing course offerings, so this blog post aims to outline the differences between these various taxonomy courses.

The differences are primarily in learning approach (online or in-person, synchronous or asynchronous), depth of instruction, cost, and convenience. The audience focus of each is not substantially different. The level in most cases is primarily “advanced beginner.” Prior exposure to or use of taxonomies, general training in library/information science, and/or work in related fields such as information architecture, metadata, records management, indexing, content management, digital asset management, etc., is highly beneficial. Lacking such a background, however, does not preclude participation, although it may make the course a little more challenging. Prior experience in creating or editing taxonomies, on the other hand, does not necessarily make you too advanced for the classes, as your experience may be limited to just one kind of taxonomy. The only exception is the SLA full-day course, which is aimed more directly at beginners. The workshops are also suitable for both practitioners and managers.

Simmons College School of Library and Information Science continuing education workshop

5-week online workshop with next available session in March 2015, and likely another two or three times later in the year.  Description, Registration
Benefits:
- Individual feedback on submitted assignments.
- Simmons College certificate and record of completion
- Access to a free trial of taxonomy management software that you could not get on your own (in addition to others, for which you could get a 30-day trial on your own)
- Opportunity to email questions and get answers
- Greater learning opportunity through assignments and feedback and more material to read
Disadvantages:
- Limited space, usually filling up a month or more in advance (January 2015 session filled)
- A greater total time commitment, over a fixed period of time
- Inability to easily save formatted lessons. While you can copy lesson content for your own purposes, the Moodle platform does not offer an easy way to save lessons in the original formatting.

American Society for Indexing Online Learning: “Practical Taxonomy Creation”

Three weekly one-hour sessions, January 14, 21, and 28, 2015, and/or recordings, Description, Registration
Benefits:
- Live phone Q&A
- Unlimited capacity. Sign up at the last minute and still get in. Or register after the live session for access to the recording.
- Limited time commitment
- Option to attend some live and some as recording, if not all sessions fit in your schedule.
- Access to the presentation and recording for unlimited repeated viewings/listening
Disadvantages:
- No individualized assignment feedback
- Non-core topics not covered, due to limited time
- Less learning time (excluding webinar replays)
- Limited time for questions

American Society for Indexing conference 3-hour workshop: “Topics in Taxonomy Creation”

Either April 30 or May 1 (TBD) in Seattle, WA, Description, Registration
Benefits:
- In-person learning experience
- Personal connection with me, the instructor, and other participants for better networking
- May interrupt the presentation with questions (unlike the webinar in which you must wait for the Q&A time)
- Live demos of taxonomy management software
- Copy of the slides and handouts to keep
Disadvantages:
- Travel time and costs to Seattle
- Required registration for conference (no separate workshop registration)
- Limited instruction time and content

SLA conference full-day (8-hour) continuing education workshop: “Introduction to Taxonomy”

Saturday, June 13, 2015, Boston, MA, Description, Registration
Benefits:
- Appropriate for complete beginners
- In-person learning experience
- Personal connection with me, the instructor, and other participants for better networking
- May interrupt the presentation with questions (unlike the webinar in which you must wait for the Q&A time)
- Live demos of taxonomy management software
- Copy of the slides and handouts to keep
- Ample live Q&A time
- Discounted student and retired SLA member pricing
Disadvantages:
- Travel time and costs to Boston
- A lot of material to digest in a short period of time

Why learn about taxonomies? It is a key tool/method/component of knowledge management and information management.

Saturday, November 8, 2014

Taxonomy Trends and Future

What are the trends in taxonomies, and where is the field going? The future of taxonomies turned out to be a unifying theme of last week’s annual Taxonomy Boot Camp conference in Washington, DC, the premier event in taxonomies, running from its opening keynote to its closing panel.

“From Cataloguer to Designer” was the title of the opening keynote, an excellent presentation by consultant Patrick Lambe of Straits Knowledge. He said that there are new opportunities for taxonomists, especially in the technology space, if they change their mindset and their role from that of cataloguers, who describe the world as it is, to that of designers, who plan things as they could be. New trends involving taxonomies that he described include search-based applications, autoclassification, and knowledge graphs (such as the automatically curated index card of key information on a topic, as appears in some Google search results).

As this was the 10th annual Taxonomy Boot Camp conference, the final session was “10 Years Back, 10 Years Forward,” a panel of consultants who had presented at the first Taxonomy Boot Camp conference in New York in 2004 (and at most of the conferences since), and who answered questions about how things have changed and offered comments on various predictions.

The spread of greater understanding of taxonomies was a common theme of that panel. Gary Carlson of the consultancy Factor noted that now taxonomy can be discussed with the executives, whereas in the past only some people in an organization would show an interest in taxonomies. This was echoed by Seth Earley of Earley & Associates, who observed that organizations are beginning to understand that a taxonomy is more than just terms but is also a process: “Organizations are starting to get it.” Tom Reamy of KAPS Group recalled that in his earlier projects he had to help his clients strategize more to figure out how a taxonomy can help, but now they already know about taxonomies and just want to do it. He also pointed out that the early adopters of taxonomies were the large science and financial enterprises, but now smaller companies are also implementing taxonomies.

Looking to the future, the panelists’ shared predictions included greater use of linked data, taxonomy visualization, and text analytics. Joseph Busch of Taxonomy Strategies commented on the “power of re-use,” so that we will spend less time doing taxonomies on standard things, such as geographic places, and “not re-invent universals.” With respect to taxonomy visualization, he observed that it “helps people think.” Regarding text analytics, Tom Reamy, the conference’s biggest champion of the technology, explained that it fills the gap between the taxonomy and what it should do.

Other sessions, such as the panel “The Curious Lives of Full-Time Taxonomists,” also addressed new themes in taxonomies. “Taxonomy is seeping into the culture, as part of the enterprise knowledge of the world,” explained Barbara McGlamery of Pearson. “People are asking for problem solving and not just a taxonomy, as they have more awareness of taxonomy,” observed Sarah Barrett of Factor.

New trends and technologies were discussed in individual presentations, too. Using the agile method for taxonomy development was described in two presentations: it was the main topic of “Using Agile to Build a Taxonomy/Ontology,” by Evelyn Kent of Smartlogic, and a feature in “Developing Use Cases Before Developing the Taxonomy,” presented by Vivian Bliss of Taxonomy Strategies. Greater sophistication in sentiment analysis, enabling it to leverage taxonomies, was a key point in Tom Reamy’s presentation “Taxonomy and Social Media: Social Taxonomies.” Technology was also at the forefront of sessions such as “Taxonomies in Search,” comprising four presentations, and “Automated Taxonomy Management,” comprising three presentations.

Finally, the growth in interest in taxonomies was reflected in the conference attendance (around 200). While the exact number of Taxonomy Boot Camp attendees cannot be determined, because some attendees have platinum passes allowing them to choose sessions from the co-located conferences (including KM World and Enterprise Search & Discovery), Tom Hogan, CEO of Information Today Inc., the conference owner, informed me that dedicated Taxonomy Boot Camp registrations had doubled since last year, and he commented on how it had grown from just a small add-on to KM World into a significant conference in its own right.

Tuesday, October 28, 2014

Taxonomy: A Profession, Not an Industry

I am looking forward to attending and presenting at my 8th Taxonomy Boot Camp conference next week. What makes this conference special is that it is very much both a professional and a commercial/industry conference, whereas most conferences tend to be one or the other. In other words, it is a commercial conference that serves a profession.

A professional conference is one that is usually organized by a nonprofit professional organization/society for its members for furthering the intellectual exchange in the field and otherwise serving the needs of its members. Professional conferences at which I have presented include those of SLA (Special Libraries Association) and the American Society for Indexing. A commercial conference, on the other hand, is one put on by a company (in publishing, advertising, research, consulting, or pure event management) to bring together clients and vendors in a specialty area and promote business for all. Commercial conferences at which I have presented, in addition to those associated with KM World, include the Gilbane conference, Henry Stewart DAM, and Text Analytics World. Professional conferences do have vendor exhibits, too, but more as an aside to help finance the conference, and these exhibits can be very small. Commercial conferences do, of course, have informative and educational sessions, but the conference is organized with the primary purpose of earning a profit from selling exhibit space and registrations.

Commercial conferences are often based on an industry, with industry loosely defined as companies that sell related products or services for a defined market and thus potentially could be exhibitors. This “industry” could be as specialized as knowledge management, content management, or digital asset management. Taxonomy, however, is not an industry.

Taxonomy is a profession and is also an information management tool/technique. Sometimes an industry and a profession are almost the same, such as in medicine and law. Closer to the world of taxonomies are the industry/professions of software development, consulting, and librarianship. Taxonomy work comes closest to the latter, and many taxonomists were originally trained as librarians. So, if libraries are both an industry and a profession, then some might make the assumption that taxonomy is also both an industry and a profession.

To determine whether there is an industry associated with a profession, one can look at the trade show/exhibit vendors at a conference or look for advertisement-supported trade journals. Taxonomy Boot Camp has a mini exhibit of usually half a dozen sponsors, in contrast with the co-located KM World conference showcase of over 30 sponsors. Indeed, commercial software vendors of pure taxonomy management tools (not a feature of a larger solution) can be counted on one hand. Taxonomy-related services, namely those of consultants, are also a significant business, but this cannot be considered its own “industry.” That is because any consulting firm (larger than a sole proprietorship) that consults on taxonomy also consults in other, related areas, such as knowledge management, data management, user experience design, content integration, etc. As for trade journals, there are none dedicated to taxonomy, simply because there are not enough companies that would advertise in this niche space. Libraries, on the other hand, do have lots of vendors, which exhibit at conferences, and there are also library trade journals.

Taxonomists work in all industries. I have worked in full-time permanent positions as a taxonomist in industries including publishing, software, consulting, and renewable energy, and have provided taxonomy consulting services to many more industries: financial services, retail, hospitality, biomedical research, manufacturing, and education. Despite the various industries of my employment, I have always selected the broad “Information services” or “Information technology and services” as my “industry” in my LinkedIn profile. For this reason, trying to analyze the industries listed in taxonomists’ LinkedIn profiles might not be accurate or useful, due to the preference for those two industry designations. Nevertheless, I have found taxonomists on LinkedIn using the following additional industries:

Libraries
Internet
Publishing
Research
Online Media
Higher Education
Computer Software
Public Relations & Communications
Marketing & Advertising
Management Consulting
Entertainment
Pharmaceuticals
Hospital & Health Care
Oil & energy

Indeed, the Taxonomy Boot Camp conference has attendees from all of these varied industries, but all with a shared professional interest in taxonomies. That’s what makes this conference feel more like a professional conference. But unlike a professional conference (such as SLA, which is for librarians, and I am not a librarian, so I always feel like an outsider there), you don’t have to be a member of an organization or a professional taxonomist, just interested in taxonomies as a tool/technique. As such, the conference is highly educational and informative, yet welcoming and open to all.

Tuesday, September 23, 2014

One or More Taxonomies


In the various definitions of taxonomy, one aspect of the definition that is often missing is what constitutes a single taxonomy (or thesaurus) versus multiple related taxonomies (or thesauri). If you hire a taxonomy consultant, they won’t tell you because they will defer to their client’s terminology. If you are designing a taxonomy/taxonomies for your own organization, however, this is often an issue of concern.

Hierarchies and other relationships

In simple hierarchical taxonomies, a single hierarchy could be a single taxonomy. Not all terms on the same subject, however, may fit neatly in one hierarchy while complying with ANSI/NISO hierarchical relationship guidelines. So, more often than not, a hierarchical taxonomy may have multiple top terms. For example, a taxonomy on health care might have top terms for hierarchies on conditions and diseases, diagnostic procedures, treatments, and medical equipment and supplies. If for some reason you needed a single hierarchy, then you would bend the hierarchical-relationship rules to make such top terms narrower terms of the term that is the name of the taxonomy. Thus, whether there is one top term or multiple top terms, it is still considered one taxonomy.
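To illustrate that several top terms can still belong to one taxonomy, this small sketch assumes a hypothetical dictionary that maps each term to its broader terms (BT) and renders every hierarchy as an indented outline. A top term is simply a term with no broader term. The top terms come from the health care example above; the narrower terms Diabetes and Physical therapy are made up for illustration.

```python
from collections import defaultdict


def outline_hierarchies(broader: dict[str, list[str]]) -> list[str]:
    """Render every hierarchy in one taxonomy as indented outline lines.

    `broader` maps each term to its broader terms (BT); a term with an
    empty list is a top term. Assumes the hierarchies contain no cycles.
    """
    # Invert BT links into NT lists so we can walk top-down.
    narrower = defaultdict(list)
    for term, bts in broader.items():
        for bt in bts:
            narrower[bt].append(term)

    lines: list[str] = []

    def walk(term: str, depth: int) -> None:
        lines.append("  " * depth + term)
        for nt in sorted(narrower[term]):
            walk(nt, depth + 1)

    # Each top term starts its own hierarchy within the same taxonomy.
    for top in sorted(t for t, bts in broader.items() if not bts):
        walk(top, 0)
    return lines


if __name__ == "__main__":
    health_care = {
        "Conditions and diseases": [],
        "Diagnostic procedures": [],
        "Treatments": [],
        "Medical equipment and supplies": [],
        "Diabetes": ["Conditions and diseases"],
        "Physical therapy": ["Treatments"],
    }
    print("\n".join(outline_hierarchies(health_care)))
```

A term with two broader terms (a polyhierarchy) would simply appear under both parents in the outline.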
Facets are a special case. Each facet consists of its own hierarchy of terms, or may even have multiple top-term hierarchies of similar-type terms on the same subject, and there are no relationships between terms in different facets. So, you might consider each facet to be a taxonomy. However, the facets are intended to be used only in combination, not in isolation. In fact, we often speak of a “faceted taxonomy,” implying a single taxonomy comprised of multiple facets. So, a single facet is not a taxonomy.

A more thesaurus-like structure may have fewer large hierarchies and more smaller hierarchies with more numerous top terms, but it will also have associative relationships that link terms across hierarchies. So, a possible definition of a taxonomy or thesaurus is a set of terms in which every term has at least some kind of relationship to at least one other term in the same set. However, you could end up with just a couple of terms related to each other but not related/linked to any other terms in the taxonomy. So, additional criteria are needed to define a single taxonomy so as to include such terms.
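The weakness of the "every term is related to at least one other" definition can be shown by treating the terms as a graph and grouping them into clusters connected by any chain of relationships. In the hypothetical health care data below, every term has at least one relationship, yet Telehealth and Remote monitoring form an island unconnected to the rest. This is a sketch for illustration only; the relationship triples and function name are assumptions, not a feature of any particular tool.

```python
from collections import defaultdict, deque


def term_clusters(relationships: list[tuple[str, str, str]]) -> list[set[str]]:
    """Group terms into clusters linked by any chain of relationships.

    Each relationship is a (term, code, other_term) triple; the code
    (BT, NT, RT, ...) is ignored because any link counts as a connection.
    """
    graph = defaultdict(set)
    for a, _code, b in relationships:
        graph[a].add(b)
        graph[b].add(a)

    seen: set[str] = set()
    clusters: list[set[str]] = []
    for start in sorted(graph):
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        cluster: set[str] = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            term = queue.popleft()
            cluster.add(term)
            for nxt in graph[term]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        clusters.append(cluster)
    return clusters


if __name__ == "__main__":
    rels = [
        ("Treatments", "NT", "Physical therapy"),
        ("Physical therapy", "RT", "Rehabilitation"),
        ("Conditions and diseases", "NT", "Diabetes"),
        ("Diabetes", "RT", "Insulin therapy"),
        ("Insulin therapy", "BT", "Treatments"),
        ("Telehealth", "RT", "Remote monitoring"),
    ]
    for cluster in term_clusters(rels):
        print(sorted(cluster))
```

Whether such an island belongs in the taxonomy is exactly the judgment call the additional criteria (subject scope, policy, purpose) are meant to settle.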

Thus, at a minimum, a taxonomy comprises one or more hierarchies, but what about at a maximum? The question came up in my online course, in an assignment to create polyhierarchies, in which I suggest that the broader terms are from different hierarchies. A student asked: “Are the different hierarchies supposed to be within the same Taxonomy, or merely two different hierarchies from two different Taxonomies?” Generally, standard hierarchical and associative relationships do not transcend multiple taxonomies. An exception would be instance-type hierarchical relationships between topics in a taxonomy and named entities (proper nouns) maintained in a separate controlled vocabulary. Other types of relationships may link terms across multiple taxonomies, but they would likely be special-purpose relationships, such as equivalency mappings or translations.

Subject scope and purpose

In addition to considering the relationships between terms, another determining factor of what constitutes a single taxonomy is the subject area scope. One taxonomy is for one subject area, although that subject area could be very broad, especially if the taxonomy’s purpose is to support indexing of the topics in a daily national newspaper. More often, a taxonomy is more limited in scope, such as just technology topics or health topics.

Related to subject scope is how the taxonomy will be used in both indexing/tagging and retrieval. Generally, a single taxonomy is utilized in a single indexing/tagging method and with its own indexing policy. Policy, comprising both editorial style for terms and indexing rules, is often a defining factor for a single taxonomy. Different taxonomies will have different policies. For the end-user, a retrieval function is served by a single taxonomy, such as supporting a search function or providing a set of browse categories. If you want to enable multiple unrelated methods of retrieval (such as type-ahead for the search box, dynamic filtering facets, and a navigational browse), then you will need to create separate taxonomies for each. At a former employer I built taxonomies for SharePoint, and it turned out that I had to build three completely separate taxonomies: (1) the consistently labeled hierarchy of libraries and folders, (2) terms and their variants to support search with a third-party auto-classification tool, and (3) controlled vocabularies of terms for consistent tagging and metadata management of uploaded documents.

There is also the question of whether the content to be accessed by the taxonomy is together in one set or separated out for different purposes or different audiences. A taxonomy should be designed to suit its own content. This was an issue in a project I am currently working on. There are two distinct sets of content available on a web site. The content sets have many similarities, so they could be browsed via the same hierarchical taxonomy, but they are for potentially different audiences. If the content sets were to remain separate, we would have created two separate taxonomies, each customized to best suit its own set of content. But the site owners decided that the two sets of content would be presented together, “blended,” to cross-sell content, in addition to standing on their own elsewhere on the site. Thus, a single taxonomy was the chosen option. The use of two content categories for terms within the taxonomy will enable the additional, separate content set option.

Conclusions


In sum, a single taxonomy:

  • Has standard relationships (BT/NT, RT, USE/UF) confined within it. Cross-taxonomy links, if any, are of non-standard types.
  • Has a defined, restricted subject scope.
  • Has its own indexing/tagging policy.
  • Could function in isolation, unlike a single facet (although may be supplemented by other controlled vocabularies/metadata).
  • Has its own implementation, function, and purpose (although taxonomies can be reused and repurposed).
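The first criterion in the list above is the easiest to check mechanically. Here is a minimal sketch, assuming a hypothetical mapping from each term to the taxonomy it belongs to and a list of (term, code, other_term) links. It flags any standard relationship (BT, NT, RT, USE, UF) that crosses from one taxonomy to another, while letting special-purpose link types through. The code "exactMatch" is a made-up label loosely modeled on SKOS mapping properties such as skos:exactMatch, and the sketch does not model the instance-type exception for named entities described earlier.

```python
# Standard thesaurus relationship types that should stay within one taxonomy.
STANDARD = {"BT", "NT", "RT", "USE", "UF"}


def cross_taxonomy_violations(
    home: dict[str, str], links: list[tuple[str, str, str]]
) -> list[tuple[str, str, str]]:
    """Return standard-type links whose two terms belong to different taxonomies.

    `home` maps each term to the name of its taxonomy (hypothetical layout).
    Non-standard link types, such as mappings or translations, are allowed
    to cross taxonomies and are not reported.
    """
    return [
        (a, code, b)
        for a, code, b in links
        if code in STANDARD and home[a] != home[b]
    ]


if __name__ == "__main__":
    home = {
        "Treatments": "Health",
        "Physical therapy": "Health",
        "Sports": "Recreation",
        "Fisioterapia": "Health (Spanish)",
    }
    links = [
        ("Treatments", "NT", "Physical therapy"),         # within one taxonomy
        ("Physical therapy", "RT", "Sports"),             # standard type, crosses taxonomies
        ("Physical therapy", "exactMatch", "Fisioterapia"),  # special-purpose mapping
    ]
    print(cross_taxonomy_violations(home, links))
```

A flagged link suggests either that the two taxonomies are really one, or that the link should be recast as a special-purpose relationship.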

It’s important for a taxonomist to determine what constitutes a single taxonomy versus multiple taxonomies, not so much for communicating with stakeholders, but rather for planning the initial design of the taxonomy within a taxonomy management tool. Taxonomy/thesaurus software allows for the designation of one or more taxonomies/thesauri that may or may not be linked to each other. Multiple so-called files, thesauri, vocabularies, objects, classes, categories, etc. are the different ways in which the various software tools allow the taxonomist to control the divisions between and within taxonomies.