Tuesday, September 23, 2014

One or More Taxonomies


In the various definitions of taxonomy, one aspect of the definition that is often missing is what constitutes a single taxonomy (or thesaurus) versus multiple related taxonomies (or thesauri). If you hire a taxonomy consultant, they won’t tell you because they will defer to their client’s terminology. If you are designing a taxonomy/taxonomies for your own organization, however, this is often an issue of concern.

Hierarchies and other relationships

In simple hierarchical taxonomies, a single hierarchy could be a single taxonomy. Not all terms on the same subject, however, fit neatly into one hierarchy while complying with ANSI/NISO hierarchical relationship guidelines. So, more often than not, a hierarchical taxonomy has multiple top terms. For example, a taxonomy on health care might have top terms for hierarchies of conditions and diseases, diagnostic procedures, treatments, and medical equipment and supplies. If for some reason you needed a single hierarchy, you would bend the hierarchical-relationship rules to make such top terms narrower terms of a term that is the name of the taxonomy. Thus, whether there is one top term or multiple top terms, it is still considered one taxonomy.

Facets are a special case. Each facet consists of its own hierarchy of terms, or may even have multiple top-term hierarchies of similar-type terms on the same subject, and there are no relationships between terms in different facets. So, you might consider each facet to be a taxonomy. However, facets are intended to be used only in combination, not in isolation. In fact, we often speak of a “faceted taxonomy,” implying a single taxonomy composed of multiple facets. So, a single facet is not a taxonomy.

A more thesaurus-like structure may have fewer large hierarchies and more small hierarchies with more numerous top terms, but it will also have associative relationships that link terms across hierarchies. So, a possible definition of a taxonomy or thesaurus is a set of terms in which every term has some kind of relationship to at least one other term in the same set. However, you could end up with just a couple of terms related to each other but none of them related or linked to any other terms in the taxonomy. So, additional criteria are needed to define a single taxonomy so as to include such terms.
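
The relationship-based definition above, and its weakness, can be illustrated with a short Python sketch. It is a hypothetical example, not a feature of any particular tool: it treats terms as nodes and BT/NT/RT relationships as undirected links, then groups the terms into connected clusters. Every term in the made-up sample has at least one relationship, yet the terms still fall into two unconnected clusters, which is why relationships alone cannot define a single taxonomy.

```python
from collections import defaultdict

# Hypothetical sample: (term, related term, relationship type).
# ("Diseases", "Diabetes", "NT") means Diabetes is a narrower term of Diseases.
RELATIONSHIPS = [
    ("Diseases", "Diabetes", "NT"),
    ("Diabetes", "Insulin therapy", "RT"),
    ("Treatments", "Insulin therapy", "NT"),
    ("Treatments", "Surgery", "NT"),
    # Two terms related only to each other and to nothing else:
    ("Wheelchairs", "Crutches", "RT"),
]


def term_clusters(relationships):
    """Group terms into clusters connected by any relationship, ignoring direction."""
    graph = defaultdict(set)
    for a, b, _kind in relationships:
        graph[a].add(b)
        graph[b].add(a)

    seen, clusters = set(), []
    for start in graph:
        if start in seen:
            continue
        # Depth-first traversal collects every term reachable from `start`.
        stack, cluster = [start], set()
        while stack:
            term = stack.pop()
            if term in seen:
                continue
            seen.add(term)
            cluster.add(term)
            stack.extend(graph[term] - seen)
        clusters.append(cluster)
    return clusters


clusters = term_clusters(RELATIONSHIPS)
for cluster in sorted(clusters, key=len, reverse=True):
    print(sorted(cluster))
```

A single cluster would satisfy a purely relationship-based definition; the second, small cluster is exactly the case where subject scope, policy, and purpose have to decide whether those terms belong to the taxonomy.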

Thus, at a minimum, a taxonomy comprises one or more hierarchies, but what about at a maximum? The question came up in my online course, in an assignment to create polyhierarchies, in which I suggest that the broader terms come from different hierarchies. A student asked: “Are the different hierarchies supposed to be within the same Taxonomy, or merely two different hierarchies from two different Taxonomies?” Generally, standard hierarchical and associative relationships do not cross from one taxonomy to another. An exception would be instance-type hierarchical relationships between topics in a taxonomy and named entities (proper nouns) maintained in a separate controlled vocabulary. Other types of relationships may link terms across multiple taxonomies, but they would likely be special-purpose relationships, such as equivalency mappings or translations.

Subject scope and purpose

In addition to considering the relationships between terms, another determining factor of what constitutes a single taxonomy is the subject area scope. One taxonomy is for one subject area, although that subject area could be very broad, especially if the taxonomy’s purpose is to support indexing of the topics in a daily national newspaper. More often, a taxonomy is more limited in scope, such as just technology topics or health topics.

Related to subject scope is how the taxonomy will be used in both indexing/tagging and retrieval. Generally, a single taxonomy is utilized in a single indexing/tagging method and with its own indexing policy. Policy, comprising both editorial style for terms and indexing rules, is often a defining factor for a single taxonomy. Different taxonomies will have different policies. For the end-user, a retrieval function is served by a single taxonomy, such as supporting a search function or providing a set of browse categories. If you want to enable multiple unrelated methods of retrieval (such as type-ahead for the search box, dynamic filtering facets, and a navigational browse), then you will need to create separate taxonomies for each. At a former employer I built taxonomies for SharePoint, and it turned out that I had to build three completely separate taxonomies: (1) the consistently labeled hierarchy of libraries and folders, (2) terms and their variants to support search with a third-party auto-classification tool, and (3) controlled vocabularies of terms for consistent tagging and metadata management of uploaded documents.

There is also the question of whether the content to be accessed through the taxonomy is together in one set or separated out for different purposes or different audiences. A taxonomy should be designed to suit its own content. This question came up in a project I am currently working on. There are two distinct sets of content available on a web site. The content sets have many similarities, so they could be browsed via the same hierarchical taxonomy, but they are for potentially different audiences. If the content sets were to remain separate, we would have created two separate taxonomies, each customized to best suit its own set of content. But the site owners decided that the two sets of content would be presented together, “blended,” to cross-sell content, in addition to standing on their own elsewhere on the site. Thus, a single taxonomy was the chosen option. Assigning terms to two content categories within the taxonomy will still support the additional option of presenting each content set separately.

Conclusions


In sum, a single taxonomy:

  • Has standard relationships (BT/NT, RT, USE/UF) confined within it. Cross-taxonomy links, if any, are of non-standard types.
  • Has a defined, restricted subject scope.
  • Has its own indexing/tagging policy.
  • Could function in isolation, unlike a single facet (although it may be supplemented by other controlled vocabularies/metadata).
  • Has its own implementation, function, and purpose (although taxonomies can be reused and repurposed).
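
The first criterion in the list above lends itself to a mechanical check. The following Python sketch is a hypothetical illustration, not a feature of any particular taxonomy management tool: it assumes each term records the name of the taxonomy that owns it, and it flags any standard relationship (BT, NT, RT, USE, UF) whose two terms belong to different taxonomies. Non-standard link types pass; the `exactMatch` label for a cross-taxonomy mapping is borrowed from the SKOS mapping properties purely for illustration.

```python
from dataclasses import dataclass

# Standard thesaurus relationship types, as listed above.
STANDARD_TYPES = {"BT", "NT", "RT", "USE", "UF"}


@dataclass(frozen=True)
class Term:
    label: str
    taxonomy: str  # hypothetical: name of the taxonomy that owns this term


@dataclass(frozen=True)
class Link:
    source: Term
    target: Term
    kind: str  # "BT", "RT", ... or a non-standard type such as "exactMatch"


def cross_taxonomy_violations(links):
    """Return standard-relationship links whose two terms belong to different taxonomies."""
    return [
        link
        for link in links
        if link.kind in STANDARD_TYPES
        and link.source.taxonomy != link.target.taxonomy
    ]


# Hypothetical sample data
diabetes = Term("Diabetes", "Health")
diseases = Term("Diseases", "Health")
partner_term = Term("Diabetes mellitus", "Partner vocabulary")
pharmacies = Term("Pharmacies", "Retail")

links = [
    Link(diabetes, diseases, "BT"),              # within one taxonomy: allowed
    Link(diabetes, partner_term, "exactMatch"),  # cross-taxonomy mapping: allowed
    Link(diabetes, pharmacies, "RT"),            # standard RT across taxonomies: flagged
]

for link in cross_taxonomy_violations(links):
    print(f"{link.source.label} --{link.kind}--> {link.target.label}")
```

The instance-type exception described earlier (topics linked to named entities in a separate vocabulary) would need to be permitted explicitly, for example by giving it its own relationship type outside `STANDARD_TYPES`.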

It’s important for a taxonomist to determine what constitutes a single taxonomy versus multiple taxonomies, not so much for communicating with stakeholders, but rather for planning the initial design of the taxonomy within a taxonomy management tool. Taxonomy/thesaurus software allows for the designation of one or more taxonomies/thesauri that may or may not be linked to each other. Multiple so-called files, thesauri, vocabularies, objects, classes, categories, etc. are different ways that the various software tools allow the taxonomist to control the divisions between and within taxonomies.

Monday, August 25, 2014

Independent Taxonomy Work


Are you an aspiring taxonomist looking for work? Because taxonomies tend to be project-based tasks, a lot of taxonomy work is freelance, contract, or consulting. I have written on this topic in my book, but that was over four years ago, and I have seen or experienced many taxonomy jobs since, so it’s time for an update.

Freelance taxonomy work

The freelance taxonomist works on portions of a taxonomy project but does not do everything a taxonomy project requires. I now see that the greatest opportunity for freelance taxonomy work is to freelance/subcontract to independent taxonomy consultants or small taxonomy consultancies. These consultants take on projects too big for one person and need to subcontract parts of them. Freelancing directly to an end-client, meanwhile, has become increasingly rare.

The best way, and really the only way, to find this kind of work is through serious networking with taxonomy consultants. Make sure, however, that any freelance contract does not preclude you from serving competing consultants.

The freelance taxonomist does most of the work remotely from home, but could be called on to visit a client site, depending on the nature of the project. Being open to travel and having client relationship skills thus helps, but is not always a requirement. Work could be researching and creating a new taxonomy from scratch, editing an existing taxonomy, mapping two different taxonomies, or developing auto-categorization rules for a taxonomy. In any case, the freelancer is not the sole person responsible for the taxonomy.

People suited for this kind of taxonomy work tend to be those with at least one previous job in taxonomy, metadata, or classification work who are already comfortably set up as freelancers, such as through editorial work, indexing, or consulting in the information field. Basic office software is usually all that is needed, but prior experience with a taxonomy management tool is helpful, and, if needed, remote access to the system can be arranged.

Contract taxonomy work

The contract taxonomist may do the same kind of work as the freelance taxonomist or may take on more responsibility in the taxonomy design strategy. The contractual relationship may be different, though, and instead of being treated as a freelance vendor the contract taxonomist may be treated as a temporary employee on a W-2 tax status.

When a company needs a taxonomist, it is often for a temporary assignment, so instead of posting a full-time position on the careers portion of its website, it turns to a staffing/recruiting firm for help. While it could be a general staffing firm used for other assignments, experience has shown that finding taxonomists is difficult, so companies turn to specialized staffing firms in the areas of information technology/science. Sometimes a company has a large information technology project for which taxonomy development is only a piece, and it contracts the entire project out to a large IT consulting firm. The IT consulting firm then seeks to fill the taxonomist slot by turning to a third-party recruiter. Recruiters from the staffing firms then search resumes on LinkedIn, Monster.com, or other sites. So, if you are looking for taxonomy work, make sure you have a strong LinkedIn profile and a resume on Monster.com open for all to see, with “taxonomist” prominently in the title. It’s also important to get on the list of staffing firms/recruiters specialized in library science and information technology.

The contract taxonomist is generally expected to be on the client site more than the freelance taxonomist, but with some negotiating, part of the work could probably be done from home. If the assignment is relatively short and the location does not have any qualified taxonomists, the client will pay for travel and lodging, sometimes for several weeks or even a couple of months. So, being open to short-term (1-3 month) relocation can be an advantage.

The nature of the work can be the same as for a freelance taxonomist, or it could involve more taxonomy design, planning, and strategy, similar to that of a consultant. The rate is comparable to freelancing, as there is an intermediate party in both cases, and rates vary based on one’s experience and level of responsibility. A third level of intermediary could result in a lower rate. On the other hand, the difficulty of finding a taxonomist for a specific project in a specific location gives the contract taxonomist room to negotiate.

People suited for this kind of taxonomy work need to have prior taxonomy experience, but often experience with a specific software tool is also desired, whether a taxonomy management system, auto-classification system, content management system, or digital asset management system. Location in a major metropolitan area or willingness to travel is also important.

Independent consulting work

Being an independent taxonomy consultant means not only do you need to know how to conduct every step of taxonomy development yourself (research and requirements gathering, design, developing, testing, governance planning, etc.), but you also need to keep track of deadlines, deliverables, meeting schedules, and other basic project management tasks. There is no need for major project management skills, as long as you are not subcontracting to others. The client may already have a project manager on staff if taxonomy development is part of a larger project.

The other major task in consulting is creating a proposal, involving estimating the costs and time requirements, and then meeting those expectations. The proposal-writing task is often an obstacle to new aspiring consultants, and the best preparation is to either work in a consultancy or subcontract extensively to other consultants first to get exposed to the proposal requirements. Fees are typically charged per project and not per hour, so this can be tricky.

Obtaining independent consulting work involves a lot of self-marketing: a web site, a blog, LinkedIn and other social media, speaking at industry and professional association conferences, publishing articles, and general networking. Even networking with competing consultants is good, because sometimes they hear of projects they do not want and will refer the work. It’s also good for you to refer work to other consultants when it comes at the wrong time or is in the wrong location, so they might return the favor.

Saturday, July 12, 2014

A Professional Association for Taxonomists



I recently attended the SLA annual conference, which this year was in Vancouver, BC, June 8 – 11. This year marked the 5th anniversary of the professional association’s Taxonomy Division, its newest and fastest growing special interest group. The Taxonomy Division plans the programming of all taxonomy-related sessions for the conference, enough sessions so that attendees interested in only taxonomies can find a session of interest for most of the programmed time slots.

The Taxonomy Division comes closest to a professional organization for taxonomists and provides a good networking opportunity. The founding of this Taxonomy Division five years ago was the reason that I joined SLA, since I am not a librarian. (I was an accidental taxonomist after all.) SLA stands for “Special Libraries Association,” but the organization now favors the acronym over the name that it once stood for, and members are increasingly referred to as “information professionals” or “info-pros” instead of librarians. This label better fits taxonomists. In addition to the annual conference programs, the Taxonomy Division also has bi-monthly webinars, a mentoring program, and other resources for its members.

Half-day pre-conference workshops, called “continuing education” sessions, are an important part of the SLA annual conference, and this year two of the five such workshops were on taxonomy topics (“Introduction to Taxonomies” and “Taxonomy Integration: Content Management, Navigation and Search”) and were organized by the Taxonomy Division, even though SLA has 25 Divisions. Regular session topics included taxonomies and metadata, eDiscovery, semantics, SharePoint, and from-scratch taxonomy creation (my presentation).

Not only does the Taxonomy Division organize taxonomy-related conference sessions, but it also organizes networking events at the annual conference, including an informal no-host dinner and a more formal networking event that is part of the conference program. Both division members and anyone else interested in taxonomies are welcome to attend these events. There is typically a mixture of experienced taxonomists, who likely already know each other from previous conferences, and people who are new to taxonomies and would like to learn more.

The SLA conference is a great place for taxonomists to network and learn from each other, but it is not necessarily the place to hear the latest trends in taxonomies. “Current Topics in Taxonomies” was the title of an informal roundtable session, but its discussions were more about sharing experiences. At the four roundtables, with an average of seven people per table, some of the discussions involved experienced taxonomists giving advice to the less experienced on specific taxonomy implementation issues. The latest topics or trends are not necessarily the subject of regular sessions either, since the program is planned close to a year in advance. On the other hand, the field of taxonomies does not change that much from year to year. It is rather business and technology trends that change.

If you are new to taxonomies, then the SLA conference is a great place to learn a lot, through both the various sessions and the pre-conference continuing education workshops. If you are an experienced taxonomist, then SLA is a great way to network with other taxonomists and get inspired to speak at future conferences. I am looking forward to speaking at SLA in Boston, June 13-16, 2015. See you there, in my home city!

Tuesday, May 20, 2014

Creating Taxonomies from Scratch

When I first got into taxonomy work, my impression was that the trend was increasingly toward revising, redesigning, merging, and updating existing taxonomies and away from creating new ones. As taxonomies became more common in large organizations, it seemed obvious that there would be fewer needs for original taxonomy creation and more needs for taxonomy improvement. Taxonomies need to be updated when content changes, terminology changes, users change, indexing methods change, content/document management systems change, etc. Older taxonomies may also need to be repurposed, merged, or mapped. While there is no shortage of work on existing taxonomies, to my pleasant surprise I have found recently that there are many projects for new taxonomies as well.

Who needs taxonomies from scratch

In the field of taxonomy consulting, different taxonomy projects go to different consultancies. Large organizations with large taxonomy projects tend to hire taxonomy consultancies with multiple consultants to handle their projects, and it is the large organizations that by now tend to already have some taxonomies, even if they need a lot of work. Smaller organizations tend to hire independent consultant-contractors, and smaller organizations are more likely to be new to taxonomies and need to have one built from scratch. When I started out consulting, I was employed by or subcontracted to consultancies that served larger clients and worked more on taxonomy redesign projects, but when I became an independent consultant I was contacted by, and often served, smaller clients, including startups, and thus became involved in more projects to build original taxonomies.

The types of projects that start-ups have for taxonomies are really quite interesting and they reflect a trend in innovative content-based products and services. In the past couple of years I have been contacted about creating taxonomies (some of which I did) for the following:
  • A subscription-based web software application with a taxonomy for photographers to tag and classify their own images
  • A web-based marketplace for craftspeople and customers to meet to buy/sell customized objects
  • A website of quotes by famous and not-so-famous women with related content
  • A web database of yoga poses associated with a yoga studio
  • A web service of sites for artists to promote themselves
  • A loyalty marketing and data software platform for retailers
  • A mobile app that pulls content from LinkedIn to help professionals and job seekers make connections and obtain career advice
Yet it may not even be the size of the organization seeking taxonomies that has an impact on the demand for new taxonomies from scratch. It could also be that taxonomies are becoming better known across all industries, not just the fields of publishing, information services, and ecommerce. There is also no doubt that the growing amount of content in all areas necessitates better methods of organization and retrieval.

How taxonomies are built from scratch

Even taxonomists with considerable experience in editing taxonomies might not know where to begin if they were to create a taxonomy from scratch. There is some uncertainty over whether to take a predominantly top-down or bottom-up approach. I recommend a hybrid approach, with some initial top-level development, but with most of the specific taxonomy terms developed from the bottom up. If a navigational tree hierarchy is to be displayed to users, then at least some initial top-down development is needed.

Developing the top terms (or facets, as the case may be) is based on best practices, understanding the users, adapting to any user interface constraints, and general experience as a taxonomist. Developing all the detailed terms within the taxonomy from below, however, is quite a different task that requires different taxonomist skills. Despite the fact that a spreadsheet, such as Excel, is inappropriate for managing taxonomies, I have found that, even with taxonomy management software available, Excel is the most usable tool in the initial stage of gathering candidate terms along with information about their sources, and of comparing terms side by side from multiple sources while at the same time putting them into a hierarchy. Finally, if a taxonomy is somewhat specialized and technical in nature and to be used by subject matter experts, it’s also possible to let the subject matter experts propose their own taxonomy and then review it with them and heavily revise it to bring it up to standards.
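
Where a spreadsheet is used for gathering candidate terms, a small script can help prepare the side-by-side comparison. The following Python sketch is hypothetical (the terms and source names are made up): it merges candidate terms that differ only in capitalization or spacing, records which sources each term came from, and writes a CSV that can be opened in Excel, with the terms found in the most sources listed first.

```python
import csv
import io
from collections import defaultdict

# Hypothetical candidate terms, each paired with the source it was gathered from.
CANDIDATES = [
    ("Yoga poses", "site search logs"),
    ("yoga  poses", "content audit"),
    ("Asanas", "SME interview"),
    ("Breathing exercises", "content audit"),
    ("Breathing Exercises", "competitor site"),
]


def normalize(label):
    """Lowercase and collapse whitespace so trivial variants are merged."""
    return " ".join(label.lower().split())


def compare_sources(candidates):
    """Map each normalized candidate term to the set of sources it came from."""
    by_term = defaultdict(set)
    for label, source in candidates:
        by_term[normalize(label)].add(source)
    return by_term


def to_csv(by_term):
    """Build a term-by-source grid: one row per term, one column per source."""
    sources = sorted({s for srcs in by_term.values() for s in srcs})
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["candidate term", *sources, "source count"])
    # Terms found in the most sources first, then alphabetically.
    for term in sorted(by_term, key=lambda t: (-len(by_term[t]), t)):
        marks = ["x" if s in by_term[term] else "" for s in sources]
        writer.writerow([term, *marks, len(by_term[term])])
    return out.getvalue()


print(to_csv(compare_sources(CANDIDATES)))
```

The script does not detect true synonyms such as “Asanas” and “Yoga poses”; deciding those equivalences, and placing the terms into a hierarchy, remains the taxonomist's job.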

I will discuss this in more detail in my presentation, “Taxonomies: Everything you Need to Know to Start a Taxonomy from Scratch,” at the SLA conference in Vancouver, BC, on June 8.

Friday, April 11, 2014

Taxonomy Software Directories

It's difficult to find a list of taxonomy management software that is both comprehensive and up to date, yet not overwhelmed with related products and services. I define taxonomy management software as a tool to manually build and edit taxonomies, controlled vocabularies, and thesauri in accordance with industry standards. It should be the primary tool used by those who work as taxonomists. Lists of “taxonomy software,” however, may include more than just tools for taxonomy management, such as auto-classification/auto-categorization/auto-indexing software, search software that utilizes taxonomies, or mind-mapping and other graphical categorization tools.

Taxonomy maintenance, unfortunately, is just too small a niche area for the major evaluators of software, whether consultancies, industry research firms, or trade publications, to find it worth their time to study. Companies that research the information technology market, such as Forrester Research, Gartner, International Data Corporation (IDC), and Real Story Group, won't get the commercial payoff from preparing studies of the taxonomy management software industry and products.

At the time I wrote my book, the most comprehensive directory of taxonomy software that I found, and to which I referred my readers, was that of the British consultant Leonard Will, on the website of his consulting business Willpower Information, which lists 38 software packages, both commercial and freeware. Leonard Will had contacted each vendor and thus provided descriptive and contact information for each tool. The fact that this was a directory of "thesaurus" software and not “taxonomy” software is not an issue, and it was probably a good thing to include only software that meets thesaurus expectations. This directory was very comprehensive, including lesser-known free and open source software, which over time tended to become unsupported or even unavailable. With an interest in posterity, Leonard Will kept the unavailable software listed in his directory, merely with a note to that effect. This may have been interesting for anyone thinking of developing their own thesaurus software, as they may be able to track down these other developers. For someone looking for a good commercial solution, however, there are far too many outdated products to weed through.

After Leonard Will retired, he decided he did not want to spend the time maintaining his directory, which he last updated in 2007, and in 2011 he offered the content of his directory to someone else, specifically contacting both Margie Hlava of Access Innovations and myself. Then Margie and I had to figure out which one of us would take it, fully aware that the rich content would help our own respective business websites, yet it would also take quite a bit of time and effort to set up and maintain. After a year of hoping to find time, I finally conceded that I would not and told Margie she could take it. The successor to the Willpower Thesaurus software directory, maintained by Margie’s employee Eric Ziecker, now resides at http://www.taxobank.org/content/thesauri-and-vocabulary-control-thesaurus-software

The core of TaxoBank's directory “Software for building and editing thesauri” at present is still essentially the same as the Willpower site, maintaining the original tabular content, style, colors, etc. of that site, so visitors to the TaxoBank site may recognize it from Willpower. Posterity still seems to be valued, as all but one of the same 38 software packages are still there, although in two cases there is a note saying “The particular software referenced above is no longer available.” The notes section for many packages has been updated with additional content extracted from the vendor websites. More updating is still pending, though, as the operating systems listed are dated, such as “Windows 95/98/NT/2000/XP.”

The main difference from the original Willpower site is the addition of 63 other products in a new section, separated by the note “Additional indexing, taxonomy, controlled vocabulary, thesaurus, classification, mapping and ontology software and services not referenced in Leonard Will's original listing follows below.” These additional products include many products not specific to “building and editing thesauri,” such as Apache Lucene, EMC Documentum, Oracle Endeca, Google site search, HP Autonomy, IBM Infosphere, and Microsoft SharePoint, along with one taxonomy consulting service. In my opinion, it might be better to have the related products and services on a separate web page to avoid possible confusion and to keep the list to a manageable length, as the total web page is currently 145 printed pages long. Despite these issues, I praise Margie and Eric for taking efforts to maintain this valuable resource.

As for a shorter list focused on current commercial software dedicated to supporting the manual creation and editing of thesauri and taxonomies, that may have to wait until the next edition (not yet started) of my book. For now, there are the products, as of early 2010, listed under Chapter 5 on the links page of The Accidental Taxonomist book website. To this list, I would now add at least PoolParty and TopBraid Enterprise Vocabulary Net, both introduced since the book went to press. Meanwhile, taxonomy consultants still remain a valuable source of advice on taxonomy/thesaurus management software.