Monday, August 25, 2014

Independent Taxonomy Work


Are you an aspiring taxonomist looking for work? Because taxonomy development tends to be project-based, a lot of taxonomy work is freelance, contract, or consulting. I have written on this topic in my book, but that was over four years ago, and I have seen or experienced many taxonomy jobs since, so it’s time for an update.

Freelance taxonomy work

The freelance taxonomist works on portions of a taxonomy project but does not do everything required of a taxonomy project. I now see that the greatest opportunity for freelance taxonomy work is to freelance/subcontract to independent taxonomy consultants or small taxonomy consultancies. These consultants take on projects too big for one person and need to subcontract parts of them. Freelancing directly to an end-client, meanwhile, has become increasingly rare.

The best way, and really the only way, to find this kind of work is through serious networking with taxonomy consultants. Make sure, however, that any freelance contract does not preclude you from serving competing consultants.

The freelance taxonomist does most of the work remotely from home, but could be called on to visit a client site, depending on the nature of the project. Being open to travel and having client relationship skills thus helps, but is not always a requirement. Work could be researching and creating a new taxonomy from scratch, editing an existing taxonomy, mapping two different taxonomies, or developing auto-categorization rules for a taxonomy. In any case, the freelancer is not the sole person responsible for the taxonomy.

People suited for this kind of taxonomy work tend to be those who have held at least one past position in taxonomy, metadata, or classification work and are already comfortably set up as freelancers, such as through editorial work, indexing, or consulting in the information field. Basic office software is usually all that is needed, but prior experience with a taxonomy management tool is helpful, and, if needed, remote access to the system can be arranged.

Contract taxonomy work

The contract taxonomist may do the same kind of work as the freelance taxonomist or may take on more responsibility in the taxonomy design strategy. The contractual relationship may be different, though, and instead of being treated as a freelance vendor the contract taxonomist may be treated as a temporary employee on a W-2 tax status.

When a company needs a taxonomist, it is often for a temporary assignment, so instead of posting a full-time position on the careers portion of its website, it turns to a staffing/recruiting firm for help. While it could be a general staffing firm the company uses for other assignments, experience has shown that finding taxonomists is difficult, so companies turn to specialized staffing firms in the areas of information technology/science. Sometimes a company has a large information technology project for which taxonomy development is only a piece, and it contracts the entire project out to a large IT consulting firm. The IT consulting firm then seeks to fill the taxonomist slot by turning to a third-party recruiter. Recruiters from the staffing firms then search resumes on LinkedIn, Monster.com, or other sites. So, if you are looking for taxonomy work, make sure you have a strong LinkedIn profile and a resume on Monster.com open for all to see, with “taxonomist” prominently in the title. It’s also important to get on the list of staffing firms/recruiters specialized in library science and information technology.

The contract taxonomist is generally expected to be on the client site more than the freelance taxonomist, but with some negotiating, part of the work could probably be done from home. If the assignment is relatively short and the location does not have any qualified taxonomists, the client will pay for travel and lodging, sometimes for several weeks or even a couple of months. So, being open to short-term (1-3 month) relocation can be an advantage.

The nature of the work can be the same as for a freelance taxonomist, or it could involve more taxonomy design, planning and strategy, similar to that of a consultant. The rate is comparable to freelancing, as there is an intermediate party in both cases, and rates vary based on one’s experience and level of responsibility. A third level of intermediary could result in a lower rate. On the other hand, difficulty in finding a taxonomist for a specific project in a specific location allows the contracting taxonomist room to negotiate.

People suited for this kind of taxonomy work need to have prior taxonomy experience, but often experience with a specific software tool is also desired, whether a taxonomy management system, auto-classification system, content management system, or digital asset management system. Location in a major metropolitan area or willingness to travel is also important.

Independent consulting work

Being an independent taxonomy consultant means not only do you need to know how to conduct every step of taxonomy development yourself (research and requirements gathering, design, developing, testing, governance planning, etc.), but you also need to keep track of deadlines, deliverables, meeting schedules, and other basic project management tasks. There is no need for major project management skills, as long as you are not subcontracting to others. The client may already have a project manager on staff if taxonomy development is part of a larger project.

The other major task in consulting is creating a proposal, involving estimating the costs and time requirements, and then meeting those expectations. The proposal-writing task is often an obstacle to new aspiring consultants, and the best preparation is to either work in a consultancy or subcontract extensively to other consultants first to get exposed to the proposal requirements. Fees are typically charged per project and not per hour, so this can be tricky.

Obtaining independent consulting work involves a lot of self-marketing: a web site, a blog, LinkedIn and other social media, speaking at industry and professional association conferences, publishing articles, and general networking. Even networking with competing consultants is good, because sometimes they hear of projects they do not want and will refer the work. It’s also good for you to refer work to other consultants when it comes at the wrong time or is in the wrong location, so they might return the favor.

Saturday, July 12, 2014

A Professional Association for Taxonomists



I recently attended the SLA annual conference, which this year was in Vancouver, BC, June 8 – 11. This year marked the 5th anniversary of the professional association’s Taxonomy Division, its newest and fastest growing special interest group. The Taxonomy Division plans the programming of all taxonomy-related sessions for the conference, enough sessions so that attendees interested in only taxonomies can find a session of interest for most of the programmed time slots.

The Taxonomy Division comes closest to a professional organization for taxonomists and provides a good networking opportunity. The founding of this Taxonomy Division five years ago was the reason that I joined SLA, since I am not a librarian. (I was an accidental taxonomist after all.) SLA stands for “Special Libraries Association,” but the organization now favors the acronym over the name that it once stood for, and members are increasingly referred to as “information professionals” or “info-pros” instead of librarians. This label better fits taxonomists. In addition to the annual conference programs, the Taxonomy Division also has bi-monthly webinars, a mentoring program, and other resources for its members.

A selection of half-day pre-conference workshops, called “continuing education” sessions, are an important part of the SLA annual conference, and this year two of the five such workshops were on taxonomy topics (“Introduction to Taxonomies” and “Taxonomy Integration: Content Management, Navigation and Search”) and were organized by the Taxonomy Division, despite the fact that SLA has 25 Divisions. Regular session topics included taxonomies and metadata, eDiscovery, semantics, SharePoint, and from-scratch taxonomy creation (my presentation).

Not only does the Taxonomy Division organize taxonomy-related conference sessions, but it also organizes networking events at the annual conference, including an informal no-host dinner and a more formal networking event that is part of the conference program. Both division members and anyone else interested in taxonomies are welcome to attend these events. There is typically a mixture of experienced taxonomists, who likely already know each other from previous conferences, and those who are new to taxonomies and would like to learn more.

The SLA conference is a great place for taxonomists to network and learn from each other, but it is not necessarily the place to hear the latest trends in taxonomies. “Current Topics in Taxonomies” was the title of an informal roundtable session, but its discussions were more about sharing experiences. At the four roundtables, with on average seven people per table, some of the discussions involved experienced taxonomists giving advice to the less experienced for specific taxonomy implementation issues. The latest topics or trends are not necessarily the subject of regular sessions either, since the program is planned close to a year in advance. On the other hand, the field of taxonomies is not one that changes that much year to year. It is rather business and technology trends that change.

If you are new to taxonomies, then the SLA conference is a great place to learn a lot, through both the various sessions and the pre-conference continuing education workshops. If you are an experienced taxonomist, then SLA is a great way to network with other taxonomists and get inspired to speak at future conferences. I am looking forward to speaking at SLA in Boston on June 13-16, 2015. See you there, in my home city!

Tuesday, May 20, 2014

Creating Taxonomies from Scratch

When I first got into taxonomy work, my impression was that the trend was increasingly to revise, redesign, merge, and update existing taxonomies and less to create new ones. As taxonomies became more common in large organizations, it seemed obvious that there would be less need for original taxonomy creation and more need for taxonomy improvement. Taxonomies need to be updated when content changes, terminology changes, users change, indexing methods change, content/document management systems change, etc. Older taxonomies may also need to be repurposed, merged, or mapped. While there is no shortage of work on existing taxonomies, to my pleasant surprise I have found recently that there are many projects for new taxonomies as well.

Who needs taxonomies from scratch

In the field of taxonomy consulting, different taxonomy projects go to different consultancies. Large organizations with large taxonomy projects tend to hire taxonomy consultancies with multiple consultants to handle their projects, and it is the large organizations that by now tend to already have some taxonomies, even if they need a lot of work. Smaller organizations tend to hire independent consultant-contractors, and smaller organizations more likely are new to taxonomies and need to have one built from scratch.  When I started out consulting, I was employed or subcontracted to consultancies that served larger clients and worked more on taxonomy redesign projects, but then when I became an independent consultant I was contacted by and often served smaller clients, including startups, and thus became involved with more projects to build original taxonomies.

The types of projects that start-ups have for taxonomies are really quite interesting and they reflect a trend in innovative content-based products and services. In the past couple of years I have been contacted about creating taxonomies (some of which I did) for the following:
  • A subscription, web-based software with taxonomy for photographers to tag and classify their own images
  • A web-based market place for craftspeople and customers to meet to buy/sell customized objects
  • A website of quotes by famous and not-so-famous women with related content
  • A web database of yoga poses associated with a yoga studio
  • A web service of sites for artists to promote themselves
  • A loyalty marketing and data software platform for retailers
  • A mobile app that pulls content from LinkedIn to help professionals and job seekers make connections and obtain career advice
Yet it may not even be the size of the organization seeking taxonomies that has an impact on the demand for new taxonomies from scratch. It could also be that taxonomies are becoming better known across all industries, not just the fields of publishing, information services, and ecommerce. There is also no doubt that the growing amount of content in all areas necessitates better methods of organization and retrieval.

How taxonomies are built from scratch

Even taxonomists with considerable experience in editing taxonomies might not know where to begin if they were to create a taxonomy from scratch. There is some uncertainty over whether to take a predominantly top-down or bottom-up approach.  I recommend a hybrid approach, with some initial top-level development, but most of the work on the specific taxonomy terms built from the bottom. If a navigational tree hierarchy is to be displayed to the users, then at least some initial top-down development is needed.

Developing the top terms (or facets, as the case may be) is based on best practices, understanding the users, adapting to any user interface constraints, and general experience as a taxonomist. Developing all the detailed terms within the taxonomy from below, however, is quite a different task that requires different taxonomist skills. Despite the fact that a spreadsheet, such as Excel, is inappropriate for managing taxonomies, I have found that even with taxonomy management software available, Excel is the most usable tool for the initial stage for gathering candidate terms along with information about their sources and/or for comparing terms side-by-side from multiple sources and at the same time putting them into a hierarchy. Finally, if a taxonomy is somewhat specialized and technical in nature and to be used by subject matter experts, it’s also possible to let the subject matter experts propose their own taxonomy and then review it with them and heavily revise it to bring it up to standards.
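The spreadsheet stage described above, gathering candidate terms together with their sources and comparing terms from multiple sources side by side, can also be sketched in a few lines of Python. This is only an illustrative sketch, not a replacement for the spreadsheet or for taxonomy management software: the term labels and source names are invented, and `group_by_term` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical candidate terms, each recorded with the (invented) source
# it was gathered from, as one might do in two spreadsheet columns.
candidates = [
    ("Landscape photography", "competitor site"),
    ("landscape photography", "search logs"),
    ("Portraits", "stakeholder interview"),
    ("Portraits", "search logs"),
    ("Macro photography", "competitor site"),
]


def group_by_term(rows):
    """Merge rows that differ only in case or spacing and collect their sources."""
    sources_by_key = defaultdict(set)
    display_label = {}
    for term, source in rows:
        key = term.strip().lower()
        # Keep the first spelling encountered as the display label.
        display_label.setdefault(key, term.strip())
        sources_by_key[key].add(source)
    return {display_label[k]: sorted(v) for k, v in sources_by_key.items()}


grouped = group_by_term(candidates)

# List the terms with the most sources first, for side-by-side review.
for term, sources in sorted(grouped.items(), key=lambda kv: -len(kv[1])):
    print(f"{term}: {', '.join(sources)}")
```

Deciding which candidates become preferred terms, and where they sit in the hierarchy, remains the taxonomist's judgment; the sketch only organizes the evidence.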

I will discuss this in more detail in my presentation, “Taxonomies: Everything you Need to Know to Start a Taxonomy from Scratch,” at the SLA conference in Vancouver, BC, on June 8.

Friday, April 11, 2014

Taxonomy Software Directories

It's difficult to find a list of taxonomy management software that is both comprehensive and up to date, yet not overwhelmed with related products and services. I define taxonomy management software as a tool to manually build and edit taxonomies, controlled vocabularies, and thesauri in accordance with industry standards. It should be the primary tool used by those who work as taxonomists. Lists of  “taxonomy software,” however, may include more than just tools for taxonomy management, such as auto-classification/auto-categorization/auto-indexing software, search software that utilizes taxonomies, or mind-mapping and other graphical categorization tools, etc.

Taxonomy maintenance, unfortunately, is just too small of a niche area for the major evaluators of software, whether consultancies, industry research firms, or trade publications, to find it worth their time to study. Companies that research the information technology market, such as Forrester Research, Gartner, International Data Corporation (IDC), and Real Story Group, won't get the commercial payoff from preparing studies of the taxonomy management software industry and products.

At the time I wrote my book, the most comprehensive directory of taxonomy software I found and referred my readers to was that of the British consultant Leonard Will, on the website of his consulting business Willpower Information, which listed 38 software packages, both commercial and freeware. Leonard Will had contacted each vendor and thus provided descriptive and contact information for each tool. The fact that this was a directory of “thesaurus” software and not “taxonomy” software is not an issue, and it was probably a good thing to include only software that meets thesaurus expectations. This directory was very comprehensive, including lesser-known free and open source software, which over time tended to become unsupported or even unavailable. With an interest in posterity, Leonard Will kept the unavailable software listed in his directory merely with a note to that effect. This may have been interesting for anyone thinking of developing their own thesaurus software, as they may be able to track down these other developers. For someone looking for a good commercial solution, however, there are far too many outdated products to weed through.

After Leonard Will retired, he decided he did not want to spend the time maintaining his directory, which he last updated in 2007, and in 2011 he offered the content of his directory to someone else, specifically contacting both Margie Hlava of Access Innovations and myself. Then Margie and I had to figure out which one of us would take it, fully aware that the rich content on a website would help our own respective business websites, yet it would also take quite a bit of time and effort to set up and maintain. After a year of hoping to find time, I finally conceded that I would not and told Margie she could take it. The successor to the Willpower Thesaurus software directory, maintained by Margie’s employee Eric Ziecker, now resides at http://www.taxobank.org/content/thesauri-and-vocabulary-control-thesaurus-software

The core of TaxoBank's directory “Software for building and editing thesauri” at present is still essentially the same as the Willpower site, maintaining the original tabular content, style, colors, etc of that site, so visitors to the TaxoBank site may recognize it from Willpower. Posterity still seems to be valued, as all but one of the same 38 software packages are still there, although in two cases there is a note saying “The particular software referenced above is no longer available.” The notes section for many packages has been updated with additional content extracted from the vendor websites. More updating is still pending, though, as operating systems listed are dated, such as “Windows 95/98/NT/2000/XP.”

The main difference from the original Willpower site is the addition of 63 other products in a new section, separated by the note “Additional indexing, taxonomy, controlled vocabulary, thesaurus, classification, mapping and ontology software and services not referenced in Leonard Will's original listing follows below.” These additional products include many products not specific to “building and editing thesauri,” such as Apache Lucene, EMC Documentum, Oracle Endeca, Google site search, HP Autonomy, IBM Infosphere, and Microsoft SharePoint, along with one taxonomy consulting service. In my opinion, it might be better to have the related products and services on a separate web page to avoid possible confusion and to keep the list to a manageable length, as the total web page is currently 145 printed pages long. Despite these issues, I praise Margie and Eric for taking efforts to maintain this valuable resource.

As for a shorter list focused on current commercial software dedicated to supporting the manual creation and editing of thesauri and taxonomies, that may have to wait until the next edition (not yet started) of my book. For now, there are the products, as of early 2010, listed under Chapter 5 on the links page of The Accidental Taxonomist book website. To this list, I would now add at least PoolParty and TopBraid Enterprise Vocabulary Net, both introduced since the book went to press. Meanwhile, taxonomy consultants remain a valuable source of advice on taxonomy/thesaurus management software.

Saturday, March 15, 2014

Indexing vs. Thesaurus Creation


The activities of back-of-the-book indexing, document/digital asset indexing, and thesaurus/taxonomy creation all require similar skills, but each has its own unique requirements. Indeed, a typical career path toward becoming an accidental taxonomist is to first work as an indexer. You might think that the two kinds of indexing are similar to each other and thesaurus creation differs more, but having done all three, I can attest that back-of-the-book indexing and thesaurus/taxonomy creation are more similar to each other than the two kinds of indexing are.

What is indexing

In my previous blog post “Tagging vs. Indexing,” I explain that indexing involves designating descriptive terms or labels for what some content is about, and that these terms are organized into a browsable index.  There are two kinds of indexing:

  1. “Closed indexing,” or back-of-the-book indexing, where the index is created based solely on concepts that the indexer identifies within the text of a single monograph. The index is created for that one monograph and then is finished ("closed").
  2. “Open indexing,” or what has been called “database indexing,” for the indexing of articles, documents, content items, or digital assets, whereby the indexer pulls index terms from a controlled vocabulary or thesaurus and assigns them to multiple individual documents or digital assets. The set of content grows over time, and the same terms in the index will point to more and more documents. It is called “open” indexing because the task is ongoing. The thesaurus helps ensure consistent indexing over time.

Both kinds of indexing require the skill of analyzing content to determine what concepts are important and deserve indexing. The biggest difference between back-of-the-book indexing and database indexing is that book indexing requires that the indexer additionally invent the index terms and not merely pull them from a thesaurus.
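The mechanics of open indexing can be illustrated with a minimal Python sketch. It assumes the indexer has already analyzed each document and identified its concepts; the code models only the lookup against a controlled vocabulary. The vocabulary entries, document IDs, and the `index_document` function are all hypothetical.

```python
from collections import defaultdict

# Hypothetical controlled vocabulary: entry label (lowercased) -> preferred term.
VOCABULARY = {
    "taxonomies": "Taxonomies",
    "thesauri": "Thesauri",
    "controlled vocabularies": "Controlled vocabularies",
    "cv": "Controlled vocabularies",  # nonpreferred entry for the same concept
}

# The open index: preferred term -> document IDs. It keeps growing over time.
open_index = defaultdict(list)


def index_document(doc_id, concepts):
    """Assign only vocabulary terms to a document; return concepts with no match."""
    unmatched = []
    for concept in concepts:
        term = VOCABULARY.get(concept.strip().lower())
        if term is None:
            unmatched.append(concept)
        elif doc_id not in open_index[term]:
            open_index[term].append(doc_id)
    return unmatched


index_document("doc-1", ["Taxonomies", "CV"])
leftover = index_document("doc-2", ["taxonomies", "Folksonomies"])
```

In this sketch a concept with no vocabulary match is simply returned rather than indexed, whereas a book indexer would coin an index term for it on the spot.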

What is a thesaurus

I use the designation thesaurus here, because I mean the type of taxonomy that features the full set of relationship types between its terms, with each term designating an unambiguous concept (noun or noun phrase). The relationship types are:
  • Hierarchical (broader term/narrower term)
  • Equivalence (use/used for “nonpreferred terms” or “synonyms”)
  • Associative (related terms)
To best support manual indexing, the existence of all these different kinds of relationships helps direct the indexers to the most appropriate terms to describe the content they are indexing. The same thesaurus, or parts of it, may be displayed to the end-users to help guide them to the most appropriate terms to describe the idea about which they are searching for information. The thesaurus thus not only standardizes the language for the concepts, but also provides a guiding structure.
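As a rough illustration of the three relationship types, the following Python sketch models a tiny thesaurus in memory. It is a simplified assumption for the purpose of illustration, not the data model of any particular thesaurus management tool; the class names, method names, and sample terms are all invented.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """A preferred term together with its three kinds of relationships."""
    label: str
    broader: set = field(default_factory=set)   # BT
    narrower: set = field(default_factory=set)  # NT
    related: set = field(default_factory=set)   # RT
    used_for: set = field(default_factory=set)  # UF (nonpreferred terms)


class Thesaurus:
    def __init__(self):
        self.terms = {}  # preferred label -> Term
        self.use = {}    # nonpreferred label -> preferred label (USE)

    def add_term(self, label):
        return self.terms.setdefault(label, Term(label))

    def add_hierarchy(self, broader, narrower):
        # Hierarchical relationships are reciprocal: BT on one side, NT on the other.
        self.add_term(broader).narrower.add(narrower)
        self.add_term(narrower).broader.add(broader)

    def add_equivalence(self, preferred, nonpreferred):
        # The nonpreferred term is only an entry point; it points to the preferred term.
        self.add_term(preferred).used_for.add(nonpreferred)
        self.use[nonpreferred] = preferred

    def add_related(self, a, b):
        # Associative relationships are symmetric.
        self.add_term(a).related.add(b)
        self.add_term(b).related.add(a)

    def resolve(self, label):
        """Return the preferred term for any entry label, or None if unknown."""
        if label in self.terms:
            return label
        return self.use.get(label)


thesaurus = Thesaurus()
thesaurus.add_hierarchy("Indexing", "Database indexing")
thesaurus.add_hierarchy("Indexing", "Book indexing")
thesaurus.add_equivalence("Book indexing", "Back-of-the-book indexing")
thesaurus.add_related("Indexing", "Thesauri")
```

Here `thesaurus.resolve("Back-of-the-book indexing")` leads an indexer or end-user from the nonpreferred entry to the preferred term “Book indexing,” which is the guiding function described above.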

How they are related

Open/database indexing and thesaurus creation are obviously related, because the thesaurus is used to support this kind of indexing. In an organization which is involved in such indexing, it is not unusual for former indexers to become editors of the thesaurus, since they are already very familiar with it and understand the needs of the indexer-users.

Closed/book indexing and thesaurus creation are related, because they both involve the development of original terms and relationships between them.

Thesaurus and book index similarities and differences

Thesauri and back-of-the-book indexes both have what can be called multiple points of entry. In a book index these can be either See cross-references or “double-posts,” whereby additional variant terms or synonyms are included in the index, and they all point to the same set of page numbers. In a thesaurus, these are the equivalence relationships, where nonpreferred terms or synonyms point to the preferred terms (Use/UF). The difference is that a thesaurus distinguishes between the preferred and nonpreferred terms, whereas double-posts in a book index are all of equal standing and none is “preferred.”

Thesauri and back-of-the-book indexes both have hierarchical structure among their terms. In a thesaurus there are narrower terms to a broader term (BT/NT). In an index, there are subentries indented under a main entry. However, these hierarchies are not identical. In a thesaurus, narrower terms must be generic types, instances or integral parts of the broader term. In a book index, subentries are any aspect of the main entry or merely another concept in combination. In fact, an indexer may choose to switch the main entry and subentry (the subentry becoming a main entry and the main entry becoming its subentry) with no problems. Don’t try to do that in a thesaurus or taxonomy!

Finally, thesauri and back-of-the-book indexes both have indications of related concepts. Thesauri have the associative relationship called Related Term (RT), and book indexes have See also cross-references. While in general these function the same, the rules for thesauri are stricter. If the “related” terms are really hierarchical, then they must have the hierarchical relationship instead. In a book index, it is acceptable to have a See also between two terms where one is actually broader in meaning than the other.
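The stricter thesaurus rule, that two terms in a hierarchical relationship should not also be linked as related terms, lends itself to a simple automated check. The Python sketch below is one possible way to flag such pairs, under the assumption that the hierarchy is stored as a mapping from each term to its broader terms; the function names and sample terms are hypothetical.

```python
def ancestors(term, broader):
    """Collect every broader term above a term, at any level of the hierarchy."""
    seen = set()
    stack = list(broader.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(broader.get(current, ()))
    return seen


def conflicting_related_pairs(broader, related):
    """Return RT pairs in which one term is actually broader than the other."""
    conflicts = []
    for a, b in related:
        if b in ancestors(a, broader) or a in ancestors(b, broader):
            conflicts.append((a, b))
    return conflicts


# Hypothetical sample data: term -> set of its broader terms (BT).
broader = {
    "Book indexing": {"Indexing"},
    "Database indexing": {"Indexing"},
    "Indexing": {"Information organization"},
}
related = [
    ("Book indexing", "Indexing"),                  # really BT/NT
    ("Indexing", "Thesauri"),                       # legitimately associative
    ("Book indexing", "Information organization"),  # hierarchical two levels up
]

conflicts = conflicting_related_pairs(broader, related)
```

In a book index none of these pairs would be a problem, since a See also reference between a broader and a narrower concept is acceptable there.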

I will be giving a presentation on this in greater detail at the annual conference of the American Society for Indexing, on April 30, 2015, in Seattle, WA.