The Accidental Taxonomist: 2014

Monday, December 8, 2014

Taxonomy Courses

Note: Since this blog post was written, Simmons College discontinued its continuing education program. I am now offering the 5-week online workshop, previously offered through Simmons, independently through Hedden Information Management. Information is at: http://www.hedden-information.com/courses-workshops/taxonomy-course/

__________________________________________________________________________

I have been teaching workshops on how to create taxonomies for over seven years. Coming up in the winter and spring of 2015 I am offering more kinds of workshops and learning options than ever before. I had offered customized corporate onsite workshops in the past, but since I don’t have the time for that any more, it makes sense to accept opportunities to offer general training in taxonomies. I don’t intend to offer directly competing course offerings, so this blog post aims to outline the differences between these various taxonomy courses.

The differences are primarily in learning approach (online or in-person, synchronous or asynchronous), the depth of instruction, cost, and convenience. The audience focus of each is not substantially different. The level in most cases is primarily “advanced beginner.” Prior exposure to or use of taxonomies, general training in library/information science, and/or work in related fields such as information architecture, metadata, records management, indexing, content management, digital asset management, etc., is highly beneficial. Lacking such background, however, does not preclude participation, but may make it a little more challenging. Prior experience in creating or editing taxonomies, on the other hand, does not necessarily make you too advanced for the classes, as your experience may be limited to just one kind of taxonomy. The one exception is the SLA full-day course, which is aimed more directly at beginners. The workshops are also suitable for both practitioners and managers.

Simmons College School of Library and Information Science continuing education workshop

5-week online workshop with next available session in March 2015, and likely another two or three times later in the year. Description, Registration
Benefits:
- Individual feedback on submitted assignments.
- Simmons College certificate and record of completion
- Access to a free trial of taxonomy management software that you could not get on your own (in addition to others, for which you could get a 30-day trial on your own)
- Opportunity to email questions and get answers
- Greater learning opportunity through assignments and feedback and more material to read
Disadvantages:
- Limited space, usually filling up a month or more in advance (January 2015 session filled)
- A greater total time commitment and over a specific period of time
- Inability to easily save formatted lessons. While you can copy lesson content for your own purposes, the Moodle platform does not offer an easy way to save lessons in the original formatting.

American Society for Indexing Online Learning: “Practical Taxonomy Creation”

Three weekly one-hour sessions, January 14, 21, and 28, 2015, and/or recordings, Description, Registration
Benefits:
- Live phone Q&A
- Unlimited capacity. Sign up at the last minute and still get in. Or register after the live session for access to the recording.
- Limited time commitment
- Option to attend some live and some as recording, if not all sessions fit in your schedule.
- Access to the presentation and recording for unlimited repeated viewings/listening
Disadvantages:
- No individualized assignment feedback
- Non-core topics not covered due to limited time
- Less learning time (excluding webinar replays)
- Limited time for questions

American Society for Indexing conference 3-hour workshop: “Topics in Taxonomy Creation”

Either April 30 or May 1 (TBD) in Seattle, WA, Description, Registration
Benefits:
- In-person learning experience
- Personal connection with me, the instructor, and other participants for better networking
- May interrupt the presentation with questions (unlike the webinar in which you must wait for the Q&A time)
- Live demos of taxonomy management software
- Copy of the slides and handouts to keep
Disadvantages:
- Travel time and costs to Seattle
- Required registration for conference (no separate workshop registration)
- Limited instruction time and content

SLA conference full-day (8 hours) conference continuing education workshop: “Introduction to Taxonomy”

Saturday, June 13, 2015, Boston, MA, Description, Registration
Benefits:
- Appropriate for complete beginners
- In-person learning experience
- Personal connection with me, the instructor, and other participants for better networking
- May interrupt the presentation with questions (unlike the webinar in which you must wait for the Q&A time)
- Live demos of taxonomy management software
- Copy of the slides and handouts to keep
- Ample live Q&A time
- Discounted student and retired SLA member pricing
Disadvantages:
- Travel time and costs to Boston
- A lot of material to digest in a short period of time

Why learn about taxonomies? Taxonomy is a key tool/method/component of knowledge management and information management.

Saturday, November 8, 2014

Taxonomy Trends and Future

What are the trends in taxonomies, and where is the field going? The future of taxonomies turned out to be a unifying theme of last week’s annual Taxonomy Boot Camp conference, in Washington, DC, the premier event in taxonomies, from its opening keynote to its closing panel.

“From Cataloguer to Designer” was the title of the opening keynote, an excellent presentation by consultant Patrick Lambe of Straits Knowledge. He said that there are new opportunities for taxonomists, especially in the technology space, if they change their mindset and their role from that of cataloguers, who describe the world as it is, to that of designers, who plan things as they could be. New trends involving taxonomies that he described include search-based applications, autoclassification, and knowledge graphs (such as the automatically curated index card of key information on a topic, as appears in some Google search results).

As this was the 10th annual Taxonomy Boot Camp conference, the final session was “10 Years Back, 10 Years Forward,” a panel of consultants who had presented at the first Taxonomy Boot Camp conference in New York in 2004 (and at most of the conferences since), and who answered questions about how things have changed and offered comments on various predictions.

The spread of greater understanding of taxonomies was a common theme of that panel. Gary Carlson of the consultancy Factor noted that now taxonomy can be discussed with the executives, whereas in the past only some people in an organization would show an interest in taxonomies. This was echoed by Seth Earley of Earley & Associates, who observed that organizations are beginning to understand that a taxonomy is more than just terms but is also a process: “Organizations are starting to get it.” Tom Reamy of KAPS Group recalled that in his earlier projects he had to help his clients strategize more to figure out how a taxonomy can help, but now they already know about taxonomies and just want to do it. He also pointed out that the early adopters of taxonomies were the large science and financial enterprises, but now smaller companies are also implementing taxonomies.

Looking to the future, the panelists’ shared predictions included greater use of linked data, taxonomy visualization, and text analytics. Joseph Busch of Taxonomy Strategies commented on the “power of re-use,” so that we will spend less time doing taxonomies on standard things, such as geographic places, and “not re-invent universals.” With respect to taxonomy visualization, he observed that it “helps people think.” Regarding text analytics, Tom Reamy, the conference’s biggest champion of the technology, explained that it fills the gap between the taxonomy and what it should do.

Other sessions, such as the panel “The Curious Lives of Full-Time Taxonomists,” also addressed the issue of new themes in taxonomies. “Taxonomy is seeping into the culture, as part of the enterprise knowledge of the world,” explained Barbara McGlamery of Pearson. “People are asking for problem solving and not just a taxonomy, as they have more awareness of taxonomy,” observed Sarah Barrett of Factor.

New trends and technologies were discussed in individual presentations, too. Using the agile method for taxonomy development was described in two presentations: the main topic of “Using Agile to Build a Taxonomy/Ontology,” by Evelyn Kent of Smartlogic, and as a feature in “Developing Use Cases Before Developing the Taxonomy,” presented by Vivian Bliss of Taxonomy Strategies. Greater sophistication in sentiment analysis that enables leveraging of taxonomy was a key point in Tom Reamy’s presentation “Taxonomy and Social Media: Social Taxonomies.” Technology was also at the forefront of sessions such as “Taxonomies in Search,” comprising four presentations, and “Automated Taxonomy Management,” comprising three presentations.

Finally, the growth in interest in taxonomies was reflected in the conference attendance (around 200). While exact numbers of attendees of Taxonomy Boot Camp cannot be counted, because some attendees have platinum passes allowing them the choice of co-located conference sessions to attend (including KM World and Enterprise Search & Discovery), Tom Hogan, CEO of Information Today Inc., the conference owner, informed me that dedicated Taxonomy Boot Camp registrations had doubled since last year and commented on how it had grown from just a small add-on to KM World to a significant conference in its own right.

Tuesday, October 28, 2014

Taxonomy: A Profession, Not an Industry

I am looking forward to attending and presenting at my 8th Taxonomy Boot Camp conference next week. What makes this conference special is that it is very much both a professional and a commercial/industry conference, whereas most conferences tend to be one or the other. In other words, it is a commercial conference that serves a profession.

A professional conference is one that is usually organized by a nonprofit professional organization/society for its members for furthering the intellectual exchange in the field and otherwise serving the needs of its members. Professional conferences at which I have presented include those of SLA (Special Libraries Association) and the American Society for Indexing. A commercial conference, on the other hand, is one put on by a company (in publishing, advertising, research, consulting, or pure event management) to bring together clients and vendors in a specialty area and promote business for all. Commercial conferences at which I have presented, in addition to those associated with KM World, include the Gilbane conference, Henry Stewart DAM, and Text Analytics World. Professional conferences do have vendor exhibits, too, but more as an aside to help finance the conference, and these exhibits can be very small. Commercial conferences do, of course, have informative and educational sessions, but the conference is organized with the primary purpose of earning a profit from selling exhibit space and registrations.

Commercial conferences are often based on an industry, with industry loosely defined as companies that sell related products or services for a defined market and thus potentially could be exhibiters. This “industry” could be as specialized as knowledge management, content management, or digital asset management. Taxonomy, however, is not an industry.

Taxonomy is a profession and is also an information management tool/technique. Sometimes an industry and a profession are almost the same, such as in medicine and law. Closer to the world of taxonomies are the industry/professions of software development, consulting, and librarianship. Taxonomy work comes closest to the last of these, and many taxonomists were originally trained as librarians. So, if libraries are both an industry and a profession, then some might make the assumption that taxonomy is also both an industry and a profession.

To determine if there is an industry associated with a profession, look at the trade show/exhibit vendors at a conference or look for advertisement-supported trade journals. Taxonomy Boot Camp has a mini exhibit of usually half a dozen sponsors, in contrast with the co-located KM World conference showcase of over 30 sponsors. Indeed, commercial software vendors of pure taxonomy management tools (not a feature of a larger solution) can be counted on one hand. Taxonomy-related services, namely those of consultants, are also a significant business, but this cannot be considered its own “industry.” That is because any consulting firm (larger than a sole proprietorship) that consults on taxonomy also consults in other, related areas, such as knowledge management, data management, user experience design, content integration, etc. As for trade journals, there are none dedicated to taxonomy, simply because there are not enough companies that would advertise in this niche space. Libraries, on the other hand, do have lots of vendors, which exhibit at conferences, and there are also library trade journals.

Taxonomists work in all industries. I have worked in full-time permanent positions as a taxonomist in industries including publishing, software, consulting, and renewable energy, and have provided taxonomy consulting services to many more industries: financial services, retail, hospitality, biomedical research, manufacturing, and education. Despite the various industries of my employment, I have always applied the broad “Information services” or “Information technology and services” as my “industry” in my LinkedIn profile. For this reason, trying to analyze the industries used in taxonomist LinkedIn profiles might not be accurate or useful, due to the preference for those two industry designations. Nevertheless, I have found taxonomists in LinkedIn to use the following additional industries:

Libraries
Internet
Publishing
Research
Online Media
Higher Education
Computer Software
Public Relations & Communications
Marketing & Advertising
Management Consulting
Entertainment
Pharmaceuticals
Hospital & Health Care
Oil & Energy

Indeed, the Taxonomy Boot Camp conference has attendees from all of these varied industries, but all with a shared professional interest in taxonomies. That’s what makes this conference feel more like a professional conference. But unlike a professional conference (such as SLA for librarians, which I am not, so I always feel like an outsider there), you don’t have to be a member of an organization or a professional taxonomist, just interested in taxonomies as a tool/technique. As such, the conference is both highly educational/informative and welcoming and open to all.

Tuesday, September 23, 2014

One or More Taxonomies

In the various definitions of taxonomy, one aspect of the definition that is often missing is what constitutes a single taxonomy (or thesaurus) versus multiple related taxonomies (or thesauri). If you hire a taxonomy consultant, they won’t tell you because they will defer to their client’s terminology. If you are designing a taxonomy/taxonomies for your own organization, however, this is often an issue of concern.

Hierarchies and other relationships

In simple hierarchical taxonomies, a single hierarchy could be a single taxonomy. Not all terms on the same subject, however, may fit neatly in one hierarchy while complying with ANSI/NISO hierarchical relationship guidelines. So, more often than not, a hierarchical taxonomy may have multiple top terms. For example, a taxonomy on health care might have top terms for hierarchies on conditions and diseases, diagnostic procedures, treatments, and medical equipment and supplies. If for some reason you needed a single hierarchy, then you would bend the hierarchical-relationship rules to make such top terms narrower to the term that is the name of the taxonomy. Thus, whether there is one top term or multiple top terms, it is still considered one taxonomy.

Facets are a special case. Each facet consists of its own hierarchy of terms, or may even have multiple top-term hierarchies of similar-type terms on the same subject, and there are no relationships between terms in different facets. So, you might consider each facet to be a taxonomy. However, the facets are intended to be used only in combination, not in isolation. In fact, we often speak of a “faceted taxonomy,” implying a single taxonomy comprised of multiple facets. So, a single facet is not a taxonomy.
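As a rough illustration of facets being used in combination, here is a minimal Python sketch of faceted filtering. The facet names, terms, document IDs, and the filter_documents function are all hypothetical, invented only for this example, and each document is assumed to carry one term per facet.

```python
# Hypothetical faceted taxonomy: each facet holds its own set of terms,
# and no relationships link terms in different facets.
FACETS = {
    "Condition": {"Asthma", "Diabetes"},
    "Treatment": {"Inhaler", "Insulin"},
    "Audience": {"Patients", "Clinicians"},
}

# Hypothetical content items, each tagged with one term per facet.
DOCUMENTS = {
    "doc1": {"Condition": "Asthma", "Treatment": "Inhaler", "Audience": "Patients"},
    "doc2": {"Condition": "Asthma", "Treatment": "Inhaler", "Audience": "Clinicians"},
    "doc3": {"Condition": "Diabetes", "Treatment": "Insulin", "Audience": "Patients"},
}


def filter_documents(selections):
    """Return the sorted IDs of documents matching every selected facet term."""
    # Reject selections that use a term outside its facet.
    for facet, term in selections.items():
        if term not in FACETS.get(facet, set()):
            raise ValueError(f"{term!r} is not a term in facet {facet!r}")
    return sorted(
        doc_id
        for doc_id, tags in DOCUMENTS.items()
        if all(tags.get(facet) == term for facet, term in selections.items())
    )


if __name__ == "__main__":
    # One facet alone narrows the results only a little ...
    print(filter_documents({"Condition": "Asthma"}))
    # ... while combining facets narrows them further.
    print(filter_documents({"Condition": "Asthma", "Audience": "Patients"}))
```

Each facet on its own is just a short list of terms. The retrieval value comes from selecting terms in several facets at once, which is why the set of facets together, rather than any one of them, is treated as the taxonomy.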

A more thesaurus-like structure may have fewer large hierarchies and more smaller hierarchies with more numerous top terms, but it will also have associative relationships that link terms across hierarchies. So, a possible definition of a taxonomy or thesaurus is a set of terms where there is at least some kind of relationship between every term and at least one other in the same set. However, you could end up with a situation of just a couple of terms that are related to each other but not related/linked to any other terms in the taxonomy. So, additional criteria are needed to define a single taxonomy so as to include such terms.
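The relationship-based definition, and the isolated-cluster problem it runs into, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the terms and relationships are made up, every relationship (hierarchical or associative) is treated as a simple undirected link, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical terms and relationships, as undirected (term, term) links.
TERMS = [
    "Diseases", "Asthma", "Treatments", "Inhalers",
    "Billing codes", "Claim forms",
]
RELATIONSHIPS = [
    ("Diseases", "Asthma"),            # hierarchical (BT/NT)
    ("Treatments", "Inhalers"),        # hierarchical (BT/NT)
    ("Asthma", "Inhalers"),            # associative (RT) across hierarchies
    ("Billing codes", "Claim forms"),  # related only to each other
]


def unrelated_terms(terms, relationships):
    """Return terms that have no relationship to any other term."""
    related = {term for pair in relationships for term in pair}
    return [term for term in terms if term not in related]


def find_clusters(terms, relationships):
    """Group terms into clusters that are linked, directly or indirectly."""
    neighbors = defaultdict(set)
    for a, b in relationships:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = set()
    clusters = []
    for term in terms:
        if term in seen:
            continue
        # Walk outward from this term to collect everything reachable from it.
        cluster = set()
        stack = [term]
        while stack:
            current = stack.pop()
            if current in cluster:
                continue
            cluster.add(current)
            stack.extend(neighbors[current] - cluster)
        seen |= cluster
        clusters.append(cluster)
    return clusters


if __name__ == "__main__":
    print(unrelated_terms(TERMS, RELATIONSHIPS))
    for cluster in find_clusters(TERMS, RELATIONSHIPS):
        print(sorted(cluster))
```

In this sample every term passes the "related to at least one other term" test, yet the terms still fall into two separate clusters. The relationship test alone cannot say whether the smaller cluster belongs to the same taxonomy, which is where criteria such as subject scope and purpose come in.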

Thus, at a minimum, a taxonomy comprises one or more hierarchies, but what about at a maximum? The question came up in my online course, in an assignment to create polyhierarchies, in which I suggest that the broader terms are from different hierarchies. A student asked: “Are the different hierarchies supposed to be within the same Taxonomy, or merely two different hierarchies from two different Taxonomies?” Generally, standard hierarchical and associative relationships do not transcend multiple taxonomies. An exception would be instance-type hierarchical relationships between topics in a taxonomy and named entities (proper nouns) maintained in a separate controlled vocabulary. Other types of relationships may link terms across multiple taxonomies, but they would likely be special-purpose relationships, such as equivalency mappings or translations.

Subject scope and purpose

In addition to considering the relationships between terms, another determining factor of what constitutes a single taxonomy is the subject area scope. One taxonomy is for one subject area, although that subject area could be very broad, especially if the taxonomy’s purpose is to support indexing of the topics in a daily national newspaper. More often, a taxonomy is more limited in scope, such as just technology topics or health topics.

Related to subject scope is how the taxonomy will be used in both indexing/tagging and retrieval. Generally, a single taxonomy is utilized in a single indexing/tagging method and with its own indexing policy. Policy, comprising both editorial style for terms and indexing rules, is often a defining factor for a single taxonomy. Different taxonomies will have different policies. For the end-user, a retrieval function is served by a single taxonomy, such as supporting a search function or providing a set of browse categories. If you want to enable multiple unrelated methods of retrieval (such as type-ahead for the search box, dynamic filtering facets, and a navigational browse), then you will need to create separate taxonomies for each. At a former employer I built taxonomies for SharePoint, and it turned out that I had to build three completely separate taxonomies: (1) the consistently labeled hierarchy of libraries and folders, (2) terms and their variants to support search with a third-party auto-classification tool, and (3) controlled vocabularies of terms for consistent tagging and metadata management of uploaded documents.

There is also the question of whether the content to be accessed by the taxonomy is together in one set or separated out for different purposes or different audiences. A taxonomy should be designed to suit its own content. This is the case in a project I am currently working on. There are two distinct sets of content available on a web site. The content sets have many similarities, so they could be browsed via the same hierarchical taxonomy, but they are for potentially different audiences. If the content sets were to remain separate, we would have created two separate taxonomies, each customized to best suit its own set of content. But the site owners decided that the two sets of content would be presented together, “blended,” to cross-sell content, in addition to standing on their own elsewhere on the site. Thus, a single taxonomy was the chosen option. The use of two content categories for terms within the taxonomy will enable the additional, separate content set option.

Conclusions

In sum, a single taxonomy:

Has standard relationships (BT/NT, RT, USE/UF) confined within it. Cross-taxonomy links, if any, are of non-standard types.
Has a defined, restricted subject scope.
Has its own indexing/tagging policy.
Could function in isolation, unlike a single facet (although may be supplemented by other controlled vocabularies/metadata).
Has its own implementation, function, and purpose (although taxonomies can be reused and repurposed).
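The first criterion can be checked mechanically. The following Python sketch flags standard relationships that cross a taxonomy boundary. The taxonomy names, terms, and the "translation" relationship type are assumptions made for this example, and the sketch deliberately ignores the exception for instance-type relationships to named entities kept in a separate controlled vocabulary.

```python
# Standard thesaurus relationship types, which should stay inside one taxonomy.
STANDARD_TYPES = {"BT", "NT", "RT", "USE", "UF"}

# Hypothetical assignment of each term to the taxonomy it belongs to.
TERM_TAXONOMY = {
    "Asthma": "Health topics",
    "Respiratory diseases": "Health topics",
    "Inhalers": "Products",
    "Asthme": "Health topics (French)",
}

# Hypothetical relationships as (source term, type, target term).
RELATIONSHIPS = [
    ("Asthma", "BT", "Respiratory diseases"),  # standard, same taxonomy
    ("Asthma", "RT", "Inhalers"),              # standard, crosses taxonomies
    ("Asthma", "translation", "Asthme"),       # special-purpose cross link
]


def find_violations(term_taxonomy, relationships):
    """Return standard relationships whose two terms sit in different taxonomies."""
    violations = []
    for source, rel_type, target in relationships:
        crosses = term_taxonomy[source] != term_taxonomy[target]
        if crosses and rel_type in STANDARD_TYPES:
            violations.append((source, rel_type, target))
    return violations


if __name__ == "__main__":
    for violation in find_violations(TERM_TAXONOMY, RELATIONSHIPS):
        print("Standard relationship crosses taxonomies:", violation)
```

A flagged relationship suggests one of two things: either the two sets of terms are really one taxonomy, or the link should be remodeled as a special-purpose relationship such as a mapping.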

It’s important for a taxonomist to determine what constitutes a single taxonomy versus multiple taxonomies, not so much for communicating with stakeholders, but rather to plan the initial design of the taxonomy within a taxonomy management tool. Taxonomy/thesaurus software allows for the designation of one or more taxonomies/thesauri that may be linked to each other or not. Multiple so-called files, thesauri, vocabularies, objects, classes, categories, etc. are the different ways in which the various software tools allow the taxonomist to control the divisions between and within taxonomies.

Monday, August 25, 2014

Independent Taxonomy Work

Are you an aspiring taxonomist looking for work? Because taxonomies tend to be project-based tasks, a lot of taxonomy work is freelance, contract, or consulting. I have written on this topic in my book, but that was over four years ago, and I have seen or experienced many taxonomy jobs since, so it’s time for an update.

Freelance taxonomy work

The freelance taxonomist works on portions of a taxonomy project but does not do everything required of a taxonomy project. I now see that the greatest opportunity for freelance taxonomy work is to freelance/subcontract to independent taxonomy consultants or small taxonomy consultancies. These consultants take on projects too big for one person and need to subcontract parts of them. Freelancing directly to an end-client, meanwhile, has become increasingly rare.

The best way, and really the only way, to find this kind of work is through serious networking with taxonomy consultants. Make sure, however, that any freelance contract does not preclude you from serving competing consultants.

The freelance taxonomist does most of the work remotely from home, but could be called on to visit a client site, depending on the nature of the project. Being open to travel and having client relationship skills thus helps, but is not always a requirement. Work could be researching and creating a new taxonomy from scratch, editing an existing taxonomy, mapping two different taxonomies, or developing auto-categorization rules for a taxonomy. In any case, the freelancer is not the sole person responsible for the taxonomy.

People suited for this kind of taxonomy work tend to be those with at least one past employment in taxonomy, metadata, or classification work and already comfortably set up as a freelancer, such as through editorial work, indexing, or consulting in the information field. Basic office software is usually all that is needed, but prior experience with a taxonomy management tool is helpful, and, if needed, remote access to the system can be arranged.

Contract taxonomy work

The contract taxonomist may do the same kind of work as the freelance taxonomist or may take on more responsibility in the taxonomy design strategy. The contractual relationship may be different, though, and instead of being treated as a freelance vendor the contract taxonomist may be treated as a temporary employee on a W-2 tax status.

When a company needs a taxonomist, it is often for a temporary assignment, so instead of posting a full-time position on the careers portion of their website, they turn to a staffing/recruiting firm for help. While it could be a general staffing firm they use for other assignments, experience has shown that finding taxonomists is difficult, so companies turn to specialized staffing firms in the areas of information technology/science. Sometimes a company has a large information technology project for which taxonomy development is only a piece, and they contract the entire project out to a large IT consulting firm. The IT consulting firm then seeks to fill the taxonomist slot by turning to a third-party recruiter. Recruiters from the staffing firms then search LinkedIn or Monster.com resumes or other sites. So, if you are looking for taxonomy work, make sure you have a strong LinkedIn profile and a resume on Monster.com open for all to see, with “taxonomist” prominently in the title. It’s also important to get on the list of staffing firms/recruiters specialized in library science and information technology.

The contract taxonomist is generally expected to be on the client site more than the freelance taxonomist, but with some negotiating, part of the work could probably be done from home. If the assignment is relatively short and the location does not have any qualified taxonomists, the client will pay for travel and lodging, sometimes for several weeks or even a couple of months. So, being open to short-term (1-3 month) relocation can be an advantage.

The nature of the work can be the same as for a freelance taxonomist, or it could involve more taxonomy design, planning and strategy, similar to that of a consultant. The rate is comparable to freelancing, as there is an intermediate party in both cases, and rates vary based on one’s experience and level of responsibility. A third level of intermediary could result in a lower rate. On the other hand, difficulty in finding a taxonomist for a specific project in a specific location allows the contracting taxonomist room to negotiate.

People suited for this kind of taxonomy work need to have prior taxonomy experience, but often experience with a specific software tool is also desired, whether a taxonomy management system, auto-classification system, content management system, or digital asset management system. Location in a major metropolitan area or willingness to travel is also important.

Independent consulting work

Being an independent taxonomy consultant means not only do you need to know how to conduct every step of taxonomy development yourself (research and requirements gathering, design, developing, testing, governance planning, etc.), but you also need to keep track of deadlines, deliverables, meeting schedules, and other basic project management tasks. There is no need for major project management skills, as long as you are not subcontracting to others. The client may already have a project manager on staff if taxonomy development is part of a larger project.

The other major task in consulting is creating a proposal, involving estimating the costs and time requirements, and then meeting those expectations. The proposal-writing task is often an obstacle to new aspiring consultants, and the best preparation is to either work in a consultancy or subcontract extensively to other consultants first to get exposed to the proposal requirements. Fees are typically charged per project and not per hour, so this can be tricky.

Obtaining independent consulting work involves a lot of self-marketing: a web site, a blog, LinkedIn and other social media, speaking at industry and professional association conferences, publishing articles, and general networking. Even networking with competing consultants is good, because sometimes they hear of projects they do not want and will refer the work. It’s also good for you to refer work to other consultants when it comes at the wrong time or is in the wrong location, so they might return the favor.

Saturday, July 12, 2014

A Professional Association for Taxonomists

I recently attended the SLA annual conference, which this year was in Vancouver, BC, June 8 – 11. This year marked the 5th anniversary of the professional association’s Taxonomy Division, its newest and fastest growing special interest group. The Taxonomy Division plans the programming of all taxonomy-related sessions for the conference, enough sessions so that attendees interested in only taxonomies can find a session of interest for most of the programmed time slots.

The Taxonomy Division comes closest to a professional organization for taxonomists and provides a good networking opportunity. The founding of this Taxonomy Division five years ago was the reason that I joined SLA, since I am not a librarian. (I was an accidental taxonomist after all.) SLA stands for “Special Libraries Association,” but the organization now favors the acronym over the name that it once stood for, and members are increasingly referred to as “information professionals” or “info-pros” instead of librarians. This label better fits taxonomists. In addition to the annual conference programs, the Taxonomy Division also has bi-monthly webinars, a mentoring program, and other resources for its members.

A selection of half-day pre-conference workshops, called “continuing education” sessions, are an important part of the SLA annual conference, and this year two of the five such workshops were on taxonomy topics (“Introduction to Taxonomies” and “Taxonomy Integration: Content Management, Navigation and Search”) and were organized by the Taxonomy Division, despite the fact that SLA has 25 Divisions. Regular session topics included taxonomies and metadata, eDiscovery, semantics, SharePoint, and from-scratch taxonomy creation (my presentation).

Not only does the Taxonomy Division organize taxonomy-related conference sessions, but it also organizes networking events at the annual conference, including an informal no-host dinner and a more formal networking event that is part of the conference program. Both division members and anyone else interested in taxonomies are welcome to attend these events. There is typically a mixture of experienced taxonomists, who likely already know each other from previous conferences, and those who are new to taxonomies and would like to learn more.

The SLA conference is a great place for taxonomists to network and learn from each other, but it is not necessarily the place to hear the latest trends in taxonomies. “Current Topics in Taxonomies” was the title of an informal roundtable session, but its discussions were more about sharing experiences. At the four roundtables, with on average seven people per table, some of the discussions involved experienced taxonomists giving advice to the less experienced for specific taxonomy implementation issues. The latest topics or trends are not necessarily the subject of regular sessions either, since the program is planned close to a year in advance. On the other hand, the field of taxonomies is not one that changes that much year to year. It is rather business and technology trends that change.

If you are new to taxonomies, then the SLA conference is a great place to learn a lot, through both the various sessions and the pre-conference continuing education workshops. If you are an experienced taxonomist, then SLA is a great way to network with other taxonomists and get inspired to speak at future conferences. I am looking forward to speaking at SLA in Boston on June 13-16, 2015. See you there, in my home city!

Tuesday, May 20, 2014

Creating Taxonomies from Scratch

When I first got into taxonomy work, my impression was that the trend was increasingly to revise, redesign, merge, and update existing taxonomies and less to create new ones. As taxonomies became more common in large organizations, it seemed obvious that there would be fewer needs for original taxonomy creation and more needs for taxonomy improvement. Taxonomies need to be updated when content changes, terminology changes, users change, indexing methods change, content/document management systems change, etc. Older taxonomies may also need to be repurposed, merged, or mapped. While there is no shortage of work on existing taxonomies, to my pleasant surprise I have found recently that there are many projects for new taxonomies as well.

Who needs taxonomies from scratch

In the field of taxonomy consulting, different taxonomy projects go to different consultancies. Large organizations with large taxonomy projects tend to hire taxonomy consultancies with multiple consultants to handle their projects, and it is the large organizations that by now tend to already have some taxonomies, even if they need a lot of work. Smaller organizations tend to hire independent consultant-contractors, and smaller organizations are more likely to be new to taxonomies and need to have one built from scratch. When I started out consulting, I was employed by or subcontracted to consultancies that served larger clients and worked more on taxonomy redesign projects, but then when I became an independent consultant I was contacted by and often served smaller clients, including startups, and thus became involved with more projects to build original taxonomies.

The types of projects that start-ups have for taxonomies are really quite interesting and they reflect a trend in innovative content-based products and services. In the past couple of years I have been contacted about creating taxonomies (some of which I did) for the following:

A subscription, web-based software with taxonomy for photographers to tag and classify their own images
A web-based market place for craftspeople and customers to meet to buy/sell customized objects
A website of quotes by famous and not-so-famous women with related content
A web database of yoga poses associated with a yoga studio
A web service of sites for artists to promote themselves
A loyalty marketing and data software platform for retailers
A mobile app that pulls content from LinkedIn to help professionals and job seekers make connections and obtain career advice

Yet it may not even be the size of the organization seeking taxonomies that has an impact on the demand for new taxonomies from scratch. It could also be that taxonomies are becoming better known across all industries, not just the fields of publishing, information services, and ecommerce. There is also no doubt that the growing amount of content in all areas necessitates better methods of organization and retrieval.

How taxonomies are built from scratch

Even taxonomists with considerable experience in editing taxonomies might not know where to begin if they were to create a taxonomy from scratch. There is some uncertainty over whether to take a predominantly top-down or bottom-up approach. I recommend a hybrid approach, with some initial top-level development, but most of the work on the specific taxonomy terms built from the bottom. If a navigational tree hierarchy is to be displayed to the users, then at least some initial top-down development is needed.

Developing the top terms (or facets, as the case may be) is based on best practices, understanding the users, adapting to any user interface constraints, and general experience as a taxonomist. Developing all the detailed terms within the taxonomy from below, however, is quite a different task that requires different taxonomist skills. Despite the fact that a spreadsheet, such as Excel, is inappropriate for managing taxonomies, I have found that even with taxonomy management software available, Excel is the most usable tool for the initial stage for gathering candidate terms along with information about their sources and/or for comparing terms side-by-side from multiple sources and at the same time putting them into a hierarchy. Finally, if a taxonomy is somewhat specialized and technical in nature and to be used by subject matter experts, it’s also possible to let the subject matter experts propose their own taxonomy and then review it with them and heavily revise it to bring it up to standards.

I will discuss this in more detail in my presentation, “Taxonomies: Everything you Need to Know to Start a Taxonomy from Scratch,” at the SLA conference in Vancouver, BC, on June 8.

Friday, April 11, 2014

Taxonomy Software Directories

It's difficult to find a list of taxonomy management software that is both comprehensive and up to date, yet not overwhelmed with related products and services. I define taxonomy management software as a tool to manually build and edit taxonomies, controlled vocabularies, and thesauri in accordance with industry standards. It should be the primary tool used by those who work as taxonomists. Lists of “taxonomy software,” however, may include more than just tools for taxonomy management, such as auto-classification/auto-categorization/auto-indexing software, search software that utilizes taxonomies, or mind-mapping and other graphical categorization tools, etc.

Taxonomy maintenance, unfortunately, is just too small a niche area for the major evaluators of software, whether consultancies, industry research firms, or trade publications, to find it worth their time to study. Companies that research the information technology market, such as Forrester Research, Gartner, International Data Corporation (IDC), and Real Story Group, won't get the commercial payoff from preparing studies of the taxonomy management software industry and products.

At the time I wrote my book, the most comprehensive directory of taxonomy software I found and referred my readers to was that of the British consultant Leonard Will, on the website of his consulting business Willpower Information, which listed 38 software packages, both commercial and freeware. Leonard Will had contacted each vendor and thus provided descriptive and contact information for each tool. The fact that this was a directory of "thesaurus" software and not “taxonomy” software is not an issue, and it was probably a good thing to include only software that meets thesaurus expectations. This directory was very comprehensive, including lesser-known free and open source software, which over time tended to become unsupported or even unavailable. With an interest in posterity, Leonard Will kept the unavailable software listed in his directory merely with a note to that effect. This may have been interesting for anyone thinking of developing their own thesaurus software, as they may be able to track down these other developers. For someone looking for a good commercial solution, however, there are far too many outdated products to weed through.

After Leonard Will retired, he decided he did not want to spend the time maintaining his directory, which he last updated in 2007, and in 2011 he offered the content of his directory to someone else, specifically contacting both Margie Hlava of Access Innovations and me. Then Margie and I had to figure out which one of us would take it, fully aware that the rich content on a website would help our own respective business websites, yet it would also take quite a bit of time and effort to set up and maintain. After a year of hoping to find time, I finally conceded that I would not and told Margie she could take it. The successor to the Willpower Thesaurus software directory, maintained by Margie’s employee Eric Ziecker, now resides at http://www.taxobank.org/content/thesauri-and-vocabulary-control-thesaurus-software

The core of TaxoBank's directory “Software for building and editing thesauri” at present is still essentially the same as the Willpower site, maintaining the original tabular content, style, colors, etc of that site, so visitors to the TaxoBank site may recognize it from Willpower. Posterity still seems to be valued, as all but one of the same 38 software packages are still there, although in two cases there is a note saying “The particular software referenced above is no longer available.” The notes section for many packages has been updated with additional content extracted from the vendor websites. More updating is still pending, though, as operating systems listed are dated, such as “Windows 95/98/NT/2000/XP.”

The main difference from the original Willpower site is the addition of 63 other products in a new section, separated by the note “Additional indexing, taxonomy, controlled vocabulary, thesaurus, classification, mapping and ontology software and services not referenced in Leonard Will's original listing follows below.” These additional products include many products not specific to “building and editing thesauri,” such as Apache Lucene, EMC Documentum, Oracle Endeca, Google site search, HP Autonomy, IBM Infosphere, and Microsoft SharePoint, along with one taxonomy consulting service. In my opinion, it might be better to have the related products and services on a separate web page to avoid possible confusion and to keep the list to a manageable length, as the total web page is currently 145 printed pages long. Despite these issues, I praise Margie and Eric for taking efforts to maintain this valuable resource.

As for a shorter list focused on current commercial software dedicated to supporting the manual creation and editing of thesauri and taxonomies, that may have to wait until the next edition (not yet started) of my book. For now, there are the products, as of early 2010, listed under Chapter 5 on the links page of The Accidental Taxonomist book website. To this list, I would now add at least PoolParty and TopBraid Enterprise Vocabulary Net, both introduced since the book went to press. Meanwhile, taxonomy consultants still remain a valuable source of advice on taxonomy/thesaurus management software.

Saturday, March 15, 2014

Indexing vs. Thesaurus Creation

The activities of back-of-the-book indexing, document/digital asset indexing, and thesaurus/taxonomy creation all require similar skills, but each has its own unique requirements. Indeed a typical career path toward becoming an accidental taxonomist is to first work as an indexer. You might think that the two kinds of indexing are similar to each other and thesaurus creation differs more, but having done all three, I can attest that back-of-the-book indexing and thesaurus/taxonomy creation are more similar to each other than the two kinds of indexing are.

What is indexing

In my previous blog post “Tagging vs. Indexing,” I explain that indexing involves designating descriptive terms or labels for what some content is about, and that these terms are organized into a browsable index. There are two kinds of indexing:

“Closed indexing,” or back-of-the-book indexing, where the index is created based solely on concepts that the indexer identifies within the text of a single monograph. The index is created for that one monograph and then is finished ("closed").
“Open indexing”, or what has been called “database indexing,” for the indexing of articles, documents, content items, or digital assets, whereby the indexer pulls index terms from a controlled vocabulary or thesaurus and assigns them to multiple individual documents or digital assets. The set of content grows over time, and the same terms in the index will point to increasingly more documents over time. It is called “open” indexing, because the task is ongoing. The thesaurus helps ensure consistent indexing over time.

Both kinds of indexing require the skill of analyzing content to determine what concepts are important and deserve indexing. The biggest difference between back-of-the-book indexing and database indexing is that book indexing requires that the indexer additionally invent the index terms and not merely pull them off of a thesaurus.

What is a thesaurus

I use the designation thesaurus here, because I mean the type of taxonomy that features the full set of relationship types between its terms, with each term designating an unambiguous concept (noun or noun phrase). The relationship types are:

Hierarchical (broader term/narrower term)
Equivalence (use/used for “nonpreferred terms” or “synonyms”)
Associative (related terms)

To best support manual indexing, the existence of all these different kinds of relationships helps direct the indexers to the most appropriate terms to describe the content they are indexing. The same thesaurus, or parts of it, may be displayed to the end-users to help guide them to find the most appropriate terms to describe the idea about which they are searching for information. The thesaurus thus not only standardizes the language for the concepts, but also provides a guiding structure.
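The three relationship types can be pictured as a small data structure. The following is only a minimal sketch in Python, not how any particular thesaurus management product stores its data: the class names, method names, and the sample terms are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """A preferred term and its three kinds of relationships (hypothetical model)."""
    label: str
    broader: set = field(default_factory=set)    # BT
    narrower: set = field(default_factory=set)   # NT
    related: set = field(default_factory=set)    # RT
    used_for: set = field(default_factory=set)   # UF (nonpreferred terms)


class Thesaurus:
    def __init__(self):
        self.terms = {}   # preferred label -> Term
        self.use = {}     # nonpreferred label -> preferred label (USE)

    def term(self, label):
        # Create the term on first mention, otherwise return the existing one.
        return self.terms.setdefault(label, Term(label))

    def add_hierarchy(self, broader, narrower):
        # Hierarchical relationships are reciprocal: BT on one side, NT on the other.
        self.term(broader).narrower.add(narrower)
        self.term(narrower).broader.add(broader)

    def add_equivalence(self, preferred, nonpreferred):
        # The nonpreferred term only points to the preferred term.
        self.term(preferred).used_for.add(nonpreferred)
        self.use[nonpreferred] = preferred

    def add_related(self, a, b):
        # Associative relationships are symmetric.
        self.term(a).related.add(b)
        self.term(b).related.add(a)

    def resolve(self, label):
        # Direct an indexer or searcher from a synonym to the preferred term.
        return self.use.get(label, label)


# Invented sample data, for illustration only.
demo = Thesaurus()
demo.add_hierarchy("Indexing", "Book indexing")
demo.add_equivalence("Book indexing", "Back-of-the-book indexing")
demo.add_related("Indexing", "Thesauri")

print(demo.resolve("Back-of-the-book indexing"))
print(sorted(demo.term("Indexing").narrower))
```

The `resolve` lookup is the part that standardizes the language, while the broader, narrower, and related sets supply the guiding structure an indexer or end-user can follow from one term to the next.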

How they are related

Open/database indexing and thesaurus creation are obviously related, because the thesaurus is used to support this kind of indexing. In an organization which is involved in such indexing, it is not unusual for former indexers to become editors of the thesaurus, since they are already very familiar with it and understand the needs of the indexer-users.

Closed/book indexing and thesaurus creation are related, because they both involve the development of original terms and relationships between them.

Thesaurus and book index similarities and differences

Thesauri and back-of-the-book indexes both have what can be called multiple points of entry. In a book index these can be either See cross-references or “double-posts,” whereby additional variant terms or synonyms are included in the index, and they all point to the same set of page numbers. In a thesaurus, these are the equivalence relationships, where nonpreferred terms or synonyms point to the preferred terms (Use/UF). The difference is that a thesaurus distinguishes between the preferred and nonpreferred terms, whereas double-posts in a book index are all of equal standing and none is “preferred.”

Thesauri and back-of-the-book indexes both have hierarchical structure among their terms. In a thesaurus there are narrower terms to a broader term (BT/NT). In an index, there are subentries indented under a main entry. However, these hierarchies are not identical. In a thesaurus, narrower terms must be generic types, instances or integral parts of the broader term. In a book index, subentries are any aspect of the main entry or merely another concept in combination. In fact, an indexer may choose to switch the main entry and subentry (the subentry becoming a main entry and the main entry becoming its subentry) with no problems. Don’t try to do that in a thesaurus or taxonomy!

Finally, thesauri and back-of-the-book indexes both have indications of related concepts. Thesauri have the associative relationship called Related Term (RT), and book indexes have See also cross-references. While in general these function the same, the rules for thesauri are stricter. If the “related” terms are really hierarchical, then they must have the hierarchical relationship instead. In a book index, it is acceptable to have a See also between two terms where one is actually broader in meaning than the other.
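The stricter thesaurus rule, that terms in a hierarchical relationship should not also be linked as related terms, is the kind of rule that can be checked mechanically. Here is a minimal sketch in Python, assuming the thesaurus is available as simple lists of (broader, narrower) and (term, related term) pairs. The function name and the sample terms are hypothetical.

```python
def misplaced_related_pairs(bt_nt_pairs, rt_pairs):
    """Return the RT pairs in which one term is an ancestor of the other."""
    narrower = {}
    for broader, narrow in bt_nt_pairs:
        narrower.setdefault(broader, set()).add(narrow)

    def is_ancestor(ancestor, term):
        # Walk down the hierarchy from `ancestor`, looking for `term`.
        stack = list(narrower.get(ancestor, ()))
        seen = set()
        while stack:
            current = stack.pop()
            if current == term:
                return True
            if current not in seen:
                seen.add(current)
                stack.extend(narrower.get(current, ()))
        return False

    return [(a, b) for a, b in rt_pairs
            if is_ancestor(a, b) or is_ancestor(b, a)]


# Invented sample data, for illustration only.
hierarchy = [("Indexing", "Book indexing"), ("Indexing", "Database indexing")]
related = [("Indexing", "Thesauri"), ("Book indexing", "Indexing")]

print(misplaced_related_pairs(hierarchy, related))
```

In this sample, the pair linking “Book indexing” and “Indexing” is flagged, because the two are already in a broader/narrower relationship. In a book index, the equivalent See also reference would be acceptable.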

I will be giving a presentation on this in greater detail at the annual conference of the American Society for Indexing, on April 30, 2015, in Seattle, WA.

Friday, February 28, 2014

Tagging vs. Indexing

I have blogged before on the difference between tags and categories, but recently someone asked me about the difference between tagging and indexing (the manual kind). It's not a simple answer.

One important way in which tagging and indexing differ is that tagging involves any kind of designation about a piece of content, what it is or what it is about, whereas indexing is restricted to descriptive labels for what content is about. Tagging can include content type, date, creator, source, audience, location, rights, keywords, etc., whereas indexing is for the subjects of the content. In this sense, tagging is sort of the modern word for cataloging or the assignment of metadata.

But what if we are concerned with just the descriptive labeling of content and not other metadata? That might be called tagging or it might be called indexing. In this case, the difference is more nuanced, and to a certain extent it is historical.

When I first entered this field in the early 1990s, the notion of "tagging" was not really known. Indexing, on the other hand, was a recognized activity. There are two kinds of indexing:
1) Closed indexing or back-of-the-book indexing, where the index is created based solely on concepts found in a single monograph, and the index is created for that one monograph and is then finished ("closed").
2) Open indexing, or what was then called database indexing, whereby index terms taken from a controlled vocabulary or thesaurus are assigned to multiple individual documents or digital assets, with the content growing over time and the same index terms pointing to increasingly more documents over time.

Then, with the rise of social media, "tagging" became popular in the form of assigning keywords and names to photos or blogposts or other digital content. Initially, tagging was clearly different from indexing, because:
1) Tagging did not use a controlled vocabulary (aka thesaurus or taxonomy)
2) Tagging was done by creators and consumers of content, and not trained indexers. "Indexer" is a profession; "tagger" is not.

Indexing is also different from tagging by what results from it. If we look to the origin of the word "index", it means to indicate or to point (as with your index finger). So, the result of indexing is an "index" that the user can browse to locate referenced (if in print) or linked (if electronic) content. A thesaurus/taxonomy and an index (a structured list of the terms that had been used for indexing) could be essentially the same thing. Sometimes not the entire index is browsable but rather just a section via a type-ahead scroll-box feature. Tagging, on the other hand, with the lack of controlled vocabulary, does not result in any created work, just a folksonomy, which, with its multiple terms with the same or overlapping meaning, is not suitable for browsing. If displayed, tagging terms are shown by popularity instead, such as in a tag cloud, which is interesting, but not an accurate method for content findability and retrieval.

In time, enterprise software adopted social media methods, user interfaces, and features. As a consequence, tagging became more formalized as an employee task, and folksonomies got edited into controlled vocabularies or taxonomies, if not at least becoming sources for taxonomy terms. So, now tagging may be done with or without a controlled vocabulary, and both consumers and professional editors/content managers (if not “taggers”) do tagging.

"Tags" and "tagging" are now also designated features content management and digital asset management software, and content editors "tag" with terms from a controlled list. As such, the distinctions between "indexing" and "tagging" have become blurred, and what this activity is called may depend on what the software vendor, the industry (publishing may prefer to call it indexing, whereas ecommerce calls it tagging), and the corporate culture prefers to call it.

The designation of “indexing”, as open index creation, is also becoming less common as the full display of indexes has become less common. Search boxes (even if what the user enters into it is matched against a thesaurus) have often replaced long alphabetized lists of subject entries and subentries. We continue to find indexes at the back of books, but online for electronic content the displayed browsable index is less common than it used to be.

Tuesday, January 28, 2014

Taxonomies vs. Thesauri

Two taxonomy consulting projects I worked on last year seemed to lend themselves more to the development of a thesaurus than a set of hierarchical taxonomies. But clients usually ask for a taxonomy and not a thesaurus. Perhaps we need to ask what is in mind with the notion of a “taxonomy.” When someone wants a “taxonomy” developed, do they want a structured kind of controlled vocabulary to support consistent indexing/tagging and retrieval (the broad meaning of taxonomy), or do they specifically want a browse display of topics in a top-down navigation structure in a user interface (the narrower meaning of taxonomy)? The broad meaning of “taxonomy” includes thesauri, too. So, if you are looking for the former, maybe it is actually a thesaurus that you want.

In its broad meaning, “taxonomy” often refers to any of various kinds of controlled vocabularies: synonym rings to support search without being displayed (which a search vendor might call a “thesaurus”), hierarchical topic trees without synonyms, faceted taxonomies, and finally the more complex taxonomies that include all of hierarchical relationships, associative relationships, and synonyms. The latter is what may be called a thesaurus. In such a case, I would be asked for “a taxonomy with hierarchical relationships, associative relationships, and synonyms, and possibly term notes or definitions,” rather than “a thesaurus.” The word “taxonomy” has become the standard term of reference in the business, outside library applications.

The usual distinction between a strictly defined taxonomy (its narrower meaning) and a thesaurus is that a thesaurus has all the features of a taxonomy plus the addition of associative relationships. This is largely true, and I will add that a thesaurus also must have equivalence relationships (between a “preferred term” and its synonyms or nonpreferred terms), whereas synonyms/nonpreferred terms are merely optional in taxonomies, depending on the taxonomy size. Thesauri should also be built according to the standards of ANSI/NISO Z39.19 or ISO 25964, whereas taxonomies can be a little more flexible in their adherence to standards.

The extent of hierarchies

However, in my experience, I would say there is another very important distinction between a narrowly defined taxonomy and a thesaurus. A taxonomy has hierarchical relationships that bring all of the terms/concepts into a limited number (one or more) of hierarchical tree structures or facets. (We can consider a facet as a simple two-level hierarchy comprising the facet label and its narrower facet values.) Think of a taxonomy as supporting classification, categorization, and concept organization, with a basis in the Linnaean taxonomy of animals and plants that is the most well-known meaning of “taxonomy.” The user typically enters a taxonomy from the top down.

In a thesaurus, by contrast, it is not necessary to structure all concepts (terms) into a limited number of top level hierarchies. A thesaurus focuses on terms and their immediate relationships with other terms. Hierarchical relationships between terms may result in extended hierarchies of various degrees, whether just two terms or more, but do not extend the depth of the entire taxonomy. Thus, numerous isolated hierarchies could exist. What this means is that a top down hierarchical display of a thesaurus would not comprise simply a few equally sized hierarchies, but rather numerous hierarchies of varied sizes and specificities. “Top terms” are not all of the same equal weight, importance, or generalness. Therefore, while any thesaurus could be displayed hierarchically, it might not be desirable to display it hierarchically. Instead, the user might browse the terms of the thesaurus alphabetically to select a term. A selected term will then indicate that term’s hierarchical relationships.
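To see how uneven the “top terms” of a thesaurus can be, one can list every term that has no broader term and count the terms beneath it. The following is a minimal sketch in Python, assuming the thesaurus is available as a set of terms plus a list of (broader, narrower) pairs. The function name and the sample terms are invented for illustration.

```python
def top_terms_with_sizes(terms, bt_nt_pairs):
    """Map each top term (no broader term) to the number of terms below it."""
    narrower = {}
    has_broader = set()
    for broader, narrow in bt_nt_pairs:
        narrower.setdefault(broader, set()).add(narrow)
        has_broader.add(narrow)

    def descendants(term, seen):
        # Collect every term reachable through narrower-term links.
        for child in narrower.get(term, ()):
            if child not in seen:
                seen.add(child)
                descendants(child, seen)
        return seen

    return {term: len(descendants(term, set()))
            for term in sorted(terms) if term not in has_broader}


# Invented sample data, for illustration only.
sample_terms = {"Animals", "Mammals", "Dogs", "Cats",
                "Indexing", "Book indexing", "Taxonomy software"}
sample_pairs = [("Animals", "Mammals"), ("Mammals", "Dogs"),
                ("Mammals", "Cats"), ("Indexing", "Book indexing")]

for top_term, size in top_terms_with_sizes(sample_terms, sample_pairs).items():
    print(top_term, size)
```

In this sample, one top term sits over a three-term hierarchy, another over a single narrower term, and a third has no hierarchy at all. This is the pattern of isolated, differently sized hierarchies described above, and it is why an alphabetical browse may serve users better than a top-down tree.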

The idea of navigating without high-level hierarchies through which to drill down may seem odd, especially since hierarchy trees have become so common in website navigation. But there is no single right way to navigate. “Navigate” and “browse” are not synonymous with “drill down” through a hierarchy. Browsing could start out alphabetically and then jump from one term to the next via both hierarchical and associative relationships.

Blurred distinctions

You may have a hierarchical taxonomy with the additional thesaurus features of associative relationships, synonyms, scope notes for terms, etc., and then you can call it “a taxonomy with thesaurus features.” On the other hand, you may have a thesaurus that does in fact have an over-arching hierarchical structure, and you may call it “a thesaurus with a taxonomy structure.” Both of these kinds of “taxonomies” and “thesauri” would thus have essentially the same structure.

An organization might start calling its taxonomy a “thesaurus” if it chose to follow the terminology of its selected thesaurus software vendor. The following vendors, for example, call their products thesaurus management software and the results created “thesauri”: Synaptica, Data Harmony, PoolParty, and MultiTes. Vendors have developed software that is full-featured, so not only can the software be used to create simple hierarchical taxonomies, but it also supports the full range of relationship types (hierarchical, associative, and equivalence) along with term notes, term attributes, and other maintenance tracking features. Thus, it is thesaurus management software that may be used for either thesauri or taxonomies or anything in between, as well as other simpler types of controlled vocabularies.

Choosing the approach

The choice between adopting a hierarchical taxonomy vs. a thesaurus depends on the nature of the content and the users.
A hierarchical taxonomy would be fine if:
- The content is of a homogeneous type that can be characterized by the same set of facets.
- The nature of the topics for the content falls neatly into a hierarchy.
- Users are not experts in the subjects and need to be guided by hierarchies.
A thesaurus would be more suitable if:
- Multiple, overlapping subject areas or domains are covered with diverse content.
- The terms need to be highly specific for detailed indexing.
- The topics do not lend themselves to neat hierarchies.
- Users are knowledgeable about the subject and will likely look for specific terms.