Wednesday, July 1, 2015

Taxonomies for Indexing Images

It’s becoming more common to index images with taxonomy terms, rather than indexing only text documents or merely keyword-tagging images. A taxonomy for the subject-indexing of images need not be significantly different from a taxonomy for indexing textual documents, but the other metadata differs, and the indexing activity is also quite different.

A dedicated taxonomy for images might be needed for various reasons:
1.    The organization does no subject-indexing of text documents.
2.    The same organization uses different software systems to manage images and to manage text documents.
3.    The organization's text documents are large and thus indexed or cataloged at a broader level.

1.    No text indexing
Some organizations have a large image collection, and that is what they focus their indexing efforts on. They thus design or adapt a taxonomy specific to their image collection. They likely do not have any taxonomy for indexing text. They either don’t see a need for text document search and retrieval, or, if they do, they simply rely on the search engine, since, after all, search engines can search text, unlike images.

2.    Different systems
Large image collections are increasingly managed in dedicated digital asset management systems, which are designed to support the various metadata associated with images and other nontext media files. Text documents, on the other hand, may be managed in document management systems, records management systems, or collaboration systems such as SharePoint. Each of these kinds of systems supports some form of controlled vocabulary for tagging content. But if the images are in one system and the text documents are in another, different controlled vocabularies are likely to be developed. Of course, a generic “content management system” may be used for both images and text documents, but many organizations don’t manage all their content in a single system.

3.    Different levels of indexing detail
The classic example of different levels of detail is the Library of Congress, which had developed its Subject Headings for the descriptive cataloging of library materials. These are generally monographs, such as books, or video recordings of films, or sound recordings of music collections. While the subjects of these works might be quite specific, they are often not as specific as an individual graphic material. (An entire book may have numerous specific images.) But over the years, individual images also became part of its collection, and the LC Subject Headings were not specific enough, so the Library of Congress developed the Thesaurus for Graphic Materials, which is freely available. The fact that the Thesaurus for Graphic Materials exists does not mean that a dedicated thesaurus for images is always needed, only that it was needed in the context of the Library of Congress collections and the shortcomings of the Library of Congress Subject Headings for indexing images.

If you already have a detailed taxonomy for documents, it certainly can be used for images as well. Some terms, such as those for abstract concepts (for example, “Beliefs”), will simply not be needed in the image indexing, whereas new terms might need to be added (such as the name of a specific type of flower).

There is definitely unique metadata for images, of which subjects for indexing are just a part. Examples of other possible image metadata include Creator/photographer, Location shown, Location of creation (camera location), Collection name, Time or part of day (especially if outdoors), Date taken (in contrast to the date the image was digitized or edited), Number of people depicted, Copyright, Intended purpose, etc. The Thesaurus for Graphic Materials has a separate “genre” facet that is very specific to types of graphical works (such as terms for Abstract paintings, Family trees, HVAC drawings, and Magazine covers). Image metadata standards include the IPTC (International Press Telecommunications Council) Photo Metadata standard for photojournalism. Different metadata may be needed for different kinds of images (news, commercial/advertising, art, etc.).
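As an illustration, the metadata fields listed above could be gathered into a single record per image. The following is only a minimal Python sketch, not an implementation of the IPTC standard or of any particular digital asset management system. The class name `ImageRecord`, the field names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ImageRecord:
    """Hypothetical metadata record for one image (field names are illustrative)."""
    filename: str
    subjects: List[str] = field(default_factory=list)  # taxonomy terms for what is depicted
    genre: Optional[str] = None             # type of graphical work, kept apart from subjects
    creator: Optional[str] = None           # creator/photographer
    location_shown: Optional[str] = None
    location_created: Optional[str] = None  # camera location
    collection: Optional[str] = None
    time_of_day: Optional[str] = None
    date_taken: Optional[str] = None        # not the date digitized or edited
    people_count: Optional[int] = None
    copyright_notice: Optional[str] = None
    intended_purpose: Optional[str] = None

    def missing_fields(self) -> List[str]:
        """Return the names of fields that have not been filled in yet."""
        return [name for name, value in vars(self).items() if value in (None, [])]


# Made-up sample values, for illustration only.
record = ImageRecord(
    filename="harbor_sunrise.jpg",
    subjects=["Harbors", "Sailboats"],
    genre="Photographs",
    time_of_day="Morning",
    people_count=0,
)
print(record.missing_fields())
```

Keeping subjects in their own list, separate from genre and the other fields, mirrors the point that subject terms are only one part of image metadata.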

Indexing images is different from indexing text documents. First of all, it’s mostly manual, because automated image detection is very limited (although it may be able to detect people’s faces). It’s also more subjective to decide what is of key importance in an image than in a document. An indexer may also tend to index not what is actually depicted but what is implied, which often, but not always, should be avoided.

I recently attended a conference presentation on this subject, “Get the Picture: Use Your Taxonomy to Classify Images” at the SLA conference in Boston earlier this month. The presenter, Ann Poole from Corbis, mentioned various challenges of image indexing, including over-indexing by photographer-submitters, indexing for emotions depicted or implied, and indexing for the backstory of an image in a known place.

Thursday, June 4, 2015

Taxonomist Trends



Last month I conducted an online survey of 150 taxonomists (described in my last blog post). Although the results will be used in another publication, it is interesting to note at this time a few comparisons between the results of this survey and those of a similar one I had conducted in late 2008 for my book, The Accidental Taxonomist. While I added further questions this time, some of the questions stayed the same for comparison.

We would expect that, over time, more taxonomists would have been doing the work for longer. While this is the case for those in the field for 8-15 years, the survey results, surprisingly, did not indicate this for those involved the longest, over 15 years. Those who have done taxonomy work for 15 years or more were 26.2% in 2008 but only 17.6% now. The raw numbers for over 15 years, however, did in fact increase. So, the survey percentage indicates that there are proportionally more people who have been involved in taxonomies for an intermediate period of time. At the most beginner level, the numbers and percentage of respondents with less than a year of experience in taxonomies declined, from 9.2% to 3.4%. Those with 1-4 years of experience are about the same, and those with 4-15 years of experience increased from 32.4% to 41.2%. So, these numbers could indicate a maturing of the taxonomist profession, but not a graying of the field.
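The way a percentage can fall while the raw number rises is simple arithmetic: the total number of respondents grew faster than the group in question. The Python sketch below illustrates this with hypothetical counts. They are not the actual survey data, and the names `share`, `earlier`, and `later` are made up for this example.

```python
def share(count: int, total: int) -> float:
    """Percentage of respondents in a group, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Hypothetical counts chosen only to illustrate the effect (NOT the survey's data).
earlier = {"over_15_years": 20, "total": 80}
later = {"over_15_years": 27, "total": 150}

earlier_share = share(earlier["over_15_years"], earlier["total"])
later_share = share(later["over_15_years"], later["total"])

# The group grew in raw numbers, yet its share of all respondents shrank.
print(earlier["over_15_years"], "->", later["over_15_years"])
print(earlier_share, "->", later_share)
```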

The taxonomist work situation has not changed much with respect to taxonomy being a primary vs. secondary job responsibility or with respect to freelance vs. full-time employment. There was a noticeable difference, though, among those who are freelancers (totaling 17% before and 16% now): more of them are now doing freelance taxonomy work only “occasionally” (8% now compared with 4.7% in 2008), and not as many are doing it “often” (8% now compared with 12.5%). The fact that there is work for those who want to do freelance taxonomy work only occasionally, whether on top of another job or in combination with other kinds of freelance work, is encouraging for those individuals who want to gradually break into taxonomy work.

Regarding professional and educational background, the leading degree and prior profession of taxonomists today remains that of librarian, and the percentage has, in fact, increased slightly. Meanwhile, those with a technical background have proportionally decreased. The percentage with an MLS/MLIS degree increased from 48.4% to 54.4% of respondents, and among the options for prior work experience, “librarian” increased from 27.7% to 28.3%. Those with an M.S. or M. Eng. degree decreased from 14.1% to 8.7%. Those with a background in Software/IT decreased from 12.3% to 8.3%, and those with a background in database design, development, or administration decreased from 6.2% to 1.5%. While the taxonomy field can certainly benefit from those with a technical background, it is not a necessary skill, and we might assume that the decline in IT people doing taxonomy work since 2008 is due to an improvement in the economy, whereupon more of those people have found work in IT again.

In other areas, knowledge management, content management, and content strategy are backgrounds that have become more common, whereas “document management” has decreased. This is likely due to the fact that “content” of various formats is becoming more common than mere “documents.” Digital asset management was not even presented as an option, but three respondents wrote it in the blank under “Other.”

Despite the preponderance of MLS/MLIS graduates, still only a minority of respondents had training in taxonomies/classification in college courses, and only a few percentage points more than before, merely reflecting that there were more MLS/MLIS graduates. Those having taken continuing education courses or workshops on taxonomies increased from 13.8% to 20.1%, but there are now more such courses that did not exist before (including mine). On-the-job training remains the primary means of learning how to create taxonomies. There has been a slight shift toward “formal” on-the-job training over “informal” learning and experience, with the percentage with formal on-the-job training having increased from 21.5% to 28.9%. This particular survey question permitted multiple responses; the leading response, informal on-the-job learning, was 71.1%, but this was the only response option with a decrease (down from 83.1%). This is a good sign that taxonomists seem to be learning the skill through more varied means than the dominant on-the-job experience.
 

Monday, May 11, 2015

Taxonomist Survey

I had created a survey of taxonomists to gather some information for writing my book, The Accidental Taxonomist. It was mainly for Chapter 2: Who Are Taxonomists?  With the word “taxonomist” in the title, I had to write something about taxonomists, and not just about taxonomies, and this was the best way I could get more information than some anecdotes from colleagues.

But that was in late 2008, 6½ years ago. Has there been change in the industry since? In most fields, 6-7 years is not long at all, but in the field of taxonomies, there could be changes. First of all, there have been significant changes in the economy over that particular period (recession and partial recovery), and, at least for internal, enterprise taxonomies, the role of the taxonomist could be considered something expendable in tight economic times. (I know, as I was laid off in 2008 and again in 2010.) More significantly, the field of information science is evolving very rapidly. So, I released a new survey this month.

My previous survey had 9 multiple-choice questions and one open response. I chose to keep those questions with no changes or only minor wording changes, in order to compare the changes over time. I also decided to add a few more questions. To help me come up with the questions, I asked for input from the audience of a presentation I gave last month ("Taxonomy Displays: Bridging UX & Taxonomy Design" at the Content Strategy Seattle Meetup). Suggestions from that group included questions on the size of taxonomies, job titles, and taxonomy work pain points. The current survey now has 14 multiple-choice questions, one very short answer (job title), and three open responses, although all questions are optional, and it is permitted to skip questions.

Where to find taxonomists to survey


In 2008, I could think of only one logical channel to find taxonomists, the Yahoo group called Taxonomy Community of Practice. But it is no longer the only group and no longer the most active. The Taxonomy Community of Practice Yahoo group averaged only 5 messages per month in the last 6 months. In contrast, in the 6 months around the time of my last survey, this group averaged 39 messages per month. This is most likely because the LinkedIn group of the same name, Taxonomy Community of Practice, which was created in September 2007, has taken over most of the taxonomy discussions. Furthermore, there are additional LinkedIn groups, such as “Controlled Vocabularies” and “Thesaurus Professionals.” The American Society for Indexing started a Taxonomies & Controlled Vocabularies Special Interest Group in late 2007, and SLA (Special Libraries Association) started a Taxonomy Division in 2009, both of which have member discussion lists.

I have announced the current survey in all of these groups and more. However, I do not expect to reach significantly more taxonomists than before. That’s because, whereas the single Yahoo group back in 2008 tended to be subscribed to by email (individual or digest), the proliferation of groups and lists on similar or overlapping subjects has led subscribers/members to opt out of direct emails. Additionally, email software, such as Gmail, can filter messages from lists into a category/tab that users may choose to overlook. So, my email announcements of the survey to groups may go unnoticed by many group members. It would be tempting to individually contact everyone I know personally who is involved in taxonomy work, but that could introduce a personal bias that would skew the pool of respondents.

Taxonomist tendencies


Enough people have already responded to the current survey that I can safely say that the largest number do taxonomy work as their primary responsibility, as with the previous survey, and that, like before, the majority are employees, rather than contractors, freelancers, or independent consultants. The most common educational or professional background (although not the majority) is library/information science. What is striking, though, is that despite the fact that 48% of respondents in 2008 had an MLS/MLIS degree (and, judging from the early survey returns, the percentage is now even slightly higher), only a small percentage of taxonomists learned taxonomy skills through formal educational institution coursework. Self-teaching through reading, on-the-job experience, on-the-job training, and conference workshops or seminars are each more prevalent methods of learning taxonomies than college courses. Additional, more specific comparisons will be the subject of a future blog post.






Saturday, April 25, 2015

Trends in Hierarchical Taxonomy Displays


Taxonomies connect users to content. So, how a taxonomy is displayed to users is very important in its effectiveness. This is a topic about which I gave a conference presentation back in 2011 and will present again next week. As I update my previous presentation, looking at some of the same public websites with taxonomies, I have observed some changes that might be considered as trends.

While faceted taxonomies (used to filter/refine/limit results by certain criteria with choices of taxonomy terms) have become more common on ecommerce or other database websites, they are not suitable in all circumstances, and when a taxonomy has a large number of topical terms, a hierarchical arrangement of those topics might be better.
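To make the filtering idea concrete, here is a minimal Python sketch of faceted refinement. The item records, the facet names, and the helper `filter_by_facets` are hypothetical and only illustrate the principle: each selected facet value narrows the result set further.

```python
from typing import Dict, List

# Hypothetical catalog items; facet names and values are made up for illustration.
ITEMS: List[Dict[str, str]] = [
    {"name": "Trail shoe", "category": "Footwear", "color": "Blue"},
    {"name": "Rain jacket", "category": "Outerwear", "color": "Blue"},
    {"name": "Hiking boot", "category": "Footwear", "color": "Brown"},
]


def filter_by_facets(items: List[Dict[str, str]], selected: Dict[str, str]) -> List[Dict[str, str]]:
    """Keep only the items that match every selected facet value."""
    return [
        item
        for item in items
        if all(item.get(facet) == value for facet, value in selected.items())
    ]


# Selecting one facet value limits the results; adding a second refines them further.
footwear = filter_by_facets(ITEMS, {"category": "Footwear"})
blue_footwear = filter_by_facets(ITEMS, {"category": "Footwear", "color": "Blue"})
print([item["name"] for item in footwear])
print([item["name"] for item in blue_footwear])
```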

Displayed full hierarchical taxonomies, however, are more difficult to find. They are not as often the default. Some have disappeared entirely, such as the Yahoo directory, which was discontinued in December 2014 after 20 years. (Admittedly, trying to classify as many websites as possible into a hierarchy, as the web keeps growing, is a never-ending task.) In other cases, the search box is more prominent on the page, and the link to browse categories needs to be hunted for.

In the past, I had observed two main kinds of hierarchical displays: one-level-per-page and expandable hierarchies with plus signs. The first has evolved, the second has become rare, and a third method has emerged.

One level of taxonomy hierarchy per page was the design of the former Yahoo directory and, early on, was the style followed on other sites. An example that closely follows the Yahoo Directory is the dmoz/Open Directory Project. A list of category labels or topics at each level takes up the entire screen/page display, without the display of other content. Displaying additional content on every page has become important, so hierarchical taxonomy categories now tend to be confined to more compact lists to free up space on the web page for content. This works for some taxonomies, but not all. Meanwhile, a list of terms at the same level that takes up the entire page is a style that is rarely followed anymore.

Expandable hierarchy “trees,” typically with plus signs next to topics to expand a topic’s subcategories, have become quite rare, at least on public websites. An example is the USA Today topics. This hierarchical taxonomy design had been developed based on the recognizable desktop file folder structure, such as in Windows. In the meantime, users have become familiar with different representations of topic hierarchies on the web, so mimicking expandable file menus is no longer the only way to engage users. Expandable topic hierarchies are not as easy to update and change on websites, and the web page can take a long time to load. Expandable hierarchies allow users to have more than one hierarchical level expanded at once, which facilitates exploring the taxonomy. As much as we taxonomists might enjoy browsing a taxonomy, the goal is to get users to content rather than have them spend time exploring the taxonomy.

A third method of displaying multiple levels of a hierarchical taxonomy is through “fly-out” subcategory lists. Examples include Lynda.com (under "Browse the Library") and Books & Authors. I had not noticed this method before, so it seems to be a new trend. Fly-out lists are similar to submenus in website navigation, but rather than serving website navigation, the topics are linked to indexed content items, which are listed in a result set for each subtopic. Fly-out subcategories still allow users to see the parent category list, in case they want to back out to it, as in an expandable tree hierarchy. But unlike an expandable tree hierarchy, you cannot have multiple parent categories expanded at the same time, which is not that important anyway. The fly-out subcategory style is thus a positive trend in hierarchical taxonomy displays.
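The difference between these display styles can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `TAXONOMY` data and the function names `children` and `render_tree` are hypothetical and do not come from any of the websites mentioned. A one-level-per-page or fly-out display only ever needs the children of a single term, while an expandable tree has to track a whole set of expanded terms.

```python
from typing import Dict, List, Set

# Hypothetical taxonomy: each term maps to its narrower terms.
TAXONOMY: Dict[str, List[str]] = {
    "Topics": ["Arts", "Science"],
    "Arts": ["Music", "Painting"],
    "Science": ["Biology", "Physics"],
}


def children(term: str) -> List[str]:
    """One level only: what a one-level-per-page or fly-out list shows for a term."""
    return TAXONOMY.get(term, [])


def render_tree(term: str, expanded: Set[str], depth: int = 0) -> List[str]:
    """Expandable tree: any number of branches may be open at the same time."""
    kids = children(term)
    if not kids:
        marker = " "   # leaf term, nothing to expand
    elif term in expanded:
        marker = "-"   # branch is open
    else:
        marker = "+"   # branch is closed and can be expanded
    lines = ["  " * depth + marker + " " + term]
    if term in expanded:
        for kid in kids:
            lines.extend(render_tree(kid, expanded, depth + 1))
    return lines


# Fly-out style: hovering over "Arts" shows just its subcategories.
print(children("Arts"))

# Expandable style: "Topics" and "Arts" are open, "Science" stays collapsed.
for line in render_tree("Topics", {"Topics", "Arts"}):
    print(line)
```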

Tuesday, March 31, 2015

Varied Taxonomy Uses and Taxonomist Functions

Someone asked me recently if taxonomies were applicable to some marketing analytics he was pondering. I was not sure without further discussion. The interesting thing about taxonomies is that they have such varied uses. Perhaps because there is no single dominant use of taxonomies, taxonomists have to go into long explanations of how taxonomies are beneficial. There is no neat list of taxonomy uses. Following are some broad categories of taxonomy usage, all but the last of which I have worked on.
  • A key component of a product of published information for retrieval (such as in a news, periodical article, or reference database)
  • A (partial) solution to an information management problem of an organization
  • A method to connect customers to products or services, typically on a website
  • A method to connect users to information on a public information-sharing or networking website (monetized by advertising or other means)
  • Descriptive metadata in a document management, content management, records management, or digital asset management system, to support tagging and subsequently support retrieval of internal content
  • A method to model data, information, or knowledge to serve an organization’s knowledge management strategy

Sometimes more than one of these goals may be pursued simultaneously by the same owner of the taxonomy. This is when it gets complicated, and it needs to be carefully considered whether a single taxonomy or separate taxonomies would be best.

Building up a clear list of the applications of taxonomies, not something in marketing-speak, and more specific than the areas listed above, would be a worthwhile service of the websites of taxonomy consultants and taxonomist-related professional organizations.

Taxonomy consultants need to ask from the start whether the taxonomy project they are hired to work on will be primarily for internal or external access, and not make assumptions. It could be for both, but usually one purpose is seen as primary. Once, in my earlier days of consulting, I made an assumption, and my proposal for an “enterprise taxonomy” was even accepted by the client, before I realized that their taxonomy would be primarily for public web content.

Varied taxonomist job functional areas


Just as taxonomies may have varied uses, so the functions of a taxonomist are varied. One interesting aspect of the taxonomy field, and taxonomy consulting in particular, is that it transcends both internal (employee-facing) and external (customer- or public-facing) functions of an organization. I have personally found this a very interesting aspect of the profession.

Taxonomists who are employed may work in various departments of an organization. As such, taxonomists could find themselves part of either internally functioning groups (knowledge management, content management, information technology) or externally oriented groups (marketing and related web services). I have worked in the organizational departments of editorial, software product development, information technology (as it was overseeing the SharePoint implementation), and consulting services, all while in the role of a taxonomist. Additionally, I have seen taxonomist job postings in departments of marketing, ecommerce, communications, libraries, data governance, financial service operations, information management and technology, and the Information Management and Tech Writing department.

In any organization where one or more taxonomists are employed within a specific department, there are likely taxonomy-related needs in other departments. It would be beneficial to the organization if the taxonomists’ skills could be applied to special taxonomy-related projects outside their home department, such as across both marketing and information management.