Friday, February 28, 2014

Tagging vs. Indexing

I have blogged before on the difference between tags and categories, but recently someone asked me about the difference between tagging and indexing (the manual kind). It's not a simple answer.

One important way in which tagging and indexing differ is that tagging involves any kind of designation about a piece of content, what it is or what it is about, whereas indexing is restricted to descriptive labels for what content is about. Tagging can include content type, date, creator, source, audience, location, rights, keywords, etc., whereas indexing is for the subjects of the content.  In this sense, tagging is sort of the modern word for cataloging or the assignment of metadata.

But what if we are concerned with just the descriptive labeling of content and not other metadata? That might be called tagging or it might be called indexing. In this case, the difference is more nuanced, and to a certain extent it is historical.

When I first entered this field in the early 1990s, the notion of "tagging" was not really known. Indexing, on the other hand, was a recognized activity. There are two kinds of indexing:
1) Closed indexing or back-of-the-book indexing, where the index is created based solely on concepts found in a single monograph, and the index is created for that one monograph and is then finished ("closed").
2) Open indexing, or what was then called database indexing, whereby index terms taken from a controlled vocabulary or thesaurus are assigned to multiple individual documents or digital assets. The content keeps growing, so the same index terms point to more and more documents over time.
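
To make the idea of open indexing concrete, here is a minimal Python sketch. It assumes a tiny made-up controlled vocabulary and made-up document IDs; the class and method names (OpenIndex, index_document, lookup) are hypothetical and do not come from any particular indexing product. It only illustrates that terms must come from the vocabulary and that the same term points to more documents as content is added.

```python
from collections import defaultdict

# Hypothetical controlled vocabulary: only these terms may be assigned.
CONTROLLED_VOCABULARY = {"Taxonomies", "Thesauri", "Metadata"}


class OpenIndex:
    """Maps each controlled term to the documents indexed with it."""

    def __init__(self, vocabulary):
        self.vocabulary = set(vocabulary)
        self.postings = defaultdict(list)

    def index_document(self, doc_id, terms):
        # Reject the whole assignment if any term is outside the vocabulary.
        for term in terms:
            if term not in self.vocabulary:
                raise ValueError(f"{term!r} is not in the controlled vocabulary")
        # Record the document under each term, without duplicates.
        for term in terms:
            if doc_id not in self.postings[term]:
                self.postings[term].append(doc_id)

    def lookup(self, term):
        # Return a copy so callers cannot alter the index by accident.
        return list(self.postings.get(term, []))


index = OpenIndex(CONTROLLED_VOCABULARY)
index.index_document("doc-001", ["Taxonomies", "Metadata"])
index.index_document("doc-002", ["Taxonomies"])  # the index stays "open"
print(index.lookup("Taxonomies"))
```

In a closed index, by contrast, the list of terms and locators is finished once the single work has been indexed.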

Then, with the rise of social media, "tagging" became popular in the form of assigning keywords and names to photos, blog posts, or other digital content. Initially, tagging was clearly different from indexing, because:
1) Tagging did not use a controlled vocabulary (aka thesaurus or taxonomy)
2) Tagging was done by creators and consumers of content, and not trained indexers. "Indexer" is a profession; "tagger" is not.

Indexing is also different from tagging by what results from it. If we look to the origin of the word "index", it means to indicate or to point (as with your index finger). So, the result of indexing is an "index" that the user can browse to locate referenced (if in print) or linked (if electronic) content. A thesaurus/taxonomy and an index (a structured list of the terms that had been used for indexing) could be essentially the same thing. Sometimes only a section of the index is browsable, via a type-ahead scroll-box feature, rather than the entire index. Tagging, on the other hand, lacks a controlled vocabulary and so does not result in any created work, just a folksonomy, which, with its multiple terms with the same or overlapping meaning, is not suitable for browsing. If displayed, tagging terms are shown by popularity instead, such as in a tag cloud, which is interesting, but not an accurate method for content findability and retrieval.
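
The contrast between a tag cloud and a browsable index can be sketched in a few lines of Python. The tags and the variant-to-preferred-term mapping below are invented for illustration, and the helper name to_preferred is hypothetical. The sketch assumes that a folksonomy counts literal tag strings, while a controlled vocabulary first maps variants to one preferred term.

```python
from collections import Counter

# Hypothetical free-form tags applied by different users.
free_tags = ["nyc", "NYC", "new york", "New York City", "nyc", "boston"]

# A tag cloud ranks the literal strings by popularity,
# so variants of the same concept are counted separately.
tag_cloud = Counter(free_tags)

# Hypothetical mapping from variant spellings to a preferred term,
# as a controlled vocabulary would provide.
PREFERRED_TERM = {
    "nyc": "New York City",
    "new york": "New York City",
    "new york city": "New York City",
    "boston": "Boston",
}


def to_preferred(tag):
    """Return the preferred term for a tag, or the tag itself if unknown."""
    return PREFERRED_TERM.get(tag.strip().lower(), tag)


# Counting by preferred term gives one entry per concept,
# which can be listed alphabetically like an index.
index_counts = Counter(to_preferred(tag) for tag in free_tags)

print(tag_cloud.most_common())
for term in sorted(index_counts):
    print(term, index_counts[term])
```

With this sample data, the five tags about New York are spread over four different strings in the tag cloud, but collapse to a single browsable entry once the preferred terms are applied.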

In time, enterprise software adopted social media methods, user interfaces, and features. As a consequence, tagging became more formalized as an employee task, and folksonomies got edited into controlled vocabularies or taxonomies, or at least became sources for taxonomy terms. So, now tagging may be done with or without a controlled vocabulary, and both consumers and professional editors/content managers (if not “taggers”) do tagging.

"Tags" and "tagging" are now also designated features of content management and digital asset management software, and content editors "tag" with terms from a controlled list. As such, the distinctions between "indexing" and "tagging" have become blurred, and what this activity is called may depend on what the software vendor, the industry (publishing may prefer to call it indexing, whereas ecommerce calls it tagging), and the corporate culture prefer to call it.

The designation of “indexing”, as open index creation, is also becoming less common as the full display of indexes has become less common. Search boxes (even if what the user enters into them is matched against a thesaurus) have often replaced long alphabetized lists of subject entries and subentries. We continue to find indexes at the back of books, but online, for electronic content, the displayed browsable index is less common than it used to be.
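
As a rough illustration of a search box backed by a thesaurus, the Python sketch below matches what the user types against preferred terms and their variants, and offers a simple type-ahead. The thesaurus entries and the function names (build_lookup, match_query, suggest) are assumptions made up for this example, not the behavior of any specific search product.

```python
# Hypothetical thesaurus: preferred term -> non-preferred variants.
THESAURUS = {
    "Automobiles": ["cars", "autos", "motor cars"],
    "Physicians": ["doctors", "MDs"],
}


def build_lookup(thesaurus):
    """Map every lowercased entry string to its preferred term."""
    lookup = {}
    for preferred, variants in thesaurus.items():
        lookup[preferred.lower()] = preferred
        for variant in variants:
            lookup[variant.lower()] = preferred
    return lookup


LOOKUP = build_lookup(THESAURUS)


def match_query(query):
    """Return the preferred term for an exact entry match, else None."""
    return LOOKUP.get(query.strip().lower())


def suggest(prefix, limit=5):
    """Type-ahead: preferred terms with an entry starting with the prefix."""
    prefix = prefix.strip().lower()
    if not prefix:
        return []
    hits = {term for entry, term in LOOKUP.items() if entry.startswith(prefix)}
    return sorted(hits)[:limit]


print(match_query("Cars"))
print(suggest("do"))
```

The user never sees the full alphabetized list here, yet the controlled vocabulary still decides which term the query resolves to.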