Monday, December 3, 2012

Taxonomies and Content Management

Taxonomies are relevant to various applications, implementations, software products, disciplines, and industries, whereas taxonomy itself is not really a discipline or industry. This is apparent in how taxonomy shows up as a topic in presentation sessions at many different conferences. These include conferences in the fields of knowledge management, enterprise search, content management, digital asset management, semantic technologies, text analytics, document management, records management, indexing, information architecture, and user experience.

Content management and content technology were the subject of the most recent conference I attended, the Gilbane Conference in Boston, November 28-29. The Gilbane Conference, now in its 9th year, takes place annually the week after Thanksgiving (end of November or beginning of December) in Boston and often also in San Francisco in May or June. The conference, named after its founder and chair, Frank Gilbane, has the tag-line “Content, Collaboration & Customers – Managing & Enhancing Experience.” Sessions are divided into four tracks: (1) Customers & Engagement, (2) Colleagues & Collaboration, (3) Content Technologies & Infrastructure, and (4) Web & Mobile Publishing.

Taxonomies at this year’s Gilbane conference were the focus of two presentations, and were mentioned in many others. Just as content management strategies and systems may be specialized for either internal/enterprise content or for external/public web content, so may taxonomies be applied either internally or externally (and sometimes both). So, it was appropriate that one presentation on taxonomies, “Value of Taxonomy Management: Research Results” by Joseph Busch, focused on enterprise content taxonomies, and the other, “Taxonomies for E-Commerce,” which I presented, focused on public website taxonomies.

The connection between taxonomies and content management is a very important one. A taxonomy does not do much good when it stands alone. Its purpose is typically to facilitate findability and retrieval of specific content, whether by browsing or searching. On the other side, content is not of much use if it cannot be found. Content management refers to managing the workflow and lifecycle of content from the planning stage and creation/collection stage through the disposition/archiving stage, with an analysis/evaluation stage bringing it full-circle. There is typically a sub-phase for content organizing, categorizing, metadata-assigning, or indexing. This is where taxonomy comes in: to provide structured categories and/or to provide a consistent vocabulary for metadata and indexing.

The field of content management is often defined in terms of its products: content management systems (CMS) and their variations, which include enterprise content management (ECM)/document management systems and Web Content Management (WCM) systems. The software vendors are an important part of conferences, such as Gilbane, and are also the subject of analysis and comparison by industry analysis firms such as The Real Story Group, CMS Watch, IDC, Forrester Research, and the Digital Clarity Group.  Content management tools do include capabilities for managing taxonomies, vocabularies, or metadata, but the capabilities vary. For anything but a simple or small taxonomy, it might be preferable to create the taxonomy externally in a dedicated taxonomy management tool and then import it into the content management system. The limitations of a content management system in the area of taxonomy management, therefore, should not necessarily limit the taxonomy.

Content management and content management systems focus on processes, and that is a good way to look at taxonomies, too. Taxonomies are not static, but need to follow a life cycle, as does content: planned and designed, developed and edited, possibly translated, published or implemented, used in tagging, then used in browsing and searching, and finally reviewed and analyzed for further revision. Governance is also important for both content management and taxonomy management.

The biggest challenge to integrating taxonomies with content management strategy and systems is not technical but rather a matter of human resources. A lot of time, energy, and money is put into selecting and implementing a content management system and planning a content strategy around it. Taxonomy is only one piece of the puzzle, and may not always get the investment of time and money it deserves for a full and proper design and development. However, the better a taxonomy is designed, the better it works.


  1. Very interesting points in your post and very true that detailed attention to Taxonomy creation and 'governance' is as important as, if not more so than, the 'surroundings' in which it has to co-exist to deliver real value.

  2. You have included some very important points on the challenges of integrating taxonomies into content management (CM) systems. (1) Human resources allocation, and (2) not always getting the time and money it deserves for full and proper design, development, and continued maintenance are both issues that come up often and need to be addressed. I’ve also come across another challenge with integrating taxonomies with some content management (CM) systems, which is more technology focused, and I’m going to try to describe it below.

    In your article you mentioned that the taxonomy can “provide structured categories and/or … a consistent vocabulary for metadata and indexing”. The challenge that I’ve encountered is that at times the hierarchical relationships between the terms in the taxonomy can get “lost” when integrating the taxonomy with some CM systems. And when those hierarchical relationships get “lost”, everything is simply treated as a flat list (authority file) within the CM system.

    To try to explain this challenge further, I am going to use a taxonomy example from one of your earlier articles. :-) For example, if a taxonomy had a list of companies, one being “Ford Motor Company”, and one of the narrower terms for “Ford Motor Company” was “Lincoln Division”, then once the “Lincoln Division” term was used in a metadata field, some CM systems could not maintain the connection between “Lincoln Division” and “Ford Motor Company”.

    Then if the connection between “Lincoln Division” and “Ford Motor Company” is not maintained, and the taxonomy had multiple terms at the same level as “Ford Motor Company”, then every term (in this case every company and division) in the taxonomy would get treated the same way. When every term in the taxonomy is treated the same way in the CM system, the taxonomy will basically be used as a flat authority file with a mixture of terms; in this example it could be a mixture of companies and divisions. And normally when authority files are created, they do not contain unlike terms; they are normally a flat list of like terms.

    So in summary then, the technology focused challenge I’ve come across with integrating taxonomies with CM systems is being able to fully articulate the value in the hierarchical relationships in the taxonomy and finding a CM that can leverage those relationships.

    -Paula Markes
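
The difference the comment above describes, between a taxonomy whose hierarchical relationships are kept and one flattened into an authority file, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the document IDs, the "Acme Motors" term, and the function names are hypothetical and do not come from any particular CM system. Only the "Ford Motor Company" and "Lincoln Division" terms come from the comment itself.

```python
# Hypothetical taxonomy: each term maps to its broader (parent) term,
# or None for a top-level term. "Acme Motors" is an invented sibling term.
BROADER = {
    "Ford Motor Company": None,
    "Lincoln Division": "Ford Motor Company",
    "Acme Motors": None,
}

# Hypothetical content items and the terms they were tagged with.
DOCUMENTS = {
    "doc-1": {"Lincoln Division"},
    "doc-2": {"Ford Motor Company"},
    "doc-3": {"Acme Motors"},
}


def flat_search(term):
    """Flat authority file behavior: match only the exact term."""
    return sorted(doc for doc, tags in DOCUMENTS.items() if term in tags)


def narrower_terms(term):
    """Return the term together with all of its narrower terms."""
    found = {term}
    changed = True
    while changed:
        changed = False
        for child, parent in BROADER.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def hierarchical_search(term):
    """Hierarchy-aware behavior: also match content tagged with narrower terms."""
    wanted = narrower_terms(term)
    return sorted(doc for doc, tags in DOCUMENTS.items() if tags & wanted)


print(flat_search("Ford Motor Company"))          # ['doc-2']
print(hierarchical_search("Ford Motor Company"))  # ['doc-1', 'doc-2']
```

In this sketch, the flat search misses the content tagged only with "Lincoln Division", while the hierarchy-aware search retrieves it under "Ford Motor Company". That is one way to articulate the value that is lost when the relationships are dropped.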