Showing posts with label Software. Show all posts

Sunday, August 10, 2025

When to Design a New Taxonomy for a New System

Often organizations determine that a suitable time to adopt a new taxonomy is in conjunction with adopting a new system for its implementation, such as a content management system (CMS) or digital asset management system (DAM). They can budget taxonomy design and development services as part of the consulting services needed for the content migration and system implementation project, and they can improve and optimize the taxonomy for its new implementation and use.

There is the question of timing, though. Recently, a prospective consulting client asked me whether the new taxonomy should be developed prior to the selection and implementation of a new system or afterwards. Ideally, both the taxonomy project and the CMS or DAM adoption can happen simultaneously. However, the design and development of a taxonomy takes less time (typically 3-4 months) than the adoption of a new CMS or DAM. Altogether, a system selection, with a trial or a proof-of-concept project, implementation, data/content migration, and user training, can take 6-18 months.

Benefits of Taxonomy Development Prior to System Adoption

The primary benefit of developing a taxonomy prior to system adoption is that you can make it a system requirement that the new system supports the taxonomy that you have designed to best serve your users, your desired tagging method, and the nature of your content. These criteria should take precedence over designing a taxonomy to fit the requirements (or limitations) of a CMS or DAM.

Over time, your organization will adopt other systems, and the taxonomy should be suitable for multiple systems, rather than being system-specific. Especially if you have an enterprise (enterprise-wide) taxonomy as your eventual goal, designing your ideal taxonomy first should be your approach. If one system cannot take advantage of all features of your taxonomy, another system may. There are also usually development workarounds to get the full use out of your taxonomy.

Benefits of Taxonomy Development After System Adoption

A CMS or DAM has a variety of functions, and tagging and retrieval of content with a taxonomy is only one of those functions. Workflow management, rights management, authoring features (for CMS) and image/video editing features (for DAM) tend to matter more than taxonomy use among the requirements for a system. You can make “good support of taxonomy management and tagging” a requirement for your new CMS or DAM without getting into the specifics.

Adding features to a taxonomy (such as polyhierarchy, related-concept relationships, end-user scope notes, or different sets of synonyms/alternative labels to support tagging and searching separately) is a waste of time and resources if the system you later adopt does not support them. It’s better to wait until a system is selected and implemented before fully designing a taxonomy.

Iterative Taxonomy Design Approach

When implementing a new taxonomy with a new system, the ideal approach is to spread out the taxonomy design and development tasks over the phases of the system selection and implementation process.

You should consider basic taxonomy requirements early in the system selection process. To do this, you might categorize different taxonomy support features as essential or nice-to-have. The method of tagging (automated, manual, automated with human review, or a mix) needs to be determined both as a system requirement and as a factor in the design of the taxonomy.
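As a rough illustration of sorting taxonomy support features into essential and nice-to-have, the following Python sketch compares candidate systems against such a list. The feature names, the two priority sets, and the candidate systems are all hypothetical and would come from your own requirements gathering.

```python
from dataclasses import dataclass

# Hypothetical feature names and priorities, for illustration only.
ESSENTIAL = {"hierarchy", "synonyms", "manual_tagging"}
NICE_TO_HAVE = {"polyhierarchy", "related_concepts", "scope_notes"}


@dataclass
class CandidateSystem:
    name: str
    supported: set[str]


def evaluate(system: CandidateSystem) -> dict:
    """Report missing essential features and which nice-to-haves are met."""
    missing = sorted(ESSENTIAL - system.supported)
    extras = sorted(NICE_TO_HAVE & system.supported)
    return {
        "name": system.name,
        "qualifies": not missing,  # qualifies only if nothing essential is missing
        "missing_essential": missing,
        "nice_to_have_met": extras,
    }


# Made-up candidates; a real evaluation would use vendor documentation or trials.
candidates = [
    CandidateSystem("System A", {"hierarchy", "synonyms", "manual_tagging", "scope_notes"}),
    CandidateSystem("System B", {"hierarchy", "manual_tagging", "polyhierarchy"}),
]
results = [evaluate(c) for c in candidates]
for result in results:
    print(result)
```

A checklist like this only covers the taxonomy portion of the requirements; workflow, rights management, and other system functions would be weighed separately.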

Then during the lengthy process of system testing and selection, information-gathering work for the taxonomy may take place. This involves stakeholder interviews, user focus groups or brainstorming sessions, content analysis, and review of existing/legacy taxonomies and other controlled vocabularies. Draft versions of portions of the taxonomy, without all features, may be created and reviewed, prior to the system selection decision.

After the CMS or DAM is selected and is in the process of being implemented, the taxonomy design can be refined with features that the new system can support, and then the taxonomy can be fully built out. The new taxonomy can also be tested in the new system for its suitability for tagging and retrieval, and final enhancements can be made based on the test results. The documentation of the taxonomy, including guidelines for its maintenance (a governance plan), should be started early in the taxonomy design process, but additional system-specific documentation is created after the new system is implemented.

Tuesday, April 30, 2019

Taxonomy Software Trends: Convergence and Visualizations


I recently looked more closely into current offerings of taxonomy software to prepare for an upcoming presentation at the SLA conference in Cleveland in June: “Taxonomy Tools and Tool Evaluation.” I will speak about the tools, and my co-presenter, Marti Heyman, will speak about how to evaluate them. I had last contacted various software vendors in 2015 when I was writing the second edition for my book, The Accidental Taxonomist. I had previously blogged on Taxonomy Software Trends in January 2015 and observed that, since researching software for my first edition in 2009, there is more cloud/web-based software, more SKOS/RDF/Semantic web framework software, and more plugins to SharePoint, content management systems, and search engines. Those trends continue. Now that I look into taxonomy software again, the additional trends I see are taxonomy, thesaurus, and ontology tool convergence and graphical vocabulary visualization.

Taxonomy, thesaurus, and ontology software convergence


Originally there was thesaurus management software (also used for any taxonomies), such as MultiTes, Data Harmony Thesaurus Manager, Synaptica KMS, and other products that no longer exist; and ontology management software, such as TopBraid Composer, Protégé, and others. The two kinds of software were very distinct, from different vendors, based on completely different standards and models, with different features, used by different users, for different purposes.

Now, we don’t hear as much about “thesaurus software” as before, but rather vocabulary/taxonomy/knowledge organization system (KOS)/ontology software, where the same software tool supports thesaurus standards (ANSI/NISO Z39.19 or ISO 25964) and ontology standards (OWL and RDF), and especially the SKOS (Simple Knowledge Organization System) model for any kind of controlled vocabulary. This makes sense, because an organization often has needs for more than one kind of controlled vocabulary. Newer software offerings have combined taxonomy, thesaurus, and ontology software into one. These include Smartlogic Semaphore, PoolParty, Synaptica Graphite, TopBraid Enterprise Data Governance’s Vocabulary Manager, Mondeca Intelligent Topic Manager, and VocBench.
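To give a sense of why SKOS can serve as a common model for thesaurus-style vocabularies, here is a minimal Python sketch that writes one concept as a Turtle snippet using SKOS properties. The `Concept` class, the `ex:` namespace, and the sample concept are assumptions made for illustration; the sketch is not taken from any of the products named above, and it does not escape quotation marks inside labels.

```python
from dataclasses import dataclass, field

SKOS_NS = "http://www.w3.org/2004/02/skos/core#"
EX_NS = "http://example.org/taxonomy/"  # hypothetical namespace for this sketch


@dataclass
class Concept:
    id: str
    pref_label: str
    alt_labels: list[str] = field(default_factory=list)  # synonyms (thesaurus "use for")
    broader: list[str] = field(default_factory=list)     # more than one parent = polyhierarchy
    related: list[str] = field(default_factory=list)     # associative (thesaurus "RT")
    scope_note: str = ""


def to_turtle(concept: Concept) -> str:
    """Serialize one concept as Turtle; labels are assumed to contain no quotes."""
    lines = [f"ex:{concept.id} a skos:Concept"]
    lines.append(f'    skos:prefLabel "{concept.pref_label}"@en')
    for label in concept.alt_labels:
        lines.append(f'    skos:altLabel "{label}"@en')
    for parent in concept.broader:
        lines.append(f"    skos:broader ex:{parent}")
    for other in concept.related:
        lines.append(f"    skos:related ex:{other}")
    if concept.scope_note:
        lines.append(f'    skos:scopeNote "{concept.scope_note}"@en')
    return " ;\n".join(lines) + " .\n"


def to_document(concepts: list[Concept]) -> str:
    """Prepend the prefix declarations to the serialized concepts."""
    header = f"@prefix skos: <{SKOS_NS}> .\n@prefix ex: <{EX_NS}> .\n\n"
    return header + "\n".join(to_turtle(c) for c in concepts)


sample = Concept(
    id="digitalAssets",
    pref_label="Digital assets",
    alt_labels=["Media assets"],
    broader=["content", "assets"],  # two parents: a polyhierarchy
    related=["rightsManagement"],
)
turtle = to_document([sample])
print(turtle)
```

The same few properties cover the preferred term, synonyms, broader terms, related terms, and scope notes of a thesaurus, which is part of what lets a single tool handle several kinds of controlled vocabulary.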

Visualizations of taxonomies, thesauri, and ontologies


Interactive visualization charts/graphs of taxonomies (what I shall call all controlled vocabularies here) are not something I had paid much attention to, because the feature is not considered so important by a professional taxonomist for creating taxonomies. However, while taxonomists are the primary users of taxonomy management software, other stakeholders in taxonomies are important secondary users. These people include content managers, content strategists, project managers, knowledge managers, information product managers, user interface/experience designers, and subject matter experts. Rather than creating taxonomies, these various stakeholders need to view draft taxonomies and provide feedback on them. Viewing the taxonomy in the user interface used by the taxonomist is often not practical or intuitive. However, viewing the taxonomy as the end-user will view it may not be possible, because the taxonomy has not yet been implemented into its final system or product. Therefore, a taxonomy visualization feature of taxonomy management software can be quite useful for stakeholder review and input.

Visualizations are especially useful for ontologies with their semantic relationships, but they are also helpful for taxonomies and thesauri. With the convergence of taxonomy, thesaurus, and ontology-creation capabilities in the same software, vocabulary visualization has become a more common feature. However, they are not the same in all vocabulary management software products. Following are some varied examples of visualizations. In many cases, they are interactive, whereby the user can drag and reposition the nodes.

Data Harmony Thesaurus Master offers a “sunburst” visualization for hierarchical taxonomies, as an alternative to the inverted tree display, which is available in the editing interface of the software.

Taxonomy visualization from Data Harmony Thesaurus Master

Synaptica KMS has a node and link relationship display for taxonomies and thesauri, where relationships do not need to be defined. Synaptica Graphite will have a new directed-graph visualizer feature added later this year.

Thesaurus visualization from Synaptica KMS


Semaphore, Mondeca, and TopBraid EDG Vocabulary Management each have a node and link relationship display for ontologies that additionally describes the types of relationships.



Ontology visualization from Smartlogic Semaphore



Ontology visualization from Mondeca ITM




Visualization from TopBraid EDG Vocabulary Management



PoolParty offers a different type of visualization, focusing on the relationships of a selected concept, with each type color-coded. 
Visualization of a taxonomy concept from PoolParty



In combination with other graph database tools, both Synaptica Graphite and PoolParty can support interactive nonhierarchical visualizations and graph analytics. This brings us to our next topic, knowledge graphs, which I will discuss in my next blog post.

Monday, February 29, 2016

Free Taxonomy Management Software

There is always an interest in free taxonomy or thesaurus management software. Many people who create taxonomies try to save money on purchasing taxonomy management software by simply not using any taxonomy management software but something else they already have, such as Excel. Those who are developing either very large taxonomies or more complex thesauri, however, realize that a dedicated taxonomy/thesaurus management system will save a lot of time and headache in the long term.

Various free thesaurus management software offerings have been available since the early 1990s. They tend to have their origins in academic projects in computer science, information science, or library science at universities, and others have been government projects. Some free software of the previous decade is no longer available, though. Discontinued software is still listed for posterity on the web directory of "Software for building and editing thesauri," started by Leonard Will and now managed on the Taxobank website. For example, two of the free software products listed were for MS-DOS, and one ran on nothing later than Windows 3.1.

The first free thesaurus software I was familiar with was TheW, a simple thesaurus management program developed by Tim Craven, a professor of information science at the University of Western Ontario, since retired. I actually ran across it because I was at the time exploring another software program of Prof. Craven’s for creating website indexes. TheW32, which is available for Windows XP, Vista, and 8 and for Java, is no longer maintained. It was last updated for Windows in 2007 and for Java in 2009. At this point, I would no longer recommend it.

Protégé Ontology Editor is an established free and open-source ontology editor from Stanford University. It is quite robust, has an active user community and support groups, and continues to be upgraded (with version 5.0.0 recently released in beta). The issue with Protégé is that it is a native ontology management tool, not a thesaurus management program (or even ontology “lite” as some thesaurus management software can manage semantic relationships and classes). Thus, it takes a very different approach to modeling and building vocabularies, which is not intuitive to taxonomists, such as myself, and, although I downloaded it, I never found it worth the difficulty to learn. If you can truly consider yourself an ontologist, though, then great, this might just be the solution for you.

I had explored some other free software offerings when writing my book, The Accidental Taxonomist, six years ago and came across TemaTres and ThManager. At the time I did not find them adequately enforcing valid relationships between terms, so I was somewhat dismissive about the software. Recently I revisited these products.

TemaTres, which has its origins in the Library and the University of Buenos Aires, Argentina, still does allow creating duplicate terms, which was my initial cause for concern, but the user interface of the latest version (2.1) now offers a new configuration option for quality policies, to enable or disallow duplicate terms. Thus, TemaTres is a suitable free thesaurus software product if used by a knowledgeable and experienced taxonomist who knows to set the options and understands the alerts. TemaTres is being supported, and its latest version was released just this winter, 2016. The software is web-based, which means that it requires PHP, MySQL, and an HTTP web server, so it may not be the configuration that any independent taxonomist would set up and install in a small/home office. Otherwise, TemaTres is worth looking into.
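To show what a duplicate-term check involves, here is a small Python sketch that flags labels differing only in case or spacing. It is not how TemaTres implements its quality policy; the normalization rule and the sample terms are assumptions for illustration.

```python
from collections import defaultdict


def normalize(label: str) -> str:
    """Fold case and collapse whitespace so near-identical labels compare equal."""
    return " ".join(label.casefold().split())


def find_duplicates(labels: list[str]) -> dict[str, list[str]]:
    """Group labels by normalized form and keep only groups with more than one member."""
    groups = defaultdict(list)
    for label in labels:
        groups[normalize(label)].append(label)
    return {key: found for key, found in groups.items() if len(found) > 1}


# Made-up term list for illustration.
terms = ["Cloud computing", "cloud  Computing", "Ontologies", "Thesauri"]
duplicates = find_duplicates(terms)
print(duplicates)
```

A taxonomist would still need to review each flagged group, since some apparent duplicates are legitimate homographs that need qualifiers rather than merging.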

ThManager is from the University of Zaragoza and GeoSpatiumLab S.L., both in Zaragoza, Spain. ThManager supports the SKOS standard rather than ANSI/NISO Z39.19 or ISO 25964, which means it does not by default enforce all rules of the latter standards. But I have since found this to be a trend of new vocabulary management software: compliance with SKOS and support for ANSI/NISO Z39.19 or ISO 25964, as configurable rather than by default. Thus, I am no longer complaining if it does not support ANSI/NISO Z39.19 by default. The main problem with ThManager, though, is that it is not kept so well up to date. It was last significantly updated in 2006. The installation for even Windows 7 requires a “portable” version due to an installation bug.

More recently I discovered another free thesaurus management software, VocBench. It was developed originally for the management of the AGROVOC thesaurus of the Food and Agriculture Organization (FAO) of the United Nations as a joint project of FAO, which is based in Rome, Italy, and the Artificial Intelligence Research group at the University of Rome Tor Vergata. VocBench, like TemaTres, is SKOS-compliant, rather than ANSI/NISO Z39.19 compliant. VocBench is web-based, with web server requirements of Apache Tomcat, MySQL, and OWLIM installed on a Sesame2 server.

In addition to being free, these applications tend to have the advantage of being able to run on multiple platforms and yet can be installed and used by a single user. The editing features may be a little less standard and thus less intuitive, and documentation and support tend to be more limited than with commercial software. Yet, they are worth considering for long-term experimentation (with no time limit as in commercial demo software), for use in nonprofit or low-budget projects, or by anyone with a strong interest in working with open source software.

Saturday, January 31, 2015

Taxonomy Software Trends



I reviewed various taxonomy/thesaurus management software offerings recently, in preparation for the last of my 3-part webinar series, Practical Taxonomy Creation, and I noticed some trends since I last looked into software in such detail for my book over 5 years ago: more cloud/web-based software, more SKOS/RDF/Semantic web framework software, and more plugins to SharePoint, content management systems, and search engines.

The number of commercial vendors selling taxonomy/thesaurus management software is not significantly different, as some have left the market, and others have entered, and the rest have continued with updates and improvements. There are fewer commercial low-end, inexpensive, single-user desktop offerings, however. Products I have reviewed in the past and that have gone away include Webchoir TCS-10 Personal and Term Tree 2000. The Mac OS program Cognatrix has been unavailable for the past year, although the vendor intends to release it again as an Apple App Store program following the release of the next major version of Mac OS.

Subscription, web-based software


Synaptica pioneered web-based thesaurus management software when it introduced its product in 1995, when the Web was still young, but now other vendors also offer web-based subscription software. Data Harmony Thesaurus Master from Access Innovations was originally only available in a Java-based multi-platform client-server installation. For the past several years a web version has also been available, and Access Innovations president Marjorie Hlava said in an email: “Increasingly our customers use the cloud version of the software.” Newer thesaurus management software products to the market have also been solely cloud-based. These include PoolParty, introduced by the Semantic Web Company in 2009, and TopBraid Enterprise Vocabulary Net (EVN), released by TopQuadrant in 2010. Meanwhile Synaptica began offering Synaptica Express, a cloud-computing solution for individuals and smaller businesses. Finally, the long affordable mainstay MultiTes Pro, a Windows-based desktop program that has been available since 1983 in a single-user version and then also for multiple users, introduced a multi-user cloud version about six years ago, which in 2013 was updated and renamed as MultiTes Online.

The cloud-based software offerings are, of course, priced on annual (and in one case, monthly) subscription fees, instead of one-time license costs with lesser priced updates. Hopefully this means that more organizations will try out developing a taxonomy in the appropriate tool with the reduced commitment of cost for a shorter time.

SKOS/RDF/OWL Semantic web framework software


Supporting linked data and interoperability with Semantic Web content has become more important. Therefore, World Wide Web Consortium (W3C) recommendations, such as the SKOS (Simple Knowledge Organization System) framework, RDF (Resource Description Framework) specifications, and OWL (Web Ontology Language), are being adopted by newer thesaurus/taxonomy software. The newer products, PoolParty and TopBraid EVN, are both built around SKOS models. Synaptica and Data Harmony Thesaurus Master have been able to export to a SKOS and OWL schema for a long time, but it was only in 2013 that Data Harmony added user-defined fields to the SKOS export to include all fields in a term record. Additionally, in 2011 Synaptica introduced an Ontology Publishing Suite to publish an ontology or part of an ontology to the Web.

My first criterion for thesaurus management software is that it enforces relationship rules in accordance with the ANSI/NISO Z39.19-2005 standard. SKOS is not an alternative standard, but rather a framework that can be followed in addition to ANSI/NISO Z39.19-2005. Ideally a software product complies with both, and some now do.
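As a sketch of what enforcing relationship rules can mean in practice, the following Python function checks a few rules of the kind thesaurus standards describe: every relationship has its reciprocal, no term is related to itself, and two terms are not related both hierarchically and associatively. The triple format, the function name, and the sample terms are assumptions for illustration; a real tool checks more (cycles in the hierarchy, for instance), and this sketch accepts only the BT, NT, and RT relationship types.

```python
# Each relationship type and the type expected in the opposite direction.
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT"}


def validate(relations: list[tuple[str, str, str]]) -> list[str]:
    """Return human-readable problems found in (term, type, target) triples."""
    triples = set(relations)
    problems = []
    for term, rel, target in sorted(triples):
        if term == target:
            problems.append(f"{term} is related to itself via {rel}")
            continue
        inverse = RECIPROCAL[rel]
        if (target, inverse, term) not in triples:
            problems.append(
                f"{term} {rel} {target} lacks reciprocal {target} {inverse} {term}"
            )
        hierarchical = (term, "BT", target) in triples or (term, "NT", target) in triples
        if rel == "RT" and hierarchical:
            problems.append(
                f"{term} and {target} are both hierarchically and associatively related"
            )
    return problems


# Made-up relationships: the RT from Dogs to Cats has no reciprocal.
sample = [
    ("Dogs", "BT", "Mammals"),
    ("Mammals", "NT", "Dogs"),
    ("Dogs", "RT", "Cats"),
]
problems = validate(sample)
print(problems)
```

Dedicated thesaurus software typically prevents such errors at entry time by creating the reciprocal automatically, rather than reporting them afterwards.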

Plug-ins and connectors for search and content management


The most common software for internal content management (even though it is not really a content management system) is SharePoint. Prior to 2010, SharePoint handled controlled vocabulary metadata in such a simple way (not even in hierarchies) that there was no point in trying to use taxonomies. Starting with SharePoint 2010, with its Managed Metadata Services, taxonomies can now be utilized in its Term Store. However, despite Term Store improvements from SharePoint 2010 to 2013, it is still far from having the features and capabilities of a dedicated thesaurus management software product. Thus, ideally you create the taxonomy in the dedicated tool and port it over to SharePoint, and now almost all enterprise-level thesaurus management software products have methods to connect to SharePoint, whether through APIs, plug-ins, or dedicated “connector” modules.

There are also increasing numbers of content management systems and search software products being supported by thesaurus management connections. For example, SmartLogic Semaphore Ontology Manager has integrations with a greater number of applications than in the past, including SharePoint, Google Search Appliance, Apache Solr, OpenText, MarkLogic, and IBM Watson. PoolParty has a WordPress plugin, in addition to integrations with SharePoint and Drupal. Surely more such connections will be added, as I have recently heard of requests for taxonomy imports into Drupal.

Friday, April 11, 2014

Taxonomy Software Directories

It's difficult to find a list of taxonomy management software that is both comprehensive and up to date, yet not overwhelmed with related products and services. I define taxonomy management software as a tool to manually build and edit taxonomies, controlled vocabularies, and thesauri in accordance with industry standards. It should be the primary tool used by those who work as taxonomists. Lists of “taxonomy software,” however, may include more than just tools for taxonomy management, such as auto-classification/auto-categorization/auto-indexing software, search software that utilizes taxonomies, or mind-mapping and other graphical categorization tools.

Taxonomy maintenance, unfortunately, is just too small of a niche area for the major evaluators of software, whether consultancies, industry research firms, or trade publications, to find it worth their time to study. Companies that research the information technology market, such as Forrester Research, Gartner, International Data Corporation (IDC), and Real Story Group, won't get the commercial payoff from preparing studies of the taxonomy management software industry and products.

At the time I wrote my book, the most comprehensive directory of taxonomy software I found and referred my readers to was that of the British consultant Leonard Will, on the website of his consulting business Willpower Information, which listed 38 software packages, both commercial and freeware. Leonard Will had contacted each vendor and thus provided descriptive and contact information for each tool. The fact that this was a directory of "thesaurus" software and not “taxonomy” software is not an issue, and it was probably a good thing to include only software that meets thesaurus expectations. This directory was very comprehensive, including lesser-known free and open source software, which over time tended to become unsupported or even unavailable. With an interest in posterity, Leonard Will kept the unavailable software listed in his directory merely with a note to that effect. This may have been interesting for anyone thinking of developing their own thesaurus software, as they may be able to track down these other developers. For someone looking for a good commercial solution, however, there are far too many outdated products to weed through.

After Leonard Will retired, he decided he did not want to spend the time maintaining his directory, which he last updated in 2007, and in 2011 he offered the content of his directory to someone else, specifically contacting both Margie Hlava of Access Innovations and myself. Then Margie and I had to figure out which one of us would take it, fully aware that the rich content on a website would help our own respective business websites, yet it would also take quite a bit of time and effort to set up and maintain. After a year of hoping to find time, I finally conceded that I would not and told Margie she could take it. The successor to the Willpower Thesaurus software directory, maintained by Margie’s employee Eric Ziecker, now resides at http://www.taxobank.org/content/thesauri-and-vocabulary-control-thesaurus-software

The core of TaxoBank's directory “Software for building and editing thesauri” at present is still essentially the same as the Willpower site, maintaining the original tabular content, style, colors, etc. of that site, so visitors to the TaxoBank site may recognize it from Willpower. Posterity still seems to be valued, as all but one of the same 38 software packages are still there, although in two cases there is a note saying “The particular software referenced above is no longer available.” The notes section for many packages has been updated with additional content extracted from the vendor websites. More updating is still pending, though, as operating systems listed are dated, such as “Windows 95/98/NT/2000/XP.”

The main difference from the original Willpower site is the addition of 63 other products in a new section, separated by the note “Additional indexing, taxonomy, controlled vocabulary, thesaurus, classification, mapping and ontology software and services not referenced in Leonard Will's original listing follows below.” These additional products include many products not specific to “building and editing thesauri,” such as Apache Lucene, EMC Documentum, Oracle Endeca, Google site search, HP Autonomy, IBM Infosphere, and Microsoft SharePoint, along with one taxonomy consulting service. In my opinion, it might be better to have the related products and services on a separate web page to avoid possible confusion and to keep the list to a manageable length, as the total web page is currently 145 printed pages long. Despite these issues, I praise Margie and Eric for taking efforts to maintain this valuable resource.

As for a shorter list focused on current commercial software dedicated to supporting the manual creation and editing of thesauri and taxonomies, that may have to wait until the next edition (not yet started) of my book. For now, there are the products, as of early 2010, listed under Chapter 5 on the links page of The Accidental Taxonomist book website. To this list, I would now add at least PoolParty and TopBraid Enterprise Vocabulary Net, both introduced since the book went to press. Meanwhile, taxonomy consultants still remain a valuable source of advice on taxonomy/thesaurus management software.

Monday, December 3, 2012

Taxonomies and Content Management

Taxonomies are relevant to various applications, implementations, software products, disciplines, and industries, whereas taxonomy itself is not really a discipline or industry. This is apparent in how taxonomy shows up as a topic in presentation sessions at many different conferences. These include conferences in the fields of knowledge management, enterprise search, content management, digital asset management, semantic technologies, text analytics, document management, records management, indexing, information architecture, and user experience.

Content management and content technology were the subject of the most recent conference I attended, the Gilbane Conference in Boston, November 28-29. The Gilbane Conference, now in its 9th year, takes place annually the week after Thanksgiving (end of November or beginning of December) in Boston and often also in San Francisco in May or June. The conference, named after its founder and chair, Frank Gilbane, has the tag-line “Content, Collaboration & Customers – Managing & Enhancing Experience.” Sessions are divided into four tracks: (1) Customers & Engagement, (2) Colleagues & Collaboration, (3) Content Technologies & Infrastructure, and (4) Web & Mobile Publishing.

Taxonomies at this year’s Gilbane conference were the focus of two presentations, and were mentioned in many others. Just as content management strategies and systems may be specialized for either internal/enterprise content or for external/public web content, so may taxonomies be applied either internally or externally (and sometimes both). So, it was appropriate that one presentation on taxonomies, “Value of Taxonomy Management: Research Results” by Joseph Busch, focused on enterprise content taxonomies, and the other, “Taxonomies for E-Commerce,” which I presented, focused on public website taxonomies.

The connection between taxonomies and content management is a very important one. A taxonomy does not do much good when it stands alone. Its purpose is typically to facilitate findability and retrieval of specific content, whether by browsing or searching. On the other side, content is not of much use if it cannot be found. Content management refers to managing the workflow and lifecycle of content from the planning stage and creation/collection stage through the disposition/archiving stage, with an analysis/evaluation stage bringing it full-circle. There is typically a sub-phase for content organizing, categorizing, metadata-assigning, or indexing. This is where taxonomy comes in: to provide structured categories and/or to provide a consistent vocabulary for metadata and indexing.

The field of content management is often defined in terms of its products: content management systems (CMS) and their variations, which include enterprise content management (ECM)/document management systems and Web Content Management (WCM) systems. The software vendors are an important part of conferences, such as Gilbane, and are also the subject of analysis and comparison by industry analysis firms such as The Real Story Group, CMS Watch, IDC, Forrester Research, and the Digital Clarity Group.  Content management tools do include capabilities for managing taxonomies, vocabularies, or metadata, but the capabilities vary. For anything but a simple or small taxonomy, it might be preferable to create the taxonomy externally in a dedicated taxonomy management tool and then import it into the content management system. The limitations of a content management system in the area of taxonomy management, therefore, should not necessarily limit the taxonomy.

Content management and content management systems focus on processes, and that is a good way to look at taxonomies, too. Taxonomies are not static, but need to follow a life cycle, as does content: planned and designed, developed and edited, possibly translated, published or implemented, used in tagging, then used in browsing and searching, and finally reviewed and analyzed for further revision. Governance is also important for both content management and taxonomy management.

The biggest challenge to integrating taxonomies with content management strategy and systems is not technical but rather in human resources. A lot of time, energy, and money is put into selecting and implementing a content management system and planning a content strategy around it. Taxonomy is only one piece of the puzzle, and may not always get the investment of time and money it deserves for a full and proper design and development. However, the better a taxonomy is designed, the better it works.