Friday, July 19, 2019

Onsite Corporate Taxonomy Training

I enjoy teaching about taxonomies. The feedback I get from my students or workshop participants helps me improve my methods of communication, teaching, and consulting, and I learn about the varied implementations of taxonomies. The courses evolve and improve over time. I teach online courses, conference workshops, and corporate onsite workshops. I’ve been making enhancements to the last of these offerings and this week led a two-day onsite workshop at a major company on the West Coast.
Heather Hedden leading an onsite corporate training workshop in taxonomy design and creation.

Accommodating a varied audience


The participants in my “introductory” workshops, whether at conferences or at their corporate offices, have varied knowledge and experience with taxonomies. Some are complete beginners and are curious to learn about taxonomies and what they can do. Others have been tasked to build a taxonomy with little instruction and are looking for best practices and guidelines. Some have read my book but have not had the opportunity to put what they have read into practice, so the workshop’s exercises are very helpful. Finally, some participants are experienced taxonomists seeking to fill in the gaps in their knowledge.

The absolute beginners may feel overwhelmed at the amount of information on taxonomies presented in one of my workshops, but I feel it’s important to provide enough instruction to enable people to actually create basic taxonomies (while ideally still getting feedback from someone more experienced). Also, I expect people to combine instruction from my workshop with other methods of learning taxonomies, such as reading my book, taking my online course, attending conference sessions on taxonomies, or getting advice from a taxonomist in their organization. While I would like to offer more advanced workshops, it’s difficult to find enough experienced practicing taxonomists at the same location. (It is possible at a conference, but sometimes conference organizers equate advanced taxonomy topics with ontologies.)

Interactive exercises

Taxonomy workshop participants doing a card-sorting exercise

Participants like interactive or hands-on exercises. One of the learning benefits of my onsite workshops is that they include interactive exercises that involve the entire group or class. My online course includes exercises or assignments, so that students learn from practice and from the feedback I provide, but only the onsite workshops offer the opportunity to work on assignments with others and thus learn from them. Creating taxonomies, like designing websites or software user interfaces, requires considering different views and is somewhat subjective.

Small-group exercises are the best for this kind of learning. My full-length workshops include small-group exercises for designing a set of facets and for doing a card-sorting exercise to categorize topics. Groups may comprise from three to six participants, depending on the total number. In addition to hearing ideas from their group members, participants then share the resulting taxonomy outline with the larger class, and I provide comments. Even exercises that do not involve small groups, but are assignments to consider and shout out answers, are beneficial, because we obtain, discuss, and evaluate various answers beyond the answers that any one individual might consider.

Remote participation is also possible, especially if the remote participants are co-located in the same office. They can form their own small group for the small group exercises, and they can do the card-sorting exercise online. This was the case in my latest corporate workshop.

Customizing corporate workshops


Heather Hedden leading a corporate onsite training workshop in taxonomy design and creation
To what extent I should customize the workshops for a specific organization was a question when I first offered corporate workshops. It’s not necessary, nor worth the time, to customize every example of taxonomy terms in the workshop presentation with something from the client’s domain of content. Rather, I found that it is sufficient yet instructive to customize just a few slides, such as those with examples of content types and use cases.

Another way I customize the workshops is by the outline and topics included. While all workshops include the basics (taxonomy types, definitions, uses and benefits, standards, structural design, best practices for creating terms and relationships, and governance), optional topics include: user interface display options, metadata and taxonomies, testing taxonomies, tagging, mapping taxonomies, multilingual taxonomies, integration with search, and taxonomy management software.

Finally, I customize the group exercises so that the choices of topics for facets are applicable, and the card-sorting exercise may use an actual example, especially if the client has a public taxonomy I can use as a basis for the exercise. I also include discussion questions, so that the participants can share and discuss the taxonomy issues pertinent to their organization. In any case, I sign an NDA, so the client can comfortably share information with me, which I may use in the workshop.

Continuous improvement 


I found that by asking the client for some input on possible customization, I can also generalize the issues to enhance the workshop presentation for future use. In other words, the client input on “customization” is not always that, but rather leads to a general improvement. The result has been to make the workshop presentation based more on real-world scenarios and less theoretical than my previous conference presentations. I actually did not consider my conference presentations to be that theoretical in the first place (since, after all, my knowledge of taxonomies is based on my work experience, not on studies for a degree in library/information science). But now I have made the workshops even more practical.

Input from the client can also lead to topics for clarification, such as differing use of terminology. For example, a client wanted me to discuss taxonomy “mapping,” which we taxonomists understand to mean the creation of equivalence links between terms in one taxonomy and another, so that one taxonomy may be used to retrieve content that was tagged in the other taxonomy. However, what my client meant by “mapping” was a kind of “see also” related-term relationships between terms in two different taxonomies. Now I know to clarify and discuss both kinds of links between taxonomies.
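The difference between the two kinds of links can be made concrete with a minimal Python sketch. All term names, document IDs, and function names below are hypothetical and chosen only for illustration; an actual mapping would normally be managed in taxonomy software rather than in dictionaries.

```python
from typing import Dict, List

# Hypothetical terms and content IDs, for illustration only.
# Equivalence mapping: a term in taxonomy A corresponds to a term in taxonomy B.
EQUIVALENT: Dict[str, str] = {
    "Automobiles": "Cars",
    "Physicians": "Doctors",
}

# "See also" links across taxonomies: related, but not interchangeable.
RELATED: Dict[str, List[str]] = {
    "Automobiles": ["Car insurance"],
}

# Content that was tagged only with terms from taxonomy B.
TAGGED_WITH_B: Dict[str, List[str]] = {
    "Cars": ["doc-1", "doc-2"],
    "Car insurance": ["doc-3"],
    "Doctors": ["doc-4"],
}


def retrieve_via_mapping(term_a: str) -> List[str]:
    """Return content tagged in taxonomy B with the equivalent of a taxonomy A term."""
    term_b = EQUIVALENT.get(term_a)
    if term_b is None:
        return []
    return list(TAGGED_WITH_B.get(term_b, []))


def suggest_related(term_a: str) -> List[str]:
    """Return taxonomy B terms to offer as 'see also' suggestions, not as matches."""
    return list(RELATED.get(term_a, []))


print(retrieve_via_mapping("Automobiles"))
print(suggest_related("Automobiles"))
```

In this sketch an equivalence link silently widens retrieval, whereas a related-term link only suggests another term to the user, which is why it helps to agree on which meaning of “mapping” is intended.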

Just as I am an accidental taxonomist and then an accidental consultant, so am I now also an accidental trainer. Details of my corporate training offerings are on my website.

Sunday, June 30, 2019

Taxonomy Sessions at the 2019 SLA Conference

SLA (Special Libraries Association) offered a good number of taxonomy-related sessions at this year’s annual conference, held June 14-18 in Cleveland, Ohio, thanks to the organizing efforts of its Taxonomy Division. There were enough taxonomy sessions so that there was always at least one session of interest at any time.
SLA is a membership association of librarians and information professionals, particularly involved in “special” libraries or information services. Special libraries include corporate, specialized academic, government, military, law, medical, business, and nonprofit. I’m not a librarian (I’m an accidental taxonomist), so I didn't become a member of this professional organization until a Taxonomy Division was created 10 years ago. I’ve attended and presented at some, but not all, of the SLA conferences in the past 10 years, as the taxonomy-related offerings vary, and presentation topics are usually the choice of the Taxonomy Division conference planning committee. This year, for the first time, I presented not one but two sessions: I led the full-day preconference “continuing education” workshop and co-presented a session on taxonomy management software.
I was very pleased that there was such a rich program in other taxonomy sessions this year, especially compared to last year, thanks to Taxonomy Division conference program chair Janice Keeler and program committee members Edee Edwards and Margaret Nunez. There were also two sessions on knowledge management, which I found very interesting. The taxonomy-related sessions were:
  • "Introduction to Taxonomy Design & Creation" (full-day preconference workshop)
  • "Ensuring Semantic Interoperability and Creating Interoperable Taxonomies" (90 minutes, 2 speakers)
  • "Taxonomy Governance in Real Life" (90 minutes, panel discussion, 2 speakers and moderator)
  • "Taxonomy Roundtable" (90 minutes, three roundtables of participant discussions)
  • "Big Data and Controlled Vocabularies" (30 minutes, 2 speakers)
  • "Taxonomy Basics" (30 minutes)
  • "Keeping your Taxonomy Fresh and Relevant" (60 minutes, 2 speakers)
  • "Taxonomy-Ontology Conversions: Case Studies" (75 minutes, 3 speakers)
  • "Taxonomy Tools and Tool Evaluation" (60 minutes, 2 speakers)

“Ensuring Semantic Interoperability & Creating Interoperable Taxonomies” was a densely packed presentation covering the different types and issues in controlled vocabulary interoperability with two presenters in turn: Margie Hlava, President and Chairman, Access Innovations Inc., and Marcia Zeng, Professor of Library and Information Science, Kent State University. Marcia explained that interoperability is at different levels: system level, semantic level, and structural level, and her focus was on semantic interoperability, which is addressed in the international standard ISO 25964-2. She discussed in detail each of the different kinds of controlled vocabulary interoperability. There are methods that are based on working from an existing knowledge organization system: derivation from an original source and expansion; and there are methods that involve working between/among existing vocabularies: integration/combination and interoperation/shared/harmonization.
“Taxonomy Governance in Real Life” featured two panelists from very different organizations: Paula McCoy, manager from ProQuest, and Susannah Woodbury, taxonomist from Overstock. Taxonomy governance was defined as maintaining the content of a controlled vocabulary (adds/deletes), maintaining the integrity of a vocabulary (standards and usage), and implementing a vocabulary (managing those who work in and use the vocabulary). Topics of discussion included working with stakeholders, the governance of change and how decisions are made, and staying flexible through the iterative process.
The “Taxonomy Roundtable” was a purely discussion-based session, whereby attendees divided into three groups of about 6-7, and each group got to discuss three of the four predefined topics in turn: taxonomy ROI, adding taxonomy to the workflow, implementing taxonomies in search, and taxonomies in user interface design. These topics were chosen based on a Taxonomy Division survey of members’ interests. Each table then reported the outcomes of their discussions to the larger group.
“Big Data and Controlled Vocabularies” was presented by Camille Matthew, Information Science Specialist, NASA Jet Propulsion Laboratory. Camille explained what big data is: an accumulation of data that is too large and complex for processing by traditional database management tools. Big data is big by 5 Vs: volume, velocity, variety, veracity (varied dates/outdated for example), and value. The issue is combining “stuff” (big data) and “strategy.” Strategies include controlled vocabulary, taxonomy, ontology, and metadata standards. Data structure cannot be assumed, so we design for unstructured content.
"Taxonomy Basics" was presented by Heather Kotula, Vice President of Marketing and Communications, Access Innovations Inc. This quick session was aimed at those new to taxonomies. It comprised definitions of taxonomies and other types of controlled vocabularies and also included quite a bit of history into the field of classification and naming.
“Keeping your Taxonomy Fresh and Relevant: The APA Thesaurus” was presented by Marisa Hughes, Taxonomist, American Psychological Association (APA). Marisa had recently led a thorough thesaurus update that took about a year and was completed in February 2019, and this presentation was largely based on lessons learned from that project. Change management is a key part of taxonomy governance. Change is constant. Creating a responsive and relevant taxonomy involves a set of activities: adapt, determine, engage, delineate, data, identify. One needs to know when and why to change, and a roadmap is also needed.
“Taxonomy-Ontology Conversions: Case Studies” comprised three case-study presenters: Edee Edwards, Ontology Architect at the National Fire Protection Association (NFPA); Mary Chitty, Library Director & Taxonomist at Cambridge HealthTech; and David Bender, Manager, Medical Ontology, Radiological Society of North America. The genesis of this session came out of a conversation at the SLA conference the previous year, when someone from Lexis Nexis asked how you build your ontologies: from taxonomies or from data. It was anticipated that the case studies would all be conversions from taxonomies to ontologies, but that was not necessarily so.
Edee Edwards explained that the NFPA was building a data science team, which was very interested in ontologies. There was also a data governance group involved. At that time, they also needed a system upgrade of vocabulary software, and the new one is SKOS-based. They did a proof of concept with their data science group. NFPA’s primary use for the ontology was auto-tagging.
Mary Chitty's presentation “Preparing your taxonomy to be ready for data scientists & machine readability” presented the case of taxonomies at Cambridge HealthTech. It is still a taxonomy, not an ontology, but Cambridge HealthTech has recently partnered with OntoForce, a semantic search and data science company, to use their search engine. The presentation was more about the issues than any solution. Ongoing challenges include dealing with legacy data, integrating acquired companies’ data, scaling up, and dealing with ambiguity.
David Bender’s presentation “The Big Maybe: Should You Convert Your Taxonomy to an Ontology?” presented the example of the controlled vocabulary of the Radiological Society of North America (RSNA), RadLex. RadLex, a model of anatomical procedure and modality as it pertains to radiology, is referred to as a lexicon or terminology, although it is arranged as a hierarchical tree/taxonomy. It was decided to have a structure, as an ontology, but there are still more questions than answers. In moving in the direction of ontology, they are using the tool Protégé and have converted RadLex into an OWL form, but otherwise it is still kept as a taxonomy.

Heather Hedden presenting on taxonomy tools at the SLA 2019 conference
I presented the full-day preconference workshop "Introduction to Taxonomy Design & Creation" and spoke on taxonomy management software in the session "Taxonomy Tools and Tool Evaluation," and my co-presenter Marti Heyman of OCLC presented on how to evaluate taxonomy tools.
SLA Taxonomy Division members can read detailed reports of each of these taxonomy sessions in the next issue of the Division’s “Taxonomy Times” newsletter.
SLA, an international organization, holds its annual conference in June in different cities in North America. It will next be in Charlotte, North Carolina, June 6-9, 2020.

Thursday, May 30, 2019

Knowledge Graphs and Ontologies

Schema DBpedia 2010 from Wikimedia Commons attributed to Charles Sturt University (Creative Commons license)
I’ve been hearing a lot about knowledge graphs recently. Corporate and academic implementations have been increasing in recent years, and now the taxonomy community is also taking an interest. Taxonomy software vendors are talking about knowledge graphs in webinars, blogs, and conferences, and knowledge graphs was on the list of suggested presentation proposal topics for this fall’s Taxonomy Boot Camp London conference.

Knowledge graph purposes and definitions

A knowledge graph is the organization and representation of a knowledge base as a graph, with a network of nodes and links, not as tables of rows and columns. As such, it is generally based on data in a graph database, rather than on a relational database, and graph databases are becoming more popular. A knowledge graph usually includes (but is not limited to) visualizations of data, such as of an output of graph analytics, a display of interconnected nodes and links, or a display of linked data in a “fact box.”

Knowledge graphs can serve various roles and provide many benefits. They support search, recommendation engines, e-commerce, and enterprise knowledge management. They can integrate knowledge, serve data governance, provide semantic enrichment to content, bring structured and unstructured data together, provide a unified view of varied unconnected data sources, provide a semantic layer on top of the metadata layer, improve search results beyond mere algorithms, and answer complex user queries instead of merely returning content on a specified topic. An example of a complex query that can easily be handled by a knowledge graph linked to the right data, but that would be very time-consuming, if not impossible, with traditional search and query methods, would be: “Which of the top 10 scholarly journals (by most often cited) published in Europe in the past 3 years discuss knowledge graphs in the context of knowledge organization systems?”

Google Knowledge Graph fact box example
Like “taxonomy” or “ontology,” the definition of “knowledge graph” is not clear or agreed upon. Knowledge graphs have different meanings from different perspectives, such as those of people with computer science vs. information management backgrounds. Sometimes a knowledge base, or at least a knowledge base that is represented as a graph, is considered the same as a knowledge graph. There was even a conference presentation, turned into an article, dedicated to this topic of defining knowledge graphs: "Towards a Definition of Knowledge Graphs," by Lisa Ehrlinger and Wolfram Wöß, CEUR Workshop Proceedings.

A Google search with Wikipedia results at top returns the article describing Google’s own “Knowledge Graph” (introduced in 2012 and displayed as fact boxes, as in the example screenshot here for Boston) and a see also “Knowledge graph” (lower case), which redirects to the Wikipedia article “Ontology (information science).”

Knowledge graphs and taxonomies, ontologies, and other knowledge organization systems

Knowledge graphs, like taxonomies, comprise things/nodes/concepts and relationships between them. Knowledge graphs may comprise multiple domains and thus contain multiple taxonomies, thesauri, ontologies, or other knowledge organization systems. Knowledge graphs can link together disparate sources of controlled vocabularies and data.

RDF triple example
Knowledge graphs resemble ontologies (a kind of knowledge organization system that is based on taxonomies, but is more complex), but, despite what Wikipedia claims, they are not the same. Knowledge graphs and ontologies both are represented by nodes (things, concepts) and have customized semantic relationships between them. As they both can be visually represented in the same way of nodes and relationships, they may look the same in visualizations. They are both based on RDF (Resource Description Framework) triples (comprising subject-predicate-object), and are usually also based on the Semantic Web standard OWL. All nodes must have their own unique URIs. Specialized software tools are available to create knowledge graphs and ontologies.
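As a rough illustration of the subject-predicate-object structure, here is a minimal sketch in plain Python. It does not use an RDF library; the `http://example.org/` namespace, the predicate names, and the `match` helper are all assumptions made for this example, and real data would normally be stored in a triple store or graph database.

```python
# Hypothetical namespace; a real graph would use its own published URIs.
EX = "http://example.org/"

# Each triple is (subject, predicate, object), and every node has a unique URI.
TRIPLES = {
    (EX + "Boston", EX + "isA", EX + "City"),
    (EX + "Boston", EX + "locatedIn", EX + "Massachusetts"),
    (EX + "Massachusetts", EX + "isA", EX + "State"),
}


def match(subject=None, predicate=None, obj=None):
    """Return all triples matching the given parts; None acts as a wildcard."""
    return sorted(
        t for t in TRIPLES
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# Everything stated about Boston:
for s, p, o in match(subject=EX + "Boston"):
    print(s, p, o)
```

The same set of triples can be read as a graph: subjects and objects are the nodes, and predicates are the labeled links between them.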

Knowledge graphs can be considered ontologies and more. According to the authors, Ehrlinger and Wöß, “A knowledge graph acquires and integrates information into an ontology and applies a reasoner to derive new knowledge.” A knowledge graph may comprise multiple domain ontologies, or an ontology and another vocabulary/knowledge organization system. A certain kind of very general ontology called an upper ontology or foundation ontology can also serve as the data model for a knowledge graph.
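What it means for a reasoner to “derive new knowledge” can be sketched with a toy example in Python. This is only an illustration under simplifying assumptions: the facts, the `locatedIn` predicate, and the `infer_transitive` function are hypothetical, and the single transitivity rule shown here stands in for the much richer inference that actual reasoners perform.

```python
# Hypothetical stated facts as (subject, predicate, object) triples.
FACTS = {
    ("Boston", "locatedIn", "Massachusetts"),
    ("Massachusetts", "locatedIn", "United States"),
    ("Boston", "isA", "City"),
}


def infer_transitive(triples, predicate):
    """Apply one rule until nothing new appears: if A p B and B p C, then A p C."""
    inferred = set(triples)
    changed = True
    while changed:
        new = set()
        for a, p1, b in inferred:
            if p1 != predicate:
                continue
            for c, p2, d in inferred:
                if p2 == predicate and c == b and (a, predicate, d) not in inferred:
                    new.add((a, predicate, d))
        inferred |= new
        changed = bool(new)
    return inferred


derived = infer_transitive(FACTS, "locatedIn") - FACTS
print(derived)
```

The derived triple was never stated explicitly; it follows from the stated facts plus the rule, which is the sense in which a knowledge graph can contain more than what was entered into it.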

Conferences including knowledge graphs

There are many conferences that now have sessions on knowledge graphs. I cannot explore all of them, but I have attended and will attend several conferences this year that include knowledge graphs in their programs. VOGIN-IP-lezing 2019 "Search and Findability," at which I spoke in Amsterdam in March, had a session on a fashion retailer's knowledge graph and a 2-hour workshop, “Enterprise Knowledge Graphs." Data Summit, which I attended earlier this month in Boston, had several sessions that mentioned knowledge graphs, including one focused on the topic, "From Structured Text to Knowledge Graphs," treating them not as something new to be defined but rather as an accepted technology. I'm excited to be co-presenting (presenting the first part on taxonomies and ontologies) in a pre-conference full-day workshop "Fast Track to Knowledge Graphs and Semantic AI," at the SEMANTiCS conference in Karlsruhe, Germany, on September 9. Then I will be presenting "A Brief Introduction to Knowledge Graphs," among other presentations, at Taxonomy Boot Camp London in October.

Tuesday, April 30, 2019

Taxonomy Software Trends: Convergence and Visualizations

I recently looked more closely into current offerings of taxonomy software to prepare for an upcoming presentation at the SLA conference in Cleveland in June: “Taxonomy Tools and Tool Evaluation.” I will speak about the tools, and my co-presenter, Marti Heyman, will speak about how to evaluate them. I had last contacted various software vendors in 2015 when I was writing the second edition for my book, The Accidental Taxonomist. I had previously blogged on Taxonomy Software Trends in January 2015 and observed that, since researching software for my first edition in 2009, there is more cloud/web-based software, more SKOS/RDF/Semantic web framework software, and more plugins to SharePoint, content management systems, and search engines. Those trends continue. Now that I look into taxonomy software again, the additional trends I see are taxonomy, thesaurus, and ontology tool convergence and graphical vocabulary visualization.

Taxonomy, thesaurus, and ontology software convergence

Originally there was thesaurus management software (also used for any taxonomies), such as MultiTes, Data Harmony Thesaurus Manager, Synaptica KMS, and other products that no longer exist; and ontology management software, such as TopBraid Composer, Protégé, and others. The two kinds of software were very distinct, from different vendors, based on completely different standards and models, with different features, used by different users, for different purposes.

Now, we don’t hear as much about “thesaurus software” as before, but rather vocabulary/taxonomy/knowledge organization system (KOS)/ontology software, where the same software tool supports thesaurus standards (ANSI/NISO Z39.19 or ISO 25964) and ontology standards (OWL and RDF), and especially the SKOS (Simple Knowledge Organization System) model for any kind of controlled vocabulary. This makes sense, because an organization often has needs for more than one kind of controlled vocabulary. Newer software offerings have combined taxonomy, thesaurus, and ontology software into one. These include Smartlogic Semaphore, PoolParty, TopBraid Enterprise Data Governance’s Vocabulary Manager, Mondeca Intelligent Topic Manager, and VocBench. Synaptica is the exception with two products: Synaptica KMS primarily for thesauri and graph database-based Synaptica Graphite primarily for ontologies.

Visualizations of taxonomies, thesauri, and ontologies

Interactive visualization charts/graphs of taxonomies (what I shall call all controlled vocabularies here) are not something I had paid much attention to, because the feature is not considered so important by a professional taxonomist for creating taxonomies. However, while taxonomists are the primary users of taxonomy management software, other stakeholders in taxonomies are important secondary users. These people include content managers, content strategists, project managers, knowledge managers, information product managers, user interface/experience designers, and subject matter experts. Rather than creating taxonomies, these various stakeholders need to view draft taxonomies and provide feedback on them. Viewing the taxonomy in the user interface used by the taxonomist is often not practical or intuitive. However, viewing the taxonomy as the end-user will view it may not be possible, because the taxonomy has not yet been implemented into its final system or product. Therefore, a taxonomy visualization feature of taxonomy management software can be quite useful for stakeholder review and input.

Visualizations are especially useful for ontologies with their semantic relationships, but they are also helpful for taxonomies and thesauri. With the convergence of taxonomy, thesaurus, and ontology-creation capabilities in the same software, vocabulary visualization has become a more common feature. However, the visualizations are not the same in all vocabulary management software products. Following are some varied examples of visualizations. In many cases, they are interactive, whereby the user can drag and reposition the nodes.

Data Harmony Thesaurus Master offers a “sunburst” visualization for hierarchical taxonomies, as an alternative to the inverted tree display, which is available in the editing interface of the software.

Taxonomy visualization from Data Harmony Thesaurus Master

Synaptica KMS has a node and link relationship display for taxonomies and thesauri, where relationships do not need to be defined. Synaptica Graphite will have a new directed-graph visualizer feature added later this year.

Thesaurus visualization from Synaptica KMS

Semaphore, Mondeca, and TopBraid EDG Vocabulary Management each have a node and link relationship display for ontologies that additionally describes the types of relationships.

Ontology visualization from Smartlogic Semaphore

Ontology visualization from Mondeca ITM

Visualization from TopBraid EDG Vocabulary Management

PoolParty offers a different type of visualization, focusing on the relationships of a selected concept, with each type color-coded. 
Visualization of a taxonomy concept from PoolParty

In combination with other graph database tools, both Synaptica Graphite and PoolParty can support interactive nonhierarchical visualizations and graph analytics. This brings us to our next topic, knowledge graphs, which I will discuss in my next blog post.

Friday, March 29, 2019

Knowledge Modeling

I recently presented a webinar on “knowledge modeling.” I usually have spoken or written only of creating controlled vocabularies, or more specifically taxonomies, rather than creating knowledge models. Now, I am beginning to think of knowledge models and knowledge modeling.

A knowledge model is not just a fancy buzzword for a controlled vocabulary. It’s more complex than that. A knowledge model is more similar to a knowledge organization system, which I defined in an earlier blog post. As a system or a model, it comprises not only the concepts, their labels and attributes, and their relationships, but also rules or policies for their use. Furthermore, a knowledge model is either a complex type of knowledge organization system, such as a thesaurus or an ontology, or a set of multiple controlled vocabularies, such as facets, that are used in combination for the same content set, but it is not a simple single controlled vocabulary. The designation of “model” is also what is used for RDF, SKOS, and OWL-based systems. These are often called semantic models.

The activity of “knowledge modeling” is also slightly different and more complex than mere “taxonomy creation.” Taxonomy creation involves identifying concepts through obtaining input from stakeholders/users and from surveying the content, possibly with some additional external resources, but the extent of obtaining user input may vary. It is possible to build a taxonomy, especially one for external users, with no user input and just input from some other stakeholders. Knowledge modeling also involves inputs of people and content, but more emphasis is on stakeholder/user input. Content contains information, but people contain knowledge, so knowledge modeling requires the input of various people, with the input gathered in a comprehensive and systematic way, such as through interactive brainstorming workshops and interviews. Furthermore, knowledge modeling does not look at merely content, but starts out considering the body of “knowledge” that can be derived from the content.

Knowledge modeling may also involve slightly different thinking on the part of the taxonomist or knowledge modeler. Instead of thinking of what terms are needed for indexing and retrieval of a set of content, the knowledge modeler thinks of what the possible classes, facets, or concept schemes are to describe a domain of knowledge, and what various user activities and use cases could be supported. From there, specific concepts are then created. Taxonomy creation involves a combination of top-down and bottom-up approaches to the hierarchy of concepts, but knowledge modeling puts more emphasis on the top-down approach.

Knowledge modeling is a very apt description for what is involved in designing and creating ontologies, which are knowledge organization systems that describe a domain of knowledge, through concepts, classes of concepts, and customized semantic relationships between concepts of different classes. (Ontologies, by definition, should also follow the OWL standards of the World Wide Web Consortium for data representation.) There are knowledge organization systems which are not ontologies yet make use of some semantic relationships, and designing these also involves the activity of knowledge modeling. Determining what additional semantic relationships are desired, how specific they should be, and what they should be named in both directions is very much a knowledge modeling task.
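The idea of a customized semantic relationship that is named in both directions and connects concepts of different classes can be sketched in Python as follows. The classes, concepts, relationship labels, and the `link` function are hypothetical examples for illustration, not taken from any particular ontology or tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationType:
    """A semantic relationship with a name for each direction and the classes it connects."""
    label: str
    inverse_label: str
    source_class: str
    target_class: str


# Hypothetical concepts and the class each belongs to.
CONCEPT_CLASSES = {
    "Aspirin": "Drug",
    "Headache": "Condition",
}

# Hypothetical relationship type between the Drug and Condition classes.
TREATS = RelationType("treats", "is treated by", "Drug", "Condition")


def link(relation, source, target):
    """Return the statements for both directions, after checking the concept classes."""
    if CONCEPT_CLASSES[source] != relation.source_class:
        raise ValueError(f"{source} is not in class {relation.source_class}")
    if CONCEPT_CLASSES[target] != relation.target_class:
        raise ValueError(f"{target} is not in class {relation.target_class}")
    return [
        (source, relation.label, target),
        (target, relation.inverse_label, source),
    ]


for statement in link(TREATS, "Aspirin", "Headache"):
    print(statement)
```

Deciding that the relationship is “treats” rather than a generic “related to,” what its inverse is called, and which classes it may connect are the modeling decisions; the code merely records them.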

Knowledge modeling also suggests that it is an activity of knowledge management and not merely information management. Knowledge management is defined as “the process of capturing, distributing, and effectively using knowledge” (Tom Davenport, 1994), which goes beyond the mere support of search, discovery, and retrieval. Knowledge management is especially for internal enterprise-level knowledge.

I think knowledge modeling is more challenging than mere taxonomy creation, but I am up for the challenge.

Thursday, February 28, 2019

Taxonomy Building Steps

What are the steps to take when building a taxonomy? This question was posted not long ago to a discussion group of which I am a member. I referred the person asking to slides of one of my past presentations, "Everything You Need to Know to Start a Taxonomy from Scratch." That presentation, however, is more about what to consider in a project of creating a new taxonomy, rather than actual steps to take. So, I’ll summarize the steps here.

The main steps in developing a taxonomy are information gathering, draft taxonomy design and building, taxonomy review/testing/validation and revision, and taxonomy governance/maintenance plan drafting. The steps may overlap slightly.

Information gathering for a taxonomy

Information gathering involves the two sides of the taxonomy: the content to which it will be applied in tagging and the users who will utilize the taxonomy in browsing, searching, filtering, etc.

Information gathering about the content involves looking at a large representative sample of content (documents, intranet or web pages, database records, digital assets, etc.) and determining how the items would be classified and what they are about. Determining how they would be classified is on the higher level of content types or document types. Determining what they are about is on the more specific level of indexing terms. As a former indexer, I approach the task as if I were going to index the documents with index terms of my choosing. These terms are then gathered and organized into the taxonomy. Any existing term lists or sets of metadata should also be gathered and analyzed.

Information gathering about the needs of the users involves conducting interviews or using questionnaires to learn about the information-seeking needs and behaviors of the primary users of the future taxonomy. Some of the users of the taxonomy won’t be those looking for content but rather those who will be publishing or uploading content, and they will use the taxonomy to select terms for tagging. Those users should also be interviewed or surveyed by questionnaire, but they should be asked different questions from those asked of the information seekers.

Draft taxonomy designing and building

Creating the taxonomy may begin with an initial high-level taxonomy design and metadata specification, based on the information gathered from users and some of the content. It is at this stage that the taxonomy type (hierarchical, faceted, or a combination), any larger metadata schema, and the top terms are determined. Depending on the situation, the taxonomy project owner or other key stakeholders should provide their feedback on the high-level design before detailed taxonomy building begins.

Building out the taxonomy involves approaching the structure from both directions: top down and bottom up. The top-down design and some of the building come primarily from the information gathered in speaking with the users and other stakeholders. The bottom-up building comes from the index terms discerned when analyzing sample content. The taxonomy needs to be well designed from both ends and integrate well in the middle. Terms at both ends may be revised in the process.

A well-designed taxonomy not only suits the needs of the users and represents the range of content, but also follows best practices for taxonomies, so that the format of terms and the relationships between terms conform to standards, and thus the taxonomy is logical and intuitive to use.
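
As a rough illustration of how the top-down and bottom-up sides meet in the middle, here is a small Python sketch. It assumes a simple hierarchy in which each term has at most one broader term, and all of the terms and the function names build_taxonomy and path_to_top are hypothetical examples, not part of any taxonomy tool.

```python
def build_taxonomy(top_down, bottom_up):
    """Merge a top-down design with bottom-up index terms.

    top_down:  dict of term -> broader term (None for a top term), from stakeholder input
    bottom_up: dict of index term -> proposed broader term, from content analysis
    Returns the merged broader-term map and the index terms that found no place.
    """
    broader = dict(top_down)
    pending = dict(bottom_up)
    placed_any = True
    # Repeat until no more terms can be attached, so that the order of terms does not matter.
    while placed_any and pending:
        placed_any = False
        for term, parent in list(pending.items()):
            if parent in broader:
                # Terms already fixed in the top-down design are not overwritten.
                broader.setdefault(term, parent)
                del pending[term]
                placed_any = True
    return broader, sorted(pending)


def path_to_top(term, broader):
    """Return the chain of terms from a term up to its top term."""
    path = [term]
    while broader[path[-1]] is not None:
        path.append(broader[path[-1]])
    return path


top_down = {"Finance": None, "Human Resources": None, "Policies": "Human Resources"}
bottom_up = {
    "Parental leave policy": "Policies",
    "Expense reports": "Finance",
    "Vendor contracts": "Procurement",
}
taxonomy, unplaced = build_taxonomy(top_down, bottom_up)
print(path_to_top("Parental leave policy", taxonomy))
print(unplaced)
```

An index term that cannot be attached, such as "Vendor contracts" in this made-up data, is a signal that the design needs revising at one end or the other, for example by adding a broader term or by renaming an existing one.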

Taxonomy review/testing/validation and revision

At one or more points in the process, the taxonomy should be reviewed and tested. Testing should ideally involve both uses of the taxonomy: finding terms to tag content and finding desired content by means of taxonomy terms. This testing can be done with an offline sample of content and taxonomy terms, if the taxonomy has not yet been implemented. Testing may be based on use cases that came out of the initial user interviews. In this process, concepts that are missing from the taxonomy or whose meaning is unclear can be identified and then added or clarified. Testing that is done when the taxonomy is nearly finished and expected to be in good shape might be called “validation.”
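
An offline tagging test of this kind can be tallied with a few lines of Python. The sketch below assumes that testers wrote down the terms they wanted to use for each sample document; the function name find_gaps, the document IDs, and all of the terms are hypothetical.

```python
from collections import Counter


def find_gaps(taxonomy, sample_tags):
    """Compare the terms testers wanted to use against the draft taxonomy.

    taxonomy:    dict of preferred term -> list of alternative labels (synonyms)
    sample_tags: dict of document ID -> list of terms a tester wanted to apply
    Returns a count of wanted terms not found in the taxonomy, and the
    preferred terms that no tester used.
    """
    # Match on preferred and alternative labels, ignoring case.
    lookup = {}
    for preferred, alternatives in taxonomy.items():
        for label in [preferred, *alternatives]:
            lookup[label.casefold()] = preferred

    missing = Counter()
    used = set()
    for terms in sample_tags.values():
        for term in terms:
            preferred = lookup.get(term.casefold())
            if preferred is None:
                missing[term.casefold()] += 1
            else:
                used.add(preferred)
    unused = sorted(set(taxonomy) - used)
    return missing, unused


draft = {"Invoices": ["Bills"], "Contracts": [], "Onboarding": []}
tags = {
    "doc-1": ["bills", "Purchase orders"],
    "doc-2": ["Contracts", "purchase orders"],
}
missing, unused = find_gaps(draft, tags)
print(missing.most_common())
print(unused)
```

Terms that testers wanted but could not find are candidates to add, either as new concepts or as alternative labels. Terms that nobody used are not necessarily wrong, but they are worth a second look against a larger content sample.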

Taxonomy governance/maintenance plan drafting

Documenting the policy for the taxonomy and its usage does not come merely at the end of the project but gets started as the taxonomy is built and tested. As issues come up and get resolved, they get documented. Taxonomy governance includes the taxonomy editorial policy/guidelines, the taxonomy use/tagging policy, and policies and procedures for updating and maintaining the taxonomy. A taxonomy is expected to change and require updating.


Those with skills in creating index terms need to broaden their skills to include requirements gathering, stakeholder interviewing, and governance planning, if they want to design and build a taxonomy. Those with skills in information project management may need to deepen their skills in best practices for creating taxonomy terms and relationships. If you would like to develop those skills, I am offering full-day workshops in taxonomy design and creation in Rome, Italy, on March 25, 2019, and in Cleveland, Ohio, on June 15, 2019. I also offer a self-paced online taxonomy course that can be started at any time.