As mentioned in my previous
blogpost, “Evaluating Taxonomies,” taxonomy evaluation and taxonomy testing
differ. While evaluation of a taxonomy by a taxonomist is needed when the taxonomy was created by non-taxonomists (such as subject-matter experts), testing of a taxonomy is recommended in all cases, no matter who created it. Following is an overview of the
different kinds of testing that can or should be performed on a taxonomy prior
to its implementation.
Card-Sorting
There are two kinds of card-sort tests: open and closed. In open card-sorts, the testers group concepts/topics together and then assign a broader category of their own, whereas in closed card-sorts the broad categories are already designated, and the testers merely categorize the specific concepts/topics within those predetermined categories. Open card-sorting, if chosen, is therefore done earlier in the taxonomy design process, when the broad categories are still uncertain. A single taxonomy project may include either or both kinds of card-sorting, depending on where the greatest need is for this additional input. Testers could be end-users or stakeholders, depending on the needs of the test.
Strictly speaking, card-sorting is not a kind of taxonomy testing but rather a form of taxonomy idea testing. It is not performed on a completed taxonomy to test it but rather to test ideas for categories/hierarchies, which will later be combined to create the taxonomy. Therefore, card-sorting is not an alternative to the other kinds of testing described below, which may subsequently be done.
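To make the distinction concrete, here is a minimal sketch (in Python, with invented card and category names) of how the two kinds of card-sort results might be recorded, along with one simple way to mine them for candidate groupings:

```python
from collections import Counter
from itertools import combinations

# Hypothetical card-sort results; all card and category names are invented.

# Open card-sort: each tester groups the cards AND names the categories.
open_sort_tester_1 = {
    "Getting Around": ["Subway", "Bus", "Bike Share"],
    "Things to See": ["Museums", "Parks", "Monuments"],
}

# Closed card-sort: the categories are designated in advance;
# testers only place cards into them.
closed_categories = ["Transportation", "Attractions"]
closed_sort_tester_1 = {
    "Transportation": ["Subway", "Bus", "Bike Share"],
    "Attractions": ["Museums", "Parks", "Monuments"],
}

def co_occurrence(sort_result):
    """Count how often each pair of cards lands in the same group."""
    pairs = Counter()
    for cards in sort_result.values():
        for a, b in combinations(sorted(cards), 2):
            pairs[(a, b)] += 1
    return pairs

# Across many testers, high pair counts suggest candidate sibling terms.
print(co_occurrence(open_sort_tester_1).most_common(3))
```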
Use Testing
Use testing, or use-case testing, is a necessary step after a draft taxonomy is built or nearly completed but before it is finally implemented, allowing for revisions to be made based on the test results. It is at this point that the taxonomy is put to the test to see whether it will perform as hoped in search/retrieval and (if applicable) for manual tagging. This type of testing might also be called taxonomy validation.
A cross-section of different kinds of test users should be recruited to prepare several typical, and perhaps one especially challenging, content-search use cases. Each user is then presented with the taxonomy (which can be in any format at this stage, whether on paper, as an Excel file, or as a test web page) and asked to browse the taxonomy to look for terms under which the content for the search scenario might be found. The user performs the test, either browsing in the test administrator’s physical presence or via screensharing, with verbal narration of what the user is doing and why. The test administrator takes notes regarding any problems in finding taxonomy terms for the use case. These findability problems should be treated as requirements for additional terms, additional nonpreferred (variant) terms pointing to existing terms, or perhaps more polyhierarchy or associative relationships to help guide the user to the desired concepts.
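To illustrate how each of those fixes maps onto a term record, here is a minimal sketch in Python (the "Laptops" example and all labels are invented; the field names loosely follow thesaurus conventions such as SKOS):

```python
from dataclasses import dataclass, field

# A minimal taxonomy term record; field names loosely follow
# thesaurus/SKOS conventions (prefLabel, altLabel, broader, related).
@dataclass
class Concept:
    pref_label: str                                 # the preferred term
    alt_labels: list = field(default_factory=list)  # nonpreferred (variant) terms
    broader: list = field(default_factory=list)     # more than one entry = polyhierarchy
    related: list = field(default_factory=list)     # associative relationships

# Hypothetical revision after use testing: users searched for "notebook"
# and expected the concept under both Computers and Office Equipment.
laptops = Concept(
    pref_label="Laptops",
    alt_labels=["Notebook computers", "Notebooks"],  # added variant terms
    broader=["Computers", "Office equipment"],       # added polyhierarchy
    related=["Laptop accessories"],                  # added associative relationship
)
```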
If the taxonomy is to be used for manual tagging or indexing, then a second, different round of use testing is needed, whereby users who perform this function test the taxonomy by indexing the kinds of typical and challenging documents that they tend to deal with. Rather than coming up with use “cases”, these test-user-indexers merely need to come up with actual documents, which should represent a good cross-section of the various document types indexed. This exercise is even more straightforward than the use testing for finding content, so it could even be performed offline without the test administrator present, as long as the test-user-indexer takes good notes.
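As a minimal sketch of what “good notes” might capture (the structure and field names here are merely a suggestion, not a standard), each document tested could be logged along with the terms assigned and any gaps encountered:

```python
# Hypothetical note-taking structure for an offline indexing test.
# The field names are only a suggestion, not a standard.
indexing_test_log = [
    {
        "document": "2023 annual report",         # document indexed
        "terms_assigned": ["Financial Reports"],  # taxonomy terms that fit
        "terms_missing": ["Sustainability"],      # concepts lacking a good term
        "notes": "Had to settle for a broader term than the document warranted.",
    },
]
```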
A-B Testing
In A-B testing, the test users are presented with two different possible scenarios and asked which they prefer. When comparing two different taxonomies or parts of taxonomies, only one or two variations should exist between the two being compared, to make the test clear-cut. You may set up a series of A-B test pairs to compare multiple variations. This kind of test is comparable to what an optometrist does for vision: “Which is better, A or B?” Since only one or two differences should be compared and tested at a time, A-B testing is most suitable for comparing proposed top-level categories, rather than getting into the depths of a taxonomy, where a detailed term-by-term comparison is not practical. Thus, A-B testing focuses on high-level structural design, navigation, and browsing, not on the effectiveness of finding and retrieving content.
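For example, tallying the preferences from one A-B test pair might look like the following sketch (the votes are invented; with a small panel of test users, a simple count is usually all that is needed):

```python
from collections import Counter

# Hypothetical preferences from eight test users comparing two
# proposed sets of top-level categories (A = current, B = proposed).
votes = ["A", "B", "B", "A", "B", "B", "B", "A"]

tally = Counter(votes)
winner, count = tally.most_common(1)[0]
print(f"{winner} preferred by {count} of {len(votes)} test users")
# With samples this small, a near-even split is not decisive;
# consider recruiting more test users before committing to a design.
```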
A-B testing can be done at any time in the taxonomy design and build process. It is also very useful when considering a taxonomy redesign, for comparing the existing taxonomy (A) to a proposed change (B). A-B testing is usually done by presenting the test users with graphical or interactive web page mock-ups. I’ve created a B image for an existing online A image by taking a screenshot of A and then editing it in Microsoft’s Paint accessory. Although each individual A-B test is simple, you still need to decide what to compare and how many comparison tests to run, since each test takes time and resources.
Conclusions
Taxonomies should be tested, but not just any test will do. Different tests serve different purposes and fit into different stages of the taxonomy process. An inappropriate test, or an inappropriately timed test, can be a waste of time and money.