There is always an interest in free taxonomy or thesaurus management software. Many people who create taxonomies try to save money on purchasing taxonomy management software by simply not using any taxonomy management software but something else they already have, such as Excel. Those who are developing either very large taxonomies or more complex thesauri, however, realize that a dedicated taxonomy/thesaurus management system will save a lot of time and headache in the long term.
Various free thesaurus management software offerings have been available since the early 1990s. They tend to have their origins in academic projects in computer science, information science, or library science at universities, and others have been government projects. Some free software of the previous decade is no longer available, though. Discontinued software is still listed for posterity on the web directory of "Software for building and editing thesauri," started by Leonard Will and now managed on the Taxobank website. For example, two free software products listed were for MS-DOS and one no later than Windows 3.1.
The first free thesaurus software I was familiar with was TheW, a simple thesaurus management program developed by Tim Craven, a professor of information science at the University of Western Ontario, since retired. I ran across it because I was at the time exploring another of Prof. Craven’s programs, one for creating website indexes. TheW32, which is available for Windows XP, Vista, and 8 and for Java, is no longer maintained. It was last updated for Windows in 2007 and for Java in 2009. At this point, I would no longer recommend it.
Protégé Ontology Editor is an established free and open-source ontology editor from Stanford University. It is quite robust, has an active user community and support groups, and continues to be upgraded (with version 5.0.0 recently released in beta). The issue with Protégé is that it is a native ontology management tool, not a thesaurus management program (or even ontology “lite” as some thesaurus management software can manage semantic relationships and classes). Thus, it takes a very different approach to modeling and building vocabularies, which is not intuitive to taxonomists, such as myself, and, although I downloaded it, I never found it worth the difficulty to learn. If you can truly consider yourself an ontologist, though, then great, this might just be the solution for you.
I had explored some other free software offerings when writing my book, The Accidental Taxonomist, six years ago and came across TemaTres and ThManager. At the time I did not find them adequately enforcing valid relationships between terms, so I was somewhat dismissive about the software. Recently I revisited these products.
TemaTres, which has its origins in the Library and the University of Buenos Aires, Argentina, still allows creating duplicate terms, which was my initial cause for concern, but the latest version (2.1) now offers a configuration option for quality policies to allow or disallow duplicate terms. Thus, TemaTres is a suitable free thesaurus software product if used by a knowledgeable and experienced taxonomist who knows to set the options and understands the alerts. TemaTres is being supported, and its latest version was released just this winter, 2016. The software is web-based, which means that it requires PHP, MySQL, and an HTTP web server, so it may not be a configuration that every independent taxonomist would set up and install in a small/home office. Otherwise, TemaTres is worth looking into.
ThManager is from the University of Zaragoza and GeoSpatiumLab S.L., both in Zaragoza, Spain. ThManager supports the SKOS standard rather than ANSI/NISO Z39.19 or ISO 25964, which means it does not by default enforce all rules of the latter standards. But I have since found this to be a trend of new vocabulary management software: compliance with SKOS and support for ANSI/NISO Z39.19 or ISO 25964, as configurable rather than by default. Thus, I am no longer complaining if it does not support ANSI/NISO Z39.19 by default. The main problem with ThManager, though, is that it is not kept so well up to date. It was last significantly updated in 2006. The installation for even Windows 7 requires a “portable” version due to an installation bug.
More recently I discovered another free thesaurus management software, VocBench. It was developed originally for the management of the AGROVOC thesaurus of the Food and Agriculture Organization (FAO) of the United Nations as a joint project of FAO, which is based in Rome, Italy, and the Artificial Intelligence Research group at the University of Rome Tor Vergata. VocBench, like TemaTres, is SKOS-compliant, rather than ANSI/NISO Z39.19 compliant. VocBench is web-based, with web server requirements of Apache Tomcat, MySQL, and OWLIM installed on a Sesame2 server.
In addition to being free, these applications tend to have the advantage of being able to run on multiple platforms and yet can be installed and used by a single user. The editing features may be a little less standard and thus less intuitive, and documentation and support tend to be less extensive than for commercial software. Yet, they are worth considering for long-term experimentation (with no time limit as in commercial demo software), for use in nonprofit or low-budget projects, or by anyone with a strong interest in working with open source software.
Topics related to information management taxonomies posted by the author of the book, The Accidental Taxonomist.
Monday, February 29, 2016
Saturday, January 30, 2016
Polyhierarchy in the SharePoint Term Store
Last year I had the opportunity to create some taxonomy in
the SharePoint Term Store (also called Managed Metadata), and while I am pleased that hierarchical taxonomies
are supported in this widely used platform, I had some concerns about the
support of polyhierarchy, as information about this capability is inconsistent.
So I experimented further.
Polyhierarchy means a taxonomy term has more than one
broader term or parent term. In a traditional hierarchical taxonomy structure,
a term has one broader term (unless it is the top term, in which case it has no
broader term) and multiple narrower terms. Occasionally, though, the logic of
the hierarchy and the practical need to guide users down different possible
paths, makes it beneficial to give a term two or more broader terms. It may
appear to the user that the term is duplicated in different locations in the
taxonomy, but this duplication is in appearances only, because it is the same
term and thus linked/indexed to the same content, no matter which broader term
path the user clicked down through.
An example would be the term Financial report, which is shown in the screenshot from the SharePoint Term Store in Figure 1.
Fig. 1 Financial report as a narrower to the term Financial documents.
It would be
practical to have a broader term of Financial documents and another broader
term of Reports. Some users will look for the term under Financial documents,
and other users will look for it under Reports.
The SharePoint 2010 or 2013 Term Store claims to support the
creation of polyhierarchy, but it has significant limitations.
Polyhierarchy permitted only across different hierarchies
The support of polyhierarchy in the SharePoint Term Store
takes the notion of “polyhierarchy” too literally by insisting that the two
broader terms of a term in a polyhierarchy actually belong to different
hierarchies. This means that the polyhierarchy can only be created across
different Term Sets in SharePoint. A Term Set is a hierarchy or a facet with a
single top term. It is prohibited to create a polyhierarchy within the same
Term Set. This is quite problematic, because the vast majority of the
time that I want to create a polyhierarchy, it is within the same top-level
hierarchy or facet.
In the example of Financial report, it is logical to have
two broader terms of Financial documents and Reports. Both of these broader
terms, however, are within the same Term Set or facet, which I might call
Document type, so the SharePoint Term Store will not permit this polyhierarchy.
Having the term Financial documents appear under a second broader term within any
other Term Set or facet, on the other hand, such as the Department or Location
facet, is permitted by SharePoint, but this would not be a correct hierarchical
structure by taxonomy standards.
Only one method to create polyhierarchy
In the SharePoint Term Store, you cannot create a broader
term relationship; you can create only narrower term relationships. Thus, you
can only create hierarchies from the top down. The normal way to create a
polyhierarchy, however, is to add a second broader term relationship, but this
is not possible in SharePoint. Instead, the same term has to be made as a
narrower term to a second term.
So, if you have the term Financial report as narrower to Financial documents, and you want to make Reports also a broader term (and Reports exists in another Term Set), you would go to the second term that will be the new broader term (Reports), click on Create Term, and type in the name of an existing term (Financial report). SharePoint, however, does not enforce taxonomy standards and permits you to create a new term with the same name as another term (Financial report), but it will not be the same term. You can see at the bottom of the General information pane that the duplicate Financial report term’s unique identifier is different from that of the original Financial report term, as shown in Figure 2.
Fig. 2 General Information for a selected term
This
matters, because terms are used for indexing/tagging. The term with one ID in
one location may be indexed to some of the content, and the term with the other
ID in the other location will be indexed to other content, and neither term will
be indexed to all the content. This
would be bad for retrieval. So, this method should not be used to create
polyhierarchy.
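The indexing problem described here can be sketched in a few lines of Python. This is only an illustration of the idea, not how SharePoint stores terms: the `Term` class, the `documents_for` helper, and the document names are all hypothetical. It contrasts two same-named terms with different IDs against a single term that has two broader terms.

```python
from dataclasses import dataclass, field
from uuid import uuid4


@dataclass
class Term:
    """Hypothetical taxonomy term; content is tagged with the ID, not the label."""
    label: str
    parents: list[str] = field(default_factory=list)
    term_id: str = field(default_factory=lambda: str(uuid4()))


def documents_for(term: Term, index: dict[str, set[str]]) -> set[str]:
    """Return the documents tagged with this term's ID."""
    return index.get(term.term_id, set())


# Case 1: a second term typed in with the same name gets its own ID.
original = Term("Financial report", parents=["Financial documents"])
duplicate = Term("Financial report", parents=["Reports"])
index = {
    original.term_id: {"q1-report.docx", "q2-report.docx"},
    duplicate.term_id: {"annual-report.docx"},
}
split_results = (documents_for(original, index), documents_for(duplicate, index))

# Case 2: a true polyhierarchy is one term (one ID) with two broader terms.
reused = Term("Financial report", parents=["Financial documents", "Reports"])
reused_index = {
    reused.term_id: {"q1-report.docx", "q2-report.docx", "annual-report.docx"}
}
all_results = documents_for(reused, reused_index)

print(len(split_results[0]), len(split_results[1]))  # prints "2 1"
print(len(all_results))  # prints "3"
```

In the first case neither path retrieves all three documents; in the second, the same three documents come back whichever broader term the user browsed through.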
To create polyhierarchy in SharePoint, go to a second term
that is intended to be the additional broader term (Reports), click on Create
Term and type in the name of an existing term (Financial report). At the bottom
of the screen you will see “Suggestions”: type-ahead matches, highlighted in
yellow, to existing terms in another Term Set or even another taxonomy group.
If you select one of these
suggested terms, then you will indeed be creating a polyhierarchy. After doing
so, you will notice that the tag icon preceding the term becomes the “reused
tag” icon, as shown in Figure 3, in both locations, under the new broader term and under the existing
broader term. When you select the term and view its
General details, you will also notice that the box under Member Of shows that the term is
a member of both hierarchies.
Fig. 3 Reused tag example for the term Marketing
Importing a taxonomy with polyhierarchy
If you import an externally created taxonomy in CSV format as
a Term Set via the Term Store’s import feature and that taxonomy has
polyhierarchy, the Term Store will not recognize the polyhierarchy, but rather will
treat the polyhierarchy terms as distinct terms with duplicate names, assigning
them unique IDs. Thus, they could be used inconsistently in indexing/tagging. Therefore,
you should ensure that imported CSV taxonomies do not have any
polyhierarchy.
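One way to check a CSV file before import is to look for term names that appear under more than one parent path. The sketch below is a minimal example under stated assumptions: the function name `find_polyhierarchy` is made up, and the file is assumed to have one column per hierarchy level whose header starts with "Level " (for example "Level 1 Term"), so adjust the prefix to match your actual file.

```python
import csv
import io
from collections import defaultdict


def find_polyhierarchy(csv_text: str, level_prefix: str = "Level ") -> dict[str, set[str]]:
    """Return term names that occur under more than one parent path.

    Assumes one column per hierarchy level, with headers starting with
    level_prefix, and one row per term giving its full path.
    """
    parents_by_name: dict[str, set[str]] = defaultdict(set)
    reader = csv.DictReader(io.StringIO(csv_text))
    level_cols = [c for c in (reader.fieldnames or []) if c.startswith(level_prefix)]
    for row in reader:
        # Keep only the filled-in level cells; the last one is the term itself.
        path = [row[c].strip() for c in level_cols if row[c] and row[c].strip()]
        if not path:
            continue
        term, parent_path = path[-1], " > ".join(path[:-1])
        parents_by_name[term].add(parent_path)
    return {name: paths for name, paths in parents_by_name.items() if len(paths) > 1}


sample = """Level 1 Term,Level 2 Term,Level 3 Term
Document type,,
Document type,Financial documents,
Document type,Financial documents,Financial report
Document type,Reports,
Document type,Reports,Financial report
"""

repeats = find_polyhierarchy(sample)
for name, paths in repeats.items():
    print(name, "appears under:", sorted(paths))
```

Any term reported here would be imported as separate terms with separate IDs, so it should be resolved to a single location before the import.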
If you import a taxonomy created in an external
taxonomy/thesaurus/ontology management system which permits polyhierarchy, and that
software has a feature or connector to import to SharePoint Term Store, there
are different methods of dealing with the polyhierarchy issue. The default of
some software, such as Semaphore Ontology Editor and TopBraid Enterprise
Vocabulary Net, is to retain only one of the pair of broader term relationships
upon export. For example, in Semaphore, the first hierarchical relationship
encountered for a term is retained and any others are not, but the user gets an
alert. Wordmap also provides a validation error if there is a polyhierarchy for
import into the same Term Set. Rather
than retaining an arbitrary one of several broader term relationships, Synaptica
strips out all broader term relationships if there is more than one, and then
the former polyhierarchy terms show up on the orphan term list for review. In
some software, such as TopBraid EVN, the user can define quality/validation
rules that would identify polyhierarchy, so the user can remove any before
importing into SharePoint. Other software vendors, such as Data Harmony and
PoolParty, say they have work-arounds for the SharePoint import to sort of
support polyhierarchy, but I have not tested these.
In conclusion, the Term Store’s support of polyhierarchy
only across Term Sets (hierarchies or facets) is not very useful, since the
majority of time that we would want to create a polyhierarchy, it is within the
same Term Set, especially if the Term Set is to be used as a facet. A term with
the same name in more than one facet typically would have a slightly different
meaning and usage.
Thursday, December 31, 2015
Vocabularies and Controlled Vocabularies
I have long considered a taxonomy as a particular, structured kind of controlled vocabulary. More recently, however, I have been hearing of “vocabularies” without the word “controlled” in front, although still for the purposes of information management and retrieval, which is cause to wonder: are controlled vocabularies and vocabularies the same thing or not?
Controlled Vocabularies
Definition
It’s the standards that drive the definitions and also the scope of meaning. “Controlled vocabularies” have been most authoritatively defined and scoped by ANSI/NISO Z39.19-2005 Guidelines for the construction, format, and management of monolingual controlled vocabularies. The Standard’s glossary defines it as: “A list of terms that have been enumerated explicitly.” Vocabulary control is an important part of the definition of controlled vocabularies, whereby synonyms are linked together, homographs are distinguished, and unambiguous concepts are defined or scoped.
Although not part of the standard’s name, ISO 25964 Thesauri and interoperability with other vocabularies (parts 1 and 2 published in 2011 and 2013) also defines controlled vocabularies in its glossary, where it states that a controlled vocabulary is a “prescribed list of terms, headings or codes, each representing a concept.” It is also noted: “Controlled vocabularies are designed for applications in which it is useful to identify each concept with one consistent label, for example when classifying documents, indexing them and/or searching them.”
Scope
As for what is included within the scope of controlled vocabularies, ANSI/NISO Z39.19-2005 states in its Scope section, on the first page, that controlled vocabularies include:
- Lists of controlled terms
- Synonym rings
- Taxonomies
- Thesauri
What is significant is that ISO 25964 does make a distinction between “controlled vocabulary” and just vocabulary. ISO 25964 describes more kinds of vocabularies, but then addresses the issue of vocabulary control in each. Types of vocabularies that ISO 25964 discusses as having vocabulary control are:
- Thesauri
- Classification schemes
- Classification schemes for records management
- Taxonomies
- Subject heading schemes
- Name authority lists
Structured Vocabularies
Definition
There is another designation less commonly used of “structured vocabulary.” It appears in the name of the British Standard, BS 8723 Structured vocabularies for information retrieval – Guide. BS 8723 was published in five parts over 2005 – 2008, revising and expanding on the earlier BS and ISO standards for monolingual and multilingual thesauri, and, in turn, became the basis for the current ISO 25964 pair of standards.
ISO 25964 also includes “structured vocabulary” in its glossary, defined as an “organized set of terms, headings or codes representing concepts and their inter-relationships, which can be used to support information retrieval,” and goes on to note: “A structured vocabulary can also be used for other purposes. In the context of information retrieval, the vocabulary needs to be accompanied by rules for how to apply the terms.” Meanwhile, ANSI/NISO Z39.19-2005 does not mention “structured vocabularies.”
Scope
As for what is included within the scope of structured vocabularies, while that is not so clearly stated, it can be assumed, based on the title of BS 8723 Structured vocabularies for information retrieval – Guide, that the vocabularies included within the standard are all “structured vocabularies.” These are:
- Thesauri
- Classification schemes
- Business classification schemes for records management
- Taxonomies
- Subject heading schemes
- Ontologies
- Authority lists
If all the types of vocabularies in part 2 are indeed considered as “structured vocabularies” then the scope of structured vocabularies would cover:
- Thesauri
- Classification schemes
- Classification schemes for records management
- Taxonomies
- Subject heading schemes
- Ontologies
- Terminologies
- Name authority lists
- Synonym rings
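As a rough way to compare these lists, the two ISO 25964 scopes can be treated as sets. The sketch below simply transcribes the lists given in this post; the variable names are my own, and the set arithmetic is an illustration rather than anything defined by the standard.

```python
# Vocabulary types that ISO 25964 discusses as having vocabulary control.
iso_controlled = {
    "Thesauri",
    "Classification schemes",
    "Classification schemes for records management",
    "Taxonomies",
    "Subject heading schemes",
    "Name authority lists",
}

# Vocabulary types in ISO 25964 part 2, if all count as structured vocabularies.
iso_part2_structured = {
    "Thesauri",
    "Classification schemes",
    "Classification schemes for records management",
    "Taxonomies",
    "Subject heading schemes",
    "Ontologies",
    "Terminologies",
    "Name authority lists",
    "Synonym rings",
}

# Types on both lists, and types that appear only on the structured list.
both = iso_controlled & iso_part2_structured
structured_only = iso_part2_structured - iso_controlled

print(sorted(structured_only))  # prints ['Ontologies', 'Synonym rings', 'Terminologies']
```

On these lists, every controlled vocabulary type is also a structured one, while ontologies, terminologies, and synonym rings fall only on the structured side.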
Vocabularies
The simple one-word designation of “vocabulary,” when used in the context of support for information retrieval, comprises all controlled and structured vocabularies, including those at the margin of the definitions or not always meeting their strict requirements of controlled or structured vocabularies, such as ontologies, terminologies, name authority lists, and synonym rings, along with other flat (unstructured) term lists.
Vocabularies, not necessarily controlled or structured, are also what are referred to in other frameworks or web contexts, such as SKOS (simple knowledge organization system) vocabularies, Semantic Web Vocabularies, and Linked Open Vocabularies.
What is interesting to note is what other topics are being discussed when the terms “controlled vocabulary” and “vocabulary” alone are used in ISO 25964 part 2 Interoperability with other vocabularies. Controlled vocabularies are discussed in context of entry terms, pre-coordination, post-coordination, near synonyms, and indexing. Vocabularies in general are discussed in context of equivalence mapping, interoperability, resources and authorities, registries, multilingual types, and management software/systems.
Conclusions
Taxonomies, thesauri, subject heading schemes, and classification schemes are both controlled vocabularies and structured vocabularies. Most controlled vocabularies are structured vocabularies, and almost all structured vocabularies are controlled vocabularies. But there are other vocabularies that do not meet the criteria of one definition or another, and to recognize and include them, especially as resources or for the mapping of terms, we refer to them as just vocabularies.
Friday, November 27, 2015
Taxonomies and Terminologies
The current specialties of taxonomy management and terminology management have different histories and serve different purposes, but they are in fact closely related, and taxonomies and terminologies can be linked to share knowledge. At the annual Taxonomy Boot Camp conference in Washington, DC, earlier this month I met a terminologist attendee (Beate Früh of Büro b3) from Germany, who explained to me that the fields are quite similar, and that’s why she was attending a taxonomy conference. Also at the conference I met a vendor of a new software company (Jochen Hummel, CEO of Coreon), whose product provides both taxonomy and terminology management.
As with the field of taxonomies and taxonomy management, there are varying definitions of terminologies and terminology management. The original meanings of both taxonomy and terminology are as fields of study, with taxonomy being the study of naming and classifying and terminology being the study of terms and their use. More commonly though, we refer to taxonomies and terminologies as sets of terms or concepts for a particular subject area or purpose.
Definitions of terminology include “technical or special terms used in a business, art, science, or special subject” (www.merriam-webster.com), and a “set of designations belonging to one special language” (ISO 1087-1:2000, 3.5.1), with “each designation representing a concept” (ISO 25964-2:2013). According to the International Information Centre for Terminology (Infoterm): “The systematic organization and definition of concepts is called terminology management – which also includes classification.” (T.E.R.M.I.N.O.L.O.G.Y. PDF)
There are several differences between taxonomies and terminologies. The most obvious difference is that taxonomies have hierarchical relationships between the terms/concepts so as to create an overall hierarchical structure, and terminologies generally do not. Other differences are that terminologies contain more detailed terms than are found in a taxonomy for a comparable subject area. Furthermore, while taxonomies are limited to nouns and noun phrases (including verbal nouns), terminologies may contain some specific adjectives. Terminologies generally include definitions for every term, which is not so typical for taxonomies. Many terminologies are used to support foreign language translation, so there are usually foreign language equivalents for every term, something found in only a small minority of taxonomies. In general, there is more data for a term in a terminology than in a taxonomy.
The most significant difference between taxonomies and terminologies is how they are used. Taxonomies serve information retrieval, through a combination of indexing/tagging use and browsing/navigation and/or search support. Rather than serve information retrieval, the main purposes of terminologies are to support standard use of terms, especially technical terms, with agreed-upon meaning for creating technical documentation and for foreign language translations. Translation has historically been the field of greatest use of terminologies. As such, many terminologists have a background in translation or linguistics. The co-authors of a leading book in the field of terminology, Handbook of Terminology Management, are both professors of translation.
Another difference is in regional use. Taxonomies are especially widely used in the United States and other English-speaking countries, while growing elsewhere too, whereas terminologies are more widely used in Europe and bilingual countries such as Canada. Member organizations of Infoterm, the independent international association focused on terminology, include numerous organizations in Europe, a few in each of Africa, Asia, Latin America, and Canada, but there are no organizations in the United States.
Finally, there are a greater number of standards for terminologies. There are a large number of currently published standards of ISO committee 37 for Terminology and Other Language and Content Resources, including five standards of the Principles and Methods subcommittee, 14 of the Terminographical and Lexicographical Working Methods subcommittee, and five standards of the Systems to Manage Terminology, Knowledge and Content subcommittee, including ISO 30042:2008 TermBase eXchange (TBX). For taxonomies, on the other hand, standards are fewer, or, if considering specifically taxonomies, there actually are no standards, as the most relevant standards are for thesauri (ISO 25964 or ANSI/NISO Z39.19), ontologies (OWL, based on RDF), or more broadly web-based knowledge organization systems (SKOS).
Despite their differences, taxonomies and terminologies both are kinds of vocabularies or controlled vocabularies (depending on how “controlled vocabulary” is defined, the topic of my next blog post). The international standard ISO 25964 Thesauri and interoperability with other vocabularies (part 1 in 2011 and part 2 in 2013) discusses the following “other” vocabularies (as listed in its table of contents): classification schemes, taxonomies, subject heading schemes, ontologies, terminologies, name authority lists, and synonym rings. Thus, terminologies are listed right along with taxonomies and ontologies. The United States standard ANSI/NISO Z39.19-2005 Guidelines for the Construction, Format and Management of Monolingual Controlled Vocabularies, however, does not include terminologies in its more limited scope: “Controlled vocabularies covered by this Standard include lists of controlled terms, synonym rings, taxonomies, and thesauri.” (Section 2 Scope).
The most important similarity is that both taxonomies and terminologies refer to terms and unique concepts and not to mere words. As such, they often include and bring together synonyms or other variants to disambiguate concepts. While terminologies don’t characteristically have relationships between terms, they sometimes do.
Due to these similarities, it is quite feasible to have connections, links, mappings, etc., between terms in a taxonomy and in a terminology. Taxonomies and terminologies for internal content within the same organization will have a lot of overlap, so it makes sense to leverage the same knowledge bases and either reuse the same terms in taxonomies and terminologies or at least link/map the equivalencies, both to save effort and to ensure consistency of understanding across an organization. ISO 25964-2 Thesauri and interoperability with other vocabularies includes a section on guidelines for the interoperability between thesauri (and, by extension, taxonomies) and terminologies.
Hopefully, more organizations will develop both taxonomies and terminologies where they are lacking and also build connections between the two.
As with the field of taxonomies and taxonomy management, there are varying definitions of terminologies and terminology management. The original meanings of both taxonomy and terminology are as fields of study, with taxonomy being the study of naming and classifying and terminology being the study of terms and their use. More commonly though, we refer to taxonomies and terminologies as sets of terms or concepts for a particular subject area or purpose.
Definitions of terminology include “technical or special terms used in a business, art, science, or special subject” (www.merriam-webster.com), and a “set of designations belonging to one special language” (ISO 1087-1:2000, 3.5.1), with “each designation representing a concept” ISO 25964-2:2013. According to International Information Centre for Terminology (InfoTerm): "The systematic organization and definition of concepts is called terminology management – which also includes classification.” (T.E.R.M.I.N.O.L.O.G.Y. PDF)
Differences
There are several differences between taxonomies and terminologies. The most obvious difference is that taxonomies have hierarchical relationships between the terms/concepts so as to create an overall hierarchical structure, and terminologies generally do not. Other differences are that terminologies contain more detailed terms than are found in a taxonomy for a comparable subject area. Furthermore, while taxonomies are limited to nouns and noun phrases (including verbal nouns), terminologies may contain some specific adjectives. Terminologies generally include definitions for every term, which is not so typical for taxonomies. Many terminologies are used to support foreign language translation, so there are usually foreign language equivalents for every term, something found in only a small minority of taxonomies. In general, there is more data for a term in a terminology than in a taxonomy.
The most significant difference between taxonomies and terminologies is how they are used. Taxonomies serve information retrieval, through a combination of indexing/tagging use and browsing/navigation and/or search support. Rather than serve information retrieval, the main purposes of terminologies are to support standard use of terms, especially technical terms, with agreed-upon meaning for creating technical documentation and for foreign language translations. Translation has historically been the field of greatest use of terminologies. As such, many terminologists have a background in translation or linguistics. The co-authors of a leading book in the field of terminology, Handbook of Terminology Management, are both professors of translation.
Another difference is in regional use. Taxonomies are especially widely used in the United States and other English-speaking countries, while growing elsewhere too, whereas terminologies are more widely used in Europe and bilingual countries such as Canada. Member organizations of Infoterm, the independent international association focused on terminology, include numerous organizations in Europe, a few in each of Africa, Asia, Latin America, and Canada, but there are no organizations in the United States.
Finally, there is a greater number of standards for terminologies. There are a large number of currently published standards of ISO committee 37 for Terminology and Other Language and Content Resources, including five standards of the Principles and Methods subcommittee, 14 of the Terminographical and Lexicographical Working Methods subcommittee, and five standards of the Systems to Manage Terminology, Knowledge and Content subcommittee, including ISO 30042:2008 TermBase eXchange (TBX). For taxonomies, on the other hand, standards are fewer, or, if considering specifically taxonomies, there actually are no standards, as the most relevant standards are for thesauri (ISO 25964 or ANSI/NISO Z39.19), ontologies (OWL, based on RDF), or more broadly web-based knowledge organization systems (SKOS).
Similarities
Despite their differences, taxonomies and terminologies both are kinds of vocabularies or controlled vocabularies (depending on how “controlled vocabulary” is defined, the topic of my next blog post). The international standard ISO 25964 Thesauri and interoperability with other vocabularies (part 1 in 2011 and part 2 in 2013) discusses the following “other” vocabularies (as listed in its table of contents): classification schemes, taxonomies, subject heading schemes, ontologies, terminologies, name authority lists, and synonym rings. Thus, terminologies are listed right along with taxonomies and ontologies. The United States standard ANSI/NISO Z39.19-2005 Guidelines for the Construction, Format and Management of Monolingual Controlled Vocabularies, however, does not include terminologies in its more limited scope: “Controlled vocabularies covered by this Standard include lists of controlled terms, synonym rings, taxonomies, and thesauri.” (Section 2 Scope).
The most important similarity is that both taxonomies and terminologies refer to terms and unique concepts and not to mere words. As such, they often include and bring together synonyms or other variants to disambiguate concepts. While terminologies don’t characteristically have relationships between terms, they sometimes do.
Linkages
Due to these similarities, it is quite feasible to have connections, links, mappings, etc., between terms in a taxonomy and in a terminology. Taxonomies and terminologies for internal content within the same organization will have a lot of overlap, so it makes sense to leverage the same knowledge bases and either reuse the same terms in taxonomies and terminologies or at least link/map the equivalencies, both to save effort and to ensure consistency of understanding across an organization. ISO 25964-2 Thesauri and interoperability with other vocabularies includes a section on guidelines for the interoperability between thesauri (and, by extension, taxonomies) and terminologies:
- Concepts may be mapped between a thesaurus and a terminology, and should follow the same methods and best practices as mapping between two thesauri (22.3.2)
- Terminologies are useful as sources of concepts or terms when building or maintaining a thesaurus. They can also be referred to when writing scope notes. (22.3.3)
- A search thesaurus or synonym ring may be built using a combination of a thesaurus and a terminology. (22.3.4)
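The first and third points above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions: the sample vocabularies, the `mappings` table, and the function names are all hypothetical, and the equivalences are assumed to have been decided by a person beforehand, since mapping is intellectual work that the standard does not reduce to string matching.

```python
# Hypothetical thesaurus: preferred term -> non-preferred (entry) terms.
thesaurus = {
    "Automobiles": ["Cars", "Motor cars"],
    "Bicycles": ["Bikes"],
}

# Hypothetical terminology: term -> recorded synonyms or variants.
terminology = {
    "automobile": ["passenger car"],
    "unicycle": [],
}

# Equivalence mappings decided by a person: thesaurus term -> terminology term.
mappings = {"Automobiles": "automobile"}


def build_synonym_rings(thesaurus, terminology, mappings):
    """Merge the labels from both vocabularies for each mapped concept."""
    rings = []
    for thesaurus_term, terminology_term in mappings.items():
        ring = {thesaurus_term, *thesaurus[thesaurus_term],
                terminology_term, *terminology[terminology_term]}
        rings.append(ring)
    return rings


def expand_query(query, rings):
    """Return every label in the ring that contains the query, else the query alone."""
    wanted = query.casefold()
    for ring in rings:
        if wanted in {label.casefold() for label in ring}:
            return sorted(ring, key=str.casefold)
    return [query]


rings = build_synonym_rings(thesaurus, terminology, mappings)
expanded = expand_query("cars", rings)
```

A search for “cars” would then also match content that uses “passenger car,” a label contributed by the terminology and not by the thesaurus.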
Hopefully, more organizations will be developing both taxonomies and terminologies where they are lacking and also build connections between the two.
Find out more about terminologies
- AILIA Language Industry Association (Canada): What is Terminology?
- Infoterm: Why Terminology
- International Network for Terminology (TermNet)
- Uwe Muegge. "Disciplining words: What you always wanted to know about terminology management" tcworld 2.3 (2007)
Tuesday, October 6, 2015
Taxonomies and Tables of Contents
A table of contents and a hierarchical taxonomy appear to be quite similar. In my last blog post I looked at taxonomies and indexes, and in the end concluded: “A taxonomy serves a purpose that is both, or something in-between, that of a table of contents and a back-of-the-book index. It’s for searching (like in an index) and also for navigating (like in a table of contents), but it points to the subsection level (as in a detailed table of contents), not to a page (as in an index).” Taxonomies, especially the thesaurus kind, have many similarities to indexes when it comes to looking up a topic. Taxonomies, especially the hierarchical kind, are also similar to a table of contents or the navigation aid to a set of content.
Despite the apparent similarities in hierarchical structure and in the purpose of supporting browse navigation, the differences between a table of contents and a hierarchical taxonomy are far greater than the differences between a displayed index and a search-supporting thesaurus.
A table of contents provides navigation, whether for a printed book or large document or for an electronic document or collection. In fact, in an MS Word document with headings, a table of contents that is generated in the left margin pane from those headings is called “Navigation.” Labels in a table of contents or navigation system are arranged like a taxonomy but are not exactly a kind of taxonomy.
Navigation is not a taxonomy
Navigation or a table of contents has to perfectly reflect the content that it belongs to. It is completely customized. Two books on the same subject cannot have the same table of contents. The same taxonomy, however, may be used for more than one content source and typically is. In a table of contents or navigation, each navigation entry, menu label, or heading matches one-to-one to a single, specific section or web page. Terms in a taxonomy are intended to be used more than once, so each term in a taxonomy is linked to multiple documents or content items. As such, taxonomy terms need to be somewhat generic, whereas labels or headings in a table of contents or navigation can be specific. Taxonomy terms also need to be created with the anticipation of serving not only current content but also future content, whereas navigation or table of contents entries need only reflect the current content.
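The difference between one-to-one headings and reusable terms can be pictured with a small sketch. The content identifiers, headings, and tag assignments below are invented for illustration, and `build_term_index` is a hypothetical helper, not part of any particular tagging tool.

```python
from collections import defaultdict

# A table of contents: each heading points to exactly one section of one source.
table_of_contents = {
    "Introduction to Identity Theft": "book-a/ch01",
    "Procedures to Enhance the Accuracy and Integrity of Information Furnished": "book-a/ch05",
}

# Taxonomy tagging: each content item gets one or more generic, reusable terms.
tags = {
    "book-a/ch01": ["Identity theft"],
    "book-a/ch05": ["Identity theft", "Information accuracy"],
    "book-b/ch03": ["Information accuracy"],
}


def build_term_index(tags):
    """Invert the tagging so that each taxonomy term lists all content tagged with it."""
    index = defaultdict(list)
    for content_id, terms in tags.items():
        for term in terms:
            index[term].append(content_id)
    return dict(index)


term_index = build_term_index(tags)
```

Here each heading leads to a single section, whereas “Information accuracy” retrieves content from two different sources and could be applied to future content without any change to the taxonomy.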
Different label wording
In addition to being more generic, taxonomy terms differ from table of contents entries or navigation labels in other ways.
- The names of chapters and headings may be longer descriptions (such as “Procedures to Enhance the Accuracy and Integrity of Information Furnished”), whereas taxonomy terms should be concise to aid skimming. A complex topic with a complex heading can be covered with a combination of taxonomy terms instead of a single complex term, because taxonomy terms do not need to match all content one-to-one (such as the combination of terms: Information accuracy, Information integrity, and Information-gathering procedures).
- The names of chapters and headings might be question phrases (such as “Why study statistics?”), whereas taxonomy terms should be nouns or adjective-noun phrases and start off with a “keyword” likely to be looked up (not “Why”) to support alphabetical lookup options. Even in a hierarchical taxonomy display, the terms at the same hierarchical level tend to be arranged alphabetically.
- Table of contents entries may be context-specific based on the parent/broader level (such as “Identification and General Terms” or “Special Concerns”), and, in fact, the same sub-heading could repeat under different broader headings. In a taxonomy, each term should be independently unambiguous.
- Tables of contents often start off naming introductory information (such as “Introduction to Identity Theft”) or have sections for Conclusions, neither of which should be terms in a taxonomy. If the same topic is covered three times, in an introduction, body, and conclusions, it will be indexed with the same single taxonomy term, and the end-user will retrieve all indexed results on that topic grouped together.
- Table of contents or navigation headings can be like titles, which may be “catchy” or enticing to the reader, especially at the top level. Taxonomy terms, by contrast, are clear, concise, and common (based on what most users would call the concept), and not especially creative.
Different structure
Tables of contents and taxonomies also differ in their structure. Tables of contents or navigation schemes reflect the organization of content, which may be chronological, pedagogical, from fundamental to detailed, from most important to least important, or the order of perceived user interest. In a taxonomy, the terms at each hierarchical level are arranged alphabetically by default. In navigation there are no “related terms,” so what appear as subtopics might not be taxonomical narrower terms, but just related terms. Taxonomies, on the other hand, must follow the ANSI/NISO Z39.19 guidelines or ISO 25964 with respect to structuring hierarchical relationships: narrower terms must be specific types, instances, or integral parts of their broader terms. By having this standard format, a taxonomy provides organizational predictability for all kinds of users and all kinds of content.
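The hierarchical rule and the default alphabetical arrangement can be expressed as a simple check. This is a sketch only: the relation labels “type,” “instance,” and “part” are my own shorthand for the three kinds of hierarchical relationship named above (not the standards' notation), and the sample links are invented.

```python
# Shorthand labels (an assumption of this sketch) for the three permitted
# kinds of hierarchical relationship: types, instances, and integral parts.
ALLOWED_RELATIONS = {"type", "instance", "part"}

# Hypothetical links as (broader term, narrower term, kind of relationship).
links = [
    ("Vehicles", "Trucks", "type"),
    ("Vehicles", "Cars", "type"),
    ("Cars", "Engines", "part"),
    ("Cars", "Car insurance", "related"),  # a subtopic in navigation, not a narrower term
]


def invalid_links(links):
    """Return the links whose relationship is not a valid hierarchical one."""
    return [link for link in links if link[2] not in ALLOWED_RELATIONS]


def narrower_terms(links, broader):
    """Return the valid narrower terms of a term, arranged alphabetically."""
    children = [n for b, n, rel in links if b == broader and rel in ALLOWED_RELATIONS]
    return sorted(children, key=str.casefold)


problems = invalid_links(links)
vehicle_children = narrower_terms(links, "Vehicles")
```

A link such as “Car insurance” under “Cars” may be perfectly reasonable in a navigation menu, but the check flags it because insurance is neither a type, an instance, nor a part of a car.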
There are certain editorial conventions for content, such as having units of a roughly standard length, which then impact the table of contents or navigation. While there are some variations, one chapter or section is typically not twice as long as another. To achieve balance, a large topic may be spread out over two or more sections, whereas several small topics are grouped together under a heading that is a serial list (such as “Poverty, Inequality, and Mobility”), or under “Other.” Thus, the topics in a table of contents are based on the amount of material presented. Taxonomy structure, on the other hand, looks at the terms/concepts only, and does not take into consideration the amount of content per term. There is one concept per term, not a list. Rare occurrences of two concepts combined into a single term, such as “Author voice and tone,” are the consequence of two topics being very closely related with overlapping meaning and usage.
Conclusions
While a table of contents or navigation system is not a taxonomy, nor should it be used as a taxonomy, when a legacy print source is converted to units of digital content, a table of contents is still an excellent source for creating a taxonomy.


