Tuesday, November 13, 2018

Taxonomy Boot Camp, 2018: AI and Taxonomies


Artificial intelligence (AI) is not new, but it is becoming more ubiquitous, and its applications are growing within other specializations in information management, knowledge management, and content management, including taxonomies. Hence the theme for this year’s Taxonomy Boot Camp conference (November 5-6, 2018, Washington DC) was “Bridging Human Thinking and Machine Learning.”

This was the 14th Taxonomy Boot Camp conference and its 9th year in Washington, DC. Taxonomy Boot Camp (along with the newer Taxonomy Boot Camp London) is the only conference dedicated to taxonomies. As usual, it was held along with several other co-located conferences of Information Today Inc., which overlap or are consecutive. The format, as in past years, involved an opening keynote, after which the conference broke into two tracks of sessions on the first day, one more basic and one more advanced. The second day opened with a joint keynote with the KMWorld conference, followed by a single track for the rest of the day. By a show of hands, it appeared that 75% of the Taxonomy Boot Camp attendees were first-timers, even more than before. There were 235 attendees, including speakers and sponsors.

Although the first day was split into a more basic and a more advanced track, presentations on machine learning and AI appeared in both. These included “Taxonomy & Machine Learning at the Knot,” “Sandwiches, Categories, Ethics & Machine Learning,” “Taxonomy Skills in the World of AI” (a panel), “Semantic AI: Fusing Machine Learning with Knowledge Graphs,” “Semantic Search Enrichment,” “Taxonomies and AI Chat Boxes,” “Taxonomy in the Age of Amazon Echo,” and “Applying Taxonomy Skills to Cognitive Computing” (about a project involving IBM Watson and a data privacy research product of Thomson Reuters).
In “Semantic AI: Fusing Machine Learning with Knowledge Graphs,” presenter Andreas Blumauer of the Semantic Web Company said that companies are increasingly adopting knowledge graphs as their IT infrastructure, and leading players are trying to fuse knowledge graphs with machine learning. A knowledge graph has to be stored in a graph database, and there are two types of graph database models: property graphs and RDF graphs. Of the two, RDF graphs are the more important for knowledge graphs.

Semantic AI core principles include the following.
      It’s about things not strings.
      It’s more than metadata: it describes the meaning of metadata as an additional, semantic layer.
      The knowledge graph establishes the semantic layer.
      Knowledge graphs can be seen as an input for machine learning.
      AI isn’t always good at understanding questions, so a taxonomy/ontology is needed to support it.
      AI should be built on data quality, data as a service, no black boxes, and a hybrid approach of structured data meeting text, aiming toward self-optimizing machines (a vision, as we are not there yet).

Use cases of knowledge graphs include recommendation engines, in which the knowledge graph is the basis for recommending content while taking the users into consideration.
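To make the recommendation idea a little more concrete, here is a minimal Python sketch. It is not the Semantic Web Company's implementation; the concept names, the BROADER and CONTENT_TAGS dictionaries, and the scoring weights are all hypothetical. Content items tagged with concepts are scored against a user's interests, counting shared concepts and, with less weight, broader concepts reached by walking up the graph.

```python
# Hypothetical knowledge graph: each concept maps to its broader concept.
BROADER = {
    "wedding dresses": "wedding attire",
    "tuxedos": "wedding attire",
    "wedding attire": "weddings",
    "wedding venues": "weddings",
}

# Hypothetical content items tagged with concepts from the graph.
CONTENT_TAGS = {
    "article-1": {"wedding dresses"},
    "article-2": {"tuxedos"},
    "article-3": {"wedding venues"},
}


def expand(concept):
    """Return the concept plus all of its ancestors in the graph."""
    result = {concept}
    while concept in BROADER:
        concept = BROADER[concept]
        result.add(concept)
    return result


def recommend(user_interests, exclude=()):
    """Rank content by overlap with the user's interests.

    Direct concept matches count double; matches via shared broader
    concepts count once each. Items with no overlap are dropped.
    """
    direct = set(user_interests)
    expanded = set().union(*(expand(c) for c in user_interests))
    scores = {}
    for item, tags in CONTENT_TAGS.items():
        if item in exclude:
            continue
        item_expanded = set().union(*(expand(t) for t in tags))
        score = 2 * len(tags & direct) + len(item_expanded & expanded)
        if score:
            scores[item] = score
    return sorted(scores, key=lambda i: (-scores[i], i))
```

For a reader who has just viewed article-1 and is interested in wedding dresses, `recommend({"wedding dresses"}, exclude={"article-1"})` returns `["article-2", "article-3"]`: the tuxedo article shares two broader concepts, the venue article only one.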
In “Taxonomy & Machine Learning at the Knot,” the presenters from the web media company XO Group started with a good introduction to machine learning, first explaining the problems it can solve (predicting behavior, automating tedious steps, and classifying) and then noting that there are two types: supervised and unsupervised. Common applications include clustering, recommendations, and classification, and each of these can involve taxonomies. Specific implementation examples were provided.

As with last year, there was also a lot of talk of auto-categorization (automated or machine-aided indexing) across various sessions. Three were dedicated to the subject: “Driving Discovery: Combining Taxonomy & Textual AI at Sage” (a case study using Expert System auto-categorization), “Testing for Auto-tagging Success,” and “Classification Relevance at Associated Press.” AP has an automated rules-based classification system for Subjects, Geography, and Organizations. Rules-based auto-classification was chosen over the statistical method because it offers transparency and control; it can deal with breaking news and low-frequency terms (no existing training set is needed); it can better scope and disambiguate between terms, such as incident-type terms (Violent crime) vs. issue terms (Domestic violence); and its semantic rules ensure that content is not tagged for a mere passing mention. Entity extraction with disambiguation rules is used for person names and publicly traded companies.
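The advantages cited for rules-based classification (transparency, disambiguation, and ignoring passing mentions) can be illustrated with a small Python sketch. This is not AP's actual system; the Rule class, the regular-expression patterns, and the two-mention threshold are hypothetical choices made only for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Hypothetical classification rule: assign `term` when the include patterns
    match at least `min_hits` times and no exclude pattern matches."""
    term: str
    include: list
    exclude: list = field(default_factory=list)
    min_hits: int = 2  # a threshold that screens out passing mentions

    def matches(self, text):
        lowered = text.lower()
        # Exclusion (disambiguation) patterns veto the rule outright.
        if any(re.search(p, lowered) for p in self.exclude):
            return False
        hits = sum(len(re.findall(p, lowered)) for p in self.include)
        return hits >= self.min_hits


# Illustrative rules separating an incident-type term from an issue term.
RULES = [
    Rule("Violent crime",
         include=[r"\bshooting\b", r"\bstabbing\b", r"\bassault\w*"],
         exclude=[r"\bfilm(?:ing)?\b", r"\bphoto ?shoot"]),
    Rule("Domestic violence",
         include=[r"\bdomestic (?:violence|abuse)\b"]),
]


def classify(text):
    """Return the terms whose rules fire, in rule order."""
    return [rule.term for rule in RULES if rule.matches(text)]
```

Because every rule is readable, an editor can see why a term was or was not assigned, and a term needed for breaking news only requires a new rule rather than a training set.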

Knowledge graphs are getting more attention both here and at Taxonomy Boot Camp London. This was, of course, the main topic of Andreas Blumauer’s talk “Semantic AI: Fusing Machine Learning with Knowledge Graphs,” and Mike Doane, in the introduction of his talk “Taxonomy in the Age of Amazon Echo,” said that the information industry analysis firm Gartner reports that knowledge graphs are on the rise and are discussed more than taxonomies. Gartner is tracking knowledge graphs instead of taxonomies and ontologies.

While the opening keynote did not focus on AI or machine learning, it was a presentation by a computational linguist, Deborah McGuinness, professor of Computer, Cognitive, and Web Sciences at Rensselaer Polytechnic Institute. Among other things, she spoke of the data life cycle, whereby a computer-understandable specification of meaning (semantics) supports the enhanced lifespan and impact of data. She went on to present two specific ontology case examples.

Nearly all session slides, except the keynotes, are available to download without any login credentials at: http://www.taxonomybootcamp.com/2018/Presentations.aspx

Tuesday, October 30, 2018

Taxonomy Boot Camp London, 2018


This October, for the third year in a row, I have enjoyed the opportunity to attend and present at Taxonomy Boot Camp London (TBCL).

Taxonomy Boot Camp London is similar in subject scope to its parent conference, Taxonomy Boot Camp (TBC), usually held in Washington, DC, in November, but its presentations are unique, so I find it worth my time to attend both conferences. Despite what might be considered a niche topic for a select audience, TBCL remains a strong conference with consistent attendance (about 170 participants), comparable to TBC in its earlier years. The size is large enough to offer a choice of two tracks but small enough to easily network with others. The conference speakers and attendees are quite international, representing 22 countries this year.

Conference Format


TBCL continues to differ from TBC by having two tracks on both days, instead of just on the first day as TBC does. It also has a pre-conference workshop day, which TBC lacks, with a full-day Taxonomy Fundamentals workshop (which I lead) and two half-day workshops on more specialized or advanced taxonomy topics, which differ each year. This year the half-day workshops were on text analytics and on taxonomies in SharePoint.

For the first time, Taxonomy Boot Camp London presented two awards (which Taxonomy Boot Camp in Washington, DC, does not do). The winner of the Taxonomy Practitioner of the Year award was Tom Alexander, Taxonomy Manager, Cancer Research UK. The winner of the Taxonomy Success of the Year award was the SAGE Research Methods Thesaurus, led by Alan Maloney & Martha Sedgwick, SAGE Publishing.

Exhibits


The exhibit/sponsor showcase is very different at TBCL from TBC. TBC has a small dedicated exhibit on its first day, but then shares the much larger KMWorld exhibit with the four other co-located conferences. TBCL’s exhibit space is similar to that of TBC’s first day, with just three software vendor sponsor-exhibitors (Synaptica, Access Innovations, and Semantic Web Company/PoolParty). However, there was a larger number of organizational supporter-exhibitors: the Association of Independent Information Professionals, the Information Retrieval Specialist Group of the British Computer Society, the Danish Union of Librarians, the Knowledge & Information Management Special Interest Group of CILIP (Chartered Institute of Library and Information Professionals) of the UK, the Information and Records Management Society of the UK, the UK Chapter of the International Society for Knowledge Organization (ISKO), the Network for Information & Knowledge Exchange of the UK, the SLA (Special Libraries Association) Europe chapter, and the SLA Taxonomy Division. This was a greater number of organizations than last year. The significant involvement of professional associations in TBCL contrasts with the relative lack of professional associations involved in TBC.

TBCL continues to be co-located with another Information Today conference, Internet Librarian International, but their exhibit areas are somewhat separate (although attendees of both conferences can visit booths of either conference), since their audiences and markets are different. Other than the drinks reception on the first day, the two conferences do not share anything, such as keynotes.



Keynotes


There were three keynote presentations: two consecutive, contrasting keynotes on the first day and one on the second day.

The opening keynote was indeed a keynote-style talk, on the broader subject of information on the web rather than on the specifics of taxonomies. “This is the Bad Place: 13 Rules for Designing Better Information Environments” was presented by Paul Rissen, Product Manager at Springer Nature UK and previously at the BBC. In his thought-provoking presentation he aimed to establish “ground rules” for using the web (especially social media) and for public discourse in general.

This was followed by a more down-to-earth state-of-the-profession talk by Dave Clarke, CEO of Synaptica, titled “Catching the Wave: What Tools do Taxonomists Need to do Their Job.” Although Synaptica was the lead sponsor of the conference, this was not a promotional talk. Dave started out by summarizing what taxonomists do and enable as: organize, categorize, and discover, and he explained the different tools for each. More of Dave’s presentation was about what taxonomists are doing, based on the results of a survey of taxonomists he has been conducting (https://twitter.com/DavidClarkeBlog). Then Dave turned to what he considered to be the future trends and issues. Artificial intelligence (AI) is relevant to what we do, but it will not replace the need for human-curated taxonomies or ontologies. Rather, taxonomies and ontologies will empower AI with the semantics and logic to improve search and categorization and to perform machine learning. Ontologies and linked data can help build smarter search and discovery applications by leveraging logical dependencies. Linked open data is shared openly, whereas linked enterprise data is behind the firewall, where the linked data model also works well.

The second day’s keynote addressed an important topic. “Selling the Benefits of Taxonomy: Numbers and Stories” was presented by taxonomy and text analytics consultant Tom Reamy. Tom’s argument was that return-on-investment (ROI) studies, with their numerical data on time spent, are not sufficient to convince decision-makers of the benefits of taxonomies, and that use case stories and internal advocacy are also needed. Stories can describe the increased richness of knowledge discovery, better decisions, and analysis of complex issues. He also suggested selling the vision of a taxonomy by means of a mini demo. Tom then turned to text analytics as the important means to make taxonomies usable, as he is rather dismissive of manual indexing. He explained that text analytics is often called auto-categorization, because that was the first use of it, but that text analytics can be used for other things, too.

Conference Sessions


The more basic track had sessions on taxonomy development, user validation, taxonomy resources, taxonomy development approaches, information architecture, enterprise information management, tagging, and taxonomy standards and architecture. I attended mostly sessions of the more advanced track, though.

A theme of the conference, as stated in the program, was “Making taxonomies go further,” and conference chair Helen Lippell spoke in her welcome of the opportunity to “push your practice further.” This was especially true of several of the advanced track sessions I attended. “Using Ontologies for more than Information Categorization,” presented by Ahren Lehnert and Jim Sweeney of Synaptica, suggested using ontologies for project and product management and in support of various other business functions in sales, marketing, partner and competitor information management, etc. “Beyond Taxonomy Classification: Using Knowledge Models and Linked Data to Unlock New Business Models” was presented by Ben Miller of Wiley. He described knowledge models as comprising content acquisition and content enrichment. Jim Sweeney also presented “Taking Your Show on the Road: Publishing Taxonomies and Ontologies as Linked Data,” which was a good introduction to linked data. In this presentation, he also introduced graph databases and their benefits. While not explicitly discussing taxonomies, Rahel Anne Bailie’s talk, “Introduction to Information 4.0,” suggested another application for taxonomies, in which content is in “molecules and objects” rather than in documents or based on predetermined topics. Multilingual taxonomies and taxonomy implementation in SharePoint were the topics of other presentations.

 
I am looking forward to Taxonomy Boot Camp in Washington, DC, next week, and to Taxonomy Boot Camp London again next year, which has been scheduled for the same venue on October 15-16, 2019, with preconference workshops on October 14.

Thursday, September 6, 2018

An Open Vocabulary Tagging Experiment for Discoverability


Does tagging content with terms from a shared, publicly available controlled vocabulary make a difference in increasing content discoverability on the web? A colleague of mine proposed finding out by experimenting with tagging the same content, such as two identical blog posts, differently: one with terms typical for posts on the blog and one with terms from a publicly available controlled vocabulary. Then, after a few weeks, the visitor traffic statistics for the two post versions would be compared.

Wikidata and VIAF were chosen as the sources of publicly available controlled vocabulary terms. Since VIAF contains only name authorities (proper nouns), I used terms just from Wikidata in my blog tagging experiment, whereas my colleague used terms from both Wikidata and VIAF in his blog post tagging experiment (The Open Web Tagging Experiment on the Ol' Patio Boat Blog).

The preceding blog post on The Accidental Taxonomist blog, "Using Linked and Other Open Vocabularies," had been posted twice identically, except that one version was tagged with terms from Wikidata, linking to them, and the other was tagged with terms that have been created and used just for The Accidental Taxonomist blog. I did not link to either blog post from other social media, as I usually do. (Now that the experiment is over, I have deleted the duplicate blog post with the lower number of visitors.)

After 18 days, I checked the statistics for the number of visitors to each blog post. The version with the blog's own tags (the tagging feature supported by Blogger.com) had 72 visitors, and the version without blog tags but with links to Wikidata tags had 104 visitors. (By contrast, this post, "An Open Vocabulary Tagging Experiment for Discoverability," attracted 119 visitors in the same period, without any tags or links to Wikidata terms.)

The conclusions are not certain, but it appears that the links out to Wikidata may have helped that post's discoverability, since the post with those links had more visitors. It also appears that blog tags do not help significantly in discoverability, since of the three posts, the one with those tags had the fewest visitors, although the tags are useful for finding specific posts once you are on the blog's home page. The results of my colleague's test of two identical posts with and without tagging were different, though. He concluded the opposite: that copying Wikidata and VIAF headings into a post with incoming URLs had no effect, but putting metadata into the Blogger tagging field did increase visibility. However, his visitor traffic in both cases was very low, so the difference was perhaps not statistically significant.
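One rough way to judge whether a difference such as 72 versus 104 visitors is more than chance is an exact two-sided binomial test against a 50/50 split. The Python sketch below assumes that each visit is an independent event equally likely to land on either post version, which real web traffic (shaped by search ranking and referrals) does not strictly satisfy, so any result is only an indication, not proof of what caused the difference.

```python
from math import comb


def two_sided_binomial_p(a, b):
    """Exact two-sided binomial test: how likely is a split at least as uneven
    as a vs. b if each visit were a fair coin flip between the two posts?"""
    n = a + b
    # Probability of every possible split k vs. n - k under the 50/50 assumption.
    probs = [comb(n, k) / 2 ** n for k in range(n + 1)]
    observed = probs[a]
    # Sum the probabilities of all outcomes no more likely than the observed one
    # (the small tolerance absorbs floating-point noise between symmetric terms).
    p = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, p)
```

Calling `two_sided_binomial_p(72, 104)` gives the probability of a split at least that uneven under the coin-flip assumption; a value below the conventional 0.05 would suggest the difference is unlikely to be random. With very low visitor counts, only a very lopsided split can produce such a small value.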

As for this post, which had no tags but the highest number of visitors, that could be attributed to a title containing more commonly searched keywords and phrases.

Search engine optimization is a big and ever-changing field. Rather than try to game the search, I will return to my method of posting about my blog posts on social media and hope my connections will share and repost. 



Using Linked and Other Open Vocabularies

Taxonomy terms assigned to content items make the content easier to find, whether in an internal system, on the web, or both. To make content easier to find or discover on the web, the use of taxonomy terms or tags is part of the broader application of search engine optimization (SEO). A lot has already been written by others regarding tips for creating and adding terms/labels/tags to web content to support SEO, such as how many and how specific they should be. For the taxonomist, who is interested not only in the terms alone but also in the larger taxonomy to which they belong, another question is whether using terms from shared, publicly available controlled vocabularies makes a difference in increasing content discoverability on the web.

Linked open data and linked open vocabularies


Shared, publicly available controlled vocabularies may or may not be linked or linkable as linked open vocabularies. So, just because a controlled vocabulary is publicly available does not mean that it inherently supports linked data on the web.

“Linked data,” which usually is linked open data, refers to methods to interlink structured content in a way that can be read automatically by computers to enable the discovery of content on the web. It is described in a set of W3C specifications for web publishing that makes the data or content part of the Semantic Web. This means that instead of manually following individually created hyperlinks, semantic links and computer-readable formats support automated relevant linkages among content. Linked data requires the use of named URIs to identify things, HTTP URIs for web lookup, and structured data using controlled vocabulary terms and dataset definitions expressed in an RDF standard framework. “Linked open data” additionally includes open use in accordance with an open license.

Terms in taxonomies can serve as labels to linked content as part of linked data. Additionally, although less common, taxonomy terms themselves can be the content that is linked to, if the taxonomy concepts are individually assigned URIs and HTTP addresses, and are in an RDF format.
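As a rough illustration of taxonomy concepts that have their own HTTP URIs and are expressed in an RDF format, the Python sketch below writes one concept as N-Triples using the W3C SKOS vocabulary (skos:Concept, skos:prefLabel, skos:broader). SKOS is a common choice for publishing taxonomies, though the post itself does not name it; the example.org base URI and the concept identifiers are hypothetical placeholders.

```python
# Hypothetical base URI; a real taxonomy would use a domain its owner controls.
BASE = "http://example.org/taxonomy/"
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def concept_triples(concept_id, pref_label, broader_id=None, lang="en"):
    """Return N-Triples lines describing one taxonomy concept in SKOS."""
    subject = f"<{BASE}{concept_id}>"
    # Escape backslashes and double quotes, as N-Triples string literals require.
    label = pref_label.replace("\\", "\\\\").replace('"', '\\"')
    lines = [
        f"{subject} <{RDF_TYPE}> <{SKOS}Concept> .",
        f'{subject} <{SKOS}prefLabel> "{label}"@{lang} .',
    ]
    if broader_id is not None:
        lines.append(f"{subject} <{SKOS}broader> <{BASE}{broader_id}> .")
    return lines


for line in concept_triples("stress-management", "Stress management", "stress"):
    print(line)
```

Each concept gets its own HTTP URI that other datasets could point to; actually publishing the taxonomy as linked data would still require serving those URIs on the web.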

Limitations to designating content as linked open data


If you have a document on the web that you want to have discovered as part of the Semantic Web, designating it as linked data is not so simple, because you need to include the machine-readable instructions, such as through a SPARQL endpoint or an API (application programming interface), in addition to the RDF designation. Not only is this technically outside the skills of most individual web content creators and taxonomists, but depending on how the content is managed, standard web content management systems or blog posting software may not even support editing the HTML of the page to insert such instructions.

Institutions may register their content with a linked open data repository. The main repository of linked open vocabularies is Linked Open Vocabularies (LOV), hosted by the Ontology Engineering Group of the Computer Science School at Universidad Politécnica de Madrid. An individual blogger, however, who would like to make an individual blog post linked open data, cannot easily achieve that status.

Simply linking to shared, open vocabularies


Thus, if linked data instructions cannot easily be included and traditional manual links back to the page (as by means of agreed-upon link exchanges) cannot be established for practical reasons, tagging could be done with terms from a publicly available controlled vocabulary that is not part of linked open data and linked open vocabularies. Two good examples are the labels of Wikidata and the Virtual International Authority File (VIAF).

Wikidata is a free, open, collaborative, multilingual collection of structured data. Its purpose is to support Wikipedia, Wikimedia Commons, and other wikis of the Wikimedia movement, as well as anyone who wants to search, use, edit, or consume its data. The data contained in the Wikidata repository consists of items, each with a unique name and ID. Currently there are 50,116,886 data items. Each item has a brief glossary definition, equivalent names in other languages, relationships (“statements”) to other data items (such as “subclass of” and “designed by”), and identifiers in other vocabularies (such as Freebase, Library of Congress authorities, and Quora topic).

VIAF, hosted by OCLC, contains just named entities (proper nouns). But it uniquely brings together and displays as a group the headings that are the authority used by each contributor for that term. So, it’s not exactly a controlled vocabulary. VIAF has over 40 international member-contributors, most of which are national libraries.

Is there any benefit in tagging with and linking to terms that are part of a controlled vocabulary which is publicly available but is not a linked open vocabulary, such as Wikidata or VIAF? A colleague of mine proposed finding out by experimenting with tagging the same content with terms from different sources. Results will be shared in a later blog post.

Thursday, August 30, 2018

Taxonomy Hierarchical Relationship Issues


A common feature of taxonomies is the hierarchical relationship between terms. Terms are linked to each other in a relationship that indicates that one is the broader term (BT) of the other, and in the other direction, one is the narrower term (NT) of the other. You don’t need to be a taxonomist to understand this basic principle. However, even taxonomists can sometimes be challenged in determining whether it’s correct to put two terms in a hierarchical relationship.

Standards for Hierarchical Relationships


There are guidelines for the hierarchical relationship provided by the standards of ANSI/NISO Z39.19-2005 (R2010) Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies and ISO 25964-1: Information and Documentation — Thesauri and Interoperability with other Vocabularies — Part 1: Thesauri for Information Retrieval. The standards say that in a correct hierarchical relationship the term that is narrower to the broader term may be a specific type of the generic broader term, a named instance of the generic broader term, or an integral part of the whole broader term.
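The three kinds of hierarchical relationship named in the standards, and the reciprocity of BT and NT, can be modeled in a few lines. This Python sketch is only an illustration; the class and method names are hypothetical, although the BTG/NTG, BTI/NTI, and BTP/NTP abbreviations for generic, instance, and partitive relationships do appear in Z39.19. It records each BT together with its reciprocal NT and rejects a relationship that would make a term narrower than itself.

```python
from enum import Enum


class HierType(Enum):
    GENERIC = "BTG/NTG"    # narrower term is a specific type of the broader term
    INSTANCE = "BTI/NTI"   # narrower term is a named instance of the broader term
    PARTITIVE = "BTP/NTP"  # narrower term is an integral part of the broader term


class Vocabulary:
    def __init__(self):
        self.broader = {}   # term -> {broader term: HierType}
        self.narrower = {}  # term -> {narrower term: HierType}

    def add_bt(self, term, broader_term, kind):
        """Record `term` BT `broader_term` and the reciprocal NT in one step."""
        if term == broader_term or self._is_ancestor(term, broader_term):
            raise ValueError(f"{term!r} BT {broader_term!r} would create a cycle")
        self.broader.setdefault(term, {})[broader_term] = kind
        self.narrower.setdefault(broader_term, {})[term] = kind

    def _is_ancestor(self, candidate, term):
        """True if `candidate` is reachable from `term` by following BT links."""
        stack, seen = [term], set()
        while stack:
            for parent in self.broader.get(stack.pop(), {}):
                if parent == candidate:
                    return True
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return False


vocab = Vocabulary()
vocab.add_bt("Directors", "Managers", HierType.GENERIC)
vocab.add_bt("United Nations", "International organizations", HierType.INSTANCE)
vocab.add_bt("Heart", "Circulatory system", HierType.PARTITIVE)
```

Storing the type with each relationship keeps the reason for the hierarchy explicit, which helps when reviewing borderline cases like those discussed below.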

These standards, however, are for thesauri, not taxonomies. Thesauri have additionally a non-hierarchical associative relationship between terms, known as “related term” (RT). In taxonomies which lack related-term relationships, the conditions under which the hierarchical relationship is permitted need not be followed quite as strictly. Nevertheless, the thesaurus standards for creating the hierarchical relationship should be the starting point and the default for hierarchical relationships in taxonomies.

Challenges in Coming up with Broader Terms


Hierarchical taxonomies may be created from the top down, the bottom up, or a combination of both approaches. The top-down approach involves creating broadest categories first, then adding narrower terms and then adding narrower terms to narrower terms. This approach makes it easier to create good hierarchical relationships. In reality, though, we don’t always create terms based purely on their broader terms. Rather, analysis of content yields specific terms that are needed, so some degree of bottom-up taxonomy creation takes place. In the bottom-up approach there may be the challenge of determining and creating the appropriate broader term.

When I have been completely stuck in coming up with a broader term, I admit I have looked up the term in Wikipedia to see what is listed as “Categories” for that term at the bottom of the page. “Categories” implies a broader term, but these are not necessarily good or correct broader terms. An example of Categories that are not exactly broader terms is for the term Stress management: Stress, Management by type, Psychotherapy, and Psychiatric treatments. Stress management is not exclusively a kind or a part of Psychotherapy or Psychiatric treatments, so those are not suitable broader terms. “Management by type” is definitely not a good taxonomy term, and the term Management alone has a different meaning of its own. As for the term “Stress,” this is more complicated. Technically, Stress management is neither a kind of Stress nor a part of Stress, so Stress should not be its broader term. If this were in a thesaurus, they would definitely be related terms. If your controlled vocabulary is not a thesaurus, and the related-term relationship is not supported, then you may ignore the thesaurus rule in this case and make Stress the broader term of Stress management. This relationship is likely to be expected and accepted by users.

Challenges in Special Circumstances


Even when creating a taxonomy from the top down, taxonomists may encounter challenges or confusion with hierarchical relationships. One challenging case is the concept of membership. Things and their members could be industries and their companies, or international organizations and their member countries. It may seem logical to list the members “under” the industry or organization of which they are a part, but this is based too much on context and time. Companies can change their industries, and countries can change their international organization affiliations. More significantly, the whole-part hierarchical relationship is about integral parts, not about participating or taking “part.” Finally, it may be more practical to put each type (companies, industries, countries, organizations) in a separate facet and not establish any relationship between them in a taxonomy (in contrast to a thesaurus or ontology).

Another potentially confusing case involves occupations and job titles. The subordinate nature of narrower terms should not be confused with the subordinate role of one job title to another. Thus, while a marketing specialist reports to a marketing manager, Marketing managers is not a broader term of Marketing specialists. Furthermore, while a marketing manager reports to a marketing director, we might make the hierarchical relationship in the other direction, with Marketing directors as a narrower term of Marketing managers, because directors are a kind of manager. Managers include directors.

Perhaps the most confusing case involves specificity that is not taxonomic specificity. For example, Syllabi (plural of syllabus), as instructional outlines, are in a certain sense more specific than Curricula (plural of curriculum), which are also a kind of instructional outline. Syllabi are for individual courses, and curricula are for a series of courses, such as an entire program of study or degree. Thus, it might seem logical for Syllabi to have the broader term Curricula. But a syllabus is neither a specific type of curriculum nor a part of a curriculum. It is something different. So, it would be better not to make Curricula a broader term of Syllabi, even in a taxonomy that lacks related-term relationships.

Parent-Child Confusions


Sometimes the hierarchical relationship is referred to as “parent-child.” While it’s correct that a subsidiary company is a narrower term of its parent company, because it is part of the parent company, a biological child is not a narrower term of its parent, because it is not a part of the parent, but rather an offspring. To avoid confusion, it’s better to describe the relationship as broader/narrower, rather than as parent/child.

Monday, July 30, 2018

Taxonomy Hierarchy Levels


A taxonomy comprises a hierarchy of concepts (terms), and those hierarchies can be considered to be in different levels. In actuality, levels are somewhat artificial, and it’s important not to think of levels too strictly. In some taxonomies the levels are even named (for example: Domain, Category, Subcategory, Topic), but I would caution against such a practice.


Why we may tend to name levels


The most famous taxonomy, the Linnaean taxonomy of organisms, has well-known names for each of its hierarchical levels: Domain, Kingdom, Phylum, Class, Order, Family, Genus, and Species. There are issues, however, with this named-level system. In some cases, a Family may contain only a single Genus, and/or a Genus may contain only a single Species (such as the genus Homo, with Homo sapiens as its only living species). In other cases, a Species may have such variety within it, which we wish to describe, that we have created names for subspecies or other deeper levels (such as for dog breeds). For a digital navigation or information taxonomy of concepts, it would be considered bad style for a term to have only a single narrower term (as with Homo sapiens). A term should have either no narrower terms or at least two, but not just one.
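The rule that a term should have either no narrower terms or at least two is easy to check automatically. A minimal Python sketch, assuming the hierarchy is available as (narrower term, broader term) pairs; the function name and sample terms are illustrative.

```python
from collections import Counter


def single_narrower_terms(bt_pairs):
    """Return broader terms that have exactly one narrower term.

    `bt_pairs` is an iterable of (narrower, broader) tuples; duplicates are ignored.
    """
    counts = Counter(broader for _narrower, broader in set(bt_pairs))
    return sorted(term for term, n in counts.items() if n == 1)


pairs = [
    ("Homo sapiens", "Homo"),
    ("Homo", "Hominidae"),
    ("Pan", "Hominidae"),
    ("Pan troglodytes", "Pan"),
    ("Pan paniscus", "Pan"),
]
print(single_narrower_terms(pairs))  # ['Homo']
```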

Besides the legacy of the Linnaean taxonomy, we may think in terms of designated levels of a taxonomy because the most common tool for developing taxonomies is MS Excel. In Excel, each column is used to designate a deeper hierarchical level, from broader to more specific, from left to right. People may feel compelled to designate column headers (a typical thing to do in spreadsheets), whether as names or merely as Level 1, Level 2, etc. Excel is not intended to be taxonomy management software, and dedicated taxonomy management software tools do not support the default naming or numbering of hierarchical levels, since there is no need for it in a taxonomy.
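The spreadsheet columns are only a way of encoding broader/narrower relationships. The Python sketch below assumes a CSV export in the indented layout described above (one term per row, placed in the column for its depth, with the earlier columns left blank) and turns the columns back into (narrower, broader) pairs, after which no level names are needed. The function name and sample terms are hypothetical, and a layout that repeats the full path on every row would need different handling.

```python
import csv
import io


def rows_to_relationships(csv_text):
    """Convert an indented spreadsheet export into (narrower, broader) pairs."""
    path = []   # path[i] is the most recent term seen in column i
    pairs = []
    for row in csv.reader(io.StringIO(csv_text)):
        cells = [c.strip() for c in row]
        filled = [i for i, c in enumerate(cells) if c]
        if not filled:
            continue  # skip blank rows
        level = filled[0]
        if level > len(path):
            raise ValueError(f"term {cells[level]!r} skips a level")
        term = cells[level]
        del path[level:]  # leaving a deeper branch: drop its stale entries
        if level > 0:
            pairs.append((term, path[level - 1]))
        path.append(term)
    return pairs


sheet = "Business,,\n,Marketing,\n,,Digital marketing\n,Finance,\nLaw,,\n,Real estate law,\n"
for narrower, broader in rows_to_relationships(sheet):
    print(f"{narrower} BT {broader}")
```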


Why we should not name levels


Unlike the Linnaean taxonomy, the goal of a digital navigation or information taxonomy of concepts is not necessarily to classify concepts, but rather to arrange concepts (terms) in logical hierarchical relationships, so as to help guide the user to find the desired concept (which in turn is linked to content). A classification system (such as industry classification codes or the Dewey Decimal System), which also has enumerated levels, is often considered a different kind of controlled vocabulary from a taxonomy.

A distinction needs to be made between hierarchical relationships and hierarchies. A good taxonomy or thesaurus design practice is to create hierarchical relationships between terms where they are logical: when one term is a specific type or an integral part of another term, so that users find narrower terms where they expect them. The extension of multiple hierarchical relationships, particularly when terms have both broader-term and narrower-term relationships, naturally results in hierarchies. But the resulting “natural” hierarchies are not consistent. They may be many levels deep in some places and only two levels deep in other places. Terms that are on the same “level” may have quite different degrees of specificity. I recently created a taxonomy for a discipline in which terms that were the equivalent of textbook courses ranged everywhere from the top to the fourth level. Fortunately, I was not constrained to put courses at the first level.
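A term's “level” is simply a count derived from its chain of broader terms, and that count varies from branch to branch. This Python sketch assumes a monohierarchy (each term has at most one broader term) with no cycles; the sample terms are hypothetical.

```python
def term_depths(bt_pairs):
    """Map each term to its depth (top terms are 1) from (narrower, broader) pairs.

    Assumes each term has at most one broader term and there are no cycles.
    """
    broader = dict(bt_pairs)
    depths = {}

    def depth(term):
        if term not in depths:
            parent = broader.get(term)
            depths[term] = 1 if parent is None else depth(parent) + 1
        return depths[term]

    for term in set(broader) | set(broader.values()):
        depth(term)
    return depths


pairs = [
    ("Statistics", "Mathematics"),
    ("Bayesian statistics", "Statistics"),
    ("Markov chain Monte Carlo", "Bayesian statistics"),
    ("Poetry", "Literature"),
]
depths = term_depths(pairs)
print(depths["Markov chain Monte Carlo"], depths["Poetry"])  # 4 2
```

Both branches are legitimate hierarchies, yet one bottoms out at depth 4 and the other at depth 2, so a fixed name for “level 2” would cover very different degrees of specificity.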

Sometimes a taxonomy owner wants to set a policy as to how many levels deep the taxonomy should be.  It is understandable to limit the depth of a taxonomy in some cases: a hierarchy of navigation for public site visitors who want to get to content in the fewest clicks, lest they leave the site; a hierarchy of categories whose labels are to be picked up by search engines (supporting search engine optimization); or a hierarchy within a facet with limitations on browsing.  But there is a difference between limiting the total levels of depth and designating what the levels are called and are supposed to represent.


Examples of problems from named levels


Designating the names or types of levels inevitably results in the inaccurate application of level names or terms at inappropriate or inconsistent levels. For example, for a taxonomy of job titles I worked on, the project owner proposed that the top level be called Occupations and the narrower terms to those be called Specializations. This often works, but not always. Consider the term Electrician and its narrower term Electrician Apprentice. Electrician was called an Occupation, and Electrician Apprentice was called a Specialization. Although an Electrician Apprentice can be a kind of (narrower term of) Electrician, it is not actually a “specialization” of Electrician. Also, a unique specialized job title may not have a broader term type of job title, so it would have to be called an Occupation. For example, Endoscopy Technician was designated as an Occupation, as it lacked a broader term, whereas Nurse Practitioner was a Specialization, since it had the broader term of Registered Nurse.

In another example, a taxonomy of academic areas of study I worked on, I was told that the taxonomy could have only two levels, with the top level called Discipline and the second level called Subdiscipline. The levels and designations were based on content management and business needs. Thus, while Marketing would normally be considered a narrower term to Business, both were Disciplines at the same level. Some of the Disciplines were very specific, such as Real Estate Law (since Law did not exist as a discipline in this case), and some of the Subdisciplines were very broad, such as Computer Science (because it had a broader term of Computing). I concluded that this was not actually a taxonomy, but rather a metadata property with its values structured into two levels.

Taxonomies naturally have hierarchies, but do not naturally have levels, which are an artificial layer that sometimes gets imposed.