Showing posts with label Polyhierarchy. Show all posts

Saturday, April 30, 2022

Polyhierarchy in Taxonomies

A defining characteristic of taxonomies is that terms/concepts are arranged in broader-narrower hierarchies, which may resemble tree structures. A limited number of top concepts each have narrower concepts, which in turn may have narrower concepts, etc., and the narrowest concepts at the bottom of the hierarchy are sometimes referred to as leaf nodes, as “leaf” extends the metaphor of “tree.” The tree model has its limits, though, because taxonomies may also have occasional cases of “polyhierarchy,” whereby a concept may have two or more broader concepts, instead of just one.

 

People who are new to taxonomies, however, might not consider polyhierarchies, because they tend to think of taxonomies as classification systems. Hierarchical information taxonomies have their origin in classification systems, such as the Linnean taxonomy of organisms, library classification systems, and industry classification systems. Classification systems, however, do not allow polyhierarchy within the system. Originally, classification systems were for physical things, such as books, which can belong in only one place, so there could be no polyhierarchy. Standard classification systems, such as industry classification systems, were developed by governmental, international, or nongovernmental organizations with a primary purpose of gathering and organizing statistical data about classes, and thus polyhierarchy is not permitted, as it would lead to double-counting of members of a class.

 

The primary purpose of hierarchy in a taxonomy is to provide guided browsing of topics to end-users, who may start out looking at broad categories and then drill down to find the narrowest concept of interest. Thus, polyhierarchy serves the same purpose. The idea is that different people will start at different points at the top of the hierarchy to arrive at the same concept of interest, which is tagged to the same content set. A polyhierarchy should be implemented if the concept’s relationship is correctly and inherently hierarchical in both of its cases. An example of a polyhierarchy is Educational software, which has both Software and Educational products as broader concepts. Educational software is a kind of software, fully included within Software, and Educational software is a kind of educational product, fully included within Educational products.
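
To make the browsing idea concrete, here is a minimal Python sketch. It assumes the taxonomy is held as a simple mapping from each concept to its broader concepts; the mapping and the function name paths_to_top are illustrative only and do not come from any particular taxonomy tool. It shows that the one concept Educational software is reachable by two different browse paths.

```python
# Hypothetical in-memory taxonomy: concept -> list of its broader concepts.
BROADER = {
    "Software": [],
    "Educational products": [],
    "Educational software": ["Software", "Educational products"],
}

def paths_to_top(concept, broader=BROADER):
    """Return every browse path from a top concept down to `concept`."""
    parents = broader.get(concept, [])
    if not parents:
        # A concept with no broader concept is a top concept.
        return [[concept]]
    paths = []
    for parent in parents:
        for path in paths_to_top(parent, broader):
            paths.append(path + [concept])
    return paths

for path in paths_to_top("Educational software"):
    print(" > ".join(path))
```

Both paths end at the same concept, so content tagged with Educational software is found no matter which top category the user starts from.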

 



 

Taxonomy standards and polyhierarchy issues

 

Taxonomy/thesaurus standards (ANSI/NISO Z39.19 and ISO 25964) describe three kinds of hierarchical relationships (generic-specific, generic-instance, and whole-part), and polyhierarchy may exist within any of these types. Polyhierarchy that combines different hierarchical types, however, can be problematic, so it is best to avoid mixing hierarchical relationship types. For example, the following polyhierarchy mixes different types:

 

Washington, DC

Broader: United States (whole-part)

Broader: Capital cities (generic-instance)

 

The reason to avoid creating a mixed-type polyhierarchy is simply that the experience of browsing the hierarchy can become compromised and potentially confusing for users. Extensive hierarchies with large numbers of narrower concept relationships would result. A hierarchical taxonomy tree should be designed around one dominant hierarchy type. An exception is a thesaurus, which is not designed so much for top-down browsing but for browsing from term to term. Mixing hierarchical types within a thesaurus is thus acceptable.
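
A review for mixed-type polyhierarchies can be sketched in a few lines of Python. This assumes each hierarchical relationship has been recorded with its type, which many tools do not store explicitly; the record layout and the function name are hypothetical.

```python
# Hypothetical records: (narrower concept, broader concept, relationship type).
RELATIONS = [
    ("Washington, DC", "United States", "whole-part"),
    ("Washington, DC", "Capital cities", "generic-instance"),
    ("Educational software", "Software", "generic-specific"),
    ("Educational software", "Educational products", "generic-specific"),
]

def mixed_type_polyhierarchies(relations):
    """Return concepts whose broader relationships use more than one type."""
    types_by_concept = {}
    for narrower, _broader, rel_type in relations:
        types_by_concept.setdefault(narrower, set()).add(rel_type)
    return {
        concept: sorted(types)
        for concept, types in types_by_concept.items()
        if len(types) > 1
    }

print(mixed_type_polyhierarchies(RELATIONS))
```

Educational software is not reported, because both of its broader relationships are of the same type.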

 

It is also recommended to avoid creating hierarchical relationships across different facets in a faceted taxonomy. This is because facets are designed to be mutually exclusive, so that concepts from multiple facets can be used in combination to limit/filter/refine a search. As such, facets are designed to be distinct aspects. There could be an occasional exception, but more than 2-3 polyhierarchies across an entire faceted taxonomy should be a cause for review.

 

With the wider adoption of the SKOS (Simple Knowledge Organization System) model for taxonomies and in taxonomy management systems, taxonomies are more commonly organized into concept schemes. A concept scheme can be represented as a facet in a faceted taxonomy, but it is not limited to use as a facet. Utilizing concept schemes, it makes sense to have separate concept schemes with different hierarchical types, some for generic-specific (for types, categories, topics), one or more for whole-part (geography, organizational structures), and some containing lists of instances (named entities). In this model, Washington, DC, would be narrower only to the United States in the whole-part hierarchical concept scheme for geographic places. It could also be linked to Capital cities, which is in a different concept scheme for place types, with a different kind of relationship (“related” or perhaps a semantic relationship from an ontology).

 

Although SKOS permits hierarchical relationships across different concept schemes, it is best practice not to do this but rather to create hierarchical relationships and polyhierarchies confined within a concept scheme, just as it is recommended not to have polyhierarchy across facets.
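
As a rough illustration of this practice, the following Python sketch flags hierarchical links that cross concept schemes. It makes a simplifying assumption that each concept belongs to exactly one scheme (SKOS itself allows a concept to be in more than one), and the scheme names and the function name are made up for the example.

```python
# Hypothetical data: the single concept scheme each concept belongs to.
SCHEME_OF = {
    "United States": "Geographic places",
    "Washington, DC": "Geographic places",
    "Capital cities": "Place types",
}

# Hypothetical hierarchical links as (narrower, broader) pairs.
BROADER_LINKS = [
    ("Washington, DC", "United States"),
    ("Washington, DC", "Capital cities"),
]

def cross_scheme_links(links, scheme_of):
    """Return the (narrower, broader) pairs that span two concept schemes."""
    return [
        (narrower, broader)
        for narrower, broader in links
        if scheme_of[narrower] != scheme_of[broader]
    ]

print(cross_scheme_links(BROADER_LINKS, SCHEME_OF))
```

Any pair this returns is a candidate for conversion to a non-hierarchical relationship, such as “related.”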

 

Additional polyhierarchy considerations

Polyhierarchy concerns concepts in the taxonomy, and it is not about objects, items, or assets that get tagged with taxonomy concepts, such as an individual publication, document, image, product record, etc. Each of these may get tagged with multiple taxonomy concepts, and as such may have multiple “classifications” and thus can appear as if they are in a polyhierarchy, if a frontend application displays tagged items as if they are leaf nodes in a taxonomy.

A polyhierarchy usually involves only two broader concepts, not more. Having more than two broader concepts is extremely rare. If you find yourself repeatedly creating polyhierarchies with three or more broader concepts in a taxonomy, check to make sure you are not doing something wrong with the hierarchy design.
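
A quick check for this can be sketched in Python, again assuming a simple mapping from each concept to its broader concepts. The concept Annual report and its three parents are invented purely to trigger the check.

```python
# Hypothetical taxonomy: concept -> list of its broader concepts.
BROADER = {
    "Educational software": ["Software", "Educational products"],
    "Annual report": ["Reports", "Financial documents", "Corporate publications"],
    "Software": [],
}

def over_limit(broader, limit=2):
    """Return concepts that have more than `limit` distinct broader concepts."""
    return sorted(
        concept
        for concept, parents in broader.items()
        if len(set(parents)) > limit
    )

print(over_limit(BROADER))
```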

Some content management systems, which have built-in taxonomy management and tagging features, do not support polyhierarchy. The best known is SharePoint, with taxonomies managed in its Term Store feature. Taxonomy terms may be “reused” across Term Sets, but reuse is not permitted within a Term Set, where polyhierarchy is most suitable. See my past post, Polyhierarchy in the SharePoint Term Store, for more details.

Saturday, January 30, 2016

Polyhierarchy in the SharePoint Term Store



Last year I had the opportunity to create some taxonomy in the SharePoint Term Store (also called Managed Metadata), and while I am pleased that hierarchical taxonomies are supported in this widely used platform, I had some concerns about the support of polyhierarchy, as information about this capability is inconsistent. So I experimented further. 

Polyhierarchy means a taxonomy term has more than one broader term or parent term. In a traditional hierarchical taxonomy structure, a term has one broader term (unless it is the top term, in which case it has no broader term) and multiple narrower terms. Occasionally, though, the logic of the hierarchy and the practical need to guide users down different possible paths make it beneficial to give a term two or more broader terms. It may appear to the user that the term is duplicated in different locations in the taxonomy, but this duplication is in appearance only, because it is the same term and thus linked/indexed to the same content, no matter which broader term path the user clicked down through.

An example would be the term Financial report, which is shown in the Figure 1 screenshot from the SharePoint Term Store.
Fig. 1 Financial report as a narrower term to Financial documents.

It would be practical to have a broader term of Financial documents and another broader term of Reports. Some users will look for the term under Financial documents, and other users will look for it under Reports.

The SharePoint 2010 or 2013 Term Store claims to support the creation of polyhierarchy, but it has significant limitations.

Polyhierarchy permitted only across different hierarchies

 

The support of polyhierarchy in the SharePoint Term Store takes the notion of “polyhierarchy” too literally by insisting that the two broader terms of a term in a polyhierarchy actually belong to different hierarchies. This means that the polyhierarchy can only be created across different Term Sets in SharePoint. A Term Set is a hierarchy or a facet with a single top term. It is prohibited to create a polyhierarchy within the same Term Set. This is quite problematic, because I find that the vast majority of the time that I want to create a polyhierarchy, it is within the same top-level hierarchy or facet.

In the example of Financial report, it is logical to have two broader terms of Financial documents and Reports. Both of these broader terms, however, are within the same Term Set or facet, which I might call Document type, so the SharePoint Term Store will not permit this polyhierarchy. Having the term Financial report appear under a second broader term within any other Term Set or facet, on the other hand, such as the Department or Location facet, is permitted by SharePoint, but this would not be a correct hierarchical structure by taxonomy standards.

Only one method to create polyhierarchy

 

In the SharePoint Term Store, you cannot create a broader term relationship; you can create only narrower term relationships. Thus, you can only create hierarchies from the top down. The normal way to create a polyhierarchy, however, is to add a second broader term relationship, but this is not possible in SharePoint. Instead, the same term has to be made as a narrower term to a second term.

So, if you have the term Financial report as narrower to Financial documents, and you want to make Reports also a broader term (and Reports exists in another Term Set), you would go to the second term that will be the new broader term (Reports), click on Create Term, and type in the name of an existing term (Financial report). If you simply type the full name and save, however, SharePoint does not enforce taxonomy standards and permits you to create a new term with the same name as another term (Financial report), but it will not be the same term. You can see at the bottom of the General information pane that the duplicate Financial report term’s unique identifier is different from that of the original Financial report term, as shown in Figure 2.

Fig. 2 General Information for a selected term


This matters, because terms are used for indexing/tagging. The term with one ID in one location may be indexed to some of the content, and the term with the other ID in the other location will be indexed to other content, and neither term will be indexed to all the content. This would be bad for retrieval. So, this method should not be used to create polyhierarchy.
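
The retrieval problem can be illustrated with a small Python sketch. The IDs, file names, and data layout below are invented for the example and do not reflect how SharePoint stores terms internally; the point is only that two same-named terms with different IDs each retrieve part of the content.

```python
# Hypothetical term records: two terms share a name but have different IDs.
TERMS = {
    "id-001": {"name": "Financial report", "parent": "Financial documents"},
    "id-002": {"name": "Financial report", "parent": "Reports"},
}

# Hypothetical tagging: each document is tagged with one term ID.
TAGS = {
    "q1-results.docx": "id-001",
    "q2-results.docx": "id-002",
    "annual-report.pdf": "id-001",
}

def documents_for(term_id, tags=TAGS):
    """Documents retrieved when a user selects one specific term (by ID)."""
    return sorted(doc for doc, tid in tags.items() if tid == term_id)

def documents_for_name(name, terms=TERMS, tags=TAGS):
    """Documents the user expects: everything tagged with any term of that name."""
    ids = {tid for tid, term in terms.items() if term["name"] == name}
    return sorted(doc for doc, tid in tags.items() if tid in ids)

print(documents_for("id-001"))
print(documents_for("id-002"))
print(documents_for_name("Financial report"))
```

Selecting either term by its location in the hierarchy returns only a subset of what the user would expect from the name Financial report.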

To create a polyhierarchy in SharePoint, go to the second term that is intended to be the additional broader term (Reports), click on Create Term, and start typing the name of the existing term (Financial report). At the bottom of the screen you will see “Suggestions,” with yellow-highlighted type-ahead matches to existing terms in another Term Set or even another taxonomy group. If you select one of these suggested terms, then you will indeed be creating a polyhierarchy. After doing so, you will notice that the tag icon preceding the term becomes the “reused tag” icon, as shown in Figure 3, in both locations, under the new broader term and under the existing broader term. You will also notice, when you select the term and view its General details, that the box under Member Of shows that the term is a member of both hierarchies.
Fig. 3 Reused tag example for the term Marketing


Importing a taxonomy with polyhierarchy

 

If you import an externally created taxonomy in CSV format as a Term Set via the Term Store’s import feature and that taxonomy has polyhierarchy, the Term Store will not recognize the polyhierarchy, but rather will treat the polyhierarchy terms as distinct terms with duplicate names, assigning them unique IDs. Thus, they could be used inconsistently in indexing/tagging. Therefore, you should ensure that imported CSV taxonomies do not have any polyhierarchy.
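
One way to check a CSV before import is sketched below in Python. It assumes a layout in which each row gives the full path to one term in columns named Level 1 Term, Level 2 Term, and so on; verify the column names against your actual import file, as they are an assumption here, and the sample data is made up.

```python
import csv
import io

# Hypothetical sample file with one term path per row.
SAMPLE_CSV = """Level 1 Term,Level 2 Term,Level 3 Term
Document type,,
Document type,Financial documents,
Document type,Financial documents,Financial report
Document type,Reports,
Document type,Reports,Financial report
"""

def find_repeated_terms(csv_text):
    """Map each term name found under more than one parent path to those paths."""
    parents_by_name = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Collect the "Level N Term" columns in numeric order.
        levels = sorted(
            (key for key in row if key and key.startswith("Level ")),
            key=lambda key: int(key.split()[1]),
        )
        path = [row[key].strip() for key in levels if row[key] and row[key].strip()]
        if not path:
            continue
        name, parent_path = path[-1], tuple(path[:-1])
        parents_by_name.setdefault(name, set()).add(parent_path)
    return {
        name: sorted(paths)
        for name, paths in parents_by_name.items()
        if len(paths) > 1
    }

print(find_repeated_terms(SAMPLE_CSV))
```

Any term name reported here would be imported as two distinct terms with different IDs, so it should be resolved to a single location before import.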

If you import a taxonomy created in an external taxonomy/thesaurus/ontology management system which permits polyhierarchy, and that software has a feature or connector to import to the SharePoint Term Store, there are different methods of dealing with the polyhierarchy issue. The default of some software, such as Semaphore Ontology Editor and TopBraid Enterprise Vocabulary Net, is to retain only one of the pair of broader term relationships upon export. For example, in Semaphore, the first hierarchical relationship encountered for a term is retained and any others are not, but the user gets an alert. Wordmap also provides a validation error if there is a polyhierarchy for import into the same Term Set. Rather than retaining an arbitrary one of several broader term relationships, Synaptica strips out all broader term relationships if there is more than one, and then the former polyhierarchy terms show up on the orphan term list for review. In some software, such as TopBraid EVN, the user can define quality/validation rules that would identify polyhierarchy, so the user can remove any before importing into SharePoint. Other software vendors, such as Data Harmony and PoolParty, say they have work-arounds for the SharePoint import to sort of support polyhierarchy, but I have not tested these.
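
The two general approaches described above, keeping the first broader term or stripping all broader terms and flagging the term for review, can be compared in a short Python sketch. This is only an illustration of the two strategies under an assumed data layout, not the actual logic of any of the products named; the term Invoice is added as a hypothetical single-parent term.

```python
# Hypothetical taxonomy: term -> ordered list of its broader terms.
BROADER = {
    "Financial documents": [],
    "Reports": [],
    "Financial report": ["Financial documents", "Reports"],
    "Invoice": ["Financial documents"],
}

def keep_first_broader(broader):
    """Keep only the first broader term of each term; report what was dropped."""
    result, dropped = {}, []
    for term, parents in broader.items():
        result[term] = parents[:1]
        dropped.extend((term, parent) for parent in parents[1:])
    return result, dropped

def strip_to_orphans(broader):
    """Remove all broader terms from polyhierarchy terms; list them for review."""
    result, orphans = {}, []
    for term, parents in broader.items():
        if len(parents) > 1:
            result[term] = []
            orphans.append(term)
        else:
            result[term] = list(parents)
    return result, orphans

kept, dropped = keep_first_broader(BROADER)
stripped, orphans = strip_to_orphans(BROADER)
print(dropped)
print(orphans)
```

The first strategy silently decides which path survives, while the second forces a human decision for each former polyhierarchy term.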

In conclusion, the Term Store’s support of polyhierarchy only across Term Sets (hierarchies or facets) is not very useful, since most of the time that we would want to create a polyhierarchy, it is within the same Term Set, especially if the Term Set is to be used as a facet. A term with the same name in more than one facet typically would have a slightly different meaning and usage.