Showing posts with label SharePoint Term Store. Show all posts

Monday, January 13, 2020

Intranet and ECM Taxonomies

In designing a taxonomy for tagging and retrieving content in intranets or in an enterprise content management (ECM) system, there is a fundamental question of whether to strive for creating a single comprehensive taxonomy to be applied throughout the enterprise or to have multiple specific taxonomies for different sets of content and different groups of users within the enterprise, or both. This question involves not only issues of information usability and user experience but also a mindset, which could involve a goal of “breaking down silos” by having a single enterprise taxonomy or one of encouraging “democracy” among organizational units and letting them create their own local taxonomies or term sets (with training).
The main advantage of a single, global taxonomy is that it enables users to effectively search and refine/filter results across all the content within an enterprise system using the same parameters. Users then don’t need to know in which intranet site or sub-site the desired content is to be found. Users need only become familiar with a single taxonomy, not multiple ones, which makes the system easier to use. Content can be better shared and discovered.

On the other hand, multiple, more specific taxonomies can also be of value, providing more precise retrieval results for users who know where and how to search with them. In many organizations, there are very specific sets of documents for which a specific taxonomy would aid retrieval, yet which can be of value to any employee. For example, in an organization that conducts research, these could be research reports or profiles of experts. In an organization that provides services, these could be documents of service descriptions, procedures, and policies. In an organization with a large sales operation, these could be all the documents that support salespeople. The design of a taxonomy should reflect the nature and the scope of the content and the needs of all users. Content in specialized repositories (research reports, experts, service documents, sales support documents, etc.) ought to have customized taxonomies to best support retrieval. For example, a taxonomy for research reports needs to be detailed in research subject areas. A taxonomy for experts would include areas of expertise, departments, locations, and job titles. A taxonomy for service support documents needs to be detailed in types of services and document types and should also include a set of terms for market segment. A set of taxonomies in support of sales should likely include product categories, sales function or process stage, market, and customer type. Meanwhile, a “generic” taxonomy, to be used across the organization, might be based on departments and types of functions/activities, along with general document types and topics.

It may be unclear who should decide, and how the decision should be made, between global, enterprise taxonomies and specific, departmental taxonomies. The decision should probably be left to those in the organization who lead knowledge management or content strategy. The IT department, which sets up the intranet, ECM, or SharePoint system, may have influence in this matter, based on how it chooses to configure the system. There can also be uncertainty and ambivalence over which taxonomy approach to take. During my interviews with stakeholders for a recent SharePoint taxonomy consulting project, a lead IT stakeholder said that there was no policy, but that they “encourage” departments to use the same topical taxonomy. Yet at the same time, they also “create a local classification, but don’t encourage a local classification.”

Approaches to intranet taxonomies

Let’s look more closely at the various options for intranet taxonomy design.

1. Create a general enterprise-wide taxonomy and various department-specific taxonomies.
Benefits: Taxonomies are suited to the content.
Drawbacks: Silos remain, and there is less sharing. Users outside of a department may not be familiar with the departmental taxonomy.

2. Create a single comprehensive taxonomy (or set of taxonomies/facets) to cover all the internal information needs of the organization.
Benefits: There is more sharing, and users have the ease of a single taxonomy of terms by which to refine searches.
Drawbacks: Tagging is more difficult with a large and potentially confusing taxonomy, in which sections of the taxonomy are irrelevant to some sets of content, and some terms intended for one purpose may get used for another purpose.

Other options are more creative, and hopefully IT can customize the content management and search software accordingly to support them.

1. Create an enterprise-wide taxonomy, as a master taxonomy, which is both general and specific, and various specific taxonomies, and map the specific taxonomies, term-by-term, to the master taxonomy which includes all terms. Those who tag only need to use their appropriate specific taxonomies, but those who search, making use of the master taxonomy, can have a “federated search” experience allowing discovery and retrieval across the enterprise.

2. Create a single comprehensive taxonomy with branches that can be hidden from display to those tagging content that does not require the terms from those branches. This makes tagging easier, since taggers are not overwhelmed by a very large taxonomy, much of which is not relevant to their content and which contains terms that could be confusing and misused.
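
The first of these options, mapping specific taxonomies term-by-term to a master taxonomy, can be sketched in a few lines. This is a minimal illustration under assumed data: the master term IDs (`M1`, `M2`, `M3`), the local labels, the site names, and the helper functions are all hypothetical and do not correspond to any particular product's API.

```python
# Hypothetical master taxonomy: term ID -> preferred label
MASTER_TERMS = {
    "M1": "Research reports",
    "M2": "Service descriptions",
    "M3": "Policies",
}

# Hypothetical term-by-term mappings from each specific (departmental) taxonomy
# to the master taxonomy. Taggers only ever see their own local labels.
LOCAL_TO_MASTER = {
    "research": {"Study report": "M1", "White paper": "M1"},
    "services": {"Service sheet": "M2", "Service policy": "M3"},
}

# Documents as tagged locally: (document, site/taxonomy name, local term)
DOCUMENTS = [
    ("doc-1", "research", "Study report"),
    ("doc-2", "research", "White paper"),
    ("doc-3", "services", "Service sheet"),
    ("doc-4", "services", "Service policy"),
]


def to_master(site: str, local_term: str) -> str:
    """Translate a local term into its master-taxonomy term ID."""
    return LOCAL_TO_MASTER[site][local_term]


def federated_search(master_id: str) -> list[str]:
    """Return every document, from any site, whose local tag maps to master_id."""
    return [doc for doc, site, term in DOCUMENTS if to_master(site, term) == master_id]


print(federated_search("M1"))  # ['doc-1', 'doc-2']
```

The searcher selects one master term and retrieves content tagged with two different local terms, which is the "federated search" experience described above; the taggers never have to see the master taxonomy.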

As I was struggling with my current client over whether to make a large taxonomy (500-600 terms) available for tagging in all SharePoint sites, even though it was relevant to only a minority of the sites, the IT stakeholder informed me that for designated sites he could set the taxonomy display for tagging to show just one top-level branch of the taxonomy and hide the rest. Although no more than one branch could be displayed in this method, which would impact the hierarchical design of the taxonomy, this was the best compromise solution in this case.
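
The effect of displaying only one top-level branch for tagging can be modeled as a simple filter over a nested taxonomy. This is a hypothetical sketch of the idea only, not SharePoint's configuration or API; the taxonomy contents and the `branch_for_tagging` helper are invented for illustration.

```python
# Hypothetical taxonomy as nested dicts: label -> narrower terms
TAXONOMY = {
    "Products": {"Hardware": {}, "Software": {"Cloud services": {}}},
    "Services": {"Consulting": {}, "Training": {}},
    "Regions": {"Europe": {}, "Asia": {}},
}


def branch_for_tagging(taxonomy: dict, top_term: str) -> list[str]:
    """List the terms a tagger would see if only one top-level branch is displayed."""
    shown = []

    def walk(children: dict, depth: int) -> None:
        for label, narrower in children.items():
            shown.append("  " * depth + label)  # indentation shows the hierarchy
            walk(narrower, depth + 1)

    walk({top_term: taxonomy[top_term]}, 0)
    return shown


for line in branch_for_tagging(TAXONOMY, "Products"):
    print(line)
```

Everything outside the chosen branch (here, Services and Regions) is simply never offered to the tagger, which is also why a term that belongs under two top-level branches cannot be reached this way.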

I look forward to sharing and learning more about taxonomies for intranets at the upcoming IntraTeam Event Copenhagen: The European DEX Conference, where I will be giving a pre-conference workshop "Taxonomy Design & Creation."

Tuesday, August 29, 2017

Taxonomies in SharePoint



Controlled vocabulary metadata, including hierarchical taxonomies, has been supported in SharePoint since its 2010 version, and its use and features have been enhanced in succeeding versions of SharePoint. While it’s not technically difficult for users to create taxonomies and apply their terms to content items in SharePoint, developing a metadata/taxonomy design and application strategy is definitely a challenge.

The distinction and overlap between metadata and taxonomies was the topic of my previous blog post, "Metadata and Taxonomies," and it is very relevant to SharePoint. Controlled vocabularies or taxonomies used to tag content are referred to in SharePoint as “managed metadata.” This designation is indeed accurate and fitting. Some, but not all, metadata is in taxonomy form (hierarchical structures), and in SharePoint it is managed/controlled in a central way, where permissions on who can change or add to the metadata terms may be limited to a smaller set of users than those who may tag content with the metadata. “Managed Metadata” is something you will hear about in documentation, but in the SharePoint application itself, what you want to work with is “Term store management,” grouped with other Site Administration settings under “Site Settings” (under the gear symbol in Office 365 SharePoint). Terms are grouped into “Term Sets” (top-term hierarchies or facets).
Questions to consider in taxonomy design in SharePoint include:

  • To what extent will document libraries (virtual folders) be used to categorize content within a site, and would proposed subfolder names be better suited as metadata terms for tagging?
  • Will the primary use be for filtering lists of documents in place, within an open document library, based on metadata selected for the various “columns,” or will the primary use be for refining search results, based on metadata selected in the left-hand margin refinement panel after executing a search?
  • How many Term Sets should be created and how many and which metadata fields in total should display to the users, either in columns or as search refinements.
  • When should a Term Set be a flat list and when should it be created as a hierarchy, and how deep should the hierarchy be?

Use of document libraries vs. metadata tagging 


SharePoint supports the creation of a hierarchy of nested folders within libraries within sites. So, it may be tempting to start off creating such a “taxonomy” of categories for content, especially if migrating content over from a shared drive where such folders had been used. However, tagged metadata has many advantages over categories of folders for finding and retrieving content.
A content item may be tagged with more than one term from the same Term Set if it deals with more than one topic or if it falls into more than one category type, whereas putting a document in more than one folder can lead to version control issues. (It’s true that you can put a document in one folder and a link to it in another folder, but this is easy to forget, and it does not look as “clean.”)

You can create multiple Term Sets, each serving as a facet for a different way to categorize, such as by document type, function, audience, or topic, and then tag a content item with terms from each, whereas folders don’t deal well with mixed methods of categorization and force you to choose one method of categorization.

  • Tagged metadata allows you to filter a large set of content in place to quickly narrow results to what you want, whereas folders require clicking down through multiple paths, taking more time to find desired content when it is in different places.
  • Tagged metadata can also be implemented as search refinement filters, also known as faceted search.
  • Tagged metadata terms can have synonyms, helping users find what they want by different names, whereas folder names cannot have synonyms. 
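
The three advantages above (multiple terms per facet, filtering by several facets at once, and synonyms) can be illustrated with a small in-memory model. This is a minimal sketch under assumed data: the facet names, synonyms, filenames, and the `resolve` and `filter_docs` helpers are hypothetical, and it does not use any SharePoint API.

```python
# Hypothetical Term Sets (facets), each mapping synonyms to a preferred term
SYNONYMS = {
    "Document type": {"SOP": "Procedure", "Guideline": "Policy"},
    "Topic": {"HR": "Human resources", "Personnel": "Human resources"},
}

# A document can carry several terms per facet; a folder could hold it only once
DOCS = {
    "Leave policy.docx": {"Document type": {"Policy"}, "Topic": {"Human resources"}},
    "Expense procedure.docx": {"Document type": {"Procedure"}, "Topic": {"Finance"}},
    "Hiring checklist.docx": {"Document type": {"Procedure", "Policy"},
                              "Topic": {"Human resources"}},
}


def resolve(facet: str, label: str) -> str:
    """Map a synonym entered by the user to the preferred term, if one is defined."""
    return SYNONYMS.get(facet, {}).get(label, label)


def filter_docs(selections: dict[str, str]) -> list[str]:
    """Keep documents tagged with every selected term (facets combined with AND)."""
    wanted = {facet: resolve(facet, label) for facet, label in selections.items()}
    return sorted(
        name for name, tags in DOCS.items()
        if all(term in tags.get(facet, set()) for facet, term in wanted.items())
    )


print(filter_docs({"Topic": "HR", "Document type": "SOP"}))  # ['Hiring checklist.docx']
```

The user types the synonyms "HR" and "SOP", yet finds the checklist, which is tagged as both a procedure and a policy; no single folder path could have expressed that.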

Thus, what had been labels for folders on a shared drive should most likely be changed to terms in a taxonomy Term Set. Whether you should have any document libraries, or just a few without subfolders, depends on the preferences of your users, but I don’t recommend the creation of subfolders.

Filtering on columns vs. refining searches


The same Term Sets may be used for both column filters and search refinements. But typically, the implementation of managed metadata in SharePoint is either primarily for one or the other purpose, and the other use may not even be set up. Generally, if the managed metadata is going to be applied to documents within a single library or one site, on the order of tens or hundreds, then column filters are desired; if the managed metadata is going to be applied across multiple sites on thousands of documents, then search refinements would be used.

If you are unsure whether to promote filtering on columns or refining searches, consider that filtering on columns will always get more accurate results, but only if metadata has been consistently applied. Out-of-the-box search in SharePoint will retrieve documents with the word or phrase anywhere in the document. The idea behind this is to get a result set that is larger than needed and not miss anything, and then the user can narrow the results with the various refiners. So, search results are not as accurate, but there will be results, even if metadata tagging is incomplete.
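
The trade-off between broad search and metadata refinement can be seen in a toy model. This is a hypothetical sketch, not how SharePoint search is implemented: the documents, the single "Topic" field, and the helper names are assumptions, and "full-text search" here is just a substring match.

```python
from collections import Counter

# Hypothetical documents: full text plus (possibly incomplete) Topic tags
DOCS = [
    {"title": "Budget 2020", "text": "annual budget for marketing", "topic": ["Finance"]},
    {"title": "Campaign plan", "text": "marketing campaign budget", "topic": ["Marketing"]},
    {"title": "Team offsite", "text": "agenda mentions budget", "topic": []},  # untagged
]


def full_text_search(query: str) -> list[dict]:
    """Broad first pass: match the query word anywhere in the text."""
    return [d for d in DOCS if query.lower() in d["text"].lower()]


def refiner_counts(results: list[dict]) -> Counter:
    """Count Topic terms among the results, as a refinement panel would list them."""
    return Counter(term for d in results for term in d["topic"])


def refine(results: list[dict], term: str) -> list[str]:
    """Narrow the results to documents tagged with the chosen refiner term."""
    return [d["title"] for d in results if term in d["topic"]]


hits = full_text_search("budget")
print(len(hits))                   # 3: every document mentioning the word
print(dict(refiner_counts(hits)))  # {'Finance': 1, 'Marketing': 1}
print(refine(hits, "Finance"))     # ['Budget 2020']
```

The untagged "Team offsite" document still shows up in the broad search but disappears as soon as a refiner is applied, which is why refinement (and column filtering) is only as good as the consistency of the tagging.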

Knowing how the Term Sets will be used can have an impact on the wording of terms and the extent of use of hierarchy. Both columns and refiners have limited width for term names to display. The user can easily adjust column widths to accommodate long names, but the refinement panel width cannot be widened by the user. The use of columns also makes it desirable to keep terms to a limited length within a given Term Set.  Refiners indicate hierarchies of terms by default to the user who is searching content, whereas columns do not indicate any hierarchy in the default view.

Number of Term Sets and metadata fields


There is no point in creating a Term Set if it’s not going to be displayed to the users for filtering or refining, and too many metadata fields take up horizontal or vertical space, are a burden to tag, and make the user experience of searching or filtering too complicated. So, you need to consider what would be truly useful, and not merely possibly nice to have. Just two Term Sets, such as Document Type and Topic, may be sufficient. In addition to the managed metadata that you create, there will be other metadata fields desired for filtering or refining, such as date, author, and format type, and perhaps uncontrolled keyword tags applied by users. In the case of columns, there will always be the document title taking up a column and considerable horizontal space as well. 

The default columns in SharePoint are “Type” (file format), “Name” (filename), “Modified” (the date the file was uploaded or the last time any of its properties were updated, not the date the file itself was modified), and “Modified By” (the person who uploaded the file or last updated its properties, but not necessarily who actually modified the file). The default search refinements are the same, excluding title: “Result type” (file format), “Author” (an even worse misnomer for “Modified By”), and “Modified date” (often displayed in graph form). If you believe such information is not that valuable, you can remove these columns/refinements, especially when you plan to add other columns/refinements, which will take up horizontal or vertical space.

I would recommend no more than 4-7 total metadata fields, including those that are not based on managed metadata. You should avoid having more metadata as columns, along with the document titles, than can fully display horizontally, so as not to require horizontal scrolling. Search refinements, on the other hand, by default display sample high-use terms under each refiner, so typically no more than three refiners display in the left margin without vertical scrolling. Vertical scrolling is expected and acceptable to a limited degree.


Term Sets as flat lists or hierarchical taxonomies


SharePoint makes it easy to create hierarchies within Term Sets by simply right-clicking on a selected term and selecting “Create Term” from the context menu. Some people might think that since the Term Store is for taxonomies, and taxonomies are hierarchical, hierarchies should be created if applicable. However, hierarchies are only helpful for navigating the taxonomy if the taxonomy is sufficiently large. If you set up multiple Term Sets, each used as a facet in combination with others, they each don’t need to be very large. Furthermore, the types of content most people store in SharePoint tend not to need taxonomies as large and detailed as might be needed in a content management system or digital asset management system.

My rule of thumb is that up to 12 terms should be on the same level before considering creating any hierarchy, but it could go up to 20 or so, and even more if the list consists of named entities/proper nouns. Also, if you do have hierarchies, consider keeping them relatively shallow, such as only two levels instead of three. Even if a hierarchy is technically correct, it does not mean you have to set it up that way.
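
That rule of thumb can be turned into a quick review check on a draft Term Set before it is entered into the Term Store. This is a hypothetical sketch: the nested-dict representation, the example terms, and the `review_term_set` helper are assumptions, and the two thresholds are the soft limits from the paragraph above (raise `MAX_SIBLINGS` toward 20 for lists of proper nouns), not limits enforced by SharePoint.

```python
# Soft limits from the rule of thumb above, not SharePoint limits
MAX_SIBLINGS = 12
MAX_DEPTH = 2


def review_term_set(children: dict, path: tuple = (), depth: int = 1) -> list[str]:
    """Flag levels with too many sibling terms and terms deeper than MAX_DEPTH."""
    warnings = []
    where = " > ".join(path) or "(top level)"
    if len(children) > MAX_SIBLINGS:
        warnings.append(f"{where}: {len(children)} terms on one level")
    for label, narrower in children.items():
        if depth > MAX_DEPTH:
            warnings.append(f"{' > '.join(path + (label,))}: level {depth}")
        warnings.extend(review_term_set(narrower, path + (label,), depth + 1))
    return warnings


# Hypothetical draft Document type Term Set as nested dicts
document_types = {
    "Reports": {"Financial report": {"Quarterly report": {}}, "Status report": {}},
    "Policies": {},
}
for w in review_term_set(document_types):
    print(w)  # Reports > Financial report > Quarterly report: level 3
```

A flag is a prompt to reconsider, not an error: as noted above, even a technically correct third level may be better flattened.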

If you need only a short flat list of terms, you might consider not using the Term Store at all, but rather creating the list as a "Choice" type of column. This is easier to implement, but the terms would be limited in their use to filtering and sorting the column, and they could not also be used for search and navigation.

Saturday, January 30, 2016

Polyhierarchy in the SharePoint Term Store



Last year I had the opportunity to create some taxonomy in the SharePoint Term Store (also called Managed Metadata), and while I am pleased that hierarchical taxonomies are supported in this widely used platform, I had some concerns about the support of polyhierarchy, as information about this capability is inconsistent. So I experimented further. 

Polyhierarchy means a taxonomy term has more than one broader term or parent term. In a traditional hierarchical taxonomy structure, a term has one broader term (unless it is the top term, in which case it has no broader term) and multiple narrower terms. Occasionally, though, the logic of the hierarchy and the practical need to guide users down different possible paths make it beneficial to give a term two or more broader terms. It may appear to the user that the term is duplicated in different locations in the taxonomy, but this duplication is in appearance only, because it is the same term and thus linked/indexed to the same content, no matter which broader term path the user clicked down through.

An example would be the term Financial report, which is shown in the Figure 1 screenshot from the SharePoint Term Store.
Fig. 1 Financial report as a narrower term to the term Financial documents.

It would be practical to have a broader term of Financial documents and another broader term of Reports. Some users will look for the term under Financial documents, and other users will look for it under Reports.
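
The key point, that a polyhierarchical term is one term with one identifier and several broader terms, can be sketched as a small data model. This is an illustrative, hypothetical model (the `Term` class, the index, and the filenames are invented), not the Term Store's internal representation.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Term:
    """A taxonomy term: one label, one unique ID, and any number of broader terms."""
    label: str
    broader: list = field(default_factory=list)  # two or more entries = polyhierarchy
    id: uuid.UUID = field(default_factory=uuid.uuid4)


financial_documents = Term("Financial documents")
reports = Term("Reports")
# One term object with two broader terms, not two copies of the term
financial_report = Term("Financial report", broader=[financial_documents, reports])
all_terms = [financial_documents, reports, financial_report]

# Content is indexed to the term's ID, not to a position in the hierarchy
index = {financial_report.id: ["Q3 results.pdf", "Annual report.pdf"]}


def narrower_of(parent: Term, terms: list) -> list:
    """Terms that list `parent` among their broader terms."""
    return [t for t in terms if any(b.id == parent.id for b in t.broader)]


def content_via(parent: Term, terms: list) -> list:
    """Content reached by browsing down one level from `parent`."""
    return [doc for t in narrower_of(parent, terms) for doc in index.get(t.id, [])]


print(content_via(financial_documents, all_terms))  # ['Q3 results.pdf', 'Annual report.pdf']
print(content_via(reports, all_terms))              # ['Q3 results.pdf', 'Annual report.pdf']
```

Whichever path the user browses, Financial documents or Reports, the same term and therefore the same content is reached.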

The SharePoint 2010 or 2013 Term Store claims to support the creation of polyhierarchy, but it has significant limitations.

Polyhierarchy permitted only across different hierarchies


The support of polyhierarchy in the SharePoint Term Store takes the notion of “polyhierarchy” too literally by insisting that the two broader terms of a term in a polyhierarchy actually belong to different hierarchies. This means that the polyhierarchy can only be created across different Term Sets in SharePoint. A Term Set is a hierarchy or a facet with a single top term. It is prohibited to create a polyhierarchy within the same Term Set. This is quite problematic, because I find that the vast majority of the time that I want to create a polyhierarchy, it is within the same top-level hierarchy or facet.

In the example of Financial report, it is logical to have two broader terms of Financial documents and Reports. Both of these broader terms, however, are within the same Term Set or facet, which I might call Document type, so the SharePoint Term Store will not permit this polyhierarchy. Having the term Financial report appear under a second broader term within any other Term Set or facet, on the other hand, such as the Department or Location facet, is permitted by SharePoint, but this would not be a correct hierarchical structure by taxonomy standards.
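
The restriction can be expressed as a simple rule. The sketch below is a hypothetical model of the behavior described above, not SharePoint code or its API; the term-to-Term-Set assignments (including the Accounting term in a Department Term Set) are invented for illustration.

```python
# Hypothetical model of the rule described above; not SharePoint code.
# Which Term Set each candidate broader term belongs to
TERM_SETS = {
    "Financial documents": "Document type",
    "Reports": "Document type",
    "Accounting": "Department",
}

# Term Sets in which each term already appears
placements = {"Financial report": {"Document type"}}


def can_reuse_under(term: str, new_parent: str) -> bool:
    """Reuse is allowed only if the new parent is in a Term Set the term isn't already in."""
    return TERM_SETS[new_parent] not in placements[term]


print(can_reuse_under("Financial report", "Reports"))     # False: same Term Set
print(can_reuse_under("Financial report", "Accounting"))  # True: different Term Set
```

The model makes the mismatch plain: the placement a taxonomist actually wants (under Reports) is the one refused, while a placement that is questionable by taxonomy standards (under a department) is allowed.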

Only one method to create polyhierarchy


In the SharePoint Term Store, you cannot create a broader term relationship; you can create only narrower term relationships. Thus, you can only create hierarchies from the top down. The normal way to create a polyhierarchy, however, is to add a second broader term relationship, but this is not possible in SharePoint. Instead, the same term has to be made as a narrower term to a second term.

So, if you have the term Financial report as narrower to Financial documents, and you want to make Reports also a broader term (and Reports exists in another Term Set), you would go to the term that will be the new broader term (Reports), click on Create Term, and type in the name of the existing term (Financial report). SharePoint, however, does not enforce taxonomy standards and permits you to create a new term with the same name as another term (Financial report), but it will not be the same term. You can see at the bottom of the General Information pane that the duplicate Financial report term’s unique identifier is different from that of the original Financial report term, as shown in Figure 2.

Fig. 2 General Information for a selected term


This matters, because terms are used for indexing/tagging. The term with one ID in one location may be indexed to some of the content, and the term with the other ID in the other location will be indexed to other content, and neither term will be indexed to all the content. This would be bad for retrieval. So, this method should not be used to create polyhierarchy.
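
The retrieval problem caused by same-named terms with different identifiers can be demonstrated with a few lines of code. This is a hypothetical sketch: the term records, filenames, and helper functions are invented, and random UUIDs stand in for the Term Store's unique identifiers.

```python
import uuid

# Hypothetical terms: same label, different IDs, as happens when the name is typed
# in as a new term instead of picking the suggested existing term
original = {"label": "Financial report", "id": uuid.uuid4()}
duplicate = {"label": "Financial report", "id": uuid.uuid4()}

# Tagging records the term ID, so content ends up split between the two terms
tags = {
    "Q1 results.pdf": original["id"],
    "Q2 results.pdf": duplicate["id"],
    "Q3 results.pdf": original["id"],
}


def retrieve(term: dict) -> list[str]:
    """Content indexed to this particular term ID."""
    return [doc for doc, term_id in tags.items() if term_id == term["id"]]


def duplicate_labels(terms: list[dict]) -> set[str]:
    """Labels shared by more than one distinct term ID (a sign of accidental duplicates)."""
    ids_by_label: dict[str, set] = {}
    for t in terms:
        ids_by_label.setdefault(t["label"], set()).add(t["id"])
    return {label for label, ids in ids_by_label.items() if len(ids) > 1}


print(retrieve(original))   # ['Q1 results.pdf', 'Q3 results.pdf']
print(retrieve(duplicate))  # ['Q2 results.pdf']
print(duplicate_labels([original, duplicate]))  # {'Financial report'}
```

A user who selects either "Financial report" gets only part of the relevant content; a periodic check for duplicate labels is one way to catch such terms before much content has been tagged with them.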

To create polyhierarchy correctly in SharePoint, go to the term that is intended to be the additional broader term (Reports), click on Create Term, and begin typing the name of the existing term (Financial report). At the bottom of the screen you will see “Suggestions,” with yellow-highlighted type-ahead matches to existing terms in another Term Set or even another taxonomy group. If you select one of these suggested terms, then you will indeed be creating a polyhierarchy. After doing so, you will notice that the tag icon preceding the term becomes the “reused tag” icon, as shown in Figure 3, in both locations: under the new broader term and under the existing broader term. You will also notice, when you select the term and view its General details, that the Member Of box shows that the term is a member of both hierarchies.
Fig. 3 Reused tag example for the term Marketing


Importing a taxonomy with polyhierarchy


If you import an externally created taxonomy in CSV format as a Term Set via the Term Store’s import feature and that taxonomy has polyhierarchy, the Term Store will not recognize the polyhierarchy, but rather will treat the polyhierarchy terms as distinct terms with duplicate names, assigning them unique IDs. Thus, they could be used inconsistently in indexing/tagging. Therefore, you should ensure that imported CSV taxonomies do not have any polyhierarchy.
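
A quick pre-import check can find terms that appear under more than one parent in a CSV file. This sketch assumes a layout like the Term Store's sample import file, with one term path per row spread across columns whose names begin with "Level" (such as "Level 1 Term"); the sample rows and the `find_polyhierarchy` helper are hypothetical, and other columns of the real file are omitted.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV excerpt: one term path per row in "Level N Term" columns
SAMPLE = """Term Set Name,Level 1 Term,Level 2 Term,Level 3 Term
Document type,Financial documents,,
Document type,Financial documents,Financial report,
Document type,Reports,,
Document type,Reports,Financial report,
Document type,Reports,Status report,
"""


def find_polyhierarchy(csv_text: str) -> dict[str, set[str]]:
    """Return term labels that appear under more than one parent in the file."""
    parents = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Collect the non-empty Level columns, in column order, as the term's path
        path = [row[col] for col in row if col.startswith("Level") and row[col]]
        if len(path) > 1:
            parents[path[-1]].add(path[-2])
    return {label: p for label, p in parents.items() if len(p) > 1}


print(find_polyhierarchy(SAMPLE))
```

Any label reported here would be imported as separate, same-named terms, so the file should be edited to give each term only one parent before import.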

If you import a taxonomy created in an external taxonomy/thesaurus/ontology management system that permits polyhierarchy, and that software has a feature or connector to import to the SharePoint Term Store, there are different methods of dealing with the polyhierarchy issue. The default of some software, such as Semaphore Ontology Editor and TopBraid Enterprise Vocabulary Net, is to retain only one of the pair of broader term relationships upon export. For example, in Semaphore, the first hierarchical relationship encountered for a term is retained and any others are not, but the user gets an alert. Wordmap also provides a validation error if there is a polyhierarchy for import into the same Term Set. Rather than retaining an arbitrary one of several broader term relationships, Synaptica strips out all broader term relationships if there is more than one, and then the former polyhierarchy terms show up on the orphan term list for review. In some software, such as TopBraid EVN, the user can define quality/validation rules that would identify polyhierarchy, so the user can remove any before importing into SharePoint. Other software vendors, such as Data Harmony and PoolParty, say they have work-arounds for the SharePoint import to sort of support polyhierarchy, but I have not tested these.
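
The two export behaviors described above, keeping the first broader term with an alert versus stripping all broader terms and leaving an orphan, can be compared side by side. This is a generic, hypothetical illustration of the two strategies, not code from any of the products named; the relationships and function names are invented.

```python
# Hypothetical broader-term relationships exported from an external tool,
# in the order they were encountered: (term, broader term)
RELATIONS = [
    ("Financial report", "Financial documents"),
    ("Financial report", "Reports"),
    ("Status report", "Reports"),
]


def keep_first(relations):
    """Keep only the first broader term seen for each term; report the ones dropped."""
    kept, dropped = {}, []
    for term, broader in relations:
        if term in kept:
            dropped.append((term, broader))  # would trigger an alert to the user
        else:
            kept[term] = broader
    return kept, dropped


def strip_all(relations):
    """Drop every broader term of a polyhierarchical term, leaving it orphaned for review."""
    counts = {}
    for term, _ in relations:
        counts[term] = counts.get(term, 0) + 1
    kept = {t: b for t, b in relations if counts[t] == 1}
    orphans = sorted(t for t, n in counts.items() if n > 1)
    return kept, orphans


print(keep_first(RELATIONS))
print(strip_all(RELATIONS))
```

The first strategy silently depends on the order of the relationships, while the second loses both placements but guarantees that a person reviews the term, which is the trade-off between the approaches described above.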

In conclusion, the Term Store’s support of polyhierarchy only across Term Sets (hierarchies or facets) is not very useful, since the majority of time that we would want to create a polyhierarchy, it is within the same Term Set, especially if the Term Set is to be used as a facet. A term with the same name in more than one facet typically would have a slightly different meaning and usage.

Thursday, December 29, 2011

From Folders to Facets

A recent taxonomy project I completed involved creating a new taxonomy for a financial services client that was migrating its internal content from shared drive folders to a SharePoint-based intranet, which also included automated indexing and a search engine (FAST). The new taxonomy will help support the search functionality, and taxonomy terms will also display in the left-hand margin (called the Refinement Panel), so that users can refine/narrow their initial search results by selecting terms from several attributes/filters/facets. The client had already made a start on a taxonomy by the time I became involved. Not surprisingly, the client-created taxonomy followed the structure of the existing folder names quite closely. After all, the folder structure was their only reference point. It became apparent that a taxonomy for folders and a taxonomy for facets, even for the same content, should be designed quite differently.

A hierarchy of nested folders has the following characteristics:
  1. It is designed to gather and group similar documents together.
  2. It is usually designed and created by a person who is uploading/storing documents with the frame of mind of “where can I put these so that I might find them later.”
  3. A document can go into only one folder and thus under only one category.
  4. A folder can be located within only one parent folder.
  5. The hierarchy of nested folders thus may become quite deep, such as six or seven levels.
  6. Folder names at deeper levels can become long and complex to describe a combination of criteria (a taxonomy design characteristic called pre-coordination).

A faceted taxonomy for search refinement has the following characteristics:
  1. It is designed to refine and narrow a search by specific criteria.
  2. It is designed to help all members of an enterprise find documents, including documents uploaded by different people in different departments.
  3. A document can be assigned multiple taxonomy terms, even terms from within the same facet/broad category.
  4. A taxonomy term may display “under” more than one parent taxonomy term, as long as it is a logical hierarchy. (This feature is called “polyhierarchy.”)
  5. The displayed hierarchy of terms is not so deep, usually only three levels.
  6. Taxonomy term names stay simple, since they are intended to be used in combination (a taxonomy design characteristic known as post-coordination).
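
The difference between pre-coordination (point 6 of the folder list) and post-coordination (point 6 of the facet list) can be quantified with a little arithmetic. This sketch uses hypothetical facet values chosen only for illustration: pre-coordinated names must cover every combination of criteria, while post-coordinated facets need only one short list per facet.

```python
from itertools import product

# Hypothetical facet values for illustration
departments = ["Finance", "HR", "Legal", "Sales", "IT"]
doc_types = ["Reports", "Policies", "Procedures", "Forms"]
statuses = ["Draft", "Final", "Archived"]

# Pre-coordination: one folder (or term) name per combination of criteria
folder_names = [f"{d} {t} - {s}" for d, t, s in product(departments, doc_types, statuses)]

# Post-coordination: one short list per facet, combined by the user at search time
facet_terms = departments + doc_types + statuses

print(len(folder_names))  # 60
print(len(facet_terms))   # 12
print(folder_names[0])    # Finance Reports - Draft
```

Sixty long, combined names versus twelve simple terms: the multiplication in the first case is one reason folder names grow long and folder hierarchies grow deep, while facet terms stay short.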

With this many differences between hierarchical folders and refinement facets, it’s inevitable that the taxonomy for each will differ, even if the content/documents and the users remain the same. Actually, a nested folder structure may or may not even constitute a “taxonomy.” It depends on whether the folder system was designed with a consistent structure and folder names or whether it just grew ad hoc.

A year and a half ago I was involved with a similar taxonomy project for the wind energy company First Wind. In addition to designing a faceted taxonomy for the Refinement Panel to support search in SharePoint, I was also tasked with improving the nested folder structure and folder names already in use in SharePoint, which were not going to go away. I remember being asked then if I could just create a single taxonomy for both purposes. The answer was no, not entirely. There would be overlap, but there would also be differences. To the stakeholders, that seemed like a lot of additional work, but to me, the taxonomist, that’s simply the nature of my work, and I enjoy the diversity of building different kinds of taxonomies. In the end, more work put in by the taxonomist means less work needed by the users.