The Accidental Taxonomist: Blog post taxonomies


Thursday, September 6, 2018

Using Linked and Other Open Vocabularies

Taxonomy terms assigned to content items make the content easier to find, whether in an internal system, on the web, or both. To make content easier to find or discover on the web, the use of taxonomy terms or tags is part of the broader application of search engine optimization (SEO). Much has already been written by others on creating and adding terms/labels/tags to web content to support SEO, such as how many there should be and how specific they should be. For the taxonomist, who is interested not only in the terms alone but also in the larger taxonomy to which they belong, another question is whether using terms from shared, publicly available controlled vocabularies makes a difference in increasing content discoverability on the web.

Linked open data and linked open vocabularies

Shared, publicly available controlled vocabularies may or may not be linked or linkable, as linked open vocabularies. So, just because a controlled vocabulary is publicly available does not mean that it inherently supports linked data on the web.

“Linked data,” which usually is linked open data, refers to methods to interlink structured content in a way that can be read automatically by computers to enable the discovery of content on the web. It is described in a set of W3C specifications for web publishing that makes the data or content part of the Semantic Web. This means that instead of manually following individually created hyperlinks, semantic links and computer-readable formats support automated relevant linkages among content. Linked data requires the use of URIs as names for things, HTTP URIs so that those names can be looked up on the web, and structured data using controlled vocabulary terms and dataset definitions expressed in an RDF standard framework. “Linked open data” additionally includes open use in accordance with an open license.

Terms in taxonomies can serve as labels to linked content as part of linked data. Additionally, although less common, taxonomy terms themselves can be the content that is linked to, if the taxonomy concepts are individually assigned URIs and HTTP addresses, and are in an RDF format.
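To make the idea of a taxonomy concept "in an RDF format" more concrete, here is a minimal sketch in Python that writes one concept as RDF triples in N-Triples syntax. It assumes the SKOS vocabulary (a W3C standard commonly used for taxonomies, though not one named in this post), and the example.org URIs and the "Metadata" concept are made up for illustration. It shows only the shape of the data, not a complete linked data publishing setup.

```python
from typing import List, Optional

# SKOS is assumed here as the RDF vocabulary for taxonomy concepts.
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def uri(value: str) -> str:
    """Format a URI the way N-Triples expects: wrapped in angle brackets."""
    return f"<{value}>"


def literal(text: str, lang: str = "en") -> str:
    """Format a language-tagged string literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def concept_triples(concept_uri: str, label: str,
                    broader_uri: Optional[str] = None) -> List[str]:
    """Return N-Triples lines describing one taxonomy concept."""
    lines = [
        f"{uri(concept_uri)} {uri(RDF_TYPE)} {uri(SKOS + 'Concept')} .",
        f"{uri(concept_uri)} {uri(SKOS + 'prefLabel')} {literal(label)} .",
    ]
    if broader_uri is not None:
        # Link the concept to its broader (parent) concept in the taxonomy.
        lines.append(
            f"{uri(concept_uri)} {uri(SKOS + 'broader')} {uri(broader_uri)} ."
        )
    return lines


# Hypothetical concept URIs on example.org, for illustration only.
for line in concept_triples(
    "http://example.org/taxonomy/metadata",
    "Metadata",
    broader_uri="http://example.org/taxonomy/information-management",
):
    print(line)
```

Each printed line is one subject-predicate-object statement. The concept becomes linkable only if its URI is an HTTP address that actually resolves on the web, which is the part that requires hosting beyond what this sketch covers.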

Limitations to designating content as linked open data

If you have a document on the web that you want to have discovered as part of the Semantic Web, designating it as linked data is not so simple, because you need to include the machine-readable instructions, such as through a SPARQL endpoint or an API (application programming interface), in addition to the RDF designation. Not only is this technically outside the skills of most individual web content creators and taxonomists, but depending on how the content is managed, standard web content management systems or blog posting software may not even support editing the HTML of the page to insert such instructions.

Institutions may register their content with a linked open data repository. The main repository of linked open vocabularies is Linked Open Vocabularies (LOV), hosted by the Ontology Engineering Group of the Computer Science School at Universidad Politécnica de Madrid. An individual blogger, however, who would like to make an individual blog post linked open data, cannot easily achieve that status.

Simply linking to shared, open vocabularies

Thus, if linked data instructions cannot easily be included and traditional manual links back to the page (as by means of agreed-upon link exchanges) cannot be established for practical reasons, tagging could be done with terms from a publicly available controlled vocabulary that is not part of linked open data and linked open vocabularies. Two good examples are the labels of Wikidata and the Virtual International Authority File (VIAF).

Wikidata is a free, open, collaborative, multilingual collection of structured data. Its purpose is to support Wikipedia, Wikimedia Commons and other wikis of the Wikimedia movement, as well as anyone who wants to search, use, edit or consume its data. The data contained in the Wikidata repository consists of items, each with a unique name and ID. Currently there are 50,116,886 data items. Each item has a brief glossary definition, equivalent names in other languages, relationships ("statements") to other data items (such as "subclass of" and "designed by"), and identifiers in other vocabularies (such as Freebase, Library of Congress authorities, and Quora topic).

VIAF, hosted by OCLC, contains just named entities (proper nouns). But it uniquely brings together and displays as a group the headings that are the authority used by each contributor for that term. So, it’s not exactly a controlled vocabulary. VIAF has over 40 international member-contributors, most of which are national libraries.
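Tagging that simply links to a shared vocabulary can be as plain as an ordinary hyperlink from the tag label to the term's public page. The sketch below shows one way this might look in Python. The URL patterns, the rel="tag" attribute, and the choice of Wikidata item Q42 (Douglas Adams) are my own illustrative assumptions, not a method prescribed by either vocabulary. A VIAF identifier would need to be looked up for each name.

```python
from html import escape

# Assumed URL patterns for each vocabulary's public term pages.
VOCABULARY_PAGES = {
    "wikidata": "https://www.wikidata.org/wiki/{id}",
    "viaf": "https://viaf.org/viaf/{id}",
}


def tag_link(label: str, vocabulary: str, identifier: str) -> str:
    """Return an HTML link from a tag label to its entry in a shared vocabulary."""
    url = VOCABULARY_PAGES[vocabulary].format(id=identifier)
    # Escape both the URL and the label so the result is safe to paste into HTML.
    return f'<a rel="tag" href="{escape(url, quote=True)}">{escape(label)}</a>'


# Example: a tag for a named entity, linked to its Wikidata item.
print(tag_link("Douglas Adams", "wikidata", "Q42"))
```

The output is a fragment of HTML that could be pasted into a post wherever the blogging platform allows links, with no RDF or endpoint required.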

Is there any benefit in tagging with and linking to terms that are part of a controlled vocabulary which is publicly available but is not a linked open vocabulary, such as Wikidata or VIAF? A colleague of mine proposed finding out by experimenting with tagging the same content with terms from different sources. Results will be shared in a later blog post.

Saturday, June 30, 2018

Categories, Tags, and Taxonomies in WordPress

When I upgraded my Hedden Information Management website to WordPress a few months ago, I took advantage of WordPress’s blog post feature and incorporated a copy of this blog into the website (while also keeping its original location on Blogger.com). The difference between categories and tags in the different platforms became clear. Blogger.com offers only “labels” to its bloggers, although these are listed as “Categories” on the displayed blog. WordPress, by contrast, offers both “Categories” and “Tags.” When I imported my blog posts to the WordPress site, the Categories in Blogger.com became Categories in WordPress, but none of the posts had any Tags. I then realized that some of these Category terms perhaps should be changed to Tags.

The difference between Tags and Categories is a topic I blogged on five years ago. A simple comparison is that Categories tend to be broader than tags, and more documents get assigned the same Category, whereas Tags tend to be more specific with fewer documents assigned the same Tag. Conversely, a document typically has only one or two Categories but more Tags. Categories can also be organized into a hierarchy with subcategories, but Tags tend to be unstructured. However, Blogger.com does not offer the capability of putting its Categories into a hierarchy, which would be desirable, since the number of my Categories has become too great to browse easily in a flat list.

WordPress appropriately treats Categories and Tags differently in the following ways:

Categories, unlike Tags, have the capability of being put into a hierarchy, by selecting a “parent” Category for a given Category. The hierarchy displays both in the Dashboard and optionally on the site.
While both Categories and Tags are displayed on each individual post (and are hyperlinked to a list of posts which share the same Category or Tag), and both Categories and Tags can be generated as Tag Clouds, it is only the Category list that can be alphabetically browsed by the site visitor (if added as a widget to a page).
Categories are required, whereas Tags are not. If you don’t assign a Category to a post it will automatically get assigned the “Uncategorized” Category.
Category labels additionally appear within the default URL of the blog post, as a path segment between the domain name and the filename. For example, my blog post with the Category “Metadata” received the URL www.hedden-information.com/metadata/metadata-and-taxonomies.
The Category name also appears within the breadcrumb trail, if the site has one displayed on each page. Of course, some blog posts have multiple Categories, and only one of them can appear within the URL and breadcrumb trail, so WordPress assigns one of them by default.
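The Category behaviors listed above (an optional parent, a fallback to “Uncategorized,” and a single Category appearing in the URL and breadcrumb trail) can be sketched as a small data model. This is not WordPress code, just a Python illustration under stated assumptions: the class and function names are hypothetical, the post title is inferred from the URL, the sketch picks the first listed Category (WordPress applies its own default rule), and it assumes parent Category slugs also appear in the path.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Category:
    name: str
    slug: str
    parent: Optional["Category"] = None  # Tags would have no parent field

    def lineage(self) -> List["Category"]:
        """Return ancestors from the top level down, ending with this Category."""
        above = self.parent.lineage() if self.parent else []
        return above + [self]


# Fallback used when a post has no Category assigned.
UNCATEGORIZED = Category("Uncategorized", "uncategorized")


def primary_category(categories: List[Category]) -> Category:
    """Pick the one Category used in the URL and the breadcrumb trail."""
    # Assumption: take the first listed Category.
    return categories[0] if categories else UNCATEGORIZED


def permalink(domain: str, categories: List[Category], post_slug: str) -> str:
    """Build a URL with the Category path between the domain and the post slug."""
    slugs = [c.slug for c in primary_category(categories).lineage()]
    return "/".join([domain] + slugs + [post_slug])


def breadcrumb(categories: List[Category], post_title: str) -> str:
    """Build a breadcrumb trail that includes the Category hierarchy."""
    names = [c.name for c in primary_category(categories).lineage()]
    return " > ".join(["Home"] + names + [post_title])


metadata = Category("Metadata", "metadata")
print(permalink("www.hedden-information.com", [metadata], "metadata-and-taxonomies"))
print(breadcrumb([metadata], "Metadata and Taxonomies"))
```

Because only Categories carry a parent, only they can contribute a hierarchical path, which is one reason to keep the Category list short and broad and to move narrow terms to Tags.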

Creating and managing Categories and Tags for posts is a default feature of WordPress that’s easy to do in the Dashboard of a site. Since I had recently imported dozens of blog posts that had Categories and no Tags, I especially liked the feature to selectively convert Categories to Tags. (One can also convert selected Tags to Categories.) I went through my list of Categories and converted most of those that were infrequently used into Tags. The Categories to Tags Converter is one of the default Tools available for Import, but it does need to be “imported” and “activated” to be available.

Additional features in taxonomy management in WordPress can be obtained through various free or premium plugins. This is the case if you want to create multiple taxonomies, whether as sets of Categories or Tags, or faceted taxonomies. The default Categories and Tags feature permits the creation of just a single Category set and a single Tag set. If your site has different types of posts, such as custom post types, or if you want multiple term sets by which to filter posts by different aspects (facets), then you would need to create custom taxonomies. It is possible to create custom taxonomies by writing code, but if you are not a WordPress developer, there are plugins available for creating custom taxonomies. The support of synonyms/alternative labels/nonpreferred terms for Tags is also a feature available only with plugins, in this case plugins that aim to support search.
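To illustrate what multiple term sets used as facets make possible, here is a minimal Python sketch of faceted filtering. It is not WordPress or plugin code. The facet names (“topic” and “platform”) and the terms assigned to the two posts are hypothetical, chosen only to show the logic: a post matches if it has at least one of the selected terms in every facet the visitor has selected.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Post:
    title: str
    # One set of terms per custom taxonomy (facet), keyed by facet name.
    terms: Dict[str, Set[str]] = field(default_factory=dict)


def filter_posts(posts: List[Post], selections: Dict[str, Set[str]]) -> List[Post]:
    """Keep posts matching any selected term within each selected facet."""
    matches = []
    for post in posts:
        # OR within a facet (set intersection), AND across facets (all).
        if all(post.terms.get(facet, set()) & wanted
               for facet, wanted in selections.items()):
            matches.append(post)
    return matches


# Hypothetical term assignments for illustration.
posts = [
    Post("Using Linked and Other Open Vocabularies",
         {"topic": {"Linked data", "Controlled vocabularies"}}),
    Post("Categories, Tags, and Taxonomies in WordPress",
         {"topic": {"Taxonomies", "Tagging"}, "platform": {"WordPress", "Blogger"}}),
]

for post in filter_posts(posts, {"topic": {"Taxonomies"}, "platform": {"WordPress"}}):
    print(post.title)
```

With only the default single Category set and single Tag set, this kind of filtering by independent aspects is not available, which is why custom taxonomies are needed.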

I will be discussing these topics in a presentation, “Taxonomies, Categories, and Tags,” at the WordPress conference, WordCamp Boston 2018, on Saturday, July 21. If you are in the Boston area, come join me!