Taxonomy terms assigned to content items make the content
easier to find, whether in an internal system, on the web, or both. To make
content easier to find or discover on the web, the use of taxonomy terms or
tags is part of the broader application of search engine optimization (SEO). A
lot has already been written by others regarding tips for creating and adding
terms/labels/tags to web content to support SEO, such as how many and how
specific they should be. For the taxonomist, who is interested not only in the
terms alone but also in the larger taxonomy to which they belong, another
question is whether using terms from shared, publicly available controlled
vocabularies makes a difference in increasing content discoverability on the
web.
Linked open data and linked open vocabularies
Shared, publicly available controlled vocabularies may or
may not be linked or linkable, as linked open vocabularies. So, just because a
controlled vocabulary is publicly available does not mean that it inherently
supports linked data on the web.
“Linked data,” which usually is linked open data, refers to
methods to interlink structured content in a way that can be read
automatically by computers to enable the discovery of content on the web.
It is described in a set of W3C specifications for web publishing that makes
the data or content part of the Semantic Web. This means that instead of
manually following individually created hyperlinks, semantic links and
computer-readable formats support automated, relevant linkages among content. Linked data
requires the use of URIs to name things, HTTP URIs for web lookup, and
structured data using controlled vocabulary terms and dataset definitions
expressed in an RDF standard framework. “Linked open data” additionally
includes open use in accordance with an open license.
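To make the requirements above concrete, here is a minimal sketch in Python of what a couple of linked data statements could look like when written out in N-Triples, one of the RDF serializations. The document and term URIs are hypothetical (example.org is a reserved domain), and the helper function names are my own; only the two Dublin Core property URIs are real.

```python
# Hypothetical URIs for illustration only; example.org is a reserved domain.
DOC = "https://example.org/posts/taxonomy-and-seo"
TERM = "https://example.org/taxonomy/linked-data"

# Real, widely used Dublin Core property URIs.
DCTERMS_SUBJECT = "http://purl.org/dc/terms/subject"
DCTERMS_TITLE = "http://purl.org/dc/terms/title"


def uri_triple(s, p, o):
    """One N-Triples statement whose object is another resource (a URI)."""
    return f"<{s}> <{p}> <{o}> ."


def literal_triple(s, p, text, lang="en"):
    """One N-Triples statement whose object is a language-tagged string."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{s}> <{p}> "{escaped}"@{lang} .'


triples = [
    literal_triple(DOC, DCTERMS_TITLE, "Taxonomies and linked data"),
    # The tag is a link to a term URI, not just a text label.
    uri_triple(DOC, DCTERMS_SUBJECT, TERM),
]
print("\n".join(triples))
```

Each statement names its subject, property, and object with a URI, which is what lets software follow the connections without a person clicking through hyperlinks.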
Terms in taxonomies can serve as labels to linked content as
part of linked data. Additionally, although less common, taxonomy terms
themselves can be the content that is linked to, if the taxonomy concepts are
individually assigned URIs and HTTP addresses, and are in an RDF format.
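As a rough illustration of a taxonomy concept that is itself linkable, the sketch below builds a single concept as JSON-LD using the SKOS vocabulary, a W3C standard for expressing taxonomies and thesauri in RDF. The concept URIs, labels, and the skos_concept function are assumptions made up for this example; only the SKOS namespace and its property names are real.

```python
import json

# The real SKOS namespace.
SKOS = "http://www.w3.org/2004/02/skos/core#"


def skos_concept(uri, pref_label, broader=None, alt_labels=(), lang="en"):
    """Describe one taxonomy concept as a JSON-LD dictionary (hypothetical helper)."""
    concept = {
        "@context": {"skos": SKOS},
        "@id": uri,  # the concept's own URI
        "@type": "skos:Concept",
        "skos:prefLabel": {"@value": pref_label, "@language": lang},
    }
    if alt_labels:
        concept["skos:altLabel"] = [
            {"@value": label, "@language": lang} for label in alt_labels
        ]
    if broader:
        # Hierarchical relationship to a parent concept, also identified by URI.
        concept["skos:broader"] = {"@id": broader}
    return concept


# Hypothetical concept URIs on a reserved example domain.
concept = skos_concept(
    "https://example.org/taxonomy/linked-data",
    "Linked data",
    broader="https://example.org/taxonomy/semantic-web",
    alt_labels=["Web of data"],
)
print(json.dumps(concept, indent=2))
```

The point of the sketch is that the preferred label, alternative labels, and broader term that a taxonomist already manages map directly onto RDF properties once each concept has a URI.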
Limitations to designating content as linked open data
If you have a document on the web that you want to have
discovered as part of the Semantic Web, designating it as linked data is not so
simple, because you need to include the machine-readable instructions, such as through
a SPARQL endpoint or an API (application programming interface), in addition to
the RDF designation. Not only is this technically outside the skills of most
individual web content creators and taxonomists, but depending on how the
content is managed, standard web content management systems or blog posting
software may not even support editing the HTML of the page to insert such
instructions.
Institutions may register their content with a linked open
data repository. The main repository of linked open vocabularies is Linked Open Vocabularies (LOV), hosted by the
Ontology Engineering Group of the Computer Science School at Universidad
Politécnica de Madrid. An individual blogger, however, who would like to make
an individual blog post linked open data, cannot easily achieve that status.
Simply linking to shared, open vocabularies
Thus, if linked data instructions cannot easily be included
and traditional manual links back to the page (as by means of agreed-upon link
exchanges) cannot be established for practical reasons, tagging could be done
with terms from a publicly available controlled vocabulary that is not part of
linked open data and linked open vocabularies. Two good examples are the labels
of Wikidata and the Virtual International Authority File (VIAF).
Wikidata is a free, open, collaborative, multilingual
collection of structured data. Its purpose is to support Wikipedia, Wikimedia
Commons and other wikis of the Wikimedia movement, as well as anyone who wants
to search, use, edit or consume its data. The data contained in the Wikidata
repository consists of items, each with a unique name and ID. Currently
there are 50,116,886 data items. Each item has a brief glossary definition,
equivalent names in other languages, relationships ("statements") to other
data items (such as "subclass of" and "designed by"), and identifiers
in other vocabularies (such as Freebase, Library of Congress authorities, and Quora
topic).
VIAF, hosted by OCLC, contains just named
entities (proper nouns). What makes it unique is that it brings together and displays as a group the
authorized headings that each contributor uses for the same entity. So, it’s not exactly a
controlled vocabulary. VIAF has over 40 international member-contributors, most
of which are national libraries.
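In practice, tagging with such terms could be as simple as pointing each tag at the identifier URI of the matching record. The Python sketch below shows one way to do that; the function names are hypothetical, and the URI patterns are the ones I understand Wikidata and VIAF to use for entities and records, so verify them before relying on this. Q42 is the Wikidata item for Douglas Adams, while the VIAF number shown is a placeholder and not a real record.

```python
from html import escape

# Assumed identifier URI patterns; check against the current Wikidata and VIAF documentation.
WIKIDATA_ENTITY = "http://www.wikidata.org/entity/"
VIAF_RECORD = "https://viaf.org/viaf/"


def wikidata_uri(qid):
    """Build an entity URI from a Wikidata item ID such as 'Q42'."""
    if not (qid.startswith("Q") and qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    return WIKIDATA_ENTITY + qid


def viaf_uri(viaf_id):
    """Build a record URI from a numeric VIAF ID."""
    if not viaf_id.isdigit():
        raise ValueError(f"not a VIAF ID: {viaf_id!r}")
    return VIAF_RECORD + viaf_id


def tag_link(label, uri):
    """Render a tag as an HTML link to the vocabulary record (hypothetical helper)."""
    return f'<a rel="tag" href="{escape(uri, quote=True)}">{escape(label)}</a>'


print(tag_link("Douglas Adams", wikidata_uri("Q42")))
# The VIAF ID below is a placeholder, not a real record.
print(tag_link("Douglas Adams", viaf_uri("123456789")))
```

A plain HTML link like this can usually be typed into an ordinary blog editor, which is the appeal of this approach compared with publishing full linked data.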
Is there any benefit in tagging with and linking to terms that are part of a controlled
vocabulary which is publicly available but is not a linked open
vocabulary, such as Wikidata or VIAF? A colleague of mine proposed finding out by experimenting
with tagging the same content with terms from different sources. Results
will be shared in a later blog post.
Providing access to linked data via SPARQL or another API is only required if you want your linked data to be queryable or if you want others to be able to write web applications that use your data. Otherwise, Linked Data can be made discoverable on the web with RDFa (https://rdfa.info/) or JSON-LD (https://json-ld.org/spec/latest/json-ld/#embedding-json-ld-in-html-documents). Content management systems like WordPress and Drupal have plugins for providing schema.org markup, which makes things easier for content creators wanting to use that vocabulary.
Many organizations simply provide their RDF datasets as downloadable files -- you see this a lot at academic institutions running the VIVO research networking software, which uses an ontology for its data model. It's possible to host a SPARQL endpoint on a VIVO instance, but SPARQL endpoints are difficult to maintain and most organizations do not bother to do so, or have abandoned their efforts to do so. (See http://sparqles.ai.wu.ac.at/availability for the status of many SPARQL endpoints around the web.)
I also wanted to mention that for those people working in the biomedical space, BioPortal, at https://bioportal.bioontology.org/, is a very comprehensive repository of biomedical ontologies, and you can find there the Open Biomedical Ontologies (OBO) that LOV chooses not to index.
Thank you, Marijane, for taking the time to provide this informative response. Admittedly, I was going out of my comfort zone of taxonomy knowledge by looking into linked data and linked open vocabularies. It's good to hear that WordPress and Drupal have plugins that enable linked data to be discoverable. Yes, discoverability is my focus, not enabling the ability to query data. I don't suppose anything would enable a Blogger.com blog post to be discoverable. My website is hosted on WordPress and I duplicated my blog there, so I'll look into such plugins.
- Heather