Presented at the CNI Spring Membership Meeting in San Antonio, Texas, 4 April 2016. OCLC Research conducted an International Linked Data Survey for Implementers in 2014 and 2015, receiving responses from a total of 90 institutions in 20 countries. In the 2015 survey, 112 projects or services that consumed or published linked data were described (compared to 76 in 2014). This presentation summarizes the 2015 survey results: 1) which institutions have implemented or are implementing linked data; 2) what linked data sources institutions are consuming, and why; 3) what institutions are publishing, and why; 4) barriers and advice from the implementers.
3. International Linked Data Surveys for Implementers
Number of institutional responses
2014: 48
2015: 71
Both: 29
4. Geographic breakdown of 90 responding institutions
20 countries represented
[Bar chart: Linked Data Survey Respondents, by country; horizontal axis 0 to 45]
Countries shown: USA, Spain, UK, The Netherlands, Norway, Canada, Australia, France, Germany, Italy, Switzerland, Austria, Czech Republic, Hungary, Ireland, Japan, Malaysia, Portugal, Singapore, Sweden
7. How long linked data project or service in production
                                          2015   2014
Not yet in production                       37     27
Less than one year                          19     13
More than one year, less than two years     10     12
More than two years                         46     24
Total                                      112     76
9. Reasons for publishing linked data
                                                                 2015   2014
Expose to larger audience on the Web                               67     45
Demonstrate what could be done with datasets as linked data        59     41
Heard about linked data and wanted to try it out by exposing
our data as linked data                                            43     21
See if publishing linked data would improve our Search Engine
Optimization (SEO)                                                 29      9
10. Types of data published as linked data
[Bar chart; horizontal axis 0 to 60]
Categories shown: Authority files; Bibliographic data; Data about museum objects; Datasets; Descriptive metadata; Digital collections; Encoded archival descriptions; Geographic data; Ontologies/vocabularies; Other
20. Barriers to publishing linked data
                                                                 2015
Steep learning curve for staff                                     40
Inconsistency in legacy data                                       33
Selecting appropriate ontologies to represent our data             31
Establishing the links                                             27
Little documentation or advice on how to build the systems         21
21. Reasons for consuming linked data
                                                                 2015   2014
Provide our users with a richer experience                         51     35
Enhance our own data by consuming linked data from other
sources                                                            50     37
More effective internal metadata management                        32     16
Greater accuracy and scope in our search results                   27     12
See if consuming linked data would improve our Search Engine
Optimization (SEO)                                                 19     12
Experiment with combining different types of data into a
single triple store                                                17     15
Heard about linked data and wanted to try it out by using
linked data sources                                                17     13
22. 2015 linked data sources most consumed
                                                                 2015
VIAF (Virtual International Authority File)                        41
DBpedia                                                            36
GeoNames                                                           35
id.loc.gov                                                         35
Resources we convert to linked data ourselves                      17
Getty's AAT                                                        16
FAST (Faceted Application of Subject Terminology)                  15
WorldCat.org                                                       15
data.bnf.fr                                                        12
Deutsche National Bib Linked Data Service                          12
24. VIAF
http://viaf.org
Combines multiple name authority files into a single
OCLC-hosted name authority service.
More than 100,000 requests/day
Size: 500 million – 1 billion triples
Consumes:
• GeoNames
• id.loc.gov
• ISNI
• Wikidata
• WorldCat.org
• WorldCat.org Works
RDF Vocabularies/Ontologies:
• Bibliographic Ontology
• Dublin Core & DC Terms
• FOAF
• OWL 2 Web Ontology Language
• RDF schema
• Schema.org
• SKOS
25. id.loc.gov
Enables developers to interact with vocabularies found in
data & standards promulgated by LC as linked data.
More than 100,000 requests/day
Size: 100 million – 500 million triples
Consumes:
• AGROVOC
• data.bnf.fr
• DNB’s Linked Data Service
• id.loc.gov
• VIAF
• Wikidata
• WorldCat.org Works
• Resources we convert to linked data ourselves
RDF Vocabularies/Ontologies:
• BibFrame
• FOAF
• MADS/RDF
• RDF schema
• SKOS
26. Getty’s AAT
http://vocab.getty.edu
A structured vocabulary for generic concepts related to art
and architecture.
More than 100,000 requests/day
Size: 10 million – 50 million triples
Consumes: None
RDF Vocabularies/Ontologies:
• Bibliographic Ontology
• Dublin Core & DC Terms
• FOAF
• Local vocabulary
• OWL 2 Web Ontology Language
• RDF schema
• SKOS
27. FAST
http://id.worldcat.org/fast/
Adapts LC Subject Headings with a simplified syntax to
retain LCSH’s rich vocabulary while making the schema
easier to understand, control, apply and use.
10,000 – 50,000 requests/day
Size: 10 million – 50 million triples
Consumes:
• DBpedia
• GeoNames
• id.loc.gov
• VIAF
RDF Vocabularies/Ontologies:
• Dublin Core & DC Terms
• FOAF
• Schema.org
• SKOS
• WGS84 Geo Positioning
28. WorldCat.org
OCLC has made WorldCat.org bibliographic metadata
experimentally available in linked data form.
More than 100,000 requests/day
Size: 15 billion triples
Consumes:
• DBpedia
• FAST
• VIAF
• WorldCat.org
RDF Vocabularies/Ontologies:
• Dublin Core
• FOAF
• Schema.org
• SKOS
29. data.bnf.fr
Make the data produced by the Bibliothèque nationale de
France more useful on the Web.
10,000 – 50,000 requests/day
Size: 100 million – 500 million triples
Consumes:
• AGROVOC
• data.bnf.fr
• DBpedia
• DNB’s Linked Data Service
• GeoNames
• id.loc.gov
• ISNI
• VIAF
• http://datos.bne.es (+ others)
RDF Vocabularies/Ontologies:
• Bibliographic Ontology
• Biographical Ontology
• Dublin Core & DC Terms
• FOAF
• FRBR
• ISNI
• Music Ontology
• OAI ORE Terms
• OWL 2 Web Ontology Language
• RDA
• RDF schema
• SKOS
• WGS84 Geo Positioning …
30. DNB’s Linked Data Service
http://www.dnb.de/EN/lds
Publishes authority and bibliographic data in RDF to
make the data accessible to the semantic Web community
with no need to know library-specific metadata schemes.
Size: 100 million – 500 million triples
Consumes: None
RDF Vocabularies/Ontologies:
• Bibliographic Ontology
• Dublin Core & DC Terms
• FOAF
• ISBD
• OWL 2 Web Ontology Language
• RDA
• RDF schema
• SKOS
31. Barriers to consuming linked data
                                                                 2015
Matching, disambiguating and aligning source data and linked
data resources                                                     23
Mapping of vocabulary                                              17
What's published as linked data is not always reusable or
lacks URIs                                                         16
Lack of authority control                                          15
Datasets not being updated                                         14
Size of RDF dumps                                                  12
Understanding how data is structured before using it               12
32. What would you do differently?
                                                                 2015
Have more time allocated for its development                       38
Would do nothing differently                                       30
Get more staff                                                     28
Get wider organizational support                                   23
Have more realistic expectations                                   12
33. Advice from the implementers
• Focus on what you want to achieve, not technical stuff.
• Build on what you have that others don’t.
• Pick a problem you can solve.
• Model data that solves your use cases.
• Consider legal issues from the beginning.
• Read as widely as possible, consult community experts.
• Have a good understanding of linked data structure, available ontologies and your own data.
• Strive for long-term data reconciliation & consolidation.
• Involve your institution/community.
• Experiment and start small.
• Start now! Just do it!
34. Full details of responses
http://www.oclc.org/content/dam/research/activities/linkeddata/oclc-research-linked-data-implementers-survey-2014.xlsx
The impetus for an “International Linked Data Survey for Implementers” was discussions with OCLC Research Library Partner metadata managers, who were aware of a number of linked data projects or services but felt there must be more “out there”. After consulting a number of colleagues and beta testing the survey instrument with a group of linked data implementers, we conducted an initial survey in July – August 2014. The target audience was those who had already implemented a linked data project or service, or were in the process of doing so. Questions were asked about both publishing and consuming linked data.
I published the results in a series of posts on our HangingTogether blog.
One of the first criticisms we received was that the results did not include some leading linked data implementers such as the national libraries of France and Germany. So we repeated the survey between 1 June and 31 July 2015.
This is the number of institutions reporting one or more linked data projects or services: ones publishing linked data, consuming linked data, or both.
These are the countries represented by the 90 institutions which have implemented or are implementing at least one linked data project or service. US respondents numbered 39, or 43% of the total.
Spain: 10 (11%)
UK: 9 (10%)
Netherlands: 6 (7%)
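The totals and percentages above can be checked with a little arithmetic. The following is a minimal sketch using only the figures quoted on the slides; the variable and function names are illustrative and not part of the survey materials.

```python
# Response counts from the survey slides.
responses_2014 = 48
responses_2015 = 71
responded_both = 29

# Institutions that answered both surveys are counted only once.
distinct_institutions = responses_2014 + responses_2015 - responded_both

# Respondent counts for the four most represented countries.
top_countries = {"USA": 39, "Spain": 10, "UK": 9, "Netherlands": 6}


def share(count, total):
    """Return count as a whole-number percentage of total."""
    return round(100 * count / total)


shares = {
    country: share(count, distinct_institutions)
    for country, count in top_countries.items()
}

print(distinct_institutions)
for country, pct in shares.items():
    print(f"{country}: {pct}%")
```

Subtracting the 29 institutions that responded in both years is what reconciles the 48 and 71 yearly responses with the 90 institutions overall.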
We were successful in our attempts to solicit responses from more national libraries in the 2015 survey.
This is how I categorized the responding institutions, but others may do it differently.
National Libraries which responded (14): Biblioteca. Real Academia Nacional de Medicina, Bibliothèque nationale de France, British Library, German National Library, Koninklijke Bibliotheek, Library of Congress, National Diet Library, National Library of Malaysia, National Library of Medicine, National Library of Portugal, National Library of Spain, National Library of Sweden, National Library of Wales, National Széchényi Library [Hungary]
Categorized as “network” (10): ABES, BIBSYS, Consorci de Serveis Universitaris de Catalunya, Digital Public Library of America, Europeana Foundation, Haute école de gestion de Genève (SwissBib), North Rhine-Westphalian Library Service Center, OCLC, RERO - Library Network of Western Switzerland, and The European Library.
Government (7): Agencia Española de Cooperación Internacional para el Desarrollo (AECID), Biblioteca della Camera dei deputati (Italy), Biblioteca Valenciana Nicolau Primitiu, Biblioteca Virtual de Derecho Aragonés, Consejería de Educación, Cultura y Deportes Gobierno de Castilla-La Mancha, España, Diputación de Málaga. Cultura y Deportes. Biblioteca Cánovas del Castillo, Ministry of Defense (Spain)
Scholarly (based at one institution but multi-institutional on a theme/discipline) (6): Big Data Institute [Muninn Project, Canadian Writing Research Collaboratory]; Colorado State [datasets from the NSF-funded Shortgrass Steppe-Long-Term Ecological Research station in northern Colorado, for researchers in natural sciences]; Fundación Ignacio Larramendi (Spain); Pratt Institute [Linked Jazz]; University of Alberta Libraries [Canadiana, partners with Pan-Canadian Documentary Heritage Network]; University of Applied Sciences St. Poelten [encyclopedic music data for music magazines, legal information for publishers and semantic tagging/indexing for video files at community TV network.]
Public library/libraries (5): Anythink Libraries, Arapahoe Library District, Evansville Vanderburgh Public Library, New York Public Library, Oslo Public Library
Museum (3): British Museum, J. Paul Getty Trust, Smithsonian
Other: 1 publisher (Springer) and 3 societies (American Numismatic Society, Chemical Heritage Foundation, Minnesota Historical Society)
The 71 institutions responding to the 2015 survey reported a total of 168 linked data projects/services, of which 112 were described. Two-thirds of these 112 projects/services are in production, and 61% of those in production have been so for more than two years. That is almost double the number of projects/services in production for over two years reported in 2014.
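The proportions quoted here follow from the counts on the "how long in production" slide. A minimal sketch of that calculation is below; the dictionary keys mirror the slide's row labels, and the helper name is an illustrative choice.

```python
# Project/service counts from the "how long in production" slide.
production_2015 = {
    "Not yet in production": 37,
    "Less than one year": 19,
    "More than one year, less than two years": 10,
    "More than two years": 46,
}
production_2014 = {
    "Not yet in production": 27,
    "Less than one year": 13,
    "More than one year, less than two years": 12,
    "More than two years": 24,
}


def summarize(counts):
    """Derive totals and shares for one survey year."""
    total = sum(counts.values())
    in_production = total - counts["Not yet in production"]
    over_two_years = counts["More than two years"]
    return {
        "total": total,
        "in_production": in_production,
        "share_in_production": in_production / total,
        "share_over_two_years": over_two_years / in_production,
    }


summary_2015 = summarize(production_2015)
summary_2014 = summarize(production_2014)

# Growth in long-running projects/services between the two surveys.
growth_over_two_years = (
    production_2015["More than two years"] / production_2014["More than two years"]
)

print(summary_2015)
print(summary_2014)
print(round(growth_over_two_years, 2))
```

For 2015 this gives 75 of 112 projects/services in production, which matches the 75 production examples referred to later in these notes.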
In both the 2014 and 2015 surveys, most projects/services both consume and publish linked data. Relatively few only publish linked data.
Although the number of respondents differs between the two surveys, the ranking of the reasons given for publishing linked data is the same.
Given the relatively large representation of libraries among respondents, it is no surprise that bibliographic and authority data are the most common types of data published, with descriptive metadata a close third.
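One way to confirm that the ranking holds across both years is to sort the reasons by their counts for each survey and compare the resulting order. This is only a sketch over the four reasons listed on the slide; the shortened labels and the function name are illustrative.

```python
# (2015 count, 2014 count) for each publishing reason on the slide.
publishing_reasons = {
    "Expose to larger audience on the Web": (67, 45),
    "Demonstrate what could be done with datasets as linked data": (59, 41),
    "Heard about linked data and wanted to try it out": (43, 21),
    "See if publishing linked data would improve SEO": (29, 9),
}


def ranking(reasons, year_index):
    """Return reason labels ordered from most to least cited."""
    return sorted(
        reasons,
        key=lambda label: reasons[label][year_index],
        reverse=True,
    )


rank_2015 = ranking(publishing_reasons, 0)
rank_2014 = ranking(publishing_reasons, 1)

print(rank_2015 == rank_2014)
```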
Other: 5 of the 11 “other” were about organizational data; 2 were data about people (researchers, library staff); 1 was about performance works (e.g., shows).
I’ve selected a few examples from the 75 linked data projects or services described that are in production. Not so easy, as they are meant for machines to read, not a human like me. This is just a sampling.
“In March 2010 the hbz, several Cologne-based libraries and the Library Centre of Rhineland-Palatinate started an open data initiative as the first German institutions to release library catalog data into the public domain. In November 2013 the hbz launched a linked open data API via its service lobid. This API provides access to different kinds of data:
- _bibliographic data_ from the hbz union catalogue with 20 million records and 45 million holdings
- _authority data_ from the German Integrated Authority File (Gemeinsame Normdatei, GND) with subject headings, persons, corporate bodies, events, places and works
- _address data_ on libraries and related institutions, taken from the German ISIL registry and the MARC organization codes data base.”
This is one of the larger published linked data sources with 1 – 5 billion triples.
This is from North Carolina State University’s “Organization Name Linked Data”. “Where possible, Acquisitions & Discovery staff created links to descriptions of the same organization in other linked data sources, including the Virtual International Authority File (VIAF), the Library of Congress Name Authority File (LCNAF), DBpedia, Freebase, and International Standard Name Identifier (ISNI).”
Springer is the only publisher to respond to our survey. The description of its linked data project:
“In this project we make data about scientific conferences available as Linked Open data. The availability of such a dataset will contribute to the broader goals of publishing the scholarly data as LOD:
– accessible science: data about publications, authors, topics, and conferences should be easy to explore;
– transparent science: the data on productivity and impact of authors, research institutions, and conferences should be open and easy to analyze.”
The British Library was one of the first to make its national bibliography available as linked open data, exposing it in bulk. It is considered successful as it has been selected for the UK National Information Infrastructure and its data model has been influential.
Note that it includes links to both the ISNI and VIAF identifiers for this entity. The end of the page also shows the SPARQL query to retrieve the result that people can modify and re-run.
The National Diet Library reported on 5 different projects in the 2015 survey. One was for publishing bibliographic data as linked data, another for publishing authority data as linked data. This one is to enable comprehensive searching of sounds, videos, images, web information and other resources related to the Great East Japan Earthquake of 2011.
Slide from Jean Godby, OCLC Research
Example from the British Museum Semantic Web Collection Online, “to join and relate to a growing body of linked data published by other organisations around the world interested in promoting accessibility and collaboration.”
“The Muninn Project is a multidisciplinary, multinational, academic research project investigating millions of records pertaining to the First World War in archives around the world. Our aim is to take archives of digitized documents, extract the written data using massive amount of computing power and turn the resulting information into structured databases. These databases will then support further research in a number of different areas.”
The Pratt Institute reported its Linked Jazz project, which applies Linked Open Data technologies to digital heritage materials and explores the implications of linked data in the user experience. It exposes relationships between musicians and enables jazz enthusiasts to make more connections.
It generated new triples from the content of interview transcripts – from the data rather than converting existing metadata. Transcripts came from Rutgers Institute for Jazz Studies Archives, Smithsonian Jazz Oral Histories, the Hamilton College Jazz Archive, UCLA’s Central Avenue Sounds series, and the University of Michigan’s Nathaniel C. Standifer Video Archive of Oral History.
Slide from Jean Godby, OCLC Research
Again, the ranking of the reasons given for consuming linked data is almost the same among respondents in the 2014 and 2015 surveys.
These are the sources that 12 or more of the 2015 survey respondents reported consuming. I’ve starred the ones which also responded to the survey. Note that “resources we convert to linked data ourselves” is one of the top linked data sources consumed. One piece of advice from linked data implementers is to first consume the linked data you publish.
As these could be considered successful publishers of linked data by the degree to which others consume the data provided, I thought it would be worthwhile to look at their profiles extracted from their respective responses.
The sources in red font are those which also responded to the survey.
I was struck by the number of respondents who said they would do nothing differently!
We plan to add the responses to the 2015 survey to the spreadsheet with the 2014 responses. You’ll be able to make your own comparisons, or focus on the institutions that most resemble yours.