Linked data has been hailed as a disruptive innovation that will change the way we organize and discover information, but what does it really mean for catalogers and metadata creators?
Strange new world: linked data for catalogers and metadata librarians
1. Strange New World
Linked Data for Catalogers and Metadata Librarians
Online Northwest Conference 2014
2. Linked Data Basics
• Records → Statements
• Documents → Data
• Resource Description Framework (RDF)
• Relationships defined in “triples”
  – Thing A has relationship to Thing B
• Record becomes a graph formed by triples
• Foundation = URIs
• Not limited to libraries
• Database (closed) → Web (open)
• SQL → SPARQL
10. Edward Abbey → is author of → Desert Solitaire
• Edward Abbey: [http://lccn.loc.gov/n78093802]
• is author of: [http://purl.org/dc/elements/1.1/creator]
• Desert Solitaire: [http://www.worldcat.org/oclc/17353644]
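The slide’s statement can be written as one machine-readable triple in N-Triples syntax. This is a minimal sketch that uses only the three URIs shown on the slide; the `to_ntriples` helper is hypothetical and handles URI terms only. One detail matters when encoding it: Dublin Core’s creator property points from the resource to the agent (the book *has creator* Edward Abbey), so in the serialized triple the book is the subject and the author is the object.

```python
# A triple is (subject, predicate, object); here all three are URIs.
# dc:creator runs from the resource to its creator, so the book is the subject.
BOOK = "http://www.worldcat.org/oclc/17353644"       # Desert Solitaire (WorldCat)
CREATOR = "http://purl.org/dc/elements/1.1/creator"  # Dublin Core "creator"
AUTHOR = "http://lccn.loc.gov/n78093802"             # Edward Abbey (LC name authority)


def to_ntriples(subject: str, predicate: str, obj: str) -> str:
    """Serialize one all-URI triple as an N-Triples line (hypothetical helper)."""
    return f"<{subject}> <{predicate}> <{obj}> ."


line = to_ntriples(BOOK, CREATOR, AUTHOR)
print(line)
```

Read back, the line says: the WorldCat resource has, as its creator, the LC name authority.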
77. Linked Data Resources
• Linked Open Data: The Essentials by Florian Bauer and Martin Kaltenböck. (http://www.semantic-web.at/LOD-TheEssentials.pdf)
• Guides and tutorials on Linked Data Cloud (http://linkeddata.org/guides-and-tutorials)
• Stanford Linked Data Workshop (2011) (http://lib.stanford.edu/files/Stanford_Linked_Data_Workshop_Report_FINAL.pdf)
• Library Technology Reports:
  – By Karen Coyle
    • v. 46, no. 1, 2010. “Understanding the Semantic Web: Bibliographic Data and Metadata”
    • v. 48, no. 4, 2012. “Linked Data Tools: Connecting on the Web”
  – By Erik T. Mitchell
    • v. 49, no. 5, 2013. “Library Linked Data: Research and Adoption”
Linked data is all about breaking documents and records into individual statements represented by RDF triples. Each element of a triple has a value: the subject and predicate are identified by URIs, and the object can be either a URI or a literal value (such as a title or a date). Much as we use SQL to search our existing databases, SPARQL is the query language that enables searching a linked data environment.
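To make the SQL/SPARQL comparison concrete, here is a toy sketch in plain Python, not a SPARQL engine: triples are stored as tuples, and a single triple pattern uses `None` where SPARQL would use a variable. The short prefixed names (`dc:creator`, `ex:...`) are illustrative stand-ins for full URIs. A real SPARQL query such as `SELECT ?book WHERE { ?book dc:creator ex:abbey }` does this same matching, but across many patterns and many datasets.

```python
from typing import Optional

Triple = tuple[str, str, str]

# Toy graph: "ex:" names are hypothetical stand-ins for full URIs.
# The last object is a literal (a quoted string), not a URI.
GRAPH: list[Triple] = [
    ("ex:desertSolitaire", "dc:creator", "ex:abbey"),
    ("ex:monkeyWrenchGang", "dc:creator", "ex:abbey"),
    ("ex:desertSolitaire", "dc:title", '"Desert Solitaire"'),
]


def match(graph: list[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> list[Triple]:
    """Return triples matching one pattern; None acts like a SPARQL variable."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Roughly: SELECT ?book WHERE { ?book dc:creator ex:abbey }
books = [t[0] for t in match(GRAPH, p="dc:creator", o="ex:abbey")]
print(books)  # ['ex:desertSolitaire', 'ex:monkeyWrenchGang']
```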
We need to get our data where the users are, and that’s usually not the library’s page. (This graph is taken from the University of Northern Colorado’s 2013 LibQual+® Survey.)
Authority control based on text matching is easily broken: if a heading’s text changes, every link that depended on the old string fails.
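A small illustration of the difference, assuming a hypothetical heading revision (a death date added to the heading): a record that stored the old heading text no longer matches the authority, while a record that stored the identifier still does. The headings and records below are illustrative; the URI reuses the LCCN-based link from the Edward Abbey slide.

```python
# Hypothetical authority record for Edward Abbey, after an update that
# added a death date to the heading. The identifier does not change.
AUTHORITY = {
    "uri": "http://lccn.loc.gov/n78093802",
    "heading": "Abbey, Edward, 1927-1989",
}

# Two bibliographic records created before the update: one stores the
# old heading text, the other stores the authority URI.
by_text = {"title": "Desert Solitaire", "author": "Abbey, Edward, 1927-"}
by_uri = {"title": "Desert Solitaire", "author": "http://lccn.loc.gov/n78093802"}


def is_linked(record: dict, authority: dict) -> bool:
    """A record is linked if its author value equals the heading or the URI."""
    return record["author"] in (authority["heading"], authority["uri"])


print(is_linked(by_text, AUTHORITY))  # False: the text match broke
print(is_linked(by_uri, AUTHORITY))   # True: the URI still matches
```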
We are accustomed to manipulating and acting on bits of data with our existing software programs (MS Excel here). Imagine the power of an open web of data.
http://www.gapminder.org/ Here’s one example of a website that uses various data repositories to create powerful visualizations.
A tiny sampling of the data utilized by Gapminder.
http://well-formed.eigenfactor.org/ Another data visualization.
To understand the underlying concepts of linked data, we need to start at the basics. We can think of linked data as a series of interlinking RDF triples. All the linkages can be displayed as a graph.
Here’s a simple RDF triple: subject, predicate, object. You can use a URI to represent each piece of this data.
When describing a subject, many statements are aggregated.
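Aggregating the statements about a subject amounts to grouping triples by their first element. A sketch using the standard library’s `collections.defaultdict`; the `ex:` identifiers are placeholders for full URIs, and the `dc:`/`foaf:` prefixes are used only as familiar-looking labels.

```python
from collections import defaultdict

# Hypothetical statements; "ex:" names stand in for full URIs.
triples = [
    ("ex:gatsby", "dc:title", '"The Great Gatsby"'),
    ("ex:gatsby", "dc:creator", "ex:fitzgerald"),
    ("ex:fitzgerald", "foaf:name", '"F. Scott Fitzgerald"'),
    ("ex:gatsby", "dc:date", '"1925"'),
]


def describe(statements):
    """Aggregate (predicate, object) pairs under each subject."""
    description = defaultdict(list)
    for subject, predicate, obj in statements:
        description[subject].append((predicate, obj))
    return dict(description)


for predicate, obj in describe(triples)["ex:gatsby"]:
    print(predicate, obj)
```

The statements about the author stay separate and are reached through the `dc:creator` link instead of being copied into the book’s description.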
As data sets are published on the web, the same concept may be found in many places. The Web Ontology Language (OWL) provides the owl:sameAs property to state that two URIs refer to the same thing, which cross-references the data.
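Because owl:sameAs is symmetric and transitive, a chain of sameAs links implies that every URI in the chain names the same thing. The sketch below groups URIs into equivalence sets with a union-find structure. Only the two DBpedia resource URIs are real; the `example.org` URIs are hypothetical placeholders.

```python
def same_as_groups(links):
    """Group URIs connected by owl:sameAs links using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in links:
        union(a, b)

    groups = {}
    for uri in parent:
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())


DBP = "http://dbpedia.org/resource/F._Scott_Fitzgerald"
A = "http://example.org/authority/fitzgerald"    # hypothetical
B = "http://example.org/local/person/42"         # hypothetical
C = "http://example.org/authority/hemingway"     # hypothetical
D = "http://dbpedia.org/resource/Ernest_Hemingway"

# DBP sameAs A, A sameAs B: all three name the same person.
groups = same_as_groups([(DBP, A), (A, B), (D, C)])
print(groups)
```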
http://sameas.org There are even resources to help find these equivalences.
Here you see the owl:sameAs references on the DBpedia page for F. Scott Fitzgerald.
Now we’re going to look at some of the pioneers in linked data. One project, Schema.org, is a collaboration among Google, Bing, Yahoo, and Yandex.
Facebook’s knowledge graph is an example of linked data in use. Think about it: on Facebook, every time you hit “like” or “friend” someone, you’ve just established what amounts to an RDF triple. You, like, thing. Jane, is friend of, Tom. This is the data feeding the knowledge graph.
For instance, here are musicians my friends have “liked.”
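A view like “musicians my friends have liked” is a two-step join over such triples: find everyone who is a friend of a given person, then collect what those friends like. This plain-Python sketch uses made-up people, bands, and predicate names (`isFriendOf`, `likes`), and it makes no claim about how Facebook actually implements this.

```python
from collections import Counter

# Hypothetical triples in the slide's (subject, predicate, object) form.
triples = [
    ("Jane", "isFriendOf", "Tom"),
    ("Ana", "isFriendOf", "Tom"),
    ("Jane", "likes", "Band A"),
    ("Ana", "likes", "Band A"),
    ("Ana", "likes", "Band B"),
    ("Raj", "likes", "Band C"),  # Raj is not Tom's friend, so Band C is excluded
]


def liked_by_friends(statements, person):
    """Step 1: find the person's friends. Step 2: count what they like."""
    friends = {s for s, p, o in statements if p == "isFriendOf" and o == person}
    return Counter(o for s, p, o in statements if p == "likes" and s in friends)


print(liked_by_friends(triples, "Tom").most_common())  # [('Band A', 2), ('Band B', 1)]
```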
You can even dig further.
In order for us to make use of the data for our own purposes, the data must be open. Social media data is notoriously siloed. Many library vendors working in linked data are also doing so in a silo: you can only use that data if you pay for it. The goal of the linked data movement is open data that’s available for all to use as needed.
Google Knowledge Graph: We’re already seeing some changes to this. So far, the book links go to the Google Books project.
Those of us in library cataloging will recognize what Google is doing here as defining specific “works”. Is Google FRBR-izing before libraries?
Wikipedia has a tremendous amount of data behind it.
http://dbpedia.org/page/F._Scott_Fitzgerald You can see this data when you view it in DBpedia.
For instance, if you would like to see other people associated with the Lost Generation movement, DBpedia can do that for you.
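Behind a view like this is a SPARQL query. The hedged sketch below only builds the request URL for DBpedia’s public SPARQL endpoint (`https://dbpedia.org/sparql`), using the SPARQL Protocol’s `query` parameter and Python’s standard library. It assumes DBpedia records the movement with the `dbo:movement` property and the resource `dbr:Lost_Generation`; confirm both on the F. Scott Fitzgerald page before relying on them. The network call is left commented out.

```python
from urllib.parse import urlencode

ENDPOINT = "https://dbpedia.org/sparql"

# Assumption: DBpedia links writers to the movement through dbo:movement
# and names the movement dbr:Lost_Generation. Verify on DBpedia first.
QUERY = """PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?person WHERE { ?person dbo:movement dbr:Lost_Generation . }
LIMIT 50"""


def sparql_get_url(endpoint: str, query: str) -> str:
    """Build a SPARQL Protocol GET URL that carries the query text."""
    return f"{endpoint}?{urlencode({'query': query})}"


url = sparql_get_url(ENDPOINT, QUERY)

# Running it needs network access; the Accept header asks for JSON results.
# import json, urllib.request
# req = urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})
# with urllib.request.urlopen(req) as resp:
#     data = json.load(resp)
# people = [b["person"]["value"] for b in data["results"]["bindings"]]
```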
As I mentioned earlier, the Google Knowledge Graph does not lead easily back to libraries. For something as popular and widespread as “The Hero with a Thousand Faces,” you do not see a link to library resources anywhere on the first page of hits.
OCLC has done quite a bit of work with linked data. You can see it if you search for something more obscure. This is a title in my library’s holdings. With something this obscure, WorldCat is the first hit.
This exposure is partially achieved by WorldCat’s incorporation of schema.org. You can see this by expanding the “Linked Data” section at the bottom of the WorldCat results.
So, how do libraries make use of linked data and catch up to the power players on the Web?
A lot of library data has already been published as linked data.
The Library of Congress is taking linked data and the open web environment seriously. Let’s look at a few of the projects LOC is doing.
Viewshare is an LOC tool that offers libraries a way to publish some of their collections online, augment the data with linked data sets, and create various views and visualizations of that data. You may request a free account at viewshare.org.
You can upload your data in various forms.
You can add fields that are augmented with open data sets. Dates/times and geographic locations are good examples of fields that can be augmented.
With the geographic information, you can then create map views. Dates can be used to create timeline views.
With image collections, you can generate galleries. You can also insert simple widgets like the search box and list on the left. Tag clouds are another option. Viewshare can be used by libraries with limited resources to publish their unique collections online.
Most catalogers will be familiar with LC’s Bibframe initiative. This is the project working on a replacement for MARC. The goal is a tool that will represent bibliographic data in a way that opens it up to the web and utilizes linked data to better expose library resources.
Bibframe.org: The site has lots of good information. A good place to start learning about Bibframe is the model primer document available in the “Getting Started” section. We’re going to look at the information and resources this site makes available.
Here are the four main classes of the Bibframe model.
The Bibframe vocabulary is a work in progress. It is published on this site, and the goal is to have a stable vocabulary soon.
You can also find working documents that discuss the development of the Bibframe model.
We’re going to focus on the resources available in the “tools” tab.
I believe these tools are invaluable for catalogers. They offer a way for us to begin thinking of our records in terms of Bibframe. First, we’ll look at the comparison service. To use it, all you need is the 001 field from an LC bibliographic record.
I have located an 001 from the Library of Congress catalog.
You just paste that 001 number into the comparison tool and search.
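If you are starting from a MARC/XML export rather than the LC catalog display, the 001 is the `controlfield` element with `tag="001"` in the MARC 21 slim namespace. This sketch uses Python’s `xml.etree.ElementTree`. The sample record is a hypothetical, trimmed-down example, and its control number is a placeholder, not a real LC record.

```python
import xml.etree.ElementTree as ET
from typing import Optional

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical, trimmed MARCXML record; the 001 value is a placeholder.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <controlfield tag="001">12345678</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Doomsday book /</subfield>
    <subfield code="c">Connie Willis.</subfield>
  </datafield>
</record>"""


def control_number(marcxml: str) -> Optional[str]:
    """Return the text of the 001 control field, or None if it is absent."""
    root = ET.fromstring(marcxml)
    field = root.find("marc:controlfield[@tag='001']", NS)
    if field is None or not field.text:
        return None
    return field.text.strip()


print(control_number(SAMPLE))  # 12345678
```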
Here’s the same record in MARC/XML. We can click at the top on “Bibframe RDF/XML” to view it in RDF.
Here we have the catalog record of the future. But before we look too closely at this, let’s refresh our memories about the Bibframe structure.
A Bibframe work is basically equivalent to what we’ve come to think of in FRBR terms as both works and expressions. A Bibframe instance roughly equates to both manifestations and items.
We can now tease out the parts of the Bibframe model in our comparison example. Keep in mind that the goal is to think of this less as a record and instead as an aggregate of statements.
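When you think of it as an aggregate of statements, a work and its instances are separate resources, and each instance links to its work. This sketch prints those statements as N-Triples in plain Python. It assumes the current BIBFRAME 2.0 namespace (`http://id.loc.gov/ontologies/bibframe/`) and its `Work`, `Instance`, and `instanceOf` terms. The 2014 vocabulary discussed here lived at bibframe.org and has since changed. The `example.org` resource URIs are hypothetical.

```python
BF = "http://id.loc.gov/ontologies/bibframe/"  # assumed BIBFRAME 2.0 namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical base for local resources

work = EX + "work/doomsday-book"
instances = [EX + "instance/doomsday-print", EX + "instance/doomsday-ebook"]

# One statement typing the work, then two statements per instance:
# its type, and its link back to the work it realizes.
triples = [(work, RDF_TYPE, BF + "Work")]
for inst in instances:
    triples.append((inst, RDF_TYPE, BF + "Instance"))
    triples.append((inst, BF + "instanceOf", work))

for s, p, o in triples:
    print(f"<{s}> <{p}> <{o}> .")
```

Adding another format of the same novel means adding two more statements, not another full description of the work.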
Now, I’m going to show you how to take your own MARC records and convert them into Bibframe using the transformation service.
I started by creating a list in my local ILS. We have a local collection of materials by the author Connie Willis. I created a list of various formats of her novel “Doomsday Book.”
I then converted that file into MARC/XML using the freely available MarcEdit software.
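MarcEdit is the tool used here. For readers who script their workflows, the shape of the MARC/XML it produces can be sketched with Python’s standard library. This example assumes the record is already held in a simple Python structure (a hypothetical layout, not MarcEdit’s or any library’s). It writes the MARC 21 slim elements `collection`, `record`, `leader`, `controlfield`, `datafield`, and `subfield`. The leader and control number are placeholders.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("", MARC_NS)  # serialize MARC tags without a prefix

# Hypothetical in-memory layout (not MarcEdit's or any library's format).
record = {
    "leader": "00000nam a2200000 a 4500",
    "control": {"001": "12345678"},  # placeholder control number
    "data": [
        ("100", "1", " ", [("a", "Willis, Connie.")]),
        ("245", "1", "0", [("a", "Doomsday book /"), ("c", "Connie Willis.")]),
    ],
}


def qualified(name: str) -> str:
    """Return a tag name in the MARCXML namespace (Clark notation)."""
    return f"{{{MARC_NS}}}{name}"


def to_marcxml(records: list[dict]) -> str:
    """Serialize records as a MARCXML <collection> string."""
    collection = ET.Element(qualified("collection"))
    for rec in records:
        rec_el = ET.SubElement(collection, qualified("record"))
        ET.SubElement(rec_el, qualified("leader")).text = rec["leader"]
        for tag, value in rec["control"].items():
            ET.SubElement(rec_el, qualified("controlfield"), {"tag": tag}).text = value
        for tag, ind1, ind2, subfields in rec["data"]:
            field = ET.SubElement(rec_el, qualified("datafield"),
                                  {"tag": tag, "ind1": ind1, "ind2": ind2})
            for code, value in subfields:
                ET.SubElement(field, qualified("subfield"), {"code": code}).text = value
    return ET.tostring(collection, encoding="unicode")


print(to_marcxml([record]))
```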
Here’s what you get in MarcEdit.
Click on the option here to “Paste MARC/XML”
Copy your data from MarcEdit.
Paste it into the transformation tool window.
The tool will generate a URL that you will be able to use for at least “a few days.”
Here you see the first bit of the generated view. We have a collection title that Bibframe is pulling out first. Remember, it is a work in progress.
Bibframe has generated several “work” records; here it looks like they’re coming from the series statements. You can click on any one of these to see the RDF behind that piece of information.
Here you see an instance associated with a Bibframe work.
You may have noticed the stars beside some of the data lines. This is where we already begin to see some linked data capability. If you click on the star, you will see this bit of data in LC’s linked data service, id.loc.gov.
id.loc.gov: Notice our URI for Connie Willis.
If you click on the information itself, you will see the RDF for that piece of data.
You see what we’ve been led to expect in a linked data world. Notice the URI from id.loc.gov.
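Because each id.loc.gov resource has a stable URI, a script can fetch its RDF directly. This hedged sketch assumes id.loc.gov’s convention of appending a serialization suffix (such as `.nt` for N-Triples or `.json`) to a name authority URI; check the service’s documentation for the formats it currently offers. The identifier used here is the LC name authority number from the Edward Abbey slide, and the network call is commented out.

```python
ID_BASE = "http://id.loc.gov/authorities/names/"


def authority_uri(lccn: str) -> str:
    """Build an id.loc.gov name-authority URI, dropping spaces (e.g. 'n 78093802')."""
    return ID_BASE + lccn.replace(" ", "")


def serialization_url(uri: str, suffix: str = "nt") -> str:
    """Assumed convention: append a format suffix to get that serialization."""
    return f"{uri}.{suffix}"


uri = authority_uri("n78093802")
print(serialization_url(uri))  # http://id.loc.gov/authorities/names/n78093802.nt

# Fetching requires network access:
# import urllib.request
# with urllib.request.urlopen(serialization_url(uri)) as resp:
#     print(resp.read().decode("utf-8")[:500])
```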
If you’d like to try the transformation service and don’t have a list of your own, there is a sample set of records provided on the “contribute” tab. The link to join the Bibframe listserv is also here, as well as some tools for the coders among us.
To wrap up, I’d just like to showcase some library-related projects that demonstrate how awesome open data can be. These visualizations are fantastic examples of linked data in action. The first is a project from Stanford. The map and timeline are fed by data from the Library of Congress’s Chronicling America collection.
This data visualization is from the Kansas City Public Library. It displays data from their Civil War on the Western Border collection.
I don’t believe we can assume that this is something we won’t have to “worry about” for a long time to come. The growth of the Web and the pace of Web 2.0 development are exponential. If we don’t move, and move rapidly, libraries are going to be left behind.
We’ve all seen these linked data cloud diagrams. This one is from March 2009.
This is two years later in 2011. What does it look like now?