Moving Library Metadata Toward Linked Data: Opportunities Provided by the eXtensible Catalog
1. Jennifer Bowen, University of Rochester
DC-2010 Conference
October 20, 2010, Pittsburgh, PA
2. About me…
Currently:
- Librarian
- Technical services administrator
- Software development team co-leader
Formerly:
- Cataloger (MARC)
- Standards developer (RDA)
Maybe someday…Linked Data Expert?
3. My Topics Today
Is it feasible to turn legacy library
MARC metadata into Linked Data
in an automated environment,
and,
How can eXtensible Catalog (XC)
software play a role in that
process?
Image source: www.blog.kdl.org
4. Semantic Web and Linked Data
Semantic Web: a set of technologies that
allow computers to understand the meaning
of information on the web
Linked Data: a mechanism for exposing,
sharing and connecting data on the web,
using identifiers and relationships
5. Linked Data “Expectations of Behavior”
– Use URIs as names for things
– Use HTTP URIs so that people can look up
those names.
– When someone looks up a URI, provide useful
information, using the standards (RDF*,
SPARQL)
– Include links to other URIs so that they can
discover more things.
Tim Berners-Lee, “Design Issues”, 2006
http://www.w3.org/DesignIssues/LinkedData.html
6. Linked Data: RDF triple
Subject: This presentation
Predicate: has creator
Object: Jennifer Bowen
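As a rough illustration of the triple on this slide, the statement can be held in a small three-part structure. This is only a sketch: the `Triple` class and `describe` function are hypothetical names, and the values are the slide's plain-text labels rather than the URIs that real Linked Data would use.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Plain-text labels taken from the slide; real Linked Data would use URIs here.
triple = Triple(
    subject="This presentation",
    predicate="has creator",
    obj="Jennifer Bowen",
)


def describe(t: Triple) -> str:
    """Render the triple as a readable sentence."""
    return f"{t.subject} {t.predicate} {t.obj}"


print(describe(triple))
```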
7. A Reality Check
Image: “Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
10. Getting Started
To create Linked Data, we need:
–Software to transform legacy data
–Analysis: mapping of legacy metadata to
Linked Data properties
11. The software…
eXtensible Catalog (XC) is open source,
user-centered, next generation software
for libraries.
XC provides a discovery system and a set
of tools for libraries to manage metadata
and build applications.
12. XC Software Components
User Interface: XC User Interface (website on Drupal CMS)
Metadata Processing: Metadata Services Toolkit
Connectivity tools: NCIP Toolkit, OAI Toolkit
Underlying systems: Integrated Library System, Repository
13. XC’s original metadata goals
- Aggregate MARC and other metadata for
use in new applications
- Define a FRBR-based metadata schema to
support XC’s user-interface functionality
- Create a software application to process
batches of metadata through a set of
services
15. XC and Linked Data
How can XC help move legacy library
metadata closer to Linked Data?
NOT among XC’s original goals
However, XC software creates an opportunity
to contribute to this effort and provides
important “lessons learned”
16. Converting MARC to Linked Data
What XC software can do:
– Convert MARC codes to vocabulary values
– Remove extraneous data
– Normalize inconsistencies
– Map most MARC fields/subfields and parse to
appropriate FRBR Group 1 entity records
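The first three steps in the list above might look something like the following. This is a minimal sketch, not XC's actual implementation: `LANGUAGE_LABELS`, `code_to_label`, and `strip_trailing_punctuation` are hypothetical names, and the lookup table covers only two MARC language codes for illustration.

```python
# Hypothetical lookup: a tiny subset of MARC language codes mapped to labels.
LANGUAGE_LABELS = {"eng": "English", "ita": "Italian"}


def code_to_label(code: str) -> str:
    """Convert a MARC language code to a label; unknown codes pass through."""
    cleaned = code.strip().lower()  # normalize case and stray whitespace
    return LANGUAGE_LABELS.get(cleaned, cleaned)


def strip_trailing_punctuation(value: str) -> str:
    """Remove trailing ISBD-style punctuation such as ' :' or ' /'.

    Trailing periods are left alone, since they may belong to an abbreviation.
    """
    return value.rstrip(" :/;,")


print(code_to_label(" ENG "))
print(strip_trailing_punctuation("E.E. Cummings :"))
```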
17. Converting MARC to Linked Data
Problematic areas:
– Some MARC fields/subfields are difficult to
map to appropriate FRBR entities
– Tracking relationships between FRBR entity
records: How many relationships can we
support with XC software?
18. MARC to XC Schema Transformation
– Parses MARCXML records into linked FRBR-based records
– Maps MARCXML data elements to Linked-Data-compatible elements in the XC Schema
21. Issue: Managing Multiple Relationships
MARC bibliographic records can refer to
multiple FRBR entities of the same type
(analytics that represent multiple
works/expressions, e.g. tracks on a CD)
22. Issue: Beyond FRBR Group 1 Entities
MARC “Alternate Graphic Representation”
(880 fields) can contain data that belong in
records for Group 2 and Group 3 entities
Contributor:
700 1 ‡6 880‐08 ‡a Vasil’ev, Maksim.
880 1 ‡6 700‐08 ‡a Васильев, Максим.
Subject:
600 10 ‡6 880‐06 ‡a Putin, Vladimir Vladimirovich, ‡d 1952‐
880 10 ‡6 600‐06 ‡a Путин, Владимир Владимирович, ‡d 1952‐
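To make the ‡6 linkage concrete, the sketch below pairs each regular field with its 880 counterpart by matching the tag and occurrence number carried in ‡6. It assumes fields have already been parsed into simple dictionaries; that representation and the names `link_key` and `pair_alternate_scripts` are hypothetical, not part of XC.

```python
def link_key(field):
    """Return (tag of the regular field, occurrence number) from subfield 6.

    A regular field carries "880-NN"; its 880 partner carries "TAG-NN",
    optionally followed by "/" and a script code. Returns None without ‡6.
    """
    linkage = field["subfields"].get("6")
    if linkage is None:
        return None
    linked_tag, occurrence = linkage.split("/")[0].split("-")
    original_tag = linked_tag if field["tag"] == "880" else field["tag"]
    return (original_tag, occurrence)


def pair_alternate_scripts(fields):
    """Pair each regular field with the 880 field that shares its link key."""
    regular, alternate = {}, {}
    for field in fields:
        key = link_key(field)
        if key is None:
            continue
        target = alternate if field["tag"] == "880" else regular
        target[key] = field
    return [(regular[k], alternate[k]) for k in regular if k in alternate]


fields = [
    {"tag": "700", "subfields": {"6": "880-08", "a": "Vasil’ev, Maksim."}},
    {"tag": "880", "subfields": {"6": "700-08", "a": "Васильев, Максим."}},
    {"tag": "600", "subfields": {"6": "880-06", "a": "Putin, Vladimir Vladimirovich,"}},
    {"tag": "880", "subfields": {"6": "600-06", "a": "Путин, Владимир Владимирович,"}},
]

for romanized, original_script in pair_alternate_scripts(fields):
    print(romanized["tag"], romanized["subfields"]["a"], "<->", original_script["subfields"]["a"])
```

Once paired, the alternate-script form can be routed to the same Group 2 or Group 3 entity record as its romanized counterpart, which is the parsing problem this slide describes.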
23. If we were to parse this 880 data correctly:
[Diagram labels: “Alternative script of name from 880”; “Alternative script of subject from 880”]
24. Issue: Related Group 1 Entities
Language attribute for a related expression
041 1 ‡a eng ‡h ita
100 0 ‡a Dante Alighieri, ‡d 1265‐1321.
240 10 ‡a Divina commedia. ‡l English
245 14 ‡a The divine comedy / ‡c Dante ; a
new verse translation by C.H. Sisson.
500 ‡a Translation of: Divina commedia.
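The difficulty with 041 ‡h is that it describes a different expression (the original) from the one the record describes. A minimal sketch of separating the two, assuming subfields arrive as (code, value) pairs; the function name and the dictionary layout are hypothetical:

```python
def expressions_from_041(subfields):
    """Split one 041 field into the described expression and its original.

    `subfields` is a list of (code, value) pairs, e.g. [("a", "eng"), ("h", "ita")].
    ‡a gives the language of the described expression; ‡h gives the language
    of the original, which belongs on a separate, related expression.
    """
    described = {"languages": [v for c, v in subfields if c == "a"]}
    original_languages = [v for c, v in subfields if c == "h"]
    related = None
    if original_languages:
        related = {"languages": original_languages, "role": "original expression"}
    return described, related


described, related = expressions_from_041([("a", "eng"), ("h", "ita")])
print(described)
print(related)
```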
25. If we were to parse 041 ‡h data…
[Diagram labels: “Alternative script of name from 880”; “Original language from 041 ‡h”; “Alternative script of subject from 880”]
26. Managing Relationships Between Entities
[Diagram labels: “Original language from 041 ‡h”; “Alternative script of subject from 880”; “Alternative script of name from 880”]
27. What we are learning from XC
• new records
• changed records
• deleted records
• changed relationships
Maintaining links between separate FRBR entity records in a production environment monopolizes system resources and may not be scalable.
28. But wait…
What does this emphasis on FRBR have to do with Linked Data?
If we can map a MARC data element to a FRBR entity, we can probably convert it to Linked Data.
[Diagram: FRBR Group 1 Entities]
29. But do we have to?
- Do we have to be able to map MARC
elements to a FRBR entity in order to create
Linked Data?
- Would managing RDF triples be more
scalable than managing FRBR-based records
and the relationships between those
records?
30. Best Practices for Linked Data
- Unique identifiers for XC metadata
records
- Data elements from registered schemas
- Registered vocabularies
By attempting to follow best practices in
XC for Linked Data, we hope to facilitate
eventual output of XC metadata in RDF.
32. RDF Triple – Record identifiers
Subject: This resource (oai:mst.rochester.edu:MST/MARCToXCTransformation/10081)
Predicate: has subject
Object: Poets, American
33. Identifiers for XC Schema records
<?xml version="1.0" encoding="UTF-8"?>
<xc:frbr xmlns:xc="http://www.extensiblecatalog.info/Elements"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xmlns:rdvocab="http://rdvocab.info/Elements"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:rdarole="http://rdvocab.info/roles">
  <xc:entity type="work" id="oai:mst.rochester.edu:MST/MARCToXCTransformation/10081">
    <dcterms:subject xsi:type="dcterms:LCC">PS3505.U334</dcterms:subject>
    <dcterms:subject xsi:type="dcterms:DDC">811/.52</dcterms:subject>
    <dcterms:subject xsi:type="dcterms:DDC">B</dcterms:subject>
    <rdarole:author>Sawyer-Lauçanno, Christopher, 1951-</rdarole:author>
    <rdvocab:titleOfTheWork>E.E. Cummings :</rdvocab:titleOfTheWork>
    <xc:subject xsi:type="dcterms:LCSH">Cummings, E. E. (Edward Estlin), 1894-1962.</xc:subject>
    <xc:subject xsi:type="dcterms:LCSH">Poets, American--20th century--Biography.</xc:subject>
  </xc:entity>
</xc:frbr>
A persistent, globally unique identifier for each XC Schema record
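One way to see how such a record could yield triples is to treat the entity's `id` as the subject and each child element as a predicate/object pair. The sketch below uses Python's standard `xml.etree.ElementTree` on an abbreviated copy of the record. The function names are hypothetical, and the rule for building a predicate URI (namespace plus element name, joined with "/" when needed) is an assumption for illustration, not a documented XC convention.

```python
import xml.etree.ElementTree as ET

# Abbreviated sample based on the record shown on this slide.
XC_RECORD = """\
<xc:frbr xmlns:xc="http://www.extensiblecatalog.info/Elements"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <xc:entity type="work" id="oai:mst.rochester.edu:MST/MARCToXCTransformation/10081">
    <dcterms:subject xsi:type="dcterms:LCC">PS3505.U334</dcterms:subject>
    <xc:subject xsi:type="dcterms:LCSH">Poets, American</xc:subject>
  </xc:entity>
</xc:frbr>
"""

XC_NS = "{http://www.extensiblecatalog.info/Elements}"


def predicate_uri(tag):
    """Turn an ElementTree tag of the form '{namespace}local' into one URI (assumed rule)."""
    namespace, local = tag[1:].split("}")
    separator = "" if namespace.endswith(("/", "#")) else "/"
    return namespace + separator + local


def record_to_triples(xml_text):
    """Return (subject, predicate, object) tuples for every entity in the record."""
    root = ET.fromstring(xml_text)
    triples = []
    for entity in root.iter(XC_NS + "entity"):
        subject = entity.get("id")
        for element in entity:
            value = (element.text or "").strip()
            triples.append((subject, predicate_uri(element.tag), value))
    return triples


for triple in record_to_triples(XC_RECORD):
    print(triple)
```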
34. RDF Triple - Registered Data Elements
Subject: This resource (oai:mst.rochester.edu:MST/MARCToXCTransformation/10081)
Predicate: has subject (http://www.extensiblecatalog.info/Elements/subject)
Object: Poets, American
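For illustration, the triple on this slide could be written out as a single N-Triples line. This is a sketch only: `to_ntriples` and `escape_literal` are hypothetical helpers, and XC itself did not yet produce RDF output at the time of this talk. Note also that the `oai:` record identifier is not an HTTP URI, so it cannot be looked up on the web in the way the Linked Data expectations describe.

```python
def escape_literal(value: str) -> str:
    """Escape backslashes, quotes, and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples(subject: str, predicate: str, literal: str) -> str:
    """Serialize one triple whose object is a plain string literal."""
    return f'<{subject}> <{predicate}> "{escape_literal(literal)}" .'


line = to_ntriples(
    "oai:mst.rochester.edu:MST/MARCToXCTransformation/10081",
    "http://www.extensiblecatalog.info/Elements/subject",
    "Poets, American",
)
print(line)
```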
42. Experimenting with Linked Data
- Within a MARC or MARCXML environment?
  - Possible to give each record a URI
  - MARC elements themselves don’t have URIs
  - How to embed multiple URIs for registered vocabularies in MARC?
- XC enables experimentation outside of a MARC environment with data that originated as MARC
43. Making Linked Data a Priority for XC
– Balancing goals
– Time/funding constraints
– What’s our use case?
  – Output of Linked Data from XC vs.
  – Using Linked Data within XC?
44. XC Linked Data Accomplishments
XC has set the stage for Linked Data by:
- Providing a platform for creating Linked Data
using XC software
- Ensuring that XC Schema records can be
converted to RDF triples as easily as possible
- Enabling others to build upon what we have
accomplished so far.
45. Next Steps
- Monitor RDA implementations
- Develop XC authority control service
- Enable RDF output of XC Schema metadata
- Encourage libraries to use XC software and
contribute to the XC user community
- Seek funding for additional software
development