This document summarizes Joel Richard's presentation on implementing a Linked Open Data set for the Taxonomic Literature II dataset. The presentation covers converting the dataset to Linked Data by assigning HTTP URIs to identifiers, choosing appropriate vocabularies, and generating RDF triples. It also discusses the challenges of storing a large Linked Data set, potentially in a database rather than in Drupal, and provides examples of other Linked Data projects and resources.
1. Implementing a Linked
Open Data set
Joel Richard
Smithsonian Libraries
richardjm@si.edu
SLA Annual Conference, July
2. Who are the Smithsonian Libraries?
• 20 Libraries in the U.S. and Panama
• Supports research by staff and the public
• Strong effort to digitize pre-1923 texts
• Taxonomic Literature II is one of these texts
3. Summary of Agenda
• Our data set and process
• Conversion to Linked Data
• Storing Linked Data
• Examples and More Info
• Summary
• … and the best brew pubs in Chicago
4. Disclaimer
We are still learning.
5. What is Linked Data?
HTTP URIs identify things to humans and computers
Identifiers are related to other identifiers (or values) via predicates in a “triple”:
Charles Darwin // Creator // On the Origin of Species
See also :
http://linkeddata.org/
http://en.wikipedia.org/wiki/Linked_Data
http://richard.cyganiak.de/2007/10/lod/
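In code terms, a triple is just an ordered (subject, predicate, object) statement. A minimal Python sketch of the informal example above, using Darwin's VIAF URI (which appears later in this deck) and the Dublin Core "creator" term; the literal object and the direction of the statement mirror the slide, not any published dataset:

```python
# A triple relates an identifier to another identifier or a literal value
# via a predicate. The subject here is Charles Darwin's VIAF URI (also
# used later in this presentation); the predicate is Dublin Core "creator".
triple = (
    "http://viaf.org/viaf/27063124",      # subject: Charles Darwin
    "http://purl.org/dc/terms/creator",   # predicate: "Creator"
    "On the Origin of Species",           # object: a plain literal
)

subject, predicate, obj = triple
print(f"{subject} --[{predicate}]--> {obj}")
```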
9. Our process
Scanned the pages
Hired a contractor for OCR and correction (99.97% accuracy)
Received the XML dataset from the contractor
Verified and imported it into SQL Server
Built a website to search the data
11. Great! Let’s make some linked data!
First… what does 99.97% accuracy mean?
~12,000 Errors
12. Great! Let’s make some linked data!
Select Identifiers for your data
http://library.si.edu/tl-2/author/darwin
http://library.si.edu/tl-2/title/origin_of_species
http://library.si.edu/tl-2/title/1313
Choose vocabularies for predicates (harder than it sounds)
OWL, FOAF, DublinCore, OpenGraph,
SIOC, SKOS, BIBO, etc.
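A hedged sketch of the identifier-minting step. The slug rule below (lowercase, non-alphanumerics to underscores) is inferred from the example URIs above, not taken from the actual TL-2 site code:

```python
import re

def mint_uri(base, category, label):
    """Build an HTTP URI for a record from a human-readable label.
    The slug rule is an assumption inferred from the example URIs
    on this slide, not the site's real implementation."""
    slug = re.sub(r"[^a-z0-9]+", "_", label.lower()).strip("_")
    return f"{base}/{category}/{slug}"

print(mint_uri("http://library.si.edu/tl-2", "author", "Darwin"))
print(mint_uri("http://library.si.edu/tl-2", "title", "Origin of Species"))
```

This reproduces the two human-friendly URIs shown above; the numeric form (`…/title/1313`) would simply use the TL-2 number directly instead of a slug.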
13. Mondeca Labs
Linked Open Vocabularies (LOV)
Vocabulary of a Friend (VOAF): a vocabulary for describing other vocabularies
http://labs.mondeca.com/dataset/lov
14. http://library.si.edu/tl2/author/darwin
tl2:creator
http://library.si.edu/tl2/title/1313
owl:sameAs
http://viaf.org/viaf/27063124
http://library.si.edu/tl2/title/origin…
dc:creator
http://library.si.edu/tl2/author/darwin
owl:sameAs
http://www.archive.org/details/originofspecies00darwuoft
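The statements on this slide can be written out mechanically as N-Triples. The `tl2:` namespace URI below is an assumption (the presentation implies a TL-2 ontology but does not publish one); `owl:` and `dc:` are the standard namespaces:

```python
# Format the slide's URI-to-URI statements as N-Triples lines.
PREFIXES = {
    "tl2": "http://library.si.edu/tl2/ns#",  # assumed; no TL-2 ontology is published here
    "owl": "http://www.w3.org/2002/07/owl#",
    "dc":  "http://purl.org/dc/terms/",
}

def ntriple(subject, curie, obj):
    """Expand a prefix:localName predicate and emit one N-Triples line."""
    prefix, _, local = curie.partition(":")
    return f"<{subject}> <{PREFIXES[prefix]}{local}> <{obj}> ."

statements = [
    ("http://library.si.edu/tl2/author/darwin", "tl2:creator",
     "http://library.si.edu/tl2/title/1313"),
    ("http://library.si.edu/tl2/author/darwin", "owl:sameAs",
     "http://viaf.org/viaf/27063124"),
]
for s, p, o in statements:
    print(ntriple(s, p, o))
```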
15. http://library.si.edu/tl2/author/darwin
RDF Type = foaf:Person
foaf:lastName, foaf:familyName
foaf:firstName, foaf:givenName
foaf:name, skos:prefLabel
tl2:birthYear
tl2:deathYear
skos:definition
tl2:personAbbreviation
http://library.si.edu/tl2/title/origin…
RDF Type = bibo:Book
tl2:titleNumber
dc:title
event:place
dc:publisher
tl2:titleAbbreviation
dc:created
16. Great! Let’s make some linked data!
How are we going to store all this?
We’re using Drupal: RDFa support is built in, and RDF extensions are available as an add-on module.
This is probably not a good idea for very large datasets.
TL-2: 10,000 authors + 37,000 titles becomes about 400,000 triples.
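A back-of-envelope check of that figure; the per-record triple count is an assumed average, roughly matching the predicate lists on slide 15:

```python
# Rough sanity check of the ~400,000-triple estimate. The average number
# of triples per record is an assumption, not a measured figure.
authors = 10_000
titles = 37_000
triples_per_record = 8.5  # assumed average; slide 15 lists ~8 predicates per type

estimate = (authors + titles) * triples_per_record
print(f"~{estimate:,.0f} triples")
```

With these assumptions the estimate comes out just under 400,000, in line with the slide's figure.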
17. Storage considerations
Performance of Drupal Import:
Feeds import: 7 hours for 35,000 records
Other options? Still searching…
Our linked data set will grow to at least 600,000-700,000 Drupal nodes.
Is Drupal the best way to do this?
18. Storage considerations
2000 US Census
19 million households received the “long form”
Joshua Tauberer converted the responses to over 1 billion triples
http://www.rdfabout.com/demo/census/
Carefully consider your storage options!
19. Storage
ARC2 (used by Drupal 7)
RDBMS via D2RQ
RDBMS via Triplify
OpenLink Virtuoso
See Also:
http://www.w3.org/2001/sw/rdb2rdf/use-cases/
20. Linked Data. What’s the point?
Disambiguation
Connecting Relevant Information
More visible via search
Enrichment of your data
Easier reuse of data
26. Other Examples and Info
Library of Congress: Linked Data Services
http://id.loc.gov/
Schema.org
http://www.schema.org
Data.gov / Semantic
http://www.data.gov/semantic
Linked Data.org
http://linkeddata.org/
Stephen Dale: Linked Data in Action
http://www.slideshare.net/stephendale/linked-data-in-action-4487244
27. Thank you!
richardjm@si.edu
http://slideshare.net/joelrichard
?
Editor’s notes
Originally this presentation was going to center around a discussion of our conversion of TL2 to linked data and what we learned, but I felt that it would be better to use it as an example of things to keep in mind when creating your own data sets.
Situated at the center of the world's largest museum complex, the Smithsonian Libraries forms a vital part of the research, exhibition, and educational enterprise of the Institution. The Libraries unites 20 libraries into one system supported by central collections support services. We maintain publication exchanges with more than 4,000 institutions worldwide that supply Smithsonian scientists and curators with current periodicals, exhibition catalogs, and professional society publications. Through preservation treatments, experts work to save the Smithsonian's 1.5 million printed books and manuscripts for future generations. Our Digital Library creates electronic versions of rare books and other distinctive collections, as well as exhibitions and specialized finding aids. We can be found on the web at http://library.si.edu
A brief summary of what this presentation includes.
I dislike disclaimers, but we’re still new to linked open data and are learning as we go. The idea of LOD has been around for several years now, so we are also playing a bit of catch-up. Our first goals are to get some data online, then start linking our data out to other sources, and encourage others to link to us. We don’t yet know how our data relates to others. It’s not scientific data created as part of a research project per se, but initially we see it as valuable, useful information, at least for some segments of the research world.
Since this presentation doesn’t center on what linked data is, we’re not going to spend much time on it. But just in case… Question: How many are familiar with linked data? Have linked data online? Wish they had linked data? Wish you had a website? This page is a quick summary for those who don’t know what linked data is. RTFM (Read The Friendly Manual!)
This is a quick demonstration of how linked data has grown over the past five years. Back in 2007 we had only a handful of data sets, at least according to Richard Cyganiak’s searching. Between 2009 and 2010 the number of items doubled. As of Sept 2011 there are 295 data sets listed. There are probably more today, and more being added every day. It is likely that not all data sets are represented here, so this is only a sample of what’s available. What’s the point? This is all data that has the potential to enhance YOUR data. This is all linked data. This is all open data.
So as an example of how to create a data set, I’ll use Taxonomic Literature II. It is a fifteen-volume guide to the literature of systematic botany published between 1753 and 1940. It contains almost 10,000 authors and about 37,000 publications. The reason to focus on TL-2 is that we aim to be the authority on the web for this information. We have received permission from the IAPT (International Association for Plant Taxonomy) to digitize and release this information on the web under an open license. TL-2 is used by many botanists, and their work is made easier by this data being online. Prior to 2012 this information was either located in a library or locked behind a paywall of sorts.
This is a page of TL-2 showing Charles Darwin and On the Origin of Species, with highlights on those items that are immediately visible and can be parsed and turned into Linked Data. There is other data on the page that could be turned into linked data, but at this time we have only parsed the data that is highlighted here. Clearly, moving from something such as a printed book to a Linked Open Data set is an arduous task. If you are working on creating your own data sets, your experiences will differ depending on the source(s) of your data. One important thing to note here is the “Darwin” in parentheses, which is a unique abbreviation for an author. Each author has one. Another important item is the “1313” identifying the title, On the Origin of Species. Each publication in TL-2 has its own number. There are about 9,900 authors and 37,000 titles in all.
Briefly, this was our process to create the data. In Jan 2011, we scanned the books and placed them online at the Internet Archive. Later, after selecting a contractor, we sent the scans and the OCR text (created at the Internet Archive) to the contractor, who ultimately created a 99.97% accurate text version of TL-2. They then parsed that data to a limited degree and delivered to us an XML dataset that we imported into a SQL Server database. Finally, we created a searchable, browseable website to access the TL-2 data, opening it up to researchers around the world. Two of them use it on a regular basis. (rimshot!) In reality, in a month we get about 500 visitors and 6,000 pageviews, with about 60% of those coming from outside the U.S.
This is our current website, showing a sample of the search results for Charles Darwin. This is not Linked Data. You can find this page at: http://www.sil.si.edu/digitalcollections/tl-2/
Earlier we mentioned 99.97% accuracy. This means that, if we assume 38 million characters in all of TL-2, there are upwards of 12,000 errors in our text. (In reality this is more like 5,000-6,000 due to the nature of our data.) This may not be bad for the textual components of the content, but when it comes to parsing citations or more structured information, it will prove to be a challenge. Other data sets may not have this problem, but as we are scanning and converting to text, this is something that will always be present for us.
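The arithmetic behind that error figure, as a quick check:

```python
# 38 million characters at 99.97% accuracy leaves 0.03% of characters wrong.
characters = 38_000_000
error_rate = 1 - 0.9997  # 0.03%

errors = characters * error_rate
print(f"~{errors:,.0f} character errors")  # ~11,400, i.e. "upwards of 12,000"
```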
So how do we create linked data? Basically, this is the approach we are using. There’s probably more that needs to be done, but today, this is what we know we need to do. The choice of identifier is important because, if possible, it should be human-friendly, but numbers are also common in places such as OCLC WorldCat. Additionally, the TL-2 number is a strong component, so we will very likely go with that as our primary identifier of publications.
Mondeca, an information management company based in Paris, as part of their “labs”, created a directory of linked open vocabularies and grouped them together by similar disciplines. Starting from largest to smallest, they are General and Meta, Library, City, Web, Space-Time, Science, Market (and finance), and Media. Library is the second largest on this list, which may be a matter of how the visualization is created, but may also mean that libraries are playing a big part in the LOD movement. This might help you figure out which vocabularies would be useful to you.
A sample of our TL-2 identifiers and four triples. Note that “tl2:creator” is not the same as “dc:creator”, which indicates that we will likely need to create our own ontology for describing the TL-2 dataset. (dc:creator is a reference from a title to an author. We also need the reverse, author to title.) Also note that we’ve crosslinked our two identifiers and, as an example, linked out to other information on the web. The link to the Internet Archive may not be appropriate, as it is not a LOD data set, but there is likely a predicate available for “read more” or “see also” for non-LOD websites that are related to the identifier.
A further breakdown of our data into linked data showing the predicates we might use for each. Again, the items in orange are specific to TL2 and may not exist in other LOD data sets. For example, the FOAF vocabulary has date of birth, but can we use only a year in that field? Will that foul up other computers? FOAF also doesn’t include date of death, which we definitely have. What predicate do we use? Do we create our own ontology and publish it? (Probably.) Finally, we haven’t yet begun a formal analysis of which existing ontologies might fit our needs.
Storage is a consideration. We’re not using a triplestore per se, but are instead relying on Drupal and ARC2 to handle the magic for us. This may or may not be a good solution for the long term.The next four slides are all text. You’ve been warned.
Performance is also a concern. It’s been challenging enough to get 47,000 records imported into Drupal. When we start to talk about an additional 500K items, we have some serious concerns about how well Drupal will hold up, just on the import side of things. We may need to investigate other methods of getting this data into Drupal, or other systems altogether, but that may create added complexity.
Another example that makes clear how much data you may be creating and how to manage it: the US Census sent the “long form” to a subset of 19 million households. The responses were converted to LOD by Joshua Tauberer, resulting in over a billion triples. I’m going to think very carefully before I start working with a billion of anything.
A few notes on software that can be used to open up your existing data as linked data. I have not had the opportunity to use any of these tools yet, but we may still use them in the future. ARC2 provides parsers, content negotiation, RDF storage, and a SPARQL endpoint. D2RQ allows accessing relational databases as virtual RDF graphs. Triplify is a plugin for web applications to expose your data as RDF, Linked Data, or JSON. Virtuoso is an enterprise-level product for normalizing all of your data sources, including providing that data as RDF.
Why should we create linked data? Disambiguation: are you searching for Venus the planet, Venus the sculpture, Venus the painting, or Venus the tennis player? Connecting relevant info: linking your data to other data may reveal things related to your data that you were unaware of. Search visibility: search engines, via schema.org and Google’s purchase of Freebase, are enhancing search, and things will only get better as we move forward. Enrichment of your data: as mentioned earlier, you may learn things you didn’t know about your data, or provide greater context to your data via LOD. Easier reuse: this is one of the central tenets of LOD. I, as a human, no longer need to say that Column B in your spreadsheet corresponds to the first_name field in my database.
Example of LOD in action. Google’s Knowledge Graph knows that Darwin is a person and that Shrewsbury is a place, allowing it to offer different, more specialized results in your search. As LOD becomes available, your data may be used to enhance these results. Google is also able to help disambiguate common terms, such as “Lafayette” (the college, various U.S. cities, or the Marquis de Lafayette). http://google.com/
Example of LOD in action. Combines data from the Energy Information Administration (EIA) on Data.gov with data from OpenEI.org, the U.S. Census and SmartGrid in a mashup that’s easier to create with LOD. http://en.openei.org/apps/mashathon2010/
Example of LOD in action. NYTimes is offering a large dataset as LOD. As an example, they provided a tool to enter a university or college and find those people from their database who attended that institution. From there, we are able to see links to other databases and articles from NYTimes that refer to that person. All linked together.From the site: “As of 13 January 2010, The New York Times has published approximately 10,000 subject headings as linked open data under a CC BY license. We provide both RDF documents and a human-friendly HTML versions. The table below gives a breakdown of the various tag types and mapping strategies on data.nytimes.com.”http://data.nytimes.com/
This is an example of the “raw data” available at NYTimes, presented in a user-readable form. I could also argue that the identifier at NYTimes is not as good as it should be. A human-readable version would be better, but we see that one is among the owl:sameAs links.
At OCLC Worldcat, they have begun publishing the data about an individual item in Linked Open Data using schema.org. This is an example from Darwin’s Origin of Species. You’ll find the “Linked Data” section at the bottom of the page for the details of any individual book on WorldCat.http://www.worldcat.org/oclc/7619054
Finally, a few other examples of places where you can learn more about linked data, examples of other tools built with and for linked open data. The Library of Congress has made available their subject headings in linked data form to both humans and machines. Schema.org encourages the use of your metadata as a variant of linked data in your webpages. The US Government’s source for open data. Other countries are also making their data open on similar websites. There are many, many more sources, so search the web and see what you can find.
Thank you! As for brew pubs, I don’t live in Chicago and this is only my second time here, so I’m open to suggestions. There are a lot of bars in this town (as seen in the map).