Building the New Open Linked Library (Revisited)

Joel Richard
LITA National Forum 2012
October 5, 2012
Smithsonian Libraries
• Founded in 1846
• 1.5 million volumes in collection, plus assorted
  archival collections
• 15,000 volumes scanned and online
• 20 libraries serving ~500 researchers/curators
  + hundreds of fellows and interns
• 105 library staff
• 1.5 web staff
• Founding member of the Biodiversity
  Heritage Library


                           Le Garde-meuble, ancien et moderne [Furniture repository, ancient and modern], 1839-1935
(From 2011)
Drupal and Linked Data
• Native support for RDFa in Drupal 7.
• RDF Extensions (rdfx) – even more features.
• Vocabularies can be imported and cached for
  reuse.
• Few or no modifications to HTML to support
  RDFa.


What’s the difference between RDF,
RDF/XML and RDFa?
                                  LITA National Forum, September 30, 2011
(From 2011)
RDF/XML Sample
 URI: http://library.si.edu/book/origin-of-species.rdf

 <?xml version="1.0" encoding="UTF-8"?>
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:dc="http://purl.org/dc/terms/"
   xmlns:bibo="http://purl.org/ontology/bibo/"
   xmlns:owl="http://www.w3.org/2002/07/owl#">

   <rdf:Description rdf:about="http://localhost:8087/content/origin-species">

     <rdf:type rdf:resource="http://purl.org/ontology/bibo/Book"/>
     <dc:title>The Origin of Species</dc:title>
     <dc:created>November 24, 1859</dc:created>
     <bibo:numPages>1000</bibo:numPages>
     <dc:language>english</dc:language>
     <bibo:authorList
         rdf:resource="http://localhost:8087/content/darwin-charles"/>

     <!-- Link this record to the same work in WorldCat -->
     <owl:sameAs rdf:resource="http://www.worldcat.org/oclc/1184647"/>
   </rdf:Description>
 </rdf:RDF>
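
One way to see how RDF/XML relates to RDF itself is to read a sample like the one above back into plain triples. The sketch below is an illustration only, not the library's code: it embeds a trimmed version of the sample (with the owl namespace declared and the owl:sameAs element self-closed so that it parses) and uses Python's standard XML parser. It handles only the simple rdf:Description pattern shown here, not the full RDF/XML syntax.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Trimmed, well-formed variant of the slide's sample (an assumption made
# for this sketch; the XML declaration is omitted because it is optional).
DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/terms/"
  xmlns:owl="http://www.w3.org/2002/07/owl#">
  <rdf:Description rdf:about="http://localhost:8087/content/origin-species">
    <rdf:type rdf:resource="http://purl.org/ontology/bibo/Book"/>
    <dc:title>The Origin of Species</dc:title>
    <owl:sameAs rdf:resource="http://www.worldcat.org/oclc/1184647"/>
  </rdf:Description>
</rdf:RDF>"""


def triples(xml_text):
    """Yield (subject, predicate, object) for each property element."""
    root = ET.fromstring(xml_text)
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree tags look like "{namespace}local"; join them
            # back into the full predicate URI.
            predicate = prop.tag[1:].replace("}", "", 1)
            # The object is either a linked resource or literal text.
            obj = prop.get(f"{{{RDF_NS}}}resource") or (prop.text or "").strip()
            yield subject, predicate, obj


result = list(triples(DOC))
for triple in result:
    print(triple)
```

Each printed tuple is one RDF statement; RDF/XML and RDFa are two different ways of writing the same statements down.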




TL-2 Page Sample                          (From 2011)


                   http://library.si.edu/tl2/author/darwin

                     tl2:creatorOf
                     http://library.si.edu/tl2/book/1313

                     owl:sameAs
                     http://viaf.org/viaf/27063124




                   http://library.si.edu/tl2/book/1313
                     dc:creator
                     http://library.si.edu/tl2/author/darwin

                     owl:sameAs
                     http://www.archive.org/details/originofspecies00darwuoft




TL-2 Page Sample Results (From 2011)

http://library.si.edu/tl2/author/darwin
   tl2:creatorOf “http://library.si.edu/tl2/book/1313”
   owl:sameAs “http://viaf.org/viaf/27063124”
   foaf:lastName “Darwin”
   foaf:familyName “Darwin”
   foaf:firstName “Charles”
   foaf:givenName “Charles”
   foaf:name “Darwin, Charles Robert”
   skos:prefLabel “Darwin, Charles Robert”
   tl2:birthYear “1809”
   tl2:deathYear “1882”
   tl2:description “British evolutionary biologist”
   tl2:personAbbrev “Darwin”

http://library.si.edu/tl2/book/1313
   dc:creator “http://library.si.edu/tl2/author/darwin”
   owl:sameAs “http://www.archive.org/details/originofspecies00darwuoft”
   tl2:bookNumber “1313”
   bibo:shortTitle “On the origin of species”
   dc:title “On the origin of species by means of natural selection, or the
      preservation of favoured races in the struggle for life.”
   event:place “London”
   dc:publisher “John Murray”
   dc:created “1859”
   tl2:bookAbbreviation “Origin sp.”




(From 2011)
                           Who is reusing our data?
Ryan Schenk – http://ryanschenk.com/2011/02/visualizing-taxonomic-synoymns/




(From 2011)
                          Who is reusing our data?
Encyclopedia of Life – http://eol.org/




Linked Data Review
• Publishing structured data on the web
• RDF (Resource Description Framework)
• Enables computer-to-computer queries
• Uses standard ontologies (vocabularies)
• Data is presented as “triples”

Subject   http://library.si.edu/tl2/author/charles-darwin
Predicate owl:sameAs
Object    http://viaf.org/viaf/27063124
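
A triple like the one above can be modelled as a plain three-part record. The sketch below is a toy illustration, assuming nothing beyond the Python standard library; the `match` helper is a hypothetical name, and a real site would use an RDF library or a triple store instead.

```python
# Each statement is (subject, predicate, object).
TRIPLES = [
    (
        "http://library.si.edu/tl2/author/charles-darwin",
        "owl:sameAs",
        "http://viaf.org/viaf/27063124",
    ),
]


def match(triples, s=None, p=None, o=None):
    """Yield every triple that fits the pattern; None is a wildcard."""
    for triple in triples:
        wanted = (s, p, o)
        if all(w is None or w == got for w, got in zip(wanted, triple)):
            yield triple


# "What is this author the same as?" asked as a pattern query.
same_as = [o for _, _, o in match(TRIPLES, p="owl:sameAs")]
print(same_as)
```

Pattern matching with wildcards is the same idea that SPARQL queries build on, only with far more expressive power.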
Linked Data In Action
Google Knowledge Graph
Linked Data Review

Subject          Predicate   Object
Charles Darwin   Born On     “Feb 12 1809”
Charles Darwin   Born In     Shrewsbury
Charles Darwin   Type        Person
Shrewsbury       Type        City
Shrewsbury       Is In       England
England          Type        Country
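
The Darwin graph on this slide shows why linking matters: an answer can be reached by following edges from one node to the next. The sketch below is an illustration only; it stores the slide's six statements as tuples, and the helper names `objects` and `birth_country` are hypothetical.

```python
GRAPH = [
    ("Charles Darwin", "Born On", "Feb 12 1809"),
    ("Charles Darwin", "Born In", "Shrewsbury"),
    ("Charles Darwin", "Type", "Person"),
    ("Shrewsbury", "Type", "City"),
    ("Shrewsbury", "Is In", "England"),
    ("England", "Type", "Country"),
]


def objects(subject, predicate):
    """Return every object linked from subject by predicate."""
    return [o for s, p, o in GRAPH if s == subject and p == predicate]


def birth_country(person):
    """Follow Born In, then Is In, until a node typed Country is found."""
    for place in objects(person, "Born In"):
        seen = set()
        while place not in seen:  # guard against cycles in the data
            seen.add(place)
            if "Country" in objects(place, "Type"):
                return place
            parents = objects(place, "Is In")
            if not parents:
                break
            place = parents[0]
    return None


print(birth_country("Charles Darwin"))
```

No single statement says which country Darwin was born in; the answer comes from combining two of them.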
Our Website
Organically grown since 1995

 •   83,000 HTML pages
 •   3,700 ColdFusion pages
 •   253,000 JPEG files
 •   27,000 PNG files
 •   46,000 PDFs

 No CMS for legacy information

 Now using Drupal for “Brochure-ware”
Content Analysis
• 400+ Online “books”
• Exhibitions
• Research Tools
• Image Collections (16,000+ images)
• “Brochure” content (About us, Locations, Hours)
• Bibliographies, Fact Sheets, Subject Guides
• Databases, inventories, and database-like books

Collections not on our website:
• ~15,000 digitized volumes, with many more planned
• Other analog collections that will be digitized

          Bureau of American Ethnology Bulletin 164; Sewing Machine Trade Literature; Underwater Web Exhibition, Smithsonian Libraries
Linked Data in our Library
Books (and book-like objects)
  • Expose bibliographic data for reuse
  • Consume links to other internal
    content and external authoritative
    data
Databases
  • Expose data previously unavailable
  • Provide authoritative data
  • Consume our data and others’ to
    create new aggregate websites
Linked Data in our Books
                     http://library.si.edu/tl2/author/darwin
                     RDF Type = foaf:Person

                        foaf:lastName, foaf:familyName

                        foaf:firstName, foaf:givenName

                        foaf:name, skos:prefLabel

                        tl2:birthYear

                        tl2:deathYear

                        tl2:description

                        tl2:personAbbrev

                     http://library.si.edu/tl2/book/1313
                     RDF Type = bibo:Book

                        tl2:bookNumber

                        dc:title

                        event:place

                        dc:publisher

                        tl2:bookAbbreviation

                        dc:created
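
To make the property list above concrete, the following sketch writes the author record out as Turtle, one common RDF text format. It is a minimal illustration, not the site's actual output: the values come from the TL-2 sample results earlier in the deck, the FOAF, SKOS, and OWL namespace URIs are the published ones, and the `tl2` namespace URI is a made-up placeholder because the slides do not give one.

```python
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "owl": "http://www.w3.org/2002/07/owl#",
    # Hypothetical namespace: the slides do not state the tl2 URI.
    "tl2": "http://library.si.edu/tl2/vocab#",
}

AUTHOR = "http://library.si.edu/tl2/author/darwin"
PROPERTIES = [
    ("foaf:familyName", "Darwin"),
    ("foaf:givenName", "Charles"),
    ("foaf:name", "Darwin, Charles Robert"),
    ("skos:prefLabel", "Darwin, Charles Robert"),
    ("tl2:birthYear", "1809"),
    ("tl2:deathYear", "1882"),
    ("tl2:description", "British evolutionary biologist"),
    ("tl2:personAbbrev", "Darwin"),
]


def escape(value):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def to_turtle(subject, rdf_type, properties):
    """Serialize one subject with literal-valued properties as Turtle."""
    lines = [f"@prefix {name}: <{uri}> ." for name, uri in PREFIXES.items()]
    lines.append("")
    lines.append(f"<{subject}> a {rdf_type} ;")
    body = " ;\n".join(
        f'    {pred} "{escape(val)}"' for pred, val in properties
    )
    lines.append(body + " .")
    return "\n".join(lines)


turtle = to_turtle(AUTHOR, "foaf:Person", PROPERTIES)
print(turtle)
```

The book record could be produced the same way with `bibo:Book` as the type, given the BIBO, Dublin Core, and Event prefixes.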
Linked Data Tools (Drupal)
• Fields, Views, Views UI
• Node Reference
• SPARQL Endpoint, SPARQL API
• RESTful Web Services
• SPARQL Views
• RDF External Vocabulary Importer


Caveat: Some modules not ready for Drupal 7
   • e.g., Biblio module (no CCK, RDF capabilities)
Disclaimer
    We are still learning!

  How to effectively use Drupal

 What goes into a Digital Library

      How to best leverage
       Linked Open Data

(Also: We will always be learning.)

                       J. L. Hammett Illustrated Catalogue of School Merchandise 1872-1873…, 1872-1874
What is a Digital Library?
• More than a virtual stack of books
• Digital allows more capabilities, access
• Interlinked Content (See more from this item)

What content will be in our digital library?

• Digitized Books
• Image Library
• Collections (of things)
• Exhibitions
• Databases
• Lists / Bibliographies
• Smithsonian Publications
• Videos
• “Trade Literature” and other non-cataloged items
Knowledge/Data Sharing

Taxonomic Literature II
• Essential botanical reference
• 15 volumes
• 9,000 Botanists
• 37,000 Titles authored by these botanists
• More modern, simpler to handle

Index Animalium
• 35 Volumes
• 430,000 Scientific Names
• Each with a citation to first description
• 7000+ items in the bibliography, many linked to WorldCat
• Older, challenging in nature
Our Process for TL-2
Scanned the pages

Hired contractor for OCR and correction
 (99.97% accuracy)

Received XML dataset from contractor

Verified and imported into SQL Server
Built a website to search the data
TL-2 Today
Before we import…


    What exactly does 99.97% accuracy mean?



       ~12,000 Errors
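
The arithmetic behind that number is worth spelling out. The sketch below assumes accuracy is counted per character, which the slides do not state; under that assumption, 99.97% accuracy means 3 errors in every 10,000 characters, so ~12,000 errors implies a text of roughly 40 million characters. Integer arithmetic is used to avoid floating-point rounding.

```python
# 99.97% accuracy = 3 errors per 10,000 units (assumed to be characters).
ERRORS_PER_10K = 3


def expected_errors(size):
    """Errors to expect in a text of the given size."""
    return size * ERRORS_PER_10K // 10_000


def implied_size(errors):
    """Text size that would produce the given number of errors."""
    return errors * 10_000 // ERRORS_PER_10K


size = implied_size(12_000)
print(f"{size:,} characters")
print(f"{expected_errors(size):,} errors")
```

A very high accuracy percentage can still leave thousands of errors to find once the corpus is large enough.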
Importing
Millions of records are no problem for
modern databases. But how do we get the data
into Drupal?

• Use existing tools?

• Create my own import?




                              The Muralo Company Muralo: Sanitary Wall Coatings in the Home, 1912
Importing
Import via existing tools

• Used Drupal’s Feeds Importer
• Typically used for importing RSS or similar
• Fast to set up (< 5 minutes)
• Slow to import (47,000 records = 8+ hours)
• Poor error recovery (imported 5 times)
• What if the data changes in the future?

                 Faster ≠ Better
Importing
Write my own import. But how?

• Make a Drupal Module!
• Steep Learning Curve (many APIs)
• Faster to import (48,000 records = 85 minutes)
• Added bonus: Modules can be versioned
• Can use the “version update” code to update our data
• Versioned modules are good for Dev / Prod servers
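
The two import timings quoted in these slides (47,000 records in 8+ hours with Feeds, 48,000 records in 85 minutes with the custom module) can be compared as throughput. The sketch below is a back-of-the-envelope calculation only; because "8+ hours" is a lower bound on the Feeds run, the Feeds rate is an upper bound and the speedup is a lower bound.

```python
def records_per_second(records, minutes):
    """Average import throughput over the whole run."""
    return records / (minutes * 60)


# Feeds importer: at least 8 hours for 47,000 records.
feeds_rate = records_per_second(47_000, 8 * 60)
# Custom module: 85 minutes for 48,000 records.
module_rate = records_per_second(48_000, 85)
speedup = module_rate / feeds_rate

print(f"Feeds:  {feeds_rate:.2f} records/s (at most)")
print(f"Module: {module_rate:.2f} records/s")
print(f"Speedup: at least {speedup:.1f}x")
```

On these figures the custom module imports more than five times as many records per second as the Feeds run.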
Importing
Digitized Books Online

• Similar module for importing
• Module also handles a page for reading books online
• Uses Internet Archive book reader in an <IFRAME>
• Links to WorldCat / VIAF
• FAST Subjects
• Table of Contents Navigation
• Eligible for Linked Open Data


http://archive.org/details/smithsonian
Data Schema: British Library




http://talis-systems.com/wp-content/uploads/2011/07/British-Library-Data-Model-v1.01.pdf
Data Schema
What data model are we going to use?
• British Library
• Schema.org
• Something else?


What vocabularies are we using?
• Dublin Core
• OWL
• SKOS
• BIBO
• BIO
• FOAF
• Event?
• Org?
• Geo?
• Our own vocabulary for TL-2
Other Content
Galaxy of Images
• Image collection of plates from our digitized books
• 18,000 images and growing
• Richer set of metadata
• Data needs to be massaged / imported
• Images served from another system



http://www.sil.si.edu/imagegalaxy/
Other Content
Videos
• All are currently on YouTube
• Will remain there for now
• Metadata to be imported to Digital Library
• Will eventually be served from our network


http://www.youtube.com/smithsonianlibraries
Other Content
• Collections and Exhibitions
• Bibliographies, lists, subject guides
• Trade Literature
   • Sewing machines!
   • Scientific equipment!
   • Seed Catalogs!
• Smithsonian Publications (DSpace)
• Smithsonian Libraries Blog
• Art and Artist Vertical Files



                                          W. Atlee Burpee & Co. Burpee's New Annual for 1910, 1910
Future Work
• More planning!
• Developing a LOD Vocabulary for TL-2
• Continued parsing of content in TL-2
• Continuing the development of the Index Animalium content
• Publishing the Index Animalium on the web as LOD

• How to leverage linked data to create… what?

                               Leopoldo Galluzzo Altre scoverte fatte nella luna dal Sigr. Herschel , 1836
Thank you!

Joel Richard
richardjm@si.edu
@cajunjoel
http://slideshare.net/joelrichard
http://library.si.edu/staff/richardjm

Lita national forum 2012

  • 1. Building the New Open Linked Library (Revisited) Joel Richard LITA National Forum 2012 October 5, 2012
  • 2. Smithsonian Libraries • Founded in 1846 • 1.5 m volumes in collection, plus assorted archival collections • 15,000 volumes scanned and online • 20 libraries serving ~500 researchers/curators + hundreds of fellows and interns • 105 library staff • 1.5 web staff • Founding member of the Biodiversity Heritage Library Le Garde-meuble, ancien et moderne [Furniture repository, ancient and modern], 1839-1935
  • 3. (From 2011) Drupal and Linked Data • Native support for RDFa in Drupal 7. • RDF Extensions (rdfx) – even more features. • Vocabularies can be imported and cached for reuse. • Few or no modifications to HTML to support RDFa. What’s the difference between RDF, RDF/XML and RDFa? LITA National Forum, September 30, 2011
  • 4. (From 2011) RDF/XML Sample URI: http://library.si.edu/book/origin-of-species.rdf <?xml version="1.0" encoding="UTF-8"?> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/terms/" xmlns:bibo="http://purl.org/ontology/bibo/"> <rdf:Description rdf:about="http://localhost:8087/content/ origin-species"> <rdf:type rdf:resource="http://purl.org/ontology/bibo/Book"/> <dc:title>The Origin of Species</dc:title> <dc:created>November 24, 1859</dc:created> <bibo:numPages>1000</bibo:numPages> <dc:language>english</dc:language> <bibo:authorList rdf:resource="http://localhost:8087/content/darwin-charles"/> <owl:sameAs rdf:resource=“http://www.worldcat.org/oclc/1184647”> </rdf:Description> </rdf:RDF> LITA National Forum, September 30, 2011
  • 5. TL-2 Page Sample (From 2011) http://library.si.edu/tl2/author/darwin tl2:creatorOf http://library.si.edu/tl2/book/1313 owl:sameAs http://viaf.org/viaf/27063124 http://library.si.edu/tl2/book/1313 dc:creator http://library.si.edu/tl2/author/darwin owl:sameAs http://www.archive.org/details/ originofspecies00darwuoft LITA National Forum, September 30, 2011
  • 6. TL-2 Page Sample Results (From 2011) http://library.si.edu/tl2/author/darwin http://library.si.edu/tl2/book/1313 tl2:creatorOf dc:creator “http://library.si.edu/tl2/book/1313” “http://library.si.edu/tl2/author/darwin” owl:sameAs owl:sameAs “http://viaf.org/viaf/27063124” ”http://www.archive.org/details/ originofspecies00darwuoft” foaf:lastName “Darwin” tl2:bookNumber “1313” foaf:familyName “Darwin” bibo:shortTitle “On the origin of species” foaf:firstName “Charles” dc:title “On the origin of species by means foaf:givenName “Charles” of natural selection, or the preservation of favoured races in the struggle for foaf:name “Darwin, Charles Robert” life.” skos:prefLabel “Darwin, Charles Robert” event:place “London” tl2:birthYear “1809” dc:publisher “John Murray” tl2:deathYear “1882” dc:created “1859” tl2:description “British evolutionary biologist” tl2:bookAbbreviation “Origin sp.” tl2:personAbbrev “Darwin” LITA National Forum, September 30, 2011
  • 7. (From 2011) LITA National Forum, September 30, 2011
  • 8. (From 2011) Who is reusing our data? Ryan Schenk – http://ryanschenk.com/2011/02/visualizing-taxonomic-synoymns/ LITA National Forum, September 30, 2011
  • 9. (From 2011) Who is reusing our data? Encyclopedia of Life – http://eol.org/ LITA National Forum, September 30, 2011
  • 10. Linked Data Review • Publishing structured data on the web • RDF (Resource Description Framework) • Enables queries computer 2 computer • Uses standard ontologies (vocabularies) • Data in is presented as “triples” URI http://library.si.edu/tl2/author/charles-darwin Predicate owl:sameAs Object http://viaf.org/viaf/27063124
  • 11. Linked Data In Action Google Knowledge Graph
  • 12. Linked Data Review “Feb 12 1809” Born On Type City Born In Charles Darwin Shrewsbury Is In England Type Person Type Country
  • 13. Our Website Organically grown since 1995 • 83,000 HTML pages • 3,700 ColdFusion pages • 253,000 JPEG files • 27,000 PNG files • 46,000 PDFs No CMS for legacy information Now using Drupal for “Brochure-ware”
  • 14. Content Analysis • 400+ Online “books” • Exhibitions • Research Tools • Image Collections (16,000+ images) • “Brochure” content (About us, Locations, Hours) • Bibliographies, Fact Sheets, Subject Guides • Databases, inventories, and database-like books  Collections not on our website: • ~15,000 digitized volumes, with many more planned • Other analog collections that will be digitized Bureau of American Ethnology Bulletin 164; Sewing Machine Trade Literature; Underwater Web Exhibition, Smithsonian Libraries
  • 15. Linked Data in our Library Books (and book-like objects) • Expose bibliographic data for reuse • Consume links to other internal content and external authoritative data Databases • Expose data previously unavailable • Provide authoritative data • Consume our data and others’ to create new aggregate websites
  • 16. Linked Data in our Books http://library.si.edu/tl2/author/darwin RDF Type = foaf:Person foaf:lastName, foaf:familyName foaf:firstName, foaf:givenName foaf:name, skos:prefLabel tl2:birthYear tl2:deathYear tl2:description tl2:personAbbrev http://library.si.edu/tl2/book/1313 RDF Type = bibo:Book tl2:bookNumber dc:title event:place dc:publisher tl2:bookAbbreviation dc:created
  • 17. Linked Data Tools (Drupal) • Fields, Views, Views UI • Node Reference • SPARQL Endpoint , SPARQL API • RESTful Web Services • SPARQL Views • RDF External Vocabulary Importer Caveat: Some modules not ready for Drupal 7 • i.e., Biblio module (no CCK, RDF capabilities)
  • 18. Disclaimer We are still learning! How to effectively use Drupal What goes into a Digital Library How to best leverage Linked Open Data (Also: We will always be learning.) J. L. Hammett Illustrated Catalogue of School Merchandise 1872-1873…, 1872-1874
  • 19. What is a Digital Library? • More than a virtual stack of books • Digital allows more capabilities, access • Interlinked Content (See more from this item) What content will be in our digital library? • Digitized Books • Lists / Bibliographies • Image Library • Smithsonian Publications • Collections (of things) • Videos • Exhibitions • “Trade Literature” and other non-cataloged items • Databases
  • 20. Knowledge/Data Sharing Taxonomic Literature II • Essential botanical reference • 15 volumes • 9,000 Botanists • 37,000 Titles authored by these botanists • More modern, simpler to handle Index Animalium • 35 Volumes • 430,000 Scientific Names • Each with a citation to first description • 7000+ items in the bibliography, many linked to WorldCat • Older, challenging in nature
  • 21. Our Process for TL-2 Scanned the pages Hired contractor for OCR and correction (99.97% accuracy) Received XML dataset from Contractor Verified and Imported to SQL Server Built a website to search the data
  • 23. Before we import… What exactly does 99.97% accuracy mean? ~12,000 Errors
  • 24. Importing Millions of records are no problem for modern databases. But, how to get data into Drupal? • Use existing tools? • Create my own import? The Muralo Company Muralo: Sanitary Wall Coatings in the Home, 1912
  • 25. Importing Import via existing tools • Used Drupal’s Feeds Importer • Typically used for importing RSS or similar • Fast to set up (< 5 minutes) • Slow to import (47,000 records = 8+ hours) • Poor error recovery (imported 5 times) • What if the data changes in the future? Faster ≠ Better
  • 26. Importing Write my own import. But how? • Make a Drupal Module! • Steep Learning Curve (many APIs) • Faster to import (48,000 records = 85 minutes) • Added bonus: Modules can be versioned • Can use the “version update” code to update our data • Versioned modules good for Dev / Prod servers
  • 27. Importing Digitized Books Online • Similar module for importing • Module also handles a page for reading books online • Uses Internet Archive book reader in an <IFRAME> • Links to WorldCat / VIAF • FAST Subjects • Table of Contents Navigation • Eligible for Linked Open Data http://archive.org/details/smithsonian
  • 28. Data Schema: British Library http://talis-systems.com/wp-content/uploads/2011/07/British-Library-Data-Model-v1.01.pdf
  • 29. Data Schema What data model are we going to use? • British Library • Schema.org • Something else? What vocabularies are we using? • Dublin Core • FOAF • OWL • Event? • SKOS • Org? • BIBO • Geo? • BIO • Our own vocabulary for TL-2
  • 30. Other Content Galaxy of Images • Image collection of plates from our digitized books • 18,000 images and growing • Richer set of metadata • Data needs to be massaged / imported • Images served from another system http://www.sil.si.edu/imagegalaxy/
  • 31. Other Content Videos • All are currently on YouTube • Will remain there for now • Metadata to be imported to Digital Library • Will eventually be served from our network http://www.youtube.com/smithsonianlibraries
  • 32. Other Content • Collections and Exhibitions • Bibliographies, lists, subject guides • Trade Literature • Sewing machines! • Scientific equipment! • Seed Catalogs! • Smithsonian Publications (DSpace) • Smithsonian Libraries Blog • Art and Artist Vertical Files W. Atlee Burpee & Co. Burpee's New Annual for 1910, 1910
  • 33. Future Work • More planning! • Developing a LOD Vocabulary for TL-2 • Continued parsing of content in TL-2 • Continuing the development of the Index Animalium content • Publishing the Index Animalium on the web as LOD • How to leverage linked data to create… what? Leopoldo Galluzzo Altre scoverte fatte nella luna dal Sigr. Herschel, 1836
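
The triple on slide 10 can be written out as a small sketch. This is only an illustration in plain Python, not the Drupal RDF module's output: the `Triple` type and `to_ntriples` helper are hypothetical names introduced here, and the sketch assumes all three terms are IRIs. The subject and object URLs come from the slide; `owl:sameAs` is expanded with the standard OWL namespace.

```python
from typing import NamedTuple

# Standard OWL namespace; "owl:sameAs" on the slide is shorthand for OWL + "sameAs".
OWL = "http://www.w3.org/2002/07/owl#"


class Triple(NamedTuple):
    """Hypothetical container for one subject/predicate/object statement."""
    subject: str
    predicate: str
    obj: str


def to_ntriples(t: Triple) -> str:
    """Serialize a triple whose three terms are all IRIs as one N-Triples line."""
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."


# The statement from slide 10: our Darwin record is the same entity as the VIAF record.
darwin_same_as = Triple(
    subject="http://library.si.edu/tl2/author/charles-darwin",
    predicate=OWL + "sameAs",
    obj="http://viaf.org/viaf/27063124",
)

print(to_ntriples(darwin_same_as))
```

Literal values such as names or birth years would need quoting and escaping that this sketch leaves out.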

Editor's Notes

  1. (2-3 min) Open with an introduction of who SIL is and what we do. (Old Slides 1 and 2) Questions: How many know SI has libraries? How many have visited the libraries? How many want to visit?
  2. To recap from last year, we covered a solid introduction on linked data and how Drupal 7 supports it out of the box via the built-in RDF and RDFx modules.
  3. We talked about what RDFa might look like in a webpage or RDF/XML stream that we are creating.
  4. We discussed TL-2 (Taxonomic Literature II), a reference tool for botanists, and how we are converting it to Linked Open Data.
  5. And finally for TL-2 we offered some idea of the kind of data that we might be producing in RDF. This is yet another representation of the linked data, this time in N-Triples format.
  6. Finally, we talked about how open data (not linked open data) is benefiting the Biodiversity Heritage Library. If you spend any amount of time around me, you’ll find that I will eventually come around to talking about this.
  7. And some examples of how people have used open data. This person mapped the usage of certain animal names over time and how they fall in or out of favor as time progresses. Those bars are time periods of 200 years of natural history literature.
  8. SLIDE: Overview of Linked Data (concept, statistics)
  9. This is linked data in action. Google knowledge graph. Google acquired Metaweb in 2010 and in that process, they got Freebase, which eventually was used to create this new pane of information on Google.
  10. SLIDE: Details of Linked Data (diagram of triple)
  11. (5 min) Review our discussion from last year. Sharing knowledge is our prime directive. Linked Data is a no-brainer. Not going to review what linked data is (unless we need to?) SLIDE: Overview of our website (statistics, content, etc.) (LITA 2011 page 6) Questions: How many know what linked data is? Do we need to review?
  12. SLIDE: Content that could be linked data (LITA 2011 page 9) Quick review of what things have good metadata for linking. We said we would have something up in about one year. (Ha!) Last year I reviewed some of the details of how we are converting to linked data.
  13. (Show this again, but only briefly) (Old Slides 20, 21, 22) SLIDE: Details of Darwin’s linked data fields (LITA 2011 page 22) TAKEAWAY: Know your data (or whatever it is you’re sharing). Become intimately familiar with it. Take it on a date.
  14. List some of the modules we are using (Old Slide 15) SLIDE: List of Drupal Modules (LITA 2011 page 15) Questions: How many of you are using linked data? What data do you have that could be useful if linked? Know that if you raise your hand, I’m going to pick on you throughout the rest of the talk. :) Disclaimer: We are still learning as we go! Even we, the Smithsonian, are figuring things out. We are also constrained by budgets, personnel and other requirements, possibly more so as a government entity.
  15. SLIDE: We are still learning. First we had to decide what a Digital Library was. Our instinct is to go online and see what other people are doing. This is fine and all, but I think it’s safe to say that we know what data we have, and we know what we are doing as we move from an old website to a new one. What of it belongs in the digital library? Well... here’s what we have. It’s safe to say that we know our data, though we may go to others to see how to present that data. We’ll also use focus groups and usability studies to analyze the site once we have a beta. TAKEAWAY: You’ll always be learning. :) If you stop, you become irrelevant.
  16. SLIDE: What is a digital library? Books? Images? Exhibitions? Databases? Research papers? All of these things? Question: How many of you have a “digital library”? Want one? Question: Is anyone out there working with data that doesn’t fall into these? I’m curious as to what else might be out there.
  17. As far as vast amounts of linked data go, currently there are two that stand out as really good, useful datasets. SLIDE: Two data sets: TL2 and Index Animalium, numbers of records, types of data. For us, the things that make sense to publish as linked data are TL2 (47k records) and Index Animalium (500k records). TL2 is almost there. IA has a long, long way to go. We’ll come back to that. The first phase of our process was to get us on Drupal. This actually took longer than we’d hoped due to the planning required by our nature as a government institution. We have a certain level of planning and security analysis that must be done. That said, we have a simple brochure-ware website that is online at library.si.edu. CHM licensed their base metadata for their collections as CC0. Talk to the lawyers first. TAKEAWAY: Creative Commons (or CC0) licensing of metadata is becoming popular. We have a CC-BY license for TL2. Index Animalium is public domain. I think. We are libraries and we have a lot to share with the internet. Let’s make it happen so that others don’t.
  18. SLIDE: Content that could be linked data (LITA 2011 page 9) Quick review of what things have good metadata for linking. We said we would have something up in about one year. (Ha!) Last year I reviewed some of the details of how we are converting to linked data.
  19. SLIDE: TL2 website as it is today. How do we get it into Drupal? We use a module! Drupal is capable of handling millions of records, but getting those records into Drupal is not the easiest thing in the world. How do we import 430,000 species names for Index Animalium? Question: How many others are using a CMS? Drupal? (what is the name of that MS Technology to compete with Drupal?) PHP? ASP? Java? Others? Question: Is anyone developing in Drupal? Modules? Themes?
  20. Now that we are on Drupal, we can move forward with some data! Yeah! Bring on the import! Disclaimer: The actual steps are specific to Drupal, but you may find yourself in a similar situation of trial and error. Last time I reviewed how we were going to take this taxonomic literature thing to linked data. We have something almost online, but let’s review where we are... We first imported via Feeds Importer (Question: anyone familiar?). Then we had to import again. Oops, the data was wrong again, so we had to import AGAIN. Three weeks later, I gave up. It was too slow and too painful. SLIDE: Feeds importer: 7 hours. 47,000 records in 7 hours? 1.8 rec/sec - Dismal!
  21. So I wrote a module! Yay! Module development! This makes sense. But there was one major challenge: I didn’t know how to build modules in Drupal. So I learned. And then I realized that I could import the data as part of the installation of the module. Import times dropped to 81 minutes (an improvement, as I could control what the database was doing and minimize database traffic). SLIDE: Drupal Module development is hard! Steep learning curve. List APIs that I had to become familiar with: Field. Node. Theme. Styling. Preprocess Functions. Render Elements. And THEN I learned that we could use the versioning of modules to update the data down the road, either to create new database fields, munge the data, etc. This is a nice feature since we couldn’t do that before. (12-15 hour downtime for our TL-2 site would have been a bad thing indeed.) TAKEAWAY: Consider your options; the easy way is not always faster/better.
  22. So, we decided to use another module! Home grown! Versioned Data! We needed something to manage the delivery of the books using the IFRAME version of the Internet Archive book-reader. But uploading the data is even better. This time we were able to import in about 5 minutes. This handles the books, authors, vocabularies, subjects (FAST?), places as subject, timeframe as subject. It also handles the links between them. Much of this data came out of the MARCXML record, but sometimes we used MODS (where it was easier). Synchronization issues regarding the book metadata between IA, SIRIS, Picklist and Drupal. FUN! What do you do when your data lives in multiple places? One master many slaves? Multi-master? Mixed bag of drunken cats? SLIDE: And books have linked data, too! We’re not sure how we are going to link it, but at least we’ll have OCLC number, author name to VIAF, etc.
  23. Before we began really building our site, we needed to firm up our data model and make sure we had a good idea of how everything is going to relate to each other. This is an example of what the British Library created. I think they were very thorough and included a lot of detail. It is probably overkill for what we want to do, but who knows, we may end up in the same place, but maybe not in such an explicit manner.
  24. How do we structure our data? How do we organize it? What vocabularies will we be using? QUESTION: For those who are familiar with LOD, are you using any vocabularies other than these? Anyone making their own?
  25. Talk about Galaxy of Images, other elements in the digital library. Plates and other pretty pictures. Show the website for GOI. Search page, etc. Highlight the balloon that was StumbleUpon-ed and boosted our traffic 100-fold. Show a picture of the GA chart of the traffic. The data needs some cleanup. Standardization of the subjects metadata. Images need to be moved into DAMS (Artesia digital asset management system). This is being done in coordination with the manager of the GOI and our metadata team, who is… As an aside, one of the things we do to get new pretty images is to capture the plates from our metadata collection thingy for the BHL. We divert a stream of data of the "pretty pictures" from there into the Galaxy of Images through a mostly automated process. This will automatically upload. (For ongoing projects, stress automation where possible. Take humans out of the equation. As smart as we are, we make mistakes. Code doesn’t, unless we make mistakes in our code, and it frees us to do other things.)
  26. Talk about Videos. Over 8 or 10 years of them; we needed to round them all up and get them organized. Lectures, animations, videos, interviews, demos, informational things. 30-40 of them? All are (or should be) on YouTube at this point in time. Ultimately we will serve them from our DAMS. Centered around our content, exhibitions, etc.
  27. Collections / Exhibitions: Arbitrary collections of things. Exhibitions, too. Collections: arbitrary grouping of things under a heading (category) with maybe some introductory text. Exhibition: Same thing, but more sequential, telling a story or narrative. Order becomes more important than in collections. Possibly more words. Bibliographies, lists of things, subject guides: Legacy content. Not sure if we need to keep it alive. Is it something that people continue to use? We’ll check out our analytics. HOWEVER, as they are tied to the library itself, we’ve already had to migrate them to the new site. Perhaps a bit of wasted effort, but at least it’s easier to manage now. Trade Literature: Describe them – Scientific Instruments, sewing machines! How are they catalogued? (They are not.) Catalogued by Manufacturer, well, inventoried. Nothing is scanned; we would like to scan them, but it poses some of its own challenges in how we organize the content. Each catalog can’t be a record in our, um, catalog, can it? SI Publications: Collecting the output of the researchers at the Smithsonian to gauge their … effectiveness, reach, influence (Klout?). Currently in DSpace, will likely stay there, but we want to index and search it via the website, see: Summon Discovery Layer. Blog: The blog is part of the website, too, but as it lives off in its own world, we don’t really need to concern ourselves with it because it’s not really part of the digital library per se. TAKEAWAY: Each set of content that you have may be different from the others. Creating a digital library is not going to be an easy task.
  28. To do in the future: Made our own vocabulary for TL2: turns out we only needed two or three terms. The Biography vocabulary had much of what we needed already. Plan the migration of our exhibitions, which will lay the foundation for other online collections. Migrate our images into our DAMS, refining the metadata in the process, which will preclude us from having to store all these images on our web server. Figure out a method of handling collections and the arbitrary ordering of things. Is there a module? Should we make one? Should we reuse things that already exist? (Yes!) List some of the other tools that people might use for LOD. Take from my talk at SLA. Discuss Summon and the giant black box that it is. It’s on the way; it will be the discovery layer for our entire site. All our data needs to get into it, including our catalog, our licensed content, all website content, blog content. API development, integration with Drupal is a big mystery. Do I see another module in my future? :) If so, it will be similar to that of the Google Search Appliance module. How to leverage LOD for more stuff. Artists Files, Trade Lit, etc. linking to our catalog, history books, etc. TAKEAWAY: A website is a living, breathing, growing beast. It needs care and feeding and love and attention to keep it going.
  29. Open the floor for questions
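
The import numbers quoted in notes 20 and 21, and the "99.97% accuracy = ~12,000 errors" figure from the slides, can be checked with a little arithmetic. The sketch below is a back-of-envelope calculation, not anything taken from the actual import module. The function names are hypothetical. It assumes the 81-minute run covered the same 47,000 records as the Feeds run, and that the accuracy figure is measured per character, which the talk does not state.

```python
def records_per_second(records: int, minutes: float) -> float:
    """Average import throughput for a run of the given length."""
    return records / (minutes * 60)


def implied_units(errors: int, accuracy: float) -> float:
    """Total number of units (assumed here to be characters) implied by an error count and an accuracy rate."""
    return errors / (1 - accuracy)


# Figures from the notes: 47,000 records in 7 hours via Feeds, then 81 minutes via the custom module.
feeds_rate = records_per_second(47_000, 7 * 60)
module_rate = records_per_second(47_000, 81)
speedup = module_rate / feeds_rate

# Figures from the slides: 99.97% accuracy, ~12,000 errors.
total_units = implied_units(12_000, 0.9997)

print(f"Feeds importer: {feeds_rate:.2f} records/sec")
print(f"Custom module:  {module_rate:.2f} records/sec ({speedup:.1f}x faster)")
print(f"Implied size of the OCR text: about {total_units:,.0f} units")
```

The slides give slightly different figures (8+ hours for Feeds; 48,000 records in 85 minutes for the module), so the exact rates shift depending on which set is used, but the custom module remains several times faster either way.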