NISO Two Day Virtual Conference:
Using the Web as an E-Content Distribution Platform:
Challenges and Opportunities
Oct 21-22, 2014
John Mark Ockerbloom, Digital Library Architect and Planner, University of Pennsylvania
"In the Early Days of a Better Nation": Enhancing the power of metadata today with linked data principles
1. “In the Early Days of a Better Nation”
Enhancing the power of metadata today with linked data principles
John Mark Ockerbloom
University of Pennsylvania Libraries
NISO Virtual Conference – October 2014
2. Photo of Scottish Parliament building by Bernt Rostad, 2011
licensed CC-BY: https://www.flickr.com/photos/brostad/5841841020/
10/23/2014
University of Pennsylvania Libraries
3. Linking Open Data cloud diagram 2011, by Richard Cyganiak and Anja Jentzsch
http://lod-cloud.net/ Licensed CC-BY-SA
4. Linking Open Data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer,
Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/ Licensed CC-BY-SA
5. OCLC Library Survey: Summer 2014
• 76 linked data projects described (with 46 more mentioned) in an open solicitation
– 25 only consume linked data
• Consumption primarily to enhance their own data
– 4 only publish linked data
• Publication primarily to gain exposure, demonstrate
– 47 both consume and publish linked data
• More details (and implementation advice): http://oclc.org/research/news/2014/09-19.html
6. Sir Tim’s Linked Data Principles
• Use URIs as names for things
• Use HTTP(S) URIs so names can be looked up
• Upon lookup, provide useful information “using
the standards (RDF*, SPARQL)”
• Include links to other URIs, so more things can be
discovered
(Tim Berners-Lee, “Linked Data”, 2009: http://www.w3.org/DesignIssues/LinkedData.html)
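In practice, the second and third principles amount to ordinary HTTP content negotiation: dereference the identifier, and ask the server for a machine-readable representation. A minimal sketch in Python, with a purely illustrative URI (no request is actually sent here):

```python
from urllib.request import Request

# A hypothetical HTTP URI naming a thing (principles 1 and 2):
# the name doubles as a Web address that can be looked up.
uri = "http://example.org/id/subject/open-access"

# Ask for RDF rather than HTML via the Accept header (principle 3).
req = Request(uri, headers={"Accept": "application/rdf+xml"})

print(req.full_url)              # the identifier, ready to dereference
print(req.get_header("Accept"))  # the machine-readable format requested
```

A real client would pass this request to `urllib.request.urlopen` and parse the RDF it gets back; the point is only that lookup and format selection use the Web's existing machinery.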
7. A more functional (& fuzzy) take
• Use unambiguous identifiers for things
• …that can be looked up on the Web
• …that provide data a machine can interpret and
process usefully, without human intervention
• …that can be easily related to data from other sources
Also important:
Open, easy access
Commitment to support the data for its users
8. Linked open data: Not a 1-step process
(Catalog photo for W3C’s “5 star linked open data” mug,
available for purchase at http://www.cafepress.com/w3c_shop.597992118 )
9. How open can you get?
• Maximally open: CC0, Public Domain Dedication
– Relinquishes legal control; not necessarily other norms
– Examples: UI original bib records; DPLA; WikiData
• Attribution only: ODC-BY, CC-BY
– Examples: WorldCat bib data; UI OCLC records
– Lightweight attributions: collection-level; original URIs
• “Copyleft” or Share-Alike: CC-BY-SA
– More cumbersome, but used by Wikipedia & derivatives
(including DBPedia)
– Note that data might not be copyrightable
• Not really “open”: CC-NC, CC-ND
10. VIVO: RDF etc.; days to harvest
11. Elements: Tables & spreadsheets;
seconds or minutes to generate
(From Penn’s Symplectic Elements service)
14. Modeling library collections in
schema.org
From https://www.w3.org/community/schemabibex/wiki/Holdings_via_Offer
15. Publishing open data on the Web
# Cattype data file for Forward to Libraries Service
# Ed. by John Mark Ockerbloom, University of Pennsylvania
# (contact details at everybodyslibraries.com)
# This data file released to the world as CC0
# (though appropriate attribution appreciated)
# See docs file for documentation, libraries file for usage of this catalog data
# Last updated 2014-09-04
CATTYPE 360search
SUBURL ${BASEURL}/results?SS_LibHash=${LCODE}&action=start&dbIDList=${DBID}&field=subject&term=${ARG}
AUTURL ${BASEURL}/results?SS_LibHash=${LCODE}&action=start&dbIDList=${DBID}&field=author&term=${ARG}
TITURL ${BASEURL}/results?SS_LibHash=${LCODE}&action=start&dbIDList=${DBID}&field=title&term=${ARG}
KEYURL ${BASEURL}/results?SS_LibHash=${LCODE}&action=start&dbIDList=${DBID}&field=keyword&term=${ARG}
CATTYPE aleph1
TINDEX TTL
SUBURL ${BASEURL}/F/?func=find-b&find_code=SUB&local_base=${LCODE}&request=${ARG}
(…)
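Incidentally, the ${…} placeholders in these URL patterns match Python's string.Template syntax, so expanding one into a working search link takes only a few lines. The parameter values below are invented for illustration; real ones would come from the libraries data file and the user's query:

```python
from string import Template

# SUBURL pattern for the 360search CATTYPE, copied from the data file.
pattern = Template("${BASEURL}/results?SS_LibHash=${LCODE}&action=start"
                   "&dbIDList=${DBID}&field=subject&term=${ARG}")

# Hypothetical library-specific values.
url = pattern.substitute(BASEURL="http://search.example.org",
                         LCODE="lib123", DBID="all",
                         ARG="Open+access+publishing")
print(url)
```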
16. Publishing open data on the Web
# Libraries data file for Forward to Libraries Service
# Ed. by John Mark Ockerbloom, University of Pennsylvania
# (…)
ID OCLC-WCDGL
NAME Worldcat.org
COUNTRY 00
LOCATION Union catalog of more than 10,000 libraries worldwide
BASEURL http://www.worldcat.org/search
CATTYPE worldcat
ID AU-ANU
NAME Australian National University
LOCATION Canberra, ACT
COUNTRY AU
CATTYPE summon
BASEURL http://anu.summon.serialssolutions.com
(…)
17. Hubs: Common reference points for
communicating, understanding data
• Common vocabularies
– Schema.org; VIVO ontology; BIBFRAME
– RDF itself (and JSON…)
• Common identifiers
– ISSN; DOI; VIAF; OCLC’s bibliographic & library identifiers
– Traditional authority headings, Wikipedia (have to deal with ID change)
• Common corpuses of data
– Bibliographic records from HathiTrust, Harvard, Illinois, etc.
– OCLC’s WorldCat, WorldCat Registry; VIAF, etc.
– Authorities & thesauri (LC, Getty, etc.), DBPedia, Wikidata
• Facilitate communities that use & improve data
– Wikimedia’s great strength
18. Building on shared & linked data
19. In summary
• Lots of open data out there libraries can use
– Lots more they can contribute
• Making it as easy as possible for interested users to find & reuse your data makes a big difference
– Optimize the top priorities for your users
– But support broad audience (standards help here)
• Find, and work with, your hubs
• Commitment, openness, & community ties key
20. Questions!
• John Mark Ockerbloom
– ockerblo@pobox.upenn.edu
– @JMarkOckerbloom
– http://EverybodysLibraries.com/
All content original to this
presentation is licensed CC-BY
WORK AS IF YOU LIVE IN THE EARLY DAYS OF A BETTER NATION.
ALASDAIR GRAY
Editor’s Notes
Good afternoon, and thank you all for joining me here.
I’d like to talk today about linked data, and how the principles of linked data can really improve and amplify the information and services we provide in libraries.
I’ve often heard people say lately that linked data is the future of libraries. Actually, I’ve been hearing that for a few years now; and I’ve heard colleagues wonder whether it always will be the future of libraries – and never the present.
But I’d like to show how the principles of linked data are increasingly becoming the present of libraries. And I’m seeing that not so much in the adoption of linked data technical standards– although those are important and getting traction over time– but in the adoption of the principles behind linked data, and the increasing level of sharing and aggregation of the knowledge embodied in libraries that those principles make possible.
The title of my talk is taken from a quotation on one of the outer walls of the Scottish Parliament building. It’s a motto frequently used by the Scottish writer Alasdair Gray, “Work as if you live in the early days of a better nation”. The Scottish Parliament itself is a testament to the power of that idea. The very existence of the Scottish Parliament, founded in its present form in 1999, is a sign of the growing autonomy and identity of Scotland as a nation.
As an American, I don’t have a strong opinion about whether Scotland should ultimately stay in the United Kingdom or go its own way. But the motto makes it clear that Scots aren’t waiting for the resolution of that question to build the nation they want to live in. Even if the formal basis of their government isn’t now where folks like Alasdair Gray want it to be, they’re still implementing the principles they want to live by today.
I see the same sort of principled implementation happening in libraries. Not all of them, yet, but in many of them, and many others can get involved too if they want to; and I hope in the next 20 minutes or so to give you some idea of how.
So, if you’ve gone to a conference or a webinar where they’ve talked about linked data, you’ve probably seen this diagram before. It’s a diagram of a number of services that implement linked data technologies, and how they connect with each other. A fair number of the bubbles in this diagram have to do with libraries, or publications, or research, or government, or related information that researchers and their libraries might take an interest in. This diagram is now about 3 years old.
There’s a new version of the diagram that came out at the end of this summer, and it’s got nearly twice as many bubbles, and they connect quite a bit more densely in the middle. In one respect, it’s an impressive array of data and services. In another respect, it gives me some pause that you can still represent the world of linked data, down to the level of individual services, on a single slide. I couldn’t imagine doing the same for, say, the World Wide Web, even a few months after the first graphical web browsers started to appear.
Now, I admit I’m not being entirely fair here. The people who made this diagram make it clear that not all linked data instances are represented on it. They have a certain threshold for inclusion. And indeed, there are a lot of linked data projects in the world of libraries that don’t show up here.
We get a better idea of the extent of linked data use in libraries from a survey that OCLC Research did this past summer. They put out a call online asking about linked data projects in libraries, and they got back replies about more than 100 such projects. And people went into detail about 76 of them. The vast majority of these projects are consuming linked data from various sources, primarily to enhance the data they already have in their catalogs and elsewhere, so as to provide a better experience for their users.
The majority of those projects also publish linked data. Sometimes it’s just for demonstration: “Look Ma, Linked Data!” But it’s also often done to help gain exposure for things the library has to offer, by putting that data into a wider variety of places and contexts. With linked data, you don’t have to purposely go to a library’s own website to find out about resources it can offer you.
And that’s possible because the purpose of linked data is to make it easy to share data, reuse it, and have machines be able to understand it and apply it or repurpose it.
Tim Berners-Lee, the inventor of the Web, set out principles of Linked Data some years back that make this possible. From a technical standpoint, they’re pretty simple to summarize: use URIs as names for things; use HTTP URIs – that is, ordinary Web addresses – so you can look up information about those things. Make that information available in a form that’s comprehensible to machines, “using the standards” – and here he explicitly mentions the RDF data model and the SPARQL query interface. And link to other URIs so that more things can be discovered.
Now while this is simple enough to summarize on one slide, it’s also presenting linked data from a technical perspective. And in some ways– particularly in the bits mentioning RDF and SPARQL– I’d argue it gets more specific than it needs to. There are in fact a number of other standards that are gaining traction in the linked data world: JSON-LD and LDP are two such standards. I think it’s worth stepping back a little bit, getting a little more fuzzy with our specifications, and examining exactly what we’re trying to accomplish with linked data.
Here’s a more functional and fuzzy take on linked data. You want to have unambiguous identifiers for things that you can look up on the Web. A very straightforward way of doing this is to use HTTP URIs, as Tim Berners-Lee’s principles recommend. But as we’ll see, there are other ways as well.
We want the lookup to provide data that a machine can usefully interpret and process, without needing a human to intervene. That’s what distinguishes linked data applications from things like copy cataloging. In copy cataloging, you’re getting data from another source, but you still need a person to retrieve it, look it over, maybe tweak it a bit, and then stick it into a local catalog. But in order to integrate data at a large scale, we need to be able to automatically retrieve it and interpret and repurpose it without direct manual intervention. And we need to be able to do this from various sources, not just one, so that we can create useful new information from the combination and analysis of existing information. And I’ll show you some examples of that in a bit.
There are a couple of other important things too. One is that data can go a lot further when it’s open – when the legal and technical barriers to accessing and repurposing it are as low as possible. Another is that data becomes much more useful in practice when it’s being actively maintained and supported for a community of users, who know that there’s someone making a commitment to keeping the data available, usable, and up to date.
And the World Wide Web Consortium recognizes these things at some level. Their “5 star linked data” mug implicitly recognizes that there are incremental advantages of various ways of putting data on the Web, even if it’s not in the linked RDF paradigm that they see as the “5-star” standard. And they’ve also recognized the importance of open licenses accompanying technical provision.
Let’s talk a little bit about those open licenses. Generally speaking, I think it’s in the interest of libraries to make data about their resources as open as possible. The most straightforward way to do this is to explicitly declare that it’s in the public domain, through a dedication like CC-Zero. (If you’re unfamiliar with CC0, just Google CC0 and you’ll find out all about it.) It relinquishes any sort of legal claim you might make on the data. But it doesn’t mean you won’t get any credit for it– after all, scholars and those who want to be seen as trustworthy are expected to cite their sources, whether they’re protected as intellectual property or not.
When you do want to legally require attribution, you can use a license like ODC-BY or CC-BY. This can ensure that anyone wanting to use your data, if it’s protectable as intellectual property, needs to cite you as its source, Or Else. It also means that people reusing and remixing data downstream from you might have to include an ever-increasing set of notices on their data. But there are ways to mitigate that. OCLC, for instance, says that if it’s impractical to attach their usual attribution paragraph to data that’s derived from theirs, just using their original URIs, which implicitly credit OCLC as their origin, is good enough for them. You can also credit at the collection level rather than at the level of each individual bit of data. When the University of Illinois released their catalog records as open data, for instance, they licensed one portion as CC0, covering the records that they originated, and another portion as ODC-By, crediting OCLC for records derived from WorldCat.
You’ll sometimes also see Creative Commons ShareAlike licenses attached to data. Wikipedia is licensed CC-BY-ShareAlike, so data sets that derive from Wikipedia, such as DBPedia, are released under the same license. That’s a bit more of a cumbersome license to work with, but it is an open license, in that anyone can use the data for any purpose they like, as long as they keep attribution and use the same license on any derivative data they distribute.
Creative Commons also has Noncommercial and No-Derivatives licenses. These are not open licenses by most definitions, and I don’t recommend using them for linked data.
Now, there are lots of ways you can make data available for consumption, some involving linked data standards, and some not. And the standard isn’t always the way you want to go.
At Penn, we run a linked data service on researcher activity called VIVO. VIVO has a rich set of RDF vocabularies and ontologies, searching and browsing facilities, and a SPARQL interface. It’s definitely 5 stars on that linked data coffee mug, and it’s great for a lot of uses. But if you’re a stranger who wants to download our whole VIVO data set, you’ll find that hard to do. Like many VIVO instances, we don’t leave our SPARQL interface open to the world, because we can’t easily control what queries get run from that interface. Some of them can eat up a lot of resources to complete. Some other VIVO sites keep their SPARQL port closed because they include sensitive data, and there isn’t an effective way to firewall sensitive parts of VIVO data from non-sensitive parts in VIVO’s SPARQL interface. So the way that you currently get a full set of linked data from our VIVO instance is to issue a lot of RDF REST queries. And that can take a while. About a week, one consumer told me. It’d be much faster if we just provided a dump file of our VIVO RDF data.
The linked data in our VIVO instance ultimately comes from some internal databases we have that manage data in more conventional tables. And those systems can deliver data very quickly and conveniently. We’ve bought an inward-facing system called Symplectic Elements, which will eventually manage nearly all the data we provide to the world in our VIVO service. Authorized users can download Excel spreadsheets that give reports or dumps for just about all of the data in seconds, or minutes at most. The people who want those reports don’t know their way around SPARQL queries, but they’re experts at slicing and dicing spreadsheet data. If meeting the needs of users like this is your top priority, you might want to support a different data model and interface than the usual linked data standards.
But you don’t need a SPARQL interface, or to recast all your data as RDF, to take advantage of linked data. One increasingly common way libraries employ linked data is to make the items in their catalogs more readily discoverable. Schema.org, for instance, defines a set of markup tags and vocabularies that you can add to ordinary web pages to help machines understand what your web pages are about. The markup can be expressed in RDF, or in other common encoding standards.
Here’s a catalog record for an open access book in my catalog of online books. To a human, it looks like an ordinary web page. But there’s schema.org markup in the underlying HTML that tells a machine that knows those conventions – and that includes all the major search engine crawlers – that this isn’t just an ordinary web page with some text.
It’s specifically a page that describes a book. Not only that, but the part of the page that says “Willinsky, John, 1950-” identifies the author of the book. And the part that says “The Access Principle” is the title. And so on. So if a search engine that’s crawled this page later gets a query from someone looking for books by John Willinsky, they know that this is a result page that’s highly relevant to that query.
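One of the encodings schema.org markup can take is JSON-LD, embedded in a page inside a script element. As a rough sketch of what the book description above could look like in that encoding (the actual page may well use RDFa or microdata instead; this is just an illustration built with Python's standard json module):

```python
import json

# A minimal schema.org description of the book from the example page.
book = {
    "@context": "https://schema.org",
    "@type": "Book",
    "name": "The Access Principle",
    "author": {"@type": "Person", "name": "Willinsky, John, 1950-"},
}

# In a web page this JSON would be embedded as:
#   <script type="application/ld+json"> ... </script>
print(json.dumps(book, indent=2))
```

A crawler that understands schema.org can now tell that the page describes a Book, who its author is, and what its title is, without any human interpretation.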
A growing number of open source and commercial library catalogs and discovery systems will now automatically add schema.org tags to your catalog entries.
There’s been a lot of work lately in expanding the schema.org vocabularies so that they capture more of what libraries provide. One working group has been working on adding vocabulary for library holdings, for instance. So when someone asks about a book at a service that understands the vocabulary, it can respond “I know where you can find THIS BOOK at THIS LIBRARY!’’
“And by the way, that library closes in an hour, but if you go by this route you should be able to drive there in 40 minutes.”
Here’s another example of open data, that I’ve published on the Web. It’s a data set that specifies how to construct links to search various kinds of library catalogs and discovery systems. This file gives the patterns for URIs that launch various kinds of searches in 360Search, Aleph version 1, and dozens of other kinds of systems. I publish this file on Github, along with some open source code for interpreting the data in the file, and I give the data file a CC0 license. This isn’t “5-star linked data”, but it has many of the benefits that linked data provide. It’s got an open license, it can be automatically downloaded from a well-defined location, and it’s in a format that’s fully documented for machines to interpret. So that’s all good.
I have another file on Github that’s related to that file. This file has data on hundreds of individual library systems. For each one, it identifies the specific catalog or discovery user interface it uses, along with any library-specific information one might need to do searches there. This file uses the same open license and basic format as the other file I just showed you.
But the data in this file overlaps with another data source out there, and that’s the WorldCat Registry. OCLC’s WorldCat Registry has information on thousands of library systems – actually, a lot more libraries than are in this file. And it has some information on the ILS’s and catalogs that those libraries use, though it doesn’t go into as much detail as you need to provide a reliable variety of search links. But you can imagine how it could be quite useful to combine the data in this file with the data in the WorldCat Registry. And when you start getting into combining data sources, it becomes more and more useful to follow common conventions and standards.
I do that to some extent in this file, because I’m identifying libraries with OCLC and ISIL identifiers whenever possible. Since those identifiers are also used in the WorldCat Registry, it’s possible to match up data records in this file with data records in the WorldCat Registry. The WorldCat Registry also publishes its data using RDF. If I also provided an RDF version of this data, then someone could easily match up data from both sources with a standard RDF processor, instead of having to use an RDF processor to parse the WorldCat file and a custom processor to parse my file. So I should probably offer an RDF conversion of this file when I get the chance.
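Matching records across sources on a shared identifier is essentially a join keyed on the common ID. A toy sketch of the idea in Python (the field names and values here are invented for illustration, not the actual WorldCat Registry schema):

```python
# Records from my libraries file, keyed by a shared ISIL identifier.
my_data = {
    "AU-ANU": {"NAME": "Australian National University",
               "CATTYPE": "summon"},
}

# Records from a second source that uses the same identifiers
# (hypothetical fields, standing in for WorldCat Registry data).
registry_data = {
    "AU-ANU": {"opac_url": "http://example.org/anu-opac"},
}

# Join the two sources on the common identifier: each merged record
# combines the fields from both sides.
merged = {isil: {**my_data[isil], **registry_data.get(isil, {})}
          for isil in my_data}
print(merged["AU-ANU"])
```

The shared identifier does all the work here; without it, matching would require fuzzy comparison of names and locations.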
It’s in situations like this, where you’re matching and mixing data, that you really start seeing the value of points of commonality for tying data together.
I call these points of commonality “hubs”. They’re a crucial part of the linked data world, and, I think, of data sharing in general.
Hubs can exist on multiple levels. Some hubs provide a common vocabulary for data. Schema.org, which we saw earlier, provides one such common vocabulary set. BIBFRAME will hopefully provide another one. And standards themselves are hubs of a sort: RDF is one of the main hubs for linked data at the standards level, but JSON-LD is another.
Some hubs provide common identifiers that allow different groups to clearly describe the same things. ISSNs, DOIs, LC Control Numbers, and OCLC IDs are ways we can unambiguously refer to specific works, or people, or subjects. Traditional authority headings, like authorized names and subject headings, are also common identifiers. They’re less opaque than coded IDs, but they do have the problem that they sometimes change. That’s not an insurmountable problem, particularly if they’re maintained in a system that has good cross-references and change tracking, but we could do a whole talk just on that.
Some hubs provide comprehensive data sets that can be used in all sorts of places. That’s what makes resources like HathiTrust’s catalog of digitized books, the various OCLC databases, and the various Wikimedia projects so valuable. Wikipedia and Wikimedia Commons aren’t built on linked data as such, but the ease with which others can both extract their information and contribute new information makes them valuable information hubs.
These hubs make it possible for someone like me to build browsers for collections of millions of free online books as a part-time project.
Here’s a subject view from The Online Books Page, which fundamentally depends on multiple hubs to function. Most of the catalog records in this collection are automatically imported from HathiTrust – not as linked data, but as open and readily downloadable data – and from other open bibliographic data sources. The description you see here of open access publishing, and its relationships to other subjects, comes from id.loc.gov, which is the Library of Congress’ linked data service for its authorities and vocabularies. The links to search for books in other libraries are based on the library catalog search data files that I showed you earlier. The link to the Wikipedia article on “Open Access” is derived from a crosswalk that’s in turn derived from other open data sources like OCLC’s VIAF as well as dump files that Wikipedia provides.
We’re now using some of these same data hubs to augment and improve the main catalog of the Penn Libraries as well.
We take linked and open data from multiple sources, and we also give back. Besides the schema.org tagging I mentioned earlier, I also publish all of my original catalog records with a CC0 dedication. The mappings I’ve established between LC Subject Headings and Wikipedia article titles are also published CC0 on Github. And I regularly contribute corrections back to HathiTrust, and contribute library links and other information to Wikipedia. This sort of give and take helps keep these community hubs strong.
Those hubs are now numerous enough, and varied enough, that there’s a lot of linked and otherwise open data that libraries can put to good use. And there’s a lot more that we can contribute from our own libraries, whether it’s by putting schema.org markup on our web pages and catalogs, or contributing to a suitable open information hub, or publishing our own open data, whether it’s 5-star RDF linked data or some other suitable form.
The important thing is that if you want to share data with the world, you want to make it as easy as possible for users of that data to find it, process it, and repurpose it. Standards can play an important role in this, but even more fundamental is a commitment to work with user communities to provide data openly in the ways that are easiest for them. You haven’t seen a single line of RDF or SPARQL in this presentation; fundamentally, they’re not what you need to get started.
We don’t have to wait for BIBFRAME to kill off MARC, or some other technological revolution. We can use the principles I’ve discussed in this presentation to build better library services and outreach right now– in the early days of linked open data.
Thanks for hearing me talk, and now I’d love to hear from you. And if you want to talk with me after this session, here’s how you can get in touch with me.