Linked (Open) Data

  Linked (Open) Data INFO 4302 - April 18, 2011 Bernhard Haslhofer - Cornell University
  Who am I? • Postdoc at Cornell Information Science • Research areas • linked data • user-contributed data (annotations) • (meta-)data interoperability • Contact: • bernhard.haslhofer@cornell.edu
  Today we talk about... http://www.youtube.com/watch?v=5Cb3ik6zP2I
  Today we talk about... • Movies, actors and other real-world entities • How to make data about these entities available on the Web (Linked Data) • Enabling technologies, best-practices and useful tools that help us in doing so • Other Linked Data projects (BBC, LoC)
  Web Architecture Recap
  The World Wide Web (WWW) • Internet != WWW != Google != Facebook • Fundamental technologies • URI - a simple and generic syntax for identifiers • HTML - a markup language without formal schema binding • HTTP - a simple protocol to access and manipulate resources and resource representations in a distributed environment • World Wide Web Consortium, W3C (http://www.w3.org)
  URIs • Identification of resources via Uniform Resource Identifiers (URIs) • The generic syntax consists of a hierarchical sequence of components: scheme, authority, path, query, and fragment
  Generic syntax: URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
  Scheme and hier-part are required, though the path may be empty.
  Example URI with components:
         foo://example.com:8042/over/there?name=ferret#nose
         \_/   \______________/\_________/ \_________/ \__/
          |           |            |            |        |
       scheme     authority       path        query   fragment
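  The component breakdown above can be reproduced with a few lines of Python. This is a minimal sketch using the standard library's urllib.parse; the dictionary keys are only illustrative labels that mirror the slide's component names (Python itself calls the authority "netloc").

```python
from urllib.parse import urlsplit

# Example URI taken from the generic-syntax illustration on the slide.
uri = "foo://example.com:8042/over/there?name=ferret#nose"

parts = urlsplit(uri)

# Map Python's attribute names onto the component names used on the slide.
components = {
    "scheme": parts.scheme,
    "authority": parts.netloc,   # "netloc" is urllib's name for the authority
    "path": parts.path,
    "query": parts.query,
    "fragment": parts.fragment,
}

for name, value in components.items():
    print(f"{name:9} = {value}")
```

  The authority can be split further if needed: parts.hostname and parts.port give the host and the port separately.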
  URIs / Resources • Information Resource • web pages, images, product catalogs, etc • all their essential characteristics can be conveyed in a message • e.g., http://www.flickr.com/user2/photos/image.jpg • Non-Information Resource • other things such as dogs, people, this classroom, concepts • their essence is not information • e.g., http://www.example.com/ontology/meter
  HTTP • A stateless request-response protocol in the client-server computing model • HTTP methods: GET, POST, PUT, DELETE, ... • Agents may use a URI to access the referenced resource = dereferencing the URI
  HTTP Content Negotiation • A URI is not (necessarily) a filename • Conneg = making available multiple resource representations via the same URI • Example: the resource at http://example.com/The_Shining can be served as plain text (text/plain), HTML in English (text/html), or HTML in Japanese (text/html)
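  How a server might pick a representation from the Accept header can be sketched as follows. This is a simplified illustration, not a complete implementation of HTTP content negotiation: it handles q-values and the */* wildcard but ignores partial wildcards such as text/* and language negotiation. The function names and the AVAILABLE list are assumptions made for this example.

```python
def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    prefs = []
    for item in header.split(","):
        fields = [f.strip() for f in item.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs.append((media_type, q))
    # sorted() is stable, so equal q-values keep their header order.
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def negotiate(accept_header, available):
    """Return the preferred media type the server can provide, or None."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type == "*/*":
            return available[0]  # client takes anything: use server default
        if media_type in available:
            return media_type
    return None


# Hypothetical set of representations offered for one URI.
AVAILABLE = ["text/html", "text/plain", "application/rdf+xml"]

print(negotiate("application/rdf+xml", AVAILABLE))
print(negotiate("text/plain;q=0.5, text/html", AVAILABLE))
```

  A return value of None corresponds to the case where a real server would typically answer 406 Not Acceptable or fall back to a default representation.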
  (X)HTML(5) • A resource representation data format... • ... for presentation markup • rendered by user agents (typically browsers) • focus on readability • less formal, user-friendly syntax and semantics
  Web Services • Application-to-application communication based on the Web architecture • simple and open standards (HTTP, XML, JSON, ...) • send data from Application A to Application B through the Web • usually define some API
  Linked Data
  Why Linked Data?
  Why Linked Data? • There is lots of information on the Web • ...valuable information that can be (re-)used • Problem • information is usually expressed in the form of HTML documents • the underlying raw data are locked in closed data silos (mostly DBMS)
  (c) http://www.flickr.com/photos/docsearls/5500714140
  Why Linked Data? • The Web is successful because it provides • Uniform encoding (HTML) • Uniform addressing (URI) • Uniform transportation (HTTP) for the exchange of documents. • Why not apply the same mechanism to the underlying data?
  What is Linked Data? • A method to build a Web of Data • Architectural style, set of standards
  What is Linked Data? • A set of four principles • use URIs as names for things • use HTTP URIs so that people can look up those names • when someone looks up a URI, provide useful information, using the standards (RDF, SPARQL) • include links to other URIs, so that they can discover more things
  Enabling Technologies
  Uniform Resource Identifiers (URI) • Name and identify things (resources) • Dereferenceable HTTP URIs • http://dbpedia.org/resource/The_Shining_(film) • http://data.linkedmdb.org/resource/film/2014 • http://rdf.freebase.com/ns/m/04fjzv
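  Resource Description Framework (RDF) • A model for representing data on the Web • Several statements (triples) form a graph • Example graph (subject, predicate, object):
    http://dbpedia.org/resource/The_Shining_(film)  rdf:type  http://dbpedia.org/ontology/Film
    http://dbpedia.org/resource/The_Shining_(film)  rdfs:label  "The Shining (film)"
    http://dbpedia.org/resource/The_Shining_(film)  dbpprop:starring  http://dbpedia.org/resource/Jack_Nicholson
    http://dbpedia.org/resource/Jack_Nicholson  rdf:type  http://xmlns.com/foaf/0.1/Person
    http://dbpedia.org/resource/Jack_Nicholson  foaf:name  "Jack Nicholson"
    http://dbpedia.org/resource/Jack_Nicholson  dbpedia-owl:birthDate  1937-04-22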
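  To make the idea of "triples forming a graph" concrete, the statements from the slide can be held as plain Python tuples and queried by pattern. This is only a sketch: a real application would use an RDF library such as RDFLib, and this toy version does not distinguish URIs from literals. Which label belongs to which node is read off the slide's diagram, and the match helper is a hypothetical name.

```python
# Namespace prefixes (the dbpprop namespace is the one used in the Turtle example).
DBR = "http://dbpedia.org/resource/"
DBO = "http://dbpedia.org/ontology/"
DBP = "http://dbpedia.org/property/"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

film = DBR + "The_Shining_(film)"
actor = DBR + "Jack_Nicholson"

# Each statement is a (subject, predicate, object) triple; the set is the graph.
graph = {
    (film, RDF_TYPE, DBO + "Film"),
    (film, RDFS_LABEL, "The Shining (film)"),
    (film, DBP + "starring", actor),
    (actor, RDF_TYPE, FOAF + "Person"),
    (actor, FOAF + "name", "Jack Nicholson"),
    (actor, DBO + "birthDate", "1937-04-22"),
}


def match(graph, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Who stars in the film?
for _, _, who in match(graph, s=film, p=DBP + "starring"):
    print(who)
```

  Because the object of the starring triple is itself a subject of other triples, following it leads from the film to the actor's name and birth date, which is what makes the set of statements a graph rather than a table.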
  RDF serialization (RDF/XML, N3, Turtle, N-Triples, etc.) • Data formats for RDF resource representations • Used to transfer RDF data from application to application • N3/Turtle example:
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix dbpedia-owl: <http://dbpedia.org/ontology/> .
    @prefix dbpprop: <http://dbpedia.org/property/> .
    @prefix ns9: <http://dbpedia.org/datatype/> .

    <http://dbpedia.org/resource/The_Shining_%28film%29>
        rdf:type dbpedia-owl:Work , dbpedia-owl:Film ;
        dbpprop:runtime "146.0"^^ns9:minute .
  RDF Vocabulary Description Language (RDFS) • A language for describing the syntax and semantics of vocabularies in a machine-understandable way • Example: http://dbpedia.org/ontology/Film rdfs:subClassOf http://dbpedia.org/ontology/Work
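  One practical consequence of rdfs:subClassOf is that a resource typed as a Film can also be treated as a Work. The sketch below shows that inference for the Film/Work example on the slide. It is a hand-rolled illustration of the subclass rule only, not a full RDFS reasoner, and the function names are assumptions.

```python
DBO = "http://dbpedia.org/ontology/"

# rdfs:subClassOf statements: class -> set of direct superclasses.
SUB_CLASS_OF = {
    DBO + "Film": {DBO + "Work"},
}


def superclasses(cls, sub_class_of):
    """All classes reachable from cls by following rdfs:subClassOf."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in sub_class_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def inferred_types(asserted, sub_class_of):
    """Asserted rdf:type values plus every superclass they imply."""
    types = set(asserted)
    for cls in asserted:
        types |= superclasses(cls, sub_class_of)
    return types


# A resource asserted to be a Film is inferred to be a Work as well.
print(sorted(inferred_types({DBO + "Film"}, SUB_CLASS_OF)))
```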
  OWL - Web Ontology Language • A more expressive (formal) language for defining the syntax and semantics of vocabularies • Solves RDFS shortcomings but introduces considerable complexity • Example:
    http://dbpedia.org/ontology/starring  rdf:type  http://www.w3.org/2002/07/owl#ObjectProperty
    http://dbpedia.org/ontology/starring  rdfs:domain  http://dbpedia.org/ontology/Work
    http://dbpedia.org/ontology/starring  rdfs:range  http://dbpedia.org/ontology/Person
    http://dbpedia.org/ontology/starring  rdfs:label  "starring"
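  Simple Knowledge Organization System (SKOS) • A language for describing controlled vocabularies (taxonomies, thesauri, classification schemes) • Example:
    http://dbpedia.org/resource/The_Shining_(film)  skos:subject  http://dbpedia.org/resource/Category:1980s_horror_films
    http://dbpedia.org/resource/Category:1980s_horror_films  rdf:type  http://www.w3.org/2004/02/skos/core#Concept
    http://dbpedia.org/resource/Category:1980s_horror_films  skos:broader  http://dbpedia.org/resource/Category:1980s_films
    http://dbpedia.org/resource/Category:1980s_films  rdf:type  http://www.w3.org/2004/02/skos/core#Concept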
  Links between Resources • OWL defines properties for linking resources • Example:
    http://dbpedia.org/resource/The_Shining_(film)  dbpprop:starring  http://dbpedia.org/resource/Jack_Nicholson
    http://dbpedia.org/resource/The_Shining_(film)  owl:sameAs  http://data.linkedmdb.org/resource/film/2014
    http://dbpedia.org/resource/The_Shining_(film)  owl:sameAs  http://rdf.freebase.com/ns/m/04fjzv
    http://dbpedia.org/resource/Jack_Nicholson  owl:sameAs  http://data.nytimes.com/N5761411277431266513
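  A consumer of owl:sameAs links usually wants to know which URIs name the same thing. A minimal sketch, assuming the links are available as pairs of URIs: treat them as undirected edges and collect the connected groups. The function name is hypothetical, and only the two film links that also appear on the URI slide are used as data.

```python
from collections import defaultdict

film = "http://dbpedia.org/resource/The_Shining_(film)"

# owl:sameAs statements as (subject, object) pairs.
SAME_AS = [
    (film, "http://data.linkedmdb.org/resource/film/2014"),
    (film, "http://rdf.freebase.com/ns/m/04fjzv"),
]


def same_as_clusters(pairs):
    """Group URIs that are connected, directly or indirectly, by owl:sameAs."""
    neighbours = defaultdict(set)
    for a, b in pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)  # sameAs is symmetric

    clusters = []
    seen = set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        cluster = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in cluster:
                continue
            cluster.add(node)
            stack.extend(neighbours[node] - cluster)
        seen |= cluster
        clusters.append(cluster)
    return clusters


for cluster in same_as_clusters(SAME_AS):
    print(sorted(cluster))
```

  Data about any URI in a cluster can then be merged under one identifier, which is how descriptions from DBpedia, LinkedMDB, and Freebase end up describing a single film.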
  SPARQL • A query language and protocol for accessing RDF data on the Web • Example query:
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    SELECT DISTINCT ?x
    WHERE { ?x skos:subject <http://dbpedia.org/resource/Category:1980s_horror_films> }
    LIMIT 10
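  The "protocol" half of SPARQL means that a query travels over HTTP, typically as a query parameter on a GET request to an endpoint. The sketch below only builds such a request URL and sends nothing. The endpoint address http://dbpedia.org/sparql is an assumption not stated on the slide, build_request_url is a hypothetical helper, and whether the endpoint still answers this query is not checked here.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed endpoint address; substitute the endpoint you actually use.
ENDPOINT = "http://dbpedia.org/sparql"

QUERY = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?x
WHERE { ?x skos:subject <http://dbpedia.org/resource/Category:1980s_horror_films> }
LIMIT 10
""".strip()


def build_request_url(endpoint, query):
    """Encode a SPARQL query as the 'query' parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(ENDPOINT, QUERY)
print(url)
```

  The resulting URL can be passed to cURL in the same way as the dereferencing examples, with an Accept header selecting the result format.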
  Vocabulary / Data Publishing Best Practices
  Publishing Vocabularies • Hash-based URIs • e.g., http://example.com/example1#ClassA • Suited to group the description of a moderate number of related terms into one RDF document • Agent can retrieve terms with a single request • Slash-based URIs • e.g., http://example.com/example1/ClassB • Suited to split terms in large vocabularies into one document per term • No need to download a massive document
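  The difference between the two URI styles comes from how HTTP treats fragments: the part after "#" is not sent to the server, so every hash-based term resolves to the same document, while each slash-based term is its own document. A minimal sketch with the standard library follows; ClassC and ClassD are hypothetical terms added for illustration.

```python
from urllib.parse import urldefrag


def document_uri(term_uri):
    """URI actually requested over HTTP: the term URI without its fragment."""
    return urldefrag(term_uri).url


hash_terms = [
    "http://example.com/example1#ClassA",
    "http://example.com/example1#ClassC",  # hypothetical second term
]
slash_terms = [
    "http://example.com/example1/ClassB",
    "http://example.com/example1/ClassD",  # hypothetical second term
]

# Hash URIs: both terms live in one document, fetched with a single request.
print({document_uri(t) for t in hash_terms})

# Slash URIs: one document (and one request) per term.
print({document_uri(t) for t in slash_terms})
```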
  Provide either human-readable or machine-readable content from the vocabulary URI, depending on what is requested.
  Publishing Data • Distinguish between non-information and information resource • Sample non-information resource • http://dbpedia.org/resource/The_Shining_(film) • Sample information resource • http://dbpedia.org/page/The_Shining_(film) - HTML • http://dbpedia.org/data/The_Shining_(film) - RDF
  Publishing Data
    GET http://dbpedia.org/resource/The_Shining_(film)
    Accept: application/rdf+xml

    303 See Other
    Location: http://dbpedia.org/data/The_Shining_(film)

    GET http://dbpedia.org/data/The_Shining_(film)
    Accept: application/rdf+xml

    200 OK
    ...
    <?xml version="1.0" encoding="utf-8"?>
    <rdf:RDF ...
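  The redirect in this exchange can be pictured as a small server-side decision: given the URI of the non-information resource and the Accept header, choose the information resource to name in the Location header. The sketch below imitates the /resource/, /data/, /page/ pattern from the slides. It is an illustration of the pattern, not DBpedia's actual server logic, and see_other_location is a hypothetical name.

```python
RESOURCE_PREFIX = "http://dbpedia.org/resource/"


def see_other_location(resource_uri, accept):
    """Pick the Location for a 303 See Other response.

    RDF/XML requests are sent to the /data/ document, everything else
    to the HTML /page/ document.
    """
    if not resource_uri.startswith(RESOURCE_PREFIX):
        raise ValueError("not a /resource/ URI: " + resource_uri)
    name = resource_uri[len(RESOURCE_PREFIX):]

    # Strip parameters such as ";q=0.9" before comparing media types.
    requested = [part.split(";")[0].strip().lower() for part in accept.split(",")]
    target = "data" if "application/rdf+xml" in requested else "page"
    return "http://dbpedia.org/" + target + "/" + name


uri = "http://dbpedia.org/resource/The_Shining_(film)"
print(see_other_location(uri, "application/rdf+xml"))
print(see_other_location(uri, "text/html"))
```

  The client then issues a second GET to the returned location, which is the request answered with 200 OK in the exchange above.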
  The Linking Open Data Community Project
  Linking? Open? Data Project? • Open Data: a philosophy, practice, or policy that data are freely available to everyone without restrictions from copyright, patents, and so on • Linked Data: method / best practices for exposing, sharing, and connecting data using URIs and RDF • Linking Open Data: a W3C community project with the goal to extend the Web with a data commons by publishing various open data sets as RDF on the Web and by setting links between data items from different sources
  Useful Tools
  RDF APIs • Java • Jena Semantic Web Framework (http://openjena.org/) • Sesame RDF API (http://www.openrdf.org/) • PHP • ARC (http://arc.semsol.org/) • Ruby • RDF.rb: Linked Data for Ruby (http://rdf.rubyforge.org/) • Python • RDFLib (http://www.rdflib.net/) • C • Redland RDF Libraries (http://librdf.org/)
  RDF Stores • OpenLink Virtuoso (http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/) • 4Store (http://4store.org/) • AllegroGraph (http://www.franz.com/agraph/allegrograph/) • Oracle 11g (http://www.oracle.com/technetwork/database/options/semantic-tech/index.html) • ...and many more: http://www.w3.org/2001/sw/wiki/Tools
  RDF / Linked Data Wrappers • D2RQ - SPARQL / Linked Data for relational databases (http://www4.wiwiss.fu-berlin.de/bizer/d2rq/) • OAI2LOD Server - expose any OAI-PMH source as Linked Data • TripFS - filesystem as Linked Data • TripCel - XLS spreadsheets as Linked Data • ...
  Linked Data debugging • Start up your console / terminal • native on Linux / Mac OS X • Windows: http://www.cygwin.com/ • Dereference resources with cURL (http://curl.haxx.se/)
    curl -I -H "Accept: application/rdf+xml" http://dbpedia.org/resource/The_Shining_%28film%29
    curl -H "Accept: application/rdf+xml" http://dbpedia.org/data/The_Shining_%28film%29
  Linked Data debugging • Install the Raptor RDF Syntax Library (http://librdf.org/raptor/) • Mac: brew install raptor • Use the rapper utility to dereference URIs
    rapper http://dbpedia.org/resource/The_Shining_%28film%29
    rapper -o rdfxml http://dbpedia.org/resource/The_Shining_%28film%29
  Readings
  Required Reading • T. Heath, C. Bizer. Linked Data: Evolving the Web into a Global Data Space, Chapters 1-5 http://linkeddatabook.com/editions/1.0/
  Recommended Readings • Linked Data Web Site: http://linkeddata.org • Linked Data / Semantic Web Introduction: http://www.linkeddatatools.com/semantic-web-basics • Tim Berners-Lee. Linked Data Design Issues: http://www.w3.org/DesignIssues/LinkedData.html • Best Practice Recipes for Publishing RDF Vocabularies: http://www.w3.org/TR/swbp-vocab-pub/ • How to Publish Linked Data on the Web: http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/