The Linked Data and Services presentation was given by Andreas Harth (KIT) and Barry Norton (KIT) at the PlanetData project kick-off meeting on October 11, 2010, in Palma de Mallorca, Spain.
1. Linked Data and Services
Andreas Harth and Barry Norton
Institute AIFB
KIT – University of the State of Baden-Wuerttemberg and
National Laboratory of the Helmholtz Association www.kit.edu
2. Outline
! Motivation
! Linked Data Principles
! Query Processing over Linked Data
! Linked Data Services (LIDS) and Linked Open Services (LOS)
! Conclusion
3. Motivation
! Semantic Web/Linked Data technologies are well suited for data integration
[Figure: Data → Common Data Format/Access Protocol → Data Integration → Interactive Data Exploration]
8/10/11 Taking the LIDS off Data Silos, Andreas Harth
4. Linked Data Principles*
1. Use URIs to name things: not only documents, but also people, locations, concepts, etc.
2. To enable agents (human users and machine agents alike) to look up those names, use HTTP URIs
3. When someone looks up a URI, provide useful information; by "useful" we usually mean, in the strict sense, structured data in RDF
4. Include links to other URIs, allowing agents (machines and humans) to discover more things
(*) http://www.w3.org/DesignIssues/LinkedData.html
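As a rough illustration of principles 2 and 3, the Python sketch below builds (without sending) an HTTP GET request that asks for RDF through the Accept header, using only the standard library. The helper name `rdf_request` and the exact list of media types are assumptions made for this example, not part of the principles.

```python
from urllib.parse import urlparse
from urllib.request import Request

# Media types offered to the server; this list and its q-values are assumptions.
RDF_ACCEPT = "application/rdf+xml, text/turtle;q=0.9, text/n3;q=0.8"


def rdf_request(uri: str) -> Request:
    """Build (but do not send) a GET request asking for RDF about `uri` (hypothetical helper)."""
    if urlparse(uri).scheme not in ("http", "https"):
        # Principle 2: only HTTP URIs can be looked up this way.
        raise ValueError(f"not an HTTP URI: {uri}")
    # Principle 3: ask for structured data (RDF) via the Accept header.
    return Request(uri, headers={"Accept": RDF_ACCEPT}, method="GET")


req = rdf_request("http://www.polleres.net/foaf.rdf#me")
# Request keeps the fragment separately; only host and path go to the server.
print(req.get_method(), req.host, req.selector, req.fragment)
```

Passing the request to `urllib.request.urlopen` would perform the actual lookup; the fragment `#me` stays on the client side and is never sent to the server.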
5. Correspondence between thing-URI and source-URI
[Figure: the user agent looks up the thing-URI http://www.polleres.net/foaf.rdf#me by sending an HTTP GET for the source-URI http://www.polleres.net/foaf.rdf; the web server returns RDF]
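The correspondence on this slide amounts to dropping the fragment: the part after `#` never reaches the server, so the agent fetches the enclosing document. A minimal Python sketch, where `source_uri` is a hypothetical helper name:

```python
from urllib.parse import urldefrag


def source_uri(thing_uri: str) -> str:
    """Map a thing-URI to the source-URI that is actually fetched.

    For hash URIs the fragment is client-side only, so dropping it yields the
    document URI; URIs without a fragment are returned unchanged.
    """
    return urldefrag(thing_uri).url


print(source_uri("http://www.polleres.net/foaf.rdf#me"))  # → http://www.polleres.net/foaf.rdf
```

For a URI without a fragment, such as a DBpedia resource URI, the function returns its input unchanged; the server then has to point the agent to a document by other means, as on the next slide.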
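6. Correspondence between thing-URI and source-URI
[Figure: the user agent sends an HTTP GET for http://dbpedia.org/resource/Gordon_Brown; the web server answers with HTTP 303, redirecting to http://dbpedia.org/data/Gordon_Brown (RDF) or http://dbpedia.org/page/Gordon_Brown (HTML page); a second HTTP GET retrieves the RDF]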
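The 303 exchange can be simulated without a network. In the sketch below a dictionary stands in for the web server and `look_up` plays the user agent, following 303 redirects until a 200 response arrives; the status table, placeholder bodies and function names are assumptions for illustration.

```python
from typing import Dict, Tuple

RESOURCE = "http://dbpedia.org/resource/Gordon_Brown"
DATA_DOC = "http://dbpedia.org/data/Gordon_Brown"
PAGE_DOC = "http://dbpedia.org/page/Gordon_Brown"

# Toy server: (URI, client wants RDF?) -> (status, Location for 303 or body for 200).
# The bodies are placeholders; a real server would answer over HTTP.
SERVER: Dict[Tuple[str, bool], Tuple[int, str]] = {
    (RESOURCE, True): (303, DATA_DOC),
    (RESOURCE, False): (303, PAGE_DOC),
    (DATA_DOC, True): (200, "<RDF description>"),
    (PAGE_DOC, False): (200, "<HTML page>"),
}


def look_up(uri: str, wants_rdf: bool, max_redirects: int = 5) -> Tuple[str, str]:
    """Act as the user agent: follow 303 See Other until a document is returned."""
    for _ in range(max_redirects + 1):
        status, value = SERVER[(uri, wants_rdf)]
        if status == 303:
            uri = value          # the thing itself is not a document; fetch its description
        else:
            return uri, value    # 200: the final document URI and its body
    raise RuntimeError("too many redirects")


print(look_up(RESOURCE, wants_rdf=True))
```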
7.
8. Queries over Linked Data
SELECT ?f ?n WHERE {
an:f#ah foaf:knows ?f.
?f foaf:name ?n.
}
SELECT ?x1 ?x2 WHERE {
dblppub:HoganHP08 dc:creator ?a1.
?x1 owl:sameAs ?a1.
?x2 foaf:knows ?x1.
}
[Table: results with columns ?f and ?n]
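The first query is a basic graph pattern of two triple patterns sharing the variable ?f. A naive nested-loop evaluator over an in-memory list of triples shows how bindings are joined; the two sample triples reuse the result shown on slide 11, while the abbreviated term an:f#ah and the helper names are illustrative assumptions. This is not a SPARQL engine.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` equals `triple`; return None on a clash."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):                 # variable: bind it or check the existing binding
            if new.setdefault(p, t) != t:
                return None
        elif p != t:                          # constant: must match exactly
            return None
    return new


def evaluate(bgp: List[Triple], data: List[Triple]) -> List[Binding]:
    """Naive nested-loop evaluation of a basic graph pattern."""
    results: List[Binding] = [{}]
    for pattern in bgp:
        extended = []
        for binding in results:
            for triple in data:
                m = match(pattern, triple, binding)
                if m is not None:
                    extended.append(m)
        results = extended
    return results


DATA = [
    ("an:f#ah", "foaf:knows", "http://danbri.org/foaf.rdf#danbri"),
    ("http://danbri.org/foaf.rdf#danbri", "foaf:name", "Dan Brickley"),
]
QUERY = [("an:f#ah", "foaf:knows", "?f"), ("?f", "foaf:name", "?n")]
print(evaluate(QUERY, DATA))
# → [{'?f': 'http://danbri.org/foaf.rdf#danbri', '?n': 'Dan Brickley'}]
```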
9. Querying Data Across Sources
! Data warehousing or materialisation-based approaches (MAT)
[Figure: CRAWL → INDEX → SERVE]
! Distributed query processing approaches (DQP)
[Figure: a SELECT * FROM … query evaluated as a join over the remote sources R and S]
15.03.2010 Andreas Harth, Data Summaries for On-Demand Queries over Linked Data
10. DQP on Linked Data
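[Figure: in classical DQP, a SELECT * FROM … query over relations R and S accesses the sources via ODBC; in DQP over Linked Data, a SELECT ?s WHERE … query is split into triple patterns (TP), each answered by HTTP GET lookups]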
11. Query Processing Overview
SELECT ?f ?n WHERE {
an:f#ah foaf:knows ?f.
?f foaf:name ?n.
}
[Figure: each triple pattern, (an:f#ah foaf:knows ?f) and (?f foaf:name ?n), passes through source selection; the selected source(s) are fetched with HTTP GET, which returns RDF]
Result:
?f                                   ?n
http://danbri.org/foaf.rdf#danbri    Dan Brickley
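The pipeline on this slide can be sketched as binding-driven evaluation: for each partial result, substitute the known bindings into the next triple pattern, select source(s), fetch them, and match. In this Python sketch a dictionary mocks the Web, source selection uses only a bound subject URI (the direct lookup idea of slide 15), and `http://example.org/ah.rdf#ah` is a hypothetical stand-in for an:f#ah.

```python
from typing import Dict, List, Optional, Tuple
from urllib.parse import urldefrag

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical mock of the Web: source-URI -> RDF triples served there.
AH = "http://example.org/ah.rdf#ah"  # illustrative stand-in for an:f#ah
DANBRI = "http://danbri.org/foaf.rdf#danbri"
WEB: Dict[str, List[Triple]] = {
    "http://example.org/ah.rdf": [(AH, "foaf:knows", DANBRI)],
    "http://danbri.org/foaf.rdf": [(DANBRI, "foaf:name", "Dan Brickley")],
}


def http_get(source: str) -> List[Triple]:
    """Stand-in for an HTTP GET that returns parsed RDF."""
    return WEB.get(source, [])


def select_sources(tp: Triple) -> List[str]:
    """Select source(s) for a triple pattern: here, the document of a bound subject URI."""
    s = tp[0]
    return [urldefrag(s).url] if s.startswith("http") else []


def match(tp: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    new = dict(binding)
    for p, t in zip(tp, triple):
        if p.startswith("?"):
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new


def execute(bgp: List[Triple]) -> List[Binding]:
    """Evaluate triple patterns in order, looking up sources for each partial binding."""
    results: List[Binding] = [{}]
    for pattern in bgp:
        extended = []
        for binding in results:
            tp = tuple(binding.get(t, t) for t in pattern)  # substitute known bindings
            for source in select_sources(tp):
                for triple in http_get(source):
                    m = match(tp, triple, binding)
                    if m is not None:
                        extended.append(m)
        results = extended
    return results


print(execute([(AH, "foaf:knows", "?f"), ("?f", "foaf:name", "?n")]))
```

Each lookup depends on a binding from the previous pattern, which is why this style of evaluation offers limited parallelism.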
12. Barry
13. Problem: Source Selection for Triple Patterns
! (?s ?p ?o)
! (#s ?p ?o)
! (?s #p ?o)
! (?s ?p #o)
! (#s #p ?o)
! (#s ?p #o)
! (?s #p #o)
! (#s #p #o)
! Given a triple pattern, which sources can contribute bindings for it? (#: bound term, ?: variable)
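The eight cases are the 2^3 ways of marking the subject, predicate and object as bound (#) or variable (?). A small sketch that enumerates them and classifies a given triple pattern, assuming SPARQL's convention that variables start with `?`; the function names are hypothetical.

```python
from itertools import product
from typing import Tuple


def binding_pattern(tp: Tuple[str, str, str]) -> str:
    """Classify a triple pattern: '#' marks a bound position, '?' a variable."""
    marks = ("?" if term.startswith("?") else "#" for term in tp)
    return "(" + " ".join(m + pos for m, pos in zip(marks, "spo")) + ")"


# All 2^3 = 8 binding patterns, from fully unbound to fully bound.
ALL_PATTERNS = ["(" + " ".join(m + pos for m, pos in zip(marks, "spo")) + ")"
                for marks in product("?#", repeat=3)]

print(binding_pattern(("an:f#ah", "foaf:knows", "?f")))   # (#s #p ?o)
print(binding_pattern(("?f", "foaf:name", "?n")))         # (?s #p ?o)
```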
14. Schema-Level Indices [Stuckenschmidt et al. 2004]
! Keep an index of the properties and/or classes contained in sources
! (?s #p ?o), (?s rdf:type #o)
! Covers only queries containing schema-level elements
! Commonly used properties potentially select too many sources
SELECT ?f ?n WHERE {
an:f#ah foaf:knows ?f.
?f foaf:name ?n.
}
SELECT ?x1 ?x2 WHERE {
dblppub:HoganHP08 dc:creator ?a1.
?x1 owl:sameAs ?a1.
?x2 foaf:knows ?x1.
}
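A schema-level index can be sketched as a map from each property, and each class used with rdf:type, to the set of sources containing it. The toy sources below are hypothetical; the sketch shows both limitations listed above: a pattern such as (an:f#ah ?p ?o) gets no answer from the index, and a common property like foaf:knows selects every source that uses it. A bound subject is simply ignored here.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "rdf:type"


def build_index(sources: Dict[str, List[Triple]]) -> Dict[str, Set[str]]:
    """Schema-level index: property (and rdf:type class) -> sources that use it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for source, triples in sources.items():
        for _, p, o in triples:
            index[p].add(source)
            if p == RDF_TYPE:
                index[o].add(source)
    return index


def select(index: Dict[str, Set[str]], tp: Triple) -> Optional[Set[str]]:
    """Candidate sources for a pattern, or None if it has no schema-level constant."""
    _, p, o = tp
    if p == RDF_TYPE and not o.startswith("?"):
        return set(index.get(o, ()))        # (?s rdf:type #o)
    if not p.startswith("?"):
        return set(index.get(p, ()))        # (?s #p ?o)
    return None                             # e.g. (#s ?p ?o): the index cannot help


# Hypothetical toy sources.
SOURCES = {
    "src1": [("a", "foaf:knows", "b"), ("a", RDF_TYPE, "foaf:Person")],
    "src2": [("c", "foaf:knows", "d")],
    "src3": [("e", "dc:creator", "f")],
}
INDEX = build_index(SOURCES)
print(select(INDEX, ("?f", "foaf:knows", "?x")))
```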
15. Direct Lookup (DL) [Hartig et al. 2009]
! Exploits the correspondence between thing-URI and source-URI
! Linked Data sources (aka RDF files) typically return triples whose subject corresponds to the source
! Sometimes sources return triples whose object corresponds to the source
! (#s ?p ?o), (#s #p ?o), (#s #p #o)
! (?s ?p #o), (?s #p #o)
! Incomplete with respect to patterns, and also with respect to URI reuse across sources
! Limited parallelism; unclear how to schedule lookups
SELECT ?f ?n WHERE {
an:f#ah foaf:knows ?f.
?f foaf:name ?n.
}
SELECT ?x1 ?x2 WHERE {
dblppub:HoganHP08 dc:creator ?a1.
?x1 owl:sameAs ?a1.
?x2 foaf:knows ?x1.
}
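Under direct lookup, the sources for a triple pattern are the documents behind its bound subject and/or object URIs. The sketch below, with hypothetical helper names, computes them by dropping fragments; note how (?x2 foaf:knows ?x1) yields no source at all, which is the incompleteness mentioned above.

```python
from typing import List, Tuple
from urllib.parse import urldefrag


def is_http_uri(term: str) -> bool:
    return term.startswith(("http://", "https://"))


def direct_lookup_sources(tp: Tuple[str, str, str]) -> List[str]:
    """Documents to dereference under direct lookup: those of the bound subject
    and/or object URIs. Predicates are not dereferenced in this sketch."""
    sources: List[str] = []
    for term in (tp[0], tp[2]):
        if is_http_uri(term):
            source = urldefrag(term).url
            if source not in sources:
                sources.append(source)
    return sources


DANBRI = "http://danbri.org/foaf.rdf#danbri"
print(direct_lookup_sources((DANBRI, "foaf:name", "?n")))     # ['http://danbri.org/foaf.rdf']
print(direct_lookup_sources(("?x2", "foaf:knows", DANBRI)))   # ['http://danbri.org/foaf.rdf']
print(direct_lookup_sources(("?x2", "foaf:knows", "?x1")))    # []
```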
16. Approximate Data Summaries
! Combined description of schema-level and instance-level data
! Use approximation to reduce index size (incurs false positives)
! Possible to use the entire query for source selection
! Parallel lookups, since sources can be determined for the entire query
! (?s ?p ?o), (#s ?p ?o), (?s #p ?o), (?s ?p #o), (#s #p ?o), (#s ?p #o), (?s #p #o), (#s #p #o)
! and combinations of triple patterns
SELECT ?f ?n WHERE {
an:f#ah foaf:knows ?f.
?f foaf:name ?n.
}
SELECT ?x1 ?x2 WHERE {
dblppub:HoganHP08 dc:creator ?a1.
?x1 owl:sameAs ?a1.
?x2 foaf:knows ?x1.
}
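The idea of an approximate summary can be illustrated with a deliberately simplified stand-in, not the data structure used in the underlying work: each term is hashed into a few buckets, and the summary records which sources have a triple in each hashed (s, p, o) cell. Because distinct terms can share a bucket, lookups may return false positives, but a summarised source holding a matching triple is never missed. The bucket count, the use of `zlib.crc32`, and all names are assumptions.

```python
import zlib
from collections import defaultdict
from itertools import product
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Cell = Tuple[int, int, int]
BUCKETS = 4  # deliberately tiny, so distinct terms collide and false positives appear


def bucket(term: str) -> int:
    """Deterministic hash of a term into one of BUCKETS buckets."""
    return zlib.crc32(term.encode("utf-8")) % BUCKETS


def summarise(sources: Dict[str, List[Triple]]) -> Dict[Cell, Set[str]]:
    """Record, per hashed (s, p, o) cell, which sources contain a triple there."""
    summary: Dict[Cell, Set[str]] = defaultdict(set)
    for source, triples in sources.items():
        for s, p, o in triples:
            summary[(bucket(s), bucket(p), bucket(o))].add(source)
    return summary


def sources_for_pattern(summary: Dict[Cell, Set[str]], tp: Triple) -> Set[str]:
    """Union over all cells compatible with the pattern; a variable matches any bucket."""
    ranges = [range(BUCKETS) if t.startswith("?") else [bucket(t)] for t in tp]
    found: Set[str] = set()
    for cell in product(*ranges):
        found |= summary.get(cell, set())
    return found


def sources_for_query(summary: Dict[Cell, Set[str]], bgp: List[Triple]) -> Set[str]:
    """Sources for the entire query, known up front so they can be fetched in parallel."""
    found: Set[str] = set()
    for tp in bgp:
        found |= sources_for_pattern(summary, tp)
    return found
```

Because `sources_for_query` covers all triple patterns before any lookup, the selected sources could be fetched in parallel. Pruning by joins between patterns ("combinations of triple patterns") is left out of this sketch.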
17. Implementation
! Deploy wrappers "in the cloud"
! Google App Engine: hosting of Java and Python web apps on Google's cloud infrastructure
! Limited amount of processing time (6 hours/day)
! Single-threaded applications
! Suited for deploying wrappers
! e.g. http://twitter2foaf.appspot.com/ converts Twitter user data to RDF
18. Linking Open Data Cloud 2007
19. Linking Open Data Cloud 2008
20. Linking Open Data Cloud 2009
21. Linking Open Data Cloud 2010
22. Geonames Services
23. Geonames Services
24. Geonames Services
{"weatherObservation":
{"clouds":"broken clouds",
"weatherCondition":"drizzle",
"observation":"LESO 251300Z 03007KT 340V040 CAVOK 23/15 Q1010",
"windDirection":30,
"ICAO":"LESO", ...
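A wrapper of the kind deployed on App Engine essentially turns JSON such as the response above into RDF. The sketch below converts the fields shown into N-Triples lines; the vocabulary namespace `http://example.org/weather#`, the station URI scheme, and the function names are hypothetical choices, not those of any published GeoNames wrapper.

```python
import json
from typing import List

EX = "http://example.org/weather#"                    # hypothetical vocabulary
STATION = "http://example.org/station/"               # hypothetical URI scheme for stations
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def to_literal(value) -> str:
    """Render a JSON scalar as an N-Triples literal; integers get xsd:integer."""
    if isinstance(value, int) and not isinstance(value, bool):
        return f'"{value}"^^<{XSD_INTEGER}>'
    # json.dumps escapes quotes, backslashes and control characters using
    # escape sequences that are also valid in N-Triples string literals.
    return json.dumps(str(value))


def wrap_observation(payload: str) -> List[str]:
    """Convert a weatherObservation JSON document into N-Triples lines."""
    obs = json.loads(payload)["weatherObservation"]
    subject = f"<{STATION}{obs['ICAO']}>"
    return [f"{subject} <{EX}{key}> {to_literal(value)} ."
            for key, value in obs.items() if key != "ICAO"]


# The fields shown on the slide (the remaining fields are elided there).
SAMPLE = json.dumps({"weatherObservation": {
    "clouds": "broken clouds",
    "weatherCondition": "drizzle",
    "observation": "LESO 251300Z 03007KT 340V040 CAVOK 23/15 Q1010",
    "windDirection": 30,
    "ICAO": "LESO",
}})

for line in wrap_observation(SAMPLE):
    print(line)
```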
25. Geonames Services
26. Linked Open Service Principles
REST Principles
1. Application state and functionality are divided into resources
2. Every resource is uniquely addressable
3. All resources share a uniform interface:
a) A constrained set of well-defined operations
b) A constrained set of content types
Linked Data Principles
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
4. Include links to other URIs so that they can discover more things.
Linked Open Service Principles
1. Describe services as LOD prosumers with input and output descriptions as SPARQL graph patterns
2. Communicate RDF by RESTful content negotiation
3. The output should make explicit its relation with the input
27. LOS Weather Service
28. LOS Geo Resources
29. Resource-Based Linked Open Services
! Linked Data:
GET (Accept: text/html) → 303 REDIRECT /page
GET (Accept: application/rdf+xml or text/n3) → 303 REDIRECT /data
! Linked Service:
GET /weather (Accept: application/rdf+xml or text/n3) → 200 <rdf:Description>
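The server side of this exchange can be sketched as a dispatch function: resource URIs are answered with 303 redirects chosen by content negotiation, while the service URI returns RDF directly with 200. The `/resource/`, `/page/` and `/data/` path layout, the 406 answer for non-RDF requests to the service, the placeholder body, and the simplified Accept parsing (no q-values or wildcards) are all assumptions of this sketch.

```python
from typing import Dict, Optional, Tuple

RDF_TYPES = ("application/rdf+xml", "text/n3")
Response = Tuple[int, Dict[str, str], str]


def negotiate(accept: str) -> Optional[str]:
    """Pick the first RDF media type listed in Accept; ignores q-values and wildcards."""
    for part in accept.split(","):
        media_type = part.split(";")[0].strip()
        if media_type in RDF_TYPES:
            return media_type
    return None


def handle(path: str, accept: str) -> Response:
    """Return (status, headers, body) for a hypothetical LOS server."""
    rdf_type = negotiate(accept)
    if path == "/weather":
        # Linked service: the answer is computed and returned directly as RDF.
        if rdf_type:
            return 200, {"Content-Type": rdf_type}, "weather data"  # placeholder body
        return 406, {}, ""  # assumption: no HTML rendering of the service output
    if path.startswith("/resource/"):
        # Linked Data resource: 303 See Other to a page or a data document.
        name = path[len("/resource/"):]
        target = "/data/" if rdf_type else "/page/"
        return 303, {"Location": target + name}, ""
    return 404, {}, ""


print(handle("/resource/Palma", "text/html"))
print(handle("/weather", "text/n3"))
```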
30. Interlinking Data with Data from Services?
31. Data Services
! Given input, provide output
! Input and output are related in a service-specific way
! Do not change the state of the world
[Figure: the service defines the relation between input and output]
! E.g. GeoNames findNearbyWikipedia service
! Input: lat/lon
! Output: places
! Relation: output places are near the input place
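A data service in this sense behaves like a pure function from input to output. The sketch below mimics the findNearby relation with a great-circle (haversine) distance over a tiny hypothetical gazetteer; the coordinates are approximate and illustrative, and the 10 km radius and all names are assumptions, not GeoNames' actual behaviour.

```python
import math
from typing import List, Tuple

Place = Tuple[str, float, float]  # (name, latitude, longitude)

# Hypothetical gazetteer; coordinates are approximate and for illustration only.
GAZETTEER: List[Place] = [
    ("Palo Alto", 37.44, -122.14),
    ("Packard's garage", 37.443, -122.155),
    ("Palma de Mallorca", 39.57, 2.65),
]


def haversine_km(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in km on a spherical Earth (mean radius 6371 km)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = phi2 - phi1, math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def find_nearby(lat: float, lng: float, places: List[Place], radius_km: float = 10.0) -> List[str]:
    """Input: lat/lng. Output: names of places within radius_km, nearest first.
    Pure function: same input, same output, no change to the state of the world."""
    ranked = sorted((haversine_km(lat, lng, plat, plng), name) for name, plat, plng in places)
    return [name for dist, name in ranked if dist <= radius_km]


print(find_nearby(37.416, -122.152, GAZETTEER))
```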
32. Linked Data Services
! We'd like to integrate data services with Linked Data
1. LIDS need to adhere to Linked Data principles
! We'd like to use data services in software programs
2. LIDS need machine-readable descriptions of input and output
! Compared to the naïve approach (assigning a URI to the service output):
! The relationship between input and output is explicitly described
! Dynamicity is supported
33. 1. Data Services as Linked Data
! Input is given as a URI made of the service endpoint, the parameters, and the input identifier:
http://geowrap.openlids.org/findNearbyWikipedia?lat=37.416&lng=-122.152#point
! Resolving the URI yields RDF that relates the input to the output:
@prefix dbp: <http://dbpedia.org/resource/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix : <http://geo..Wiki?lat=37.416&lng=-122.152#> .
:point foaf:based_near dbp:Palo_Alto%2C_California ;
       foaf:based_near dbp:Packard%27s_garage .
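The input URI is composed of the service endpoint, the query-string parameters and a fragment naming the input entity. A minimal sketch of composing and decomposing such URIs with Python's `urllib.parse`; the helper names are hypothetical, and how geowrap itself parses its URIs is not described here.

```python
from typing import Dict, Tuple
from urllib.parse import parse_qs, urldefrag, urlencode, urlsplit


def input_uri(endpoint: str, params: Dict[str, object], local_id: str) -> str:
    """Compose service endpoint + parameters + input identifier into one URI."""
    return f"{endpoint}?{urlencode(params)}#{local_id}"


def split_input_uri(uri: str) -> Tuple[str, Dict[str, str], str]:
    """Inverse of input_uri: recover endpoint, parameters and local identifier."""
    base, local_id = urldefrag(uri)
    parts = urlsplit(base)
    endpoint = f"{parts.scheme}://{parts.netloc}{parts.path}"
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return endpoint, params, local_id


uri = input_uri("http://geowrap.openlids.org/findNearbyWikipedia",
                {"lat": 37.416, "lng": -122.152}, "point")
print(uri)  # http://geowrap.openlids.org/findNearbyWikipedia?lat=37.416&lng=-122.152#point
```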
34. 2. LIDS Descriptions
! LIDS are characterised by
! Endpoint URI ep, which is the base for all input entities
! Local identifier i of the input entity
! List of parameters Xi
! Basic graph pattern Ti describing conditions on the parameters
! Basic graph pattern To describing the minimum output data
! Example:
ep = <http://geowrap.openlids.org/findNearbyWikipedia>
i = point
Xi = {?lat, ?lng}
Ti = ?point a Point . ?point geo:lat ?lat . ?point geo:long ?lng
To = ?point foaf:based_near ?feature
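The five components of a LIDS description map naturally onto a small record type. In the sketch below (class and method names are hypothetical), `binding_query` produces the query used for interlinking; it additionally projects ?point so that each binding for i can be linked, and it omits PREFIX declarations, so it is not a complete SPARQL query.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LIDSDescription:
    ep: str               # endpoint URI, base for all input entities
    i: str                # local identifier of the input entity
    xi: Tuple[str, ...]   # parameters Xi
    ti: str               # basic graph pattern Ti: conditions on the parameters
    to: str               # basic graph pattern To: minimum output data

    def binding_query(self) -> str:
        """SPARQL query that finds inputs for this service in existing data.
        Also projects ?i, so each binding can later be linked to its service URI."""
        return f"SELECT ?{self.i} {' '.join(self.xi)} WHERE {{ {self.ti} }}"


FIND_NEARBY = LIDSDescription(
    ep="http://geowrap.openlids.org/findNearbyWikipedia",
    i="point",
    xi=("?lat", "?lng"),
    ti="?point a Point . ?point geo:lat ?lat . ?point geo:long ?lng",
    to="?point foaf:based_near ?feature",
)
print(FIND_NEARBY.binding_query())
```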
35. Interlink LIDS and Linked Data
! Generate service URIs with input bindings obtained by evaluating:
SELECT Xi WHERE Ti
! Add owl:sameAs links between the binding for i and the generated service URI
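Given result rows of such a query, link generation is mechanical: rebuild the service URI from the parameter bindings and emit an owl:sameAs triple for the binding of i. A sketch under the assumption that each row also carries the binding of ?point; the example entity URI and the function name are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple
from urllib.parse import urlencode

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
Triple = Tuple[str, str, str]


def interlink(ep: str, i: str, xi: Tuple[str, ...],
              rows: Iterable[Dict[str, str]]) -> List[Triple]:
    """For each result row of SELECT ?i Xi WHERE Ti, link the bound entity to its service URI."""
    links: List[Triple] = []
    for row in rows:
        params = {var.lstrip("?"): row[var] for var in xi}
        service_uri = f"{ep}?{urlencode(params)}#{i}"
        links.append((row["?" + i], OWL_SAME_AS, service_uri))
    return links


ROWS = [{"?point": "http://example.org/place/1", "?lat": "37.416", "?lng": "-122.152"}]
print(interlink("http://geowrap.openlids.org/findNearbyWikipedia", "point", ("?lat", "?lng"), ROWS))
```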
36. Scale-Up Experiment: Link BTC to GeoNames
! 3 billion triples from the Billion Triple Challenge (BTC) 2010 data set
! Annotated with the LIDS wrapper of the GeoNames findNearby service
! Annotation time: < 12 hours on a laptop!
! ~12 hours for uncompressing the data set, cleaning results, and gathering statistics
! Original BTC data: 74 different domains linked to GeoNames URIs
! The interlinking process added 891 new domains now linked to LIDS geowrap
! In total, 2,448,160 new links were added
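A back-of-the-envelope reading of these figures, treating the rounded "3 billion triples" and "12 hours" as exact (the slide does not claim that precision):

```python
TRIPLES = 3_000_000_000   # "3 billion triples" (BTC 2010), taken as exact here
HOURS = 12                # annotation took less than this
NEW_LINKS = 2_448_160     # new links reported on the slide

rate = TRIPLES / (HOURS * 3600)
print(f"at least {rate:,.0f} triples/s annotated")            # at least 69,444 triples/s annotated
print(f"one new link per {TRIPLES / NEW_LINKS:,.0f} triples")  # one new link per 1,225 triples
```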
37. Query Answering using LIDS and Linked Data
! Query execution resolves URIs
! => enlarges the data set
! LIDS are interlinked
! Query is executed again on the new data set
! Repeat until no new links or no new data
! Combine results
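The loop on this slide is a fixpoint computation. A Python sketch, assuming the query, URI fetching and LIDS interlinking are passed in as functions; the mini-Web, the `ex:value` property and the service URI pattern in the demo are hypothetical.

```python
from typing import Callable, Set, Tuple

Triple = Tuple[str, str, str]


def answer(query: Callable[[Set[Triple]], Set[tuple]],
           seed: Set[Triple],
           fetch: Callable[[str], Set[Triple]],
           interlink: Callable[[Set[Triple]], Set[Triple]]) -> Set[tuple]:
    """Repeat query execution, URI resolution and LIDS interlinking until a fixpoint."""
    data, fetched, results = set(seed), set(), set()
    while True:
        results |= query(data)                        # execute the query on the current data
        uris = {t for (s, _, o) in data for t in (s, o) if t.startswith("http")} - fetched
        new: Set[Triple] = set()
        for uri in uris:                              # resolve URIs: enlarges the data set
            fetched.add(uri)
            new |= fetch(uri)
        new |= interlink(data | new)                  # interlink LIDS with the data
        if new <= data:                               # no new links or data: stop
            return results                            # results combined across rounds
        data |= new


# Hypothetical mini-Web: a list, a university, and a LIDS input URI.
LIST, UNI = "http://example.org/list", "http://example.org/uni"
SVC = "http://example.org/svc?name=Uni#u"
WEB = {UNI: {(UNI, "foaf:name", "Uni")}, SVC: {(SVC, "ex:value", "v")}}


def link_to_service(data: Set[Triple]) -> Set[Triple]:
    return {(s, "owl:sameAs", f"http://example.org/svc?name={o}#u")
            for (s, p, o) in data if p == "foaf:name"}


def names_and_values(data: Set[Triple]) -> Set[tuple]:
    return {(u, n, v)
            for (u, p1, n) in data if p1 == "foaf:name"
            for (u2, p2, x) in data if p2 == "owl:sameAs" and u2 == u
            for (x2, p3, v) in data if p3 == "ex:value" and x2 == x}


print(answer(names_and_values, {(LIST, "foaf:topic", UNI)},
             lambda uri: WEB.get(uri, set()), link_to_service))
```

The answer only appears in the third round: the first round fetches the university's name, interlinking then adds the service URI, and the second round fetches the service output.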
38. Experiment: Query Answering
! Input: list of 562 (potential) universities from the Facebook Graph API
! Output: Facebook fans and DBpedia student numbers for 104 universities
! PREFIX u: <http://openlids.org/universities.rdf#>
SELECT ?n ?f ?s WHERE {
u:list foaf:topic ?u .
?u foaf:name ?n .
?u og:fan_count ?f .
?u d:numberOfStudents ?s .
}
39. Linked Services and PlanetData
! Several areas seem likely to produce services:
! Stream resources, including sensor resources (latest values)
! Any others exposing dynamic resources
! Dynamic computations, including on-the-fly quality assessments
! Other areas seem likely to consider service technologies and move towards more service-like HTTP interactions
! Access control (OpenID, OAuth, etc.)
! Finally, the remaining areas could serve to complement LIDS/LOS alignment
! Provenance