Executing SPARQL Queries over the Web of Linked Data
1. Executing SPARQL Queries
over the
Web of Linked Data
Olaf Hartig*
Christian Bizer˚
Johann-Christoph Freytag*
*Humboldt-Universität zu Berlin ˚Freie Universität Berlin
2. ● Use URIs as names for things
● Use HTTP URIs so that people
can look up those names.
● When someone looks up a
URI, provide useful
information.
● Include links to other URIs so
that they can discover more
things.
Tim Berners-Lee, July 2006
My Movie DB
Olaf Hartig - Executing SPARQL Queries over the Web of Linked Data
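A URI look-up in the sense of these principles is an ordinary HTTP request. The following minimal sketch only builds such a request and does not send it. The `Accept` header value assumes that the server offers RDF/XML through content negotiation, which the slides do not state, and `build_lookup_request` is a hypothetical helper name.

```python
import urllib.request


def build_lookup_request(uri: str) -> urllib.request.Request:
    """Build (but do not send) an HTTP GET request that asks for RDF data.

    Assumption: the server answers with RDF/XML when asked via the Accept header.
    """
    return urllib.request.Request(uri, headers={"Accept": "application/rdf+xml"})


# The URI is the example name used on the slides.
request = build_lookup_request("http://mymovie.db/movie2449")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Sending the request (for example with `urllib.request.urlopen`) would return whatever description of the movie the server chooses to publish.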
3. My Movie DB: each movie is named by an HTTP URI
http://mymovie.db/movie1342
http://mymovie.db/movie0362
http://mymovie.db/movie5112
http://mymovie.db/movie2449
4. Look-up of http://mymovie.db/movie2449 at My Movie DB (figure)
7. Links from My Movie DB to URIs in another data source (figure)
http://geo.db/country21
http://geo.db/country7
http://geo.db/cityCJ
http://geo.db/cityXA
9. ● The Web: a huge, globally distributed dataspace
● Querying this dataspace opens new possibilities:
● Aggregating data from different sources
● Integrating fragmentary information
● Achieving a more complete view
11. Traditional approach 1: data centralization
● Querying a collection of
copies from all relevant
datasets
● Misses unknown or new sources
● Collection probably out of date
● Will it scale?
13. Traditional approach 2: federated query processing
● Querying a mediator which distributes
subqueries to relevant sources and
integrates the results
● Requires sources to
provide a query service
● Requires information
about the sources
● Misses unknown
or new sources
14. Main drawback:
You have to know the relevant
data sources in advance.
You restrict yourself to
the selected sources.
You do not tap the
full potential of
the Web!
15. A novel approach:
Link Traversal Based Query Execution
Allows data sources to be discovered at runtime
16. Outline
Part I
Overview of Link Traversal based Query Execution
Part II
An Iterator based Implementation Approach
17. Main Idea
● Intertwine query evaluation with traversal of RDF links
● Alternately:
● Evaluate parts of the query on a
continuously augmented set of data
● Look up URIs in intermediate
solutions and add retrieved data
to the queried data set
Queried data
18. Main Idea: Example
Query:
http://.../movie2449 filmingLocation ?loc .
?loc statistics ?stat .
?stat unemp_rate ?ur .
● Look up http://.../movie2449 and add the retrieved data to the queried data set
● Queried data now contains:
http://.../movie2449 filmingLocation http://geo.../Italy
● Intermediate solution:
?loc = http://geo.../Italy
● Look up http://geo.../Italy and add the retrieved data to the queried data set
● Queried data now contains:
http://geo.../Italy statistics http://stat.db/.../it
● Intermediate solution:
?loc = http://geo.../Italy , ?stat = http://stats.db/../it
32. In a Nutshell
● Link traversal based query execution:
● Evaluation on a continuously augmented dataset
● Discovery of potentially relevant data during execution
● Discovery driven by intermediate solutions
● Main advantage:
● No need to know all data sources in advance
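The alternation between evaluation and link traversal can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the implementation discussed in the talk: the `WEB` dictionary is a hypothetical stand-in for HTTP look-ups, all `*.example` URIs and the literal `rate-literal` are made up, and predicates are written as plain names so that only subject and object URIs are looked up.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Solution = Dict[str, str]

# Hypothetical stand-in for the Web: each URI maps to the triples a look-up returns.
WEB: Dict[str, List[Triple]] = {
    "http://movie.example/movie2449": [
        ("http://movie.example/movie2449", "filmingLocation", "http://geo.example/Italy"),
    ],
    "http://geo.example/Italy": [
        ("http://geo.example/Italy", "statistics", "http://stats.example/it"),
    ],
    "http://stats.example/it": [
        ("http://stats.example/it", "unemp_rate", "rate-literal"),
    ],
}

QUERY: List[Triple] = [
    ("http://movie.example/movie2449", "filmingLocation", "?loc"),
    ("?loc", "statistics", "?stat"),
    ("?stat", "unemp_rate", "?ur"),
]


def extend(pattern: Triple, triple: Triple, mu: Solution) -> Optional[Solution]:
    """Return mu extended by the bindings that map pattern onto triple, or None."""
    result = dict(mu)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def execute(patterns: List[Triple]) -> List[Solution]:
    data: Set[Triple] = set()      # the continuously augmented queried data set
    looked_up: Set[str] = set()    # URIs that were already looked up
    solutions: List[Solution] = [{}]
    for pattern in patterns:
        next_solutions: List[Solution] = []
        for mu in solutions:
            bound = tuple(mu.get(term, term) for term in pattern)
            # Traverse links: look up every new URI in the substituted pattern.
            for term in bound:
                if term.startswith("http://") and term not in looked_up:
                    looked_up.add(term)
                    data.update(WEB.get(term, []))
            # Evaluate the pattern on the data retrieved so far.
            for triple in sorted(data):
                extended = extend(bound, triple, mu)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


results = execute(QUERY)
print(results)
```

The sketch starts with an empty data set and never receives a list of sources; every source it touches is discovered through a URI found in the query or in an intermediate solution.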
33. Real-World Examples
Return phone numbers of authors of ontology engineering papers at ESWC'09.
SELECT DISTINCT ?author ?phone WHERE {
  ?pub swc:isPartOf
       <http://data.semanticweb.org/conference/eswc/2009/proceedings> .
  ?pub swc:hasTopic ?topic . ?topic rdfs:label ?topicLabel .
  FILTER regex( str(?topicLabel), "ontology engineering", "i" ) .
  ?pub swrc:author ?author .
  { ?author owl:sameAs ?authorAlt }
  UNION
  { ?authorAlt owl:sameAs ?author }
  ?authorAlt foaf:phone ?phone
}
# of query results       2
# of retrieved graphs    297
# of accessed servers    16
avg. execution time      1min 30sec
34. Outline
Part I
Overview of Link Traversal based Query Execution
Part II
An Iterator based Implementation Approach
➢ Introduction to the Iterator Paradigm
➢ Application to Link Traversal based Query Execution
➢ URI Prefetching
➢ Extension to the Iterator Paradigm
➢ Evaluation
35. Iterator based Query Execution
● Iterator:
● implements an operation
● is a group of functions:
OPEN, GETNEXT, CLOSE
● Query execution uses
a chain of iterators (I1, I2, I3)
● Each iterator responsible
for a single triple pattern:
I1: http://.../movie2449 filmingLocation ?loc
I2: ?loc statistics ?stat
I3: ?stat unemp_rate ?ur
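The OPEN / GETNEXT / CLOSE interface and the chaining of iterators can be illustrated independently of RDF. The sketch below is a generic example with hypothetical class names (`ListScan`, `Select`, `run`); it uses integers instead of solutions and uses `None` as the end-of-input signal, which is an assumption of this sketch.

```python
from typing import Callable, Iterable, List, Optional


class ListScan:
    """Leaf iterator: provides the items of an in-memory list one by one."""

    def __init__(self, items: Iterable[int]) -> None:
        self._items = list(items)
        self._position = 0

    def open(self) -> None:
        self._position = 0

    def get_next(self) -> Optional[int]:
        if self._position >= len(self._items):
            return None  # end of input
        item = self._items[self._position]
        self._position += 1
        return item

    def close(self) -> None:
        self._position = len(self._items)


class Select:
    """Iterator that pulls from its predecessor and passes on matching items."""

    def __init__(self, source, predicate: Callable[[int], bool]) -> None:
        self._source = source
        self._predicate = predicate

    def open(self) -> None:
        self._source.open()

    def get_next(self) -> Optional[int]:
        while True:
            item = self._source.get_next()
            if item is None or self._predicate(item):
                return item

    def close(self) -> None:
        self._source.close()


def run(root) -> List[int]:
    """Drive the chain by calling GETNEXT on the last iterator only."""
    root.open()
    results = []
    while True:
        item = root.get_next()
        if item is None:
            break
        results.append(item)
    root.close()
    return results


i1 = ListScan(range(10))
i2 = Select(i1, lambda n: n % 2 == 0)
i3 = Select(i2, lambda n: n > 4)
print(run(i3))
```

Only the last iterator is called from outside; each call propagates up the chain, so results are produced one at a time without materializing intermediate lists.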
43. Iterator based Query Execution
Results from Ii-1 (μcur: the input solution currently being processed)
?c                          ?cStats
http://geo.db/country/US    http://stats.example.org/USstatistics
http://geo.db/country/IT    http://stats.example.org/ITstatistics
http://geo.db/country/IT    http://stats.db/example/It
http://example.db/ctry/DE   http://stats.example.org/Germany
Ii for tpi
Example: tpi = ( ?loc ex:stats ?s )
μcur = { ?p → http://ex... , ?loc → http://geo... }
1. Substitute tpcur = μcur [ tpi ]
   tpcur = ( http://geo... ex:stats ?s )
2. Find matching triples match(tpcur ) in queried data set
   (http://geo... ex:stats http://db...), (http://geo... ex:stats http://ex...)
3. Create solution μ' for each t in match(tpcur )
   μ' = { ?s → http://db... }
4. Return each μcur ∪ μ' as a result
   { ?p → http://ex... , ?loc → http://geo.db/... , ?s → http://db... }
44. Iterator based Query Execution
● Results of Ii are solutions for tp1 , … , tpi
Ii-1 for tpi-1
Ii for tpi
Ii+1 for tpi+1
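The four steps of a triple pattern iterator, and the way a chain of such iterators yields solutions for tp1, …, tpi, can be sketched as follows. The class names `Start` and `TriplePatternIterator`, the `*.example` URIs, and the fixed in-memory `DATA` list are assumptions of this sketch; it also assumes that a variable occurs at most once per triple pattern.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Solution = Dict[str, str]


class Start:
    """Feeds the first iterator of the chain a single empty solution."""

    def open(self) -> None:
        self._done = False

    def get_next(self) -> Optional[Solution]:
        if self._done:
            return None
        self._done = True
        return {}

    def close(self) -> None:
        pass


class TriplePatternIterator:
    """Iterator I_i for triple pattern tp_i, consuming solutions from I_(i-1)."""

    def __init__(self, tp: Triple, source, data: List[Triple]) -> None:
        self.tp, self.source, self.data = tp, source, data
        self.pending: List[Solution] = []

    def open(self) -> None:
        self.source.open()
        self.pending = []

    def get_next(self) -> Optional[Solution]:
        while not self.pending:
            mu_cur = self.source.get_next()
            if mu_cur is None:
                return None
            # 1. Substitute: tp_cur = mu_cur[tp_i]
            tp_cur = tuple(mu_cur.get(term, term) for term in self.tp)
            # 2. Find matching triples in the queried data set
            for t in self.data:
                if all(p.startswith("?") or p == v for p, v in zip(tp_cur, t)):
                    # 3. Create mu' from the variables that remain in tp_cur
                    mu_new = {p: v for p, v in zip(tp_cur, t) if p.startswith("?")}
                    # 4. Queue the union of mu_cur and mu' as a result
                    self.pending.append({**mu_cur, **mu_new})
        return self.pending.pop(0)

    def close(self) -> None:
        self.source.close()


MOVIE = "http://movie.example/movie2449"
DATA: List[Triple] = [
    (MOVIE, "filmingLocation", "http://geo.example/Italy"),
    ("http://geo.example/Italy", "ex:stats", "http://db.example/it"),
    ("http://geo.example/Italy", "ex:stats", "http://ex.example/it"),
]

i1 = TriplePatternIterator((MOVIE, "filmingLocation", "?loc"), Start(), DATA)
i2 = TriplePatternIterator(("?loc", "ex:stats", "?s"), i1, DATA)

i2.open()
solutions: List[Solution] = []
while True:
    mu = i2.get_next()
    if mu is None:
        break
    solutions.append(mu)
i2.close()
print(solutions)
```

Each solution returned by `i2` binds both `?loc` and `?s`, which mirrors the statement that the results of Ii are solutions for all patterns up to tpi.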
46. Application to Link Traversal
● The queried data set grows
● Look-up Requirement:
Do not evaluate tpcur until the
queried data set contains all
data that can be retrieved from
all URIs in tpcur
Ii-1 for tpi-1
Ii for tpi
Ii+1 for tpi+1
50. Application to Link Traversal
Results from Ii-1 (μcur: the input solution currently being processed)
?c                          ?cStats
http://geo.db/country/US    http://stats.example.org/USstatistics
http://geo.db/country/IT    http://stats.example.org/ITstatistics
http://geo.db/country/IT    http://stats.db/example/It
http://example.db/ctry/DE   http://stats.example.org/Germany
Ii for tpi
1. Substitute tpcur = μcur [ tpi ]
2. Ensure look-up requirement for tpcur (initiate look-ups and wait)
3. Find matching triples match(tpcur ) in queried data set
4. Create solution μ' for each t in match(tpcur )
5. Return each μcur ∪ μ' as a result
51. Blocked Query Execution
● Waiting for URI look-ups
blocks query execution
Ii-1 for tpi-1
Ii for tpi
Ii+1 for tpi+1
Initiate look-ups
and wait
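A straightforward way to satisfy the look-up requirement is to fetch synchronously before matching, which is exactly what blocks the pipeline. The sketch below simulates this; `fetch`, `WEB`, `LATENCY_SECONDS`, and the `*.example` URIs are hypothetical, and predicates are plain names here, so only subject and object URIs are looked up.

```python
import time
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical stand-in for the Web and for network latency.
WEB: Dict[str, List[Triple]] = {
    "http://geo.example/Italy": [
        ("http://geo.example/Italy", "statistics", "http://stats.example/it"),
    ],
}
LATENCY_SECONDS = 0.01
fetch_log: List[str] = []


def fetch(uri: str) -> List[Triple]:
    """Simulated URI look-up; the caller waits for the whole duration."""
    fetch_log.append(uri)
    time.sleep(LATENCY_SECONDS)
    return WEB.get(uri, [])


def ensure_lookup_requirement(tp_cur: Triple, data: List[Triple], retrieved: Set[str]) -> None:
    """Initiate look-ups for all URIs in tp_cur and wait until they are done."""
    for term in tp_cur:
        if term.startswith("http://") and term not in retrieved:
            data.extend(fetch(term))  # blocks query execution
            retrieved.add(term)


def evaluate(tp_cur: Triple, data: List[Triple], retrieved: Set[str]) -> List[Triple]:
    """Match tp_cur only after the look-up requirement holds."""
    ensure_lookup_requirement(tp_cur, data, retrieved)
    return [
        t for t in data
        if all(p.startswith("?") or p == v for p, v in zip(tp_cur, t))
    ]


data: List[Triple] = []
retrieved: Set[str] = set()
pattern: Triple = ("http://geo.example/Italy", "statistics", "?stat")
first = evaluate(pattern, data, retrieved)
second = evaluate(pattern, data, retrieved)  # no second look-up needed
print(first, fetch_log)
```

Every first encounter of a URI costs one full round trip inside GETNEXT, and all iterators further down the chain are idle during that time.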
53. URI Prefetching
● Waiting for URI look-ups
blocks query execution
● URI prefetching: when a URI
is bound to a variable, initiate
look-up in the background
Ii-1 for tpi-1
Initiate look-up
Ii for tpi
Ii+1 for tpi+1
Ensure look-up
is finished
Initiate look-ups
and wait
54. URI Prefetching
Results from Ii-1 (μcur: the input solution currently being processed)
?c                          ?cStats
http://geo.db/country/US    http://stats.example.org/USstatistics
http://geo.db/country/IT    http://stats.example.org/ITstatistics
http://geo.db/country/IT    http://stats.db/example/It
http://example.db/ctry/DE   http://stats.example.org/Germany
Ii for tpi
1. Substitute tpcur = μcur [ tpi ]
2. Ensure look-up requirement for tpcur
3. Find matching triples match(tpcur ) in queried data set
4. Create solution μ' for each t in match(tpcur )
5. Initiate parallel look-up for each new URI in μ'
6. Return each μcur ∪ μ' as a result
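Prefetching can be sketched with a thread pool: the look-up is started when a URI is bound, and the consumer later only waits for whatever part of the look-up is still outstanding. The class `PrefetchingStore`, the simulated `fetch` function, and the `*.example` URIs are assumptions of this sketch, not part of the engine described in the talk.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


class PrefetchingStore:
    """Queried data set whose URI look-ups run in background threads."""

    def __init__(self, fetch: Callable[[str], List[Triple]], max_workers: int = 4) -> None:
        self._fetch = fetch
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures: Dict[str, Future] = {}
        self._added: Set[str] = set()
        self.triples: List[Triple] = []

    def prefetch(self, uri: str) -> None:
        """Initiate a background look-up unless one was already started."""
        if uri not in self._futures:
            self._futures[uri] = self._pool.submit(self._fetch, uri)

    def ensure(self, uri: str) -> None:
        """Wait until the look-up of uri is finished and its data is added."""
        self.prefetch(uri)
        retrieved = self._futures[uri].result()  # blocks only if still running
        if uri not in self._added:
            self._added.add(uri)
            self.triples.extend(retrieved)

    def shutdown(self) -> None:
        self._pool.shutdown()


calls: List[str] = []


def fetch(uri: str) -> List[Triple]:
    """Simulated look-up with a small artificial latency."""
    calls.append(uri)
    time.sleep(0.01)
    return [(uri, "statistics", uri + "/stats")]


ITALY = "http://geo.example/Italy"
FRANCE = "http://geo.example/France"

store = PrefetchingStore(fetch)
store.prefetch(ITALY)    # step 5: URIs newly bound in a solution
store.prefetch(FRANCE)
store.ensure(ITALY)      # step 2 of the next iterator
store.ensure(FRANCE)
store.ensure(ITALY)      # already retrieved, returns immediately
store.shutdown()
print(sorted(store.triples))
```

The two simulated look-ups overlap in time, but `ensure` can still block when a look-up has not completed by the time its data is needed.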
58. URI Prefetching
● Even with URI prefetching
query execution may block
● Possible solutions:
● Program parallelism
● Asynchronous pipeline
● Drawback: requires major
rewrite of existing
query engines
Ii-1 for tpi-1
Ii for tpi
Ii+1 for tpi+1
Wait until look-up
is finished
60. Postponing Iterator
● Enabled by an extension of the iterator paradigm:
● New function POSTPONE: take most recently provided
result back
● Adjusted GETNEXT: either return the next result or return
a formerly postponed result
● POSTPONE allows an iterator to temporarily reject its input solution μcur
61. Postponing Iterator
Results from Ii-1 (μcur: the input solution currently being processed)
?c                          ?cStats
http://geo.db/country/US    http://stats.example.org/USstatistics
http://geo.db/country/IT    http://stats.example.org/ITstatistics
http://geo.db/country/IT    http://stats.db/example/It
http://example.db/ctry/DE   http://stats.example.org/Germany
Ii for tpi
1. Substitute tpcur = μcur [ tpi ]
2. POSTPONE μcur if look-up requirement doesn't hold for tpcur
3. Find matching triples match(tpcur ) in queried data set
4. Create solution μ' for each t in match(tpcur )
5. Initiate parallel look-up for each new URI in μ'
6. Return each μcur ∪ μ' as a result
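The POSTPONE extension can be sketched with a source that takes its most recent result back and offers it again later. The names `PostponableSource` and `consume` are hypothetical, look-up progress is simulated by a countdown per solution, and the policy of serving fresh results before postponed ones is an assumption of this sketch rather than something the slides specify.

```python
from collections import deque
from typing import Callable, Deque, Dict, Iterable, List, Optional


class PostponableSource:
    """Predecessor iterator with an adjusted GETNEXT and a POSTPONE function."""

    def __init__(self, items: Iterable[str]) -> None:
        self._fresh: Deque[str] = deque(items)
        self._postponed: Deque[str] = deque()
        self._last: Optional[str] = None

    def get_next(self) -> Optional[str]:
        """Return the next result or, if none is left, a formerly postponed one."""
        if self._fresh:
            self._last = self._fresh.popleft()
        elif self._postponed:
            self._last = self._postponed.popleft()
        else:
            self._last = None
        return self._last

    def postpone(self) -> None:
        """Take the most recently provided result back."""
        if self._last is not None:
            self._postponed.append(self._last)
            self._last = None


def consume(source: PostponableSource, lookup_finished: Callable[[str], bool]) -> List[str]:
    """Process input solutions, postponing those whose look-ups are unfinished."""
    processed: List[str] = []
    while True:
        mu_cur = source.get_next()
        if mu_cur is None:
            return processed
        if not lookup_finished(mu_cur):
            source.postpone()  # reject mu_cur for now, try another solution
            continue
        processed.append(mu_cur)


# Simulated look-up progress: number of checks until the look-up is finished.
remaining: Dict[str, int] = {"IT": 2, "US": 0, "DE": 0}


def lookup_finished(mu: str) -> bool:
    if remaining[mu] > 0:
        remaining[mu] -= 1
        return False
    return True


order = consume(PostponableSource(["IT", "US", "DE"]), lookup_finished)
print(order)
```

In this run the solution for `IT` is postponed while `US` and `DE` are processed, so the slow look-up no longer stalls the other solutions. When only postponed solutions remain, this sketch simply keeps polling; an actual engine would have to wait for a look-up to complete at that point.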
63. Evaluation
● Implementation: Semantic Web Client Library (SWClLib)
http://www4.wiwiss.fu-berlin.de/bizer/ng4j/semwebclient/
● Berlin SPARQL Benchmark (BSBM)
● Simulates e-commerce scenario
● Mix of 12 SPARQL queries
● Generates datasets of different sizes (scaling factor)
● Simulation of the Web of Linked Data
● Linked Data server publishes BSBM datasets
● Experiment
● Adjusted BSBM queries link to the simulation server
● Execute query mix with SWClLib
64. Evaluation
Figure: avg. execution time per query mix in seconds (y-axis, 0 to 250) over BSBM scaling factor (x-axis, 10 to 60)
Series: w/o prefetching; w/ prefetching; non-blocking + prefetching; all data retrieved in advance
scal.factor   # of triples   # of entities
10            4,971          613
20            8,485          928
30            11,999         1,245
40            16,918         1,845
50            22,616         2,599
60            26,108         2,914
65. Take-away Summary
● Novel query execution approach for the Web of Data:
● Utilizes the characteristics of the Web
● Traverses RDF links during query execution
● Discovery of new data sources
● No need to know all data sources in advance
● Implementation approach:
● Iterator based execution with URI Prefetching
● Extension of the iterator paradigm (POSTPONE)
● New research challenges:
● Improving result completeness
● Investigating suitable caching strategies
66. Try it!
● SQUIN http://squin.org
● Provides SWClLib functionality as a Web service
● Accessible like a SPARQL endpoint
● Public SQUIN service at
http://squin.informatik.hu-berlin.de/SQUIN/
67. These slides have been created by
Olaf Hartig
http://olafhartig.de
This work is licensed under a
Creative Commons Attribution-Share Alike 3.0 License
(http://creativecommons.org/licenses/by-sa/3.0/)