The document discusses link traversal based query execution for querying linked data on the web. It describes an approach that alternates between evaluating parts of a query on a continuously augmented local dataset, and looking up URIs in solutions to retrieve more data and add it to the local dataset. This allows querying linked data as if it were a single large database, without needing to know all data sources in advance. A key issue is how to efficiently cache retrieved data to avoid redundant lookups.
The Impact of Data Caching on Query Execution for Linked Data
1. The Impact of Data Caching on Query Execution for Linked Data
Olaf Hartig
http://olafhartig.de/foaf.rdf#olaf
@olafhartig
Database and Information Systems Research Group
Humboldt-Universität zu Berlin
2. Can we query the Web of Data as if it were a single, giant database?
SELECT DISTINCT ?i ?label
WHERE {
  ?prof rdf:type <http://res ... data/dbprofs#DBProfessor> ;
        foaf:topic_interest ?i .
  OPTIONAL {
    ?i rdfs:label ?label
    FILTER( LANG(?label)="en" || LANG(?label)="")
  }
}
ORDER BY ?label
Our approach: Link Traversal Based Query Execution
[ISWC'09]
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 2
3. Main Idea
● Intertwine query evaluation with traversal of data links
● We alternate between:
● Evaluate parts of the query (triple patterns)
on a continuously augmented set of data
● Look up URIs in intermediate
solutions and add retrieved data
to the query-local dataset
Example (steps of the slide animation, reconstructed from the figure):
● Query: http://bob.name knows ?acq . ?acq project ?prj . ?prj name ?prjName
● Look up http://bob.name and add the retrieved data (a “descriptor object”) to the query-local dataset
● The query-local dataset now contains: http://bob.name knows http://alice.name
● Intermediate solution: ?acq = http://alice.name
● Look up http://alice.name and add the retrieved data to the query-local dataset
● The query-local dataset now contains: http://alice.name project http://.../AlicesPrj
● Intermediate solution: ?acq = http://alice.name, ?prj = http://.../AlicesPrj
● Intermediate solution: ?prj = http://.../AlicesPrj, ?prjName = “…“
● Query result: ?acq = http://alice.name, ?prj = http://.../AlicesPrj, ?prjName = “…“
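The alternation between evaluating triple patterns and looking up URIs can be sketched in Python. This is a minimal illustration under stated assumptions, not the implementation behind the slides: the dictionary `WEB` stands in for HTTP lookups, the URI `http://example.org/AlicesPrj` and the project name are made-up placeholders for values that are elided on the slides, and triple patterns are evaluated in a fixed order, so data discovered late is not matched against earlier patterns.

```python
from itertools import chain

# Hypothetical stand-in for the Web: dereferencing a URI yields RDF triples.
WEB = {
    "http://bob.name": {("http://bob.name", "knows", "http://alice.name")},
    "http://alice.name": {("http://alice.name", "project", "http://example.org/AlicesPrj")},
    "http://example.org/AlicesPrj": {("http://example.org/AlicesPrj", "name", "Alice's Project")},
}

def is_var(term):
    return term.startswith("?")

def look_up(uri, dataset, looked_up):
    """Dereference a URI at most once and add its triples to the dataset."""
    if uri.startswith("http://") and uri not in looked_up:
        looked_up.add(uri)
        dataset.update(WEB.get(uri, set()))

def match(pattern, triple, solution):
    """Return the solution extended by matching pattern to triple, else None."""
    extended = dict(solution)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if extended.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return extended

def execute(patterns):
    dataset, looked_up = set(), set()
    # URIs mentioned in the query are the starting points of the traversal.
    for term in chain.from_iterable(patterns):
        if not is_var(term):
            look_up(term, dataset, looked_up)
    solutions = [{}]
    for pattern in patterns:  # evaluate one triple pattern at a time
        extended_solutions = []
        for solution in solutions:
            # Iterate over a snapshot because look-ups grow the dataset.
            for triple in list(dataset):
                extended = match(pattern, triple, solution)
                if extended is not None:
                    extended_solutions.append(extended)
                    for value in extended.values():
                        look_up(value, dataset, looked_up)  # traverse data links
        solutions = extended_solutions
    return solutions

QUERY = [
    ("http://bob.name", "knows", "?acq"),
    ("?acq", "project", "?prj"),
    ("?prj", "name", "?prjName"),
]
results = execute(QUERY)
```

With the toy data above, `results` holds a single solution that binds `?acq`, `?prj`, and `?prjName`, mirroring the final step of the example.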
22. Characteristics
● Link traversal based query execution:
● Evaluation on a continuously augmented dataset
● Discovery of potentially relevant data during execution
● Discovery driven by intermediate solutions
● Main advantage:
● No need to know all data sources in advance
● Limitations:
● Query has to contain a URI as a starting point
● Ignores data that is not reachable* by the query execution
* formal definition in [LDOW'11a]
23. The Issue
Example (steps of the slide animation, reconstructed from the figure):
● Query: http://bob.name knows ?acq . ?acq interest ?i . ?i label ?iLabel
● Look up http://bob.name
● The query-local dataset now contains: http://bob.name knows http://alice.name
● Result columns: ?acq ?i ?iLabel
● The earlier example query (http://bob.name knows ?acq . ?acq project ?prj . ?prj name ?prjName) is shown next to this query; each of the two queries has its own query-local dataset
28. Reusing the Query-Local Dataset
Example (steps of the slide animation, reconstructed from the figure):
● The query http://bob.name knows ?acq . ?acq interest ?i . ?i label ?iLabel is evaluated on the query-local dataset of the query http://bob.name knows ?acq . ?acq project ?prj . ?prj name ?prjName
● That dataset already contains: http://bob.name knows http://alice.name
● Intermediate solution: ?acq = http://alice.name
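A query-local dataset that is shared across queries can be sketched as a small Python class. This is only an illustration under stated assumptions: the class name `QueryLocalDataset`, the `fetch` callback, and the toy data are hypothetical, and the hit rate is computed as the share of look-ups answered without retrieval.

```python
class QueryLocalDataset:
    """Triples plus the set of URIs already looked up; shareable across queries."""

    def __init__(self, fetch):
        self.fetch = fetch        # hypothetical callback: URI -> iterable of triples
        self.triples = set()
        self.looked_up = set()
        self.hits = 0
        self.misses = 0

    def look_up(self, uri):
        if uri in self.looked_up:
            self.hits += 1        # data is already in the dataset, no retrieval
        else:
            self.misses += 1      # first look-up: retrieve and add the data
            self.looked_up.add(uri)
            self.triples.update(self.fetch(uri))

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


# Toy data standing in for the Web (assumed for this example).
WEB = {
    "http://bob.name": [("http://bob.name", "knows", "http://alice.name")],
    "http://alice.name": [],
}
dataset = QueryLocalDataset(lambda uri: WEB.get(uri, []))

# The first query looks up both URIs; the second query reuses the dataset.
for uri in ["http://bob.name", "http://alice.name"]:
    dataset.look_up(uri)
for uri in ["http://bob.name", "http://alice.name"]:
    dataset.look_up(uri)
```

In this toy run the second query causes no retrieval at all, because both URIs were already looked up by the first query.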
31. Hypothesis
Re-using the query-local dataset (a.k.a. data caching) may benefit query performance and result completeness
32. Contributions
● Systematic analysis of the impact of data caching
● Theoretical foundation*
● Conceptual analysis*
● Empirical evaluation of the potential impact
● Out of scope: Caching strategies (replacement, invalidation)
* see [LDOW'11a]
33. Experiment – Scenario
● Information about the distributed social network of FOAF profiles
● 5 types of queries
● Experiment Setup:
● 20 persons
● Sequential use
➔ 100 queries (5 query types × 20 persons)
34. Experiment – Single Query
● no reuse experiment:
● No data caching
● reuse per query experiment:
● Reuse of query-local dataset for 3 executions of each query
● Third execution measured
[Figure: number of query results (axis from 0 to 60) and query execution time in seconds (axis from 0.01 to 100) for the no reuse and reuse per query experiments. Queries shown: ContactInfoDanBri (Query No. 61), UnsetPropsDanBri (Query No. 62), 2ndDegree1DanBri (Query No. 63), 2ndDegree2DanBri (Query No. 64), IncomingDanBri (Query No. 65)]
39. Experiment – Complete Sequence
● reuse all queries experiment:
● Reuse of the query-local dataset for the complete sequence of all 100 queries
[Figure: number of query results (axis from 0 to 60) and query execution time in seconds (axis from 0.01 to 100) for the no reuse, reuse per query, and reuse all queries experiments. Queries shown: ContactInfoDanBri (Query No. 61), UnsetPropsDanBri (Query No. 62), 2ndDegree1DanBri (Query No. 63), 2ndDegree2DanBri (Query No. 64), IncomingDanBri (Query No. 65)]
44. Outlook
● Requirements of a data cache:
● Replacement mechanism
● Coherency mechanism
45. Cache Replacement
● Cache full → remove descriptor objects
● Replacement strategy
● Primary goal: maximize hit rate
● Recency-based
● Frequency-based
● Function-based
● Randomized
● Replacement process
● Watermarks: high and low
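A recency-based replacement strategy with a high and a low watermark can be sketched as follows. This is a minimal sketch, assuming that the cache size is counted in descriptor objects and that least-recently-used eviction is the chosen strategy; the class name `WatermarkLRUCache` and its parameters are illustrative and do not come from the slides.

```python
from collections import OrderedDict


class WatermarkLRUCache:
    """Recency-based cache that evicts down to a low watermark."""

    def __init__(self, high_watermark, low_watermark):
        if not 0 <= low_watermark < high_watermark:
            raise ValueError("need 0 <= low_watermark < high_watermark")
        self.high = high_watermark
        self.low = low_watermark
        # Maps URI -> descriptor object, least recently used entry first.
        self.entries = OrderedDict()

    def get(self, uri):
        if uri not in self.entries:
            return None
        self.entries.move_to_end(uri)  # mark as most recently used
        return self.entries[uri]

    def put(self, uri, descriptor):
        self.entries[uri] = descriptor
        self.entries.move_to_end(uri)
        # The replacement process starts when the high watermark is exceeded
        # and removes entries until the low watermark is reached.
        if len(self.entries) > self.high:
            while len(self.entries) > self.low:
                self.entries.popitem(last=False)


cache = WatermarkLRUCache(high_watermark=4, low_watermark=2)
for uri in ["a", "b", "c", "d"]:
    cache.put(uri, object())
cache.get("a")           # "a" becomes the most recently used entry
cache.put("e", object())  # exceeds the high watermark, triggers eviction
```

Evicting in batches down to the low watermark avoids running the replacement process on every single insertion once the cache is full.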
48. Studying Cache Replacement?
“Web cache replacement in its general form seems to be a solved topic.”
S. Podlipnig and L. Böszörmenyi: Survey of Web Cache Replacement Strategies, 2003
● 6 quad indexes* in main memory
● Size grows linearly in the number of quads
● Example (after reuse all queries experiment, 100 queries):
● 905 descriptor objects, overall number of 745,756 triples
● ca. 103 MB
➔ Available main memory is hardly a limiting factor
* as introduced in [LDOW'11b]
49. Cache Coherency
● Data items in the cache may become inconsistent
● Strong cache consistency
● Server validation
● Client validation
● Weak cache consistency
● Time to live (TTL)
● Adaptive TTL
51. Client Validation
● Polling every time
● Enables strong cache consistency
● Conditional GET
● Request with If-Modified-Since header
● Possible response: 304 Not Modified
● Not supported by most Linked Data servers
● Experiment based on the CKAN catalog of linked datasets
● Supported by only 41 out of 154 example resources (26.6%) from 110 datasets
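Client validation with a conditional GET can be sketched with Python's standard library. This is a hedged sketch rather than the setup used in the experiment: the function names, the `Accept` header value, and the example URL are assumptions, and only the `If-Modified-Since` header and the `304 Not Modified` response come from the slide.

```python
import urllib.error
import urllib.request
from datetime import datetime, timezone
from email.utils import format_datetime


def build_conditional_request(url, last_retrieved):
    """Build a GET request that asks for the data only if it has changed."""
    stamp = format_datetime(last_retrieved.astimezone(timezone.utc), usegmt=True)
    return urllib.request.Request(
        url,
        headers={
            "If-Modified-Since": stamp,
            "Accept": "application/rdf+xml",  # assumed RDF serialization
        },
    )


def fetch_if_modified(url, last_retrieved):
    """Return the new body, or None if the server answers 304 Not Modified."""
    request = build_conditional_request(url, last_retrieved)
    try:
        with urllib.request.urlopen(request) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:  # cached copy is still valid
            return None
        raise


# Hypothetical URL; building the request does not contact any server.
request = build_conditional_request(
    "http://example.org/doc", datetime(2011, 1, 1, tzinfo=timezone.utc)
)
```

A server that ignores the header simply answers with a full `200 OK` response, so the client falls back to an ordinary retrieval.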
53. Time to Live (TTL)
● TTL field: life time estimation for each object
● Supported by HTTP response headers:
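● Expires (provided for 37.0% of the example resources)
● Cache-Control: max-age (provided for 37.7% of the example resources)
● When TTL elapses, object is invalid
● Accessing an invalid object → re-retrieve the object
● Conditional GET (supported for 26.6% of the example resources)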
● Alternative (due to lack of support in Linked Data servers):
● Assume a default TTL for each object
● Ordinary GET
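Deriving an expiry time from the HTTP response headers, with a default TTL as fallback, can be sketched as follows. The one-hour default, the function names, and the plain dictionary used for the headers are assumptions made for illustration; in HTTP, `Cache-Control: max-age` takes precedence over `Expires` when both are present.

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

DEFAULT_TTL = timedelta(hours=1)  # assumed default, not a value from the slides


def expiry_time(headers, retrieved_at, default_ttl=DEFAULT_TTL):
    """Estimate when a cached object becomes invalid."""
    match = re.search(r"max-age=(\d+)", headers.get("Cache-Control", ""))
    if match:
        return retrieved_at + timedelta(seconds=int(match.group(1)))
    if "Expires" in headers:
        try:
            return parsedate_to_datetime(headers["Expires"])
        except (TypeError, ValueError):
            pass  # malformed date: fall through to the default TTL
    return retrieved_at + default_ttl


def is_valid(expires_at, now):
    """An object is valid as long as its TTL has not elapsed."""
    return now < expires_at


t0 = datetime(2011, 1, 1, tzinfo=timezone.utc)
from_max_age = expiry_time({"Cache-Control": "max-age=60"}, t0)
from_expires = expiry_time({"Expires": "Sat, 01 Jan 2011 12:00:00 GMT"}, t0)
from_default = expiry_time({}, t0)
```

When `is_valid` returns `False`, the object would be retrieved again, with a conditional GET where the server supports it and an ordinary GET otherwise.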
55. Adaptive TTL
● Assumption:
● The older an object, the less likely it is to be modified
● TTL is a percentage of the age:
● Threshold = 10% ; age = 30 days → TTL = 3 days
● Last verification: yesterday → invalidation in 2 days
● HTTP-based implementation: 35.1%
● Calculation of age: use Last-Modified response header
● Verification with conditional GET
● Alternative (due to lack of support in Linked Data servers):
● Assume Last-Modified is time of first retrieval
● Verification by comparing a response to the current version
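The adaptive TTL arithmetic from the example (threshold 10%, age 30 days) can be checked with a few lines of Python. This sketch assumes that the age is measured at the time of the last verification and that the TTL starts counting from that verification; the function name and the concrete dates are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def invalidation_time(last_modified, last_verified, threshold_percent=10):
    """Adaptive TTL: the object stays valid for a percentage of its age."""
    age = last_verified - last_modified
    ttl = age * threshold_percent / 100
    return last_verified + ttl


# Hypothetical dates reproducing the example: verified yesterday, age 30 days.
now = datetime(2011, 3, 29, 12, 0, tzinfo=timezone.utc)
last_verified = now - timedelta(days=1)
last_modified = last_verified - timedelta(days=30)

remaining = invalidation_time(last_modified, last_verified) - now
```

Here `remaining` is two days: the TTL is 10% of 30 days, that is 3 days, and one of those days has already passed since the last verification.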
56. Summary
● Systematic analysis of the impact of data caching
● Theoretical foundation
● Conceptual analysis
● Empirical evaluation
● Main findings:
● Additional results possible (for semantically similar queries)
● Impact on performance may be positive but also negative
● Future work:
● Analysis of caching strategies in our context
● Main issue: invalidation
58. Contributions
● Theoretical foundation (extension of the original definition)
● Reachability by a $D_{seed}$-initialized execution of a BGP query $b$
● $D_{seed}$-dependent solution for a BGP query $b$
● Reachability $R(B)$ for a serial execution of $B = b_1 , \dots , b_n$
➔ Each solution for $b_{cur}$ is also an $R(B)$-dependent solution for $b_{cur}$
● Conceptual analysis of the impact of data caching
● Performance factor: $p( b_{cur} , B ) = c( b_{cur} , [ ] ) - c( b_{cur} , B )$
● Serendipity factor: $s( b_{cur} , B ) = b( b_{cur} , B ) - b( b_{cur} , [ ] )$
● Empirical verification of the potential impact
● Out of scope: Caching strategies (replacement, invalidation)
60. Query Template UnsetProps
SELECT DISTINCT ?result ?resultLabel WHERE
{
?result rdfs:isDefinedBy <http://xmlns.com/foaf/0.1/> .
?result rdfs:domain foaf:Person .
OPTIONAL { <PERSON> ?result ?var0 }
FILTER ( !bound(?var0) )
<PERSON> foaf:knows ?var2 .
?var2 ?result ?var3 .
?result rdfs:label ?resultLabel .
?result vs:term_status ?var1 .
}
ORDER BY ?var1
61. Query Template Incoming
SELECT DISTINCT ?result WHERE
{
?result foaf:knows <PERSON> .
OPTIONAL
{
?result foaf:knows ?var1 .
FILTER ( <PERSON> = ?var1 )
<PERSON> foaf:knows ?result .
}
FILTER ( !bound(?var1) )
}
62. Query Template 2ndDegree1
SELECT DISTINCT ?result WHERE
{
<PERSON> foaf:knows ?p1 .
<PERSON> foaf:knows ?p2 .
FILTER ( ?p1 != ?p2 )
?p1 foaf:knows ?result .
FILTER ( <PERSON> != ?result )
?p2 foaf:knows ?result .
OPTIONAL {
<PERSON> ?knows ?result .
FILTER ( ?knows = foaf:knows )
}
FILTER ( !bound(?knows) )
}
63. Query Template 2ndDegree2
SELECT DISTINCT ?result WHERE
{
<PERSON> foaf:knows ?p1 .
<PERSON> foaf:knows ?p2 .
FILTER ( ?p1 != ?p2 )
?result foaf:knows ?p1 .
FILTER ( <PERSON> != ?result )
?result foaf:knows ?p2 .
OPTIONAL {
<PERSON> ?knows ?result .
FILTER ( ?knows = foaf:knows )
}
FILTER ( !bound(?knows) )
}
64. Experiment – Single Query
Experiment        | Avg.¹ number of Query Results (std.dev.) | Average¹ Hit Rate (std.dev.) | Avg.¹ Query Execution Time (std.dev.)
no reuse          | 27.37 (140.49)                           | 0.849 (0.205)                | 64.95 s (124.50)
reuse per query   | 26.71 (148.77)                           | 1 (0)                        | 0.02 s (0.07)
¹ Averaged over all 100 queries
● In the ideal case for $B_{upper} = [ b_{cur} , b_{cur} ]$:
● $p_{upper}( b_{cur} , B_{upper} ) = c( b_{cur} , [ ] ) - c( b_{cur} , B_{upper} ) = c( b_{cur} , [ ] )$
● $s_{upper}( b_{cur} , B_{upper} ) = b( b_{cur} , B_{upper} ) - b( b_{cur} , [ ] ) = 0$
65. Experiment – Single Query
● Summary (measurement errors aside):
● Same number of query results
● Significant improvements in query performance
66. Experiment – Complete Sequence
Experiment          | Avg.¹ number of Query Results (std.dev.) | Average¹ Hit Rate (std.dev.) | Avg.¹ Query Execution Time (std.dev.)
no reuse            | 27.37 (140.49)                           | 0.849 (0.205)                | 64.95 s (124.50)
reuse per query     | 26.71 (148.77)                           | 1 (0)                        | 0.02 s (0.07)
reuse all queries   | 44.87 (178.36)                           | 0.991 (0.053)                | 37.91 s (112.94)
¹ Averaged over all 100 queries
● Summary:
● Data cache may provide for additional query results
● Impact on performance may be positive but also negative
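The reported averages can be plugged into the performance and serendipity factors for a rough reading of the numbers. This is an illustrative calculation only: it assumes that the cost c corresponds to the average query execution time and the benefit b to the average number of query results, whereas the factors are formally defined per query; the helper function names are hypothetical.

```python
def performance_factor(cost_without_cache, cost_with_cache):
    """p = c(b_cur, []) - c(b_cur, B): cost saved by reusing the dataset."""
    return cost_without_cache - cost_with_cache


def serendipity_factor(benefit_with_cache, benefit_without_cache):
    """s = b(b_cur, B) - b(b_cur, []): benefit gained by reusing the dataset."""
    return benefit_with_cache - benefit_without_cache


# Experiment -> (average number of query results, average execution time in s),
# as reported for the three experiments.
averages = {
    "no reuse": (27.37, 64.95),
    "reuse per query": (26.71, 0.02),
    "reuse all queries": (44.87, 37.91),
}

base_results, base_time = averages["no reuse"]
factors = {
    name: (
        performance_factor(base_time, time),
        serendipity_factor(results, base_results),
    )
    for name, (results, time) in averages.items()
    if name != "no reuse"
}

for name, (p, s) in factors.items():
    print(f"{name}: p = {p:.2f} s, s = {s:.2f} results")
```

Under these assumptions, reusing the dataset across all queries saves about 27 seconds per query on average and yields about 17.5 additional results per query, while reuse per query saves almost the entire execution time without adding results.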
67. Experiment – Complete Sequence
Experiment                          | Avg.¹ number of Query Results (std.dev.) | Average¹ Hit Rate (std.dev.) | Avg.¹ Query Execution Time (std.dev.)
no reuse                            | 27.37 (140.49)                           | 0.849 (0.205)                | 64.95 s (124.50)
reuse per query                     | 26.71 (148.77)                           | 1 (0)                        | 0.02 s (0.07)
reuse all queries                   | 44.87 (178.36)                           | 0.991 (0.053)                | 37.91 s (112.94)
reuse all queries (random orders)   | 118.18 (867.07)                          | 0.992 (0.016)                | 20.61 s (216.61)
● Executing the query sequence in a random order results in measurements similar to the given order.
68. These slides have been created by
Olaf Hartig
http://olafhartig.de
This work is licensed under a
Creative Commons Attribution-Share Alike 3.0 License
(http://creativecommons.org/licenses/by-sa/3.0/)