This document summarizes the theoretical foundations of linked data query processing presented in a tutorial. It discusses the SPARQL query language, data models for linked data queries, full-web and reachability-based query semantics. Under full-web semantics, a query is computable if its pattern is monotonic, and eventually computable otherwise. Reachability-based semantics restrict queries to data reachable from a set of seed URIs. Queries under this semantics are always finitely computable if the web is finite. The document outlines computability results and properties regarding satisfiability and monotonicity for different semantics.
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Tutorial "Linked Data Query Processing" Part 2 "Theoretical Foundations" (WWW 2013 Ed.)
1. Linked Data Query Processing
Tutorial at the 22nd International World Wide Web Conference (WWW 2013)
May 14, 2013
http://db.uwaterloo.ca/LDQTut2013/
2.
Theoretical Foundations
Olaf Hartig
University of Waterloo
2. WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 2
● Data retrieval approach
● Data source selection
● Data source ranking
(optional, for optimization)
Query-local data
GET http://.../movie2449
● Result construction approach
● i.e., query-local data processing
● Combining data retrieval
and result construction
http://mdb.../Paul http://geo.../Berlin
http://mdb.../Ric http://geo.../Rome
?loc?actor
“Ingredients” for LD Query Execution
3. WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 3
Outline
Basics of the SPARQL Query Language
Data Model for Linked Data Queries
Full-Web Query Semantics
➢ Definition
➢ Theoretical Properties
Reachability-Based Query Semantics
➢ Definition
➢ Theoretical Properties
4. WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 4
General Idea of SPARQL
● SPARQL: Declarative query language for RDF data
● Main idea: pattern matching
● Query describes a pattern to be found in RDF data
● Basically, query patterns consist of RDF triples with variables
( ?p , affiliated with , orgaX ) ,
( ?p , interested in , ?i )
Basic Query Pattern:
5. WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 5
General Idea of SPARQL
● SPARQL: Declarative query language for RDF data
● Main idea: pattern matching
● Query describes a pattern to be found in RDF data
● Basically, query patterns consist of RDF triples with variables
● Any part of the queried data that matches the query pattern
contributes a solution to the query result
( ?p , affiliated with , orgaX ) ,
( ?p , interested in , ?i )
Basic Query Pattern:
( alice , affiliated with , orgaX ) ,
( bob , affiliated with , acme ) ,
( alice , interested in , yoga ) ,
( alice , interested in , tea ) ,
( bob , interested in , tea ) ,
Data:
?p ?i
alice yoga
alice tea
Query Result:
6. WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 6
Some Terminology
● Any query result is represented by a set of valuations
Ω = { μ1 , … , μn }
● Valuation: mapping of query variables to RDF terms
μ = { ?p → alice , ?i → yoga }
● Any valuation in a query result is a solution
● Two valuations μ1 and μ2 are compatible if they agree on
the binding for any shared variable
● i.e., μ1(?v) = μ2(?v) for all ?v Є dom(μ1) ∩ dom(μ2)
?p ?i
alice yoga
alice tea
7. WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 7
Query Patterns and their Semantics
● Triple pattern: An RDF triple that may have a variable as
subject, predicate, or object, respectively
●
eval( tp, G ) :═ { μ | μ[tp] Є G and dom(μ) = vars(tp) }
● Group graph pattern: Conjunction of two query patterns
●
eval( (P1 AND P2), G ) :═ { μ1 U μ2 | μ1 Є eval(P1,G) and
μ2 Є eval(P2,G) and
μ1 and μ2 are compatible }
● Associative and commutative
● Basic graph pattern (BGP): A set of triple patterns
● B = { tp1 , … , tpn } is equivalent to ( tp1 AND … AND tpn )
● Filter, union, optional, etc.
8. WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 8
Outline
Basics of the SPARQL Query Language
Data Model for Linked Data Queries
Full-Web Query Semantics
➢ Definition
➢ Theoretical Properties
Reachability-Based Query Semantics
➢ Definition
➢ Theoretical Properties
√
9. WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 9
Data Model
● Static view (i.e., no changes during computation)
● Web of Linked Data: W = ( D, adoc, data )
● D – set of symbols that represent Web documents
from which Linked Data can be extracted
● adoc – partial mapping from URIs to D
● data – total mapping from D to finite sets of RDF triples
●
All data in W: AllData(W) :═ Ud є D data(d)
D = adoc( ) = data( ) =
10. WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 10
Linked Data Query
Q(W)
11. WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 11
( ?p , affiliated with , orgaX ) ,
( ?p , interested in , ?i )
?p ?i
alice yoga
alice tea
QP
(W) = { μ1 , μ2 , ... }
SPARQL-Based Linked Data Query
12. WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 12
Outline
Basics of the SPARQL Query Language
Data Model for Linked Data Queries
Full-Web Query Semantics
➢ Definition
➢ Theoretical Properties
Reachability-Based Query Semantics
➢ Definition
➢ Theoretical Properties
√
√
13. WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 13
Full-Web Semantics
QP
(W) :═ eval(P,AllData(W))
14. WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 14
Computability
● (Ordinary) Turing machines
are unsuitable:
● Limited data access capabilities
not properly captured
● Web machines
● Abiteboul and Vianu, 1997
● Mendelzon and Milo, 1997
QP
(W) :═ eval(P,AllData(W))
TM
15. WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 15
LD Machine
● Multi-tape Turing machine
➔ Input
➔ Work
➔ Output
16. WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 16
LD Machine
# enc(u1) enc(adoc(u1)) # enc(u2) enc(adoc(u2)) # ∙ ∙ ∙
● Multi-tape Turing machine
➔ Web Input
➔ Input
➔ Work
➔ Output
17. WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 17
LD Machine
# enc(u1) enc(adoc(u1)) # enc(u2) enc(adoc(u2)) # ∙ ∙ ∙
● Multi-tape Turing machine
➔ Web Input
➔ Input
➔ Work
➔ Output
● Access to Web Input is restricted
● Only by performing
a particular procedure
in a particular state
18. WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 18
# enc(u1) enc(adoc(u1)) # enc(u2) enc(adoc(u2)) # ∙ ∙ ∙➔ Web Input
➔ Input
➔ Work
➔ Output
● For Q exists an LD machine MQ such that for any W holds:
● MQ halts after a finite number of computation steps, and
● MQ outputs the complete result Q(W)
Finitely Computable LD Queries
step 1 ∙ ∙ ∙ step k - 3 step k - 2 step k – 1 step k
∙ ∙ ∙
# enc(μ1) # enc(μ2) # ∙ ∙ ∙ # enc(μn) #
19. WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 19
Eventually Computable LD Queries
step
k + 2
∙ ∙ ∙∙ ∙ ∙
step
k - 3
step
k - 2
step
k - 1
step
k
step
k + 1
# enc(u1) enc(adoc(u1)) # enc(u2) enc(adoc(u2)) # ∙ ∙ ∙
# enc(μ1) # enc(μ2)
➔ Web Input
➔ Input
➔ Work
➔ Output
● For Q exists an LD machine MQ such that for any W holds:
1. Output always encodes a subset of query result Q(W), and
2. Each μ Q(W) eventually appears on the output
✗ No guarantee for termination
20. WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 20
Not (LD Machine) Computable
step
k + 2
∙ ∙ ∙∙ ∙ ∙
step
k - 3
step
k - 2
step
k - 1
step
k
step
k + 1
# enc(u1) enc(adoc(u1)) # enc(u2) enc(adoc(u2)) # ∙ ∙ ∙
# enc(μ1) # enc(μ2)
➔ Web Input
➔ Input
➔ Work
➔ Output
● For Q exists an LD machine MQ such that for any W holds:
1. Output always encodes a subset of query result Q(W), and
2. Each μ Q(W) eventually appears on the output
✗ No guarantee for termination
21. WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 21
Theorem [Har12]: If a satisfiable SPARQLLD query
QP
under full-Web semantics is monotonic, then it is
eventually computable (but not finitely computable);
otherwise, it is not even eventually computable.
Theorem [Har12]: If a satisfiable SPARQLLD query
QP
under full-Web semantics is monotonic, then it is
eventually computable (but not finitely computable);
otherwise, it is not even eventually computable.
● Main reason: Set of all URIs is infinite (but countable)
Computability (Full-Web Semantics)
QP
(W) :═ eval(P,AllData(W))
22. WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 22
Satisfiability and Monotonicity
Q( ) = { …, ri , … } ≠ Ø
⟹ Q( ) ⊆ Q( )
Proposition [Har12]: Let QP
be a SPARQLLD query under full-Web
semantics.
● QP
is satisfiable ⟺ its query pattern P is satisfiable
● QP
is monotonic ⟺ its query pattern P is monotonic
Proposition [Har12]: Let QP
be a SPARQLLD query under full-Web
semantics.
● QP
is satisfiable ⟺ its query pattern P is satisfiable
● QP
is monotonic ⟺ its query pattern P is monotonic
Unfortunately:
Satisfiability and
monotonicity of SPARQL
query patterns is undecidable
Satisfiability:
Monotonicity:
O
PT
tp
A
N
D
FILTER
U
N
IO
N
23. WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 23
Theorem [Har12]: If a satisfiable SPARQLLD query
QP
under full-Web semantics is monotonic, then it is
eventually computable (but not finitely computable);
otherwise, it is not even eventually computable.
Theorem [Har12]: If a satisfiable SPARQLLD query
QP
under full-Web semantics is monotonic, then it is
eventually computable (but not finitely computable);
otherwise, it is not even eventually computable.
● Main reason: Set of all URIs is infinite (but countable)
Computability (Full-Web Semantics)
QP
(W) :═ eval(P,AllData(W))
24. WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 24
Outline
Basics of the SPARQL Query Language
Data Model for Linked Data Queries
Full-Web Query Semantics
➢ Definition
➢ Theoretical Properties
Reachability-Based Query Semantics
➢ Definition
➢ Theoretical Properties
√
√
√
25. WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 25
Reachability-Based Query Semantics
26. WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 26
●
Seed URIs S ᑕ U
Reachability-Based Query Semantics
27. WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 27
●
Seed URIs S ᑕ U● Reachability criterion c
●
(Turing) computable function c: T ×U × P → {true,false}
Reachability-Based Query Semantics
32. WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 32
Satisfiability and Monotonicity
Q( ) = { …, ri , … } ≠ Ø
⟹ Q( ) ⊆ Q( )
Satisfiability:
Monotonicity:
Proposition [Har12]: Let c be a reachability criterion and QP
c
S
be a
SPARQLLD query under c-semantics.
● QP
c
S
is satisfiable ⟺ its SPARQL expression P is satisfiable
● QP
c
S
is monotonic ⇒ its SPARQL expression P is monotonic
Proposition [Har12]: Let c be a reachability criterion and QP
c
S
be a
SPARQLLD query under c-semantics.
● QP
c
S
is satisfiable ⟺ its SPARQL expression P is satisfiable
● QP
c
S
is monotonic ⇒ its SPARQL expression P is monotonic
Counterexample: Let QP
c
S
be a SPARQLLD query under c-semantics.
If c = cNone and |S| = 1, then QP
c
S
is monotonic.
Counterexample: Let QP
c
S
be a SPARQLLD query under c-semantics.
If c = cNone and |S| = 1, then QP
c
S
is monotonic.
33. WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 33
LD Machine-Based Computability
Theorem [Har12]: For any reachability criterion c, any
SPARQL based Linked Data query under c-semantics
is finitely computable.
Theorem [Har12]: For any reachability criterion c, any
SPARQL based Linked Data query under c-semantics
is finitely computable.
● If we restrict our data model such that Webs
of Linked Data are finite (i.e., consist of a finite
number of documents), then:
For results w.r.t. the unrestricted data model (i.e.,
Webs of Linked Data that may be infinitely large),
refer to [Har12].
34. WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 34
Outline
Basics of the SPARQL Query Language
Data Model for Linked Data Queries
Full-Web Query Semantics
➢ Definition
➢ Theoretical Properties
Reachability-Based Query Semantics
➢ Definition
➢ Theoretical Properties
√
√
√
√
Next part: 3. Source Selection Strategies ...
35. WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 35
These slides have been created by
Olaf Hartig
for the
WWW 2013 tutorial on
Link Data Query Processing
Tutorial Website: http://db.uwaterloo.ca/LDQTut2013/
This work is licensed under a
Creative Commons Attribution-Share Alike 3.0 License
(http://creativecommons.org/licenses/by-sa/3.0/)