Data Exchange over RDF
1. Data Exchange over RDF
Andrés Letelier
Advisor: Marcelo Arenas
Pontificia Universidad Católica de Chile
September 1, 2011
2. What is data exchange?
Problem
Data under one schema S needs to be restructured and translated
into a target schema T
$S \longrightarrow T$
$I_S \longrightarrow I_T$
3. Schema mappings
Question
Which source instances correspond to which target instances?
Answer
Schema mappings:
M ⊆ Instances(S) × Instances(T)
Usually, schema mappings are defined as $M = (S, T, \Sigma_{ST})$, where $\Sigma_{ST}$ is a set of source-to-target dependencies
4. Definition (Solution)
$I_2$ is a solution of $I_1$ under $M$ iff $(I_1, I_2) \in M$
The set of all solutions for $I_1$ under $M$ is denoted by $\mathrm{Sol}_M(I_1)$
5. Resource Description Framework (RDF)
Data model for representing information about World Wide
Web resources
W3C Recommendation (1999)
Part of the Semantic Web stack
Directed, labeled graphs
Blank nodes (labeled nulls)
Basically, sets of triples (s, p, o)
6. Example
D= {
(B1 name paul)
(B1 email paul@example.edu)
(B2 name john)
(B2 city Liverpool)
}
7. SPARQL (pronounced “sparkle”)
Query language for RDF
W3C Recommendation (2008)
Standard for querying RDF datasets
Returns sets of partial mappings
Operators:
Projection
AND (inner join)
OPT (left join)
FILTER
UNION
and more
8. Example
P1 = (?X, name, ?Y)
$\llbracket P_1 \rrbracket_D$ =
?X | ?Y
B1 | paul
B2 | john
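A single triple pattern such as P1 is evaluated by matching it against every triple of the graph: each successful match yields one mapping from the pattern's variables to the matched terms. The Python sketch below reproduces the table above under a representation that is an assumption, not part of the slides: variables are strings starting with "?", blank nodes are strings starting with "_:" (so B1 is written "_:B1"), and a graph is a set of (subject, predicate, object) tuples. The helper names match and eval_triple are hypothetical.

```python
from typing import Optional

Triple = tuple[str, str, str]
Mapping = dict[str, str]

# Dataset D from slide 6; "_:"-prefixed strings stand for blank nodes (assumed convention).
D: set[Triple] = {
    ("_:B1", "name", "paul"),
    ("_:B1", "email", "paul@example.edu"),
    ("_:B2", "name", "john"),
    ("_:B2", "city", "Liverpool"),
}


def match(pattern: Triple, triple: Triple) -> Optional[Mapping]:
    """Return the mapping that sends the pattern onto the triple, or None."""
    mu: Mapping = {}
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A repeated variable must be bound to the same value everywhere.
            if mu.setdefault(term, value) != value:
                return None
        elif term != value:
            # Constants (and blank nodes) in the pattern must match exactly.
            return None
    return mu


def eval_triple(pattern: Triple, graph: set[Triple]) -> list[Mapping]:
    """Evaluate one triple pattern; sorting only makes the output order stable."""
    results = []
    for triple in sorted(graph):
        mu = match(pattern, triple)
        if mu is not None:
            results.append(mu)
    return results


print(eval_triple(("?X", "name", "?Y"), D))
# [{'?X': '_:B1', '?Y': 'paul'}, {'?X': '_:B2', '?Y': 'john'}]
```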
9. Example
P2 = (?X, name, ?Y) AND (?X, email, ?Z)
$\llbracket P_2 \rrbracket_D$ =
?X | ?Y | ?Z
B1 | paul | paul@example.edu
10. Example
P3 = (?X, name, ?Y) OPT (?X, email, ?Z)
$\llbracket P_3 \rrbracket_D$ =
?X | ?Y | ?Z
B1 | paul | paul@example.edu
B2 | john |
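The AND and OPT results above can be reproduced with a small evaluator, sketched here in Python under the usual set-based semantics for SPARQL patterns: two mappings are compatible when they agree on every shared variable, AND joins compatible mappings, and OPT additionally keeps the left-hand mappings that have no compatible partner on the right (a left outer join). The classes TP, And and Opt and the string conventions for variables ("?") and blank nodes ("_:") are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Union

Mapping = dict[str, str]


@dataclass(frozen=True)
class TP:
    """Triple pattern; terms starting with "?" are variables."""
    s: str
    p: str
    o: str


@dataclass(frozen=True)
class And:
    left: "Pattern"
    right: "Pattern"


@dataclass(frozen=True)
class Opt:
    left: "Pattern"
    right: "Pattern"


Pattern = Union[TP, And, Opt]


def match(tp: TP, triple: tuple) -> Optional[Mapping]:
    mu: Mapping = {}
    for term, value in zip((tp.s, tp.p, tp.o), triple):
        if term.startswith("?"):
            if mu.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return mu


def compatible(m1: Mapping, m2: Mapping) -> bool:
    # Compatible mappings agree on every variable they share.
    return all(m2[v] == x for v, x in m1.items() if v in m2)


def join(o1: list, o2: list) -> list:
    return [{**m1, **m2} for m1 in o1 for m2 in o2 if compatible(m1, m2)]


def dedup(ms: list) -> list:
    # Set semantics: drop repeated mappings, keeping first-seen order.
    out = []
    for m in ms:
        if m not in out:
            out.append(m)
    return out


def evaluate(p: Pattern, graph: set) -> list:
    if isinstance(p, TP):
        return [mu for t in sorted(graph) if (mu := match(p, t)) is not None]
    left, right = evaluate(p.left, graph), evaluate(p.right, graph)
    if isinstance(p, And):
        return dedup(join(left, right))
    # OPT (left join): the join, plus left mappings with no compatible right mapping.
    return dedup(join(left, right) + [m for m in left if not any(compatible(m, r) for r in right)])


D = {
    ("_:B1", "name", "paul"),
    ("_:B1", "email", "paul@example.edu"),
    ("_:B2", "name", "john"),
    ("_:B2", "city", "Liverpool"),
}
P2 = And(TP("?X", "name", "?Y"), TP("?X", "email", "?Z"))
P3 = Opt(TP("?X", "name", "?Y"), TP("?X", "email", "?Z"))
print(evaluate(P2, D))
print(evaluate(P3, D))
```

In the OPT result, the mapping for B2 simply leaves ?Z unbound, which is why the corresponding table cell above is empty.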
11. Well-designed SPARQL patterns
Definition (Well-designed patterns)
A pattern $P$ is well-designed if, for every subpattern $P'$ of $P$ of the form
$P_1$ OPT $P_2$, every variable that appears in $P_2$ and also outside $P'$
appears in $P_1$.
Example
(?X, name, ?Y ) OPT ((?X, email, ?Z) OPT (?X, city, ?A))
is well-designed
(?X, name, ?Y ) OPT ((?W, email, ?Z) OPT (?X, city, ?A))
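is not: ?X occurs in (?X, city, ?A) and, outside the inner OPT, in (?X, name, ?Y), but not in (?W, email, ?Z)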
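The definition can be checked by walking the pattern tree while tracking which variables occur outside the current subpattern: when descending into one side of a binary operator, the variables of the other side count as "outside". The Python sketch below assumes patterns built only from triple patterns, AND and OPT (the fragment the definition is stated for); the names TP, And, Opt, variables and well_designed are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class TP:
    s: str
    p: str
    o: str


@dataclass(frozen=True)
class And:
    left: "Pattern"
    right: "Pattern"


@dataclass(frozen=True)
class Opt:
    left: "Pattern"
    right: "Pattern"


Pattern = Union[TP, And, Opt]


def variables(p: Pattern) -> set[str]:
    if isinstance(p, TP):
        return {t for t in (p.s, p.p, p.o) if t.startswith("?")}
    return variables(p.left) | variables(p.right)


def well_designed(p: Pattern, outside: frozenset[str] = frozenset()) -> bool:
    """'outside' holds the variables of the whole pattern that occur outside p."""
    if isinstance(p, TP):
        return True
    left_vars, right_vars = variables(p.left), variables(p.right)
    if isinstance(p, Opt) and (right_vars & outside) - left_vars:
        # Some variable of P2 also occurs outside P1 OPT P2 but not in P1.
        return False
    # Each side sees the other side's variables as occurring outside it.
    return (well_designed(p.left, outside | right_vars)
            and well_designed(p.right, outside | left_vars))


good = Opt(TP("?X", "name", "?Y"), Opt(TP("?X", "email", "?Z"), TP("?X", "city", "?A")))
bad = Opt(TP("?X", "name", "?Y"), Opt(TP("?W", "email", "?Z"), TP("?X", "city", "?A")))
print(well_designed(good), well_designed(bad))  # True False
```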
12. Data Exchange over RDF
S and T are both fixed to RDF, i.e., sets of triples
Tuple-generating dependencies (tgds) have to be redefined for RDF
But first, we need some definitions...
13. RDF Tuple Generating Dependencies
Let P be a SPARQL pattern, µ1 and µ2 be partial mappings, and
Ω1 and Ω2 be sets of mappings. Then:
var(P) are the variables mentioned in P
dom(µ1) is the domain of µ1
A SPARQL SELECT query, denoted by (W, P) with W ⊆ var(P), is the projection of the evaluation of P onto the variables in W
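Projection restricts each answer of P to the variables of W that it actually binds, so an answer that leaves a variable of W unbound still contributes a (smaller) mapping. A minimal Python sketch, assuming mappings are dictionaries from variable names to values and set semantics (a restricted mapping that repeats is kept once); the function name project is hypothetical.

```python
Mapping = dict[str, str]


def project(mappings: list[Mapping], w: set[str]) -> list[Mapping]:
    """Evaluate (W, P) from the answers of P by restricting every mapping to W."""
    result: list[Mapping] = []
    for mu in mappings:
        restricted = {v: value for v, value in mu.items() if v in w}
        if restricted not in result:  # set semantics: keep each mapping once
            result.append(restricted)
    return result


# Answers of P3 = (?X, name, ?Y) OPT (?X, email, ?Z) over D, written out by hand.
p3_answers = [
    {"?X": "_:B1", "?Y": "paul", "?Z": "paul@example.edu"},
    {"?X": "_:B2", "?Y": "john"},
]
print(project(p3_answers, {"?X", "?Z"}))
# [{'?X': '_:B1', '?Z': 'paul@example.edu'}, {'?X': '_:B2'}]
```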
14. RDF Tuple Generating Dependencies
$\mu_1$ is subsumed by $\mu_2$ ($\mu_1 \sqsubseteq \mu_2$) if $\mathrm{dom}(\mu_1) \subseteq \mathrm{dom}(\mu_2)$, for every $?X \in \mathrm{dom}(\mu_1)$ that is not bound to a blank node we have that $\mu_1(?X) = \mu_2(?X)$, and for every pair of variables $?X, ?Y \in \mathrm{dom}(\mu_1)$ such that $\mu_1(?X) = \mu_1(?Y)$ it is the case that $\mu_2(?X) = \mu_2(?Y)$.
$\Omega_1$ is subsumed by $\Omega_2$ ($\Omega_1 \sqsubseteq \Omega_2$) if for every mapping $\mu_1 \in \Omega_1$ there exists a mapping $\mu_2 \in \Omega_2$ such that $\mu_1 \sqsubseteq \mu_2$.
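The three conditions of subsumption translate directly into code: a domain check, a check that non-blank values are preserved, and a check that equalities between variables are preserved. The Python sketch below assumes that blank nodes are written as strings starting with "_:"; the names mapping_subsumed and set_subsumed are hypothetical.

```python
Mapping = dict[str, str]


def is_blank(term: str) -> bool:
    # Assumed convention: blank nodes are strings starting with "_:".
    return term.startswith("_:")


def mapping_subsumed(mu1: Mapping, mu2: Mapping) -> bool:
    """mu1 ⊑ mu2."""
    if not set(mu1) <= set(mu2):
        return False
    # Values that are not blank nodes must be kept as they are.
    if any(not is_blank(x) and mu2[v] != x for v, x in mu1.items()):
        return False
    # Variables sharing a value in mu1 must still share a value in mu2.
    return all(mu2[a] == mu2[b] for a in mu1 for b in mu1 if mu1[a] == mu1[b])


def set_subsumed(omega1: list[Mapping], omega2: list[Mapping]) -> bool:
    """Omega1 ⊑ Omega2: each mapping of Omega1 is subsumed by some mapping of Omega2."""
    return all(any(mapping_subsumed(m1, m2) for m2 in omega2) for m1 in omega1)


print(mapping_subsumed({"?X": "_:n"}, {"?X": "john"}))  # True: a blank node may be instantiated
print(mapping_subsumed({"?X": "_:n", "?Y": "_:n"}, {"?X": "a", "?Y": "b"}))  # False: equality lost
```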
15. RDF Tuple Generating Dependencies
(Re)Definition (Tuple Generating Dependencies)
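Let $P_1$ and $P_2$ be SPARQL patterns, and $W \subset \mathrm{var}(P_1) \cap \mathrm{var}(P_2)$.
An RDF tgd is a sentence of the form
$(W, P_1) \rightarrow (W, P_2)$
Given two RDF graphs $G_1$ and $G_2$, and a set of tgds $\Sigma$,
$(G_1, G_2) \models \Sigma$ if for every tgd $(W, P_1) \rightarrow (W, P_2)$ in $\Sigma$ it is the
case that $\llbracket (W, P_1) \rrbracket_{G_1} \sqsubseteq \llbracket (W, P_2) \rrbracket_{G_2}$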
16. RDF Schema Mappings
Since S and T are fixed,
$M = \Sigma$
$G_2 \in \mathrm{Sol}_M(G_1) \iff (G_1, G_2) \models \Sigma$
17. Universal solutions
Example
Let W = {?X} and
Σ = {(W, (?X, name, ?Y) AND (?X, email, ?Z)) → (W, (?Y, hasmail, ?Z))},
and consider the dataset D from slide 6.
Solution 1
G2 = {
(paul hasmail paul@example.edu)
}
Solution 2
G2 = {
(paul hasmail paul@example.edu)
(john hasmail _:n)
}
18. Universal solutions
Definition
A solution $G_2$ is universal if for every other solution $G_2'$, $G_2 \sqsubseteq G_2'$
Solution 1 is universal
Solution 2 is not
19. Universal solutions
Not all settings have universal solutions:
Consider G1 = {(1, 2, 3)}, W = {?X, ?Y } and
Σ = {(W, (?X, ?Y, ?Z)) →
(W, ((?X, a, b) OPT (?W, b, ?Y ))
AND ((?X, c, d) OPT (?Z, d, ?Y )))}
20. Solution 1
G2 = {
(1 a b)
(_:n1 b 2)
(1 c d)
}
Solution 2
G2 = {
(1 a b)
(_:n2 d 2)
(1 c d)
}
This setting has no universal solution!
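Both graphs above are solutions, which can be checked mechanically: evaluate the body of each tgd over G1 and its head over the candidate graph, project both onto W, and test subsumption. The self-contained Python sketch below does this for the setting of slide 19, assuming set semantics, "?"-prefixed variables and "_:"-prefixed blank nodes; the names TP, And, Opt, select and satisfies are hypothetical. It only checks candidate solutions; it does not decide whether a universal solution exists.

```python
from dataclasses import dataclass
from typing import Optional, Union

Mapping = dict[str, str]


@dataclass(frozen=True)
class TP:
    s: str
    p: str
    o: str


@dataclass(frozen=True)
class And:
    left: "Pattern"
    right: "Pattern"


@dataclass(frozen=True)
class Opt:
    left: "Pattern"
    right: "Pattern"


Pattern = Union[TP, And, Opt]


def match(tp: TP, triple: tuple) -> Optional[Mapping]:
    mu: Mapping = {}
    for term, value in zip((tp.s, tp.p, tp.o), triple):
        if term.startswith("?"):
            if mu.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return mu


def compatible(m1: Mapping, m2: Mapping) -> bool:
    return all(m2[v] == x for v, x in m1.items() if v in m2)


def evaluate(p: Pattern, graph: set) -> list[Mapping]:
    if isinstance(p, TP):
        return [mu for t in sorted(graph) if (mu := match(p, t)) is not None]
    left, right = evaluate(p.left, graph), evaluate(p.right, graph)
    joined = [{**a, **b} for a in left for b in right if compatible(a, b)]
    if isinstance(p, And):
        return joined
    # OPT also keeps left mappings that have no compatible partner on the right.
    return joined + [a for a in left if not any(compatible(a, b) for b in right)]


def select(w: set[str], p: Pattern, graph: set) -> list[Mapping]:
    # Evaluation of (W, P): restrict every answer of P to W.
    return [{v: x for v, x in mu.items() if v in w} for mu in evaluate(p, graph)]


def subsumed(m1: Mapping, m2: Mapping) -> bool:
    return (set(m1) <= set(m2)
            and all(x.startswith("_:") or m2[v] == x for v, x in m1.items())
            and all(m2[a] == m2[b] for a in m1 for b in m1 if m1[a] == m1[b]))


def satisfies(g1: set, g2: set, sigma: list) -> bool:
    """(G1, G2) |= Sigma, with each tgd given as a tuple (W, P1, P2)."""
    return all(
        all(any(subsumed(m1, m2) for m2 in select(w, p2, g2)) for m1 in select(w, p1, g1))
        for w, p1, p2 in sigma
    )


G1 = {("1", "2", "3")}
head = And(Opt(TP("?X", "a", "b"), TP("?W", "b", "?Y")),
           Opt(TP("?X", "c", "d"), TP("?Z", "d", "?Y")))
sigma = [({"?X", "?Y"}, TP("?X", "?Y", "?Z"), head)]
solution1 = {("1", "a", "b"), ("_:n1", "b", "2"), ("1", "c", "d")}
solution2 = {("1", "a", "b"), ("_:n2", "d", "2"), ("1", "c", "d")}
print(satisfies(G1, solution1, sigma), satisfies(G1, solution2, sigma))  # True True
```

Dropping the blank-node triple from either graph makes the check fail, because the head then no longer binds ?Y.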
21. Good and bad news
Bad news
There is no guarantee that an exchange setting that has a solution
also has a universal solution
Good news
If the heads of all tgds in Σ are well-designed and there is a
solution, there is always a universal solution
Better news
We have an algorithm to compute a universal solution
22. “Chasing” SPARQL queries
input A mapping µ and a (well-designed) SPARQL pattern P
output An RDF graph G such that $\mu \in \llbracket P \rrbracket_G$
Chase(µ, ν, P, G)
t:
add unbound variables in t as fresh blank nodes to ν
add ν(t) to G
P1 AND P2 :
Chase(µ, ν, P1 , G)
Chase(µ, ν, P2 , G)
P1 OPT P2 :
Chase(µ, ν, P1 , G)
if (dom(µ) ∖ dom(ν)) ∩ var(P2) ≠ ∅: Chase(µ, ν, P2, G)
23. After chasing:
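$\mu \sqsubseteq \nu$
$\nu \in \llbracket P \rrbracket_G$
$\{\mu\} \sqsubseteq \llbracket P \rrbracket_G$
If, for every tgd $(W, P_1) \rightarrow (W, P_2)$ in $\Sigma$, we chase $P_2$ with every mapping in $\llbracket (W, P_1) \rrbracket_{G_1}$, the resulting graph is a universal solution.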
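The chase procedure can be sketched in Python under the following assumptions, which go beyond what the slides fix: ν starts empty for each mapping µ; a variable of a triple pattern takes µ's value when µ binds it and a fresh blank node (_:n1, _:n2, ...) otherwise; and the OPT case chases P2 only when µ binds some variable of P2 that ν has not bound yet. The source answers of each tgd body are passed in precomputed to keep the block short, and the names chase and universal_solution are hypothetical.

```python
import itertools
from dataclasses import dataclass
from typing import Iterator, Union

Mapping = dict[str, str]
Triple = tuple[str, str, str]


@dataclass(frozen=True)
class TP:
    s: str
    p: str
    o: str


@dataclass(frozen=True)
class And:
    left: "Pattern"
    right: "Pattern"


@dataclass(frozen=True)
class Opt:
    left: "Pattern"
    right: "Pattern"


Pattern = Union[TP, And, Opt]


def variables(p: Pattern) -> set[str]:
    if isinstance(p, TP):
        return {t for t in (p.s, p.p, p.o) if t.startswith("?")}
    return variables(p.left) | variables(p.right)


def chase(mu: Mapping, nu: Mapping, p: Pattern, graph: set, fresh: Iterator[int]) -> None:
    """Add triples to graph (and bindings to nu) so that p has an answer extending mu."""
    if isinstance(p, TP):
        terms = (p.s, p.p, p.o)
        for t in terms:
            if t.startswith("?") and t not in nu:
                # Reuse mu's value if mu binds t; otherwise invent a fresh blank node
                # (assumed not to clash with blank nodes already used by mu).
                nu[t] = mu[t] if t in mu else f"_:n{next(fresh)}"
        graph.add(tuple(nu.get(t, t) for t in terms))
    elif isinstance(p, And):
        chase(mu, nu, p.left, graph, fresh)
        chase(mu, nu, p.right, graph, fresh)
    else:  # Opt
        chase(mu, nu, p.left, graph, fresh)
        # Chase the optional part only if mu binds a variable of P2 not yet bound in nu.
        if (set(mu) - set(nu)) & variables(p.right):
            chase(mu, nu, p.right, graph, fresh)


def universal_solution(jobs: list[tuple[list[Mapping], Pattern]]) -> set[Triple]:
    """Each job pairs the source answers of a tgd body with that tgd's head P2."""
    graph: set[Triple] = set()
    fresh = itertools.count(1)
    for answers, head in jobs:
        for mu in answers:
            chase(mu, {}, head, graph, fresh)
    return graph


# Slide 24: the only source answer is {?X -> 1}; the head is (?X, 1, 2) OPT (?X, ?Y, 3).
head = Opt(TP("?X", "1", "2"), TP("?X", "?Y", "3"))
print(universal_solution([([{"?X": "1"}], head)]))  # {('1', '1', '2')}
```

For the example of slide 24 the sketch produces exactly Solution 1 of slide 27: µ binds no variable of the optional part, so (?X, ?Y, 3) is never chased.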
24. Certain answers
Definition (Certain answers on a regular data exchange setting)
The set of certain answers of a query is the intersection of its
evaluations over all solutions
Example
Consider G1 = {(1, 2, 3)} and
Σ = {({?X}, (?X, ?Y, ?Z)) → ({?X}, (?X, 1, 2) OPT (?X, ?Y, 3))}
26. Certain answers
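Given a pattern $P$ and a set of RDF graphs $\mathcal{G}$, let $\mathrm{Lower}(P, \mathcal{G})$ be the set of all lower bounds of $\{\llbracket P \rrbracket_G \mid G \in \mathcal{G}\}$ w.r.t. subsumption, i.e., all sets of mappings $\Omega$ such that $\Omega \sqsubseteq \llbracket P \rrbracket_G$ for every $G \in \mathcal{G}$.
(Re)Definition (Certain Answers)
The set of certain answers of a set of RDF graphs $\mathcal{G}$ and a SPARQL pattern $P$ is defined as any set of mappings $\Omega$ in $\mathrm{Lower}(P, \mathcal{G})$ such that for every other $\Omega'$ in $\mathrm{Lower}(P, \mathcal{G})$ it is the case that $\Omega' \sqsubseteq \Omega$.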
Claim
All the possible sets of certain answers to an RDF data exchange
setting are homomorphically equivalent.
27. Back in our previous example...
Solution 1
G2 = {
(1 1 2)
}
$\llbracket (W, P_2) \rrbracket_{G_2} = \{\{?X \rightarrow 1\}\}$
Solution 2
G2 = {
(1 1 2)
(1 2 3)
}
$\llbracket (W, P_2) \rrbracket_{G_2} = \{\{?X \rightarrow 1, ?Y \rightarrow 2\}\}$
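Intersecting these two answer sets, as in the regular definition, already gives $\emptyset$; under the (re)definition, the set of certain answers is $\{\{?X \rightarrow 1\}\}$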
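For the two answer sets above, the contrast between the two definitions can be checked in a few lines of Python. The sketch is restricted to the two solutions shown: it verifies that the plain intersection is empty and that {{?X → 1}} is a lower bound of both answer sets w.r.t. subsumption, while {{?X → 1, ?Y → 2}} is not. It does not search for a greatest lower bound in general; blank nodes are assumed to be "_:"-prefixed strings, and the helper names are hypothetical.

```python
Mapping = dict[str, str]


def subsumed(m1: Mapping, m2: Mapping) -> bool:
    # mu1 ⊑ mu2, with blank nodes written as strings starting with "_:" (assumption).
    return (set(m1) <= set(m2)
            and all(x.startswith("_:") or m2[v] == x for v, x in m1.items())
            and all(m2[a] == m2[b] for a in m1 for b in m1 if m1[a] == m1[b]))


def set_subsumed(o1: list[Mapping], o2: list[Mapping]) -> bool:
    return all(any(subsumed(m1, m2) for m2 in o2) for m1 in o1)


def naive_certain(answer_sets: list[list[Mapping]]) -> list[Mapping]:
    """Regular definition: mappings that occur in every answer set."""
    first, *rest = answer_sets
    return [m for m in first if all(m in other for other in rest)]


def is_lower_bound(omega: list[Mapping], answer_sets: list[list[Mapping]]) -> bool:
    """Omega is a lower bound if it is subsumed by every answer set."""
    return all(set_subsumed(omega, answers) for answers in answer_sets)


answers = [
    [{"?X": "1"}],             # Solution 1
    [{"?X": "1", "?Y": "2"}],  # Solution 2
]
print(naive_certain(answers))                             # []
print(is_lower_bound([{"?X": "1"}], answers))             # True
print(is_lower_bound([{"?X": "1", "?Y": "2"}], answers))  # False
```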
28. In conclusion...
Our contributions so far:
RDF and SPARQL TGDs
RDF Schema mappings
Universal solutions
Materialization of universal solutions
Certain answers
29. In conclusion...
To do:
Prove remaining claims
Query answering (using universal solutions)
Incomplete information in the source instance
Knowledge exchange over RDF