Orsi Vldb11

Optimizing Query Answering
under
Ontological Constraints

Giorgio Orsi1,2 and Andreas Pieris2
1
Institute for the Future of Computing
Oxford Martin School
University of Oxford
2
Department of Computer Science
University of Oxford

VLDB 2011

Ontological Databases

Ontological Reasoning DB Constraints

Ontological DB



Ontological DB

D
ABox D

 D

TBox



Ontological DB

D
ABox D

 D

TBox Q(X)  9Y (X,Y)



Ontological DB

D
ABox D
, { t | D [  ² 9u (t,u) }

 D

TBox Q(X)  9Y (X,Y)

Ontological Constraints (examples)

Concept Inclusions: 8X emp(X)  person(X)

(Inverse) Relation Inclusion: 8X8Y manages(X,Y)  isManaged(Y,X)

Relation Transitivity: 8X8Y8Z mgs(X,Y),mgs(Y,Z)  mgs(X,Z)

Participation: 8X emp(X)  9Y report(X,Y)

Disjointness: 8X emp(X), customer(X)  ?

Functionality: 8X8Y8Z reports(X,Y),reports(X,Z)  Y = Z

Datalog§ [Cali’ et Al, PODS 09]

¡ Datalog variant allowing in the head:
- 9-variables ! TGDs 8X8Y (X,Y)  9Z (X,Z)
- Equality atoms ! EGDs 8X (X)  Xi=Xj Datalog+
- Constant false (?) ! NCs 8X (X)  ?



¡ But, query answering under Datalog+ is undecidable




¡ Datalog+ is syntactically restricted ! Datalog§




¡ Datalog+ is syntactically restricted ! Datalog§

¡ TGDs more expressive than inclusion dependencies

8D8P8A runs(D,P),area(P,A)  9E employee(E,D,A)

The Chase Procedure

Input: Database D, set of TGDs 

Output: A model of D [ 

D
person(john)


8X person(X)  9Y father(Y,X) 8X8Y father(X,Y)  person(X)

chase(D,) = D [ ?

The Chase Procedure



D
person(john)



chase(D,) = D [ {father(z1,john)

The Chase Procedure



D
person(john)



chase(D,) = D [ {father(z1,john), person(z1)

The Chase Procedure



D
person(john)



chase(D,) = D [ {father(z1,john), person(z1), father(z2,z1)

The Chase Procedure



D
person(john)



chase(D,) = D [ {father(z1,john), person(z1), father(z2,z1), …}

Query Answering via Chase
Q h

C = chase(D,)

D

h1 h2

h2(C)
h1(C) . . .
M1
M2

D[²Q , chase(D,) ² Q
[see, e.g., Deutsch, Nash & Remmel, PODS 08]

Query Answering via Chase
Q h

C = chase(D,)

D

h1 h2

h2(C)
h1(C) . . .
M1
M2

Possibly Infinite! 

Query answering via rewriting

Q 


Q 

compilation
Q


Q 

Q compilation
Q
evaluation
D

Linear TGDs

8X8Y r(X,Y)  9Z (X,Z)

single body atom

¡ Properly generalize inclusion dependencies.

¡ Enjoy the bounded-derivation depth property.

¡ FO-rewritable  query answering in AC0 (data complexity).

Bounded derivation depth property (BDDP)

D

Q


D

Q
k levels

constant
w.r.t. D

chase(D,) ² Q ) chasek(D,) ² Q


D

Q
k levels

constant
w.r.t. D

BDDP ) First-Order Rewritability

FO-rewritability: Example [Gottlob et Al., ICDE 11]

 promoter(X)  Y promotesTo(X,Y)
promotesTo(X,Y)  customer(Y)

Q q  promotesTo(A,B), customer(B)

Q

q  promotesTo(A,B), customer(B) (original query)




Q

q  promotesTo(A,B), customer(B) {Y=B}

q  promotesTo(A,B), customer(V0,B) ( V0 is fresh )




Q

q  promotesTo(A,B), customer(B)
factorization
q  promotesTo(A,B), promotesTo(V0,B) ans(A)  promotesTo(A,B)
{ A = V0 }




Q


q  promotesTo(A,B) {X = A, Y = B}
q  promoter(A)




Q


q  promotesTo(A,B) UCQ rewriting
(first-order)
q  promoter(A)

FO-rewritability

¡ Desirable properties of a FO-rewriting:
 independent on the DB
 executable by any DBMS
 easy to compute (e.g., polynomial time)
 small size (e.g., polynomial size)

FO-rewritability

¡ Desirable properties of a FO-rewriting:
 independent on the DB
 easy to compute (e.g., polynomial time)
 small size (e.g., polynomial size)

¡ Unions of Conjunctive Queries (UCQs)
Calvanese et Al, JAR 07
Perez Urbina et Al, JAL 09
 DB independent Cali’ et Al, PODS 09
 easy to optimize and distribute Gottlob et Al, ICDE 11
and others…
 worst-case exponential size in Q and 

FO-rewritability

¡ Combined and hybrid FO-rewriting
 good computational properties Kontchakov et Al., KR 10
Gottlob and Schwentick, DL 11
(e.g., polynomial in size)
 requires access to the DB

FO-rewritability

¡ Combined and hybrid FO-rewriting
 good computational properties Kontchakov et Al., KR 10
Gottlob and Schwentick, DL 11
(e.g., polynomial in size)
 requires access to the DB

¡ Purely intensional Datalog rewriting
Perez Urbina et Al, JAL 09
 very compressed representation Rosati and Almatelli., KR 10
 purely intensional Orsi and Pieris., VLDB 11 
 requires view-creation or Datalog engine

Datalog rewriting: keep it first-order!

¡ A Datalog query is (in general) not a first-order query
 a non-recursive Datalog query is a first-order query
 a bounded Datalog query is a first-order query
 an output-predicate-bounded Datalog query is a first-order query

Datalog rewriting: keep it first-order!

¡ A Datalog query is (in general) not a first-order query
 a non-recursive Datalog query is a first-order query
 a bounded Datalog query is a first-order query
 an output-predicate-bounded Datalog query is a first-order query

¡ Input:
 a (w.l.o.g. boolean) conjunctive query Q = <q,ρ>

Q : q(X)  p(X), s(X,Y)  <q, q(X) p(X),s(X,Y) >

 a set of linear TGDs 

¡ Output:
 a bounded Datalog query Q = <q,π >

Predicate-bounded Datalog


¡ bounded Datalog program
r(X,Y), t(Y) → t(X)
 all IDB predicates are bounded.
r(X,Y ) → r(Y,X)
r(X,Y ), t(Z) → q(X) ¡ p-bounded Datalog program
 the predicate p is bounded.



r(X,Y), t(Y) → t(X)
r(X,Y ) → r(Y,X)

t is unbounded  t(X) cannot be computed using
a DB-independent FO-query



r(X,Y), t(Y) → t(X)
r(X,Y ) → r(Y,X)


t(X)  r(X,Y), t(Y)



r(X,Y), t(Y) → t(X)
r(X,Y ) → r(Y,X)


t(X)  r(X,Y), t(Y) v
t(X)  r(X,Y), r(Y,Z), t(Z)



r(X,Y), t(Y) → t(X)
r(X,Y ) → r(Y,X)


t(X)  r(X,Y), t(Y) v
t(X)  r(X,Y), r(Y,Z), t(Z) v
t(X)  r(X,Y), r(Y,Z), r(Z,W), t(W)



r(X,Y), t(Y) → t(X)
r(X,Y ) → r(Y,X)


t(X)  r(X,Y), t(Y) v
t(X)  r(X,Y), r(Y,Z), t(Z) v
t(X)  r(X,Y), r(Y,Z), r(Z,W), t(W) v
…



r(X,Y), t(Y) → t(X)
r(X,Y ) → r(Y,X)

q is bounded  q(X) can be computed using



r(X,Y), t(Y) → t(X)
r(X,Y ) → r(Y,X)


q(X)  r(X,Y)



r(X,Y), t(Y) → t(X)
r(X,Y ) → r(Y,X)


q(X)  r(X,Y) v
q(X)  r(Y,X)

Output-predicate-bounded Datalog and BDDP

OPB
DATALOG
PROGRAM


OPB
DATALOG
PROGRAM

INPUT BDDP OUTPUT
RULES RULES RULES


OPB
DATALOG
PROGRAM

INPUT BDDP OUTPUT
RULES RULES RULES

enjoy
non recursive BDDP non recursive
property

Datalog Rewriting: Skolemization (and renaming)


r(X,Y)  Z s(Y,Z)
s(X,Y)  Z p(Y,Y,Z)
p(X,Y,Z)  t(Z)


 f
r(X,Y)  Z s(Y,Z) r(X1,Y1)  s(Y1,f1(Y1))
s(X,Y)  Z p(Y,Y,Z) s(X2,Y2)  p(Y2,Y2,f2(Y2))
p(X,Y,Z)  t(Z) p(X3,Y3,Z3)  t(Z3)


 f
r(X,Y)  Z s(Y,Z) r(X1,Y1)  s(Y1,f1(Y1))
s(X,Y)  Z p(Y,Y,Z) s(X2,Y2)  p(Y2,Y2,f2(Y2))
p(X,Y,Z)  t(Z) p(X3,Y3,Z3)  t(Z3)

¡ f and  are equisatisfiable (not equivalent)

¡ Introduce one Skolem function for each existential variable

Datalog Rewriting: Rule Saturation

¡ Apply resolution inference rule to rules in f
 at least one of the rules contains Skolem terms

f
δ1 : r (X1,Y1)  s(Y1,f1(Y1))
δ2 : s(X2,Y2)  p(Y2,Y2,f2(Y2))
δ3 : p(X3,Y3,Z3)  t(Z3)

Datalog Rewriting: Rule Saturation

¡ Apply resolution inference rule to rules in f
 at least one of the rules contains Skolem terms

f [f]
δ1 : r (X1,Y1)  s(Y1,f1(Y1)) …
δ2 : s(X2,Y2)  p(Y2,Y2,f2(Y2)) r(X1,Y1)  p(f1(Y1) ,f1(Y1), f2(f1(Y1)))
δ3 : p(X3,Y3,Z3)  t(Z3) …

Datalog Rewriting: Properties of Rule Saturation

¡ [f] mimics the chase derivations.



δ1 : r (X1,Y1)  s(Y1,f1(Y1))
δ2 : s(X2,Y2)  p(Y2,Y2,f2(Y2))
δ3 : p(X3,Y3,Z3)  t(Z3)



δ1 : r (X1,Y1)  s(Y1,f1(Y1))
δ2 : s(X2,Y2)  p(Y2,Y2,f2(Y2))
δ3 : p(X3,Y3,Z3)  t(Z3)

¡ [f] depends only on .

¡ [f] is possibly infinite
linear TGDs have BDDP: suffices to construct it up to k steps [f]k.

Datalog Rewriting: Query Saturation

¡ resolve [f] with the query Q.
 use only rules with Skolem terms.


¡ resolve [f] with the query Q.
 use only rules with Skolem terms.

[f] … Q
Q  s(A,B), p(B,B,C)
δ1 : r (X1,Y1)  s(Y1,f1(Y1))
δ2 : s(X2,Y2)  p(Y2,Y2,f2(Y2))
δ3 : p(X3,Y3,Z3)  t(Z3)
…
[δ12]] : r (X1,Y1)  p(f1(Y1) ,f1(Y1), f2(f1(Y1)))
…

…
[Q,f] Q  r(X1,Y1), p(f1(Y1), f1(Y1),C)
…


¡ bypasses chase derivations with function symbols

Q
Q  s(A,B), p(B,B,C)

[f] …
δ1 : r (X1,Y1)  s(Y1,f1(Y1))
δ2 : s(X2,Y2)  p(Y2,Y2,f2(Y2))
δ3 : p(X3,Y3,Z3)  t(Z3)
…
[δ12]] : r (X1,Y1)  p(f1(Y1) ,f1(Y1), f2(f1(Y1)))
…

Datalog Rewriting: Finalization

¡ keep only the function-free rules from [f] [ [Q,f]

¡ derivations producing certain answers are captured by function-
symbol-free rules.

¡ (optional) add auxiliary rules to make the rewriting a proper Datalog
query  i.e., clear distinction IDB/EDB.

OPB-Datalog and BDDP revisited

OPB
DATALOG
PROGRAM

INPUT BDDP OUTPUT
RULES RULES RULES

aux ff([f]) ff([Q,f])

enjoy
non recursive BDDP non recursive
property

Optimizations: Pruning

¡ use the predicate graph to reduce the number of rules in f

f Q
δ1 : r (X1,Y1)  s(Y1,f1(Y1)) Q  s(A,B), p(B,B,C)
δ2 : s(X2,Y2)  p(Y2,Y2,f2(Y2))
δ3 : p(X3,Y3,Z3)  t(Z3)



f Q
δ2 : s(X2,Y2)  p(Y2,Y2,f2(Y2))
δ3 : p(X3,Y3,Z3)  t(Z3)

t p

r s Q



f Q
δ2 : s(X2,Y2)  p(Y2,Y2,f2(Y2))
δ3 : p(X3,Y3,Z3)  t(Z3)

t p
¡ we are no longer
independent on Q!
r s Q

Optimizations: Query Elimination

¡ eliminate implied atoms during query saturation

f Q
δ2 : s(X2,Y2)  p(Y2,Y2,f2(Y2))



f Q
δ2 : s(X2,Y2)  p(Y2,Y2,f2(Y2))

s(A,B) ² p(B,B,C) atom coverage



f Q
δ2 : s(X2,Y2)  p(Y2,Y2,f2(Y2))

s(A,B) ² p(B,B,C) atom coverage

Q  s(A,B), p(B,B,C) ≡ Q  s(A,B)


¡ atom coverage under linear TGDs can be checked in polynomial time
 see paper.

¡ unique elimination strategy (w.r.t. the number of eliminated atoms)
 see paper.

¡ given m = |body(ρ)| and n = ||
 worst-case size of [f] [ [Q,f] is O((n∙m)m)
 worst-case size of Q = <q,π > is O(n+m)m

Discussion

¡ Datalog rewriting is substantially more compact than UCQ rewriting.

¡ Does Datalog improve the end-to-end performance?

¡ Extend the procedure to larger classes of TGDs
guarded TGDs [Cali’ et Al, PODS 09]  non FO-rewritable
sticky-join TGDs [Cali’ et Al, VLDB 10]

Thank you!

The Datalog Family

Georg Thomas Andreas Andrea Giorgio
Gottlob Lukasiewicz Pieris Calì Orsi

Orsi Vldb11

Recommended

Recommended

More Related Content

What's hot

What's hot (15)

Viewers also liked

Viewers also liked (7)

Similar to Orsi Vldb11

Similar to Orsi Vldb11 (20)

More from Giorgio Orsi

More from Giorgio Orsi (20)

Orsi Vldb11