The Chase in Database Theory

The Chase Algorithm in Database Theory
Jan Hidders

Outline
1 Introduction
2 A Recap of First-order Logic for Database Theory
3 Dependencies
4 The Chase Algorithm
5 Correctness
6 Termination

Introduction
Motivation
The Chase is an algorithm for determining whether a dependency
D follows from a given set of dependencies S.
The term dependency is used here for a particular type of implication
that generalises functional dependencies, multivalued
dependencies, join dependencies and inclusion dependencies.
Basic idea:
1 We start with D′ = D.
2 Using the dependencies in S we rewrite D′ as much as possible to a
weaker dependency that is equivalent to D on instances that
satisfy all dependencies in S.
We do this by looking in S for rules with a premise implied by the
premise of D′ but whose conclusion is not implied by the premise of D′.
For such rules we add the conclusion of the rule to the premise of D′.
Repeat this until we find no more such rules in S.
3 We check if D′ is a tautology. If so, then D indeed follows from S,
otherwise it does not.

A Recap of First-order Logic for Database Theory

First-order Logic Formulas
We postulate two disjoint countably infinite sets: the set of variable
names V and the set of constants C.
In the examples we assume C to contain the natural numbers.
We will consider formulas in first-order logic with equality and
constants, but without function symbols.
Formulas: ∃x : ϕ, (ϕ ∧ ψ), ¬ϕ, P(s1, . . . , sn), (s1 = s2)
ϕ and ψ are formulas, x a variable in V, all si symbols from V ∪ C.
Short-hands:
∀x : ϕ ≡ ¬(∃x : ¬ϕ)
(ϕ ∨ ψ) ≡ ¬(¬ϕ ∧ ¬ψ)
(ϕ → ψ) ≡ ¬(ϕ ∧ ¬ψ)
Example (Formulas)
∀y : ∀z : ((P(1, y) ∧ P(y, z)) → P(1, z))
∀x : (x = 1 ∧ ∃y : ¬(x = y))

Models and Database Instances
We assume the domain, i.e., the set over which the quantifiers
quantify, to be equal to C.
The constants in formulas represent themselves, so the constant 5
represents the number 5.
Therefore the Unique Name Assumption holds, i.e., two distinct
constants always represent two distinct domain elements.
Given these assumptions we can define a model (or instance as we
will call them, since they correspond to database instances) simply as
a finite set of atoms, where an atom is defined as P(c1, . . . , cn) with
P a predicate name and c1, . . . , cn all constants.
Example (Models / Instances)
I1 = {P(1, 2), P(2, 3), Q(4)}
I2 = {P(), Q(1, 1)}

Domain Value Substitution
Applying a substitution f : V → C ∪ V, i.e., a partial function from
variables to constants and variables, to a formula ϕ, denoted as ϕ^f, is
defined by:
1 (∃y : ψ)^f = (∃y : ψ^g) where g = {(x, s) ∈ f | x ≠ y}
2 (ϕ ∧ ψ)^f = ϕ^f ∧ ψ^f
3 (¬ϕ)^f = ¬(ϕ^f)
4 P(s1, . . . , sn)^f = P(s1^f, . . . , sn^f)
5 (s1 = s2)^f = (s1^f = s2^f)
where for symbols s ∈ V ∪ C we let s^f = f(s) if s ∈ dom(f), and s^f = s if s ∉ dom(f).
Example
(∃x : ∃y : P(x, y, z))^{z→5} = (∃x : ∃y : P(x, y, 5))
(∃x : (∃y : P(x, y, z)) ∧ Q(y, z))^{y→5} = (∃x : (∃y : P(x, y, z)) ∧ Q(5, z))
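The substitution rules can be sketched in Python. The tuple encoding of formulas and the name `substitute` are assumptions made for this illustration and do not come from the slides; variables are strings and constants are ints.

```python
# Assumed encoding (not from the slides): formulas are nested tuples
#   ("exists", x, phi), ("and", phi, psi), ("not", phi),
#   ("atom", "P", (s1, ..., sn)), ("eq", s1, s2)
# with variables as strings and constants as ints.

def substitute(phi, f):
    """Return phi^f for a substitution f given as a dict on variables."""
    tag = phi[0]
    if tag == "exists":
        _, y, body = phi
        g = {x: s for x, s in f.items() if x != y}  # y is bound here: drop it
        return ("exists", y, substitute(body, g))
    if tag == "and":
        return ("and", substitute(phi[1], f), substitute(phi[2], f))
    if tag == "not":
        return ("not", substitute(phi[1], f))
    if tag == "atom":
        return ("atom", phi[1], tuple(f.get(s, s) for s in phi[2]))
    if tag == "eq":
        return ("eq", f.get(phi[1], phi[1]), f.get(phi[2], phi[2]))
    raise ValueError(f"unknown formula: {phi!r}")

# Second example of the slide: only the free occurrence of y is replaced.
phi = ("exists", "x", ("and",
       ("exists", "y", ("atom", "P", ("x", "y", "z"))),
       ("atom", "Q", ("y", "z"))))
result = substitute(phi, {"y": 5})
```

Here `result` keeps `P(x, y, z)` unchanged under the inner quantifier and turns `Q(y, z)` into `Q(5, z)`.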

Formula Semantics
The proposition that an instance I satisfies a formula ϕ, denoted as
I |= ϕ, is defined by the following rules:
1 I |= ∃y : ϕ iff there is a constant c ∈ C such that I |= ϕ^{y→c}
2 I |= (ϕ ∧ ψ) iff I |= ϕ and I |= ψ
3 I |= ¬ϕ iff I ⊭ ϕ (i.e., if not I |= ϕ)
4 I |= P(s1, . . . , sn) iff P(s1, . . . , sn) ∈ I
5 I |= (s1 = s2) iff s1 and s2 are the same constant
Example
Assume the instance I = {P(1, 2), P(1, 1), Q(1), Q(2)}, then it holds that
I |= ¬∃x : (P(1, x) ∧ ¬Q(x))
I |= ¬∃x : (Q(x) ∧ ¬∃y : P(y, x))
I |= ¬∃x : ¬(x = x)
but also that
I ⊭ ¬∃x : (Q(x) ∧ ¬P(x, 1))
I ⊭ ¬∃x : ¬Q(x)
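A minimal evaluator for these rules might look as follows. It is only a sketch: the tuple encoding and the names `constants` and `satisfies` are assumptions of this illustration. Because the domain C is infinite, the quantifier case tries the constants that the formula and the instance can distinguish plus a single fresh constant. This relies on the assumption that all other constants behave alike, since no predicate or equation can tell them apart.

```python
from itertools import count

# Assumed encoding (not from the slides): formulas are nested tuples
#   ("exists", x, phi), ("and", phi, psi), ("not", phi),
#   ("atom", "P", (s1, ..., sn)), ("eq", s1, s2)
# with variables as strings and constants as ints; an instance is a set
# of (predicate, constants) pairs.

def constants(phi):
    """All constants that occur in the formula phi."""
    tag = phi[0]
    if tag == "exists":
        return constants(phi[2])
    if tag == "and":
        return constants(phi[1]) | constants(phi[2])
    if tag == "not":
        return constants(phi[1])
    terms = phi[2] if tag == "atom" else phi[1:]
    return {s for s in terms if isinstance(s, int)}

def satisfies(instance, phi, env=None):
    """Decide I |= phi for a closed phi; env holds the constants chosen so far."""
    env = env or {}
    tag = phi[0]
    if tag == "exists":
        _, y, body = phi
        # Constants the body can tell apart, plus one fresh constant that
        # stands in for the infinitely many remaining ones.
        known = ({c for _, args in instance for c in args}
                 | constants(body) | set(env.values()))
        fresh = next(c for c in count() if c not in known)
        return any(satisfies(instance, body, {**env, y: c})
                   for c in known | {fresh})
    if tag == "and":
        return satisfies(instance, phi[1], env) and satisfies(instance, phi[2], env)
    if tag == "not":
        return not satisfies(instance, phi[1], env)
    if tag == "atom":
        return (phi[1], tuple(env.get(s, s) for s in phi[2])) in instance
    if tag == "eq":
        return env.get(phi[1], phi[1]) == env.get(phi[2], phi[2])
    raise ValueError(f"unknown formula: {phi!r}")

I = {("P", (1, 2)), ("P", (1, 1)), ("Q", (1,)), ("Q", (2,))}
holds = ("not", ("exists", "x",
         ("and", ("atom", "P", (1, "x")), ("not", ("atom", "Q", ("x",))))))
fails = ("not", ("exists", "x", ("not", ("atom", "Q", ("x",)))))
print(satisfies(I, holds), satisfies(I, fails))
```

The two formulas are the first and the last example of the slide. The second one fails because the fresh constant is a witness for ¬Q(x).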

Free Variables and Closed Formulas
The set of free variables of a formula ϕ, denoted as FV(ϕ), is defined by:
1 FV(∃x : ϕ) = FV(ϕ) \ {x}
2 FV((ϕ ∧ ψ)) = FV(ϕ) ∪ FV(ψ)
3 FV(¬ϕ) = FV(ϕ)
4 FV(P(s1, . . . , sn)) = {s1, . . . , sn} ∩ V
5 FV((s1 = s2)) = {s1, s2} ∩ V
A formula ϕ is said to be closed if FV(ϕ) = ∅.
Example (Free variables)
FV (∃x : P(x, y)) = {y}
FV (∃x : ∃y : P(x, y)) = ∅
Proposition
Formula ϕ is closed iff ϕ^{x→c} = ϕ for any x ∈ V and c ∈ C.
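The five rules for free variables translate directly into a recursive function. This is a sketch under an assumed tuple encoding of formulas (variables as strings, constants as ints); the names `free_vars` and `is_closed` are illustrative.

```python
# Assumed encoding (not from the slides): formulas are nested tuples
#   ("exists", x, phi), ("and", phi, psi), ("not", phi),
#   ("atom", "P", (s1, ..., sn)), ("eq", s1, s2)

def free_vars(phi):
    """Return FV(phi) as a set of variable names."""
    tag = phi[0]
    if tag == "exists":
        return free_vars(phi[2]) - {phi[1]}
    if tag == "and":
        return free_vars(phi[1]) | free_vars(phi[2])
    if tag == "not":
        return free_vars(phi[1])
    if tag == "atom":
        return {s for s in phi[2] if isinstance(s, str)}
    if tag == "eq":
        return {s for s in phi[1:] if isinstance(s, str)}
    raise ValueError(f"unknown formula: {phi!r}")

def is_closed(phi):
    return not free_vars(phi)

open_formula = ("exists", "x", ("atom", "P", ("x", "y")))
closed_formula = ("exists", "x", ("exists", "y", ("atom", "P", ("x", "y"))))
```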

Satisfiability and Tautologies
Let ϕ be a closed formula:
We say that ϕ is satisfiable if there is an instance I such that I |= ϕ.
We say that ϕ is a tautology if I |= ϕ holds for every instance I.
Example
∀x : (x = x) is a tautology.
∀x : ∀y : (P(x, y) → P(x, y)) is a tautology.
∃x : ∃y : (P(x, y) ∧ ¬P(x, y)) is unsatisfiable.
∀x : ∀y : (P(x, y) ∨ ¬P(x, y)) is a tautology.
∀x : (P(x) → ∃y : Q(x, y)) is satisfiable, but not a tautology.
Proposition
A closed formula ϕ is unsatisfiable iff ¬ϕ is a tautology.

Notation
We will omit brackets if there is no semantic ambiguity:
((ϕ1 ∧ ϕ2) ∧ ϕ3) and (ϕ1 ∧ (ϕ2 ∧ ϕ3)) are written as (ϕ1 ∧ ϕ2 ∧ ϕ3)
((ϕ1 ∨ ϕ2) ∨ ϕ3) and (ϕ1 ∨ (ϕ2 ∨ ϕ3)) are written as (ϕ1 ∨ ϕ2 ∨ ϕ3)
The binary operators have precedence rules: ∧ binds tighter than ∨, which binds tighter than →
The scope of a quantifier extends as far as possible
Example
∀x : ∃y : P(y) ∧ Q(x) denotes ∀x : ∃y : (P(y) ∧ Q(x))
P(y) ∧ Q(x) → R(y) ∨ R(x) denotes (P(y) ∧ Q(x)) → (R(y) ∨ R(x))
Chains of similar quantifiers can be merged:
∀x1 : ∀x2 : . . . ∀xn : ϕ can be written as ∀x1, x2, . . . , xn : ϕ
∃x1 : ∃x2 : . . . ∃xn : ϕ can be written as ∃x1, x2, . . . , xn : ϕ

Dependencies
What is a Dependency?
A simple conjunction is a formula of the form ϕ1 ∧ . . . ∧ ϕn where
each ϕi is either of the form P(s1, . . . , sm) or (s1 = s2).
Example (Simple conjunction)
P(1, x, x, y) ∧ Q(z, v, 1) ∧ x = z
A dependency is a closed formula of the form ∀x̄ : ϕ → ∃ȳ : ψ with
x̄ = x1, . . . , xn and ȳ = y1, . . . , ym and where (1) ϕ and ψ are simple
conjunctions, (2) no variable is in both x̄ and ȳ, (3) each variable in x̄
appears in at least one atom in ϕ and (4) each variable in ȳ appears
in at least one atom in ψ.
Example (Dependencies)
∀x, y : P(x, 1, y) ∧ Q(x, x) → ∃z : R(x, y, z) ∧ x = y
∀x, y, z : P(x, y) ∧ P(x, z) → y = z
∀x, y, z, u, v, w : P(x, y, z) ∧ P(u, v, w) ∧ v = y → P(x, y, w) ∧ P(u, y, z)
∀x, y : P(x, y) → ∃z : Q(y, z)

Dependencies
Abbreviated Notation for Dependencies
We will usually write a dependency ∀x̄ : ϕ → ∃ȳ : ψ simply as ϕ ⇒ ψ.
Example (Dependencies of the previous example in abbreviated notation)
P(x, 1, y) ∧ Q(x, x) ⇒ R(x, y, z) ∧ x = y
P(x, y) ∧ P(x, z) ⇒ y = z
P(x, y, z) ∧ P(u, v, w) ∧ v = y ⇒ P(x, y, w) ∧ P(u, y, z)
P(x, y) ⇒ Q(y, z)
This notation is unambiguous, since for every dependency
∀x̄ : ϕ → ∃ȳ : ψ it holds that x̄ contains exactly the variables in
FV(ϕ) and ȳ contains exactly the variables in FV(ψ) \ FV(ϕ).

Dependencies
Connection with Database Normalisation Dependencies
These dependencies generalise classical dependencies.
Assume relations R(A, B, C, D) and S(E, F, G)
Functional dependencies: AB → C for R:
R(x, y, z, u) ∧ R(x, y, z′, u′) ⇒ z = z′
Multivalued dependencies: AB ↠ C for R:
R(x, y, z, u) ∧ R(x, y, z′, u′) ⇒ R(x, y, z, u′)
Join dependencies: ∗(AB, BC, CD) for R:
R(x, y, z1, u1) ∧ R(x2, y, z, u2) ∧ R(x3, y3, z, u) ⇒ R(x, y, z, u)
Inclusion dependencies / foreign keys: R[C, D] ⊆ S[E, F]:
R(x, y, z, u) ⇒ S(z, u, v)
New types of dependencies, e.g., R[A] ∩ R[B] = ∅
R(x, x, z, u) ⇒ 1 = 2

Dependencies
Condensing Conjunctions and Dependencies
A simple conjunction ϕ / dependency ϕ ⇒ ψ can be condensed to an
equivalent formula by exhaustively applying the following rules:
1 If ϕ contains a = a with a a constant, remove it
2 If ϕ contains x = a or a = x with x a variable and a a constant,
remove it and replace all occurrences of x (also those in ψ) with a
3 If ϕ contains x = y with x and y variables, remove it and replace all
occurrences of x (also those in ψ) with y
Example (condensing)
P(x, 1, y) ∧ x = y ⇒ Q(y, z) condenses to P(y, 1, y) ⇒ Q(y, z).
P(x, y) ∧ y = 1 ⇒ Q(y) condenses to P(x, 1) ⇒ Q(1)
P(x, y) ∧ x = y ⇒ y = 1 condenses to P(y, y) ⇒ y = 1
Proposition
If a simple conjunction ϕ / dependency ϕ ⇒ ψ is condensed then ϕ
contains only equations of the form a = b with a and b distinct constants.
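The condensing rules can be sketched as a small rewriting loop. The encoding is an assumption of this illustration: variables are strings, constants are ints, an atom is a pair `(predicate, terms)`, an equation is `("=", (s1, s2))`, and a simple conjunction is a list of those. The names `condense` and `rename` are hypothetical.

```python
def is_var(s):
    return isinstance(s, str)

def rename(conj, old, new):
    """Replace every occurrence of the symbol old by new."""
    return [(p, tuple(new if s == old else s for s in args)) for p, args in conj]

def condense(phi, psi):
    """Condense the dependency phi => psi; returns the new (phi, psi)."""
    phi, psi = list(phi), list(psi)
    while True:
        # An equation is removable unless it relates two distinct constants.
        eq = next((e for e in phi if e[0] == "="
                   and (is_var(e[1][0]) or is_var(e[1][1]) or e[1][0] == e[1][1])),
                  None)
        if eq is None:
            return phi, psi
        phi.remove(eq)                      # rule 1, 2 or 3: drop the equation
        a, b = eq[1]
        if is_var(a) or is_var(b):          # rules 2 and 3: also substitute
            old, new = (a, b) if is_var(a) else (b, a)
            phi, psi = rename(phi, old, new), rename(psi, old, new)

# First example of the slide: P(x, 1, y) ∧ x = y ⇒ Q(y, z)
condensed = condense([("P", ("x", 1, "y")), ("=", ("x", "y"))],
                     [("Q", ("y", "z"))])
```

For a plain simple conjunction, pass an empty list as `psi`.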

The Chase Algorithm
Satisfiability of Simple Conjunctions
A non-closed formula ϕ is said to be satisfiable if ∃x1, . . . , xn : ϕ is
satisfiable, where {x1, . . . , xn} = FV (ϕ).
Proposition
A condensed simple conjunction is satisfiable iff it does not contain an
equation.
Proof.
If: Take the atoms of the conjunction as the instance where each variable
is replaced with a distinct constant not in the formula. Since the
conjunction has no equations, this instance will satisfy it.
Only-if: Assume it does contain an equation. This equation will be of the
form a = b with a and b distinct constants. The formula will therefore not
be satisfied by any instance.

The Chase Algorithm
Embeddings into Instances
An embedding of a simple conjunction ϕ into an instance I is a
function f : FV(ϕ) → C such that
1 for every P(s1, . . . , sn) in ϕ the atom P(s1, . . . , sn)^f is in I, and
2 for every (s1 = s2) in ϕ it holds that s1^f = s2^f.
Proposition
For every dependency ϕ ⇒ ψ it holds that I |= ϕ ⇒ ψ iff for every
embedding f of ϕ into I there is an extension of f that embeds ψ into I.
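The proposition suggests a direct way to test a dependency on a finite instance: enumerate the embeddings of ϕ and try to extend each one to ψ. The following backtracking sketch assumes that variables are strings, constants are ints, atoms are `(predicate, terms)` pairs, equations are `("=", (s1, s2))`, and that every variable occurs in at least one predicate atom. The names `embeddings` and `satisfies_dependency` are illustrative.

```python
def embeddings(conj, facts, f=None):
    """Yield every extension of f that embeds the conjunction conj into facts."""
    atoms = [a for a in conj if a[0] != "="]
    eqs = [a[1] for a in conj if a[0] == "="]

    def extend(i, m):
        if i == len(atoms):
            # All atoms are placed; the equations must now hold under m.
            if all(m.get(a, a) == m.get(b, b) for a, b in eqs):
                yield m
            return
        p, args = atoms[i]
        for q, consts in facts:
            if q != p or len(consts) != len(args):
                continue
            m2 = dict(m)
            # Bind each variable consistently; constants must match exactly.
            if all(m2.setdefault(s, c) == c if isinstance(s, str) else s == c
                   for s, c in zip(args, consts)):
                yield from extend(i + 1, m2)

    yield from extend(0, dict(f or {}))

def satisfies_dependency(facts, phi, psi):
    """I |= phi => psi: every embedding of phi extends to an embedding of psi."""
    return all(next(embeddings(psi, facts, f), None) is not None
               for f in embeddings(phi, facts))

# Functional dependency B -> C on R(A, B, C)
fd = ([("R", ("x1", "y", "z1")), ("R", ("x2", "y", "z2"))], [("=", ("z1", "z2"))])
good = {("R", (1, 2, 3)), ("R", (4, 2, 3))}
bad = {("R", (1, 2, 3)), ("R", (4, 2, 5))}
print(satisfies_dependency(good, *fd), satisfies_dependency(bad, *fd))
```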

The Chase Algorithm
Tautological Dependencies
An embedding of a simple conjunction ϕ into a simple conjunction ψ
is a function f : FV(ϕ) → (FV(ψ) ∪ C) such that
1 for every P(s1, . . . , sn) in ϕ the atom P(s1, . . . , sn)^f is in ψ, and
2 for every (s1 = s2) in ϕ it holds that s1^f = s2^f.
Proposition
A condensed dependency ϕ ⇒ ψ is a tautology iff (1) ϕ is not satisfiable
or (2) there is an embedding of ψ into ϕ that maps the variables in ϕ to
themselves.
Proof.
If: If ϕ is not satisfiable then the implication holds trivially. Otherwise, if h is an embedding of
ϕ into an instance, and f is the embedding of ψ into ϕ, then h ◦ f is an embedding of ψ into the
instance, and it extends h because f is the identity on the variables in ϕ.
Only-if: Let ϕ be satisfiable, so without equations, and assume there is no embedding of ψ into
ϕ that is the identity on the variables in ϕ. Then on a corresponding instance of ϕ the
dependency does not hold.
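For small dependencies the proposition can be checked by brute force: try every way of mapping the variables that occur only in ψ to terms of ϕ. The sketch below assumes a condensed dependency, variables as strings, constants as ints, atoms as `(predicate, terms)` pairs and equations as `("=", (s1, s2))`. It also assumes that each variable occurring only in ψ appears in some predicate atom, so that its image must be a term of ϕ. The name `is_tautology` is illustrative.

```python
from itertools import product

def is_tautology(phi, psi):
    """Tautology test for a condensed dependency phi => psi."""
    if any(p == "=" for p, _ in phi):
        return True                       # (1) phi is not satisfiable
    terms = sorted({s for _, args in phi for s in args}, key=str)
    phi_vars = {s for s in terms if isinstance(s, str)}
    extra = sorted({s for _, args in psi for s in args
                    if isinstance(s, str)} - phi_vars)
    atoms = set(phi)
    for image in product(terms, repeat=len(extra)):
        f = dict(zip(extra, image))       # identity on the variables of phi
        mapped = [(p, tuple(f.get(s, s) for s in args)) for p, args in psi]
        if all(args[0] == args[1] if p == "=" else (p, args) in atoms
               for p, args in mapped):
            return True                   # (2) psi embeds into phi
    return False

# Premise contains the conclusion, so this dependency is a tautology.
taut = ([("R", ("x", "y", "z")), ("R", ("x2", "y", "z"))], [("R", ("x", "y", "z"))])
# P(x, y) => Q(y, z) is satisfiable but not a tautology.
not_taut = ([("P", ("x", "y"))], [("Q", ("y", "z"))])
print(is_tautology(*taut), is_tautology(*not_taut))
```

The search is exponential in the number of variables that occur only in ψ, which is acceptable for hand-sized examples.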

The Chase Algorithm
Corresponding Instances
For a simple conjunction ϕ, a corresponding instance consists of all
atoms in ϕ where distinct variables are replaced with distinct
constants not already used in ϕ.
Example
P(x, x, 1) ∧ P(x, y, 2) has corresponding instance {P(3, 3, 1), P(3, 4, 2)}
A simple conjunction ϕ is said to satisfy a dependency ϕ′ ⇒ ψ′ if
every embedding of ϕ′ into ϕ can be extended to an embedding of ψ′
into ϕ.
Proposition
Let ϕ be a simple conjunction containing only atoms and I a
corresponding instance of ϕ. Then, for every dependency ϕ′ ⇒ ψ′ it holds
that ϕ satisfies ϕ′ ⇒ ψ′ iff I satisfies ϕ′ ⇒ ψ′.
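A corresponding instance can be built by handing out unused constants to the variables. In this sketch the encoding (variables as strings, constants as ints, atoms as `(predicate, terms)` pairs) and the choice of the smallest unused positive integers, assigned in alphabetical order of the variable names, are assumptions of the illustration. Any other assignment of distinct unused constants gives an equally valid corresponding instance.

```python
from itertools import count

def corresponding_instance(phi):
    """Replace distinct variables of phi by distinct constants not used in phi."""
    used = {s for _, args in phi for s in args if isinstance(s, int)}
    fresh = (c for c in count(1) if c not in used)
    variables = sorted({s for _, args in phi for s in args if isinstance(s, str)})
    const = {x: next(fresh) for x in variables}
    # Only predicate atoms end up in the instance; equations are skipped.
    return {(p, tuple(const.get(s, s) for s in args))
            for p, args in phi if p != "="}

# Example of the slide: P(x, x, 1) ∧ P(x, y, 2)
instance = corresponding_instance([("P", ("x", "x", 1)), ("P", ("x", "y", 2))])
```

With this choice of constants the result is {P(3, 3, 1), P(3, 4, 2)}, the instance given in the example.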

The Chase Algorithm
The Inference Problem
The problem we are aiming to solve:
A set of dependencies S is said to logically imply a dependency
ϕ ⇒ ψ if every instance that satisfies all dependencies in S
also satisfies ϕ ⇒ ψ.
Example
The question “for the relation R(A, B, C) does the join dependency
∗(AB, BC) follow from the functional dependency B → C?” can be
formulated as “is R(x, y, z1) ∧ R(x2, y, z) ⇒ R(x, y, z) logically implied by
R(x1, y, z1) ∧ R(x2, y, z2) ⇒ z1 = z2?”

The Chase Algorithm
The Core Intuition of the Chase Algorithm
Suppose we want to show that the (join) dependency
D1 = R(x, y, z′) ∧ R(x′, y, z) ⇒ R(x, y, z) follows from the (functional)
dependency D′ = R(x, y, z) ∧ R(x′, y, z′) ⇒ z = z′.
Note that we can embed the premise of D′ into the premise of D1 via
the embedding f = {x → x, x′ → x′, y → y, z → z′, z′ → z}.
So if D1 applies to an instance, then so will D′.
So we can construct a D2 by strengthening the premise of D1 with
the conclusion of D′ with f applied, i.e., (z = z′)^f, which leads to
D2 = R(x, y, z′) ∧ R(x′, y, z) ∧ z′ = z ⇒ R(x, y, z).
On instances that satisfy D′ the dependencies D1 and D2 are
equivalent, i.e., are either both satisfied or both not satisfied.
If we condense D2 we get D2′ = R(x, y, z) ∧ R(x′, y, z) ⇒ R(x, y, z).
Since D2′ is a tautology (the conclusion is part of the premise) it
follows that D1 holds on all instances that satisfy D′, and so D1 logically
follows from D′.

The Chase Algorithm
The Chase Algorithm
Input: a set S of dependencies and a dependency ϕ ⇒ ψ
Output: true if ϕ ⇒ ψ is logically implied by S, and false otherwise.
Procedure:
1 Condense ϕ ⇒ ψ
2 While (1) ϕ ⇒ ψ is not a tautology and (2) there is a dependency
ϕ′ ⇒ ψ′ in S that is not satisfied by ϕ, do:
   1 let f be an embedding of ϕ′ into ϕ that cannot be extended to an
   embedding of ψ′ into ϕ
   2 let g be an extension of f that maps each variable in ψ′ that does not
   occur in ϕ′ to a distinct variable not in ϕ or ψ
   3 let ϕ become ϕ ∧ (ψ′)^g
   4 condense ϕ ⇒ ψ
3 If ϕ ⇒ ψ is a tautology, return true, else false.
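The procedure can be sketched end to end in Python. This is an illustration under several assumptions rather than a reference implementation: variables are strings, constants are ints, atoms are `(predicate, terms)` pairs, equations are `("=", (s1, s2))`, every variable occurs in a predicate atom, and no input variable is named like the generated `_v0`, `_v1`, ... The `max_steps` bound is an addition of the sketch, since the chase need not terminate.

```python
from itertools import count

def is_var(s):
    return isinstance(s, str)

def rename(conj, m):
    return [(p, tuple(m.get(s, s) for s in args)) for p, args in conj]

def condense(phi, psi):
    phi, psi = list(phi), list(psi)
    while True:
        # Pick an equation that is not between two distinct constants.
        eq = next((e for e in phi if e[0] == "=" and
                   (is_var(e[1][0]) or is_var(e[1][1]) or e[1][0] == e[1][1])),
                  None)
        if eq is None:
            return list(dict.fromkeys(phi)), psi   # also drop duplicate atoms
        phi.remove(eq)
        a, b = eq[1]
        old, new = (a, b) if is_var(a) else (b, a)
        phi, psi = rename(phi, {old: new}), rename(psi, {old: new})

def embeddings(conj, target, f):
    """Yield the extensions of f that embed conj into the conjunction target."""
    atoms = [a for a in conj if a[0] != "="]
    eqs = [a[1] for a in conj if a[0] == "="]
    def extend(i, m):
        if i == len(atoms):
            if all(m.get(a, a) == m.get(b, b) for a, b in eqs):
                yield m
            return
        p, args = atoms[i]
        for q, terms in target:
            if q == p and len(terms) == len(args):
                m2 = dict(m)
                if all(m2.setdefault(s, t) == t if is_var(s) else s == t
                       for s, t in zip(args, terms)):
                    yield from extend(i + 1, m2)
    yield from extend(0, dict(f))

def is_tautology(phi, psi):
    if any(p == "=" for p, _ in phi):              # phi is not satisfiable
        return True
    identity = {s: s for _, args in phi for s in args if is_var(s)}
    return next(embeddings(psi, phi, identity), None) is not None

def chase(S, phi, psi, max_steps=1000):
    phi, psi = condense(phi, psi)                  # step 1
    fresh = (f"_v{i}" for i in count())            # assumed not to clash with input names
    for _ in range(max_steps):                     # step 2
        if is_tautology(phi, psi):
            return True
        # An embedding f of some premise that cannot be extended to its conclusion.
        step = next(((f, psi2) for phi2, psi2 in S
                     for f in embeddings(phi2, phi, {})
                     if next(embeddings(psi2, phi, f), None) is None), None)
        if step is None:
            return False                           # step 3: no rule applies
        f, psi2 = step
        g = dict(f)
        for _, args in psi2:
            for s in args:
                if is_var(s) and s not in g:
                    g[s] = next(fresh)             # new variable for an existential one
        phi, psi = condense(phi + rename(psi2, g), psi)
    raise RuntimeError("no answer within max_steps; the chase may not terminate")

# Does the join dependency *(AB, BC) follow from B -> C on R(A, B, C)?
fd = ([("R", ("x1", "y", "z1")), ("R", ("x2", "y", "z2"))], [("=", ("z1", "z2"))])
jd = ([("R", ("x", "y", "z1")), ("R", ("x2", "y", "z"))], [("R", ("x", "y", "z"))])
print(chase([fd], *jd), chase([jd], *fd))
```

The usage lines ask the question of the inference example in both directions: the join dependency follows from the functional dependency, but not the other way round.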

The Chase Algorithm
Chase Example
Example
Assume for relation R(A, B, C, D) we have a dependency set S = {D′, D′′} with D′ the
functional dependency B → A and D′′ the multivalued dependency C ↠ D, and a join
dependency D = ∗(AB, BC, CD).
In our notation:
D′ = R(x1, y, z1, u1) ∧ R(x2, y, z2, u2) ⇒ x1 = x2
D′′ = R(x1, y1, z, u1) ∧ R(x2, y2, z, u2) ⇒ R(x1, y1, z, u2)
D = R(x, y, z1, u1) ∧ R(x2, y, z, u2) ∧ R(x3, y3, z, u) ⇒ R(x, y, z, u)
We now will chase D with D′ and D′′:
1 Initialise: R(x, y, z1, u1) ∧ R(x2, y, z, u2) ∧ R(x3, y3, z, u) ⇒ R(x, y, z, u)
2 Apply D′: R(x, y, z1, u1) ∧ R(x2, y, z, u2) ∧ R(x3, y3, z, u) ∧ x = x2 ⇒ R(x, y, z, u)
Condense: R(x2, y, z1, u1) ∧ R(x2, y, z, u2) ∧ R(x3, y3, z, u) ⇒ R(x2, y, z, u)
3 Apply D′′: R(x2, y, z1, u1) ∧ R(x2, y, z, u2) ∧ R(x3, y3, z, u) ∧ R(x2, y, z, u) ⇒ R(x2, y, z, u)
Condense: nothing changes
4 The dependency has become a tautology, so we stop and conclude that D logically follows
from S.

Correctness
Correctness of the Chase
Theorem
If the Chase terminates for S and ϕ ⇒ ψ, it returns true iff S logically implies ϕ ⇒ ψ.
Proof Sketch.
If: It can be shown that every iteration of the Chase produces a dependency that is
equivalent to the previous dependency on instances that satisfy the dependencies in S, i.e., it is
satisfied by such an instance iff the previous dependency is. Therefore, if the result is a
tautology, i.e., satisfied by all instances, then the input dependency ϕ ⇒ ψ holds for all
instances that satisfy the dependencies in S.
Only-if: Assume that the final ϕ ⇒ ψ is not a tautology. Then ϕ is satisfiable, and therefore
contains no equations. It follows that a corresponding instance of ϕ (1) satisfies all
dependencies in S (or the chase would not have terminated) but (2) does not satisfy ϕ ⇒ ψ.
To see (2), suppose it did: then ϕ would satisfy ϕ ⇒ ψ, so the identity embedding of ϕ into
itself could be extended to an embedding of ψ into ϕ that maps all variables in ϕ to themselves,
and that would make ϕ ⇒ ψ a tautology. Since the final ϕ ⇒ ψ is
equivalent to the input ϕ ⇒ ψ on instances that satisfy all dependencies in S, it follows that
this instance does not satisfy the input ϕ ⇒ ψ.

Termination
When does the Chase terminate?
In general, the Chase does not always terminate
Example (Non-termination of the Chase)
If we use S = {R(x, y) ⇒ R(x, z) ∧ R(z, y)} to chase R(x, y) ⇒ x = y, the
chase will not terminate.
There are, however, interesting sufficient conditions for termination.
We call a dependency ϕ ⇒ ψ full if all variables in ψ appear in ϕ.
Proposition
If S is a set of full dependencies then the Chase with S terminates on any
dependency.
This covers all classical dependencies from normalisation theory such
as functional dependencies, multivalued dependencies and join
dependencies, but not inclusion dependencies.
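Whether a dependency is full is a purely syntactic check. A minimal sketch, assuming variables are strings, constants are ints, atoms are `(predicate, terms)` pairs and equations are `("=", (s1, s2))`; the names `variables` and `is_full` are illustrative:

```python
def variables(conj):
    """The variables that occur in a simple conjunction."""
    return {s for _, args in conj for s in args if isinstance(s, str)}

def is_full(phi, psi):
    """phi => psi is full if every variable of psi already occurs in phi."""
    return variables(psi) <= variables(phi)

# Functional dependency AB -> C on R(A, B, C, D): full
fd = ([("R", ("x", "y", "z", "u")), ("R", ("x", "y", "z2", "u2"))],
      [("=", ("z", "z2"))])
# Inclusion dependency R[C, D] ⊆ S[E, F]: not full, v occurs only in the conclusion
ind = ([("R", ("x", "y", "z", "u"))], [("S", ("z", "u", "v"))])
print(is_full(*fd), is_full(*ind))
```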

Termination
Acyclic Sets of Dependencies
Another way of guaranteeing termination is to consider how dependencies can
trigger the creation of new variables during the chase.
Given a set S of dependencies we define the variable creation graph of
S as the graph (V , E) where V contains all the predicate names and
E contains an edge (P, Q) iff there is in S a non-full dependency
ϕ ⇒ ψ where ϕ mentions P and ψ mentions Q.
Proposition
If the variable creation graph of S is acyclic then the Chase with S
terminates on any dependency.
This covers data integration approaches where the relationships
between the original datasets and the integrated datasets are specified
by non-full dependencies that go only in one direction.
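The variable creation graph, as defined above, can be computed and tested for cycles as follows. The encoding (variables as strings, atoms as `(predicate, terms)` pairs, equations as `("=", (s1, s2))`) and the names `creation_graph` and `is_acyclic` are assumptions of this sketch. The acyclicity test repeatedly removes nodes without incoming edges.

```python
def variables(conj):
    return {s for _, args in conj for s in args if isinstance(s, str)}

def creation_graph(S):
    """Edges (P, Q) for each non-full dependency whose premise mentions P
    and whose conclusion mentions Q."""
    edges = set()
    for phi, psi in S:
        if not variables(psi) <= variables(phi):        # non-full dependency
            edges |= {(p, q) for p, _ in phi if p != "="
                      for q, _ in psi if q != "="}
    return edges

def is_acyclic(edges):
    nodes = {n for e in edges for n in e}
    remaining = set(edges)
    while True:
        sources = {n for n in nodes if all(q != n for _, q in remaining)}
        if not sources:
            return not nodes          # leftover nodes all lie on or behind a cycle
        nodes -= sources
        remaining = {(p, q) for p, q in remaining if p not in sources}

one_way = [([("P", ("x", "y"))], [("Q", ("y", "z"))])]
looping = [([("R", ("x", "y"))], [("R", ("x", "z")), ("R", ("z", "y"))])]
print(is_acyclic(creation_graph(one_way)), is_acyclic(creation_graph(looping)))
```

The second set is the non-terminating example from the previous slide; its graph has a self-loop on R.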