3. You must also remember!
• Relational data languages are based on
relational algebra
• Relational algebra consists of a set of operators
on relations, which include:
– Selection
– Projection
– Union
– Cartesian product
4. Cartesian Product
• The Cartesian product of two relations R of
degree k1 and S of degree k2 is the set of
(k1+k2)-tuples, where each result tuple is a
concatenation of one tuple of R with one
tuple of S, for all tuples of R and S (written R × S)
• Consider the relations EMP and PAY; EMP × PAY
is:
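As a minimal sketch of the definition above, the Cartesian product can be computed by concatenating every tuple of one relation with every tuple of the other. The attribute layout (ENO, ENAME, TITLE) and (TITLE, SAL) and the rows below are illustrative assumptions, not the instance from the slides.

```python
from itertools import product

# Illustrative instances (assumed data, not the original slide tables).
EMP = [("E1", "J. Doe", "Elect. Eng."), ("E2", "M. Smith", "Syst. Anal.")]  # (ENO, ENAME, TITLE)
PAY = [("Elect. Eng.", 40000), ("Syst. Anal.", 34000)]                      # (TITLE, SAL)

def cartesian_product(r, s):
    """Concatenate each tuple of r with each tuple of s."""
    return [t_r + t_s for t_r, t_s in product(r, s)]

emp_x_pay = cartesian_product(EMP, PAY)
for row in emp_x_pay:
    print(row)
```

With degrees 3 and 2, every result tuple has 3 + 2 = 5 attributes, and the result holds one tuple per (EMP, PAY) pair.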
6. Joins
• Join is a derivative of Cartesian Product
• There are various forms of joins
– Join
• Inner join
– Theta join
– Equi-join
• Outer join
– Left join
– Right join
– Full join
– Semi join
7. Theta Join
• Consider the theta-join of relations EMP and
ASG over the join predicate
EMP.ENO = ASG.ENO; because the predicate is an
equality, this is also an equi-join
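One way to picture a theta-join is as a selection applied to the Cartesian product. The sketch below assumes small made-up EMP and ASG instances, with ASG laid out as (ENO, PNO, RESP, DUR); the helper name `theta_join` is hypothetical.

```python
from itertools import product

# Illustrative instances (assumed data).
EMP = [("E1", "J. Doe", "Elect. Eng."),
       ("E2", "M. Smith", "Syst. Anal."),
       ("E3", "A. Lee", "Mech. Eng.")]            # (ENO, ENAME, TITLE)
ASG = [("E1", "P1", "Manager", 12),
       ("E2", "P1", "Analyst", 24),
       ("E2", "P2", "Analyst", 6)]                # (ENO, PNO, RESP, DUR)

def theta_join(r, s, predicate):
    """Keep only the concatenated pairs that satisfy the join predicate."""
    return [t_r + t_s for t_r, t_s in product(r, s) if predicate(t_r, t_s)]

# Join predicate EMP.ENO = ASG.ENO (position 0 in both relations).
emp_join_asg = theta_join(EMP, ASG, lambda e, a: e[0] == a[0])
for row in emp_join_asg:
    print(row)
```

In this sample, E3 has no assignment, so it does not appear in the result.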
9. Semi-Join
• The semi-join of relation R, defined over the
set of attributes A, by relation S, defined over
the set of attributes B, is the subset of the
tuples of R that participate in the join of R
with S
• The advantage of semi-join is that it decreases
the number of tuples that need to be handled
to form the join
10. Semi-Join
• In centralized database systems, this is
important because it usually results in a
decreased number of secondary storage
accesses by making better use of the memory.
• It is even more important in distributed
databases since it usually reduces the amount
of data that needs to be transmitted between
sites in order to evaluate a query.
11. Semi-Join
• To demonstrate the difference between join
and semi-join, let's consider the semi-join of
EMP with PAY over the predicate EMP.TITLE =
PAY.TITLE, written EMP ⋉ PAY
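The difference can be sketched in a few lines: the join returns concatenated tuples, while the semi-join returns only the EMP tuples that have at least one match in PAY. The rows below are assumed sample data, and `join` and `semi_join` are hypothetical helper names.

```python
# Illustrative instances (assumed data).
EMP = [("E1", "J. Doe", "Elect. Eng."),
       ("E2", "M. Smith", "Syst. Anal."),
       ("E3", "J. Miller", "Programmer")]          # (ENO, ENAME, TITLE)
PAY = [("Elect. Eng.", 40000), ("Syst. Anal.", 34000)]  # (TITLE, SAL)

def join(r, s, predicate):
    """Join: concatenated tuples of r and s that satisfy the predicate."""
    return [t_r + t_s for t_r in r for t_s in s if predicate(t_r, t_s)]

def semi_join(r, s, predicate):
    """Semi-join: tuples of r that participate in the join of r with s."""
    return [t_r for t_r in r if any(predicate(t_r, t_s) for t_s in s)]

def same_title(e, p):
    return e[2] == p[0]  # EMP.TITLE = PAY.TITLE

joined = join(EMP, PAY, same_title)       # tuples with 5 attributes
semi = semi_join(EMP, PAY, same_title)    # tuples with EMP's 3 attributes only
```

Only the attributes of EMP survive in the semi-join result, which is why less data has to be handled or shipped before the full join is formed.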
13. Derived Horizontal Fragmentation
• A derived horizontal fragmentation is defined
on a member relation of a link according to a
selection operation specified on its owner
• It is important to remember two points:
– First, the link between the owner and the member
relations is defined as an equi-join
– Second, an equi-join can be implemented by
means of semi-joins
14. Derived Horizontal Fragmentation
• Accordingly, given a link L where owner(L) = S
and member(L) = R, the derived horizontal
fragments of R are defined as:
Ri = R ⋉ Si, 1 ≤ i ≤ w
• where w is the maximum number of
fragments that will be defined on R, and
Si = σFi(S),
where Fi is the formula according to which
the primary horizontal fragment Si is defined
15. Derived Horizontal Fragmentation
• To carry out a derived horizontal
fragmentation, three inputs are needed:
– The set of partitions of the owner relation (PAY1,
PAY2)
– The member relation
– The set of semi-join predicates between the
owner and member (EMP.TITLE = PAY.TITLE)
17. Example
• Consider L1, where owner(L1) = PAY and
member(L1) = EMP
• We can group engineers into two groups
according to their salary: those making less
than or equal to $30,000, and those making
more than $30,000
• The two fragments EMP1 and EMP2 are
defined as:
EMP1 = EMP ⋉ PAY1
EMP2 = EMP ⋉ PAY2
where PAY1 = σSAL≤30000(PAY) and PAY2 = σSAL>30000(PAY)
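A minimal sketch of this example, taking the three inputs of a derived horizontal fragmentation (owner fragments, member relation, semi-join predicate) as function arguments. The PAY and EMP rows are assumed sample data, and `select`, `semi_join`, and `derived_fragments` are hypothetical helper names.

```python
# Illustrative instances (assumed data).
PAY = [("Elect. Eng.", 40000), ("Syst. Anal.", 34000),
       ("Mech. Eng.", 27000), ("Programmer", 24000)]   # (TITLE, SAL)
EMP = [("E1", "J. Doe", "Elect. Eng."),
       ("E2", "M. Smith", "Syst. Anal."),
       ("E3", "A. Lee", "Mech. Eng."),
       ("E4", "J. Miller", "Programmer")]              # (ENO, ENAME, TITLE)

def select(relation, formula):
    """Primary horizontal fragment: tuples satisfying the selection formula."""
    return [t for t in relation if formula(t)]

def semi_join(r, s, predicate):
    """Tuples of r that join with at least one tuple of s."""
    return [t_r for t_r in r if any(predicate(t_r, t_s) for t_s in s)]

def derived_fragments(member, owner_fragments, predicate):
    """One member fragment per owner fragment: member semi-joined with it."""
    return [semi_join(member, s_i, predicate) for s_i in owner_fragments]

# Owner fragments of PAY by salary.
PAY1 = select(PAY, lambda p: p[1] <= 30000)
PAY2 = select(PAY, lambda p: p[1] > 30000)

# Member fragments of EMP over EMP.TITLE = PAY.TITLE.
EMP1, EMP2 = derived_fragments(EMP, [PAY1, PAY2], lambda e, p: e[2] == p[0])
print(EMP1)
print(EMP2)
```

With this sample, EMP1 holds the employees whose title pays at most $30,000 and EMP2 holds the rest.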
19. Derived Horizontal Fragmentation
• One potential complication needs
attention
• In a database schema, if there are two links into
a relation R, there could be more than one
possible derived horizontal fragmentation of R
• The choice of candidate fragmentation is
based on two criteria:
– The fragmentation with better join characteristics
– The fragmentation used in more applications
20. The fragmentation used in more
Applications
• It is quite straightforward if we take into
consideration the frequency with which
applications access some data
• If possible, the accesses of the heavy users should
be facilitated so that their total impact on system
performance is minimized
21. The Fragmentation with better join
characteristics
• Consider the last example: the effect of this
fragmentation is that the join of the EMP and
PAY relations to answer the query is assisted
– By performing it on smaller relations
– By potentially performing joins in parallel
22. The Fragmentation with better join
characteristics
• The first point is obvious: the fragments of EMP
are smaller than EMP itself
• Therefore, it will be faster to join any fragment of
PAY with any fragment of EMP than to work with
the relations themselves
• The second point, however, is more important and
is at the heart of distributed databases
• If, besides executing a number of queries at
different sites, we can parallelize the execution of one
join query, the response time or throughput of
the system can be expected to improve
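The idea of parallelizing one join query can be sketched as follows. This is only an illustration under stated assumptions: worker threads in a single process stand in for sites, the fragment contents are made-up sample data, and each EMP fragment is assumed to join with exactly one PAY fragment.

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative fragments (assumed data): EMPi joins only with PAYi.
EMP1 = [("E3", "A. Lee", "Mech. Eng."), ("E4", "J. Miller", "Programmer")]
PAY1 = [("Mech. Eng.", 27000), ("Programmer", 24000)]
EMP2 = [("E1", "J. Doe", "Elect. Eng."), ("E2", "M. Smith", "Syst. Anal.")]
PAY2 = [("Elect. Eng.", 40000), ("Syst. Anal.", 34000)]

def same_title(e, p):
    return e[2] == p[0]  # EMP.TITLE = PAY.TITLE

def join(r, s, predicate):
    return [t_r + t_s for t_r in r for t_s in s if predicate(t_r, t_s)]

def join_pair(pair):
    """Join one (EMP fragment, PAY fragment) pair; stands in for one site's work."""
    emp_fragment, pay_fragment = pair
    return join(emp_fragment, pay_fragment, same_title)

pairs = [(EMP1, PAY1), (EMP2, PAY2)]

# Each pair is joined independently; threads stand in for separate sites.
with ThreadPoolExecutor() as pool:
    partial_results = list(pool.map(join_pair, pairs))

# The full join is the union of the per-pair results.
full_join = [row for part in partial_results for row in part]
```

For this sample, the combined result contains the same tuples as joining the unfragmented relations, while each worker only touches the smaller fragments.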
23. The Fragmentation with better join
characteristics
• In the case of joins, this is possible under certain
circumstances
• Consider the join graph between the fragments of EMP
and PAY: there is only one link coming into or going out of
a fragment
• Such a join graph is called a simple graph
• The advantage of a design where the join relationship
between fragments is simple is that the member and
owner of a link can be allocated to one site, and the joins
between different pairs of fragments can proceed
independently and in parallel
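Whether a join graph is simple can be checked mechanically. The sketch below assumes the graph is given as a list of (owner fragment, member fragment) links, which is a representation chosen here for illustration; `is_simple_join_graph` is a hypothetical helper name.

```python
from collections import Counter

def is_simple_join_graph(links):
    """True if every fragment has exactly one link coming in or going out."""
    degree = Counter()
    for owner_fragment, member_fragment in links:
        degree[owner_fragment] += 1
        degree[member_fragment] += 1
    return all(count == 1 for count in degree.values())

# PAY1 -> EMP1 and PAY2 -> EMP2: each fragment touches exactly one link.
simple_links = [("PAY1", "EMP1"), ("PAY2", "EMP2")]

# A hypothetical graph where PAY1 links to two member fragments.
non_simple_links = [("PAY1", "EMP1"), ("PAY1", "EMP2")]

print(is_simple_join_graph(simple_links))
print(is_simple_join_graph(non_simple_links))
```

When the check succeeds, each linked pair of fragments is a natural unit to allocate to one site.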
25. The Fragmentation with better join
characteristics
• Unfortunately, obtaining simple join graphs may
not always be possible
• In that case the next desirable alternative is to
have a design that results in a partitioned join
graph
• A partitioned graph consists of two or more
subgraphs with no links between them
• Fragments so obtained may not be distributed for
parallel execution as easily as those obtained via
simple join graphs, but the allocation is still
possible
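A partitioned join graph can be recognized by counting connected components. The sketch below again assumes the graph is a list of (owner fragment, member fragment) links; the fragment names S1, S2, R1, R2, R3 are hypothetical.

```python
from collections import defaultdict

def connected_components(links):
    """Group fragments into subgraphs that have no links between them."""
    adjacency = defaultdict(set)
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:  # depth-first traversal of one subgraph
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        components.append(component)
    return components

def is_partitioned(links):
    """True if the join graph splits into two or more unconnected subgraphs."""
    return len(connected_components(links)) >= 2

# S1 links to two member fragments, so the graph is not simple,
# but it still splits into two independent subgraphs.
links = [("S1", "R1"), ("S1", "R2"), ("S2", "R3")]
components = connected_components(links)
print(components)
```

Each component is a candidate group of fragments to allocate together, since no join crosses component boundaries.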
26. The Fragmentation with better join
characteristics
• Let us continue with the distribution design of the database
we started before
• We already decided on the fragmentation of relation EMP
according to the fragmentation of PAY
• Let's now consider ASG, and assume that there are two
applications
– The first application finds the names of engineers who work at
certain places; it runs on all three sites and accesses the
information about the engineers who work on local projects with
higher probability than those of projects at other locations
– The second application runs at each administrative site where
employee records are managed: users would like to access the
responsibilities on the projects that these employees work on and
learn how long they will work on those projects
27. The Fragmentation with better join
characteristics
• The first application results in a fragmentation
of ASG according to the fragments PROJ1,
PROJ3, PROJ4 and PROJ6 of PROJ obtained
before
28. The Fragmentation with better join
characteristics
• Therefore, the derived fragmentation of ASG
according to {PROJ1, PROJ3, PROJ4, PROJ6} is
defined as:
• The fragment instances are:
29. The Fragmentation with better join
characteristics
• The second query can be specified in SQL as:
• Where i=1 or i=2, depending on the site where
the query is issued
• The derived fragmentation of ASG according
to the fragmentation of EMP is defined as:
31. The Fragmentation with better join
characteristics
• The example demonstrates two things:
– Derived fragmentation may follow a chain where
one relation is fragmented as a result of another
one’s design and it, in turn, causes the
fragmentation of another relation
(PAY->EMP->ASG)
– Typically, there will be more than one candidate
fragmentation for a relation (ASG); the final choice
of the fragmentation scheme may be a decision
problem addressed during allocation
32. Checking of Correctness
• We should now check the fragmentation
algorithms discussed so far with respect to
three correctness criteria:
– Completeness
– Reconstruction
– Disjointness
33. Completeness
• The completeness of a primary horizontal
fragmentation is based on the selection
predicate used
• As long as the selection predicates are
complete, the resulting fragmentation is
guaranteed to be complete as well
34. Completeness
• The completeness of a derived horizontal
fragmentation is somewhat more difficult to define
• For example, there should be no ASG tuple that has
a project number that is not also contained in PROJ;
this rule is known as referential integrity
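The referential integrity rule can be checked directly: any ASG tuple whose project number is missing from PROJ would fall into no derived fragment. The sketch below uses assumed sample rows, with PROJ as (PNO, PNAME) and ASG as (ENO, PNO, RESP, DUR); `dangling_member_tuples` is a hypothetical helper name.

```python
# Illustrative instances (assumed data); P9 deliberately has no PROJ tuple.
PROJ = [("P1", "Instrumentation"), ("P2", "Database Develop.")]   # (PNO, PNAME)
ASG = [("E1", "P1", "Manager", 12),
       ("E2", "P2", "Analyst", 6),
       ("E3", "P9", "Engineer", 10)]                              # (ENO, PNO, RESP, DUR)

def dangling_member_tuples(member, owner, member_key, owner_key):
    """Member tuples whose join value does not occur in the owner relation."""
    owner_values = {owner_key(t) for t in owner}
    return [t for t in member if member_key(t) not in owner_values]

dangling = dangling_member_tuples(
    ASG, PROJ,
    member_key=lambda a: a[1],   # ASG.PNO
    owner_key=lambda p: p[0],    # PROJ.PNO
)
print(dangling)
```

An empty result means every member tuple can be placed in some derived fragment; a non-empty result lists the tuples that would be lost.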
35. Reconstruction
• Reconstruction of a global relation from its
fragments is performed by the union operator
in both the primary and the derived horizontal
fragmentation
• Thus, for a relation R with fragmentation FR = {R1, R2, ..., Rw},
R = R1 ∪ R2 ∪ ... ∪ Rw
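Reconstruction by union can be sketched with Python sets. The fragment contents below are assumed sample data, and `reconstruct` is a hypothetical helper name.

```python
# Illustrative relation and its horizontal fragments (assumed data).
EMP = [("E1", "J. Doe", "Elect. Eng."),
       ("E2", "M. Smith", "Syst. Anal."),
       ("E3", "A. Lee", "Mech. Eng."),
       ("E4", "J. Miller", "Programmer")]
EMP1 = [EMP[2], EMP[3]]
EMP2 = [EMP[0], EMP[1]]

def reconstruct(fragments):
    """Rebuild the global relation as the union of its horizontal fragments."""
    relation = set()
    for fragment in fragments:
        relation |= set(fragment)
    return relation

rebuilt = reconstruct([EMP1, EMP2])
print(rebuilt == set(EMP))
```

The comparison is done on sets because a relation has no tuple order and no duplicates.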
36. Disjointness
• It is easier to establish disjointness of
fragmentation for primary than for derived
horizontal fragmentation
• In primary horizontal fragmentation (PHF), disjointness
is guaranteed as long as
the minterm predicates determining the
fragmentation are mutually exclusive
37. Example
• In derived fragmentation, however, there is a
semi-join involved that adds considerable
complexity
• Disjointness can be guaranteed if the join graph is
simple, otherwise it is necessary to investigate
actual tuple values
• In general we do not want a tuple of a member
relation to join with two or more tuples of the
owner relation when these tuples are in different
fragments of the owner
38. Example
• In fragmenting relation PAY, the minterm predicates are M =
{m1, m2}, where
m1: SAL ≤ 30000
m2: SAL > 30000
• Since m1 and m2 are mutually exclusive, the fragmentation
of PAY is disjoint
• For relation EMP, however, we require that
– Each engineer has a single title
– Each title has a single salary value associated with it
• Since these two rules follow from the semantics of the
database, the fragmentation of EMP with respect to PAY is
also disjoint
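To see why the two rules matter, the sketch below compares a PAY fragmentation where each title has a single salary with a hypothetical one where "Programmer" appears with two salary values in different fragments. All rows are assumed sample data, and `are_disjoint` is a hypothetical helper name.

```python
from itertools import combinations

def semi_join(r, s, predicate):
    """Tuples of r that join with at least one tuple of s."""
    return [t_r for t_r in r if any(predicate(t_r, t_s) for t_s in s)]

def are_disjoint(fragments):
    """True if no tuple appears in more than one fragment."""
    return all(set(a).isdisjoint(b) for a, b in combinations(fragments, 2))

def same_title(e, p):
    return e[2] == p[0]  # EMP.TITLE = PAY.TITLE

EMP = [("E1", "J. Doe", "Elect. Eng."), ("E2", "J. Miller", "Programmer")]

# Case 1: each title has a single salary value.
PAY1 = [("Programmer", 24000)]        # SAL <= 30000
PAY2 = [("Elect. Eng.", 40000)]       # SAL > 30000
ok_fragments = [semi_join(EMP, f, same_title) for f in (PAY1, PAY2)]

# Case 2 (rule violated): "Programmer" has a salary in both PAY fragments.
BAD_PAY1 = [("Programmer", 24000)]
BAD_PAY2 = [("Elect. Eng.", 40000), ("Programmer", 35000)]
bad_fragments = [semi_join(EMP, f, same_title) for f in (BAD_PAY1, BAD_PAY2)]

print(are_disjoint(ok_fragments))
print(are_disjoint(bad_fragments))
```

In the second case the PAY fragments themselves are still disjoint, but employee E2 joins with tuples in both of them and therefore lands in both EMP fragments.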