Bounds for overlapping interval join on MapReduce
1. Bounds for Overlapping Interval Join on MapReduce
Foto N. Afrati (1), Shlomi Dolev (2), Shantanu Sharma (2), and Jeffrey D. Ullman (3)
(1) National Technical University of Athens, Greece
(2) Ben-Gurion University of the Negev, Israel
(3) Stanford University, USA
2nd Algorithms and Systems for MapReduce and Beyond (BeyondMR)
Brussels, Belgium (27 March 2015)
2. Outline
• Introduction
• Goal of Mapping Schema and Our Contribution
• Unit-Length and Equally-Spaced Intervals
• Variable-Length and Equally-Spaced Intervals
• Conclusion
3. Outline
• Introduction
  – Interval and Overlapping Intervals
  – Interval Join
  – Reducer Capacity and Mapping Schema
• Goal of Mapping Schema and Our Contribution
• Unit-Length and Equally-Spaced Intervals
• Variable-Length and Equally-Spaced Intervals
• Conclusion
4. Introduction
• Interval
  – A pair [starting time, ending time]
  – A (time) interval i is represented by a pair of times $[T_s^i, T_e^i]$ with $T_s^i < T_e^i$, where $T_s^i$ and $T_e^i$ are the starting point and the ending point of interval i, respectively
  – Examples:
    • My talk
    • A phase of a project, a class of a professor
[Figure: the interval "Talk" with $T_s^i$ = 10am and $T_e^i$ = 10:30am.]
5. Introduction
• Overlapping Intervals
  – Two intervals, say interval i and interval j, are called overlapping intervals if their intersection is nonempty
[Figure: a pair of overlapping intervals i and j next to a pair of non-overlapping intervals. Example of overlapping intervals: Talk (10am to 10:35am) and Coffee break (10:30am to 11am).]
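The overlap condition can be sketched in a few lines of Python. This is a minimal sketch, assuming closed intervals (both endpoints belong to the interval) and times written as minutes since midnight. The names `Interval` and `overlaps` and the `lunch` interval are illustrative and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A closed time interval [start, end] with start < end (assumed convention)."""
    start: float
    end: float


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if a and b share at least one point of time."""
    return a.start <= b.end and b.start <= a.end


# Times as minutes since midnight (an assumption made for this sketch).
talk = Interval(10 * 60, 10 * 60 + 35)          # 10:00am to 10:35am
coffee_break = Interval(10 * 60 + 30, 11 * 60)  # 10:30am to 11:00am
lunch = Interval(12 * 60, 13 * 60)              # hypothetical later interval

print(overlaps(talk, coffee_break))  # True
print(overlaps(talk, lunch))         # False
```

With closed intervals, two intervals that only touch at an endpoint still count as overlapping. A half-open convention would replace `<=` with `<`.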
6. Introduction
• Overlapping Interval Join: an example

Employee
EmpID  Name  Duration
e1     U     1-Apr to 1-June
e2     V     1-May to 1-July
e3     W     1-Apr to 1-July
e4     X     1-Mar to 1-June
e5     Y     1-Mar to 1-Aug

Project
Phase                      Duration
Requirement Analysis (RA)  1-Mar to 1-May
Design (D)                 1-Apr to 1-June
Coding (C)                 1-May to 1-Aug

[Figure: timeline from 1-Mar to 1-Aug showing the project phases RA, D, and C above the employee intervals e1 to e5.]
Find all the employees that are involved in the RA phase of the project
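The example query can be sketched as a small in-memory join in Python. This is a sketch under stated assumptions: the year 2015 is assumed because the slide gives only day and month, and the helper names (`d`, `overlaps`, `interval_join`, `employees_in`) are illustrative. The `closed` flag is also an assumption, added because the slide does not say whether a shared endpoint counts as overlap.

```python
from datetime import date

YEAR = 2015  # assumed; the slide gives only day and month


def d(month: int, day: int = 1) -> date:
    """Build a date in the assumed year."""
    return date(YEAR, month, day)


# EmpID -> (name, start, end), taken from the Employee table.
employees = {
    "e1": ("U", d(4), d(6)),
    "e2": ("V", d(5), d(7)),
    "e3": ("W", d(4), d(7)),
    "e4": ("X", d(3), d(6)),
    "e5": ("Y", d(3), d(8)),
}
# Phase -> (start, end), taken from the Project table.
phases = {
    "RA": (d(3), d(5)),
    "D": (d(4), d(6)),
    "C": (d(5), d(8)),
}


def overlaps(s1, e1, s2, e2, closed=True):
    """Overlap test; closed=True counts a shared endpoint as overlap."""
    if closed:
        return s1 <= e2 and s2 <= e1
    return s1 < e2 and s2 < e1


def interval_join(closed=True):
    """Return all (employee, phase) pairs whose durations overlap."""
    return [
        (emp, ph)
        for emp, (_, es, ee) in employees.items()
        for ph, (ps, pe) in phases.items()
        if overlaps(es, ee, ps, pe, closed)
    ]


def employees_in(phase, closed=True):
    """Employees whose duration overlaps the given phase."""
    return [emp for emp, ph in interval_join(closed) if ph == phase]


print(employees_in("RA"))                # shared endpoint counts as overlap
print(employees_in("RA", closed=False))  # shared endpoint does not count
```

Employee e2 starts on 1-May, the day RA ends, so e2 appears in the result only under the closed-endpoint convention.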
7. Introduction
• Reducer capacity
  – An upper bound on the total number of intervals that are assigned to a reducer
  – Example
    • The reducer capacity can be the size of the main memory of the processors on which the reducers run
• Communication cost
  – The total amount of data transferred from the map phase to the reduce phase
  – There is a tradeoff between the reducer capacity and the communication cost
8. Introduction
Mapping schema for interval join
An assignment of the set of intervals to some given reducers, such that
  – The reducer capacity is respected
    • The total number of intervals assigned to a reducer must be less than or equal to the reducer capacity
  – The inputs of every output are assigned together
    • For every output, the two corresponding overlapping intervals must be assigned to at least one reducer in common
[Figure: intervals I1, I2, and I3 assigned to reducers.]
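The two conditions of a mapping schema can be checked mechanically. The following Python sketch is illustrative only: the function `is_valid_schema`, the tuple representation of intervals, and the sample data are assumptions, and overlap is tested on closed intervals.

```python
from itertools import product


def overlaps(a, b):
    """Closed-interval overlap test on (start, end) tuples."""
    return a[0] <= b[1] and b[0] <= a[1]


def is_valid_schema(X, Y, reducers, q):
    """Check the two mapping-schema conditions.

    X, Y     : dicts mapping an interval name to a (start, end) tuple
    reducers : list of sets of interval names, one set per reducer
    q        : reducer capacity (maximum number of intervals per reducer)
    """
    # Condition 1: no reducer holds more than q intervals.
    if any(len(r) > q for r in reducers):
        return False
    # Condition 2: every overlapping pair (x, y) shares some reducer.
    for (xn, xi), (yn, yi) in product(X.items(), Y.items()):
        if overlaps(xi, yi) and not any(xn in r and yn in r for r in reducers):
            return False
    return True


# Hypothetical sample data.
X = {"x1": (0, 2), "x2": (3, 5)}
Y = {"y1": (1, 4), "y2": (6, 7)}

good = [{"x1", "y1"}, {"x2", "y1"}, {"y2"}]
bad = [{"x1", "y1"}, {"x2", "y2"}]  # x2 and y1 overlap but never meet

print(is_valid_schema(X, Y, good, q=2))  # True
print(is_valid_schema(X, Y, bad, q=2))   # False
print(is_valid_schema(X, Y, good, q=1))  # False: capacity exceeded
```

In the valid assignment, y1 is replicated to two reducers because it overlaps both x1 and x2 and the capacity of 2 rules out placing all three on one reducer.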
9. State-of-the-Art
• B. Chawda, H. Gupta, S. Negi, T.A. Faruquie, L.V. Subramaniam, and M.K. Mohania, "Processing Interval Joins On Map-Reduce," EDBT, 2014.
• MapReduce-based 2-way and multiway interval join algorithms for overlapping intervals
• Does not take the reducer capacity into account
• No analysis of a lower bound on the replication of individual intervals
• No analysis of the replication rate of the algorithms offered therein
11. Goal of Mapping Schema
• Interval join problem
  – Assign all the intervals that share at least one common point of time to at least one reducer in common, so that the outputs can be found
12. Our Contribution
• An algorithm for variable-length intervals that can start at any time
  – Before this, we consider two simpler cases and provide an algorithm for each
    • Unit-length and equally-spaced intervals
    • Variable-length and equally-spaced intervals
• For all the algorithms, the upper bound on the replication rate almost matches the lower bound
14. Unit-Length and Equally-Spaced Intervals
• Relations X and Y of n intervals each
• No interval starts before 0 or after k
• Hence, the spacing between the starting points of two successive intervals is $\frac{k}{n} < 1$
[Figure: relations X and Y on a time axis from 0 to 2.25 in steps of 0.25; n = 9 and k = 2.25, so spacing = 0.25.]
15. Unit-Length and Equally-Spaced Intervals: Algorithm
• Divide the time range from 0 to k into equal-sized partitions of length w (say P partitions are created)
• Arrange P reducers
• Assign all intervals of X that exist in a partition $p_i$ to the i-th reducer
• Assign all intervals of Y that have their starting point or ending point in partition $p_i$ to the i-th reducer
[Figure: relations X and Y on a time axis from 0 to 2.25 (n = 9 and k = 2.25), divided into five partitions.]
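The assignment rule can be sketched in Python as follows. This is a minimal sketch under several assumptions that are not on the slide: w = 0.5 is chosen for illustration, starting points are placed at 0, k/n, 2k/n, and so on, intervals are treated as closed, and partitions are indexed by floor(t / w), so that ending points beyond k also fall into a partition. The function names are illustrative.

```python
from collections import defaultdict
from math import floor


def unit_intervals(n, k):
    """n unit-length intervals with starting points spaced k/n apart from 0."""
    spacing = k / n
    return [(i * spacing, i * spacing + 1.0) for i in range(n)]


def assign(X, Y, w):
    """Map each reducer index to the X and Y intervals it receives."""
    reducers = defaultdict(lambda: {"X": set(), "Y": set()})
    for i, (s, e) in enumerate(X):
        # An X interval goes to every partition it touches.
        for p in range(floor(s / w), floor(e / w) + 1):
            reducers[p]["X"].add(i)
    for j, (s, e) in enumerate(Y):
        # A Y interval goes only to the partitions of its two endpoints.
        reducers[floor(s / w)]["Y"].add(j)
        reducers[floor(e / w)]["Y"].add(j)
    return reducers


def overlaps(a, b):
    """Closed-interval overlap test on (start, end) tuples."""
    return a[0] <= b[1] and b[0] <= a[1]


n, k, w = 9, 2.25, 0.5  # n and k from the slide; w = 0.5 is assumed
X = unit_intervals(n, k)
Y = unit_intervals(n, k)
reducers = assign(X, Y, w)

# Check that every overlapping pair (x, y) meets at some reducer.
covered = all(
    any(i in r["X"] and j in r["Y"] for r in reducers.values())
    for i, x in enumerate(X)
    for j, y in enumerate(Y)
    if overlaps(x, y)
)
max_load = max(len(r["X"]) + len(r["Y"]) for r in reducers.values())
y_copies = max(sum(j in r["Y"] for r in reducers.values()) for j in range(n))
print(covered, max_load, y_copies)
```

For unit-length intervals, any Y interval that overlaps an X interval has its starting point or its ending point inside that X interval, which is why sending Y only to the partitions of its two endpoints is enough in this setting.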
16. Unit-Length and Equally-Spaced Intervals
• Does the algorithm work?
• Consider $q = \frac{3wn}{k} + \frac{n}{k} + 2$
  • q: the reducer capacity
  • w: the length of a partition
  • n: the total number of intervals in a relation
  • k: the last starting point of an interval
• Count how many intervals are assigned to a partition; if the count is less than or equal to q, then we have a solution and the algorithm works.
17. Unit-Length and Equally-Spaced Intervals
• Does the algorithm work?
  – Count 1: How many intervals of Y overlap with an interval of X in a partition of length w?
    • The spacing is k/n, so there are n/k starting points per unit of time, and at most $\frac{2wn}{k}$ intervals of Y can overlap with an interval of X
  – Count 2: How many intervals of X can have their starting point after the starting point of $x_i$, or before it but still within reach of $x_i$?
    • Intervals of X after the starting point of $x_i$: $\frac{wn}{k}$
    • Intervals of X before the starting point of $x_i$: $\frac{n}{k}$
  – Count 3: Do not forget to count $x_i$ itself and an identical interval of Y, i.e., $y_i$.
[Figure: relations X and Y on a time axis from 0 to 2.25 (n = 9 and k = 2.25), divided into five partitions.]
18. Unit-Length and Equally-Spaced Intervals
• Does the algorithm work?
  – Total number of intervals in a partition
  – Count 1 + Count 2 + Count 3 = $\frac{2wn}{k} + \frac{wn}{k} + \frac{n}{k} + 2 = \frac{3wn}{k} + \frac{n}{k} + 2 = q$
  – OK. The algorithm works
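As a worked instance of this count, take n = 9 and k = 2.25 from the figure and assume a partition length of w = 0.5. The slides do not state w, so this value is chosen only for illustration.

```latex
\begin{align*}
\frac{wn}{k} &= \frac{0.5 \cdot 9}{2.25} = 2, \qquad \frac{n}{k} = \frac{9}{2.25} = 4,\\
q &= \underbrace{\frac{2wn}{k}}_{\text{Count 1}}
   + \underbrace{\frac{wn}{k} + \frac{n}{k}}_{\text{Count 2}}
   + \underbrace{2}_{\text{Count 3}}
   = 4 + (2 + 4) + 2 = 12.
\end{align*}
```

Under these assumed values, a reducer capacity of 12 intervals would be enough for every partition.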
20. Variable-Length and Equally-Spaced Intervals
• Two types of intervals
  – Big and small intervals
  – Different length intervals
21. Variable-Length and Equally-Spaced Intervals
• Big and small intervals
  – All the intervals of X are of length $l_{min}$
  – All the intervals of Y are of length $l_{max}$
  – The previous algorithm will work here too
  – Note that an interval of X will be replicated to several reducers, while an interval of Y will be replicated to at most two reducers
[Figure: relations X and Y on a time axis from 0 to 4.2 in steps of 0.7; n = 6 and spacing = 0.7.]
22. Variable-Length and Equally-Spaced Intervals
• Variable-length intervals: a general case
  – All restrictions on the length of an interval and on the spacing between two intervals are removed
  – Intervals can begin at any time greater than or equal to 0 and must end by time T
  – S: the total length of the intervals in one relation
[Figure: relations X and Y on a time axis marked 0, s, s+1, s+2, s+3, T.]
23. Variable-Length and Equally-Spaced Intervals
• Variable-length intervals: a general case
  – Algorithm
    • Divide the time range into $\frac{T}{w}$ equal-sized partitions
    • Arrange $\frac{T}{w}$ reducers
    • Follow the same procedure as in the previous algorithm
      – i.e., assign all the intervals of X that belong to the i-th partition to the i-th reducer, and assign each interval of Y to the reducers corresponding to its starting and ending points (at most two reducers)
[Figure: relations X and Y on a time axis marked 0, s, s+1, s+2, s+3, T.]
24. Variable-Length and Equally-Spaced Intervals
• Variable-length intervals: a general case
  – Does the algorithm work?
  – Consider $q = \frac{3nw + S}{T}$
  – Count the average number of intervals of X and Y sent to a reducer; if it is less than or equal to the reducer capacity, then the algorithm will work
25. Variable-Length and Equally-Spaced Intervals
• Variable-length intervals: a general case
  – Count 1: Average number of intervals of Y received by a reducer
    • $\frac{\text{Replication} \times \text{Total number of inputs}}{\text{Total number of reducers}}$
  – An interval of Y is sent to at most 2 reducers (Replication)
  – There are $\frac{T}{w}$ reducers and n intervals in Y
• Average number of intervals of Y received by a reducer = $\frac{2 \times n}{T/w}$
26. Variable-Length and Equally-Spaced Intervals
• Variable-length intervals: a general case
  – Count 2: Average number of intervals of X received by a reducer
    • $\frac{\text{Replication} \times \text{Total number of inputs}}{\text{Total number of reducers}}$
  – The average length of an interval is S/n
  – An interval of X is sent to at most 1 + S/nw reducers (S/nw is the average length divided by how much length a reducer can hold)
  – There are $\frac{T}{w}$ reducers and n intervals in X
• Average number of intervals of X received by a reducer = $\frac{(1 + S/nw) \times n}{T/w}$
27. Variable-Length and Equally-Spaced Intervals
• Variable-length intervals: a general case
  – Does the algorithm work?
  – Total number of intervals that a reducer receives = Count 1 + Count 2
    $\frac{2nw}{T} + \frac{(1 + S/nw)\,wn}{T} = \frac{3wn + S}{T} = q$
  – The algorithm works
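The sum of Count 1 and Count 2 can be checked numerically. The Python sketch below uses exact rational arithmetic from the standard `fractions` module; the function names and the sample values of n, w, S, and T are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def average_loads(n, w, S, T):
    """Average intervals per reducer, following Count 1 and Count 2.

    n : intervals per relation      w : partition length
    S : total interval length       T : end of the time range
    """
    num_reducers = Fraction(T) / Fraction(w)        # T/w reducers
    count_y = 2 * n / num_reducers                   # each Y copied at most twice
    copies_x = 1 + Fraction(S) / (n * Fraction(w))   # 1 + S/(nw) copies per X
    count_x = copies_x * n / num_reducers
    return count_y, count_x


def capacity(n, w, S, T):
    """q = (3nw + S) / T."""
    return (3 * n * Fraction(w) + Fraction(S)) / Fraction(T)


# Hypothetical values chosen only for illustration.
n, w, S, T = 100, Fraction(1, 2), 80, 20
count_y, count_x = average_loads(n, w, S, T)
print(count_y, count_x, count_y + count_x, capacity(n, w, S, T))
```

With these sample values, Count 1 is 5, Count 2 is 13/2, and their sum 23/2 equals q, matching the identity on the slide.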
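29. Conclusion
• An investigation of good MapReduce algorithms for the problem of finding pairs of overlapping intervals
• Algorithms for:
  – Unit-sized and equally-spaced intervals
    • Lower bound on the replication rate [formula not recoverable from the extracted slide; surviving terms: "2 or 2q", n, k]
    • Upper bound on the replication rate [formula not recoverable from the extracted slide; surviving terms: 3, qT − S, S, 2]
  – Big-small and equally-spaced intervals
    • Lower bound on the replication rate [formula not recoverable from the extracted slide; surviving terms: "2 or 2q", $l_{min}$, s]
    • Upper bound on the replication rate [formula not recoverable from the extracted slide; surviving terms: 3, qT − S, S, 2]
  – A general case for variable-length intervals
    • Upper bound on the replication rate [formula not recoverable from the extracted slide; surviving terms: 3, qT − S, S, 2]
Proofs of lower and upper bounds on the replication rate are given in the paper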
30. Foto Afrati (1), Shlomi Dolev (2), Shantanu Sharma (2), and Jeffrey D. Ullman (3)
(1) School of Electrical and Computing Engineering, National Technical University of Athens, Greece
afrati@softlab.ece.ntua.gr
(2) Department of Computer Science, Ben-Gurion University of the Negev, Israel
{dolev,sharmas}@cs.bgu.ac.il
(3) Department of Computer Science, Stanford University, USA
ullman@cs.stanford.edu
Presentation is available at
http://www.cs.bgu.ac.il/~sharmas/publication.html