These are the slides for the paper "Performance Modeling of Stream Joins" presented at the international ACM conference on Distributed Event-Based Systems (DEBS)
1. Performance Modeling of Stream Joins
Vincenzo Gulisano1, Alessandro V. Papadopoulos2, Yiannis Nikolakopoulos1,
Marina Papatriantafilou1, Philippas Tsigas1
2. Agenda – Performance modeling of stream joins
• Performance modeling of stream joins
• The model
• Evaluation
• Conclusions
V. Gulisano Performance Modeling of Stream Joins 2
3. Streaming applications
Figure: a streaming application composed of operators (OP).
Static/dynamic decisions:
- deployment
- load balancing
- elasticity
- load shedding
Decisions are f(operators’ cost / throughput / latency ...), obtained in one of two ways:
- measure (live), then take decisions
- model, then take decisions
5. Stream joins
Figure: input streams R and S carrying tuples <ts,A1,...,An>, windows WR and WS, output stream Out.
Time-based windows
- FIXED interval of time
- VARIABLE number of tuples
Tuple-based windows
- VARIABLE interval of time
- FIXED number of tuples
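A minimal sketch of the two window types, to make the fixed/variable contrast concrete. The class names, the (ts, payload) tuple layout, and the inclusive window boundary are assumptions made for illustration; they are not taken from the paper or its Java implementation.

```python
from collections import deque


class TimeWindow:
    """Time-based window: fixed time interval, variable number of tuples."""

    def __init__(self, size):
        self.size = size      # window length in time units (assumed)
        self.items = deque()  # tuples (ts, payload) in non-decreasing ts order

    def add(self, tup):
        self.items.append(tup)
        newest_ts = tup[0]
        # Evict tuples older than the fixed interval (boundary kept inclusive here).
        while self.items and self.items[0][0] < newest_ts - self.size:
            self.items.popleft()


class TupleWindow:
    """Tuple-based window: fixed number of tuples, variable time interval."""

    def __init__(self, count):
        # deque(maxlen=...) drops the oldest tuple automatically when full.
        self.items = deque(maxlen=count)

    def add(self, tup):
        self.items.append(tup)


time_window = TimeWindow(size=10)
tuple_window = TupleWindow(count=3)
for ts in [0, 1, 2, 3, 20, 21]:
    time_window.add((ts, "a"))
    tuple_window.add((ts, "a"))

time_window_ts = [t[0] for t in time_window.items]
tuple_window_ts = [t[0] for t in tuple_window.items]
print(time_window_ts)   # tuples within the last 10 time units
print(tuple_window_ts)  # the last 3 tuples, whatever time span they cover
```

With the bursty timestamps above, the time-based window ends up holding two tuples while the tuple-based window always holds three, spanning a much longer interval.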
6. Stream joins – deterministic execution
Figure: streams R and S, windows WR and WS, READY tuples.
Deterministic execution
- results do not depend on the interleaving of R and S tuples
Process ready tuples in timestamp order
A tuple is ready if its timestamp is less than or equal to the minimum of the latest timestamps received from R and from S
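The readiness rule above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the ReadyBuffer class, its method names, and the tie-breaking rule (equal timestamps ordered by stream name, then per-stream arrival order) are assumptions. It also assumes each stream delivers tuples in non-decreasing timestamp order.

```python
import heapq


class ReadyBuffer:
    """Buffers tuples from R and S and releases them in timestamp order once ready."""

    def __init__(self):
        self.last_ts = {"R": None, "S": None}  # latest timestamp seen per stream
        self.pending = []                      # min-heap of (ts, stream, seq, payload)
        self.seq = 0                           # keeps heap entries unique and ordered

    def add(self, stream, ts, payload):
        self.last_ts[stream] = ts
        heapq.heappush(self.pending, (ts, stream, self.seq, payload))
        self.seq += 1

    def pop_ready(self):
        # Nothing is ready until both streams have delivered at least one tuple.
        if None in self.last_ts.values():
            return []
        bound = min(self.last_ts.values())
        ready = []
        while self.pending and self.pending[0][0] <= bound:
            ts, stream, _, payload = heapq.heappop(self.pending)
            ready.append((ts, stream, payload))
        return ready


buf = ReadyBuffer()
buf.add("R", 1, "r1")
buf.add("R", 3, "r3")
first = buf.pop_ready()   # S has delivered nothing yet

buf.add("S", 2, "s2")
second = buf.pop_ready()  # bound = min(3, 2) = 2

buf.add("S", 5, "s5")
third = buf.pop_ready()   # bound = min(3, 5) = 3
print(first, second, third)
```

Because tuples are released only up to the bound and always in timestamp order, the output sequence is the same for any interleaving of R and S arrivals that preserves each stream's own order.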
7. Stream joins – multiple physical streams
Figure: |R| and |S| physical streams, READY tuples, windows WR and WS.
8. Stream joins – parallel execution
Figure: READY tuples feed n threads, each with its own WR and WS; a second READY stage precedes Out.
Each thread: ~ 1/n of comparisons, 1/n of outputs
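A sketch of why each thread ends up with roughly 1/n of the comparisons. It assumes one particular partitioning scheme: every tuple is compared by all workers against their share of the opposite window, and stored by exactly one worker in round-robin order. That scheme, the function name, and the absence of window eviction are simplifying assumptions; workers are simulated sequentially rather than with real threads.

```python
def parallel_join(r_tuples, s_tuples, n, predicate):
    """Simulate n workers sharing one join; return per-worker comparisons and outputs."""
    win_r = [[] for _ in range(n)]   # each worker's share of WR
    win_s = [[] for _ in range(n)]   # each worker's share of WS
    comparisons = [0] * n
    outputs = [[] for _ in range(n)]
    next_owner = {"R": 0, "S": 0}    # round-robin storage, per stream

    # Process tuples in timestamp order (ties: R before S), as a ready stage would.
    merged = sorted(
        [(t, "R") for t in r_tuples] + [(t, "S") for t in s_tuples],
        key=lambda item: (item[0][0], item[1]),
    )
    for tup, stream in merged:
        for w in range(n):
            opposite = win_s[w] if stream == "R" else win_r[w]
            for other in opposite:
                comparisons[w] += 1
                pair = (tup, other) if stream == "R" else (other, tup)
                if predicate(*pair):
                    outputs[w].append(pair)
        owner = next_owner[stream]
        (win_r if stream == "R" else win_s)[owner].append(tup)
        next_owner[stream] = (owner + 1) % n
    return comparisons, outputs


r = [(0, "a"), (2, "b"), (4, "a"), (6, "b")]
s = [(1, "a"), (3, "b"), (5, "a"), (7, "b")]
same_key = lambda r_tup, s_tup: r_tup[1] == s_tup[1]

comp_1, out_1 = parallel_join(r, s, 1, same_key)
comp_2, out_2 = parallel_join(r, s, 2, same_key)
print(comp_1, comp_2)
```

The total number of comparisons and the set of results do not change with n; only their distribution across workers does.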
10. Modeling goal
Figure: stream join with input streams R and S and output Out.
Model inputs:
- Characteristics of the input streams
- Configuration
Model outputs, over time interval i:
- Throughput yi
- Latency li (time difference)
11. Scope of the presentation
• The presentation covers only a portion of the whole model in the paper
• Equations can be found in the paper
• The presentation discusses the main dependencies between
  - Input characteristics <-> Throughput / Latency
  - Join configuration <-> Throughput / Latency
12. Step-by-step model
1. Non-deterministic stream join
2. Deterministic stream join with multiple physical streams
3. Deterministic parallel stream join with multiple physical streams
13. Non-deterministic stream join – Dependencies
Figure: streams R (ri) and S (si), windows WR (ω^R_i) and WS (ω^S_i), output Out.
Table: dependencies of y and ljoin for Time-based and Tuple-based windows. Entries: ∝ ri, si; ∝ ω^R_i, ω^S_i.
14. Deterministic stream join with multiple physical streams
Figure: |R| and |S| physical streams, READY tuples, one thread with windows WR and WS.
15. Deterministic stream join with multiple physical streams
Figure (items 1, 2, 3): latency overhead for results produced by 1
16. Deterministic stream join with multiple physical streams
Figure (items 2, 3, 4): latency overhead for results produced by 2
17. Deterministic stream join with multiple physical streams
Figure (items 3, 4, 5): latency overhead for results produced by 3
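One way to picture this latency overhead is the time a tuple waits between arriving and becoming ready. This sketch is an interpretation rather than the paper's equations: the function name, the stream identifiers, and the event layout (arrival_time, stream_id, ts) are hypothetical. It extends the readiness rule to any number of physical streams by taking the minimum of the latest timestamps over all of them.

```python
def readiness_delay(arrivals, stream_ids):
    """Return {(stream_id, ts): delay} for every tuple that becomes ready.

    arrivals: list of (arrival_time, stream_id, ts), sorted by arrival_time.
    A tuple becomes ready once its ts <= min over all streams of the latest ts seen.
    """
    last_ts = {s: None for s in stream_ids}
    waiting = []  # (ts, stream_id, arrival_time) of tuples not yet ready
    delays = {}
    for arrival_time, stream_id, ts in arrivals:
        last_ts[stream_id] = ts
        waiting.append((ts, stream_id, arrival_time))
        if None in last_ts.values():
            continue  # some stream has not delivered anything yet
        bound = min(last_ts.values())
        still_waiting = []
        for w_ts, w_stream, w_arrival in waiting:
            if w_ts <= bound:
                delays[(w_stream, w_ts)] = arrival_time - w_arrival
            else:
                still_waiting.append((w_ts, w_stream, w_arrival))
        waiting = still_waiting
    return delays


# Two physical streams for R and one for S (hypothetical names).
events = [
    (0.0, "R1", 1),
    (0.5, "R2", 2),
    (2.0, "S1", 3),
    (3.0, "R1", 4),
]
delays = readiness_delay(events, ["R1", "R2", "S1"])
print(delays)
```

In this example the first tuple waits until the slowest stream delivers, so its delay grows with the inter-arrival time of the other streams and with the number of streams that must report.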
18. Deterministic stream join with multiple physical streams – Dependencies
Figure: |R| and |S| physical streams, READY tuples, one thread with windows WR and WS.
Table: dependencies of y, ljoin and lin for Time-based and Tuple-based windows. Entries: ∝ ri, si; ∝ ω^R_i, ω^S_i; ∝ 1/ri, 1/si; ∝ |R|, |S|.
19. Deterministic parallel stream join with multiple physical streams – Dependencies
Figure: READY tuples feed n threads, each with its own WR and WS; a second READY stage precedes Out.
Table: dependencies of y, ljoin, lin and lout for Time-based and Tuple-based windows. Entries: ∝ ri, si; ∝ ω^R_i, ω^S_i; ∝ 1/ri, 1/si; ∝ |R|, |S|; ∝ 1/n; ∝ n.
21. Evaluation
• Runs the common benchmark (Handshake Join, ScaleJoin)
• Results compare the simulator’s output with a Java implementation (available at https://github.com/dcs-chalmers/Join_Model)
32. Conclusions
Comprehensive dynamic model for stream joins
• Non-deterministic vs deterministic execution
• Single vs multiple physical streams
• Centralized vs parallel
• Non-saturated vs saturated
Very close match between the model and empirical measurements
Open for future work, for instance:
• Other operators
• Worst case vs average case
...