Scotty: Efficient Window Aggregation with
General Stream Slicing
Berlin, October 7-9, 2019
Philipp M. Grulich
Research Associate (TU Berlin)
Jonas Traub
Research Associate (TU Berlin)
Jonas Traub (TU Berlin), Philipp M. Grulich (TU Berlin) - Efficient Window Aggregation with Stream Slicing
Aggregations in Stream Processing Pipelines
A stream processing pipeline is a series of concurrently running operators.
[Figure: window aggregation operator in the pipeline]
Motivation
Research Background
Cutty: Aggregate Sharing for User-Defined Windows
P. Carbone, J. Traub, A. Katsifodimos, S. Haridi, V. Markl
ACM International Conference on Information and Knowledge Management (CIKM 2016)
Scotty: Efficient Window Aggregation for out-of-order Stream Processing
J. Traub, P. M. Grulich, A. R. Cuéllar, S. Breß, A. Katsifodimos, T. Rabl, V. Markl
IEEE International Conference on Data Engineering (ICDE 2018)
Efficient Window Aggregation with General Stream Slicing
J. Traub, P. M. Grulich, A. R. Cuéllar, S. Breß, A. Katsifodimos, T. Rabl, V. Markl
International Conference on Extending Database Technology (EDBT 2019; Best Paper Award)
Scotty Window Processor:
Efficient Window Aggregations for Flink, Beam, and Storm
https://github.com/TU-Berlin-DIMA/scotty-window-processor
Stream Slicing Example
The number of slices depends on the workload.
We store partial aggregates instead of all tuples. => Small memory footprint.
We assign each tuple to exactly one slice. => O(1) per-tuple complexity.
We require just a few computation steps to calculate final aggregates. => Low latency.
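The three properties above can be illustrated with a minimal sketch. It is not Scotty's implementation: the function names `slice_stream` and `window_sums` are hypothetical, the stream is assumed to be in order with non-negative integer event times, the aggregate is a sum, and the window length is assumed to be a multiple of the slide so that every slice is exactly one slide wide.

```python
def slice_stream(tuples, slide):
    """Assign each (event_time, value) tuple to exactly one slice.

    Returns a dict mapping slice start time -> partial sum, so only one
    number per slice is stored instead of all tuples.
    """
    slices = {}
    for event_time, value in tuples:
        slice_start = (event_time // slide) * slide  # O(1) per tuple
        slices[slice_start] = slices.get(slice_start, 0) + value
    return slices


def window_sums(slices, length, slide):
    """Combine slice partials into sums of sliding windows [start, start + length)."""
    if length % slide != 0:
        raise ValueError("this sketch assumes length is a multiple of slide")
    if not slices:
        return {}
    horizon = max(slices) + slide  # end of the last slice seen so far
    results = {}
    # Only windows that are fully covered by the slices seen so far are emitted.
    for start in range(0, horizon - length + 1, slide):
        partials = (slices.get(s, 0) for s in range(start, start + length, slide))
        results[(start, start + length)] = sum(partials)
    return results


stream = [(1, 1), (3, 2), (5, 3), (6, 4), (9, 5), (11, 6)]
slices = slice_stream(stream, slide=5)
print(slices)                                   # {0: 3, 5: 12, 10: 6}
print(window_sums(slices, length=10, slide=5))  # {(0, 10): 15, (5, 15): 18}
```

Each final window aggregate here needs only length / slide additions, regardless of how many tuples fall into the window.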
General Stream Slicing
Workload Characteristics:
● Window Types: context free, forward context free, forward context aware
● Stream Order: in-order, out-of-order
● Window Measures: time, tuple count, arbitrary
● Aggregation Functions: distributive, algebraic, holistic; associativity, commutativity, invertibility
General Stream Slicing combines generality and efficiency in a single solution.
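To make the aggregation function characteristics concrete, here is a small sketch under stated assumptions. The helper names `lift`, `combine`, `invert`, and `lower` are chosen for illustration and are not taken from the slides. The average is used as an algebraic function with a fixed-size partial (sum, count), and the median as a holistic function that needs all values.

```python
from functools import reduce
from statistics import median


def lift(value):
    """Turn one value into a partial aggregate (sum, count)."""
    return (value, 1)


def combine(left, right):
    """Associative and commutative merge of two partial aggregates."""
    return (left[0] + right[0], left[1] + right[1])


def invert(total, removed):
    """Remove a partial aggregate again; possible because sum and count are invertible."""
    return (total[0] - removed[0], total[1] - removed[1])


def lower(partial):
    """Turn a partial aggregate into the final average."""
    return partial[0] / partial[1]


def aggregate_slice(values):
    return reduce(combine, map(lift, values))


slice_a = [1, 2, 1, 4, 3]
slice_b = [1, 5, 2, 2, 3]
partial_a = aggregate_slice(slice_a)         # (11, 5)
partial_b = aggregate_slice(slice_b)         # (13, 5)
both = combine(partial_a, partial_b)         # (24, 10)
print(lower(both))                           # 2.4
print(invert(both, partial_a) == partial_b)  # True

# A holistic function such as the median cannot be computed from
# fixed-size partials; it needs the values of all slices.
print(median(slice_a + slice_b))             # 2.0
```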
Impact of Workload Characteristics
Example: count-based tumbling window with a length of 5 tuples.
Tuple Count:  1  2  3  4  5 |  6  7  8  9 10 | 11 12 13 14 15
Event Time:   5 12 13 20 35 | 37 42 46 48 51 | 52 57 63 64 65
Value:        1  2  1  4  3 |  1  5  2  2  3 |  6  1  2  2  1
Window sums: 11, 13, 12
What if the stream is out-of-order?
Out-of-order tuple: value 5, event time 49.
The tuple belongs between event times 48 and 51, so the last tuple of the second and of the third window shifts into the following window:
● Second window: 13 + 5 - 3 = 15
● Third window: 12 + 3 - 1 = 14
What if the aggregation function is not invertible?
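The out-of-order case can be replayed with a short sketch that uses the values, event times, and window sums 11, 13, 12 from the slide. The helper names `build_windows` and `insert_out_of_order` are hypothetical, and the sketch keeps all tuples per window for simplicity; it only illustrates why an invertible function such as sum allows updating each affected window with one addition and one subtraction.

```python
import bisect


def build_windows(times, values, size):
    """Split an in-order stream into count-based tumbling windows with their sums."""
    tuples = sorted(zip(times, values))
    windows = [tuples[i:i + size] for i in range(0, len(tuples), size)]
    sums = [sum(v for _, v in w) for w in windows]
    return windows, sums


def insert_out_of_order(windows, sums, late_tuple, size):
    """Insert a late (event_time, value) tuple and shift overflow tuples onward."""
    carry = late_tuple
    idx = 0
    # Skip full windows that end before the late tuple's event time.
    while (idx < len(windows) and len(windows[idx]) == size
           and windows[idx][-1][0] <= carry[0]):
        idx += 1
    while carry is not None:
        if idx == len(windows):
            windows.append([])
            sums.append(0)
        bisect.insort(windows[idx], carry)
        sums[idx] += carry[1]          # add the incoming tuple
        if len(windows[idx]) > size:
            carry = windows[idx].pop()
            sums[idx] -= carry[1]      # subtract the tuple that moves on
            idx += 1
        else:
            carry = None


times = [5, 12, 13, 20, 35, 37, 42, 46, 48, 51, 52, 57, 63, 64, 65]
values = [1, 2, 1, 4, 3, 1, 5, 2, 2, 3, 6, 1, 2, 2, 1]
windows, sums = build_windows(times, values, size=5)
print(sums)  # [11, 13, 12]
insert_out_of_order(windows, sums, (49, 5), size=5)
print(sums)  # [11, 15, 14, 1]
```

For a function without an inverse, such as max, the subtraction step has no counterpart, so the affected aggregates would have to be recomputed from the stored tuples instead.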
Scotty Window Processor:
Efficient Window Aggregations
for Flink, Beam, and Storm
https://github.com/TU-Berlin-DIMA/scotty-window-processor
Key-Facts
Features:
● One window operator for many systems.
● High performance window aggregations with stream slicing.
● Scales to thousands of concurrent windows.
● Aggregate sharing among multiple window queries.
● Adapts to workload characteristics:
○ Window Types
○ Aggregation Functions
○ Window Measures
○ Stream Order
Connectors:
…more coming soon…
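Aggregate sharing among multiple window queries can be sketched as follows. This is an illustration under stated assumptions, not Scotty's code: the function names `shared_slices` and `tumbling_sums` are hypothetical, all queries are time-based tumbling windows over an in-order stream with non-negative integer event times, and the aggregate is a sum. Slices are cut at every edge of any window, so each tuple updates a single shared slice no matter how many queries are registered.

```python
def shared_slices(tuples, window_lengths):
    """Build slices [start, end, partial_sum] that respect all window edges."""
    slices = []
    for event_time, value in tuples:
        if not slices or event_time >= slices[-1][1]:
            # Open a new slice between the closest window edges of all queries.
            start = max((event_time // w) * w for w in window_lengths)
            end = min((event_time // w + 1) * w for w in window_lengths)
            slices.append([start, end, 0])
        slices[-1][2] += value  # one update per tuple, shared by all queries
    return slices


def tumbling_sums(slices, length):
    """Answer one tumbling-window query from the shared slices."""
    results = {}
    for start, _end, partial in slices:
        window_start = (start // length) * length
        results[window_start] = results.get(window_start, 0) + partial
    return results


stream = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
slices = shared_slices(stream, window_lengths=[2, 3])
print(slices)                    # [[0, 2, 3], [2, 3, 3], [3, 4, 4], [4, 6, 11]]
print(tumbling_sums(slices, 2))  # {0: 3, 2: 7, 4: 11}
print(tumbling_sums(slices, 3))  # {0: 6, 3: 15}
```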
Scotty Core
Scotty adapts to workload characteristics
and combines generality and efficiency in a single solution.
Benchmark
Concurrent Windows with Built-in Window Operator:
● Flink performs well with a single window (no overlap; one bucket at a time).
● With overlapping concurrent windows, the throughput drops drastically.
[Figure: throughput (tuples/sec., axis from 0 to 2,500,000) over the number of concurrent windows (1, 10, 20, 50, 100, 500, 1000) for Flink, Storm, and Flink on Beam]
Benchmark
Concurrent Windows with Scotty:
● With Scotty, the throughput is independent of the number of concurrent windows.
[Figure: throughput (tuples/sec., axis from 0 to 2,500,000) over the number of concurrent windows (1, 10, 20, 50, 100, 500, 1000) for Flink+Scotty, Storm+Scotty, and Beam+Flink+Scotty]
Using Scotty on Flink
1. Clone Scotty and install it to your local Maven repository.
2. Add Scotty to your Flink project.
Using Scotty on Flink
1. Initialize the Scotty window operator.
2. Add window definitions.
3. Add Scotty to your Flink job.
Acknowledgements: This talk is supported by the Berlin Big Data Center (01IS14013A), the Berlin Center for Machine Learning (01IS18037A), and Software Campus (1-3000473-18TP).
Scotty Window Processor
Scotty Features:
● One window operator for many systems.
● High performance with stream slicing.
● Scales to thousands of concurrent windows.
● Aggregate sharing among multiple window queries.
● Adapts to workload characteristics.
Open Source Repository:
tu-berlin-dima.github.io/scotty-window-processor

 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBoston Institute of Analytics
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectBoston Institute of Analytics
 

Kürzlich hochgeladen (20)

Rithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdfRithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdf
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
 
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdfWorld Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded data
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data Visualization
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptx
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptx
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis Project
 

FlinkForward Berlin 2019 - Scotty: Efficient Window Aggregation with General Stream Slicing

  • 1. Scotty: Efficient Window Aggregation with General Stream Slicing. Berlin, October 7-9, 2019. Philipp M. Grulich, Research Associate (TU Berlin); Jonas Traub, Research Associate (TU Berlin).
  • 2. Jonas Traub (TU Berlin), Philipp M. Grulich (TU Berlin) - Efficient Window Aggregation with Stream Slicing. Aggregations in Stream Processing Pipelines: A stream processing pipeline is a series of concurrently running operators.
  • 3-5. Aggregations in Stream Processing Pipelines (builds): the pipeline figure highlights the Window Aggregation operator; the figure labels 53 and 8 appear in successive builds.
  • 6-7. Motivation (figure-only slides)
  • 8-11. Research Background
      ● Cutty: Aggregate Sharing for User-Defined Windows. P. Carbone, J. Traub, A. Katsifodimos, S. Haridi, V. Markl. ACM International Conference on Information and Knowledge Management (CIKM 2016).
      ● Scotty: Efficient Window Aggregation for out-of-order Stream Processing. J. Traub, P. M. Grulich, A. R. Cuéllar, S. Breß, A. Katsifodimos, T. Rabl, V. Markl. IEEE International Conference on Data Engineering (ICDE 2018).
      ● Efficient Window Aggregation with General Stream Slicing. J. Traub, P. M. Grulich, A. R. Cuéllar, S. Breß, A. Katsifodimos, T. Rabl, V. Markl. International Conference on Extending Database Technology (EDBT 2019; Best Paper Award).
      ● Scotty Window Processor: Efficient Window Aggregations for Flink, Beam, and Storm. https://github.com/TU-Berlin-DIMA/scotty-window-processor
  • 12-20. Stream Slicing Example (figure builds; recoverable text):
      ● The number of slices depends on the workload.
      ● We store partial aggregates instead of all tuples. => Small memory footprint.
      ● We assign each tuple to exactly one slice. => O(1) per-tuple complexity.
      ● We require just a few computation steps to calculate final aggregates. => Low latency.
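The three properties of the stream slicing example can be illustrated with a minimal sketch. It is not Scotty's implementation: it assumes in-order tuples, a sum aggregation, and time-based windows whose start and end are multiples of a fixed slice length. The names `SliceStore`, `add`, and `window_sum` are hypothetical.

```python
from collections import defaultdict


class SliceStore:
    """Toy stream slicer: one partial aggregate (a sum) per fixed-length slice."""

    def __init__(self, slice_length):
        self.slice_length = slice_length
        # slice index -> partial sum; the tuples themselves are never stored
        self.partials = defaultdict(int)

    def add(self, timestamp, value):
        # Each tuple updates exactly one slice: O(1) work per tuple.
        self.partials[timestamp // self.slice_length] += value

    def window_sum(self, start, end):
        # A window [start, end) is answered by combining a few slice partials.
        # Assumption: start and end are multiples of slice_length.
        first = start // self.slice_length
        last = end // self.slice_length
        return sum(self.partials.get(i, 0) for i in range(first, last))


store = SliceStore(slice_length=10)
for ts, v in [(1, 4), (7, 2), (12, 5), (25, 1), (31, 3)]:
    store.add(ts, v)

w1 = store.window_sum(0, 20)   # combines slices 0 and 1
w2 = store.window_sum(10, 30)  # combines slices 1 and 2; slice 1 is shared
print(w1, w2)
```

Both windows reuse the partial aggregate of slice 1, which is the kind of sharing between overlapping windows that the slides describe.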
  • 21-27. General Stream Slicing: Workload Characteristics
      ● Window Types: Context Free, Forward Context Free, Forward Context Aware
      ● Stream Order: in-order, out-of-order
      ● Window Measures: time, tuple count, arbitrary
      ● Aggregation Functions: distributive, algebraic, holistic; associativity, commutativity, invertibility
      General Stream Slicing combines generality and efficiency in a single solution.
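The aggregation function classes listed on this slide can be made concrete with a small sketch. The helper names `avg_lift`, `avg_combine`, and `avg_lower` are illustrative assumptions, not names taken from the slides or from Scotty; the two value lists are the first two windows of the tuple sequence used later in the deck.

```python
import statistics
from functools import reduce

# Two slices of tuple values.
slice_a = [1, 2, 1, 4, 3]
slice_b = [1, 5, 2, 2, 3]

# Distributive (sum): the partial aggregate has the same form as the result.
sum_a, sum_b = sum(slice_a), sum(slice_b)
total = sum_a + sum_b        # combine two partial aggregates
without_b = total - sum_b    # invertible: a partial can be removed again


# Algebraic (average): a fixed-size partial (sum, count) is enough.
def avg_lift(value):
    return (value, 1)


def avg_combine(left, right):
    return (left[0] + right[0], left[1] + right[1])


def avg_lower(partial):
    return partial[0] / partial[1]


avg_a = reduce(avg_combine, map(avg_lift, slice_a))
avg_b = reduce(avg_combine, map(avg_lift, slice_b))
average = avg_lower(avg_combine(avg_a, avg_b))

# Holistic (median): no fixed-size partial exists, so this sketch keeps all values.
median = statistics.median(slice_a + slice_b)

# Not invertible (max): after removing the value 5, the maximum has to be
# recomputed from the remaining values.
max_after_removal = max(v for v in slice_a + slice_b if v != 5)

print(total, without_b, average, median, max_after_removal)
```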
  • 28-43. Impact of Workload Characteristics (figure builds; recoverable content):
      ● Tuple values: 1 2 1 4 3 1 5 2 2 3 6 1 2 2 1
      ● Tuple count: 1 to 15
      ● Event time: 5 12 13 20 35 37 42 46 48 51 52 57 63 64 65
      ● Count-based tumbling window with a length of 5 tuples; window aggregates (sums): 11, 13, 12.
      ● What if the stream is out-of-order? An out-of-order tuple with value 5 and event time 49 arrives.
      ● The late tuple shifts the following tuples across window boundaries: the second aggregate is updated as 13 + 5 - 3 and the third as 12 + 3 - 1.
      ● What if the aggregation function is not invertible?
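The out-of-order update from this example can be replayed as a short sketch. It uses the tuple values and event times from the slides and assumes a sum aggregation, which matches the window aggregates 11, 13, and 12. For simplicity the sketch keeps all tuples so it can look up the tuples that shift across window boundaries; the function names are hypothetical and this is not Scotty's code.

```python
import bisect

values = [1, 2, 1, 4, 3, 1, 5, 2, 2, 3, 6, 1, 2, 2, 1]
times = [5, 12, 13, 20, 35, 37, 42, 46, 48, 51, 52, 57, 63, 64, 65]
SIZE = 5  # count-based tumbling window of 5 tuples


def window_sums(vals, size):
    """Recompute all window sums from scratch (used only for checking)."""
    return [sum(vals[i:i + size]) for i in range(0, len(vals), size)]


def insert_out_of_order(sums, vals, ts_list, ts, value, size):
    """Insert a late tuple and patch the affected window sums incrementally."""
    pos = bisect.bisect_right(ts_list, ts)
    ts_list.insert(pos, ts)
    vals.insert(pos, value)
    window = pos // size
    carried = value  # value that enters the current window
    while window < len(sums):
        sums[window] += carried
        end = (window + 1) * size  # index of the tuple pushed out of this window
        if end >= len(vals):
            return  # the last window is not full, nothing is pushed out
        carried = vals[end]
        sums[window] -= carried  # requires an invertible aggregation
        window += 1
    sums.append(carried)  # the last pushed-out tuple opens a new window


sums = window_sums(values, SIZE)
before = list(sums)
insert_out_of_order(sums, values, times, ts=49, value=5, size=SIZE)
print(before, sums)
```

The second window becomes 13 + 5 - 3 and the third 12 + 3 - 1, as on the slides. The subtraction step is what a non-invertible function such as max cannot do, which motivates the closing question of this slide sequence.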
  • 44. Scotty Window Processor: Efficient Window Aggregations for Flink, Beam, and Storm. https://github.com/TU-Berlin-DIMA/scotty-window-processor
  • 45-51. Key Facts
      Features:
      ● One window operator for many systems.
      ● High-performance window aggregations with stream slicing.
      ● Scales to thousands of concurrent windows.
      ● Aggregate sharing among multiple window queries.
      ● Adapts to workload characteristics:
          ○ Window Types
          ○ Aggregation Functions
          ○ Window Measures
          ○ Stream Order
      Connectors: …more coming soon…
  • 52-56. Scotty Core (figure builds): Scotty adapts to workload characteristics and combines generality and efficiency in a single solution.
  • 57-58. Benchmark: Concurrent Windows with Built-in Window Operator
      ● Flink performs well with a single window (no overlap; one bucket at a time).
      ● With overlapping concurrent windows, the throughput drops drastically.
      [Chart: throughput (tuples/sec., axis from 0 to 2,500,000) over the number of concurrent windows (1, 10, 20, 50, 100, 500, 1000) for Flink, Storm, and Flink on Beam.]
  • 59. Benchmark: Concurrent Windows with Scotty
      ● With Scotty, the throughput is independent of the number of concurrent windows.
      [Chart: same axes for Flink+Scotty, Storm+Scotty, and Beam+Flink+Scotty.]
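One way to read these benchmark results is to count aggregate updates per tuple. The sketch below is a back-of-the-envelope model, not the benchmark itself: it assumes that a bucket-based operator adds every tuple to each window that contains it, while a slicing operator adds every tuple to exactly one slice. The workload (n concurrent tumbling windows with lengths 1 to n) and all function names are hypothetical.

```python
def windows_containing(ts, length, slide):
    """Number of windows [k*slide, k*slide + length), k >= 0, that contain ts."""
    last_k = ts // slide
    first_k = max(0, (ts - length) // slide + 1)
    return last_k - first_k + 1


def bucket_updates(timestamps, queries):
    """Aggregate updates if every window keeps its own bucket."""
    return sum(
        windows_containing(ts, length, slide)
        for ts in timestamps
        for (length, slide) in queries
    )


def slicing_updates(timestamps):
    """Aggregate updates if every tuple is added to exactly one slice."""
    return len(timestamps)


timestamps = list(range(1000))
for n in (1, 10, 100):
    # n concurrent tumbling windows: length equals slide
    queries = [(length, length) for length in range(1, n + 1)]
    print(n, bucket_updates(timestamps, queries), slicing_updates(timestamps))
```

In this model the per-tuple work of the bucket approach grows linearly with the number of concurrent windows, while the per-tuple work of slicing stays constant. Combining slices when a window ends is not counted here.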
  • 60-61. Using Scotty on Flink (setup)
      1. Clone Scotty and install it to Maven.
      2. Add Scotty to your Flink project.
  • 62-64. Using Scotty on Flink (usage)
      1. Initialize the Scotty window operator.
      2. Add window definitions.
      3. Add Scotty to your Flink job.
  • 65. Scotty Window Processor
      Scotty Features:
      ● One window operator for many systems.
      ● High performance with stream slicing.
      ● Scales to thousands of concurrent windows.
      ● Aggregate sharing among multiple window queries.
      ● Adapts to workload characteristics.
      Open Source Repository: tu-berlin-dima.github.io/scotty-window-processor
      Acknowledgements: This talk is supported by the Berlin Big Data Center (01IS14013A), the Berlin Center for Machine Learning (01IS18037A), and Software Campus (1-3000473-18TP).