Stream Loops on Flink
Reinventing the wheel for the streaming era
Paris Carbone
@FF2018-Berlin
Systems Researcher@KTH/SICS <parisc@kth.se>
Committer@Flink <senorcarbone@apache.org>
1
Stream Compute Landmarks
A Timeline
• Sliding Window Semantics
• Task Parallel Execution
• Arbitrary Computation Nesting
• Out-of-order Processing (watermarks)
• State Management
  - Fault Tolerance
  - Reconfiguration
2
Has Streaming Subsumed Batch?
short answer: NO
long answer: not YET
Flink DataSet (multi-pass)
• Staged Processing
• Allows Bulk/Delta Iterative Computation
Flink DataStream (single-pass)
• Continuous Processing
• Allows correct Acyclic Computation
?
3
Why Iterations Matter
• Iterations are fundamental building blocks for:
• Graph Analysis (PageRank, Conn.Comp., SSSP etc.)
• Machine Learning (Gradient Descent, PCA etc.)
• Transactions (e.g., optimistic concurrency control)
Iterative Processing Primitives are currently not present
in today's production-grade stream processing systems.
4
Current Challenges
• Data is born and evolves continuously as a data stream
(e.g., user interactions, sensor events, server logs)
5
Current Challenges
Day 1 Day 2 Day 3 Day 4
• To run iterative analysis, users have to manually do the…
• bucketing
• laundry (schedule jobs)
• cleaning (integration)
6
Current Challenges
• To run iterative analysis, users have to manually do the…
• bucketing
• laundry (iterative job scheduling)
• cleaning (integration)
Day 1 Day 2 Day 3 Day 4
7
Current Challenges
• To run iterative analysis, users have to manually do the…
• bucketing
• laundry (iterative job scheduling)
• sorting (model integration)
Day 1 Day 2 Day 3 Day 4
8
A Partially Solved Problem
• Most challenges mentioned are solved and automated for
single-pass aggregation via stream windowing.
What would a distributed iterative
computation look like as a special
type of window aggregation?
window
computation
windows are a natural fit for
declaring iterative computation
10
Anatomy of Iterative Computation
Termination (fixpoint/fixed)
Loop Step Function (computation)
Final Result-Solution
Solution (finite state)
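The four parts named above (solution state, step function, termination, final result) can be sketched as a plain loop. This is only an illustration in ordinary Python, not Flink code; the names `iterate`, `expand`, `EDGES`, and `max_steps` are assumptions introduced for the example.

```python
from typing import Callable, TypeVar

S = TypeVar("S")


def iterate(initial: S, step: Callable[[S], S], max_steps: int = 100) -> S:
    """Apply `step` until the solution stops changing (fixpoint)
    or the fixed step budget is used up (fixed termination)."""
    solution = initial
    for _ in range(max_steps):
        updated = step(solution)
        if updated == solution:  # fixpoint reached
            break
        solution = updated
    return solution  # final result


# Hypothetical example: vertices reachable from vertex 1.
EDGES = {1: [2], 2: [3], 3: [], 4: [1]}


def expand(reached: frozenset) -> frozenset:
    """Loop step function: add the direct successors of every reached vertex."""
    return reached | frozenset(n for v in reached for n in EDGES[v])


reachable = iterate(frozenset({1}), expand)
```

Here the solution is a finite set, so the fixpoint check is a simple equality test; `max_steps` covers the "fixed" termination case.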
11
Programming Model Extensions
[Figure: Flink stack. Labels: Runtime, Core, Libraries; Distributed Dataflow Execution, DataStream Model, Stream API, StreamML, Stream, Graph, CEP, SQL, Table]
Proposed Additions (based on Flink 1.6)
• Multi-Pass Window Aggregations
• Synchrony and Termination
• Pregel on Windows
12
Window Iterations
An extension of existing window aggregation to multi-pass aggregates
13
[Figure: Loop Context. Labels: IN, window, entry, step, iterate, finalize, OUT, R]
14
PageRank Example
[Figure: PageRank as a window iteration. Labels: IN, window, entry, step, iterate, finalize, OUT]
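One way to read the entry / step / finalize roles is the following sketch, which runs PageRank over a single window of edges. It is plain Python, not the proposed Flink API; the function signatures, the damping factor 0.85, and the fixed superstep count are assumptions made for the example, and the sample graph has no dangling vertices.

```python
from collections import defaultdict

DAMPING = 0.85  # assumed damping factor, not taken from the slides


def entry(edges):
    """Build the loop state from one window of (src, dst) edges."""
    out = defaultdict(list)
    vertices = set()
    for src, dst in edges:
        out[src].append(dst)
        vertices.update((src, dst))
    rank = {v: 1.0 / len(vertices) for v in vertices}
    return dict(out), rank


def step(out, rank):
    """One superstep: every vertex splits its rank among its out-neighbours."""
    incoming = defaultdict(float)
    for src, targets in out.items():
        share = rank[src] / len(targets)
        for dst in targets:
            incoming[dst] += share
    n = len(rank)
    return {v: (1 - DAMPING) / n + DAMPING * incoming[v] for v in rank}


def finalize(rank):
    """Emit the window result, highest rank first."""
    return sorted(rank.items(), key=lambda kv: -kv[1])


def window_iteration(edges, supersteps=50):
    """Drive entry -> step (repeated) -> finalize for one window."""
    out, rank = entry(edges)
    for _ in range(supersteps):
        rank = step(out, rank)
    return finalize(rank)


window = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "a")]
result = window_iteration(window)
```

A fixed number of supersteps is used here for simplicity; a fixpoint test on the rank change would be the other termination option.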
15
Higher-Level Abstractions
• Vertex-Centric Model (Part of experimental Gelly-Streams lib)
Example:
single source
shortest paths
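A vertex-centric single source shortest paths computation can be sketched in the Pregel style as below. This is not the Gelly-Streams API; the `sssp` function, its message dictionary, and the sample edges are hypothetical and only illustrate the superstep pattern.

```python
import math


def sssp(edges, source):
    """Pregel-style SSSP: in each superstep, a vertex that received a
    shorter distance adopts it and offers new distances to its neighbours."""
    neighbors = {}
    for src, dst, weight in edges:
        neighbors.setdefault(src, []).append((dst, weight))
        neighbors.setdefault(dst, [])
    dist = {v: math.inf for v in neighbors}
    messages = {source: 0.0}  # best candidate distance per vertex
    supersteps = 0
    while messages:  # terminate when no vertex has pending messages
        next_messages = {}
        for v, candidate in messages.items():
            if candidate < dist[v]:
                dist[v] = candidate
                for dst, weight in neighbors[v]:
                    offer = candidate + weight
                    if offer < next_messages.get(dst, math.inf):
                        next_messages[dst] = offer
        messages = next_messages
        supersteps += 1
    return dist, supersteps


edges = [("s", "a", 4.0), ("s", "b", 1.0), ("b", "a", 2.0), ("a", "c", 1.0)]
dist, supersteps = sssp(edges, "s")
```

The loop ends when a superstep produces no messages, which is the fixpoint form of termination.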
20
Enough about what
• We need to examine “how” to implement iterations
executed in parallel and out-of-order
using event-time progress
window
computation
21
How Event-Time Progress Works
T
t
progress metric
record
• Every record carries a timestamp t
• t ≥ T
• T can only increment (monotonicity)
T: indication of a low event-time bound
stream
task
22
How Event-Time Progress Works
• low watermarks propagate
progress metrics along the
computational graph
• progress metric = minimum
watermark across channels
• works correctly as long as
monotonicity is maintained
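The rule "progress metric = minimum watermark across channels" can be sketched as a small tracker. This is a simplified illustration, not Flink's internal implementation; the class name `WatermarkTracker` and the channel names are assumptions.

```python
class WatermarkTracker:
    """Tracks the low watermark of a task with several input channels."""

    def __init__(self, channels):
        self.per_channel = {c: float("-inf") for c in channels}
        self.current = float("-inf")

    def on_watermark(self, channel, watermark):
        """Record a channel watermark.

        Returns the new task watermark if it advanced, otherwise None.
        """
        # Keep each channel monotonic: a lower value never replaces a higher one.
        self.per_channel[channel] = max(self.per_channel[channel], watermark)
        candidate = min(self.per_channel.values())
        if candidate > self.current:
            self.current = candidate
            return candidate
        return None


tracker = WatermarkTracker(["left", "right"])
emitted = [
    tracker.on_watermark("left", 10),   # right channel has not reported yet
    tracker.on_watermark("right", 7),   # minimum is now 7
    tracker.on_watermark("right", 12),  # minimum is now 10
    tracker.on_watermark("left", 5),    # regression is ignored
]
```

The task only forwards a watermark when the minimum grows, so downstream tasks never observe progress moving backwards.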
23
Intuition
Iteration Superstep progress bears similarities to Event-Time progress
1. Can be used to run computation for out-of-order streams.
2. Need for a decentralised mechanism (no coordinator/scheduler).
3. Only relevant to respective parts of the graph (i.e., sources cannot
make use of sink progress)
24
Watermarks and
Unstructured Loops
[Figure: records t1, t2, t3, t4 and watermark W=n in a cyclic dataflow]
• Low watermarking is a very powerful mechanism to measure and
propagate progress of a single progress metric.
• However it breaks when we
have arbitrary cycles
(current Flink iterations)
old watermark
25
which superstep?
which cycle?
Proposed Extensions
[Figure: Flink stack. Labels: Runtime, Core, Libraries; Distributed Dataflow Execution, DataStream Model, Stream API, StreamML, Stream, Graph, CEP, SQL, Table]
Runtime Additions (based on Flink 1.6)
• Structured Loops
• Progress Timestamps
• Cyclic Flow Control
26
Structured Loops
IN OUT
structured
loop
global scope
• Structured loops are low-level stream graph primitives.
• They can be seen as ‘Iterative Operators’
• Introduce the notion of “structured programming” for data streaming.
• Each structured loop has a scope.
• Each scope operates on its own progress metric.
• Global scope operates on event-time progress (no changes made).
R
27
Structured Loops Composition
R
IN OUT
loop scope
global scope
R’
nested loop scope
• Encourages compositionality
(arbitrary nesting)
28
Using Progress Timestamps
• Nests progress timestamps
• (e.g., last superstep pT per window Pctx)
• How do we guarantee monotonic progress metrics?
29
Structured Loop Embeddings
R
IN OUT
LIN LOUT
LH LT
nest unnest
incr
[32] [0,32] [32]
[0,32]
[1,32]
[1,32]
[31]
term
[ø,32]
User-Defined Loop DAG
[ø,32][ø,32]
• Initializes Scope’s Progress Metric for
both records and watermarks (nest)
• Triggers final output of an iterative operator
• Removes inner progress metric (unnest)
• Increments Progress Metric (incr)
• Forwards Records and Watermarks
• Evaluates and Disseminates Termination (ø)
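The nest / incr / unnest operations on progress timestamps can be sketched with tuples, following the [32] → [0,32] → [1,32] → [32] sequence shown in the figure. The function bodies and the tuple representation are assumptions made for illustration; the termination marker (ø) is not modelled here.

```python
def nest(ts):
    """Entering a loop scope: prepend an inner progress counter starting at 0."""
    return (0,) + ts


def incr(ts):
    """One trip around the loop: increment the innermost counter."""
    if len(ts) < 2:
        raise ValueError("timestamp is not inside a loop scope")
    return (ts[0] + 1,) + ts[1:]


def unnest(ts):
    """Leaving a loop scope: drop the innermost counter."""
    if len(ts) < 2:
        raise ValueError("timestamp is not inside a loop scope")
    return ts[1:]


outer = (32,)            # event-time progress in the global scope
entered = nest(outer)    # (0, 32)
looped = incr(entered)   # (1, 32)
left = unnest(looped)    # (32,)
doubly_nested = nest(nest(outer))  # (0, 0, 32) for a nested loop scope
```

Because each scope only touches the innermost counter, the outer event-time component passes through the loop unchanged.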
30
Example: Window Iteration
R
IN OUT
LIN LOUT
LH
LT
entry
step
fin
window
split
Managed State
31
Further Challenges
• Cyclic Flow Control
• avoid deadlocks
• encourage iteration completion
• State Management Integration
• one more level of indirection (mapvalue per key)
what keeps us busy
32
Cyclic Flow Control
R
IN OUT
LIN LOUT
LH
LT
union
entry
step
fin
window
split
Managed State
flow
control
R
IN
• (Consumed Buffers < Threshold): Round-Robin between IN and R
• (Consumed Buffers >= Threshold): Exclusive Ingestion of R
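The two rules above can be sketched as an input selector. This is a hypothetical illustration, not the Flink runtime: the class `LoopInputSelector` is invented for the example, and the caller is assumed to supply the current consumed-buffer count.

```python
IN, R = "IN", "R"


class LoopInputSelector:
    """Chooses which input a loop operator reads next."""

    def __init__(self, threshold):
        self.threshold = threshold
        self._last = R  # so that the first round-robin pick is IN

    def next_input(self, consumed_buffers):
        if consumed_buffers >= self.threshold:
            # At or above the threshold: read only the feedback input R.
            self._last = R
            return R
        # Below the threshold: alternate between IN and R.
        self._last = IN if self._last == R else R
        return self._last


selector = LoopInputSelector(threshold=3)
picks = [selector.next_input(n) for n in (0, 1, 2, 3, 4, 1, 0)]
```

Reading only R under pressure lets records already in the loop complete before new input is admitted, which is in line with the stated goals of avoiding deadlocks and encouraging iteration completion.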
33
FLIP-15
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=66853132
• Introduces Scoping and Nesting (for Structured Loops)
• Addresses Flow Control Alternatives
• Discusses Computation-agnostic Loop Termination
34
Research Team
and Sponsors
• Marius Melzer (TUDresden)
• Asterios Katsifodimos (TUDelft)
• Vasiliki Kalavri (ETH)
• Seif Haridi (KTH)
• Pramod Bhatotia (Un.Edinburgh)
35
References
• Scalable and Reliable Data Stream Processing - Paris Carbone (2018)
• Inflight Progress Tracking for Stream Processing Systems with Cyclic Dataflows - Marius Melzer (2017)
• Naiad: a timely dataflow system - Murray, McSherry et al. (2013)
• The dataflow model: a practical approach to balancing correctness, latency, and cost in massive-scale, unbounded, out-of-order data processing - Tyler Akidau et al. (2015)
• Out-of-order processing: a new architecture for high-performance stream systems - Li, Maier et al. (2008)
• Flexible time management in data stream systems - Srivastava, Widom (2004)
36

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...
Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...
Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...
 
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...
 
Bay Area Apache Flink Meetup Community Update August 2015
Bay Area Apache Flink Meetup Community Update August 2015Bay Area Apache Flink Meetup Community Update August 2015
Bay Area Apache Flink Meetup Community Update August 2015
 
Virtual Flink Forward 2020: How Streaming Helps Your Staging Environment and ...
Virtual Flink Forward 2020: How Streaming Helps Your Staging Environment and ...Virtual Flink Forward 2020: How Streaming Helps Your Staging Environment and ...
Virtual Flink Forward 2020: How Streaming Helps Your Staging Environment and ...
 
Circonus: Design failures - A Case Study
Circonus: Design failures - A Case StudyCirconus: Design failures - A Case Study
Circonus: Design failures - A Case Study
 
Cyber Threat Ranking using READ
Cyber Threat Ranking using READCyber Threat Ranking using READ
Cyber Threat Ranking using READ
 
Flink Streaming
Flink StreamingFlink Streaming
Flink Streaming
 
Flink Forward Berlin 2017: Mihail Vieru - A Materialization Engine for Data I...
Flink Forward Berlin 2017: Mihail Vieru - A Materialization Engine for Data I...Flink Forward Berlin 2017: Mihail Vieru - A Materialization Engine for Data I...
Flink Forward Berlin 2017: Mihail Vieru - A Materialization Engine for Data I...
 
Using OPC-UA to Extract IIoT Time Series Data from PLC and SCADA Systems
Using OPC-UA to Extract IIoT Time Series Data from PLC and SCADA SystemsUsing OPC-UA to Extract IIoT Time Series Data from PLC and SCADA Systems
Using OPC-UA to Extract IIoT Time Series Data from PLC and SCADA Systems
 
How Robinhood Built a Real-Time Anomaly Detection System to Monitor and Mitig...
How Robinhood Built a Real-Time Anomaly Detection System to Monitor and Mitig...How Robinhood Built a Real-Time Anomaly Detection System to Monitor and Mitig...
How Robinhood Built a Real-Time Anomaly Detection System to Monitor and Mitig...
 
Scalable Online Analytics for Monitoring
Scalable Online Analytics for MonitoringScalable Online Analytics for Monitoring
Scalable Online Analytics for Monitoring
 
DSD-INT 2020 Simulation with RTC
DSD-INT 2020 Simulation with RTCDSD-INT 2020 Simulation with RTC
DSD-INT 2020 Simulation with RTC
 
Event Driven Architecture: Mistakes, I've made a few...
Event Driven Architecture: Mistakes, I've made a few...Event Driven Architecture: Mistakes, I've made a few...
Event Driven Architecture: Mistakes, I've made a few...
 
Introduction to Apache Apex by Thomas Weise
Introduction to Apache Apex by Thomas WeiseIntroduction to Apache Apex by Thomas Weise
Introduction to Apache Apex by Thomas Weise
 
Tuning Flink For Robustness And Performance
Tuning Flink For Robustness And PerformanceTuning Flink For Robustness And Performance
Tuning Flink For Robustness And Performance
 
Flink Forward Berlin 2018: Xingcan Cui - "Stream Join in Flink: from Discrete...
Flink Forward Berlin 2018: Xingcan Cui - "Stream Join in Flink: from Discrete...Flink Forward Berlin 2018: Xingcan Cui - "Stream Join in Flink: from Discrete...
Flink Forward Berlin 2018: Xingcan Cui - "Stream Join in Flink: from Discrete...
 
DSD-INT 2018 Delft3D Flexible Mesh status and features - Kleczek
DSD-INT 2018 Delft3D Flexible Mesh status and features - KleczekDSD-INT 2018 Delft3D Flexible Mesh status and features - Kleczek
DSD-INT 2018 Delft3D Flexible Mesh status and features - Kleczek
 
Data streaming fundamentals
Data streaming fundamentalsData streaming fundamentals
Data streaming fundamentals
 
Transform Your Telecom Operations with Graph Technologies
Transform Your Telecom Operations with Graph TechnologiesTransform Your Telecom Operations with Graph Technologies
Transform Your Telecom Operations with Graph Technologies
 
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
 

Ähnlich wie Stream Loops on Flink - Reinventing the wheel for the streaming era

Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Flink Forward
 
VL/HCC 2014 - A Longitudinal Study of Programmers' Backtracking
VL/HCC 2014 - A Longitudinal Study of Programmers' BacktrackingVL/HCC 2014 - A Longitudinal Study of Programmers' Backtracking
VL/HCC 2014 - A Longitudinal Study of Programmers' Backtracking
YoungSeok Yoon
 

Ähnlich wie Stream Loops on Flink - Reinventing the wheel for the streaming era (20)

An Introduction to Distributed Data Streaming
An Introduction to Distributed Data StreamingAn Introduction to Distributed Data Streaming
An Introduction to Distributed Data Streaming
 
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
 
Journey into Reactive Streams and Akka Streams
Journey into Reactive Streams and Akka StreamsJourney into Reactive Streams and Akka Streams
Journey into Reactive Streams and Akka Streams
 
Zurich Flink Meetup
Zurich Flink MeetupZurich Flink Meetup
Zurich Flink Meetup
 
SYNCHRONIZATION
SYNCHRONIZATIONSYNCHRONIZATION
SYNCHRONIZATION
 
Data Stream Analytics - Why they are important
Data Stream Analytics - Why they are importantData Stream Analytics - Why they are important
Data Stream Analytics - Why they are important
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
 
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
 
Apache Flink: Better, Faster & Uncut - Piotr Nowojski, data Artisans
Apache Flink: Better, Faster & Uncut - Piotr Nowojski, data ArtisansApache Flink: Better, Faster & Uncut - Piotr Nowojski, data Artisans
Apache Flink: Better, Faster & Uncut - Piotr Nowojski, data Artisans
 
The Power of Distributed Snapshots in Apache Flink
The Power of Distributed Snapshots in Apache FlinkThe Power of Distributed Snapshots in Apache Flink
The Power of Distributed Snapshots in Apache Flink
 
Time Series With OrientDB - Fosdem 2015
Time Series With OrientDB - Fosdem 2015Time Series With OrientDB - Fosdem 2015
Time Series With OrientDB - Fosdem 2015
 
Tutorial: The Role of Event-Time Analysis Order in Data Streaming
Tutorial: The Role of Event-Time Analysis Order in Data StreamingTutorial: The Role of Event-Time Analysis Order in Data Streaming
Tutorial: The Role of Event-Time Analysis Order in Data Streaming
 
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache ApexHadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
 
CS304PC:Computer Organization and Architecture Session 33 demo 1 ppt.pdf
CS304PC:Computer Organization and Architecture  Session 33 demo 1 ppt.pdfCS304PC:Computer Organization and Architecture  Session 33 demo 1 ppt.pdf
CS304PC:Computer Organization and Architecture Session 33 demo 1 ppt.pdf
 
Parallel Processing Techniques Pipelining
Parallel Processing Techniques PipeliningParallel Processing Techniques Pipelining
Parallel Processing Techniques Pipelining
 
Next Gen Big Data Analytics with Apache Apex
Next Gen Big Data Analytics with Apache Apex Next Gen Big Data Analytics with Apache Apex
Next Gen Big Data Analytics with Apache Apex
 
Computer_Architecture_3rd_Edition_by_Moris_Mano_Ch_09.ppt
Computer_Architecture_3rd_Edition_by_Moris_Mano_Ch_09.pptComputer_Architecture_3rd_Edition_by_Moris_Mano_Ch_09.ppt
Computer_Architecture_3rd_Edition_by_Moris_Mano_Ch_09.ppt
 
Unit 3-pipelining &amp; vector processing
Unit 3-pipelining &amp; vector processingUnit 3-pipelining &amp; vector processing
Unit 3-pipelining &amp; vector processing
 
VL/HCC 2014 - A Longitudinal Study of Programmers' Backtracking
VL/HCC 2014 - A Longitudinal Study of Programmers' BacktrackingVL/HCC 2014 - A Longitudinal Study of Programmers' Backtracking
VL/HCC 2014 - A Longitudinal Study of Programmers' Backtracking
 

Mehr von Paris Carbone

Aggregate Sharing for User-Define Data Stream Windows
Aggregate Sharing for User-Define Data Stream WindowsAggregate Sharing for User-Define Data Stream Windows
Aggregate Sharing for User-Define Data Stream Windows
Paris Carbone
 

Mehr von Paris Carbone (10)

Asynchronous Epoch Commits for Fast and Reliable Data Stream Execution in Apa...
Asynchronous Epoch Commits for Fast and Reliable Data Stream Execution in Apa...Asynchronous Epoch Commits for Fast and Reliable Data Stream Execution in Apa...
Asynchronous Epoch Commits for Fast and Reliable Data Stream Execution in Apa...
 
A Future Look of Data Stream Processing as an Architecture for AI
A Future Look of Data Stream Processing as an Architecture for AIA Future Look of Data Stream Processing as an Architecture for AI
A Future Look of Data Stream Processing as an Architecture for AI
 
Continuous Deep Analytics
Continuous Deep AnalyticsContinuous Deep Analytics
Continuous Deep Analytics
 
State Management in Apache Flink : Consistent Stateful Distributed Stream Pro...
State Management in Apache Flink : Consistent Stateful Distributed Stream Pro...State Management in Apache Flink : Consistent Stateful Distributed Stream Pro...
State Management in Apache Flink : Consistent Stateful Distributed Stream Pro...
 
Reintroducing the Stream Processor: A universal tool for continuous data anal...
Reintroducing the Stream Processor: A universal tool for continuous data anal...Reintroducing the Stream Processor: A universal tool for continuous data anal...
Reintroducing the Stream Processor: A universal tool for continuous data anal...
 
Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action
Not Less, Not More: Exactly Once, Large-Scale Stream Processing in ActionNot Less, Not More: Exactly Once, Large-Scale Stream Processing in Action
Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action
 
Graph Stream Processing : spinning fast, large scale, complex analytics
Graph Stream Processing : spinning fast, large scale, complex analyticsGraph Stream Processing : spinning fast, large scale, complex analytics
Graph Stream Processing : spinning fast, large scale, complex analytics
 
Single-Pass Graph Stream Analytics with Apache Flink
Single-Pass Graph Stream Analytics with Apache FlinkSingle-Pass Graph Stream Analytics with Apache Flink
Single-Pass Graph Stream Analytics with Apache Flink
 
Aggregate Sharing for User-Define Data Stream Windows
Aggregate Sharing for User-Define Data Stream WindowsAggregate Sharing for User-Define Data Stream Windows
Aggregate Sharing for User-Define Data Stream Windows
 
Tech Talk @ Google on Flink Fault Tolerance and HA
Tech Talk @ Google on Flink Fault Tolerance and HATech Talk @ Google on Flink Fault Tolerance and HA
Tech Talk @ Google on Flink Fault Tolerance and HA
 

Kürzlich hochgeladen

Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
nirzagarg
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
gajnagarg
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
wsppdmt
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Bertram Ludäscher
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
gajnagarg
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
gajnagarg
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
nirzagarg
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Klinik kandungan
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
ranjankumarbehera14
 

Kürzlich hochgeladen (20)

Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbers
 
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 

Stream Loops on Flink - Reinventing the wheel for the streaming era

  • 1. Stream Loops on Flink Reinventing the wheel for the streaming era Paris Carbone @FF2018-Berlin Systems Researcher@KTH/SICS <parisc@kth.se> Committer@Flink <senorcarbone@apache.org> 1
  • 2. Stream Compute Landmarks A Timeline Sliding Window Semantics Task Parallel Execution Arbitrary Computation Nesting Out-of-order Processing (watermarks) State Management -Fault Tolerance -Reconfiguration 2
  • 3. Has Streaming Subsumed Batch? short answer: NO long answer: not YET • Staged Processing • Allows Bulk/Delta Iterative Computation multi-pass Flink DataSet • Continuous Processing • Allows correct Acyclic Computation Flink DataStream single-pass ? 3
  • 4. Why Iterations Matter • Iterations are fundamental building blocks for: • Graph Analysis (PageRank, Conn.Comp., SSSP etc.) • Machine Learning (Gradient Descent, PCA etc.) • Transactions (e.g., optimistic concurrency control) Iterative Processing Primitives are currently not present in todays’ production-grade stream processing systems. 4
  • 5. Current Challenges • Data is born and evolves continuously as a data stream (e.g., user interactions, sensor events, server logs) 5
  • 6. Current Challenges Day 1 Day 2 Day 3 Day 4 • To run iterative analysis users have manually do the… • bucketing • laundry (schedule jobs) • cleaning (integration) 6
  • 7. Current Challenges • To run iterative analysis users have manually do the… • bucketing • laundry (iterative job scheduling) • cleaning (integration) Day 1 Day 2 Day 3 Day 4 7
  • 8. Current Challenges • To run iterative analysis users have manually do the… • bucketing • laundry (iterative job scheduling) • sorting (model integration) Day 1 Day 2 Day 3 Day 4 8
  • 9. Current Challenges • To run iterative analysis users have manually do the… • bucketing • laundry (iterative job scheduling) • sorting (model integration) Day 1 Day 2 Day 3 Day 4 9
  • 10. A Partially Solved Problem • Most challenges mentioned are solved and automated for single-pass aggregation via stream windowing. How would a distributed iterative computation look like as a special type of window aggregation? window computation windows are a natural fit for declaring iterative computation 10
  • 11. Anatomy of Iterative Computation Termination (fixpoint/fixed) Loop Step Function (computation) Final Result-Solution Solution (finite state) 11
  • 12. Programming Model Extensions Distributed Dataflow Execution DataStream Model StreamML Stream Libraries Core Stream API Runtime Graph CEP SQL Table • Multi-Pass Window Aggregations • Synchrony and Termination • Pregel on Windows Proposed Additions based on Flink 1.6 12
  • 13. Window Iterations An extension of existing window aggregation to multi-pass aggregates 13
  • 20. Higher-Level Abstractions • Vertex-Centric Model (part of the experimental Gelly-Streams library) • Example: single source shortest paths
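The single source shortest paths example mentioned on this slide can be sketched in a vertex-centric (Pregel-style) form, where each superstep delivers messages to vertices and a vertex propagates only when its distance improves. This is a plain in-memory sketch under stated assumptions: the function `sssp`, the adjacency-list input format, and non-negative edge weights are all assumptions for illustration and do not reflect the Gelly-Streams API.

```python
import math
from collections import defaultdict


def sssp(edges, source):
    """Vertex-centric single source shortest paths.

    edges: dict mapping a vertex to a list of (neighbor, weight) pairs.
    Assumes non-negative weights so that the message flow eventually stops.
    """
    vertices = set(edges)
    for outs in edges.values():
        vertices.update(neighbor for neighbor, _ in outs)

    dist = {v: math.inf for v in vertices}
    inbox = {source: [0.0]}  # initial message to the source vertex

    while inbox:  # terminate when a superstep produces no messages
        outbox = defaultdict(list)
        for vertex, messages in inbox.items():
            best = min(messages)
            if best < dist[vertex]:  # only improved vertices send messages
                dist[vertex] = best
                for neighbor, weight in edges.get(vertex, []):
                    outbox[neighbor].append(best + weight)
        inbox = outbox
    return dist


graph = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": []}
print(sssp(graph, "a"))
```

Each pass of the `while` loop plays the role of one superstep; the loop ends at a fixpoint, when no vertex improved and therefore no messages were sent.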
  • 21. Enough about what • We need to examine "how" to implement iterations that are executed in parallel and out-of-order, using event-time progress. (figure label: window computation)
  • 22. How Event-Time Progress Works (figure labels: stream task, record, progress metric) • T: the progress metric, an indication of a low event-time bound • Every record carries a timestamp t • t ≥ T • T can only increase (monotonicity)
  • 23. How Event-Time Progress Works • Low watermarks propagate progress metrics along the computational graph • Progress metric = minimum watermark across input channels • Works correctly as long as monotonicity is maintained
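The rule stated here (a task's progress metric is the minimum watermark across its channels, and it must never move backwards) can be sketched as follows. The class `TaskProgress` and its method names are hypothetical and serve only as an illustration; this is not Flink's internal implementation.

```python
class TaskProgress:
    """Tracks the event-time progress metric T of a task with several input channels."""

    def __init__(self, num_channels):
        self.channel_watermarks = [float("-inf")] * num_channels
        self.current = float("-inf")  # the task's progress metric T

    def on_watermark(self, channel, watermark):
        """Record a watermark from one channel.

        Returns the new progress metric if it advanced, otherwise None.
        """
        # Per-channel monotonicity: a stale watermark never lowers the channel's value.
        self.channel_watermarks[channel] = max(self.channel_watermarks[channel], watermark)
        # The task can only be as far along as its slowest channel.
        candidate = min(self.channel_watermarks)
        if candidate > self.current:
            self.current = candidate
            return candidate  # would be forwarded downstream as a new watermark
        return None


progress = TaskProgress(num_channels=2)
print(progress.on_watermark(0, 10))  # channel 1 has reported nothing yet
print(progress.on_watermark(1, 7))   # both channels reported, T advances
print(progress.on_watermark(1, 12))  # T is now bounded by channel 0
```

The sketch makes the limitation discussed later in the deck visible: it assumes an acyclic graph, because a watermark arriving over a feedback edge would be indistinguishable from one on a regular channel.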
  • 24. Intuition • Iteration superstep progress bears similarities to event-time progress: 1. Can be used to run computation for out-of-order streams. 2. Need for a decentralised mechanism (no coordinator/scheduler). 3. Only relevant to respective parts of the graph (i.e., sources cannot make use of sink progress).
  • 25. Watermarks and Unstructured Loops (figure labels: t1, t2, t3, t4, W=n, old watermark; which superstep? which cycle?) • Low watermarking is a very powerful mechanism to measure and propagate a single progress metric. • However, it breaks when we have arbitrary cycles (current Flink iterations).
  • 26. Proposed Extensions (stack diagram labels: Distributed Dataflow Execution, DataStream Model, StreamML, Stream Libraries, Core Stream API, Runtime, Graph, CEP, SQL, Table) • Runtime additions, based on Flink 1.6: • Structured Loops • Progress Timestamps • Cyclic Flow Control
  • 27. Structured Loops (figure labels: IN, OUT, R, structured loop, global scope) • Structured loops are low-level stream graph primitives. • They can be seen as "Iterative Operators". • They introduce the notion of "structured programming" for data streaming. • Each structured loop has a scope. • Each scope operates on its own progress metric. • The global scope operates on event-time progress (no changes made).
  • 28. Structured Loops Composition (figure labels: R, IN, OUT, loop scope, global scope, R’, nested loop scope) • Encourages compositionality (arbitrary nesting)
  • 29. Using Progress Timestamps • Nests progress timestamps • (e.g., last superstep pT per window Pctx) • How do we guarantee monotonic progress metrics?
  • 30. Structured Loop Embeddings (figure labels: R, IN, OUT, LIN, LOUT, LH, LT, User-Defined Loop DAG; operations: nest, unnest, incr, term; example progress timestamps: [32], [0,32], [1,32], [31], [ø,32]) • Initializes Scope’s Progress Metric for both records and watermarks (nest) • Triggers final output of an iterative operator • Removes inner progress metric (unnest) • Increments Progress Metric (incr) • Forwards Records and Watermarks • Evaluates and Disseminates Termination (ø)
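The nest, incr, and unnest operations and the termination marker ø listed on this slide can be illustrated on tuple-shaped progress timestamps. This is a sketch under assumptions: the function names mirror the slide's labels, the inner (loop) metric is placed first to match labels such as [0,32] and [1,32], and `None` stands in for ø. None of this is taken from actual Flink code.

```python
TERMINATED = None  # stand-in for the "ø" termination marker on the slide


def nest(ts):
    """Enter a loop scope: prepend a fresh inner progress metric, e.g. (32,) -> (0, 32)."""
    return (0,) + ts


def incr(ts):
    """Advance the inner progress metric by one superstep, e.g. (0, 32) -> (1, 32)."""
    if ts[0] is TERMINATED:
        raise ValueError("cannot increment a terminated scope")
    return (ts[0] + 1,) + ts[1:]


def terminate(ts):
    """Mark the inner scope as terminated, e.g. (1, 32) -> (None, 32)."""
    return (TERMINATED,) + ts[1:]


def unnest(ts):
    """Leave the loop scope: drop the inner progress metric, e.g. (None, 32) -> (32,)."""
    return ts[1:]


ts = nest((32,))      # entering the loop for outer timestamp 32
ts = incr(ts)         # after one pass through the loop body
ts = terminate(ts)    # termination evaluated and disseminated
print(unnest(ts))     # back in the outer scope
```

Because `nest` only prepends and `unnest` only drops the first element, applying `nest` twice gives a timestamp for a nested loop scope, which is one way to read the arbitrary nesting mentioned in the deck.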
  • 31. Example: Window Iteration (figure labels: R, IN, OUT, LIN, LOUT, LH, LT, entry, step, fin, window, split, Managed State)
  • 32. Further Challenges (what keeps us busy) • Cyclic Flow Control • avoid deadlocks • encourage iteration completion • State Management Integration • one more level of indirection (mapvalue per key)
  • 33. Cyclic Flow Control (figure labels: R, IN, OUT, LIN, LOUT, LH, LT, union, entry, step, fin, window, split, Managed State, flow control) • Consumed Buffers < Threshold: round-robin between IN and R • Consumed Buffers >= Threshold: exclusive ingestion of R
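The two-mode ingestion rule on this slide can be written as a small selection function. This is only a sketch: the function `next_input`, its parameters, and the way the previous choice is passed in are hypothetical, and how consumed buffers are counted in the actual runtime is not specified here.

```python
def next_input(consumed_buffers, threshold, last_input):
    """Choose which input the loop entry reads next: "IN" (new data) or "R" (feedback).

    Below the threshold the two inputs alternate (round-robin). At or above it,
    only the feedback input is read, so in-flight iterations can drain and complete.
    """
    if consumed_buffers >= threshold:
        return "R"  # exclusive ingestion of the feedback channel
    return "R" if last_input == "IN" else "IN"  # round-robin between IN and R


print(next_input(consumed_buffers=3, threshold=8, last_input="IN"))
print(next_input(consumed_buffers=3, threshold=8, last_input="R"))
print(next_input(consumed_buffers=8, threshold=8, last_input="R"))
```

Prioritising R under pressure reflects the two goals named on the previous slide: avoiding deadlocks and encouraging iteration completion.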
  • 34. FLIP-15 https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=66853132 • Introduces Scoping and Nesting (for Structured Loops) • Addresses Flow Control Alternatives • Discusses Computation-agnostic Loop Termination
  • 35. Research Team and Sponsors • Marius Melzer (TU Dresden) • Asterios Katsifodimos (TU Delft) • Vasiliki Kalavri (ETH) • Seif Haridi (KTH) • Pramod Bhatotia (Univ. of Edinburgh)
  • 37. Stream Loops on Flink: Reinventing the wheel for the streaming era • Paris Carbone @FF2018-Berlin • Systems Researcher@KTH/SICS <parisc@kth.se> • Committer@Flink <senorcarbone@apache.org>