SlideShare ist ein Scribd-Unternehmen logo
1 von 51
Migration of a realtime stats product
from Storm to Flink
Patrick Gunia
Engineering Manager @ ResearchGate
The social network gives scientists new tools
to connect, collaborate, and keep up with the
research that matters most to them.
ResearchGate is built
for scientists.
What to count –Reads
What to count –Reads
What to count – Requirements
1. Correctness
2. (Near) realtime
3. Adjustable
How we did it in the past…
producer
producer
producer
producer
event
event
Active MQ Storm Hbase
dataflow
Never change a running system….
…so why would we?
Drawbacks of the original solution
Counting Scheme
time
Event Event Event Event Event Event
Counter
A
+1 +1 +1
Counter
B
+1 +1 +1 +1
Counter
C
+1 +1 +1
• Single increments are inefficient and cause a high write load on the
database
• Roughly ~1,000 million counters are incremented per day
• Load grows linearly with platform activity (e.g. x increment operations
per publication read)
Drawbacks of the original solution
Performance / Efficiency
• At-least-once semantics in combination with the simple increment logic
can cause multiple increments per event on the same counter
• To ensure correctness, additional components are needed (e.g.
consistency checks and fix jobs)
Drawbacks of the original solution
Operational difficulties
From stateless to stateful counting
How we do it now…
dataflow
producer
producer
producer
producer
event
event
Hbase
Kafka
Flink
Kafka dispatcher
service
As this is Flink Forward…
…let’s focus on the Flink part of the story
What we want to achieve
1. Migrate the Storm implementation to Flink
2. Improve efficiency and performance
3. Improve ease of operation
From stateless to stateful counting
Baseline implementation
Simple Counting
Migrate the Storm implementation to Flink
1. Read input event from Kafka
2. FlatMap input event into (multiple) entities
3. Output results (single counter increments) to Kafka sink
4. Perform DB operation based on Kafka messages
Simple Counting
Benefit
Lines of Code: Reduced lines of code by ~70% by using the High-Level
language abstraction offered by Flink APIs
What we want to achieve
1. Migrate the Storm implementation to Flink ✔
2. Improve efficiency and performance
3. Improve ease of operation
From stateless to stateful counting
Challenge 1: Performance / Efficiency
"Statefuller" Counting
Doing it the more Flink(y) way…
1. Read input event from Kafka
2. FlatMap input event into (multiple) entities
3. Key stream based on entity key
4. Use time window to pre-aggregate
5. Output aggregates to Kafka sink
6. Perform DB operations based on Kafka messages
final DataStream<CounterDTO> output = source
// create counters based on input events
.flatMap(new MapEventToCounterKey(mapConfig))
.name(MapEventToCounterKey.class.getSimpleName())
// key the stream by minute aligned counter keys
.keyBy(new CounterMinuteKeySelector())
// and collect them for a configured window
.timeWindow(Time.milliseconds(config.getTimeWindowInMs()))
.fold(null, new MinuteWindowFold())
.name(MinuteWindowFold.class.getSimpleName())
"Statefuller" Counting
Counting Scheme
time
Event Event Event Event Event
Counter
A
+1 +1 +1
Counter
B
+1 +1 +1
"Statefuller" Counting
Counting Scheme
time
Event Event Event Event Event
Counter
A
+1 +1 +1
Counter
B
+1 +1 +1
Using windows to pre-aggregate increments
"Statefuller" Counting
Counting Scheme
Event Event Event Event Event
Counter
A
+2 +1
Counter
B
+1 +2
time
Event Event Event Event Event
Counter
A
+1 +1 +1
Counter
B
+1 +1 +1
Using windows to pre-aggregate increments
"Statefuller" Counting
Benefits
1. Easy to implement (KeySelector, TimeWindow, Fold)
2. Small resource footprint: two yarn containers with 4GB of memory
3. Window-based pre-aggregation reduced the write load on the database
from ~1,000 million increments per day to ~200 million (~80%)
What we want to achieve
1. Migrate the Storm implementation to Flink ✔
2. Improve efficiency and performance ✔
3. Improve ease of operation
From stateless to stateful counting
Challenge 2: Improve ease of operation
Idempotent counting
Counting Scheme
time
Event Event Event Event Event
Counter
A
+2 +1
Counter
B
+1 +2
Idempotent counting
Counting Scheme
time
Replace increments with PUT operations
Event Event Event Event Event
Counter
A
+2 +1
Counter
B
+1 +2
Idempotent counting
Counting Scheme
Event Event Event Event Event
Counter
A
40 41
Counter
B
12 14
time
Replace increments with PUT operations
Event Event Event Event Event
Counter
A
+2 +1
Counter
B
+1 +2
Idempotent counting
Scheme
Window
Key by Minute Key by Day
Window
Fold Fold
Windows
1st: Pre-aggregates increments
2nd: Accumulates the state for counters with day granularity
final DataStream<CounterDTO> output = source
// create counters based on input events
.flatMap(new MapEventToCounterKey(mapConfig))
.name(MapEventToCounterKey.class.getSimpleName())
// key the stream by minute aligned counter keys
.keyBy(new CounterMinuteKeySelector())
// and collect them for a configured window
.timeWindow(Time.milliseconds(config.getTimeWindowInMs()))
.fold(null, new MinuteWindowFold())
.name(MinuteWindowFold.class.getSimpleName())
// key the stream by day aligned counter keys
.keyBy(new CounterDayKeySelector())
.timeWindow(Time.minutes(config.getDayWindowTimeInMinutes()))
// trigger on each element and purge the window
.trigger(new OnElementTrigger())
.fold(null, new DayWindowFold())
.name(DayWindowFold.class.getSimpleName());
Back to our graphs...
Back to our graphs...
Back to our graphs...
All time counters
•Windows for counters with day granularity are finite and can be closed
•this enables Put operations instead of increments and thus eases operation
and increases correctness and trust
But how to handle updates of counters, that are never „closed“?
All time counters – 1st idea
1. Key the stream based on a key that identifies the „all-time“ counter
2. Update the counter state for every message in the keyed stream
3. Output the current state and perform a Put operation on the database
All time counters – 1st idea
Problems
•Creating the initial state for all existing counters is not trivial
•The key space is unlimited, thus the state will grow infinitely
Thus we‘ve come up with something else...
All time counters – 2nd idea
•Day counter updates can be used to also update the all-time counter
idempotently
• Simplified:
1. Remember what you did before
2. Revert the previous operation
3. Apply the update
All time counters – 2nd idea
Update Scheme
All-time Current Day
Counter 245 5
All time counters – 2nd idea
Update Scheme
All-time Current Day
Counter 245 5
Incoming Update for current day, new value: 7
All time counters – 2nd idea
Update Scheme
All-time Current Day
Counter 245 5
Step 1: Revert the previous operation 245 – 5 = 240
Step 2: Apply the update 240 + 7 = 247
Incoming Update for current day, new value: 7
All time counters – 2nd idea
Update Scheme
All-time Current Day
Counter 245 5
Step 1: Revert the previous operation 245 – 5 = 240
Step 2: Apply the update 240 + 7 = 247
Incoming Update for current day, new value: 7
Put the row back into the database
All time counters – 2nd idea
Update Scheme
All-time Current Day
Counter 245 5
Step 1: Revert the previous operation 245 – 5 = 240
Step 2: Apply the update 240 + 7 = 247
Incoming Update for current day, new value: 7
All-time Current Day
Counter 247 7
Put the row back into the database
Stats processing pipeline
dataflow
producer
producer
producer
producer
event
event
Hbase
Kafka
Flink
Kafka dispatcher
service
What we want to achieve
1. Migrate the Storm implementation to Flink ✔
2. Improve efficiency and performance ✔
3. Improve ease of operation ✔
How to integrate stream and batch
processing?
Batch World
There are a couple of things that might
• be unknown at processing time (e.g. bad behaving crawlers)
• change after the fact (e.g. business requirements)
Batch World
• Switching to Flink enabled us to re-use parts of the implemented
business logic in a nightly batch job
• new counters can easily be added and existing ones modified
• the described architecture allows us to re-use code and building blocks
from our infrastructure
Thank you!
Questions?
patrick.gunia@researchgate.net
Twitter: @patgunia
LinkedIn: patrickgunia

Weitere ähnliche Inhalte

Was ist angesagt?

data Artisans Product Announcement
data Artisans Product Announcementdata Artisans Product Announcement
data Artisans Product AnnouncementFlink Forward
 
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...Flink Forward
 
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache FlinkTzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache FlinkVerverica
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large ScaleVerverica
 
Aljoscha Krettek - The Future of Apache Flink
Aljoscha Krettek - The Future of Apache FlinkAljoscha Krettek - The Future of Apache Flink
Aljoscha Krettek - The Future of Apache FlinkFlink Forward
 
What's new in 1.9.0 blink planner - Kurt Young, Alibaba
What's new in 1.9.0 blink planner - Kurt Young, AlibabaWhat's new in 1.9.0 blink planner - Kurt Young, Alibaba
What's new in 1.9.0 blink planner - Kurt Young, AlibabaFlink Forward
 
2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...
2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...
2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...Ververica
 
Flink Forward SF 2017: Scott Kidder - Building a Real-Time Anomaly-Detection ...
Flink Forward SF 2017: Scott Kidder - Building a Real-Time Anomaly-Detection ...Flink Forward SF 2017: Scott Kidder - Building a Real-Time Anomaly-Detection ...
Flink Forward SF 2017: Scott Kidder - Building a Real-Time Anomaly-Detection ...Flink Forward
 
The Stream Processor as the Database - Apache Flink @ Berlin buzzwords
The Stream Processor as the Database - Apache Flink @ Berlin buzzwords   The Stream Processor as the Database - Apache Flink @ Berlin buzzwords
The Stream Processor as the Database - Apache Flink @ Berlin buzzwords Stephan Ewen
 
Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink'...
Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink'...Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink'...
Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink'...Ververica
 
Apache Flink Berlin Meetup May 2016
Apache Flink Berlin Meetup May 2016Apache Flink Berlin Meetup May 2016
Apache Flink Berlin Meetup May 2016Stephan Ewen
 
From Apache Flink® 1.3 to 1.4
From Apache Flink® 1.3 to 1.4From Apache Flink® 1.3 to 1.4
From Apache Flink® 1.3 to 1.4Till Rohrmann
 
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark WuVirtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark WuFlink Forward
 
Unify Enterprise Data Processing System Platform Level Integration of Flink a...
Unify Enterprise Data Processing System Platform Level Integration of Flink a...Unify Enterprise Data Processing System Platform Level Integration of Flink a...
Unify Enterprise Data Processing System Platform Level Integration of Flink a...Flink Forward
 
Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...
Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...
Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...Flink Forward
 
Flink Forward Berlin 2017: Maciek Próchniak - TouK Nussknacker - creating Fli...
Flink Forward Berlin 2017: Maciek Próchniak - TouK Nussknacker - creating Fli...Flink Forward Berlin 2017: Maciek Próchniak - TouK Nussknacker - creating Fli...
Flink Forward Berlin 2017: Maciek Próchniak - TouK Nussknacker - creating Fli...Flink Forward
 
Apache Flink at Strata San Jose 2016
Apache Flink at Strata San Jose 2016Apache Flink at Strata San Jose 2016
Apache Flink at Strata San Jose 2016Kostas Tzoumas
 
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...Flink Forward
 
Kostas Kloudas - Extending Flink's Streaming APIs
Kostas Kloudas - Extending Flink's Streaming APIsKostas Kloudas - Extending Flink's Streaming APIs
Kostas Kloudas - Extending Flink's Streaming APIsVerverica
 
Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink
Tran Nam-Luc – Stale Synchronous Parallel Iterations on FlinkTran Nam-Luc – Stale Synchronous Parallel Iterations on Flink
Tran Nam-Luc – Stale Synchronous Parallel Iterations on FlinkFlink Forward
 

Was ist angesagt? (20)

data Artisans Product Announcement
data Artisans Product Announcementdata Artisans Product Announcement
data Artisans Product Announcement
 
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...
 
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache FlinkTzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large Scale
 
Aljoscha Krettek - The Future of Apache Flink
Aljoscha Krettek - The Future of Apache FlinkAljoscha Krettek - The Future of Apache Flink
Aljoscha Krettek - The Future of Apache Flink
 
What's new in 1.9.0 blink planner - Kurt Young, Alibaba
What's new in 1.9.0 blink planner - Kurt Young, AlibabaWhat's new in 1.9.0 blink planner - Kurt Young, Alibaba
What's new in 1.9.0 blink planner - Kurt Young, Alibaba
 
2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...
2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...
2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...
 
Flink Forward SF 2017: Scott Kidder - Building a Real-Time Anomaly-Detection ...
Flink Forward SF 2017: Scott Kidder - Building a Real-Time Anomaly-Detection ...Flink Forward SF 2017: Scott Kidder - Building a Real-Time Anomaly-Detection ...
Flink Forward SF 2017: Scott Kidder - Building a Real-Time Anomaly-Detection ...
 
The Stream Processor as the Database - Apache Flink @ Berlin buzzwords
The Stream Processor as the Database - Apache Flink @ Berlin buzzwords   The Stream Processor as the Database - Apache Flink @ Berlin buzzwords
The Stream Processor as the Database - Apache Flink @ Berlin buzzwords
 
Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink'...
Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink'...Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink'...
Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink'...
 
Apache Flink Berlin Meetup May 2016
Apache Flink Berlin Meetup May 2016Apache Flink Berlin Meetup May 2016
Apache Flink Berlin Meetup May 2016
 
From Apache Flink® 1.3 to 1.4
From Apache Flink® 1.3 to 1.4From Apache Flink® 1.3 to 1.4
From Apache Flink® 1.3 to 1.4
 
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark WuVirtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
 
Unify Enterprise Data Processing System Platform Level Integration of Flink a...
Unify Enterprise Data Processing System Platform Level Integration of Flink a...Unify Enterprise Data Processing System Platform Level Integration of Flink a...
Unify Enterprise Data Processing System Platform Level Integration of Flink a...
 
Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...
Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...
Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...
 
Flink Forward Berlin 2017: Maciek Próchniak - TouK Nussknacker - creating Fli...
Flink Forward Berlin 2017: Maciek Próchniak - TouK Nussknacker - creating Fli...Flink Forward Berlin 2017: Maciek Próchniak - TouK Nussknacker - creating Fli...
Flink Forward Berlin 2017: Maciek Próchniak - TouK Nussknacker - creating Fli...
 
Apache Flink at Strata San Jose 2016
Apache Flink at Strata San Jose 2016Apache Flink at Strata San Jose 2016
Apache Flink at Strata San Jose 2016
 
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
 
Kostas Kloudas - Extending Flink's Streaming APIs
Kostas Kloudas - Extending Flink's Streaming APIsKostas Kloudas - Extending Flink's Streaming APIs
Kostas Kloudas - Extending Flink's Streaming APIs
 
Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink
Tran Nam-Luc – Stale Synchronous Parallel Iterations on FlinkTran Nam-Luc – Stale Synchronous Parallel Iterations on Flink
Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink
 

Ähnlich wie Flink Forward Berlin 2017: Patrick Gunia - Migration of a realtime stats product from Storm to Flink

Building Applications with Streams and Snapshots
Building Applications with Streams and SnapshotsBuilding Applications with Streams and Snapshots
Building Applications with Streams and SnapshotsJ On The Beach
 
Prometheus Everything, Observing Kubernetes in the Cloud
Prometheus Everything, Observing Kubernetes in the CloudPrometheus Everything, Observing Kubernetes in the Cloud
Prometheus Everything, Observing Kubernetes in the CloudSneha Inguva
 
Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)Stephan Ewen
 
Flink Forward SF 2017: Konstantinos Kloudas - Extending Flink’s Streaming APIs
Flink Forward SF 2017: Konstantinos Kloudas -  Extending Flink’s Streaming APIsFlink Forward SF 2017: Konstantinos Kloudas -  Extending Flink’s Streaming APIs
Flink Forward SF 2017: Konstantinos Kloudas - Extending Flink’s Streaming APIsFlink Forward
 
A Practical Approach to Building a Streaming Processing Pipeline for an Onlin...
A Practical Approach to Building a Streaming Processing Pipeline for an Onlin...A Practical Approach to Building a Streaming Processing Pipeline for an Onlin...
A Practical Approach to Building a Streaming Processing Pipeline for an Onlin...Databricks
 
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...Data Con LA
 
Apache Flink @ Tel Aviv / Herzliya Meetup
Apache Flink @ Tel Aviv / Herzliya MeetupApache Flink @ Tel Aviv / Herzliya Meetup
Apache Flink @ Tel Aviv / Herzliya MeetupRobert Metzger
 
Cf summit-2016-monitoring-cf-sensu-graphite
Cf summit-2016-monitoring-cf-sensu-graphiteCf summit-2016-monitoring-cf-sensu-graphite
Cf summit-2016-monitoring-cf-sensu-graphiteJeff Barrows
 
Client-server architecture (clientserver) is a network architecture .pdf
Client-server architecture (clientserver) is a network architecture .pdfClient-server architecture (clientserver) is a network architecture .pdf
Client-server architecture (clientserver) is a network architecture .pdffabmallkochi
 
Reactive programming every day
Reactive programming every dayReactive programming every day
Reactive programming every dayVadym Khondar
 
Scheduling Apps in the Cloud - Glenn Renfro & Roy Clarkson
Scheduling Apps in the Cloud - Glenn Renfro & Roy ClarksonScheduling Apps in the Cloud - Glenn Renfro & Roy Clarkson
Scheduling Apps in the Cloud - Glenn Renfro & Roy ClarksonVMware Tanzu
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkDataWorks Summit
 
Scheduling Apps in the Cloud - Glenn Renfro & Roy Clarkson
Scheduling Apps in the Cloud - Glenn Renfro & Roy ClarksonScheduling Apps in the Cloud - Glenn Renfro & Roy Clarkson
Scheduling Apps in the Cloud - Glenn Renfro & Roy ClarksonVMware Tanzu
 
The art of the event streaming application: streams, stream processors and sc...
The art of the event streaming application: streams, stream processors and sc...The art of the event streaming application: streams, stream processors and sc...
The art of the event streaming application: streams, stream processors and sc...confluent
 
Kafka summit SF 2019 - the art of the event-streaming app
Kafka summit SF 2019 - the art of the event-streaming appKafka summit SF 2019 - the art of the event-streaming app
Kafka summit SF 2019 - the art of the event-streaming appNeil Avery
 
Flink Forward SF 2017: Stephan Ewen - Convergence of real-time analytics and ...
Flink Forward SF 2017: Stephan Ewen - Convergence of real-time analytics and ...Flink Forward SF 2017: Stephan Ewen - Convergence of real-time analytics and ...
Flink Forward SF 2017: Stephan Ewen - Convergence of real-time analytics and ...Flink Forward
 
Cloud Dataflow - A Unified Model for Batch and Streaming Data Processing
Cloud Dataflow - A Unified Model for Batch and Streaming Data ProcessingCloud Dataflow - A Unified Model for Batch and Streaming Data Processing
Cloud Dataflow - A Unified Model for Batch and Streaming Data ProcessingDoiT International
 
Pulsar: Real-time Analytics at Scale with Kafka, Kylin and Druid
Pulsar: Real-time Analytics at Scale with Kafka, Kylin and DruidPulsar: Real-time Analytics at Scale with Kafka, Kylin and Druid
Pulsar: Real-time Analytics at Scale with Kafka, Kylin and DruidTony Ng
 
Guaranteed Event Delivery with Kafka and NodeJS | Amitesh Madhur, Nutanix
Guaranteed Event Delivery with Kafka and NodeJS | Amitesh Madhur, NutanixGuaranteed Event Delivery with Kafka and NodeJS | Amitesh Madhur, Nutanix
Guaranteed Event Delivery with Kafka and NodeJS | Amitesh Madhur, NutanixHostedbyConfluent
 
Flink Forward San Francisco 2018: David Reniz & Dahyr Vergara - "Real-time m...
Flink Forward San Francisco 2018:  David Reniz & Dahyr Vergara - "Real-time m...Flink Forward San Francisco 2018:  David Reniz & Dahyr Vergara - "Real-time m...
Flink Forward San Francisco 2018: David Reniz & Dahyr Vergara - "Real-time m...Flink Forward
 

Ähnlich wie Flink Forward Berlin 2017: Patrick Gunia - Migration of a realtime stats product from Storm to Flink (20)

Building Applications with Streams and Snapshots
Building Applications with Streams and SnapshotsBuilding Applications with Streams and Snapshots
Building Applications with Streams and Snapshots
 
Prometheus Everything, Observing Kubernetes in the Cloud
Prometheus Everything, Observing Kubernetes in the CloudPrometheus Everything, Observing Kubernetes in the Cloud
Prometheus Everything, Observing Kubernetes in the Cloud
 
Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)
 
Flink Forward SF 2017: Konstantinos Kloudas - Extending Flink’s Streaming APIs
Flink Forward SF 2017: Konstantinos Kloudas -  Extending Flink’s Streaming APIsFlink Forward SF 2017: Konstantinos Kloudas -  Extending Flink’s Streaming APIs
Flink Forward SF 2017: Konstantinos Kloudas - Extending Flink’s Streaming APIs
 
A Practical Approach to Building a Streaming Processing Pipeline for an Onlin...
A Practical Approach to Building a Streaming Processing Pipeline for an Onlin...A Practical Approach to Building a Streaming Processing Pipeline for an Onlin...
A Practical Approach to Building a Streaming Processing Pipeline for an Onlin...
 
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
 
Apache Flink @ Tel Aviv / Herzliya Meetup
Apache Flink @ Tel Aviv / Herzliya MeetupApache Flink @ Tel Aviv / Herzliya Meetup
Apache Flink @ Tel Aviv / Herzliya Meetup
 
Cf summit-2016-monitoring-cf-sensu-graphite
Cf summit-2016-monitoring-cf-sensu-graphiteCf summit-2016-monitoring-cf-sensu-graphite
Cf summit-2016-monitoring-cf-sensu-graphite
 
Client-server architecture (clientserver) is a network architecture .pdf
Client-server architecture (clientserver) is a network architecture .pdfClient-server architecture (clientserver) is a network architecture .pdf
Client-server architecture (clientserver) is a network architecture .pdf
 
Reactive programming every day
Reactive programming every dayReactive programming every day
Reactive programming every day
 
Scheduling Apps in the Cloud - Glenn Renfro & Roy Clarkson
Scheduling Apps in the Cloud - Glenn Renfro & Roy ClarksonScheduling Apps in the Cloud - Glenn Renfro & Roy Clarkson
Scheduling Apps in the Cloud - Glenn Renfro & Roy Clarkson
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
 
Scheduling Apps in the Cloud - Glenn Renfro & Roy Clarkson
Scheduling Apps in the Cloud - Glenn Renfro & Roy ClarksonScheduling Apps in the Cloud - Glenn Renfro & Roy Clarkson
Scheduling Apps in the Cloud - Glenn Renfro & Roy Clarkson
 
The art of the event streaming application: streams, stream processors and sc...
The art of the event streaming application: streams, stream processors and sc...The art of the event streaming application: streams, stream processors and sc...
The art of the event streaming application: streams, stream processors and sc...
 
Kafka summit SF 2019 - the art of the event-streaming app
Kafka summit SF 2019 - the art of the event-streaming appKafka summit SF 2019 - the art of the event-streaming app
Kafka summit SF 2019 - the art of the event-streaming app
 
Flink Forward SF 2017: Stephan Ewen - Convergence of real-time analytics and ...
Flink Forward SF 2017: Stephan Ewen - Convergence of real-time analytics and ...Flink Forward SF 2017: Stephan Ewen - Convergence of real-time analytics and ...
Flink Forward SF 2017: Stephan Ewen - Convergence of real-time analytics and ...
 
Cloud Dataflow - A Unified Model for Batch and Streaming Data Processing
Cloud Dataflow - A Unified Model for Batch and Streaming Data ProcessingCloud Dataflow - A Unified Model for Batch and Streaming Data Processing
Cloud Dataflow - A Unified Model for Batch and Streaming Data Processing
 
Pulsar: Real-time Analytics at Scale with Kafka, Kylin and Druid
Pulsar: Real-time Analytics at Scale with Kafka, Kylin and DruidPulsar: Real-time Analytics at Scale with Kafka, Kylin and Druid
Pulsar: Real-time Analytics at Scale with Kafka, Kylin and Druid
 
Guaranteed Event Delivery with Kafka and NodeJS | Amitesh Madhur, Nutanix
Guaranteed Event Delivery with Kafka and NodeJS | Amitesh Madhur, NutanixGuaranteed Event Delivery with Kafka and NodeJS | Amitesh Madhur, Nutanix
Guaranteed Event Delivery with Kafka and NodeJS | Amitesh Madhur, Nutanix
 
Flink Forward San Francisco 2018: David Reniz & Dahyr Vergara - "Real-time m...
Flink Forward San Francisco 2018:  David Reniz & Dahyr Vergara - "Real-time m...Flink Forward San Francisco 2018:  David Reniz & Dahyr Vergara - "Real-time m...
Flink Forward San Francisco 2018: David Reniz & Dahyr Vergara - "Real-time m...
 

Mehr von Flink Forward

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Flink Forward
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkFlink Forward
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...Flink Forward
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Flink Forward
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorFlink Forward
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeFlink Forward
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Flink Forward
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkFlink Forward
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxFlink Forward
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink Forward
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraFlink Forward
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkFlink Forward
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentFlink Forward
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022Flink Forward
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink Forward
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsFlink Forward
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotFlink Forward
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesFlink Forward
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Flink Forward
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergFlink Forward
 

Mehr von Flink Forward (20)

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async Sink
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easy
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial Services
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
 

Kürzlich hochgeladen

Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 

Kürzlich hochgeladen (20)

Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 

Flink Forward Berlin 2017: Patrick Gunia - Migration of a realtime stats product from Storm to Flink

  • 1. Migration of a realtime stats product from Storm to Flink Patrick Gunia Engineering Manager @ ResearchGate
  • 2. The social network gives scientists new tools to connect, collaborate, and keep up with the research that matters most to them. ResearchGate is built for scientists.
  • 3.
  • 4. What to count –Reads
  • 5. What to count –Reads
  • 6. What to count – Requirements 1. Correctness 2. (Near) realtime 3. Adjustable
  • 7. How we did it in the past… producer producer producer producer event event Active MQ Storm Hbase dataflow
  • 8. Never change a running system…. …so why would we?
  • 9. Drawbacks of the original solution Counting Scheme time Event Event Event Event Event Event Counter A +1 +1 +1 Counter B +1 +1 +1 +1 Counter C +1 +1 +1
  • 10. • Single increments are inefficient and cause a high write load on the database • Roughly ~1,000 million counters are incremented per day • Load grows linearly with platform activity (e.g. x increment operations per publication read) Drawbacks of the original solution Performance / Efficiency
  • 11. • At-least-once semantics in combination with the simple increment logic can cause multiple increments per event on the same counter • To ensure correctness, additional components are needed (e.g. consistency checks and fix jobs) Drawbacks of the original solution Operational difficulties
  • 12. From stateless to stateful counting
  • 13. How we do it now… dataflow producer producer producer producer event event Hbase Kafka Flink Kafka dispatcher service
  • 14. As this is Flink Forward… …let’s focus on the Flink part of the story
  • 15. What we want to achieve 1. Migrate the Storm implementation to Flink 2. Improve efficiency and performance 3. Improve ease of operation
  • 16. From stateless to stateful counting Baseline implementation
  • 17. Simple Counting Migrate the Storm implementation to Flink 1. Read input event from Kafka 2. FlatMap input event into (multiple) entities 3. Output results (single counter increments) to Kafka sink 4. Perform DB operation based on Kafka messages
  • 18. Simple Counting Benefit Lines of Code: Reduced lines of code by ~70% by using the High-Level language abstraction offered by Flink APIs
  • 19. What we want to achieve 1. Migrate the Storm implementation to Flink ✔ 2. Improve efficiency and performance 3. Improve ease of operation
  • 20. From stateless to stateful counting Challenge 1: Performance / Efficiency
  • 21. "Statefuller" Counting Doing it the more Flink(y) way… 1. Read input event from Kafka 2. FlatMap input event into (multiple) entities 3. Key stream based on entity key 4. Use time window to pre-aggregate 5. Output aggregates to Kafka sink 6. Perform DB operations based on Kafka messages
  • 22. final DataStream<CounterDTO> output = source // create counters based on input events .flatMap(new MapEventToCounterKey(mapConfig)) .name(MapEventToCounterKey.class.getSimpleName()) // key the stream by minute aligned counter keys .keyBy(new CounterMinuteKeySelector()) // and collect them for a configured window .timeWindow(Time.milliseconds(config.getTimeWindowInMs())) .fold(null, new MinuteWindowFold()) .name(MinuteWindowFold.class.getSimpleName())
  • 23. "Statefuller" Counting Counting Scheme time Event Event Event Event Event Counter A +1 +1 +1 Counter B +1 +1 +1
  • 24. "Statefuller" Counting Counting Scheme time Event Event Event Event Event Counter A +1 +1 +1 Counter B +1 +1 +1 Using windows to pre-aggregate increments
  • 25. "Statefuller" Counting Counting Scheme Event Event Event Event Event Counter A +2 +1 Counter B +1 +2 time Event Event Event Event Event Counter A +1 +1 +1 Counter B +1 +1 +1 Using windows to pre-aggregate increments
  • 26. "Statefuller" Counting Benefits 1. Easy to implement (KeySelector, TimeWindow, Fold) 2. Small resource footprint: two yarn containers with 4GB of memory 3. Window-based pre-aggregation reduced the write load on the database from ~1,000 million increments per day to ~200 million (~80%)
  • 27. What we want to achieve 1. Migrate the Storm implementation to Flink ✔ 2. Improve efficiency and performance ✔ 3. Improve ease of operation
  • 28. From stateless to stateful counting Challenge 2: Improve ease of operation
  • 29. Idempotent counting Counting Scheme time Event Event Event Event Event Counter A +2 +1 Counter B +1 +2
  • 30. Idempotent counting Counting Scheme time Replace increments with PUT operations Event Event Event Event Event Counter A +2 +1 Counter B +1 +2
  • 31. Idempotent counting Counting Scheme Event Event Event Event Event Counter A 40 41 Counter B 12 14 time Replace increments with PUT operations Event Event Event Event Event Counter A +2 +1 Counter B +1 +2
  • 32. Idempotent counting Scheme Window Key by Minute Key by Day Window Fold Fold Windows 1st: Pre-aggregates increments 2nd: Accumulates the state for counters with day granularity
  • 33. final DataStream<CounterDTO> output = source // create counters based on input events .flatMap(new MapEventToCounterKey(mapConfig)) .name(MapEventToCounterKey.class.getSimpleName()) // key the stream by minute aligned counter keys .keyBy(new CounterMinuteKeySelector()) // and collect them for a configured window .timeWindow(Time.milliseconds(config.getTimeWindowInMs())) .fold(null, new MinuteWindowFold()) .name(MinuteWindowFold.class.getSimpleName()) // key the stream by day aligned counter keys .keyBy(new CounterDayKeySelector()) .timeWindow(Time.minutes(config.getDayWindowTimeInMinutes())) // trigger on each element and purge the window .trigger(new OnElementTrigger()) .fold(null, new DayWindowFold()) .name(DayWindowFold.class.getSimpleName());
  • 34. Back to our graphs...
  • 35. Back to our graphs...
  • 36. Back to our graphs...
  • 37. All time counters •Windows for counters with day granularity are finite and can be closed •this enables Put operations instead of increments and thus eases operation and increases correctness and trust But how to handle updates of counters, that are never „closed“?
  • 38. All time counters – 1st idea 1. Key the stream based on a key that identifies the „all-time“ counter 2. Update the counter state for every message in the keyed stream 3. Output the current state and perform a Put operation on the database
  • 39. All time counters – 1st idea Problems •Creating the initial state for all existing counters is not trivial •The key space is unlimited, thus the state will grow infinitely Thus we‘ve come up with something else...
  • 40. All time counters – 2nd idea •Day counter updates can be used to also update the all-time counter idempotently • Simplified: 1. Remember what you did before 2. Revert the previous operation 3. Apply the update
  • 41. All time counters – 2nd idea Update Scheme All-time Current Day Counter 245 5
  • 42. All time counters – 2nd idea Update Scheme All-time Current Day Counter 245 5 Incoming Update for current day, new value: 7
  • 43. All time counters – 2nd idea Update Scheme All-time Current Day Counter 245 5 Step 1: Revert the previous operation 245 – 5 = 240 Step 2: Apply the update 240 + 7 = 247 Incoming Update for current day, new value: 7
  • 44. All time counters – 2nd idea Update Scheme All-time Current Day Counter 245 5 Step 1: Revert the previous operation 245 – 5 = 240 Step 2: Apply the update 240 + 7 = 247 Incoming Update for current day, new value: 7 Put the row back into the database
  • 45. All time counters – 2nd idea Update Scheme All-time Current Day Counter 245 5 Step 1: Revert the previous operation 245 – 5 = 240 Step 2: Apply the update 240 + 7 = 247 Incoming Update for current day, new value: 7 All-time Current Day Counter 247 7 Put the row back into the database
  • 47. What we want to achieve 1. Migrate the Storm implementation to Flink ✔ 2. Improve efficiency and performance ✔ 3. Improve ease of operation ✔
  • 48. How to integrate stream and batch processing?
  • 49. Batch World There are a couple of things that might • be unknown at processing time (e.g. bad behaving crawlers) • change after the fact (e.g. business requirements)
  • 50. Batch World • Switching to Flink enabled us to re-use parts of the implemented business logic in a nightly batch job • new counters can easily be added and existing ones modified • the described architecture allows us to re-use code and building blocks from our infrastructure

Hinweis der Redaktion

  1. Why not use a fold() instead of a Window for the day increments? The fold state is never cleaned up and would remain in the system forever! Using a window operator makes it possible to get rid of states after the window has been closed.