© 2019 Ververica 1
Apache Flink®
An Introduction and Outlook into the Future
Apache Flink, Flink®, Apache®, the squirrel logo, and the Apache feather logo are either registered trademarks or trademarks of The Apache Software Foundation.
Timo Walther
Follow me: @twalthr (yes, without the e)
About me
Timo Walther
● Apache Flink Committer and PMC Member
● Part of Flink since 2013 (before it was actually called "Flink")
● Software Engineer @ Ververica
(formerly dataArtisans, now part of Alibaba Group)
About Ververica
Original creators of Apache Flink®
Complete Stream Processing Infrastructure
Ververica Platform
This talk is about Apache Flink
● What is Flink?
● Use Cases & Users
● Stateful Stream Processing
● Event-Time Processing
● APIs
● Ecosystem
● Community
● Roadmap & Future
What is Flink?
Core Building Blocks for Stream Processing
● Event Streams: real-time and replay
● State: complex business logic
● (Event) Time: consistency with out-of-order data and late data
● Snapshots: forking / versioning / time-travel
What is Apache Flink?
Scalable embedded state
Access at memory speed &
scales with parallel operators.
What is Apache Flink?
Stateful computations over streams
real-time and historic:
fast, scalable, fault-tolerant,
event time, large state, exactly-once
Flink Unifies Stream and Batch Processing
● Processes unbounded (stream) and bounded (batch) data
● Processes recorded (offline) and live (real-time) data
● Serves most streaming & batch use cases
– Data Pipelines, Analytics, CEP, Event-driven Applications
Consistency, Scale, Ecosystem
● Flexible and expressive APIs
● Guaranteed correctness
○ Exactly-once state consistency
○ Event-time semantics
● In-memory processing at massive scale
○ Runs on tens of thousands of cores
○ Manages tens of terabytes of state
● Flexible deployments and large ecosystem
○ Kubernetes, YARN, Mesos, Docker, S3, HDFS, Kafka, Kinesis, …
Use Cases & Users
Use Case: ETL and Data Pipelining
● Periodic ETL is the traditional approach
○ External tool periodically triggers ETL batch job
○ Also supported by Flink
● Data pipelines continuously move data
○ Ingestion with low latency
○ No external tool
○ No artificial data boundaries
Use Case: Batch & Stream Analytics
● Batch analytics is great for ad-hoc queries
○ Queries change faster than data
○ Interactive analytics / prototyping
● Stream analytics continuously processes data
○ Data changes faster than queries
○ Live / low latency results
○ No Lambda architecture required!
Use Case: Event-Driven Applications
● Traditional application design
○ Compute & data tier architecture
○ React to and process events
○ State is stored in a (remote) database
● Event-driven application
○ State is maintained locally
○ Consistency is guaranteed by periodic state checkpoints
○ Tight coupling of logic and data (microservice architecture)
○ Highly scalable design
Powered By Apache Flink
Details about their use cases and more users are listed on Flink’s website at https://flink.apache.org/poweredby.html
Rapidly Growing Adoption
Source: Qubole “2018 Survey of Big Data Trends and Challenges.”
A survey among 400+ technology decision makers about their big data projects.
125%
Stateful Stream Processing
Designing Applications as Data Flows
● Data Flows are a common programming abstraction.
● Events flow from operator to operator.
● Data Flows can be executed in parallel.
[Figure: Src → Map (user function) → keyBy → Window (user function) → Snk]
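The keyBy edge in the data flow above is what lets operators run in parallel while all records with the same key still meet in the same operator instance. A minimal plain-Java sketch of that routing (no Flink dependency; the class and method names are hypothetical, and plain hashCode() modulo parallelism is a simplification, since Flink first maps keys to key groups and then key groups to subtasks):

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of keyBy(): every record is routed to one of `parallelism`
// parallel instances (subtasks) of the downstream operator by hashing its key.
// Assumption: hashCode() modulo parallelism, for illustration only.
class KeyByPartitioner {

    // Index of the subtask that receives all records with this key.
    static int targetSubtask(String key, int parallelism) {
        // floorMod keeps the index non-negative for negative hash codes
        return Math.floorMod(key.hashCode(), parallelism);
    }

    // Distributes keys over `parallelism` subtasks, preserving arrival order per subtask.
    static List<List<String>> partition(List<String> keys, int parallelism) {
        List<List<String>> subtasks = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            subtasks.add(new ArrayList<>());
        }
        for (String key : keys) {
            subtasks.get(targetSubtask(key, parallelism)).add(key);
        }
        return subtasks;
    }
}
```

Because every "alice" record lands in the same sublist, a downstream window or counter for "alice" can keep its state local to one subtask.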
What is State in a Streaming Application?
● Many functions are stateful
○ Streaming data arrives over time
○ Functions need to remember records or temporary results
● Any variable that lives across function invocations is state
● State must not be lost in case of a failure
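To make "any variable that lives across function invocations" concrete, here is a minimal plain-Java sketch (hypothetical class, not Flink code) of a function whose per-key count is state. In a Flink job this map would instead be keyed state, for example a ValueState<Long> per key, which Flink keeps locally and includes in checkpoints; the snapshot/restore methods here only hint at that.

```java
import java.util.HashMap;
import java.util.Map;

// A stateful "count per key" function. The map outlives each call to
// process(), so it is state and must survive failures.
class RunningCount {
    private final Map<String, Long> countPerKey = new HashMap<>();

    // Invoked once per incoming record; returns the updated count for its key.
    long process(String key) {
        return countPerKey.merge(key, 1L, Long::sum);
    }

    // Copy of the current state, e.g. for a checkpoint.
    Map<String, Long> snapshot() {
        return new HashMap<>(countPerKey);
    }

    // Replaces the state with a previously taken snapshot, e.g. after a failure.
    void restore(Map<String, Long> snapshot) {
        countPerKey.clear();
        countPerKey.putAll(snapshot);
    }
}
```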
Maintaining and Checkpointing State
● Flink maintains state locally per task (in-mem / on-disk)
○ Fast access!
● State is periodically checkpointed to durable storage
○ A checkpoint is a consistent snapshot of the state of all tasks
Checkpoint Consistency
● All tasks copy their state exactly when they have processed all events up to the same position in the input.
○ The state of source tasks includes the current read position in the input (e.g., Kafka offset).
[Figure: source task state (read position) → stateless task → task state (partial aggregate)]
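The figure's idea can be approximated with a toy single-channel pipeline in plain Java (hypothetical names, not Flink's implementation). The source injects a checkpoint barrier into the stream, and each task copies its state when the barrier reaches it, so the snapshotted partial aggregate always covers exactly the records before the snapshotted read position. The sketch assumes a single input channel; with several inputs, barriers additionally have to be aligned.

```java
import java.util.ArrayList;
import java.util.List;

// Toy pipeline: Source -> Map (stateless, x * 10) -> Sum (stateful).
// Records and the barrier travel in order through one channel.
class BarrierCheckpoint {

    static final class Snapshot {
        final int sourceOffset;  // source task state: read position in the input
        final long partialSum;   // aggregating task state: partial aggregate
        Snapshot(int sourceOffset, long partialSum) {
            this.sourceOffset = sourceOffset;
            this.partialSum = partialSum;
        }
    }

    // Marker element standing in for a checkpoint barrier.
    private static final Object BARRIER = new Object();

    static Snapshot run(List<Long> input, int barrierAt) {
        if (barrierAt < 0 || barrierAt > input.size()) {
            throw new IllegalArgumentException("barrier position out of range");
        }
        // Source task: emit records, injecting the barrier before record `barrierAt`.
        List<Object> channel = new ArrayList<>();
        int sourceOffset = -1;
        for (int offset = 0; offset <= input.size(); offset++) {
            if (offset == barrierAt) {
                sourceOffset = offset;  // snapshot the read position
                channel.add(BARRIER);
            }
            if (offset < input.size()) {
                channel.add(input.get(offset));
            }
        }
        // Map task (stateless) and Sum task (stateful) consume the channel in order.
        long sum = 0;
        long sumAtBarrier = 0;
        for (Object element : channel) {
            if (element == BARRIER) {
                // Map has no state and just forwards the barrier;
                // Sum snapshots its partial aggregate when the barrier arrives.
                sumAtBarrier = sum;
            } else {
                long mapped = (Long) element * 10;
                sum += mapped;
            }
        }
        return new Snapshot(sourceOffset, sumAtBarrier);
    }
}
```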
Recovery and Guaranteed Consistency
● Recovery is like loading a saved computer game.
● Flink recovers state with exactly-once consistency.
○ After a failure, the application is restarted.
○ All tasks load their state from the latest checkpoint.
○ The application continues as if the failure never happened.
[Figure: saved-game analogy with the screens "Game saved!", "GAME OVER!", and "Loading Game..."]
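The saved-game analogy can be sketched in plain Java (hypothetical names; the failure is simulated by a parameter). The job periodically checkpoints its read offset and its running sum together; on failure it reloads the latest checkpoint and replays the input from the saved offset. Because offset and state are restored as a pair, every record affects the final state exactly once, which is what exactly-once state consistency means here.

```java
import java.util.List;

// Toy model of checkpoint-based recovery for a running sum.
class RecoveryDemo {

    // failAtOffset < 0 means "no failure".
    static long run(List<Long> input, int checkpointInterval, int failAtOffset) {
        if (checkpointInterval <= 0) {
            throw new IllegalArgumentException("checkpointInterval must be positive");
        }
        int checkpointOffset = 0;  // durable copy of the source read position
        long checkpointSum = 0;    // durable copy of the operator state
        boolean failed = false;

        int offset = 0;
        long sum = 0;
        while (offset < input.size()) {
            if (!failed && offset == failAtOffset) {
                // "GAME OVER": local state is lost, reload the last saved game.
                failed = true;
                offset = checkpointOffset;
                sum = checkpointSum;
                continue;
            }
            sum += input.get(offset);
            offset++;
            if (offset % checkpointInterval == 0) {
                // "Game saved!": store offset and state together.
                checkpointOffset = offset;
                checkpointSum = sum;
            }
        }
        return sum;
    }
}
```

Whether or where the failure happens, run(...) returns the same sum as a failure-free run; only the work since the last checkpoint is repeated.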
Much More Than Just Exactly-Once Recovery!
● Suspend and resume applications
● Fix and upgrade applications
● Migrate applications to a different / upgraded cluster
● Scale applications in and out
● A/B test applications
● ...
Event-Time Processing
What is Time in a Streaming Application?
● Streaming data arrives over time.
● Many streaming computations are defined based on time.
○ “Count the number of records every 10 minutes.”
○ “Run some logic 1 hour after you saw this record.”
○ “Wait for 30 more seconds for data to arrive.”
● This raises some questions.
○ How does Flink measure time?
○ How does time relate to data?
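As an illustration of the first example, "count the number of records every 10 minutes", here is a plain-Java sketch (hypothetical names) that assigns each record to a 10-minute tumbling window based on its own timestamp, i.e. by event time. Timestamps are assumed to be epoch milliseconds; the window of a timestamp t starts at t - (t mod size).

```java
import java.util.Map;
import java.util.TreeMap;

// Counts records per tumbling window, keyed by window start (ms).
class TumblingWindowCount {
    static final long TEN_MINUTES_MS = 10 * 60 * 1000L;

    // Start of the window that contains the given timestamp.
    static long windowStart(long timestampMs, long sizeMs) {
        return timestampMs - Math.floorMod(timestampMs, sizeMs);
    }

    static Map<Long, Integer> countPerWindow(long[] timestampsMs, long sizeMs) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long ts : timestampsMs) {
            counts.merge(windowStart(ts, sizeMs), 1, Integer::sum);
        }
        return counts;
    }
}
```

Because the window is derived from the record's timestamp rather than from the arrival time, the counts are the same regardless of the order in which the records arrive.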
Event-Time and Processing-Time
[Figure: an event generator (mobile app, webserver, sensor, ...) emits events with timestamps (e.g., 11:57:12). In a processing-time job, application time is driven by the machine clock (readings such as 12:00:01, 11:59:56, 11:58:37); in an event-time job, application time is driven by the data (the event timestamp 11:57:12).]
What is Processing-Time?
● A record is processed based on the wall-clock time when it arrives.
● Results are inherently non-deterministic and depend on
○ Clocks, load, and processing speed of machines
○ Arrival / ingestion rate of data and possibly backpressure
○ ...
● Where processing-time applies
○ Does not work for recorded data
○ Does not work for data that arrives out-of-order
○ Might be sufficient for approximate, low-latency results
What is Event-Time?
● A record is processed based on an embedded timestamp.
○ The timestamp typically denotes the time when the record was created.
● The “current” time is determined by watermarks.
○ A watermark is a special record with a timestamp w.
○ It denotes that no more records with a timestamp t <= w will arrive.
● Properties of event-time processing
○ Results are deterministic
○ Same semantics when processing recorded and live data
○ Can trade result latency for result completeness
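A common way to produce watermarks is to let them trail the largest timestamp seen so far by a fixed bound. The plain-Java sketch below (hypothetical names; it mirrors the idea behind Flink's bounded-out-of-orderness watermark generation, not its exact code, and updates the watermark on every record for simplicity) tracks such a watermark and flags a record as late when its timestamp t <= w.

```java
// Watermark that trails the maximum observed timestamp by maxDelayMs.
class WatermarkTracker {
    private final long maxDelayMs;
    private long maxTimestamp = Long.MIN_VALUE;
    private long watermark = Long.MIN_VALUE;

    WatermarkTracker(long maxDelayMs) {
        this.maxDelayMs = maxDelayMs;
    }

    // Returns true if the record is late with respect to the current watermark.
    boolean onRecord(long timestampMs) {
        boolean late = timestampMs <= watermark;
        maxTimestamp = Math.max(maxTimestamp, timestampMs);
        // Watermarks never move backwards.
        watermark = Math.max(watermark, maxTimestamp - maxDelayMs);
        return late;
    }

    long currentWatermark() {
        return watermark;
    }
}
```

The delay bound is the latency/completeness trade-off from the slide: a larger maxDelayMs lets more out-of-order records count as on time, but windows can only be finalized later.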
APIs
Layered APIs
SQL & Table API
● Unified APIs for streaming data and data at rest
○ Run the same query on batch and streaming data
○ ANSI SQL: No stream-specific syntax or semantics!
○ Many common stream analytics use cases supported
SELECT
  userId,
  COUNT(*) AS cnt,
  SESSION_START(clicktime, INTERVAL '30' MINUTE) AS sessionStart
FROM clicks
GROUP BY
  SESSION(clicktime, INTERVAL '30' MINUTE),
  userId
Count clicks per user and session
(defined by 30 min. gap of inactivity).
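What the query computes can be sketched in plain Java for a bounded set of clicks (hypothetical names; input is userId mapped to click timestamps in milliseconds). Clicks of each user are ordered by event time, and a new session starts when the time since that user's previous click exceeds the gap. How a gap of exactly 30 minutes is treated is an assumption of this sketch, not a statement about Flink's session-window boundaries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Counts clicks per user and session (sessions separated by inactivity gaps).
class SessionCount {
    static final long GAP_MS = 30 * 60 * 1000L;

    static Map<String, List<Integer>> clicksPerSession(Map<String, long[]> clicksByUser, long gapMs) {
        Map<String, List<Integer>> result = new HashMap<>();
        for (Map.Entry<String, long[]> e : clicksByUser.entrySet()) {
            long[] ts = e.getValue().clone();
            Arrays.sort(ts);  // event-time order, regardless of arrival order
            List<Integer> sessions = new ArrayList<>();
            int count = 0;
            for (int i = 0; i < ts.length; i++) {
                if (i > 0 && ts[i] - ts[i - 1] > gapMs) {
                    sessions.add(count);  // inactivity gap exceeded: close the session
                    count = 0;
                }
                count++;
            }
            if (count > 0) {
                sessions.add(count);
            }
            result.put(e.getKey(), sessions);
        }
        return result;
    }
}
```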
DataStream API
● Programs are composed as data flows
● Logic is implemented as custom user functions
○ map, flatMap, reduce, window aggregation, window join, asynchronous request function, …
● Data is processed as arbitrary Java/Scala objects
○ (Avro) POJOs, Tuple, Row
DataStream API Example
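import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.assigners.EventTimeSessionWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class ClickSessions {

  // a website click (fields assumed for illustration)
  public static class Click {
    public String userId;
    public long timestamp;
  }

  // clicks: a stream of website clicks (source elided on the slide);
  // event-time timestamps and watermarks are assumed to be assigned upstream
  public static DataStream<Tuple2<String, Long>> countClicks(DataStream<Click> clicks) {
    return clicks
      // project clicks to userId and add a 1 for counting
      .map(
        // define function by implementing the MapFunction interface.
        new MapFunction<Click, Tuple2<String, Long>>() {
          @Override
          public Tuple2<String, Long> map(Click click) {
            return Tuple2.of(click.userId, 1L);
          }
        })
      // key by userId (field 0)
      .keyBy(0)
      // define session window with 30 minute gap
      .window(EventTimeSessionWindows.withGap(Time.minutes(30L)))
      // count clicks per session. Define function as lambda function.
      .reduce((a, b) -> Tuple2.of(a.f0, a.f1 + b.f1));
  }
}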
Count clicks per user and session
(defined by 30 min. gap of inactivity).
Same use case as previous SQL query.
ProcessFunctions
● Flink’s most expressive function interfaces
○ Expose access to State and Time
○ Are embedded in DataStream programs
● Enable powerful applications
○ Put events or intermediate results into state for future computations
○ Register timers to be called back once “time is up”
● A collection of multiple function interfaces
○ 1 input, 1 windowed input, 2 key-partitioned inputs, 2 broadcasted/forwarded inputs, ...
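The combination of state and timers can be imitated in plain Java. The sketch below is not Flink's ProcessFunction API (all names are hypothetical): for every record it stores the key's latest timestamp in keyed state and registers an event-time timer one timeout later ("run some logic 1 hour after you saw this record"). When the watermark passes a timer, the callback reports the key as inactive unless a newer record for that key has arrived in the meantime.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Reports keys that saw no new record for `timeoutMs` of event time.
class InactivityDetector {
    // A registered event-time timer for one key.
    private static final class Timer {
        final String key;
        final long time;
        Timer(String key, long time) {
            this.key = key;
            this.time = time;
        }
    }

    private final long timeoutMs;
    // Keyed state: latest event timestamp seen per key.
    private final Map<String, Long> lastSeen = new HashMap<>();
    // Pending timers, ordered by firing time.
    private final PriorityQueue<Timer> timers =
        new PriorityQueue<>((a, b) -> Long.compare(a.time, b.time));

    InactivityDetector(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    // Per-record callback: update state and register a timer `timeoutMs` later.
    void processElement(String key, long timestampMs) {
        lastSeen.merge(key, timestampMs, Long::max);
        timers.add(new Timer(key, timestampMs + timeoutMs));
    }

    // Advances event time and fires all due timers in time order.
    // Returns the keys reported as inactive by the timer callbacks.
    List<String> advanceWatermark(long watermarkMs) {
        List<String> inactive = new ArrayList<>();
        while (!timers.isEmpty() && timers.peek().time <= watermarkMs) {
            Timer timer = timers.poll();
            Long last = lastSeen.get(timer.key);
            // Only the timer of the key's latest record reports it; older timers are ignored.
            if (last != null && last + timeoutMs == timer.time) {
                inactive.add(timer.key);
                lastSeen.remove(timer.key);  // clear state once reported
            }
        }
        return inactive;
    }
}
```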
DSL & Libraries
● Stateful Functions
○ API to build lightweight, stateful, and strongly consistent applications.
○ Apps are composed of stateful functions that can message each other arbitrarily.
○ Contribution in progress
● DataSet API for batch processing
○ Flink is a great batch processing engine!
○ Process data in binary representation in managed memory.
● CEP Library for complex event processing
○ Detect patterns in event streams.
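To give a feel for what "detect patterns in event streams" means, here is a hand-written plain-Java stand-in for a very simple pattern (this is not the CEP library's API; names and the pattern itself are hypothetical): per user, raise an alert when a given number of failed logins occur in a row, where a successful login resets the partial match.

```java
import java.util.HashMap;
import java.util.Map;

// Detects `threshold` consecutive FAILED events per user.
class ConsecutiveFailureDetector {
    private final int threshold;
    // Partial match per key: number of consecutive failures so far.
    private final Map<String, Integer> streak = new HashMap<>();

    ConsecutiveFailureDetector(int threshold) {
        this.threshold = threshold;
    }

    // Returns true when this event completes the pattern for its user.
    boolean onEvent(String userId, boolean success) {
        if (success) {
            streak.remove(userId);  // a success breaks the sequence
            return false;
        }
        int n = streak.merge(userId, 1, Integer::sum);
        if (n >= threshold) {
            streak.remove(userId);  // start matching afresh after an alert
            return true;
        }
        return false;
    }
}
```

The CEP library lets such patterns be declared rather than hand-coded, and it also takes care of keeping partial matches in fault-tolerant state.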
Ecosystem
Framework & Library Deployments
[Figure: framework deployment vs. library deployment]
Selected Connectors
● Event logs:
○ Kafka, Kinesis, Pulsar*
● File systems:
○ S3, HDFS, NFS, MapR FS, …
● Encodings:
○ Avro, JSON, CSV, ORC, Parquet
● Databases:
○ JDBC, Hive
● Key-Value Stores:
○ Cassandra, Elasticsearch, Redis*
* Connectors available as part of other projects.
Community
Development & Releases
● Apache Flink is developed by an open source community
○ Everybody is welcome to contribute.
● Fast development pace
○ Feature releases every 3-4 months
○ Bugfix releases more frequently as needed
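[Figure: release timeline]
● 1.5.0 (05/2018); bugfix releases 1.5.1 (07/2018), 1.5.2 (07/2018), 1.5.3 (08/2018), 1.5.4 (09/2018), 1.5.5 (10/2018), 1.5.6 (12/2018)
● 1.6.0 (08/2018); bugfix releases 1.6.1 (09/2018), 1.6.2 (10/2018), 1.6.3 (12/2018), 1.6.4 (02/2019)
● 1.7.0 (11/2018); bugfix releases 1.7.1 (12/2018), 1.7.2 (02/2019)
● 1.8.0 (04/2019); bugfix releases 1.8.1 (07/2019), 1.8.2 (09/2019)
● 1.9.0 (08/2019); bugfix release 1.9.1 (10/2019)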
Growing & Active Community
● Flink’s community is very active and growing
● The community is answering many questions every day
○ In 2018, we had the most active user mailing lists of all 200+ ASF projects
○ ~4000 questions on Stack Overflow: [apache-flink], [flink-streaming], [flink-sql]
Roadmap & Future
Unified Batch and Stream Processing
● First open source system with a unified batch and stream processing engine
○ Based on a “true” streaming engine
● Porting DataSet API into DataStream API as “Bounded Streams”
● Why?
○ One engine to maintain and improve
○ One API for all use cases (incl. backfilling and state bootstrapping)
○ Competitive performance compared to best systems of each category
○ (Proving it’s possible)
SQL, Machine Learning & Notebooks
● Full-fledged Batch and Stream SQL engine
○ Full TPC-DS support
○ Batch queries with competitive performance
○ Continuous SQL queries over streaming data
● Python Table API
● Machine Learning, Data Exploration, and Notebook Support
● Integration with Hive ecosystem
API + Runtime for Stateful Applications
● Contribution of Stateful Functions API
○ Strongly consistent, stateful applications without transactional DBMS
○ Like Functions-as-a-Service + State
○ Arbitrary and reliable messaging between functions
● Unaligned Checkpoints to enable more fine-grained checkpoints
○ Faster checkpoints yield faster recovery and tighter SLAs
Summary
● Flink powers the world’s most demanding stateful streaming applications
● Scope of applications expands quickly beyond “classical streaming”
○ Batch SQL, ML, Python, interactive notebooks
○ Event-driven, stateful applications
● Large and helpful community
@VervericaData | www.ververica.com
Follow me @twalthr (yes, without the e) and grab a Flink sticker!
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2
 
tonesoftg
tonesoftgtonesoftg
tonesoftglanshi9
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastPapp Krisztián
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Bert Jan Schrijver
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in sowetomasabamasaba
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park masabamasaba
 
What Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationWhat Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationJuha-Pekka Tolvanen
 
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024VictoriaMetrics
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfonteinmasabamasaba
 
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open SourceWSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open SourceWSO2
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...masabamasaba
 

Recently uploaded (20)

%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 
tonesoftg
tonesoftgtonesoftg
tonesoftg
 
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
 
What Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationWhat Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the Situation
 
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open SourceWSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
 

Stream processing with Apache Flink (Timo Walther - Ververica)

  • 1. © 2019 Ververica 1
    Apache Flink®: An Introduction and Outlook into the Future
    Timo Walther
    Follow me: @twalthr (yes, without the e)
    Apache Flink, Flink®, Apache®, the squirrel logo, and the Apache feather logo are either registered trademarks or trademarks of The Apache Software Foundation.
  • 2. About me
    Timo Walther
      ● Apache Flink Committer and PMC Member
      ● Part of Flink since 2013 (before it was actually called "Flink")
      ● Software Engineer @ Ververica (formerly dataArtisans, now part of Alibaba Group)
  • 3. About Ververica
      ● Original creators of Apache Flink®
      ● Complete Stream Processing Infrastructure
  • 4. Ververica Platform
  • 5. This talk is about Apache Flink
      ● What is Flink?
      ● Use Cases & Users
      ● Stateful Stream Processing
      ● Event-Time Processing
      ● APIs
      ● Ecosystem
      ● Community
      ● Roadmap & Future
  • 6. What is Flink?
  • 7. Core Building Blocks for Stream Processing
      ● Event Streams: real-time and replay
      ● State: complex business logic
      ● (Event) Time: consistency with out-of-order data and late data
      ● Snapshots: forking / versioning / time-travel
  • 8. What is Apache Flink?
      ● Scalable embedded state: access at memory speed & scales with parallel operators.
  • 9. What is Apache Flink?
      ● Stateful computations over streams, real-time and historic: fast, scalable, fault tolerant, event time, large state, exactly-once
  • 10. Flink Unifies Stream and Batch Processing
      ● Processes unbounded (stream) and bounded (batch) data
      ● Processes recorded (offline) and live (real-time) data
      ● Serves most streaming & batch use cases
          ○ Data Pipelines, Analytics, CEP, Event-driven Applications
  • 11. Consistency, Scale, Ecosystem
      ● Flexible and expressive APIs
      ● Guaranteed correctness
          ○ Exactly-once state consistency
          ○ Event-time semantics
      ● In-memory processing at massive scale
          ○ Runs on 10,000s of cores
          ○ Manages 10s of TBs of state
      ● Flexible deployments and large ecosystem
          ○ Kubernetes, YARN, Mesos, Docker, S3, HDFS, Kafka, Kinesis, …
  • 12. Use Cases & Users
  • 13. Use Case: ETL and Data Pipelining
      ● Periodic ETL is the traditional approach
          ○ An external tool periodically triggers an ETL batch job
          ○ Also supported by Flink
      ● Data pipelines continuously move data
          ○ Ingestion with low latency
          ○ No external tool
          ○ No artificial data boundaries
  • 14. Use Case: Batch & Stream Analytics
      ● Batch analytics is great for ad-hoc queries
          ○ Queries change faster than data
          ○ Interactive analytics / prototyping
      ● Stream analytics continuously processes data
          ○ Data changes faster than queries
          ○ Live / low-latency results
          ○ No Lambda architecture required!
  • 15. Use Case: Event-Driven Applications
      ● Traditional application design
          ○ Compute & data tier architecture
          ○ React to and process events
          ○ State is stored in a (remote) database
      ● Event-driven application
          ○ State is maintained locally
          ○ Consistency is guaranteed by periodic state checkpoints
          ○ Tight coupling of logic and data (microservice architecture)
          ○ Highly scalable design
  • 16. Powered By Apache Flink
    Details about their use cases and more users are listed on Flink’s website at https://flink.apache.org/poweredby.html
  • 17. Rapidly Growing Adoption
    Source: Qubole, “2018 Survey of Big Data Trends and Challenges,” a survey of 400+ technology decision makers about their big data projects.
    Figure: survey chart (highlighted value: 125%)
  • 18. Stateful Stream Processing
  • 19. Designing Applications as Data Flows
      ● Data flows are a common programming abstraction.
      ● Events flow from operator to operator.
      ● Data flows can be executed in parallel.
    Figure: Src → Map (user function) → keyBy → Window (user function) → Snk
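The keyBy step in the data flow is what lets stateful operators run in parallel: every record with the same key goes to the same parallel instance, so that instance can own all state for the key. The plain-Java sketch below illustrates only that routing property. It assumes a simple hash-modulo assignment (Flink's actual scheme first maps keys to key groups), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of keyBy routing, not Flink's implementation.
// Assumption: instance = hash(key) mod parallelism. Flink first maps keys to
// key groups, but the property is the same: equal keys share an instance.
class KeyByRouting {

    // Index (0 .. parallelism-1) of the parallel instance responsible for a key.
    static int route(String key, int parallelism) {
        // floorMod keeps the result non-negative for negative hash codes.
        return Math.floorMod(key.hashCode(), parallelism);
    }

    // Distributes a sequence of record keys over `parallelism` instances.
    static List<List<String>> partition(List<String> keys, int parallelism) {
        List<List<String>> instances = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            instances.add(new ArrayList<>());
        }
        for (String key : keys) {
            instances.get(route(key, parallelism)).add(key);
        }
        return instances;
    }
}
```

Partitioning the keys alice, bob, alice, carol, bob over 4 instances puts both alice records on the same instance, and likewise both bob records, whatever their hash codes happen to be.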
  • 20. What is State in a Streaming Application?
      ● Many functions are stateful
          ○ Streaming data arrives over time
          ○ Functions need to remember records or temporary results
      ● Any variable that lives across function invocations is state
      ● State must not be lost in case of a failure
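To make "any variable that lives across function invocations" concrete, here is a minimal plain-Java sketch of a per-user click counter (hypothetical class, not a Flink API). The map is the state: if it lived only in the memory of a process that crashed, every count would be lost, which is why such state has to be checkpointed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not a Flink API: the map plays the role of keyed state
// that Flink would manage and checkpoint on the function's behalf.
class ClickCounter {

    // Survives between calls to processClick(): this is the state.
    private final Map<String, Long> countsPerUser = new HashMap<>();

    // Called once per arriving record; returns the user's running count.
    long processClick(String userId) {
        return countsPerUser.merge(userId, 1L, Long::sum);
    }
}
```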
  • 21. Maintaining and Checkpointing State
      ● Flink maintains state locally per task (in memory / on disk)
          ○ Fast access!
      ● State is periodically checkpointed to durable storage
          ○ A checkpoint is a consistent snapshot of the state of all tasks
  • 22. Checkpoint Consistency
      ● All tasks copy their state exactly when they have processed all events up to the same position in the input
          ○ The state of source tasks includes the current read position in the input (e.g., Kafka offset)
    Figure: a task with state (read position), a stateless task, and a task with state (partial aggregate)
  • 23. Recovery and Guaranteed Consistency
      ● Recovery is like loading a saved computer game.
      ● Flink recovers state with exactly-once consistency.
          ○ After a failure, the application is restarted.
          ○ All tasks load their state from the latest checkpoint.
          ○ The application continues as if the failure never happened.
    Figure: "Game saved!", "GAME OVER!", "Loading Game..."
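The checkpoint-and-recover cycle can be modeled in a few lines of plain Java. This is a toy under stated assumptions, not Flink's checkpointing algorithm: a single source reads a list (its index stands in for a Kafka offset), a single operator keeps a running sum, and each snapshot records the read position together with the partial sum taken at that same position. On a simulated failure, the operator reloads the snapshot and the source rewinds to the saved position. All names are hypothetical.

```java
import java.util.List;

// Hypothetical toy model of a consistent checkpoint, not Flink's API.
class CheckpointDemo {

    // Snapshot: source read offset + partial aggregate taken at that offset.
    static final class Snapshot {
        final int offset;
        final long partialSum;

        Snapshot(int offset, long partialSum) {
            this.offset = offset;
            this.partialSum = partialSum;
        }
    }

    // Sums `input`, checkpointing every `interval` records. If failAt >= 0,
    // the job "crashes" once `failAt` records have been consumed: the
    // in-memory sum is discarded, the latest snapshot is restored, and the
    // source replays from the snapshot's offset.
    static long run(List<Long> input, int interval, int failAt) {
        Snapshot latest = new Snapshot(0, 0L);   // initial, empty checkpoint
        int offset = 0;
        long sum = 0L;
        boolean failed = false;
        while (offset < input.size()) {
            if (!failed && offset == failAt) {
                failed = true;
                // Recovery: reload state and rewind the source together.
                offset = latest.offset;
                sum = latest.partialSum;
                continue;
            }
            sum += input.get(offset);
            offset++;
            if (offset % interval == 0) {
                // Offset and sum are captured at the same input position.
                latest = new Snapshot(offset, sum);
            }
        }
        return sum;
    }
}
```

With the input 1..10 and a checkpoint every 3 records, a failure after 7 records restores the snapshot taken after 6 records and replays the seventh record. That record is read twice but counted once, so run(input, 3, 7) returns the same 55 as the failure-free run(input, 3, -1).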
  • 24. Much More Than Just Exactly-Once Recovery!
      ● Suspend and resume applications
      ● Fix and upgrade applications
      ● Migrate applications to a different / upgraded cluster
      ● Scale applications in and out
      ● A/B test applications
      ● ...
  • 25. Event-Time Processing
  • 26. What is Time in a Streaming Application?
      ● Streaming data arrives over time.
      ● Many streaming computations are defined based on time.
          ○ “Count the number of records every 10 minutes.”
          ○ “Run some logic 1 hour after you saw this record.”
          ○ “Wait for 30 more seconds for data to arrive.”
      ● This raises some questions.
          ○ How does Flink measure time?
          ○ How does time relate to data?
  • 27. Event-Time and Processing-Time
    Figure: event generators (mobile app, webserver, sensor, ...) emit events with timestamps (e.g., 11:58:37, 11:59:56, 12:00:01). In a processing-time job, application time is driven by the machine clock; in an event-time job, application time is driven by the data.
  • 28. What is Processing-Time?
      ● A record is processed based on the wall-clock time when it arrives.
      ● Results are inherently non-deterministic and depend on
          ○ Clocks, load, and processing speed of machines
          ○ Arrival / ingestion rate of data and possibly backpressure
          ○ ...
      ● Applications of processing-time
          ○ Does not work for recorded data
          ○ Does not work for data that arrives out of order
          ○ Might be sufficient for approximate, low-latency results
  • 29. What is Event-Time?
      ● A record is processed based on an embedded timestamp.
          ○ The timestamp typically denotes the time when the record was created.
      ● The “current” time is determined by watermarks.
          ○ A watermark is a special record with a timestamp w.
          ○ It denotes that no more records with a time t <= w will arrive.
      ● Properties of event-time processing
          ○ Results are deterministic
          ○ Same semantics when processing recorded and live data
          ○ Can trade result latency for result completeness
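How watermarks turn embedded timestamps into a notion of "current" time can be sketched in plain Java as an event-time tumbling-window count. Assumptions, all mine rather than Flink's API: timestamps are in milliseconds, the watermark is the highest timestamp seen so far minus a fixed out-of-orderness bound, a window [start, start + size) fires once the watermark reaches its last timestamp start + size - 1 (matching "no more records with t <= w"), and records for a window that has already fired are dropped as late. At the end of the bounded input, the remaining windows are flushed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of watermark-driven event-time windows, not Flink's API.
class EventTimeWindowCount {

    // Returns "windowStart:count" strings in the order the windows fire.
    static List<String> count(long[] timestamps, long windowSize, long maxOutOfOrderness) {
        TreeMap<Long, Long> openWindows = new TreeMap<>();   // window start -> count
        List<String> fired = new ArrayList<>();
        long maxSeen = Long.MIN_VALUE;
        long watermark = Long.MIN_VALUE;
        for (long t : timestamps) {
            long start = t - Math.floorMod(t, windowSize);
            if (start + windowSize - 1 <= watermark) {
                continue;   // late: the watermark already passed this window
            }
            openWindows.merge(start, 1L, Long::sum);
            maxSeen = Math.max(maxSeen, t);
            watermark = maxSeen - maxOutOfOrderness;
            // Fire every window whose last timestamp is covered by the watermark.
            while (!openWindows.isEmpty()
                    && openWindows.firstKey() + windowSize - 1 <= watermark) {
                Map.Entry<Long, Long> w = openWindows.pollFirstEntry();
                fired.add(w.getKey() + ":" + w.getValue());
            }
        }
        // End of the bounded input: flush all remaining windows in time order.
        for (Map.Entry<Long, Long> w : openWindows.entrySet()) {
            fired.add(w.getKey() + ":" + w.getValue());
        }
        return fired;
    }
}
```

For timestamps 1, 4, 12, 8, 15, 3, 25, 30 with 10 ms windows and a 5 ms bound, the out-of-order record 8 still lands in window 0, while record 3 arrives after the watermark has passed 9 and is dropped, giving 0:3, 10:2, 20:1, 30:1. A larger bound (for example 7 ms) keeps record 3 at the cost of emitting window 0 later, which is the latency/completeness trade-off listed on the slide.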
  • 30. APIs
  • 31. Layered APIs
  • 32. SQL & Table API
      ● Unified APIs for streaming data and data at rest
          ○ Run the same query on batch and streaming data
          ○ ANSI SQL: No stream-specific syntax or semantics!
          ○ Many common stream analytics use cases supported
    Count clicks per user and session (defined by a 30 min. gap of inactivity):
        SELECT userId, COUNT(*) AS cnt,
               SESSION_START(clicktime, INTERVAL '30' MINUTE)
        FROM clicks
        GROUP BY SESSION(clicktime, INTERVAL '30' MINUTE), userId
  • 33. DataStream API
      ● Programs are composed as data flows
      ● Logic is implemented as custom user functions
          ○ map, flatMap, reduce, window aggregation, window join, asynchronous request function, …
      ● Data is processed as arbitrary Java/Scala objects
          ○ (Avro) POJOs, Tuple, Row
  • 34. DataStream API Example
    Count clicks per user and session (defined by a 30 min. gap of inactivity). Same use case as the previous SQL query.
        import org.apache.flink.api.common.functions.MapFunction;
        import org.apache.flink.api.java.tuple.Tuple2;
        import org.apache.flink.streaming.api.datastream.DataStream;
        import org.apache.flink.streaming.api.windowing.assigners.EventTimeSessionWindows;
        import org.apache.flink.streaming.api.windowing.time.Time;

        // a stream of website clicks; event-time windows require that
        // timestamps and watermarks are assigned to this stream upstream
        DataStream<Click> clicks = ...

        DataStream<Tuple2<String, Long>> result = clicks
          // project clicks to userId and add a 1 for counting
          .map(
            // define function by implementing the MapFunction interface.
            new MapFunction<Click, Tuple2<String, Long>>() {
              @Override
              public Tuple2<String, Long> map(Click click) {
                return Tuple2.of(click.userId, 1L);
              }
            })
          // key by userId (field 0)
          .keyBy(0)
          // define session window with 30 minute gap
          .window(EventTimeSessionWindows.withGap(Time.minutes(30L)))
          // count clicks per session. Define function as lambda function.
          .reduce((a, b) -> Tuple2.of(a.f0, a.f1 + b.f1));
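The session logic shared by the SQL query and the DataStream program can be spelled out in plain Java, which may help when checking either version's results by hand. This is a hypothetical sketch, not a Flink API. It assumes a session ends when a user's next click comes more than the gap after the previous one, and, unlike a streaming engine, it sorts each user's clicks up front instead of merging windows incrementally as data arrives.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical batch-style sketch of session-window semantics, not Flink's API.
class SessionCounts {

    static final class Click {
        final String userId;
        final long timestamp;   // event time in milliseconds

        Click(String userId, long timestamp) {
            this.userId = userId;
            this.timestamp = timestamp;
        }
    }

    // Returns userId -> click counts, one entry per session, in time order.
    static Map<String, List<Integer>> count(List<Click> clicks, long gapMillis) {
        // Group event timestamps by user (the equivalent of keying by userId).
        Map<String, List<Long>> byUser = new TreeMap<>();
        for (Click c : clicks) {
            byUser.computeIfAbsent(c.userId, k -> new ArrayList<>()).add(c.timestamp);
        }
        Map<String, List<Integer>> result = new TreeMap<>();
        for (Map.Entry<String, List<Long>> entry : byUser.entrySet()) {
            List<Long> ts = entry.getValue();
            Collections.sort(ts);   // arrival order does not matter here
            List<Integer> sessions = new ArrayList<>();
            int current = 1;        // the first click opens the first session
            for (int i = 1; i < ts.size(); i++) {
                if (ts.get(i) - ts.get(i - 1) > gapMillis) {
                    sessions.add(current);   // inactivity gap exceeded: close session
                    current = 0;
                }
                current++;
            }
            sessions.add(current);  // close the last open session
            result.put(entry.getKey(), sessions);
        }
        return result;
    }
}
```

Called with clicks for alice at 0, 10, 25, 70, and 80 minutes (in any order) and a 30-minute gap, it yields two sessions for alice with 3 and 2 clicks, because the 45-minute pause exceeds the gap. A streaming engine cannot sort the whole input first; it relies on watermarks to decide when a session can no longer grow.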
  • 35. ProcessFunctions
      ● Flink’s most expressive function interfaces
          ○ Expose access to State and Time
          ○ Are embedded in DataStream programs
      ● Enable powerful applications
          ○ Put events or intermediate results into state for future computations
          ○ Register timers to be called back once “time is up”
      ● A collection of multiple function interfaces
          ○ 1 input, 1 windowed input, 2 key-partitioned inputs, 2 broadcasted/forwarded inputs, ...
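The “run some logic 1 hour after you saw this record” example from the time slides is the kind of logic that needs both state and timers. The plain-Java simulation below keeps the last-seen timestamp per key, registers a timer for every event, and reports a key as inactive only when the timer of its most recent event fires. The class and method names are hypothetical; they echo the idea of a per-record callback plus a timer callback but are not Flink's ProcessFunction API, and time is advanced explicitly by the caller, standing in for watermarks or the machine clock.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical simulation of the state + timer pattern, not Flink's API:
// report a key as inactive if no new event arrives within `timeout` ms.
class InactivityDetector {

    private static final class PendingTimer {
        final long time;
        final String key;

        PendingTimer(long time, String key) {
            this.time = time;
            this.key = key;
        }
    }

    private final long timeout;
    private final Map<String, Long> lastSeen = new HashMap<>();   // keyed state
    private final PriorityQueue<PendingTimer> timers =
            new PriorityQueue<>(Comparator.comparingLong((PendingTimer t) -> t.time));
    private final List<String> output = new ArrayList<>();

    InactivityDetector(long timeout) {
        this.timeout = timeout;
    }

    // Per-record callback: update the key's state and register a timer.
    void processElement(String key, long timestamp) {
        lastSeen.put(key, timestamp);
        timers.add(new PendingTimer(timestamp + timeout, key));
    }

    // Advances time to `now` and fires every timer that is due, oldest first.
    void advanceTime(long now) {
        while (!timers.isEmpty() && timers.peek().time <= now) {
            PendingTimer t = timers.poll();
            Long seen = lastSeen.get(t.key);
            // A timer is stale if the key saw a newer event after registering it.
            if (seen != null && seen + timeout == t.time) {
                output.add(t.key + " inactive since " + seen);
                lastSeen.remove(t.key);   // clean up state once reported
            }
        }
    }

    List<String> output() {
        return output;
    }
}
```

With a 60 ms timeout and events a@0, b@10, a@50, advancing to 65 fires a's first timer, which is ignored as stale because a was active again at 50; advancing to 120 then reports b (inactive since 10) followed by a (inactive since 50).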
  • 36. DSL & Libraries
      ● Stateful Functions
          ○ API to build lightweight, stateful, and strongly consistent applications
          ○ Apps are composed of stateful functions that can send arbitrary messages to each other
          ○ Contribution in progress
      ● DataSet API for batch processing
          ○ Flink is a great batch processing engine!
          ○ Processes data in a binary representation in managed memory
      ● CEP library for complex event processing
          ○ Detect patterns in event streams
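Detecting a pattern in an event stream boils down to keeping a small piece of state per key and advancing it with every event. The hand-written plain-Java state machine below detects one hypothetical pattern, three consecutive "FAIL" events for the same user, where any other outcome resets the count. It is an illustration only; the CEP library lets such patterns be declared instead of coded by hand, and none of these names come from its API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical hand-written pattern detector, not Flink's CEP API.
// Pattern: three consecutive "FAIL" events for the same user.
class RepeatedFailureDetector {

    private static final int THRESHOLD = 3;

    // Per-user progress through the pattern (keyed state).
    private final Map<String, Integer> consecutiveFailures = new HashMap<>();

    // Returns true when this event completes the pattern for its user.
    boolean onEvent(String userId, String outcome) {
        if (!"FAIL".equals(outcome)) {
            consecutiveFailures.remove(userId);   // pattern broken: reset
            return false;
        }
        int n = consecutiveFailures.merge(userId, 1, Integer::sum);
        if (n == THRESHOLD) {
            consecutiveFailures.remove(userId);   // start over after a match
            return true;
        }
        return false;
    }
}
```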
  • 37. Ecosystem
  • 38. Framework & Library Deployments
    Figure: framework deployment vs. library deployment
  • 39. Selected Connectors
      ● Event logs: Kafka, Kinesis, Pulsar*
      ● File systems: S3, HDFS, NFS, MapR FS, …
      ● Encodings: Avro, JSON, CSV, ORC, Parquet
      ● Databases: JDBC, Hive
      ● Key-value stores: Cassandra, Elasticsearch, Redis*
    * Connectors available as part of other projects.
  • 40. Community
  • 41. Development & Releases
      ● Apache Flink is developed by an open source community
          ○ Everybody is welcome to contribute.
      ● Fast development pace
          ○ Feature releases every 3-4 months
          ○ Bugfix releases more frequently as needed
    Release timeline (version: month/year):
      ● 1.5.0: 05/2018; bugfix releases 1.5.1: 07/2018, 1.5.2: 07/2018, 1.5.3: 08/2018, 1.5.4: 09/2018, 1.5.5: 10/2018, 1.5.6: 12/2018
      ● 1.6.0: 08/2018; bugfix releases 1.6.1: 09/2018, 1.6.2: 10/2018, 1.6.3: 12/2018, 1.6.4: 02/2019
      ● 1.7.0: 11/2018; bugfix releases 1.7.1: 12/2018, 1.7.2: 02/2019
      ● 1.8.0: 04/2019; bugfix releases 1.8.1: 07/2019, 1.8.2: 09/2019
      ● 1.9.0: 08/2019; bugfix release 1.9.1: 10/2019
  • 42. Growing & Active Community
      ● Flink’s community is very active and growing
      ● The community is answering many questions every day
          ○ In 2018, we had the most active user mailing lists of all 200+ ASF projects
          ○ ~4000 questions on Stack Overflow: [apache-flink], [flink-streaming], [flink-sql]
  • 43. Roadmap & Future
  • 44. Unified Batch and Stream Processing
      ● First open-source system with a unified batch and stream processing engine
          ○ Based on a “true” streaming engine
      ● Porting the DataSet API into the DataStream API as “Bounded Streams”
      ● Why?
          ○ One engine to maintain and improve
          ○ One API for all use cases (incl. backfilling and state bootstrapping)
          ○ Competitive performance compared to the best systems of each category
          ○ (Proving it’s possible)
  • 45. SQL, Machine Learning & Notebooks
      ● Full-fledged batch and stream SQL engine
          ○ Full TPC-DS support
          ○ Batch queries with competitive performance
          ○ Continuous SQL queries over streaming data
      ● Python Table API
      ● Machine learning, data exploration, and notebook support
      ● Integration with the Hive ecosystem
  • 46. API + Runtime for Stateful Applications
      ● Contribution of the Stateful Functions API
          ○ Strongly consistent, stateful applications without a transactional DBMS
          ○ Like Functions-as-a-Service + State
          ○ Arbitrary and reliable messaging between functions
      ● Unaligned checkpoints to enable more fine-grained checkpoints
          ○ Faster checkpoints yield faster recovery and tighter SLAs
  • 47. Summary
      ● Flink powers the world’s most demanding stateful streaming applications
      ● The scope of applications is quickly expanding beyond “classical streaming”
          ○ Batch SQL, ML, Python, interactive notebooks
          ○ Event-driven, stateful applications
      ● Large and helpful community
  • 48. @VervericaData, www.ververica.com
    Follow me @twalthr (yes, without the e) and grab a Flink sticker!