Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics
Thursday 17th
from 18:00 to 18:40
Theatre 19
Keynote
In this talk I’ll give a very short introduction to stream processing in general and then dive into event-time based stream processing. I will outline why this is important for IoT applications and also why it is such a challenging topic. Afterwards we’ll look at some real-world IoT use cases that are enabled by the robust support for event-time based stream processing in Apache Flink™. We will especially focus on ease of use and on the correctness of results in the face of errors.
In the first half of the talk we’ll cover the basics of stream processing. We will look at the differences between event-time and processing-time based processing, and at stateful stream processing. Along the way, we’ll also highlight why the combination of these features is essential for robust stream processing in an IoT setting.
In the second part, we will look at how Flink solves some of the challenges that arise in event-time based processing and how that enables novel applications in the IoT space. We will do the latter by looking at a collection of real-world IoT use cases.
Some of the topics covered will be:
- Apache Flink
- Stateful Stream Processing
- Event Time vs. Processing Time Windowing
- Processing of out-of-order events
- IoT use cases
5. Big Data Architecture
Collect events in HDFS (or similar)
Periodically run (batch) jobs to process them
Problems:
• Huge latency
• Natural boundaries in data don’t match batch boundaries
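To make the second problem concrete, here is a small sketch in plain Python (not a Flink API). It assumes that a user session ends after 10 minutes of inactivity and that each batch job covers one hour; the click timestamps are made up. Running the same session logic per batch splits one natural session in two at the batch boundary.

```python
from typing import Dict, List, Tuple

SESSION_GAP = 10  # assumed: minutes of inactivity that end a session
BATCH_SIZE = 60   # assumed: each batch job covers one hour

def sessions(timestamps: List[int], gap: int = SESSION_GAP) -> List[Tuple[int, int]]:
    """Group event timestamps (in minutes) into (start, end) sessions."""
    result: List[Tuple[int, int]] = []
    for ts in sorted(timestamps):
        if result and ts - result[-1][1] <= gap:
            result[-1] = (result[-1][0], ts)  # extend the current session
        else:
            result.append((ts, ts))           # start a new session
    return result

def batch_sessions(timestamps: List[int], batch_size: int = BATCH_SIZE) -> List[Tuple[int, int]]:
    """Apply the same session logic separately inside each fixed-size batch."""
    batches: Dict[int, List[int]] = {}
    for ts in timestamps:
        batches.setdefault(ts // batch_size, []).append(ts)
    out: List[Tuple[int, int]] = []
    for batch in sorted(batches):
        out.extend(sessions(batches[batch]))
    return out

# One user is active from minute 50 to minute 70, across the batch boundary at minute 60.
clicks = [50, 55, 58, 62, 66, 70]
print(sessions(clicks))        # [(50, 70)]
print(batch_sessions(clicks))  # [(50, 58), (62, 70)]
```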
6. Rethinking Data Architecture
Real-time reaction to events
Continuous applications
Process both real-time and historical data
7. What is (Distributed) Streaming
Streaming: computations on never-ending “streams” of data records (“events”)
Distributed: computation spread across many machines
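One common way to spread a stream computation across machines is to partition events by key, so that every event with the same key reaches the same parallel instance of your code. The sketch below is a simplified illustration in plain Python: the parallelism of 3, the sensor events and the use of CRC32 as the hash function are all assumptions, and Flink’s own key partitioning scheme differs in its details.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

PARALLELISM = 3  # assumed number of parallel instances of "your code"

def instance_for(key: str, parallelism: int = PARALLELISM) -> int:
    """Map a key deterministically to one parallel instance (CRC32 is an assumed stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % parallelism

# Hypothetical sensor readings: (sensor id, temperature).
events: List[Tuple[str, float]] = [
    ("sensor-1", 20.5), ("sensor-2", 19.0), ("sensor-1", 21.0), ("sensor-3", 22.4),
]

instances: Dict[int, List[Tuple[str, float]]] = defaultdict(list)
for key, value in events:
    # Every reading of the same sensor goes to the same instance,
    # so that instance can keep that sensor's state locally.
    instances[instance_for(key)].append((key, value))

for idx in sorted(instances):
    print(idx, instances[idx])
```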
8. What is Stateful Streaming
Result depends on the history of the stream
A stateful stream processor should give you the tools to manage state
• Recover, roll back, version, upgrade, etc.
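As a rough illustration of what “managing state” means, here is a toy stateful operator in plain Python that counts events per key and can take a snapshot of its state and roll back to it. The class and method names are hypothetical and are not Flink’s API; a real system must also rewind the input to the position that matches the snapshot, which this sketch leaves out.

```python
import copy
from typing import Dict

class CountingOperator:
    """Hypothetical stateful operator that counts events per key."""

    def __init__(self) -> None:
        self.state: Dict[str, int] = {}

    def process(self, key: str) -> int:
        # The result depends on the history of the stream, not only on this event.
        self.state[key] = self.state.get(key, 0) + 1
        return self.state[key]

    def snapshot(self) -> Dict[str, int]:
        # Copy the state so that later updates do not change the snapshot.
        return copy.deepcopy(self.state)

    def restore(self, snapshot: Dict[str, int]) -> None:
        # Roll back to an earlier snapshot, e.g. after a failure.
        self.state = copy.deepcopy(snapshot)

op = CountingOperator()
for key in ["a", "b", "a"]:
    op.process(key)
checkpoint = op.snapshot()  # {'a': 2, 'b': 1}
op.process("a")             # state is now {'a': 3, 'b': 1}
op.restore(checkpoint)      # back to {'a': 2, 'b': 1}
print(op.state)             # {'a': 2, 'b': 1}
```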
9. What is Event-Time Streaming
Events have timestamps
Processing depends on timestamps
An event-time stream processor should give you the tools to reason about time
• Handle streams that are out of order
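(Figure: events with timestamps t3, t1, t2, t4 arrive out of order; your code, with its state, groups them into the event-time windows t1-t2 and t3-t4.)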
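A minimal sketch of event-time windowing in plain Python (not Flink’s API): each event is assigned to a tumbling window by its own timestamp, so the arrival order does not matter. The window size of 2 and offset of 1 are assumptions chosen so that the windows come out as t1-t2 and t3-t4; windows are half-open intervals [start, end).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

WINDOW_SIZE = 2    # assumed window length in event-time units
WINDOW_OFFSET = 1  # assumed offset so that windows start at t1, t3, t5, ...

def assign_windows(events: List[Tuple[int, str]]) -> Dict[Tuple[int, int], List[str]]:
    """Group (timestamp, payload) events into tumbling windows [start, end) by event time."""
    windows: Dict[Tuple[int, int], List[str]] = defaultdict(list)
    for ts, payload in events:
        start = ts - (ts - WINDOW_OFFSET) % WINDOW_SIZE
        windows[(start, start + WINDOW_SIZE)].append(payload)
    return dict(windows)

# Events arrive out of order: t3, t1, t2, t4.
arrivals = [(3, "e3"), (1, "e1"), (2, "e2"), (4, "e4")]
print(assign_windows(arrivals))  # {(3, 5): ['e3', 'e4'], (1, 3): ['e1', 'e2']}
```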
11. Recap: What is Streaming?
Continuous processing of data that is continuously generated
I.e., pretty much all “big” data
It’s all about state and time
Flink does all of that
15. A Simple Definition
IoT use cases from the system’s perspective:
A large number of (distributed) things continuously generating a large amount of data.
16. IoT: Some Insights
Data is continuously produced
→ Stream Processing
Events have a timestamp
→ Event-time based processing
Data/events can arrive with huge delays or out of order
Most analyses happen on time windows
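The last two points usually meet in watermarks: the processor tracks how far event time has progressed and only emits a window once it assumes that no more events for that window will arrive. The plain-Python sketch below (not Flink’s API) uses a watermark of “largest timestamp seen minus an assumed maximum delay of 3”, sums values per 10-unit window, and sets aside events whose window has already been emitted. Flink provides a bounded-out-of-orderness watermark strategy built on the same idea, but the names and numbers here are hypothetical.

```python
from typing import Dict, List, Tuple

WINDOW_SIZE = 10  # assumed window length in event-time units
MAX_DELAY = 3     # assumed upper bound on how late an event may arrive

def windowed_sums(
    events: List[Tuple[int, float]]
) -> Tuple[List[Tuple[int, int, float]], List[Tuple[int, float]]]:
    """Sum values per tumbling event-time window, emitting a window once the watermark passes its end."""
    open_windows: Dict[int, float] = {}
    results: List[Tuple[int, int, float]] = []
    late: List[Tuple[int, float]] = []
    watermark = float("-inf")

    def emit_up_to(wm: float) -> None:
        for start in sorted(open_windows):
            if start + WINDOW_SIZE <= wm:
                results.append((start, start + WINDOW_SIZE, open_windows.pop(start)))

    for ts, value in events:
        start = ts - ts % WINDOW_SIZE
        if start + WINDOW_SIZE <= watermark:
            late.append((ts, value))  # its window has already been emitted
            continue
        open_windows[start] = open_windows.get(start, 0.0) + value
        # Watermark: assume no event older than (largest timestamp seen - MAX_DELAY) is still coming.
        watermark = max(watermark, ts - MAX_DELAY)
        emit_up_to(watermark)

    emit_up_to(float("inf"))  # end of input: emit whatever is still open
    return results, late

readings = [(1, 1.0), (4, 2.0), (12, 5.0), (8, 3.0), (15, 1.0), (2, 4.0), (25, 1.0)]
results, late = windowed_sums(readings)
print(results)  # [(0, 10, 6.0), (10, 20, 6.0), (20, 30, 1.0)]
print(late)     # [(2, 4.0)]
```

Here the reading at timestamp 8 arrives after the one at 12 but within the assumed delay, so it still counts toward window [0, 10); the reading at timestamp 2 arrives after that window has been emitted and ends up in `late`.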
17. What Is Event-Time Processing
Processing time (release year): 1977, 1980, 1983, 1999, 2002, 2005, 2015
Event time (episode): IV, V, VI, I, II, III, VII
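The same point as a tiny worked example in plain Python, using the release years and episode numbers from the slide: sorting by processing time (release year) gives the order in which audiences saw the films, while sorting by event time (episode) gives the order of the story.

```python
# (release year, episode number) pairs from the slide; episode IV is 4, and so on.
films = [(1977, 4), (1980, 5), (1983, 6), (1999, 1), (2002, 2), (2005, 3), (2015, 7)]

# Processing time: the order in which audiences saw the films.
processing_order = [episode for _, episode in sorted(films)]
# Event time: the order in which the story takes place.
event_order = [episode for _, episode in sorted(films, key=lambda film: film[1])]

print(processing_order)  # [4, 5, 6, 1, 2, 3, 7]
print(event_order)       # [1, 2, 3, 4, 5, 6, 7]
```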
18. What Is Event-Time Processing
(Figure: events waiting in a message queue; their event timestamps do not follow processing-time order.)
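A small sketch of why the choice of time matters for results: the same events read from a queue are counted per 5-unit window, once by processing time and once by their event timestamps. The (processing time, event time) pairs and the window size are hypothetical.

```python
from collections import Counter
from typing import Dict, List

WINDOW_SIZE = 5  # assumed window length

# Hypothetical (processing time, event time) pairs for events read from a message queue.
queue = [(1, 1), (2, 3), (3, 2), (4, 6), (5, 4), (6, 7), (7, 3), (8, 9)]

def count_per_window(times: List[int]) -> Dict[int, int]:
    """Count timestamps per tumbling window, keyed by window start."""
    return dict(Counter(t - t % WINDOW_SIZE for t in times))

by_processing_time = count_per_window([p for p, _ in queue])
by_event_time = count_per_window([e for _, e in queue])

print(by_processing_time)  # {0: 4, 5: 4}
print(by_event_time)       # {0: 5, 5: 3}
```

The event-time counts depend only on the data, so replaying the queue gives the same answer; the processing-time counts depend on when each event happened to be read.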
20. Sources of Time Mismatch
Big Mismatch
• Network disconnects
• Slow network
Small Mismatch
• The nature of distributed systems
• Differing system clocks
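One simple way to picture the difference: a buffer that holds events for a bounded delay and releases them in timestamp order absorbs small mismatches, while an event delayed by more than that bound (for example after a network disconnect) still arrives too late. The sketch below is plain Python with an assumed bound of 2 time units and made-up timestamps; Flink typically expresses such bounds through watermarks rather than by re-sorting the stream.

```python
import heapq
from typing import Iterable, List, Optional, Tuple

MAX_DELAY = 2  # assumed bound that covers the "small" mismatches

def reorder(timestamps: Iterable[int], max_delay: int = MAX_DELAY) -> Tuple[List[int], List[int]]:
    """Release timestamps in order once they are max_delay older than the newest one seen.

    Timestamps that arrive after a later timestamp has already been released are reported as late.
    """
    buffer: List[int] = []  # min-heap of held-back timestamps
    emitted: List[int] = []
    late: List[int] = []
    max_seen: Optional[int] = None
    for ts in timestamps:
        if emitted and ts < emitted[-1]:
            late.append(ts)  # big mismatch: too late to be put back in order
            continue
        heapq.heappush(buffer, ts)
        max_seen = ts if max_seen is None else max(max_seen, ts)
        # Small mismatch: hold events until they are max_delay behind the newest one.
        while buffer and buffer[0] <= max_seen - max_delay:
            emitted.append(heapq.heappop(buffer))
    while buffer:  # end of input: release the rest in order
        emitted.append(heapq.heappop(buffer))
    return emitted, late

emitted, late = reorder([1, 3, 2, 5, 4, 8, 7, 2])
print(emitted)  # [1, 2, 3, 4, 5, 7, 8]
print(late)     # [2]
```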
21. Small Event-Time Mismatch
Robust Stream Processing with Apache Flink®:
A Simple Walkthrough
http://data-artisans.com/robust-stream-processing-flink-walkthrough/
27.
30 Flink applications in production for more than one year; 10 billion events (2TB) processed daily
Complex jobs of > 30 operators running 24/7, processing 30 billion events daily, maintaining state of 100s of GB with exactly-once guarantees
28. King
Challenges:
• Many games (Candy Crush, Farm Heroes, Pet Rescue, and Bubble Witch…)
• 300 million monthly unique users
• 30 billion events received every day
Need event-time based statistics
https://techblog.king.com/rbea-scalable-real-time-analytics-king/
30. Solution: RBEA
Multiplexing of multiple data scientists’ requests into a single Flink job
Groovy as the language for analysis scripts
Event-time windowing
https://techblog.king.com/rbea-scalable-real-time-analytics-king/
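The multiplexing idea can be sketched in a few lines of plain Python: a single job reads the event stream once and hands each event to every registered analysis, and analyses can be added or removed without starting a new job. Python functions stand in for the Groovy scripts, and all class names, function names and events here are hypothetical; see the King blog post for how RBEA actually works.

```python
from typing import Callable, Dict, List

Event = Dict[str, str]
Analysis = Callable[[Event], None]

class MultiplexingJob:
    """Hypothetical sketch: one job that feeds every event to all registered analyses."""

    def __init__(self) -> None:
        self.analyses: Dict[str, Analysis] = {}

    def register(self, name: str, analysis: Analysis) -> None:
        # A new analysis is added while the job keeps running; no new job is started.
        self.analyses[name] = analysis

    def unregister(self, name: str) -> None:
        self.analyses.pop(name, None)

    def process(self, event: Event) -> None:
        # The stream is read once and shared by all analyses.
        for analysis in list(self.analyses.values()):
            analysis(event)

purchases: List[str] = []
level_ups: Dict[str, int] = {}

def track_purchases(event: Event) -> None:
    if event["type"] == "purchase":
        purchases.append(event["user"])

def count_level_ups(event: Event) -> None:
    if event["type"] == "level_up":
        level_ups[event["user"]] = level_ups.get(event["user"], 0) + 1

job = MultiplexingJob()
job.register("purchases", track_purchases)
job.register("level_ups", count_level_ups)

stream = [
    {"type": "purchase", "user": "u1"},
    {"type": "level_up", "user": "u2"},
    {"type": "level_up", "user": "u2"},
]
for event in stream:
    job.process(event)

print(purchases)  # ['u1']
print(level_ups)  # {'u2': 2}
```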
34. In Summary
If you need to ask, you already have a streaming use case!
IoT requires proper time management
Apache Flink has done that for a long time now*
* Since version 0.10
36. Flink Forward
One day of hands-on Flink training
One day of conference
Tickets are on sale
Call for Papers is already open
Please visit our website:
http://sf.flink-forward.org
Follow us on Twitter:
@FlinkForward