This document discusses Apache Flink for IoT event-time stream processing. It begins by introducing streaming architectures and Flink. It then discusses how IoT data has important properties like continuous data production and event timestamps that require event-time based processing. Examples are provided of companies like King and Bouygues Telecom using Flink for billions of events per day with challenges like out-of-order data and flexible windowing. Event-time processing in Flink is able to handle these challenges through features like watermarks.
6. Rethinking Data Architecture
§ Better app isolation
§ Real-time reaction to events
§ Robust continuous applications
§ Process both real-time and historical data
8. What is (Distributed) Streaming
§ Streaming: computations on never-ending “streams” of data records (“events”)
§ Distributed: computation spread across many machines
9. What is Stateful Streaming
§ Computation and state
• E.g., counters, windows of past events, state machines, trained ML models
§ Result depends on history of stream
§ A stateful stream processor should give you the tools to manage state
• Recover, roll back, version, upgrade, etc.
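The recover and roll-back ideas above can be illustrated with a minimal sketch. It assumes a hypothetical `CountingOperator` class written in plain Python; it is not Flink's API, only a model of an operator whose result depends on the stream's history and whose state can be snapshotted and restored.

```python
import copy
from collections import defaultdict


class CountingOperator:
    """Hypothetical stateful operator: counts events per key."""

    def __init__(self):
        self.counts = defaultdict(int)  # the operator's state

    def process(self, key):
        # The result depends on the history of the stream, not just this event.
        self.counts[key] += 1
        return self.counts[key]

    def snapshot(self):
        # Return an independent copy of the state (a "checkpoint").
        return copy.deepcopy(dict(self.counts))

    def restore(self, snapshot):
        # Roll the state back to a previously taken snapshot.
        self.counts = defaultdict(int, copy.deepcopy(snapshot))


op = CountingOperator()
for key in ["sensor-a", "sensor-b", "sensor-a"]:
    op.process(key)

checkpoint = op.snapshot()      # state: sensor-a=2, sensor-b=1
op.process("sensor-a")          # state advances past the checkpoint
op.restore(checkpoint)          # simulated failure: roll back
print(op.process("sensor-a"))   # prints 3, as if the lost event were replayed
```

Copying the state on both snapshot and restore keeps the checkpoint immutable, so the same checkpoint can be restored more than once.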
10. What is Event-Time Streaming
§ Data records associated with timestamps (time series data)
§ Processing depends on timestamps
§ An event-time stream processor should give you the tools to reason about time
• Handle streams that are out of order
• Core feature is watermarks – a clock to measure event time
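To make the watermark idea concrete, here is a simplified sketch in plain Python. It is not Flink's API. It assumes a hypothetical helper `tumbling_windows` in which the watermark trails the largest timestamp seen so far by a fixed `max_delay`, and a window fires once the watermark reaches its end. Events whose window has already fired are reported as late.

```python
from collections import defaultdict


def tumbling_windows(timestamps, window_size, max_delay):
    """Group out-of-order timestamps into event-time windows.

    The watermark trails the largest timestamp seen by `max_delay`; a window
    [start, start + window_size) fires once the watermark reaches its end.
    Returns (fired, late): fired windows in firing order, and late events.
    """
    open_windows = defaultdict(list)   # window start -> buffered timestamps
    fired, late = [], []
    watermark = float("-inf")

    for ts in timestamps:              # iteration order = arrival order
        start = ts - ts % window_size
        if start + window_size <= watermark:
            late.append(ts)            # its window has already fired
        else:
            open_windows[start].append(ts)

        # Advance the event-time clock; it never moves backwards.
        watermark = max(watermark, ts - max_delay)

        # Fire every window whose end the watermark has reached.
        for s in sorted(open_windows):
            if s + window_size <= watermark:
                fired.append((s, sorted(open_windows.pop(s))))

    # End of input: flush whatever is still buffered.
    for s in sorted(open_windows):
        fired.append((s, sorted(open_windows.pop(s))))
    return fired, late


fired, late = tumbling_windows([3, 1, 2, 4, 8, 5, 12, 0], window_size=4, max_delay=2)
print(fired)  # [(0, [1, 2, 3]), (4, [4, 5]), (8, [8]), (12, [12])]
print(late)   # [0]
```

A larger `max_delay` tolerates more disorder but delays results; a smaller one emits results sooner and classifies more events as late.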
[Figure: events arriving out of order (t3, t1, t2, t4) pass through stateful code and are grouped into event-time windows t1-t2 and t3-t4]
11. Recap: What is Streaming?
§ Continuous processing on data that is continuously generated
§ I.e., pretty much all “big” data
§ It’s all about state and time
§ Flink does all of what we just saw
15. A Simple Definition
IoT use cases from the system’s perspective: a large number of (distributed) things generating a large amount of data.
16. Important Properties
§ Data is continuously produced
→ Stream Processing
§ Events have a timestamp that has to be considered
→ Event-time based processing
§ Data/Events can arrive with huge delays
§ Most analyses happen on time windows
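The properties above suggest why windows should be chosen from the event timestamp rather than from arrival time. The following is a small sketch in plain Python, not Flink code. The function `window_averages`, the device name, and the readings are all made up for illustration. It shows that a per-window average computed on event time is the same whether or not the data arrives delayed.

```python
from collections import defaultdict


def window_averages(readings, window_ms):
    """Average value per (device, event-time window).

    `readings` is an iterable of (device_id, event_time_ms, value) tuples in
    any arrival order; the window is chosen from the event timestamp alone.
    """
    sums = defaultdict(lambda: [0.0, 0])           # key -> [sum, count]
    for device, event_time, value in readings:
        window_start = event_time - event_time % window_ms
        acc = sums[(device, window_start)]
        acc[0] += value
        acc[1] += 1
    return {key: total / n for key, (total, n) in sums.items()}


# Hypothetical readings: (device, event time in ms, temperature).
in_order = [("dev-1", 1_000, 20.0), ("dev-1", 4_000, 22.0), ("dev-1", 61_000, 30.0)]
delayed = [in_order[2], in_order[0], in_order[1]]  # same data, shuffled arrival

print(window_averages(in_order, 60_000))
print(window_averages(in_order, 60_000) == window_averages(delayed, 60_000))  # True
```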
17. Remember: Streaming technology is enabling the obvious: continuous processing on data that is continuously produced
Hint: you already have streaming data
18. What Is Event-Time Processing
[Figure: a message queue of events, contrasting the order in which events are processed (processing time) with the order of their event timestamps]
20. Sources of Time Mismatch
§ Big Mismatch
• Network disconnects
• Slow network
§ Small Mismatch
• The nature of distributed systems
• Differing system clock time
21. Big Event-Time Mismatch
§ Processing time (year) vs. event time (episode):
• 1977: Episode IV
• 1980: Episode V
• 1983: Episode VI
• 1999: Episode I
• 2002: Episode II
• 2005: Episode III
• 2015: Episode VII
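The mismatch above can be expressed in a few lines of plain Python. This is only an illustration using the years and episode numbers from the slide; the variable names are made up. The year stands in for processing time and the episode number for event time.

```python
# (year, episode number) pairs, as listed on the slide.
releases = [(1977, 4), (1980, 5), (1983, 6), (1999, 1), (2002, 2), (2005, 3), (2015, 7)]

# Order in which events are seen (processing time) vs. their logical order (event time).
processing_order = [episode for _, episode in sorted(releases)]
event_order = sorted(episode for _, episode in releases)

# An episode is out of order if a higher-numbered episode was seen before it.
out_of_order = []
highest_seen = 0
for _, episode in sorted(releases):
    if episode < highest_seen:
        out_of_order.append(episode)
    highest_seen = max(highest_seen, episode)

print(processing_order)  # [4, 5, 6, 1, 2, 3, 7]
print(event_order)       # [1, 2, 3, 4, 5, 6, 7]
print(out_of_order)      # [1, 2, 3]
```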
22. Small Event-Time Mismatch
Robust Stream Processing with Apache Flink®: A Simple Walkthrough
http://data-artisans.com/robust-stream-processing-flink-walkthrough/
28. 30 Flink applications in production for more than one year. 10 billion events (2 TB) processed daily
Complex jobs of > 30 operators running 24/7, processing 30 billion events daily, maintaining state of 100s of GB with exactly-once guarantees
Largest job has > 20 operators, runs on > 5000 vCores in a 1000-node cluster, processes millions of events per second
29. King
§ Challenges:
• Many games (Candy Crush, Farm Heroes, Pet Rescue, and Bubble Witch…)
• 300 million monthly unique users
• 30 billion events received every day
§ Need event-time based statistics
https://techblog.king.com/rbea-scalable-real-time-analytics-king/
31. Solution: RBEA
§ Multiplexing of multiple data scientist requests into a single Flink job
§ Groovy as language for analysis scripts
§ Event-time windowing
https://techblog.king.com/rbea-scalable-real-time-analytics-king/
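The multiplexing idea can be sketched roughly as follows. This is not RBEA's actual interface: RBEA runs Groovy scripts inside a Flink job, whereas this sketch uses plain Python functions. The `MultiplexedJob` class, the script names, and the event fields are all hypothetical. The point is only that each event is read once and handed to every registered analysis, each with its own state.

```python
from collections import Counter


class MultiplexedJob:
    """Hypothetical stand-in for one long-running job hosting many scripts."""

    def __init__(self):
        self.scripts = {}   # script name -> callable(event, state)
        self.state = {}     # script name -> that script's private state

    def register(self, name, script):
        self.scripts[name] = script
        self.state[name] = Counter()

    def process(self, event):
        # Each event is read once and handed to every registered script.
        for name, script in self.scripts.items():
            script(event, self.state[name])


def events_per_game(event, state):
    state[event["game"]] += 1


def purchases_per_user(event, state):
    if event["type"] == "purchase":
        state[event["user"]] += 1


job = MultiplexedJob()
job.register("events_per_game", events_per_game)
job.register("purchases_per_user", purchases_per_user)

# Made-up events for illustration only.
for event in [
    {"game": "Candy Crush", "user": "u1", "type": "level_start"},
    {"game": "Farm Heroes", "user": "u2", "type": "purchase"},
    {"game": "Candy Crush", "user": "u1", "type": "purchase"},
]:
    job.process(event)

print(dict(job.state["events_per_game"]))
print(dict(job.state["purchases_per_user"]))
```

Keeping a separate state object per script means one analysis cannot interfere with another, even though they share a single pass over the stream.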
35. In Summary
§ If you need to ask: you already have a streaming use case!
§ IoT requires proper time management
§ Apache Flink has done that for a long time now*
* Since version 0.10
37. Flink Forward
One day of hands-on Flink training
One day of conference
Tickets are on sale
Call for Papers is already open
Please visit our website:
http://sf.flink-forward.org
Follow us on Twitter:
@FlinkForward