1
Aljoscha Krettek
@aljoscha
ApacheCon North America
May, 2017
Apache Flink® and IoT: How
Stateful Event-Time Processing
E...
What I’d Like to Talk About
2
§ IoT and event-time stream processing
§ Stateful stream processing
§ Streaming architecture...
3
Original creators of
Apache Flink®
Providers of the
dA Platform, a supported
Flink distribution
IoT and Event-time Stream
Processing
4
Example Event Sources
5
A Simple Definition
6
IoT use cases from the system’s
perspective:
A large number of (distributed) things
continuously gen...
IoT: Some Insights
7
§ Data is continuously produced
→ Stream Processing
§ Events have a timestamp
→ Event-time based proc...
What Is Event-Time Processing
8
1977 1980 1983 1999 2002 2005 2015
Processing Time
Episode
IV
Episode
V
Episode
VI
Episode...
What Is Event-Time Processing
9
1312735961112
1234567891011121314
Processing Time
Event timestamp
Message Queue
What’s The Problem?
10
13
12
735961112
1234567891011121314
Processing Time
Processing-Time Windows 137356
12 137 356Event-...
Sources of Time Mismatch
§ Big Mismatch
• Network disconnects
• Slow network
§ Small Mismatch
• The nature of distributed ...
Small Event-Time Mismatch
12
Robust Stream Processing with Apache Flink®:
A Simple Walkthrough
http://data-artisans.com/ro...
13
14
15
Recap: Event-Time
§ IoT use cases need event-time
processing
§ Even small mismatch of event
time/processing time will lead...
(Stateful) Stream Processing
17
Stream Processing
18
Computation
Computations on
never-ending
“streams” of data
records (“events”)
Distributed Stream Processing
19
Computation
Computation
spread across
many
machines
Computation Computation
Stateful Stream Processing
20
Computation
State
State is usually
partitioned by
some key in
the data
Stateful Stream Processing II
21
§ Result depends on history of stream
§ A stateful stream processor should
gives the tool...
22
app state
app state
app state
event log
Query
service
Recap: Stateful Streams
§ Continuous processing of data that is
continuously generated
§ I.e., pretty much all “big” data
...
Operational Issues
24
Operational Questions
§ What happens in case of failures?
§ What if I need to update my code/Flink?
§ Can I re-process my ...
Failure Handling
§ JobManager High-Availability using
ZooKeeper
§ Periodic checkpoints of state to
persistent storage (HDF...
Savepoints
§ A persistent snapshot of all state
§ When starting an application, state can
be initialized from a savepoint
...
Closing
28
TL;DR
§ Stateful stream processing is nice 😎
§ IoT use cases require proper time
management
§ Apache Flink is a stateful s...
3
Thank you!
@aljoscha
@ApacheFlink
@dataArtisans
Backup Slides
31
Event-Time Processing
32
What Is Event-Time Processing
33
1977 1980 1983 1999 2002 2005 2015
Processing Time
Episode
IV
Episode
V
Episode
VI
Episod...
What Is Event-Time Processing
34
1312735961112
1234567891011121314
Processing Time
Event timestamp
Message Queue
What is Event-Time Streaming
§ Events have timestamps
§ Processing depends on
timestamps
§ An event-time stream
processor ...
Recap: Event-Time
§ IoT use cases need event-time
processing
§ Even small mismatch of event
time/processing time will lead...
History of Flink
37
A brief History of Flink
38
January ‘10 December ‘14
v0.5 v0.6 v0.7
March ‘16
Flink Project
Incubation
Top Level
Project
v...
A brief History of Flink
39
January ‘10 December ‘14
v0.5 v0.6 v0.7
March ‘16
Flink Project
Incubation
Top Level
Project
v...
Nächste SlideShare
Wird geladen in …5
×

Aljoscha Krettek - Apache Flink® and IoT: How Stateful Event-Time Processing Enables Accurate Analytics

252 Aufrufe

Veröffentlicht am

Apache Flink® is an open-source stream processing framework for distributed and accurate data streaming applications. An increasing number of IoT use cases will (and some already do) require robust processing frameworks that can handle an ever-increasing amount of data and provide insights in real time. Apache Flink is one of the contenders for the top spot among such frameworks and in this presentation Aljoscha Krettek will highlight some of the properties that make Flink so well suited for IoT use cases: We will first learn what stream processing frameworks in general provide before diving into stateful stream processing and event-time based stream-processing. We will see why these two features are important for IoT scenarios and also why they, together with Flink’s robust handling of failures, enable accurate and robust analytics on real-time streaming data.

Veröffentlicht in: Daten & Analysen
  • Als Erste(r) kommentieren

Aljoscha Krettek - Apache Flink® and IoT: How Stateful Event-Time Processing Enables Accurate Analytics

  1. 1. 1 Aljoscha Krettek @aljoscha ApacheCon North America May, 2017 Apache Flink® and IoT: How Stateful Event-Time Processing Enables Accurate Analytics
  2. 2. What I’d Like to Talk About 2 § IoT and event-time stream processing § Stateful stream processing § Streaming architecture and Flink
  3. 3. 3 Original creators of Apache Flink® Providers of the dA Platform, a supported Flink distribution
  4. 4. IoT and Event-time Stream Processing 4
  5. 5. Example Event Sources 5
  6. 6. A Simple Definition 6 IoT use cases from the system’s perspective: A large number of (distributed) things continuously generating a large amount of data.
  7. 7. IoT: Some Insights 7 § Data is continuously produced → Stream Processing § Events have a timestamp → Event-time based processing § Data/Events can arrive with huge delays/out-of-order § Most analyses happen on time windows
  8. 8. What Is Event-Time Processing 8 1977 1980 1983 1999 2002 2005 2015 Processing Time Episode IV Episode V Episode VI Episode I Episode II Episode III Episode VII Event Time
  9. 9. What Is Event-Time Processing 9 1312735961112 1234567891011121314 Processing Time Event timestamp Message Queue
  10. 10. What’s The Problem? 10 13 12 735961112 1234567891011121314 Processing Time Processing-Time Windows 137356 12 137 356Event-Time Windows 12 1112 Mismatch between event time and processing time.
  11. 11. Sources of Time Mismatch § Big Mismatch • Network disconnects • Slow network § Small Mismatch • The nature of distributed systems • Differing system clock time 11
  12. 12. Small Event-Time Mismatch 12 Robust Stream Processing with Apache Flink®: A Simple Walkthrough http://data-artisans.com/robust-stream-processing-flink-walkthrough/
  13. 13. 13
  14. 14. 14
  15. 15. 15
  16. 16. Recap: Event-Time § IoT use cases need event-time processing § Even small mismatch of event time/processing time will lead to wrong results 16
  17. 17. (Stateful) Stream Processing 17
  18. 18. Stream Processing 18 Computation Computations on never-ending “streams” of data records (“events”)
  19. 19. Distributed Stream Processing 19 Computation Computation spread across many machines Computation Computation
  20. 20. Stateful Stream Processing 20 Computation State State is usually partitioned by some key in the data
  21. 21. Stateful Stream Processing II 21 § Result depends on history of stream § A stateful stream processor should gives the tools to manage state • Recover, roll back, version upgrade, etc.
  22. 22. 22 app state app state app state event log Query service
  23. 23. Recap: Stateful Streams § Continuous processing of data that is continuously generated § I.e., pretty much all “big” data § It’s all about state and time § Flink does all of that 23
  24. 24. Operational Issues 24
  25. 25. Operational Questions § What happens in case of failures? § What if I need to update my code/Flink? § Can I re-process my data? § How can I execute my programs? 25
  26. 26. Failure Handling § JobManager High-Availability using ZooKeeper § Periodic checkpoints of state to persistent storage (HDFS, S3, …) § In case of failure: rollback to previous consistent state 26
  27. 27. Savepoints § A persistent snapshot of all state § When starting an application, state can be initialized from a savepoint § In-between savepoint and restore we can update Flink version or user code 27
  28. 28. Closing 28
  29. 29. TL;DR § Stateful stream processing is nice 😎 § IoT use cases require proper time management § Apache Flink is a stateful stream processor with plenty of nifty features 29
  30. 30. 3 Thank you! @aljoscha @ApacheFlink @dataArtisans
  31. 31. Backup Slides 31
  32. 32. Event-Time Processing 32
  33. 33. What Is Event-Time Processing 33 1977 1980 1983 1999 2002 2005 2015 Processing Time Episode IV Episode V Episode VI Episode I Episode II Episode III Episode VII Event Time
  34. 34. What Is Event-Time Processing 34 1312735961112 1234567891011121314 Processing Time Event timestamp Message Queue
  35. 35. What is Event-Time Streaming § Events have timestamps § Processing depends on timestamps § An event-time stream processor should give you the tools to reason about time • Handle streams that are out of order 35 Your code state t3 t1 t2t4 t1-t2 t3-t4
  36. 36. Recap: Event-Time § IoT use cases need event-time processing § Even small mismatch of event time/processing time will lead to wrong results 36
  37. 37. History of Flink 37
  38. 38. A brief History of Flink 38 January ‘10 December ‘14 v0.5 v0.6 v0.7 March ‘16 Flink Project Incubation Top Level Project v0.8 v0.10 Release 1.0 Project Stratosphere (Flink precursor) v0.9 April ‘14
  39. 39. A brief History of Flink 39 January ‘10 December ‘14 v0.5 v0.6 v0.7 March ‘16 Flink Project Incubation Top Level Project v0.8 v0.10 Release 1.0 Project Stratosphere (Flink precursor) v0.9 April ‘14 The academia gap: Reading/writing papers, teaching, worrying about thesis Realizing this might be interesting to people beyond academia (even more so, actually)

×