Diese Präsentation wurde erfolgreich gemeldet.
Wir verwenden Ihre LinkedIn Profilangaben und Informationen zu Ihren Aktivitäten, um Anzeigen zu personalisieren und Ihnen relevantere Inhalte anzuzeigen. Sie können Ihre Anzeigeneinstellungen jederzeit ändern.
Till Rohrmann
till@data-artisans.com
@stsffap
From Apache Flink®
1.3 to 1.4
2
Original creators of Apache
Flink®
Providers of
dA Platform 2, including
open source Apache Flink +
dA Application Manag...
Overview
Apache Flink 1.3 – Previously on Apache
Flink
Apache Flink 1.4 – What’s happening now?
Apache Flink 1.5+ – Nex...
Previously on Apache Flink
Apache Flink 1.3
Apache Flink 1.3 in Numbers
141 contributors (no deduplication)
1400 commits
>= 680 resolved JIRA issues
+261813 / -65...
Evolution of Flink’s API
6
Flink 1.0.0
State API (ValueState
ReducingState, ListState)
Flink 1.1.0
Session Windows
Late ar...
Side Outputs
 Additional outputs for a stream
 Late events
 Corrupted input data
 More expressive APIs
 FLINK-4460
7
...
Side Outputs: Example
8
DataStream<Integer> input = ...;
final OutputTag<String> outputTag = new OutputTag<String>("side-o...
Evolution of Large State Handling
9
Flink 1.0.0
RocksDB for out-of-core
state support
Flink 1.1.0
Fully async RocksDB
snap...
G
H
C
D
Full Checkpoints
10
Checkpoint 1 Checkpoint 2 Checkpoint 3
I
E
A
B
C
D
A
B
C
D
A
F
C
D
E
@t1 @t2 @t3
A
F
C
D
E
G
H...
G
H
C
D
Incremental Checkpoints
11
Checkpoint 1 Checkpoint 2 Checkpoint 3
I
E
A
B
C
D
A
B
C
D
A
F
C
D
E
E
F
G
H
I
@t1 @t2 ...
Incremental Checkpoints
12
Checkpoint 1 Checkpoint 2 Checkpoint 3 Checkpoint 4
C1 C3C1 C1
Chunk
1
Chunk
2
Chunk
3
Chunk
4
...
Incremental Checkpointing Contd.
Currently supported for RocksDB
state backend
FLINK-5053
Faster and smaller checkpoint...
Evolution of High Level APIs
14
Flink 1.0.0
CEP library added
Table API v1
Flink 1.1.0
Table API overhaul
Integration with...
Enriched CEP Language
Support for quantifiers (+, *, ?)
FLINK-3318
Iterative conditions
FLINK-6197
Not operator
FLIN...
CEP: Detect Dipping Stocks
16
DataStream<Stock> stocks = …;
Pattern<Stock, ?> pattern = Pattern
.<Stock>begin("rising")
.w...
What’s Happening Now?
Apache Flink 1.4
Event Driven I/O
18
Rework of Flink’s network stack
Event driven network I/O
Use full available capacity
Near perfect ...
Flow Control
 Flow control for TaskManager communication
 Single channel no longer stalls other
multiplexed channels
 F...
New Deployment Model
Rework of Flink’s distributed
architecture
Ready for multitude of
deployment scenarios
Support for...
Producing Exactly Once with Kafka 0.11
Support for Kafka 0.11
First Kafka producer with
exactly once processing
guarante...
StreamSQL and Table API
Support for retractions
Extended aggregation support
Support for external table
catalogs
Windo...
Operational Robustness
Drop Java 7
Support Scala 2.12
Avoid dependency hell
Child first class loading
Relocation of
d...
Next on Apache Flink
Apache Flink 1.5+
Side Inputs
 Additional input for operator
 Join with static data set
 Feeding of externally trained ML model
 Window ...
State Management & Evolution
Eager state declaration
State type, serializer and name
known at pre-flight time
Flip-22 d...
State Replication
Replicate state between
TaskManagers
Faster recovery in
case of failures
High throughput
queryable st...
Programmatic Job Control
Improve client to give better job control
Run concurrent jobs from the same
program
Trigger sa...
JobClient & ClusterClient
29
StreamExecutionEnvironment env = ...;
// define program
JobClient jobClient = env.execute();
...
TL;DL
Apache Flink one of the most innovative open
source stream processing platforms
Stay tuned what’s happening next ...
31
Thank you!
@stsffap
@ApacheFlink
@dataArtisans
We are hiring!
data-artisans.com/careers
32
Nächste SlideShare
Wird geladen in …5
×

From Apache Flink® 1.3 to 1.4

1.647 Aufrufe

Veröffentlicht am

With Flink 1.3 being released, the Flink community is already working towards the upcoming release 1.4. Given Flink's high development pace, which manifested in Flink 1.3 being one of the feature-wise biggest releases in its recent history, it becomes more and more difficult to keep track of all development threads. Moreover, it requires more effort to learn about newly added features and which value they provide for your application.

In this talk, I want to present and explain some of Flink's latest features, including incremental checkpointing, fine grained recovery, side outputs and many more. Furthermore, I want to put them in perspective with respect to Flink's future direction by giving some insights into ongoing development threads in the community. Thereby, I intend to give attendees a better picture about Flink's current and future capabilities.

Veröffentlicht in: Technologie
  • Loggen Sie sich ein, um Kommentare anzuzeigen.

From Apache Flink® 1.3 to 1.4

  1. 1. Till Rohrmann till@data-artisans.com @stsffap From Apache Flink® 1.3 to 1.4
  2. 2. 2 Original creators of Apache Flink® Providers of dA Platform 2, including open source Apache Flink + dA Application Manager
  3. 3. Overview Apache Flink 1.3 – Previously on Apache Flink Apache Flink 1.4 – What’s happening now? Apache Flink 1.5+ – Next on Apache Flink 3
  4. 4. Previously on Apache Flink Apache Flink 1.3
  5. 5. Apache Flink 1.3 in Numbers 141 contributors (no deduplication) 1400 commits >= 680 resolved JIRA issues +261813 / -65646 LOC 5
  6. 6. Evolution of Flink’s API 6 Flink 1.0.0 State API (ValueState ReducingState, ListState) Flink 1.1.0 Session Windows Late arriving events Flink 1.2.0 ProcessFunction (access to state, timers, events) Flink 1.3.0 Side outputs Access to per-window state
  7. 7. Side Outputs  Additional outputs for a stream  Late events  Corrupted input data  More expressive APIs  FLINK-4460 7 Process Function Main output Side output
  8. 8. Side Outputs: Example 8 DataStream<Integer> input = ...; final OutputTag<String> outputTag = new OutputTag<String>("side-output"){}; SingleOutputStreamOperator<Integer> mainDataStream = input .process(new ProcessFunction<Integer, Integer>() { @Override public void processElement( Integer value, Context ctx, Collector<Integer> out) throws Exception { // emit data to regular output out.collect(value); // emit data to side output ctx.output(outputTag, "sideout-" + String.valueOf(value)); } }); DataStream<String> sideOutputStream = mainDataStream.getSideOutput(outputTag);
  9. 9. Evolution of Large State Handling 9 Flink 1.0.0 RocksDB for out-of-core state support Flink 1.1.0 Fully async RocksDB snapshots Flink 1.2.0 Rescalable keyed and non-partitioned state Flink 1.3.0 Incremental checkpoints Fine-grained recovery
  10. 10. G H C D Full Checkpoints 10 Checkpoint 1 Checkpoint 2 Checkpoint 3 I E A B C D A B C D A F C D E @t1 @t2 @t3 A F C D E G H C D I E
  11. 11. G H C D Incremental Checkpoints 11 Checkpoint 1 Checkpoint 2 Checkpoint 3 I E A B C D A B C D A F C D E E F G H I @t1 @t2 @t3
  12. 12. Incremental Checkpoints 12 Checkpoint 1 Checkpoint 2 Checkpoint 3 Checkpoint 4 C1 C3C1 C1 Chunk 1 Chunk 2 Chunk 3 Chunk 4 Storage C2 C4C3
  13. 13. Incremental Checkpointing Contd. Currently supported for RocksDB state backend FLINK-5053 Faster and smaller checkpoints 13 Full checkpoint Incremental checkpoint Size 60 GB 1 – 30 GB Time 180 s 3 – 30 s “A Look at Flink’s Internal Data Structures and Algorithms for Efficient Checkpointing” by Stefan Richter, Tomorrow @ 12:20 pm Maschinenhaus
  14. 14. Evolution of High Level APIs 14 Flink 1.0.0 CEP library added Table API v1 Flink 1.1.0 Table API overhaul Integration with Apache Calcite Flink 1.2.0 Tumbling, sliding and session group-windows for Table API Flink 1.3.0 Rescalable CEP operators Retractions in Table API/SQL
  15. 15. Enriched CEP Language Support for quantifiers (+, *, ?) FLINK-3318 Iterative conditions FLINK-6197 Not operator FLINK-3320 15 “Complex Event Processing With Flink: The State of FlinkCEP” by Kostas Kloudas, Today @ 2:30 pm Maschinenhaus
  16. 16. CEP: Detect Dipping Stocks 16 DataStream<Stock> stocks = …; Pattern<Stock, ?> pattern = Pattern .<Stock>begin("rising") .where(new IterativeCondition<Stock>() { @Override public boolean filter(Stock stock, Context<Stock> ctx) throws Exception { // calculate the average price double sum = 0.0; int count = 0; for (Stock previousStock : ctx.getEventsForPattern("rising")) { sum += previousStock.getPrice(); count++; } // only accept if the price is higher or equal than the average price return stock.getPrice() >= sum / count; }).oneOrMore() .next("falling"); PatternStream<Stock> dippingStocks = new PatternStream<>(stocks.keyBy("name"), pattern); DataStream<String> namesOfDippingStocks = dippingStocks.select(…);
  17. 17. What’s Happening Now? Apache Flink 1.4
  18. 18. Event Driven I/O 18 Rework of Flink’s network stack Event driven network I/O Use full available capacity Near perfect latency behaviour TCP Buffer capacity left flush
  19. 19. Flow Control  Flow control for TaskManager communication  Single channel no longer stalls other multiplexed channels  Fine-grained backpressure control  Improves checkpoint alignments 19 “Building a Network Stack for Optimal Throughput / Low-Latency Trade-Offs” by Nico Kruber, Today @ 2:00 pm Palais Atelier Receiver Sender #1 Sender #2 Give credit Send credited data
  20. 20. New Deployment Model Rework of Flink’s distributed architecture Ready for multitude of deployment scenarios Support for dynamic scaling 20 “Flink in Containerland” by Patrick Lucas, Tomorrow @ 3:20 pm Maschinenhaus
  21. 21. Producing Exactly Once with Kafka 0.11 Support for Kafka 0.11 First Kafka producer with exactly once processing guarantees 21 “Hit Me, Baby, Just One Time – Building End-to-End Exactly Once Applications With Flink” by Piotr Nowojski, Today @ 3:20 pm Palais Atelier Consuming Producing End-to-End exactly once processing
  22. 22. StreamSQL and Table API Support for retractions Extended aggregation support Support for external table catalogs Window joins 22 “Unified Stream and Batch Processing With Apache Flink’s Relational APIs” by Fabian Hüske, Tomorrow @ 11:00 am Kesselhaus “From Streams to Tables and Back Again: A Demo of Flink’s Table & SQL API” by Timo Walther, Tomorrow @ 11:50 am Kesselhaus
  23. 23. Operational Robustness Drop Java 7 Support Scala 2.12 Avoid dependency hell Child first class loading Relocation of dependencies De-Hadoopification 23
  24. 24. Next on Apache Flink Apache Flink 1.5+
  25. 25. Side Inputs  Additional input for operator  Join with static data set  Feeding of externally trained ML model  Window joins  Flip-17 design document: https://goo.gl/W4yMEu 25 Process Function Main input Side input
  26. 26. State Management & Evolution Eager state declaration State type, serializer and name known at pre-flight time Flip-22 design document: https://goo.gl/trFiSi Evolving existing state Schema updates Serializer upgrades 26 “Managing State in Apache Flink” by Tzu-Li Tai, Today @ 4:30 pm Kesselhaus
  27. 27. State Replication Replicate state between TaskManagers Faster recovery in case of failures High throughput queryable state 27 TaskManager TaskManager Change log stream Input State
  28. 28. Programmatic Job Control Improve client to give better job control Run concurrent jobs from the same program Trigger savepoints programmatically Better testing facilities 28
  29. 29. JobClient & ClusterClient 29 StreamExecutionEnvironment env = ...; // define program JobClient jobClient = env.execute(); CompletableFuture<Acknowledge> savepointFuture = jobClient.takeSavepoint(savepointPath); // wait for the savepoint completion savepointFuture.get(); CompletableFuture<JobExecutionResult> resultFuture = jobClient.getResultFuture(); // cancel the job jobClient.cancelJob(); // get the execution result --> should be canceled JobExecutionResult result = resultFuture.get(); // get list of all still running jobs on the cluster ClusterClient clusterClient = jobClient.getClusterClient(); CompletableFuture<List<JobInfo>> jobInfosFuture = clusterClient.getJobInfos(); List<JobInfo> jobInfos = jobInfosFuture.get();
  30. 30. TL;DL Apache Flink one of the most innovative open source stream processing platforms Stay tuned what’s happening next  Visit the in depths talks to learn more about Flink’s internals 30
  31. 31. 31 Thank you! @stsffap @ApacheFlink @dataArtisans
  32. 32. We are hiring! data-artisans.com/careers 32

×