http://flink-forward.org/kb_sessions/dynamic-scaling-how-apache-flink-adapts-to-changing-workloads/
Modern stream processing engines not only have to process millions of events per second at sub-second latency but also have to cope with constantly changing workloads. Because the number of incoming events in a streaming application can vary strongly over time, systems cannot reliably predetermine the amount of required resources. To meet guaranteed SLAs while utilizing system resources as efficiently as possible, frameworks like Apache Flink have to adapt their resource consumption dynamically. In this talk, we will take a look under the hood and explain how Flink scales stateful applications in and out. Starting with the concept of key groups and partitionable state, we will cover ways to detect bottlenecks in streaming jobs and discuss efficient strategies for scaling out operators with minimal downtime.
10. Keyed vs. Non-keyed State

Keyed:
• State bound to a key
• E.g. keyed UDF and window state

Non-keyed:
• State bound to a subtask
• E.g. source state
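The distinction can be illustrated with a minimal Python model (not Flink's actual API; the names are ours): keyed state is scoped per key, while non-keyed (operator) state is scoped per parallel subtask and its entries are not tied to any key.

```python
# Minimal illustration (not Flink's API): keyed vs. non-keyed state.

# Keyed state: e.g. a per-key counter maintained by a keyed UDF or window.
keyed_state = {}  # key -> value, one slot per key

def process_keyed(key, value):
    keyed_state[key] = keyed_state.get(key, 0) + value

# Non-keyed state: e.g. a source subtask remembering its read positions.
subtask_state = []  # one list per subtask, entries independent of keys

def process_record(offset):
    subtask_state.append(offset)

process_keyed("user-1", 3)
process_keyed("user-1", 4)
process_record(42)
```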
11. Repartitioning Keyed State
§ Similar to consistent
hashing
§ Split key space into
key groups
§ Assign key groups to
tasks
11
Key space
Key group #1 Key group #2
Key group #3Key group #4
12. Repartitioning Keyed State contd.
§ Rescaling changes the key group assignment
§ Maximum parallelism defined by the number of key groups
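The two-level assignment on these slides can be sketched in Python (a simplified illustration; the function names and the key-group count of 128 are ours, not Flink's internals): a key is first hashed into one of a fixed number of key groups, and each key group is then mapped to an operator instance. Rescaling only changes the second mapping, which is why the maximum parallelism equals the number of key groups.

```python
def key_group(key, max_parallelism):
    # A key is hashed into one of max_parallelism key groups; this
    # assignment never changes, regardless of the current parallelism.
    return hash(key) % max_parallelism

def operator_index(group, max_parallelism, parallelism):
    # Key groups map to operator instances in contiguous ranges; only
    # this mapping changes when the job is rescaled.
    return group * parallelism // max_parallelism

MAX_PARALLELISM = 128  # illustrative choice of key-group count
g = key_group("sensor-42", MAX_PARALLELISM)

# Rescaling from 2 to 4 subtasks: the key's group stays fixed, only
# the group -> subtask assignment changes.
before = operator_index(g, MAX_PARALLELISM, 2)
after = operator_index(g, MAX_PARALLELISM, 4)
assert 0 <= before < 2 and 0 <= after < 4
```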
13. Repartitioning Non-keyed State
§ User-defined merge and split functions
• Most general approach
§ Breaking non-keyed state up into finer granularity
• State has to contain multiple entries
• Automatic repartitioning wrt granularity
14. Repartitioning Non-keyed State contd.
§ Non-keyed state entries gathered at the job manager
§ Repartitioning schemes:
• Repartition & send
• Union & broadcast
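The two schemes can be sketched in plain Python (an illustration under the assumption that each old subtask's state is a list of independent entries): "repartition & send" splits the union of all entries across the new subtasks, while "union & broadcast" gives every new subtask the full union and lets it filter out what it needs.

```python
def repartition_and_send(old_states, new_parallelism):
    # Gather all entries (as the job manager would), then redistribute
    # them round-robin across the new subtasks.
    entries = [e for state in old_states for e in state]
    new_states = [[] for _ in range(new_parallelism)]
    for i, entry in enumerate(entries):
        new_states[i % new_parallelism].append(entry)
    return new_states

def union_and_broadcast(old_states, new_parallelism):
    # Every new subtask receives the full union of all entries and
    # decides itself which ones to keep.
    union = [e for state in old_states for e in state]
    return [list(union) for _ in range(new_parallelism)]

old = [["a", "b", "c"], ["d"]]
split = repartition_and_send(old, 3)      # entries spread over 3 subtasks
everywhere = union_and_broadcast(old, 3)  # each subtask gets all entries
```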
15. Example: Kafka Source
• Store the offset for each partition:
  partitionId: 1, offset: 42
  partitionId: 3, offset: 10
  partitionId: 6, offset: 27
• Individual entries are repartitionable
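The Kafka source example can be made concrete with a small sketch (illustrative only, not the Flink connector's code): the source's non-keyed state is a list of per-partition offsets, and because each entry is independent, the list can be redistributed when the source is rescaled, here from one subtask to two.

```python
# Illustrative sketch: the Kafka source's operator state is a list of
# per-partition offset entries, each independently redistributable.
checkpointed = [
    {"partitionId": 1, "offset": 42},
    {"partitionId": 3, "offset": 10},
    {"partitionId": 6, "offset": 27},
]

def redistribute(entries, new_parallelism):
    # Round-robin the per-partition entries over the new source subtasks.
    states = [[] for _ in range(new_parallelism)]
    for i, entry in enumerate(entries):
        states[i % new_parallelism].append(entry)
    return states

# Rescale the source from 1 to 2 subtasks: subtask 0 resumes partitions
# 1 and 6 at their stored offsets, subtask 1 resumes partition 3.
subtask_states = redistribute(checkpointed, 2)
```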
16. Rescaling: Why Is That so Hard?
§ Handling of state
§ Repartitioning of keyed & non-keyed state
§ Unique among open source stream processors, as far as we know
20. Current State
§ Manual rescaling:
1. Take a savepoint
2. Stop the job
3. Restart the job with adjusted parallelism from the savepoint
21. Next Steps
§ Integrate savepoint with stop signal
§ Rescaling individual operators w/o restart
§ Dynamic container de-/allocation
• See “Running Flink Everywhere” by Stephan Ewen, 16:45 at Kesselhaus
22. Auto Scaling Policies
• Latency
• Throughput
• Resource utilization
• Kubernetes on GCE, EC2, and Mesos (marathon-autoscale) already support auto-scaling
23. Conclusion
§ Scaling of keyed and non-keyed state
§ Flink supports manual rescaling with restart
(WIP branch: https://github.com/tillrohrmann/flink/tree/partitionable-op-state)
§ Future versions might support scaling on the fly and automatic rescaling policies