SlideShare ist ein Scribd-Unternehmen logo
1 von 60
1
Stefan Richter

@StefanRRichter



September 13, 2017
A Look at Flink’s Internal Data
Structures for Efficient
Checkpointing
Flink State and Distributed Snapshots
2
State
Backend
Stateful

Operation
Source
Event
3
Trigger checkpoint
Inject checkpoint barrier
Stateful

Operation
Source
Flink State and Distributed Snapshots
„Asynchronous Barrier Snapshotting“
4
Take state snapshot
Synchronously trigger
state snapshot (e.g.
copy-on-write)
Flink State and Distributed Snapshots
Stateful

Operation
Source
„Asynchronous Barrier Snapshotting“
5
DFS
Processing pipeline continues
Durably persist

full snapshots

asynchronously
For each state backend:
one writer, n reader
Flink State and Distributed Snapshots
Stateful

Operation
Source
Challenges for Data Structures
▪ Asynchronous Checkpoints:
▪ Minimize pipeline stall time while taking
the snapshot.
▪ Keep overhead (memory, CPU,…) as low
as possible while writing the snapshot.
▪ Support multiple parallel checkpoints.
6
7
DFS
Durably persist

incremental
snapshots

asynchronously
Processing pipeline continues
ΔΔ
(differences from
previous snapshots)
Flink State and Distributed Snapshots
Stateful

Operation
Source
Full Snapshots
8
K S
2 B
4 W
6 N
K S
2 B
3 K
4 L
6 N
K S
2 Q
3 K
6 N
9 S
K S
2 B
4 W
6 N
K S
2 B
3 K
4 L
6 N
K S
2 Q
3 K
6 N
9 S
Checkpoint 1 Checkpoint 2 Checkpoint 3
time
State
Backend
(Task-local)
Checkpoint
Directory
(DFS)
Updates Updates
Checkpoint
Directory
(DFS)
Incremental Snapshots
9
K S
2 B
4 W
6 N
K S
2 B
3 K
4 L
6 N
K S
2 Q
3 K
6 N
9 S
K S
2 B
4 W
6 N
K S
3 K
4 L
K S
2 Q
4 -
9 S
iCheckpoint 1 iCheckpoint 2 iCheckpoint 3
Δ(-, icp1)
Δ(icp1, icp2)
Δ(icp2, icp3)
time
State
Backend
(Task-local)
Updates Updates
5 TB 10 TB 15 TB
Incremental Full
Incremental vs Full Snapshots
10
Checkpoint
Duration
State Size
State
Backend
(Task-local)
Incremental Recovery
11
K S
2 B
4 W
6 N
K S
2 B
3 K
4 L
6 N
K S
2 Q
3 K
6 N
9 S
K S
2 B
4 W
6 N
K S
3 K
4 L
K S
2 Q
4 -
9 S
time
Recovery?
+ +
iCheckpoint 1 iCheckpoint 2 iCheckpoint 3
Δ(-, icp1)
Δ(icp1, icp2)
Δ(icp2, icp3)
Checkpoint
Directory
(DFS)
Challenges for Data Structures
▪ Incremental checkpoints:
▪ Efficiently detect the (minimal) set of
state changes between two
checkpoints.
▪ Avoid unbounded checkpoint history.
12
Two Keyed State Backends
13
HeapKeyedStateBackend RocksDBKeyedStateBackend
• State lives in memory, on Java heap.
• Operates on objects.
• Think of a hash map {key obj -> state obj}.
• Goes through ser/de during snapshot/restore.
• Async snapshots supported.
• State lives in off-heap memory and on disk.
• Operates on serialized bytes.
• Think of K/V store {key bytes -> state bytes}.
• Based on log-structured-merge (LSM) tree.
• Goes through ser/de for each state access.
• Async and incremental snapshots.
Asynchronous and Incremental Checkpoints
with the RocksDB Keyed-State Backend
14
What is RocksDB
▪ Ordered Key/Value store (bytes).
▪ Based on LSM trees.
▪ Embedded, written in C++, used via JNI.
▪ Write optimized (all bulk sequential), with
acceptable read performance.
15
RocksDB Architecture (simplified)
16
Memory
Local Disk
K: 2 V: T K: 8 V: W
Memtable
- Mutable Memory Buffer for K/V Pairs (bytes)
- All reads / writes go here first
- Aggregating (overrides same unique keys)
- Asynchronously flushed to disk when full
RocksDB Architecture (simplified)
17
Memory
Local Disk
K: 4 V: Q
Memtable
K: 2 V: T K: 8 V: WIdx
SSTable-(1)
BF-1
- Flushed Memtable becomes immutable
- Sorted By Key (unique)
- Read: first check Memtable, then SSTable
- Bloomfilter & Index as optimisations
RocksDB Architecture (simplified)
18
Memory
Local Disk
K: 2 V: A K: 5 V: N
Memtable
K: 4 V: Q K: 7 V: SIdx
SSTable-(2)
K: 2 V: T K: 8 V: WIdx
SSTable-(1)
K: 2 V: C K: 7 -Idx
SSTable-(3)
BF-1 BF-2 BF-3
- Deletes are explicit entries
- Natural support for snapshots
- Iteration: online merge
- SSTables can accumulate
RocksDB Compaction (simplified)
19
K: 2 V: C K: 7 - K: 4 V: Q K: 7 V: S K: 2 V: T K: 8 V: W
K: 2 V: C K: 4 V: Q K: 8 V: W
multiway
sorted
merge
„Log Compaction“
SSTable-(3) SSTable-(1)SSTable-(2)
SSTable-(1, 2, 3)
(rm)
RocksDB Asynchronous Checkpoint
▪ Flush Memtable. (Simplified)
▪ Then create iterator over all current
SSTables (they are immutable).
▪ New changes go to Memtable and future
SSTables and are not considered by
iterator.
20
RocksDB Incremental Checkpoint
▪ Flush Memtable.
▪ For all SSTable files in local working
directory: upload new files since last
checkpoint to DFS, re-reference other files.
▪ Do reference counting on JobManager for
uploaded files.
21
Incremental Checkpoints - Example
▪ Illustrate interactions (simplified) between:
▪ Local RocksDB instance
▪ Remote Checkpoint Directory (DFS)
▪ Job Manager (Checkpoint Coordinator)
▪ In this example: 2 latest checkpoints retained
22
sstable-(1,2,3)
sstable-(1) sstable-(2)
sstable-(1) sstable-(2) sstable-(3) sstable-(4)
sstable-(4) sstable-(5)
CP 1
CP 2
CP 3
merge
sstable-(1,2,3) sstable-(4,5,6)CP 4
merge
+sstable-(6)
sstable-(3)
sstable-(4)
sstable-(1)
sstable-(2)
sstable-(1,2,3)
sstable-(5)
sstable-(4,5,6)
CP 1
CP 2CP 2
CP 3
CP 4
Incremental Checkpoints - Example
TaskManager Local RocksDB Working Directory DFS Upload Job Manager
sstable-(1,2,3)
sstable-(1) sstable-(2)
sstable-(1) sstable-(2) sstable-(3) sstable-(4)
sstable-(4) sstable-(5)
CP 1
CP 2
CP 3
merge
sstable-(1,2,3) sstable-(4,5,6)CP 4
merge
+sstable-(6)
sstable-(3)
sstable-(4)
sstable-(1)
sstable-(2)
sstable-(1,2,3)
sstable-(5)
sstable-(4,5,6)
CP 1
CP 2CP 2
CP 3
CP 4
Incremental Checkpoints - Example
TaskManager Local RocksDB Working Directory DFS Upload Job Manager
sstable-(1,2,3)
sstable-(1) sstable-(2)
sstable-(1) sstable-(2) sstable-(3) sstable-(4)
sstable-(4) sstable-(5)
CP 1
CP 2
CP 3
merge
sstable-(1,2,3) sstable-(4,5,6)CP 4
merge
+sstable-(6)
sstable-(3)
sstable-(4)
sstable-(1)
sstable-(2)
sstable-(1,2,3)
sstable-(5)
sstable-(4,5,6)
CP 1
CP 2CP 2
CP 3
CP 4
Incremental Checkpoints - Example
TaskManager Local RocksDB Working Directory DFS Upload Job Manager
Incremental Checkpoints - Example
sstable-(1,2,3)
sstable-(1) sstable-(2)
sstable-(1) sstable-(2) sstable-(3) sstable-(4)
sstable-(4) sstable-(5)
CP 1
CP 2
CP 3
sstable-(1,2,3) sstable-(4,5,6)CP 4
merge
+sstable-(6)
sstable-(3)
sstable-(4)
sstable-(1)
sstable-(2)
sstable-(1,2,3)
sstable-(5)
sstable-(4,5,6)
CP 1
CP 2CP 2
CP 3
CP 4
TaskManager Local RocksDB Working Directory DFS Upload Job Manager
Incremental Checkpoints - Example
sstable-(1,2,3)
sstable-(1) sstable-(2)
sstable-(1) sstable-(2) sstable-(3) sstable-(4)
sstable-(4) sstable-(5)
CP 1
CP 2
CP 3
merge
sstable-(1,2,3) sstable-(4,5,6)CP 4
+sstable-(6)
sstable-(3)
sstable-(4)
sstable-(1)
sstable-(2)
sstable-(1,2,3)
sstable-(5)
sstable-(4,5,6)
CP 1
CP 2CP 2
CP 3
CP 4
TaskManager Local RocksDB Working Directory DFS Upload Job Manager
Incremental Checkpoints - Example
sstable-(1,2,3)
sstable-(1) sstable-(2)
sstable-(1) sstable-(2) sstable-(3) sstable-(4)
sstable-(4) sstable-(5)
CP 1
CP 2
CP 3
merge
sstable-(1,2,3) sstable-(4,5,6)CP 4
+sstable-(6)
sstable-(3)
sstable-(4)
sstable-(1)
sstable-(2)
sstable-(1,2,3)
sstable-(5)
sstable-(4,5,6)
CP 1
CP 2CP 2
CP 3
CP 4
TaskManager Local RocksDB Working Directory DFS Upload Job Manager
Incremental Checkpoints - Example
29
sstable-(1,2,3)
sstable-(1) sstable-(2)
sstable-(1) sstable-(2) sstable-(3) sstable-(4)
sstable-(4) sstable-(5)
CP 1
CP 2
CP 3
merge
sstable-(1,2,3) sstable-(4,5,6)CP 4
merge
+sstable-(6)
sstable-(3)
sstable-(4)
sstable-(1)
sstable-(2)
sstable-(1,2,3)
sstable-(5)
sstable-(4,5,6)
CP 1
CP 2CP 2
CP 3
CP 4
TaskManager Local RocksDB Working Directory DFS Upload Job Manager
Incremental Checkpoints - Example
30
sstable-(1,2,3)
sstable-(1) sstable-(2)
sstable-(1) sstable-(2) sstable-(3) sstable-(4)
sstable-(4) sstable-(5)
CP 1
CP 2
CP 3
merge
sstable-(1,2,3) sstable-(4,5,6)CP 4
merge
+sstable-(6)
sstable-(3)
sstable-(4)
sstable-(1)
sstable-(2)
sstable-(1,2,3)
sstable-(5)
sstable-(4,5,6)
CP 1
CP 2CP 2
CP 3
CP 4
TaskManager Local RocksDB Working Directory DFS Upload Job Manager
Summary RocksDB
▪ Asynchronous snapshots „for free“, immutable
copy is already there.
▪ Trivially supports multiple concurrent snapshots.
▪ Detection of incremental changes easy: observe
creation and deletion of SSTables.
▪ Bounded checkpoint history through compaction.
31
Asynchronous Checkpoints
with the Heap Keyed-State Backend
32
Heap Backend: Chained Hash Map
33
S1
K1
Entry
S2
K2
Entry
next
Map<K, S> {
Entry<K, S>[] table;
}
Entry<K, S> {
final K key;
S state;
Entry next;
}
Map<K, S>
34
S1
K1
Entry
S2
K2
Entry
next
Map<K, S> {
Entry<K, S>[] table;
}
Entry<K, S> {
final K key;
S state;
Entry next;
}
Map<K, S>
Heap Backend: Chained Hash Map
35
S1
K1
Entry
S2
K2
Entry
next
Map<K, S> {
Entry<K, S>[] table;
}
Entry<K, S> {
final K key;
S state;
Entry next;
}
Map<K, S>
Heap Backend: Chained Hash Map
36
S1
K1
Entry
S2
K2
Entry
next
Map<K, S> {
Entry<K, S>[] table;
}
Entry<K, S> {
final K key;
S state;
Entry next;
}
Map<K, S>
Heap Backend: Chained Hash Map
37
S1
K1
Entry
S2
K2
Entry
next
Map<K, S> {
Entry<K, S>[] table;
}
Entry<K, S> {
final K key;
S state;
Entry next;
}
Map<K, S>
„Structural
changes to map“
(easy to detect)
„User modifies
state objects“
(hard to
detect)
Heap Backend: Chained Hash Map
Copy-on-Write Hash Map
38
S1
K1
Entry
S2
K2
Entry
next
Map<K, S>
„Structural
changes to map“
(easy to detect)
„User modifies
state objects“
(hard to
detect)
Map<K, S> {
Entry<K, S>[] table;
int mapVersion;
int requiredVersion;
OrderedSet<Integer> snapshots;
}
Entry<K, S> {
final K key;
S state;
Entry next;
int stateVersion;
int entryVersion;
}
Copy-on-Write Hash Map - Snapshots
39
Map<K, S> {
Entry<K, S>[] table;
int mapVersion;
int requiredVersion;
OrderedSet<Integer> snapshots;
}
Entry<K, S> {
final K key;
S state;
Entry next;
int stateVersion;
int entryVersion;
}
Create Snapshot:
1. Flat array-copy of table array
2. snapshots.add(mapVersion);
3. ++mapVersion;
4. requiredVersion = mapVersion;
Release Snapshot:
1. snapshots.remove(snapVersion);
2. requiredVersion = snapshots.getMax();
Copy-on-Write Hash Map - 2 Golden Rules
1.Whenever a map entry e is modified and e.entryVersion <
map.requiredVersion, first copy the entry and redirect pointers that are
reachable through snapshot to the copy. Set e.entryVersion =
map.mapVersion. Pointer redirection can trigger recursive application of
rule 1 to other entries.
2.Before returning the state object s of entry e to a caller, if e.stateVersion
< map.requiredVersion, create a deep copy of s and redirect the
pointer in e to the copy. Set e.stateVersion = map.mapVersion. Then
return the copy to the caller. Applying rule 2 can trigger rule 1.
40
Copy-on-Write Hash Map - Example
41
sv:0
ev:0
sv:0
ev:0
K:42 S:7
K:23 S:3
mapVersion: 0
requiredVersion: 0
Copy-on-Write Hash Map - Example
42
sv:0
ev:0
sv:0
ev:0
K:42 S:7
K:23 S:3
snapshotVersion: 0
mapVersion: 1
requiredVersion: 1
snapshot() -> 0
sv:0
ev:0
sv:0
ev:0
S:7
S:3
Copy-on-Write Hash Map - Example
43
K:42
K:23
sv:1
ev:1
K:13 S:2
snapshotVersion: 0
mapVersion: 1
requiredVersion: 1
put(13, 2)
sv:0
ev:0
sv:0
ev:0
S:7
S:3
Copy-on-Write Hash Map - Example
44
K:42
K:23
K:13
sv:1
ev:1
S:2
snapshotVersion: 0
mapVersion: 1
requiredVersion: 1
get(23)
sv:0
ev:0
sv:0
ev:0
S:7
S:3
Copy-on-Write Hash Map - Example
45
K:42
K:23
K:13
sv:1
ev:1
S:2
snapshotVersion: 0
mapVersion: 1
requiredVersion: 1
get(23)
Caller could modify state object!
sv:0
ev:0
sv:0
ev:0
S:7
S:3
Copy-on-Write Hash Map - Example
46
K:42
K:23 S:3
K:13
sv:1
ev:1
S:2
snapshotVersion: 0
mapVersion: 1
requiredVersion: 1
get(23)
Trigger copy.
How to connect?
?
sv:0
ev:0
sv:0
ev:0
S:7
S:3
Copy-on-Write Hash Map - Example
47
K:42
K:23 S:3
sv:1
ev:1
K:13
sv:1
ev:1
S:2
snapshotVersion: 0
mapVersion: 1
requiredVersion: 1
get(23)
Copy entry.
How to connect?
?
sv:0
ev:0
sv:0
ev:0
S:7
S:3K:23 S:3
sv:1
ev:1
Copy-on-Write Hash Map - Example
48
K:42K:13
sv:1
ev:1
S:2
sv:0
ev:1
snapshotVersion: 0
mapVersion: 1
requiredVersion: 1
get(23)
Copy entry (recursive).
Connect?
sv:0
ev:0
sv:0
ev:0
S:7
S:3 S:3
sv:1
ev:1
K:23
Copy-on-Write Hash Map - Example
49
K:42K:13
sv:1
ev:1
S:2
sv:0
ev:1
snapshotVersion: 0
mapVersion: 1
requiredVersion: 1
get(23)
S:3
sv:1
ev:1
sv:0
ev:0
sv:0
ev:0
S:7
S:3K:23
Copy-on-Write Hash Map - Example
50
K:42K:13
sv:1
ev:1
S:2
sv:0
ev:1
snapshotVersion: 0
mapVersion: 1
requiredVersion: 1
get(42)
S:3
sv:1
ev:1
sv:0
ev:0
sv:0
ev:0
S:7
S:3K:23
Copy-on-Write Hash Map - Example
51
K:42K:13
sv:1
ev:1
S:2
sv:0
ev:1
snapshotVersion: 0
mapVersion: 1
requiredVersion: 1
get(42)
Copy required!
S:3
sv:1
ev:1
sv:0
ev:0
sv:0
ev:0
S:7
S:3K:23
Copy-on-Write Hash Map - Example
52
K:42
S:7
K:13
sv:1
ev:1
S:2
sv:1
ev:1
snapshotVersion: 0
mapVersion: 1
requiredVersion: 1
get(42)
K:23 S:3
sv:1
ev:1
Copy-on-Write Hash Map - Example
53
K:42
sv:1
ev:1
S:7
K:13
sv:1
ev:1
S:2
mapVersion: 1
requiredVersion: 0
release(0)
K:23 S:3
sv:1
ev:1
Copy-on-Write Hash Map - Example
54
sv:1
ev:1
S:2 K:42
sv:1
ev:1
S:7
K:13
snapshotVersion: 1
mapVersion: 2
requiredVersion: 2
snapshot() -> 1
K:23 S:3
sv:1
ev:1
Copy-on-Write Hash Map - Example
55
sv:1
ev:1
S:2 K:42
sv:1
ev:1
S:7
K:13
snapshotVersion: 1
mapVersion: 2
requiredVersion: 2
remove(23)
Change next-pointer?
K:23 S:3
sv:1
ev:1
Copy-on-Write Hash Map - Example
56
sv:1
ev:1
S:2 K:42
sv:1
ev:1
S:7
K:13
snapshotVersion: 1
mapVersion: 2
requiredVersion: 2
remove(23)
Change next-pointer?
sv:1
ev:2
K:23 S:3
sv:1
ev:1
Copy-on-Write Hash Map - Example
57
sv:1
ev:1
S:2 K:42
sv:1
ev:1
S:7
K:13
snapshotVersion: 1
mapVersion: 2
requiredVersion: 2
remove(23)
sv:1
ev:2
Copy-on-Write Hash Map - Example
58
K:42
S:7
K:13
sv:1
ev:1
S:2
mapVersion: 2
requiredVersion: 0
release(1)
Copy-on-Write Hash Map Benefits
▪ Fine granular, lazy copy-on-write.
▪ At most one copy per object, per snapshot.
▪ No thread synchronisation between readers and writer
after array-copy step.
▪ Supports multiple concurrent snapshots.
▪ Also implements efficient incremental rehashing.
▪ Future: Could serve as basis for incremental snapshots.
59
Questions?
60

Weitere ähnliche Inhalte

Was ist angesagt?

Spotify's Music Recommendations Lambda Architecture
Spotify's Music Recommendations Lambda ArchitectureSpotify's Music Recommendations Lambda Architecture
Spotify's Music Recommendations Lambda ArchitectureEsh Vckay
 
Oracle Latch and Mutex Contention Troubleshooting
Oracle Latch and Mutex Contention TroubleshootingOracle Latch and Mutex Contention Troubleshooting
Oracle Latch and Mutex Contention TroubleshootingTanel Poder
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkFlink Forward
 
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache FlinkTzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache FlinkVerverica
 
Performance Tuning RocksDB for Kafka Streams’ State Stores
Performance Tuning RocksDB for Kafka Streams’ State StoresPerformance Tuning RocksDB for Kafka Streams’ State Stores
Performance Tuning RocksDB for Kafka Streams’ State Storesconfluent
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large ScaleVerverica
 
Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 2
Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 2Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 2
Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 2Tanel Poder
 
How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)DataStax Academy
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingTill Rohrmann
 
Scylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and Beyond
Scylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and BeyondScylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and Beyond
Scylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and BeyondScyllaDB
 
The Log of All Logs: Raft-based Consensus Inside Kafka | Guozhang Wang, Confl...
The Log of All Logs: Raft-based Consensus Inside Kafka | Guozhang Wang, Confl...The Log of All Logs: Raft-based Consensus Inside Kafka | Guozhang Wang, Confl...
The Log of All Logs: Raft-based Consensus Inside Kafka | Guozhang Wang, Confl...HostedbyConfluent
 
[211] HBase 기반 검색 데이터 저장소 (공개용)
[211] HBase 기반 검색 데이터 저장소 (공개용)[211] HBase 기반 검색 데이터 저장소 (공개용)
[211] HBase 기반 검색 데이터 저장소 (공개용)NAVER D2
 
Running Flink in Production: The good, The bad and The in Between - Lakshmi ...
Running Flink in Production:  The good, The bad and The in Between - Lakshmi ...Running Flink in Production:  The good, The bad and The in Between - Lakshmi ...
Running Flink in Production: The good, The bad and The in Between - Lakshmi ...Flink Forward
 
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17Gwen (Chen) Shapira
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeFlink Forward
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkDataWorks Summit
 
The Flux Capacitor of Kafka Streams and ksqlDB (Matthias J. Sax, Confluent) K...
The Flux Capacitor of Kafka Streams and ksqlDB (Matthias J. Sax, Confluent) K...The Flux Capacitor of Kafka Streams and ksqlDB (Matthias J. Sax, Confluent) K...
The Flux Capacitor of Kafka Streams and ksqlDB (Matthias J. Sax, Confluent) K...HostedbyConfluent
 
Apache BookKeeper State Store: A Durable Key-Value Store - Pulsar Summit NA 2021
Apache BookKeeper State Store: A Durable Key-Value Store - Pulsar Summit NA 2021Apache BookKeeper State Store: A Durable Key-Value Store - Pulsar Summit NA 2021
Apache BookKeeper State Store: A Durable Key-Value Store - Pulsar Summit NA 2021StreamNative
 
Apache Flink and what it is used for
Apache Flink and what it is used forApache Flink and what it is used for
Apache Flink and what it is used forAljoscha Krettek
 
Introducing KRaft: Kafka Without Zookeeper With Colin McCabe | Current 2022
Introducing KRaft: Kafka Without Zookeeper With Colin McCabe | Current 2022Introducing KRaft: Kafka Without Zookeeper With Colin McCabe | Current 2022
Introducing KRaft: Kafka Without Zookeeper With Colin McCabe | Current 2022HostedbyConfluent
 

Was ist angesagt? (20)

Spotify's Music Recommendations Lambda Architecture
Spotify's Music Recommendations Lambda ArchitectureSpotify's Music Recommendations Lambda Architecture
Spotify's Music Recommendations Lambda Architecture
 
Oracle Latch and Mutex Contention Troubleshooting
Oracle Latch and Mutex Contention TroubleshootingOracle Latch and Mutex Contention Troubleshooting
Oracle Latch and Mutex Contention Troubleshooting
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache FlinkTzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
 
Performance Tuning RocksDB for Kafka Streams’ State Stores
Performance Tuning RocksDB for Kafka Streams’ State StoresPerformance Tuning RocksDB for Kafka Streams’ State Stores
Performance Tuning RocksDB for Kafka Streams’ State Stores
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large Scale
 
Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 2
Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 2Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 2
Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 2
 
How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processing
 
Scylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and Beyond
Scylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and BeyondScylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and Beyond
Scylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and Beyond
 
The Log of All Logs: Raft-based Consensus Inside Kafka | Guozhang Wang, Confl...
The Log of All Logs: Raft-based Consensus Inside Kafka | Guozhang Wang, Confl...The Log of All Logs: Raft-based Consensus Inside Kafka | Guozhang Wang, Confl...
The Log of All Logs: Raft-based Consensus Inside Kafka | Guozhang Wang, Confl...
 
[211] HBase 기반 검색 데이터 저장소 (공개용)
[211] HBase 기반 검색 데이터 저장소 (공개용)[211] HBase 기반 검색 데이터 저장소 (공개용)
[211] HBase 기반 검색 데이터 저장소 (공개용)
 
Running Flink in Production: The good, The bad and The in Between - Lakshmi ...
Running Flink in Production:  The good, The bad and The in Between - Lakshmi ...Running Flink in Production:  The good, The bad and The in Between - Lakshmi ...
Running Flink in Production: The good, The bad and The in Between - Lakshmi ...
 
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
 
The Flux Capacitor of Kafka Streams and ksqlDB (Matthias J. Sax, Confluent) K...
The Flux Capacitor of Kafka Streams and ksqlDB (Matthias J. Sax, Confluent) K...The Flux Capacitor of Kafka Streams and ksqlDB (Matthias J. Sax, Confluent) K...
The Flux Capacitor of Kafka Streams and ksqlDB (Matthias J. Sax, Confluent) K...
 
Apache BookKeeper State Store: A Durable Key-Value Store - Pulsar Summit NA 2021
Apache BookKeeper State Store: A Durable Key-Value Store - Pulsar Summit NA 2021Apache BookKeeper State Store: A Durable Key-Value Store - Pulsar Summit NA 2021
Apache BookKeeper State Store: A Durable Key-Value Store - Pulsar Summit NA 2021
 
Apache Flink and what it is used for
Apache Flink and what it is used forApache Flink and what it is used for
Apache Flink and what it is used for
 
Introducing KRaft: Kafka Without Zookeeper With Colin McCabe | Current 2022
Introducing KRaft: Kafka Without Zookeeper With Colin McCabe | Current 2022Introducing KRaft: Kafka Without Zookeeper With Colin McCabe | Current 2022
Introducing KRaft: Kafka Without Zookeeper With Colin McCabe | Current 2022
 

Ähnlich wie Flink Forward Berlin 2017: Stefan Richter - A look at Flink's internal data structures and algorithms for efficient checkpointing

State Management in Apache Flink : Consistent Stateful Distributed Stream Pro...
State Management in Apache Flink : Consistent Stateful Distributed Stream Pro...State Management in Apache Flink : Consistent Stateful Distributed Stream Pro...
State Management in Apache Flink : Consistent Stateful Distributed Stream Pro...Paris Carbone
 
W1.1 i os in database
W1.1   i os in databaseW1.1   i os in database
W1.1 i os in databasegafurov_x
 
Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...
Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...
Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...Flink Forward
 
Cassandra and Solid State Drives
Cassandra and Solid State DrivesCassandra and Solid State Drives
Cassandra and Solid State DrivesRick Branson
 
Paper_Scalable database logging for multicores
Paper_Scalable database logging for multicoresPaper_Scalable database logging for multicores
Paper_Scalable database logging for multicoresHyo jeong Lee
 
Optimizing RocksDB for Open-Channel SSDs
Optimizing RocksDB for Open-Channel SSDsOptimizing RocksDB for Open-Channel SSDs
Optimizing RocksDB for Open-Channel SSDsJavier González
 
Apache Flink: Better, Faster & Uncut - Piotr Nowojski, data Artisans
Apache Flink: Better, Faster & Uncut - Piotr Nowojski, data ArtisansApache Flink: Better, Faster & Uncut - Piotr Nowojski, data Artisans
Apache Flink: Better, Faster & Uncut - Piotr Nowojski, data ArtisansEvention
 
Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...
Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...
Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...Flink Forward
 
RocksDB Performance and Reliability Practices
RocksDB Performance and Reliability PracticesRocksDB Performance and Reliability Practices
RocksDB Performance and Reliability PracticesYoshinori Matsunobu
 
Cassandra at talkbits
Cassandra at talkbitsCassandra at talkbits
Cassandra at talkbitsMax Alexejev
 
Apache Cassandra at TalkBits
Apache Cassandra at TalkBitsApache Cassandra at TalkBits
Apache Cassandra at TalkBitsDataStax Academy
 
Flink Forward SF 2017: Stefan Richter - Improvements for large state and reco...
Flink Forward SF 2017: Stefan Richter - Improvements for large state and reco...Flink Forward SF 2017: Stefan Richter - Improvements for large state and reco...
Flink Forward SF 2017: Stefan Richter - Improvements for large state and reco...Flink Forward
 
Creating a Benchmarking Infrastructure That Just Works
Creating a Benchmarking Infrastructure That Just WorksCreating a Benchmarking Infrastructure That Just Works
Creating a Benchmarking Infrastructure That Just WorksTim Callaghan
 
Ugif 10 2012 beauty ofifmxdiskstructs ugif
Ugif 10 2012 beauty ofifmxdiskstructs ugifUgif 10 2012 beauty ofifmxdiskstructs ugif
Ugif 10 2012 beauty ofifmxdiskstructs ugifUGIF
 
Realtime olap architecture in apache kylin 3.0
Realtime olap architecture in apache kylin 3.0Realtime olap architecture in apache kylin 3.0
Realtime olap architecture in apache kylin 3.0Shi Shao Feng
 
Deep Dive on the Amazon Aurora MySQL-compatible Edition - DAT301 - re:Invent ...
Deep Dive on the Amazon Aurora MySQL-compatible Edition - DAT301 - re:Invent ...Deep Dive on the Amazon Aurora MySQL-compatible Edition - DAT301 - re:Invent ...
Deep Dive on the Amazon Aurora MySQL-compatible Edition - DAT301 - re:Invent ...Amazon Web Services
 

Ähnlich wie Flink Forward Berlin 2017: Stefan Richter - A look at Flink's internal data structures and algorithms for efficient checkpointing (20)

State Management in Apache Flink : Consistent Stateful Distributed Stream Pro...
State Management in Apache Flink : Consistent Stateful Distributed Stream Pro...State Management in Apache Flink : Consistent Stateful Distributed Stream Pro...
State Management in Apache Flink : Consistent Stateful Distributed Stream Pro...
 
W1.1 i os in database
W1.1   i os in databaseW1.1   i os in database
W1.1 i os in database
 
RocksDB meetup
RocksDB meetupRocksDB meetup
RocksDB meetup
 
Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...
Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...
Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...
 
Cassandra and Solid State Drives
Cassandra and Solid State DrivesCassandra and Solid State Drives
Cassandra and Solid State Drives
 
Paper_Scalable database logging for multicores
Paper_Scalable database logging for multicoresPaper_Scalable database logging for multicores
Paper_Scalable database logging for multicores
 
Optimizing RocksDB for Open-Channel SSDs
Optimizing RocksDB for Open-Channel SSDsOptimizing RocksDB for Open-Channel SSDs
Optimizing RocksDB for Open-Channel SSDs
 
Apache Flink: Better, Faster & Uncut - Piotr Nowojski, data Artisans
Apache Flink: Better, Faster & Uncut - Piotr Nowojski, data ArtisansApache Flink: Better, Faster & Uncut - Piotr Nowojski, data Artisans
Apache Flink: Better, Faster & Uncut - Piotr Nowojski, data Artisans
 
Rds data lake @ Robinhood
Rds data lake @ Robinhood Rds data lake @ Robinhood
Rds data lake @ Robinhood
 
Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...
Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...
Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str...
 
RocksDB Performance and Reliability Practices
RocksDB Performance and Reliability PracticesRocksDB Performance and Reliability Practices
RocksDB Performance and Reliability Practices
 
Cassandra at talkbits
Cassandra at talkbitsCassandra at talkbits
Cassandra at talkbits
 
Apache Cassandra at TalkBits
Apache Cassandra at TalkBitsApache Cassandra at TalkBits
Apache Cassandra at TalkBits
 
Flink Forward SF 2017: Stefan Richter - Improvements for large state and reco...
Flink Forward SF 2017: Stefan Richter - Improvements for large state and reco...Flink Forward SF 2017: Stefan Richter - Improvements for large state and reco...
Flink Forward SF 2017: Stefan Richter - Improvements for large state and reco...
 
MyRocks Deep Dive
MyRocks Deep DiveMyRocks Deep Dive
MyRocks Deep Dive
 
Creating a Benchmarking Infrastructure That Just Works
Creating a Benchmarking Infrastructure That Just WorksCreating a Benchmarking Infrastructure That Just Works
Creating a Benchmarking Infrastructure That Just Works
 
Ugif 10 2012 beauty ofifmxdiskstructs ugif
Ugif 10 2012 beauty ofifmxdiskstructs ugifUgif 10 2012 beauty ofifmxdiskstructs ugif
Ugif 10 2012 beauty ofifmxdiskstructs ugif
 
Realtime olap architecture in apache kylin 3.0
Realtime olap architecture in apache kylin 3.0Realtime olap architecture in apache kylin 3.0
Realtime olap architecture in apache kylin 3.0
 
Deep Dive on the Amazon Aurora MySQL-compatible Edition - DAT301 - re:Invent ...
Deep Dive on the Amazon Aurora MySQL-compatible Edition - DAT301 - re:Invent ...Deep Dive on the Amazon Aurora MySQL-compatible Edition - DAT301 - re:Invent ...
Deep Dive on the Amazon Aurora MySQL-compatible Edition - DAT301 - re:Invent ...
 
Lec7
Lec7Lec7
Lec7
 

Mehr von Flink Forward

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Flink Forward
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...Flink Forward
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Flink Forward
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorFlink Forward
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkFlink Forward
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink Forward
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentFlink Forward
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022Flink Forward
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink Forward
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsFlink Forward
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotFlink Forward
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesFlink Forward
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Flink Forward
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergFlink Forward
 
Welcome to the Flink Community!
Welcome to the Flink Community!Welcome to the Flink Community!
Welcome to the Flink Community!Flink Forward
 
Practical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobsPractical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobsFlink Forward
 
Extending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesExtending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesFlink Forward
 
The top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scaleThe top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scaleFlink Forward
 
Using Queryable State for Fun and Profit
Using Queryable State for Fun and ProfitUsing Queryable State for Fun and Profit
Using Queryable State for Fun and ProfitFlink Forward
 
Changelog Stream Processing with Apache Flink
Changelog Stream Processing with Apache FlinkChangelog Stream Processing with Apache Flink
Changelog Stream Processing with Apache FlinkFlink Forward
 

Mehr von Flink Forward (20)

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async Sink
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easy
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial Services
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
 
Welcome to the Flink Community!
Welcome to the Flink Community!Welcome to the Flink Community!
Welcome to the Flink Community!
 
Practical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobsPractical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobs
 
Extending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesExtending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use cases
 
The top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scaleThe top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scale
 
Using Queryable State for Fun and Profit
Using Queryable State for Fun and ProfitUsing Queryable State for Fun and Profit
Using Queryable State for Fun and Profit
 
Changelog Stream Processing with Apache Flink
Changelog Stream Processing with Apache FlinkChangelog Stream Processing with Apache Flink
Changelog Stream Processing with Apache Flink
 

Kürzlich hochgeladen

Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...amitlee9823
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...amitlee9823
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxolyaivanovalion
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsJoseMangaJr1
 

Kürzlich hochgeladen (20)

Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptx
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 

Flink Forward Berlin 2017: Stefan Richter - A look at Flink's internal data structures and algorithms for efficient checkpointing

  • 1. 1 Stefan Richter
 @StefanRRichter
 
 September 13, 2017 A Look at Flink’s Internal Data Structures for Efficient Checkpointing
  • 2. Flink State and Distributed Snapshots 2 State Backend Stateful
 Operation Source Event
  • 3. 3 Trigger checkpoint Inject checkpoint barrier Stateful
 Operation Source Flink State and Distributed Snapshots „Asynchronous Barrier Snapshotting“
  • 4. 4 Take state snapshot Synchronously trigger state snapshot (e.g. copy-on-write) Flink State and Distributed Snapshots Stateful
 Operation Source „Asynchronous Barrier Snapshotting“
  • 5. 5 DFS Processing pipeline continues Durably persist
 full snapshots
 asynchronously For each state backend: one writer, n reader Flink State and Distributed Snapshots Stateful
 Operation Source
  • 6. Challenges for Data Structures ▪ Asynchronous Checkpoints: ▪ Minimize pipeline stall time while taking the snapshot. ▪ Keep overhead (memory, CPU,…) as low as possible while writing the snapshot. ▪ Support multiple parallel checkpoints. 6
  • 7. 7 DFS Durably persist
 incremental snapshots
 asynchronously Processing pipeline continues ΔΔ (differences from previous snapshots) Flink State and Distributed Snapshots Stateful
 Operation Source
  • 8. Full Snapshots 8 K S 2 B 4 W 6 N K S 2 B 3 K 4 L 6 N K S 2 Q 3 K 6 N 9 S K S 2 B 4 W 6 N K S 2 B 3 K 4 L 6 N K S 2 Q 3 K 6 N 9 S Checkpoint 1 Checkpoint 2 Checkpoint 3 time State Backend (Task-local) Checkpoint Directory (DFS) Updates Updates
  • 9. Checkpoint Directory (DFS) Incremental Snapshots 9 K S 2 B 4 W 6 N K S 2 B 3 K 4 L 6 N K S 2 Q 3 K 6 N 9 S K S 2 B 4 W 6 N K S 3 K 4 L K S 2 Q 4 - 9 S iCheckpoint 1 iCheckpoint 2 iCheckpoint 3 Δ(-, icp1) Δ(icp1, icp2) Δ(icp2, icp3) time State Backend (Task-local) Updates Updates
  • 10. 5 TB 10 TB 15 TB Incremental Full Incremental vs Full Snapshots 10 Checkpoint Duration State Size
  • 11. State Backend (Task-local) Incremental Recovery 11 K S 2 B 4 W 6 N K S 2 B 3 K 4 L 6 N K S 2 Q 3 K 6 N 9 S K S 2 B 4 W 6 N K S 3 K 4 L K S 2 Q 4 - 9 S time Recovery? + + iCheckpoint 1 iCheckpoint 2 iCheckpoint 3 Δ(-, icp1) Δ(icp1, icp2) Δ(icp2, icp3) Checkpoint Directory (DFS)
  • 12. Challenges for Data Structures ▪ Incremental checkpoints: ▪ Efficiently detect the (minimal) set of state changes between two checkpoints. ▪ Avoid unbounded checkpoint history. 12
  • 13. Two Keyed State Backends 13 HeapKeyedStateBackend RocksDBKeyedStateBackend • State lives in memory, on Java heap. • Operates on objects. • Think of a hash map {key obj -> state obj}. • Goes through ser/de during snapshot/restore. • Async snapshots supported. • State lives in off-heap memory and on disk. • Operates on serialized bytes. • Think of K/V store {key bytes -> state bytes}. • Based on log-structured-merge (LSM) tree. • Goes through ser/de for each state access. • Async and incremental snapshots.
  • 14. Asynchronous and Incremental Checkpoints with the RocksDB Keyed-State Backend 14
  • 15. What is RocksDB ▪ Ordered Key/Value store (bytes). ▪ Based on LSM trees. ▪ Embedded, written in C++, used via JNI. ▪ Write optimized (all bulk sequential), with acceptable read performance. 15
  • 16. RocksDB Architecture (simplified) 16 Memory Local Disk K: 2 V: T K: 8 V: W Memtable - Mutable Memory Buffer for K/V Pairs (bytes) - All reads / writes go here first - Aggregating (overrides same unique keys) - Asynchronously flushed to disk when full
  • 17. RocksDB Architecture (simplified) 17 Memory Local Disk K: 4 V: Q Memtable K: 2 V: T K: 8 V: WIdx SSTable-(1) BF-1 - Flushed Memtable becomes immutable - Sorted By Key (unique) - Read: first check Memtable, then SSTable - Bloomfilter & Index as optimisations
  • 18. RocksDB Architecture (simplified) 18 Memory Local Disk K: 2 V: A K: 5 V: N Memtable K: 4 V: Q K: 7 V: SIdx SSTable-(2) K: 2 V: T K: 8 V: WIdx SSTable-(1) K: 2 V: C K: 7 -Idx SSTable-(3) BF-1 BF-2 BF-3 - Deletes are explicit entries - Natural support for snapshots - Iteration: online merge - SSTables can accumulate
  • 19. RocksDB Compaction (simplified) 19 K: 2 V: C K: 7 - K: 4 V: Q K: 7 V: S K: 2 V: T K: 8 V: W K: 2 V: C K: 4 V: Q K: 8 V: W multiway sorted merge „Log Compaction“ SSTable-(3) SSTable-(1)SSTable-(2) SSTable-(1, 2, 3) (rm)
  • 20. RocksDB Asynchronous Checkpoint ▪ Flush Memtable. (Simplified) ▪ Then create iterator over all current SSTables (they are immutable). ▪ New changes go to Memtable and future SSTables and are not considered by iterator. 20
  • 21. RocksDB Incremental Checkpoint ▪ Flush Memtable. ▪ For all SSTable files in local working directory: upload new files since last checkpoint to DFS, re-reference other files. ▪ Do reference counting on JobManager for uploaded files. 21
  • 22. Incremental Checkpoints - Example ▪ Illustrate interactions (simplified) between: ▪ Local RocksDB instance ▪ Remote Checkpoint Directory (DFS) ▪ Job Manager (Checkpoint Coordinator) ▪ In this example: 2 latest checkpoints retained 22
  • 23. sstable-(1,2,3) sstable-(1) sstable-(2) sstable-(1) sstable-(2) sstable-(3) sstable-(4) sstable-(4) sstable-(5) CP 1 CP 2 CP 3 merge sstable-(1,2,3) sstable-(4,5,6)CP 4 merge +sstable-(6) sstable-(3) sstable-(4) sstable-(1) sstable-(2) sstable-(1,2,3) sstable-(5) sstable-(4,5,6) CP 1 CP 2CP 2 CP 3 CP 4 Incremental Checkpoints - Example TaskManager Local RocksDB Working Directory DFS Upload Job Manager
  • 24. sstable-(1,2,3) sstable-(1) sstable-(2) sstable-(1) sstable-(2) sstable-(3) sstable-(4) sstable-(4) sstable-(5) CP 1 CP 2 CP 3 merge sstable-(1,2,3) sstable-(4,5,6)CP 4 merge +sstable-(6) sstable-(3) sstable-(4) sstable-(1) sstable-(2) sstable-(1,2,3) sstable-(5) sstable-(4,5,6) CP 1 CP 2CP 2 CP 3 CP 4 Incremental Checkpoints - Example TaskManager Local RocksDB Working Directory DFS Upload Job Manager
  • 25. sstable-(1,2,3) sstable-(1) sstable-(2) sstable-(1) sstable-(2) sstable-(3) sstable-(4) sstable-(4) sstable-(5) CP 1 CP 2 CP 3 merge sstable-(1,2,3) sstable-(4,5,6)CP 4 merge +sstable-(6) sstable-(3) sstable-(4) sstable-(1) sstable-(2) sstable-(1,2,3) sstable-(5) sstable-(4,5,6) CP 1 CP 2CP 2 CP 3 CP 4 Incremental Checkpoints - Example TaskManager Local RocksDB Working Directory DFS Upload Job Manager
  • 26. Incremental Checkpoints - Example sstable-(1,2,3) sstable-(1) sstable-(2) sstable-(1) sstable-(2) sstable-(3) sstable-(4) sstable-(4) sstable-(5) CP 1 CP 2 CP 3 sstable-(1,2,3) sstable-(4,5,6)CP 4 merge +sstable-(6) sstable-(3) sstable-(4) sstable-(1) sstable-(2) sstable-(1,2,3) sstable-(5) sstable-(4,5,6) CP 1 CP 2CP 2 CP 3 CP 4 TaskManager Local RocksDB Working Directory DFS Upload Job Manager
  • 27. Incremental Checkpoints - Example sstable-(1,2,3) sstable-(1) sstable-(2) sstable-(1) sstable-(2) sstable-(3) sstable-(4) sstable-(4) sstable-(5) CP 1 CP 2 CP 3 merge sstable-(1,2,3) sstable-(4,5,6)CP 4 +sstable-(6) sstable-(3) sstable-(4) sstable-(1) sstable-(2) sstable-(1,2,3) sstable-(5) sstable-(4,5,6) CP 1 CP 2CP 2 CP 3 CP 4 TaskManager Local RocksDB Working Directory DFS Upload Job Manager
  • 28. Incremental Checkpoints - Example sstable-(1,2,3) sstable-(1) sstable-(2) sstable-(1) sstable-(2) sstable-(3) sstable-(4) sstable-(4) sstable-(5) CP 1 CP 2 CP 3 merge sstable-(1,2,3) sstable-(4,5,6)CP 4 +sstable-(6) sstable-(3) sstable-(4) sstable-(1) sstable-(2) sstable-(1,2,3) sstable-(5) sstable-(4,5,6) CP 1 CP 2CP 2 CP 3 CP 4 TaskManager Local RocksDB Working Directory DFS Upload Job Manager
  • 29. Incremental Checkpoints - Example 29 sstable-(1,2,3) sstable-(1) sstable-(2) sstable-(1) sstable-(2) sstable-(3) sstable-(4) sstable-(4) sstable-(5) CP 1 CP 2 CP 3 merge sstable-(1,2,3) sstable-(4,5,6)CP 4 merge +sstable-(6) sstable-(3) sstable-(4) sstable-(1) sstable-(2) sstable-(1,2,3) sstable-(5) sstable-(4,5,6) CP 1 CP 2CP 2 CP 3 CP 4 TaskManager Local RocksDB Working Directory DFS Upload Job Manager
  • 30. Incremental Checkpoints - Example 30 sstable-(1,2,3) sstable-(1) sstable-(2) sstable-(1) sstable-(2) sstable-(3) sstable-(4) sstable-(4) sstable-(5) CP 1 CP 2 CP 3 merge sstable-(1,2,3) sstable-(4,5,6)CP 4 merge +sstable-(6) sstable-(3) sstable-(4) sstable-(1) sstable-(2) sstable-(1,2,3) sstable-(5) sstable-(4,5,6) CP 1 CP 2CP 2 CP 3 CP 4 TaskManager Local RocksDB Working Directory DFS Upload Job Manager
  • 31. Summary RocksDB ▪ Asynchronous snapshots „for free“, immutable copy is already there. ▪ Trivially supports multiple concurrent snapshots. ▪ Detection of incremental changes easy: observe creation and deletion of SSTables. ▪ Bounded checkpoint history through compaction. 31
  • 32. Asynchronous Checkpoints with the Heap Keyed-State Backend 32
  • 33. Heap Backend: Chained Hash Map 33 S1 K1 Entry S2 K2 Entry next Map<K, S> { Entry<K, S>[] table; } Entry<K, S> { final K key; S state; Entry next; } Map<K, S>
  • 34. 34 S1 K1 Entry S2 K2 Entry next Map<K, S> { Entry<K, S>[] table; } Entry<K, S> { final K key; S state; Entry next; } Map<K, S> Heap Backend: Chained Hash Map
  • 35. 35 S1 K1 Entry S2 K2 Entry next Map<K, S> { Entry<K, S>[] table; } Entry<K, S> { final K key; S state; Entry next; } Map<K, S> Heap Backend: Chained Hash Map
  • 36. 36 S1 K1 Entry S2 K2 Entry next Map<K, S> { Entry<K, S>[] table; } Entry<K, S> { final K key; S state; Entry next; } Map<K, S> Heap Backend: Chained Hash Map
  • 37. 37 S1 K1 Entry S2 K2 Entry next Map<K, S> { Entry<K, S>[] table; } Entry<K, S> { final K key; S state; Entry next; } Map<K, S> „Structural changes to map“ (easy to detect) „User modifies state objects“ (hard to detect) Heap Backend: Chained Hash Map
  • 38. Copy-on-Write Hash Map 38 S1 K1 Entry S2 K2 Entry next Map<K, S> „Structural changes to map“ (easy to detect) „User modifies state objects“ (hard to detect) Map<K, S> { Entry<K, S>[] table; int mapVersion; int requiredVersion; OrderedSet<Integer> snapshots; } Entry<K, S> { final K key; S state; Entry next; int stateVersion; int entryVersion; }
  • 39. Copy-on-Write Hash Map - Snapshots 39 Map<K, S> { Entry<K, S>[] table; int mapVersion; int requiredVersion; OrderedSet<Integer> snapshots; } Entry<K, S> { final K key; S state; Entry next; int stateVersion; int entryVersion; } Create Snapshot: 1. Flat array-copy of table array 2. snapshots.add(mapVersion); 3. ++mapVersion; 4. requiredVersion = mapVersion; Release Snapshot: 1. snapshots.remove(snapVersion); 2. requiredVersion = snapshots.getMax();
  • 40. Copy-on-Write Hash Map - 2 Golden Rules 1.Whenever a map entry e is modified and e.entryVersion < map.requiredVersion, first copy the entry and redirect pointers that are reachable through snapshot to the copy. Set e.entryVersion = map.mapVersion. Pointer redirection can trigger recursive application of rule 1 to other entries. 2.Before returning the state object s of entry e to a caller, if e.stateVersion < map.requiredVersion, create a deep copy of s and redirect the pointer in e to the copy. Set e.stateVersion = map.mapVersion. Then return the copy to the caller. Applying rule 2 can trigger rule 1. 40
  • 41. Copy-on-Write Hash Map - Example 41 sv:0 ev:0 sv:0 ev:0 K:42 S:7 K:23 S:3 mapVersion: 0 requiredVersion: 0
  • 42. Copy-on-Write Hash Map - Example 42 sv:0 ev:0 sv:0 ev:0 K:42 S:7 K:23 S:3 snapshotVersion: 0 mapVersion: 1 requiredVersion: 1 snapshot() -> 0
  • 43. sv:0 ev:0 sv:0 ev:0 S:7 S:3 Copy-on-Write Hash Map - Example 43 K:42 K:23 sv:1 ev:1 K:13 S:2 snapshotVersion: 0 mapVersion: 1 requiredVersion: 1 put(13, 2)
  • 44. sv:0 ev:0 sv:0 ev:0 S:7 S:3 Copy-on-Write Hash Map - Example 44 K:42 K:23 K:13 sv:1 ev:1 S:2 snapshotVersion: 0 mapVersion: 1 requiredVersion: 1 get(23)
  • 45. sv:0 ev:0 sv:0 ev:0 S:7 S:3 Copy-on-Write Hash Map - Example 45 K:42 K:23 K:13 sv:1 ev:1 S:2 snapshotVersion: 0 mapVersion: 1 requiredVersion: 1 get(23) Caller could modify state object!
  • 46. sv:0 ev:0 sv:0 ev:0 S:7 S:3 Copy-on-Write Hash Map - Example 46 K:42 K:23 S:3 K:13 sv:1 ev:1 S:2 snapshotVersion: 0 mapVersion: 1 requiredVersion: 1 get(23) Trigger copy. How to connect? ?
  • 47. sv:0 ev:0 sv:0 ev:0 S:7 S:3 Copy-on-Write Hash Map - Example 47 K:42 K:23 S:3 sv:1 ev:1 K:13 sv:1 ev:1 S:2 snapshotVersion: 0 mapVersion: 1 requiredVersion: 1 get(23) Copy entry. How to connect? ?
  • 48. sv:0 ev:0 sv:0 ev:0 S:7 S:3K:23 S:3 sv:1 ev:1 Copy-on-Write Hash Map - Example 48 K:42K:13 sv:1 ev:1 S:2 sv:0 ev:1 snapshotVersion: 0 mapVersion: 1 requiredVersion: 1 get(23) Copy entry (recursive). Connect?
  • 49. sv:0 ev:0 sv:0 ev:0 S:7 S:3 S:3 sv:1 ev:1 K:23 Copy-on-Write Hash Map - Example 49 K:42K:13 sv:1 ev:1 S:2 sv:0 ev:1 snapshotVersion: 0 mapVersion: 1 requiredVersion: 1 get(23)
  • 50. S:3 sv:1 ev:1 sv:0 ev:0 sv:0 ev:0 S:7 S:3K:23 Copy-on-Write Hash Map - Example 50 K:42K:13 sv:1 ev:1 S:2 sv:0 ev:1 snapshotVersion: 0 mapVersion: 1 requiredVersion: 1 get(42)
  • 51. S:3 sv:1 ev:1 sv:0 ev:0 sv:0 ev:0 S:7 S:3K:23 Copy-on-Write Hash Map - Example 51 K:42K:13 sv:1 ev:1 S:2 sv:0 ev:1 snapshotVersion: 0 mapVersion: 1 requiredVersion: 1 get(42) Copy required!
  • 52. S:3 sv:1 ev:1 sv:0 ev:0 sv:0 ev:0 S:7 S:3K:23 Copy-on-Write Hash Map - Example 52 K:42 S:7 K:13 sv:1 ev:1 S:2 sv:1 ev:1 snapshotVersion: 0 mapVersion: 1 requiredVersion: 1 get(42)
  • 53. K:23 S:3 sv:1 ev:1 Copy-on-Write Hash Map - Example 53 K:42 sv:1 ev:1 S:7 K:13 sv:1 ev:1 S:2 mapVersion: 1 requiredVersion: 0 release(0)
  • 54. K:23 S:3 sv:1 ev:1 Copy-on-Write Hash Map - Example 54 sv:1 ev:1 S:2 K:42 sv:1 ev:1 S:7 K:13 snapshotVersion: 1 mapVersion: 2 requiredVersion: 2 snapshot() -> 1
  • 55. K:23 S:3 sv:1 ev:1 Copy-on-Write Hash Map - Example 55 sv:1 ev:1 S:2 K:42 sv:1 ev:1 S:7 K:13 snapshotVersion: 1 mapVersion: 2 requiredVersion: 2 remove(23) Change next-pointer?
  • 56. K:23 S:3 sv:1 ev:1 Copy-on-Write Hash Map - Example 56 sv:1 ev:1 S:2 K:42 sv:1 ev:1 S:7 K:13 snapshotVersion: 1 mapVersion: 2 requiredVersion: 2 remove(23) Change next-pointer?
  • 57. sv:1 ev:2 K:23 S:3 sv:1 ev:1 Copy-on-Write Hash Map - Example 57 sv:1 ev:1 S:2 K:42 sv:1 ev:1 S:7 K:13 snapshotVersion: 1 mapVersion: 2 requiredVersion: 2 remove(23)
  • 58. sv:1 ev:2 Copy-on-Write Hash Map - Example 58 K:42 S:7 K:13 sv:1 ev:1 S:2 mapVersion: 2 requiredVersion: 0 release(1)
  • 59. Copy-on-Write Hash Map Benefits ▪ Fine granular, lazy copy-on-write. ▪ At most one copy per object, per snapshot. ▪ No thread synchronisation between readers and writer after array-copy step. ▪ Supports multiple concurrent snapshots. ▪ Also implements efficient incremental rehashing. ▪ Future: Could serve as basis for incremental snapshots. 59

Hinweis der Redaktion

  1. I would like to start with a very brief recap how state and checkpoints work in Flink to motivate the challenges the impose on the data structures of our backends.
  2. State backend is local to operator so that we can operate up to the speed of local memory variables. But how can we get a consistent snapshot of the distributed state?
  3. Users want to go to 10TB or even up to 50TB state! significant overhead! Beauty: self contained, can be easily deleted or recovered
  4. RocksDB is an embeddable persistent k/v store for fast storage. If user modifies state under key 42 3 times, only the latest change is reflected in memtable.
  5. Checkpoint coordinator lives in the JobMaster State backend working area is local memory / local disk Checkpointing area is DFS (stable, remote, replicated)
  6. FLUSH memtable as first step, so that all deltas are on disk SST files are immutable, their creation and deletion is the delta. we have a shared folder for the deltas (sstables) that belong to the operator instance
  7. Important new step, we can only reference successful incremental checkpoints in future checkpoints!
  8. New set of sst tables, 225 is missing, so is 227… we can assume the got merged and consolidated in 228 or 229.
  9. Think of events of user interactions, key is a user id and we store a click count for each user. We talk about keyed state.