The document describes how lock-free and wait-free in-memory algorithms can be used to improve the performance of high-volume data management. It discusses using multi-version concurrency control (MVCC) with immutable data and wait-free index scans to support concurrent transactions. It presents a hash table with a skip-list index in which inserts are lock-free, built on compare-and-swap operations, and finds are wait-free, while readers always see a consistent snapshot. Transaction commit involves prepare, commit, finish and publish phases, and obsolete versions are garbage-collected by a vacuum that uses epochs and deferred reclamation.
IMC Summit 2016 Breakout - Henning Andersen - Using Lock-free and Wait-free In-memory Algorithms to Turbo-charge High Volume Data Management
1. USING LOCK-FREE AND WAIT-FREE IN-MEMORY ALGORITHMS TO TURBO-CHARGE HIGH VOLUME DATA MANAGEMENT
HENNING ANDERSEN, STIBO SYSTEMS A/S
See all the presentations from the In-Memory Computing Summit at http://imcsummit.org
2. BIO
20 years of professional career at Stibo Systems A/S
Developed software for the last 30+ years
Technical lead on many projects, including:
Migrating from C++ to Java platform (performance & scalability)
Establishing a component platform
In-Memory component
32. SKIP LISTS – INSERTION
[Figure: skip list with head H and keys 10, 20, 30, 40, 50; key 15 is about to be inserted.]
Pick random height
33. SKIP LISTS – INSERTION RESULT
[Figure: the skip list with 15 linked in between 10 and 20.]
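"Pick random height" refers to the usual skip-list rule (from Pugh's original design) of drawing each new node's height from a geometric distribution. Below is a minimal single-threaded sketch, assuming a coin-flip probability of 1/2 and at most three levels (the later slides show index levels L0 to L2); the class and method names are illustrative, and concurrency is deliberately left out.

```java
import java.util.concurrent.ThreadLocalRandom;

public class SkipListInsertSketch {
    static final int MAX_LEVEL = 3; // assumption: levels L0..L2 as in the later slides

    static final class Node {
        final int key;
        final Node[] next; // next[i] = successor on level i
        Node(int key, int height) { this.key = key; this.next = new Node[height]; }
    }

    final Node head = new Node(Integer.MIN_VALUE, MAX_LEVEL);

    // Geometric height: add one more level with probability 1/2, capped at MAX_LEVEL.
    static int randomHeight() {
        int h = 1;
        while (h < MAX_LEVEL && ThreadLocalRandom.current().nextBoolean()) h++;
        return h;
    }

    void insert(int key) {
        // Find the predecessor on every level, searching top-down.
        Node[] preds = new Node[MAX_LEVEL];
        Node x = head;
        for (int level = MAX_LEVEL - 1; level >= 0; level--) {
            while (x.next[level] != null && x.next[level].key < key) x = x.next[level];
            preds[level] = x;
        }
        // Link the new node into each level it spans, starting at the bottom.
        Node n = new Node(key, randomHeight());
        for (int level = 0; level < n.next.length; level++) {
            n.next[level] = preds[level].next[level];
            preds[level].next[level] = n;
        }
    }

    // Level 0 contains every key, whatever heights were picked.
    String level0() {
        StringBuilder sb = new StringBuilder("H");
        for (Node x = head.next[0]; x != null; x = x.next[0]) sb.append(' ').append(x.key);
        return sb.toString();
    }

    public static void main(String[] args) {
        SkipListInsertSketch list = new SkipListInsertSketch();
        for (int k : new int[] {10, 20, 30, 40, 50}) list.insert(k);
        list.insert(15);
        System.out.println(list.level0()); // prints "H 10 15 20 30 40 50"
    }
}
```

The random height keeps the expected number of nodes per level halving as you go up, which gives logarithmic expected search cost without any rebalancing.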
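34. MVCC INDEXING USING SKIP LISTS
[Figure: a bucket table addressed by hash(key)%tablesize, a transaction table entry for Tx ID UUID=1234 (first without a TSN, then with TSN 3), and the commit phases Prepare, Finish and Publish, with Published TSN 22. In step 5, Update Index, a new entry (TSN=3, Key=K1, Value=15) is added next to the existing version (TSN=2, Key=K1, Value=10); each entry holds Next, Prev, TSN, Key, Value and Index fields.]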
35. MVCC INDEXING USING SKIP LISTS – 5. UPDATE INDEX
[Figure: the same bucket table (hash(key)%tablesize), transaction table (Tx ID UUID=1234, TSN 3), commit phases Prepare, Finish and Publish, and Published TSN 22, with the two versions of key K1 (TSN=2, Value=10 and TSN=3, Value=15) during the index update.]
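The slides name the commit phases Prepare, Finish and Publish and show a Published TSN, but do not spell out how publishing is ordered. Below is one hypothetical way to do it, assuming TSNs are handed out by an atomic counter in Prepare and that the published TSN may only advance over a contiguous run of finished transactions, so a reader whose snapshot is the published TSN never sees a gap. All names are illustrative, not the deck's API.

```java
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.atomic.AtomicLong;

public class PublishSketch {
    private final AtomicLong nextTsn;      // last TSN handed out by prepare()
    private final AtomicLong publishedTsn; // newest TSN that new readers may see
    private final ConcurrentSkipListSet<Long> finished = new ConcurrentSkipListSet<>();

    PublishSketch(long startTsn) {
        nextTsn = new AtomicLong(startTsn);
        publishedTsn = new AtomicLong(startTsn);
    }

    // Prepare: reserve the transaction's TSN.
    long prepare() { return nextTsn.incrementAndGet(); }

    // Finish, then Publish: any thread may advance the published TSN past
    // consecutive finished TSNs, so no thread ever waits for another.
    void finishAndPublish(long tsn) {
        finished.add(tsn);
        while (true) {
            long p = publishedTsn.get();
            if (!finished.contains(p + 1)) return;      // gap: a lower TSN is still running
            if (publishedTsn.compareAndSet(p, p + 1)) { // one winner per step
                finished.remove(p + 1);
            }
        }
    }

    // Readers take their snapshot at the published TSN.
    long snapshotTsn() { return publishedTsn.get(); }
}
```

For example, starting at 22, if the transaction with TSN 24 finishes before the one with TSN 23, the published TSN stays at 22 until 23 finishes and then jumps straight to 24.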
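36. SKIP LISTS – 5. UPDATE INDEX
[Figure: skip list indexed by value, starting at head H: 10 (K1, TSN 2), 15 (K1, TSN 3), 20 (K3, TSN 2), 30 (K4, TSN 2), 40 (K5, TSN 2), 50 (K6, TSN 2). Each entry has Next, Prev, TSN, Key and Value fields and index pointers for levels L0, L1 and L2; the new K1 entry is being linked in.]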
37. SKIP LISTS – 5. UPDATE INDEX
[Figure: the same skip list (10 K1/TSN 2, 15 K1/TSN 3, 20 K3, 30 K4, 40 K5, 50 K6, all others TSN 2; index levels L0 to L2), continuing the index update for the new K1 entry.]
38. SKIP LISTS – INSERTION RESULT
[Figure: the skip list with the new K1 entry (value 15, TSN 3) linked in between 10 (K1, TSN 2) and 20 (K3, TSN 2), followed by 30 (K4), 40 (K5) and 50 (K6), all at TSN 2.]
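Slides 36 to 38 show key K1 with two versions side by side (value 10 at TSN 2, value 15 at TSN 3): an update adds a new immutable version instead of overwriting the old one. Below is a minimal sketch of such a per-key version chain (Java 16+ for the record), assuming versions are linked newest-first and that writers install them in TSN order. `ConcurrentHashMap` stands in for the deck's bucket table and is not itself lock-free; only the CAS on the chain head is. All names are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

public class VersionChainSketch {
    // Immutable version; 'older' points to the previous version of the same key.
    record Version(String key, int value, long tsn, Version older) {}

    // Stand-in for the bucket table: key -> head of its version chain.
    private final ConcurrentHashMap<String, AtomicReference<Version>> heads = new ConcurrentHashMap<>();

    // Writer: prepend a new version with a CAS; a failed CAS simply retries.
    void put(String key, int value, long tsn) {
        AtomicReference<Version> head = heads.computeIfAbsent(key, k -> new AtomicReference<>());
        while (true) {
            Version current = head.get();
            if (head.compareAndSet(current, new Version(key, value, tsn, current))) return;
        }
    }

    // Reader: newest version with tsn <= snapshotTsn, or null if none is visible.
    // Old versions are never modified, so the walk needs no locks.
    Version get(String key, long snapshotTsn) {
        AtomicReference<Version> head = heads.get(key);
        for (Version v = head == null ? null : head.get(); v != null; v = v.older()) {
            if (v.tsn() <= snapshotTsn) return v;
        }
        return null;
    }
}
```

With K1 written at TSN 2 (value 10) and TSN 3 (value 15), a reader at TSN 2 still gets 10 while a reader at TSN 3 gets 15, which is why the old version must survive until the vacuum can prove nobody needs it.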
39. SKIP LISTS – FIND [12-25], TSN=2
[Figure: the skip list H, 10 (K1, TSN 2), 15 (K1, TSN 3), 20 (K3, TSN 2), 30 (K4, TSN 2), 40 (K5, TSN 2), 50 (K6, TSN 2), scanned for values 12 to 25 with a snapshot at TSN 2.]
40. SKIP LISTS – FIND [12-25], TSN=3
[Figure: the same skip list, scanned for values 12 to 25 with a snapshot at TSN 3.]
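Slides 39 and 40 run the same range query, values 12 to 25, at two snapshot TSNs. Below is a minimal sketch of a snapshot-filtered scan over level 0, assuming each index entry carries the TSN that created it plus a hypothetical `endTsn` recording when it was superseded (the slides show only one TSN per entry). The reader makes a single forward pass with volatile reads and never takes a lock, performs a CAS or retries, which is the property that lets such scans be wait-free.

```java
import java.util.ArrayList;
import java.util.List;

public class SnapshotRangeScanSketch {
    static final long OPEN = Long.MAX_VALUE; // version not superseded (yet)

    static final class Entry {
        final int value;
        final String key;
        final long tsn;               // TSN of the transaction that created this version
        volatile long endTsn = OPEN;  // hypothetical: TSN of the version that replaced it
        volatile Entry next;          // level-0 link; upper index levels omitted

        Entry(int value, String key, long tsn) { this.value = value; this.key = key; this.tsn = tsn; }

        boolean visibleAt(long snapshotTsn) { return tsn <= snapshotTsn && snapshotTsn < endTsn; }
    }

    // One forward pass: no locks, no CAS, no retries.
    static List<String> find(Entry head, int lo, int hi, long snapshotTsn) {
        List<String> hits = new ArrayList<>();
        for (Entry e = head.next; e != null && e.value <= hi; e = e.next) {
            if (e.value >= lo && e.visibleAt(snapshotTsn)) hits.add(e.key + "=" + e.value);
        }
        return hits;
    }

    // Sample data modeled on the slides: K1 updated from 10 (TSN 2) to 15 (TSN 3).
    static Entry sample() {
        Entry head = new Entry(Integer.MIN_VALUE, "H", 0);
        Entry[] entries = {
            new Entry(10, "K1", 2), new Entry(15, "K1", 3), new Entry(20, "K3", 2),
            new Entry(30, "K4", 2), new Entry(40, "K5", 2), new Entry(50, "K6", 2)
        };
        entries[0].endTsn = 3; // the value-10 version of K1 is superseded at TSN 3
        Entry prev = head;
        for (Entry e : entries) { prev.next = e; prev = e; }
        return head;
    }

    public static void main(String[] args) {
        System.out.println(find(sample(), 12, 25, 2)); // prints [K3=20]
        System.out.println(find(sample(), 12, 25, 3)); // prints [K1=15, K3=20]
    }
}
```

Under these assumptions the TSN 2 reader skips the value-15 entry because it was created after its snapshot, while the TSN 3 reader sees it; neither reader is affected by the other or by a concurrent writer.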
41. LOCK-FREE INSERTIONS SUMMARY
CAS (compare-and-swap) on the previous entry: exactly one winner
Bottom-up linking keeps every level a valid skip list, allowing wait-free readers
Helping the vacuum ensures lock-freedom
[Figure: skip list H, 10, 20, 30, 40 with keys 15 and 17 being inserted concurrently.]
42. LOCK-FREE INSERTIONS SUMMARY
[Figure: the skip list after both concurrent inserts: H, 10, 15, 17, 20, 30, 40.]
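The CAS-on-predecessor rule can be illustrated on the bottom level alone. Below is a minimal sketch, assuming a sorted singly linked list with no deletions: each insert CASes its predecessor's `next` pointer, so when inserts of 15 and 17 race for the gap between 10 and 20, exactly one CAS succeeds and the other thread searches again. A full skip list would then link the node into the higher levels, bottom-up, with the same pattern; deletion and the helping vacuum need extra machinery (for example marked pointers via `AtomicMarkableReference`) that is not shown. Names are illustrative.

```java
import java.util.concurrent.atomic.AtomicReference;

public class CasInsertSketch {
    static final class Node {
        final int key;
        final AtomicReference<Node> next;
        Node(int key, Node next) { this.key = key; this.next = new AtomicReference<>(next); }
    }

    final Node head = new Node(Integer.MIN_VALUE, null);

    // Lock-free insert into a sorted list: some thread always makes progress.
    void insert(int key) {
        while (true) {
            Node pred = head, succ = pred.next.get();
            while (succ != null && succ.key < key) { pred = succ; succ = succ.next.get(); }
            Node node = new Node(key, succ);
            // One winner: the CAS fails if another insert changed pred.next meanwhile.
            if (pred.next.compareAndSet(succ, node)) return;
            // Loser: search again from the head and retry.
        }
    }

    String keys() {
        StringBuilder sb = new StringBuilder("H");
        for (Node n = head.next.get(); n != null; n = n.next.get()) sb.append(' ').append(n.key);
        return sb.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        CasInsertSketch list = new CasInsertSketch();
        for (int k : new int[] {10, 20, 30, 40}) list.insert(k);
        Thread a = new Thread(() -> list.insert(15));
        Thread b = new Thread(() -> list.insert(17));
        a.start(); b.start();
        a.join(); b.join();
        System.out.println(list.keys()); // prints "H 10 15 17 20 30 40"
    }
}
```

Whichever thread wins, the loser's retry finds the new node and links itself on the correct side of it, so the final order does not depend on the interleaving.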
43. VACUUM, EPOCH-BASED DEFERRED RECLAMATION
[Figure: bucket table (hash(key)%tablesize), transaction table (Tx ID UUID=1234, TSN 3) and Published TSN 23, above the skip list from head H to tail T: 10 (K1, TSN 2), 15 (K1, TSN 3), 20 (K3, TSN 2), 30 (K4, TSN 2), 40 (K5, TSN 2), 50 (K6, TSN 2). Snapshot Registry (Reader, TSN, Epoch): Thread=1234, 2, 17; Thread=1235, 3, 17. Status: Vacuum wait.]
44. VACUUM, EPOCH-BASED DEFERRED RECLAMATION
[Figure: the same bucket table, transaction table, Published TSN 23 and skip list. Snapshot Registry: Thread=1234 (TSN 2, epoch 17), Thread=1235 (TSN 3, epoch 17). Status: Vacuum wait.]
45. VACUUM, EPOCH-BASED DEFERRED RECLAMATION
[Figure: the same bucket table, transaction table, Published TSN 23 and skip list. The Snapshot Registry now holds only Thread=1235 (TSN 3, epoch 17).]
46. VACUUM, EPOCH-BASED DEFERRED RECLAMATION
[Figure: the same bucket table, transaction table, Published TSN 23 and skip list. Snapshot Registry: Thread=1235 (TSN 3, epoch 17). Status: Vacuum phase 1.]
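Read together, slides 43 to 46 appear to show the vacuum waiting while Thread=1234 still holds a snapshot at TSN 2 (which can see the superseded K1 version) and moving to phase 1 once only the TSN 3 reader remains. Below is a minimal sketch of that check, assuming the registry maps reader threads to snapshot TSNs and that a version superseded at TSN s is needed only by snapshots older than s. It ignores the race where a reader has read the published TSN but not yet registered; a real implementation has to close that window. Names are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;

public class SnapshotRegistrySketch {
    // Reader thread id -> snapshot TSN (stand-in for the deck's Snapshot Registry).
    private final ConcurrentHashMap<Long, Long> readers = new ConcurrentHashMap<>();

    void enter(long threadId, long snapshotTsn) { readers.put(threadId, snapshotTsn); }

    void exit(long threadId) { readers.remove(threadId); }

    // A version superseded at supersededTsn is invisible to every snapshot >= supersededTsn,
    // so it may be unlinked (vacuum phase 1) once no registered reader is older than that.
    boolean canUnlink(long supersededTsn) {
        for (long tsn : readers.values()) {
            if (tsn < supersededTsn) return false; // "Vacuum wait"
        }
        return true;
    }
}
```

With readers at TSN 2 and TSN 3 the old K1 version (superseded at TSN 3) must stay; once the TSN 2 reader exits, `canUnlink(3)` becomes true.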
47. VACUUM, EPOCH-BASED DEFERRED RECLAMATION
[Figure: the same bucket table, transaction table, Published TSN 23 and skip list. Snapshot Registry: Thread=1235 (TSN 3, epoch 17). Status: Vacuum epoch wait.]
48. VACUUM, EPOCH-BASED DEFERRED RECLAMATION
[Figure: the same bucket table, transaction table, Published TSN 23 and skip list. Snapshot Registry: Thread=2345 (TSN 3, epoch 18), Thread=1235 (TSN 3, epoch 17). Status: Vacuum epoch wait.]
49. VACUUM, EPOCH-BASED DEFERRED RECLAMATION
[Figure: the same bucket table, transaction table, Published TSN 23 and skip list. Snapshot Registry: Thread=2345 (TSN 3, epoch 18), Thread=1235 (TSN 3, epoch 17). Status: Vacuum epoch wait.]
50. VACUUM, EPOCH-BASED DEFERRED RECLAMATION
[Figure: the same bucket table, transaction table, Published TSN 23 and skip list. Snapshot Registry: Thread=2345 (TSN 3, epoch 18) only. Status: Vacuum epoch wait.]
51. VACUUM, EPOCH-BASED DEFERRED RECLAMATION
[Figure: the same bucket table, transaction table, Published TSN 23 and skip list. Snapshot Registry: Thread=2345 (TSN 3, epoch 18). Status: Vacuum phase 2.]
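Slides 47 to 51 show a second wait after phase 1, this time on epochs: as read here, a reader registered in epoch 17 (Thread=1235) might still hold a pointer to the unlinked entry, whereas Thread=2345, registered in epoch 18, can no longer reach it, so phase 2 runs once no epoch-17 reader is left. Below is a minimal epoch-based reclamation sketch along those lines, assuming a global epoch counter that the vacuum advances right after unlinking an entry and that readers register their epoch before traversing. Names are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class EpochReclamationSketch {
    private final AtomicLong globalEpoch;
    // Reader thread id -> epoch in which the reader registered.
    private final ConcurrentHashMap<Long, Long> readerEpochs = new ConcurrentHashMap<>();

    EpochReclamationSketch(long startEpoch) { globalEpoch = new AtomicLong(startEpoch); }

    // Register before touching the data structure.
    void enter(long threadId) { readerEpochs.put(threadId, globalEpoch.get()); }

    void exit(long threadId) { readerEpochs.remove(threadId); }

    // Call right after vacuum phase 1 unlinked an entry: returns the epoch in which it
    // was unlinked and starts a new epoch for readers that can no longer reach it.
    long unlinkedAt() { return globalEpoch.getAndIncrement(); }

    // Vacuum phase 2: free only when no reader from the unlink epoch or earlier remains.
    boolean canFree(long unlinkEpoch) {
        for (long e : readerEpochs.values()) {
            if (e <= unlinkEpoch) return false; // "Vacuum epoch wait"
        }
        return true;
    }
}
```

The point of the second wait is that TSN visibility alone is not enough: a reader whose snapshot cannot see the old version may still be walking past it in the index, so its memory is only reused once every reader that could hold a reference has left.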
52. MVCC SUMMARY
Map and indexes both under MVCC
Index scans are wait-free (and simple/fast)
Insert/update/delete are lock-free
Automated reclamation of storage
53. EFFICIENT AND SAFE API
transactionManager.read((snapshot) -> {
    QueryIterator<ProductCO> products = snapshot.query(ProductCO._ID.range("IMC", "Stibo"));
    while (products.next()) {
        CacheEntry<ProductCO> entry = products.entry();
        long typeId = entry.longValue(ProductCO::getObjectType);
        CacheEntry<ObjectTypeCO> type = snapshot.get(typeId);
        // gets, queries, etc. can be done safely on the same snapshot for all kinds of objects
    }
});
public class ProductCO {
    long getObjectType(ValuePointer ptr) { … }
}
No object copies, no GC, efficient access. Often the JVM can inline the entire query into one native method.
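"No object copies, no GC" suggests that fields are read in place from the stored bytes rather than from deserialized objects. Below is a minimal sketch of that idea, assuming a hypothetical fixed entry layout in a direct `ByteBuffer`; `EntryRef` is an illustrative stand-in for the deck's `ValuePointer`, whose real definition the slides do not show.

```java
import java.nio.ByteBuffer;

public class OffHeapFieldSketch {
    // Hypothetical fixed layout of one product entry: [id: long][objectType: long]
    static final int ID_OFFSET = 0;
    static final int OBJECT_TYPE_OFFSET = 8;
    static final int ENTRY_SIZE = 16;

    // Hypothetical stand-in for ValuePointer: a buffer plus the entry's start offset.
    record EntryRef(ByteBuffer buffer, int offset) {}

    // Reads one field in place; no ProductCO object is deserialized or allocated.
    static long getObjectType(EntryRef ref) {
        return ref.buffer().getLong(ref.offset() + OBJECT_TYPE_OFFSET);
    }

    public static void main(String[] args) {
        ByteBuffer store = ByteBuffer.allocateDirect(2 * ENTRY_SIZE); // memory outside the Java heap
        store.putLong(ENTRY_SIZE + ID_OFFSET, 42L).putLong(ENTRY_SIZE + OBJECT_TYPE_OFFSET, 7L);
        System.out.println(getObjectType(new EntryRef(store, ENTRY_SIZE))); // prints 7
    }
}
```

Because the accessor is a small static method over absolute buffer reads, it is the kind of code the JIT can inline into the surrounding query loop.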
54. DIY USEFUL LEARNING
• Memory model (Java's differs from C++'s) and CAS operations
• Assembly
• CPU memory architecture
• Wait-free and lock-free algorithms
• Enumerate all states
• Think about state transitions
• Try to formally prove it right
• Deletions are often the trickiest part
• Do not even think "this will never happen", because it will
55. IN-MEMORY VENDOR QUESTIONS
Direct access to data or only access to copies of data?
And direct access to individual fields in an entry?
Index/Query engine MVCC consistent with map gets and/or additional queries?
Will index scans/queries acquire locks?
Will index inserts acquire locks?
Will map get/put operations acquire locks?
Memory overhead per entry?
Memory overhead per index (per entry)?
How do you avoid memory fragmentation?
Do you lock pages in memory and use huge/large pages?