1. Lock-free transactional
support for large-scale
storage systems
Flavio Junqueira, Benjamin Reed, Maysam Yabandeh
Yahoo! Research
June 2011
2. Big data
• Large data sets
✓ Unstructured, semi-structured data
✓ Critical for business logic
• Examples of such data
✓ Web logs, server logs, social media, etc
June 2011 2
3. Big data
+160% clicks +43% clicks
vs. one-size fits all vs. editor selected
Eric Baldeschwieler @IBM Big Data, May 2011
June 2011 3
4. Big data: Hadoop
Eric Baldeschwieler @IBM Big Data, May 2011
June 2011 4
5. Background
• Database generations in batches
Input e.g.,
Input Hours of Output Hours of Output Hbase,
Input DB MapReduce DB MapReduce DB HDFS
• Online concurrent updates
Input txn
Output Require transactional
Input txn DB support
June 2011 5
6. Examples
• Mutable tables
• Various indexes: Web, news, shopping, coupons
• User and content models
• Characteristics
✓ Concurrency
✓ Losing updates is undesirable
✓ There are concurrent reads and they must be consistent
June 2011 6
7. Semantics
• Read only previously committed values
Txn
w(x,v)
r(x) = v
w(x,v’) Time
June 2011 7
8. Semantics
• No concurrent writes to the same row
Txn
w(x,v)
w(x,v’) Time
At least one must abort
June 2011 8
9. Snapshot Isolation
• Known in the database realm
• Conflicting transactions
✓ Write to the same element (e.g., row)
✓ Time range between start and commit overlap
• Efficient implementation by versioning
June 2011 9
10. Locks?
• Previous approaches: Lock data to modify
✓ Convoy effect [Percolator, OSDI’10]
✓ Delays of several seconds
✓ Higher overhead on data servers
• Our approach
✓ Lock-free, centralized transaction manager
✓ Single point of failure, potential bottleneck?
June 2011 10
11. Transaction Status Oracle
• Single process
✓ Processes client inquiries about transactions
✓ Includes a timestamp oracle
Keeps state
TO TSO about committed
rows
DB1 Client DB2
June 2011 11
12. Transaction Status Oracle
• Single process
✓ Processes client inquiries about transactions
✓ Includes a timestamp oracle
Keeps state
TO TSO about committed
rows
Ts
DB1 Client DB2
June 2011 11
13. Transaction Status Oracle
• Single process
✓ Processes client inquiries about transactions
✓ Includes a timestamp oracle
Keeps state
TO TSO about committed
rows
Ts
w(r2, v2, Ts(txnr)) r(r1)
DB1 ACK Client v1, Ts(txnw), ⊥ DB2
Ts(txnw) < Ts(txnr)
June 2011 11
14. Transaction Status Oracle
• Single process
✓ Processes client inquiries about transactions
✓ Includes a timestamp oracle
Keeps state
TO TSO about committed
rows
Ts Tc(txnw) < Ts(txnr)?
w(r2, v2, Ts(txnr)) r(r1)
DB1 ACK Client v1, Ts(txnw), ⊥ DB2
Ts(txnw) < Ts(txnr)
June 2011 11
15. Transaction Status Oracle
• Single process
✓ Processes client inquiries about transactions
✓ Includes a timestamp oracle
Keeps state
TO TSO about committed
rows
Ts Commit r2 Tc(txnw) < Ts(txnr)?
w(r2, v2, Ts(txnr)) r(r1)
DB1 ACK Client v1, Ts(txnw), ⊥ DB2
Ts(txnw) < Ts(txnr)
June 2011 11
16. Transaction Status Oracle
• Single process
✓ Processes client inquiries about transactions
✓ Includes a timestamp oracle
Keeps state
TO TSO about committed
rows
Ts Commit r2 Tc(txnw) < Ts(txnr)?
w(r2, v2, Ts(txnr)) r(r1)
DB1 ACK Client v1, Ts(txnw), ⊥ DB2
Ts(txnw) < Ts(txnr)
Cleanup(r2, txnr)
June 2011 11
17. ReTSO: Design choices
• TSO
✓ Keeps state of modified rows
• In-memory state
✓ Highest commit timestamp of all garbage-collected rows
• Auto-GC Hash map
✓ Lazy garbage-collection
✓ Upon a hit
June 2011 12
18. ReTSO: Increasing dependability
• Remote write-ahead log
Inquiries Backup
ReTSO ReTSO
Warm or cold
Updates
Writes to WAL
are synchronous but do WAL e.g., NFS, BookKeeper
not block other txns
[http://zookeeper.apache.org/bookkeeper]
June 2011 13
19. Preliminary results
• Coded in Java
✓ Except for hash map (C++ with JNI interface)
• Uses BookKeeper for WAL
• 10 identical servers
✓ 2.13 Dual Core Intel Xeon
✓ 4GB of RAM
✓ 1 Gigabit interfaces
June 2011 14
20. Preliminary results
• Average throughput observed
✓ 3 clients, 1,000 concurrent transactions
✓ 81k TPS
• Average latency
✓ 1 client, 1 txn
✓ 0.87 ms (with WAL)
✓ 0.17 ms (without WAL)
June 2011 15
21. Preliminary results
• Increasing the load of the system
✓ 1 to 16 clients 18
ReTSO
16 WAL-disabled
✓ Max is 72k TPS 14
12
Latency in ms
10
8
6
4
2
0
20000 40000 60000 80000 100000 120000
Throughput in TPS
June 2011 16
22. What’s baking?
• Integration
✓ HBase
✓ Query engine
• Real workloads
June 2011 17
23. Summary
• Transaction management for large-scale data
repositories
• Lock-based vs. Lock-free
✓ ReTSO is lock-free and dependable
✓ Reduced load on storage nodes
✓ Low latency despite faults
• Performance sufficient for realistic applications
June 2011 18