http://yahoo.github.com/omid
Many data processing tasks can be thought as small mutations to a large database triggered by events. Contrary to batch processing, the incremental processing model achieves very low delay from the reception of an event to the application of the mutation. In the absence of transactions support, developers have to use ad-hoc mechanisms to ensure atomic execution of mutations despite failures and concurrent accesses to the database by other clients. Most NoSQL data stores, like HBase, BigTable and Cassandra, lack support of transactions, which makes them unsuitable for a whole range of applications. In this talk we present Omid, an open source tool for transactional support and incremental processing on top of HBase (https://github.com/yahoo/omid). Due to the centralized nature of its client-replicated status oracle, Omid (i) avoids distributed locks, (ii) scales up to 60,000 TPS and a thousand clients, (iii) requires no changes to HBase, and (vi) adds a negligible overhead to data servers.
2. About me
§ Research engineer at Yahoo!
§ Main Omid developer
§ Apache S4 committer
§ Contributor to:
• ZooKeeper
• HBase
Omid - Yahoo! Research 2
4. Motivation
§ Traditional DBMS: § NoSQL stores:
• SQL • SQL
• Transactions • Transactions
• Scalable • Scalable
§ Some use cases require
• Scalability
• Consistent updates
Omid - Yahoo! Research 4
5. Motivation
§ Requested feature
• Support for transactions in HBase is a common request
§ Percolator
• Google implemented transactions on BigTable
§ Incremental Processing
• MapReduce latencies measured in hours
• Reduce latency to seconds
• Requires transactions
Omid - Yahoo! Research 5
6. Incremental Processing
Offline Online
MapReduce Incremental Processing
Fault tolerance Fault tolerance
Scalability Scalability
Shared state
State0 State1 Shared
State
Using Percolator, Google dramatically
reduced crawl-to-index time
Omid - Yahoo! Research 6
8. Omid
§ ‘Hope’ in Persian
§ Optimistic Concurrency Control system
• Lock free
• Aborts transactions at commit time
§ Implements Snapshot Isolation
§ Low overhead
§ http://yahoo.github.com/omid
Omid - Yahoo! Research 8
9. Snapshot Isolation
§ Transactions read from their own snapshot
§ Snapshot:
• Immutable
• Identified by creation time
• Values committed before creation time
§ Transactions conflict if two conditions apply:
A) Overlap in time
B) Write to the same row
Omid - Yahoo! Research 9
10. Transaction conflicts
Id: Time overlap: Write set:
A r1 r3
B r2 r4
C r3 r4
D r1 r2 r3 r4
Time
§ Transaction B overlaps with A and C
• B ∩ A = ∅
• B ∩ C = { r4 }
• B and C conflict
§ Transaction D doesn’t conflict
Omid - Yahoo! Research 10
12. Architecture
1. Centralized server
2. Transactional metadata replicated to clients
3. Store transaction ids in HBase timestamps
Status Oracle
(SO)
(1) (2)
S
O Omid
Omid
Omid
Clients
HBase Clients
Clients
HBase
HBase
HBase
Region
Region
Region
Region
Servers
Servers
Servers
Servers (3)
Omid - Yahoo! Research 12
13. Status Oracle
§ Limited memory
• Evict old metadata
• … maintaining consistency guarantees!
§ Keep LowWatermark
• Updated on metadata eviction
• Abort transactions older than LowWatermark
• Necessary for Garbage Collection
Omid - Yahoo! Research 13
15. Metadata Replication
§ Transactional metadata is replicated to clients
§ Status Oracle not in critical path
• Minimize overhead
§ Clients guarantee Snapshot Isolation
• Ignore data not in their snapshot
• Talk directly to HBase
• Conflicts resolved by the Status Oracle
§ Expensive, but scalable up to 1000 clients
Omid - Yahoo! Research 15
16. Architecture recap
Status Oracle
StartTS CommitTS Row LastCommitTS
T10 T20 r1 T20
… … … …
LowWatermark Aborted
T4 T2 T18 …
S
O Omid
Omid
HBase Omid
Clients
HBase
HBase
HBase Clients
Region
Region
Region Clients
Region
Servers
Servers
Servers
Servers
Omid - Yahoo! Research 16
17. Architecture + Metadata
Conflict detection
Status Oracle
(not replicated)
ST CT R LC HBase values tagged
10 20 r1 20 with Start TimeStamp
LowW Abort
… … Client
ST CT
HBase 10 20
row TS val LowW Abort
r1 10 hi … …
Omid - Yahoo! Research 17
21. Transaction Example
>> |
Status Oracle
ST CT R LC
10 20 r1 20
LowW Abort
… … Client
HBase ST CT
10 20
row TS val
r1 10 hi LowW Abort
… …
Omid - Yahoo! Research 21
22. Transaction Example
>> t = TM.begin()
Transaction { ST: 25 }
Status Oracle
>> |
ST CT R LC
10 20 r1 20
LowW Abort
… … Client t
ST: 25
HBase ST CT
10 20
row TS val
r1 10 hi LowW Abort
… …
Omid - Yahoo! Research 22
23. Transaction Example
>> t = TM.begin()
Transaction { ST: 25 }
Status Oracle
>> TTable.get(t, ‘r1’)
‘hi’
ST CT R LC
>> |
10 20 r1 20
LowW Abort
… … Client t
ST: 25
HBase ST CT
10 20
row TS val
r1 10 hi LowW Abort
… …
Omid - Yahoo! Research 23
24. Transaction Example
>> t = TM.begin()
Transaction { ST: 25 }
Status Oracle
>> TTable.get(t, ‘r1’)
‘hi’
ST CT R LC
>> TTable.put(t, ‘r1’, ‘bye’)
10 20 r1 20 >> |
LowW Abort
… … Client t
ST: 25
HBase ST CT R: {r1}
10 20
row TS val
r1 10 hi LowW Abort
r1 25 bye … …
Omid - Yahoo! Research 24
25. Transaction Example
>> TTable.get(t, ‘r1’)
‘hi’
Status Oracle
>> TTable.put(t, ‘r1’, ‘bye’)
>> TM.commit(t)
ST CT R LC
Committed{ CT: 30 }
25 30 r1 30 >> |
LowW Abort
20 … Client t
ST: 25
HBase ST CT R: {r1}
10 20
row TS val
r1 10 hi LowW Abort
r1 25 bye … …
Omid - Yahoo! Research 25
26. Transaction Example
>> t_old
Transaction{ ST: 15 }
Status Oracle
>> TT.put(t_old, ‘r1’, ‘yo’)
>> TM.commit(t_old)
ST CT R LC
Aborted
25 30 r1 30 >> |
LowW Abort
… 15 Client t_old
ST: 15
HBase ST CT R: {r1}
10 20
row TS val
r1 15 yo LowW Abort
r1 25 bye … …
Omid - Yahoo! Research 26
28. Omid Performance
§ Centralized server = bottleneck?
• Focus on good performance
• In our experiments, it never became the bottleneck
30
2 rows
25
4 rows
8 rows
512 rows/t
20
16 rows
32 rows
128 rows
1K TPS
Latency in ms
512 rows
10 ms
15
10 2 rows/t
5 100K TPS
0
5 ms
1K 10K 100K
Throughput in TPS
Omid - Yahoo! Research 28
29. Fault Tolerance
§ Omid uses BookKeeper for fault tolerance
• BookKeeper is a distributed Write Ahead Log
§ Recovery: reads log’s tail (bounded time)
Downtime < 25 s
18000
Throughput in TPS
15000
12000
9000
6000
3000
0
5900 5950 6000 6050 6100
Time in s
Omid - Yahoo! Research 29
31. News Recommendation System
§ Model-based recommendations
• Similar users belong to a cluster
• Each cluster has a trained model
§ New article
• Evaluate against clusters
• Recommend to users
§ User actions
• Train model
• Split cluster
Omid - Yahoo! Research 31
32. Transactions Advantages
§ All operations are consistent
• Updates and queries
§ Needed for a recommendation system?
• We avoid corner cases
• Implementing existing algorithms is straightforward
§ Problems we solve
• Two operations concurrently splitting a cluster
• Query while cluster splits
• …
Omid - Yahoo! Research 32
33. Example overview
Articles &
Updates Queries
User actions
Index
Data
Index
Wor Store
kers
Omid - Yahoo! Research 33
34. Other use cases
§ Update indexes incrementally
• Original Percolator use case
§ Generic graph processing
§ Adapt existing transactional solutions to HBase
Omid - Yahoo! Research 34
36. Current Challenges
§ Load isn’t distributed automatically
§ Complex logic requires big transactions
• More likely to conflict
• More expensive to retry
§ API is low level for data processing
Omid - Yahoo! Research 36
37. Future work
§ Framework for Incremental Processing
• Simpler API
• Trigger-based, easy to decompose operations
• Auto scalable
§ Improve user friendliness
• Debug tools, metrics, etc
§ Hot failover
Omid - Yahoo! Research 37
38. Questions?
All time contributors
Ben Reed
Flavio Junqueira
Francisco Pérez-Sorrosal
Try it!
Ivan Kelly yahoo.github.com/omid
Matthieu Morel
Maysam Yabandeh
Partially supported by
CumuloNimbo project (ICT-257993)
funded by the European Commission
40. Commit Algorithm
§ Receive commit request for txn_i
§ Set CommitTimestamp(txn_i) <= new timestamp
§ For each modified row:
• if LastCommit(row) > StartTimestamp(txn_i) => abort
§ For each modified row:
• Set LastCommit(row) <= CommitTimestamp(txn_i)
Omid - Yahoo! Research 40