Speaker: Mike Drob
Apache Accumulo has long held a reputation for enabling high-throughput operations in write-heavy workloads. In this talk, we use the Yahoo! Cloud Serving Benchmark (YCSB) to put real numbers on Accumulo performance. We then compare these numbers against previous versions and against other databases, and wrap up with a discussion of parameters that can be tuned to improve them.
7. Accumulo across versions
• Accumulo 1.4.4-cdh4.5.0
• Accumulo 1.6.0-cdh4.6.0-beta-1
• YCSB 0.14+50
• 80 node cluster
• 10 clients
• 5 racks
8. Accumulo across versions
The Data:
• 200 GB
• 2k Columns
• Pre-Split Table 80x
• Vary # of rows
• Vary value size
(we actually did a lot more, but it was hard to graph)
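As a rough illustration of the "Pre-Split Table 80x" bullet, the sketch below computes evenly spaced split points so that a table starts with 80 tablets instead of one. The key format (a "user" prefix followed by a zero-padded number), the key-space size, and the helper name `evenSplits` are assumptions made for this example; the slides do not say how the split points were chosen.

```java
import java.util.ArrayList;
import java.util.List;

class SplitPoints {
    // Returns (tablets - 1) evenly spaced split points over the numeric key
    // space [0, keySpace). Keys are zero-padded to `width` digits so that
    // lexicographic order matches numeric order.
    // The key format is hypothetical, not taken from the talk.
    static List<String> evenSplits(String prefix, long keySpace, int tablets, int width) {
        List<String> splits = new ArrayList<>();
        for (int i = 1; i < tablets; i++) {
            long boundary = keySpace * i / tablets;
            splits.add(prefix + String.format("%0" + width + "d", boundary));
        }
        return splits;
    }

    public static void main(String[] args) {
        // 80 tablets need 79 split points.
        List<String> splits = evenSplits("user", 10_000, 80, 4);
        System.out.println(splits.size() + " split points, first=" + splits.get(0)
                + ", last=" + splits.get(splits.size() - 1));
        // prints "79 split points, first=user0125, last=user9875"
    }
}
```

In a real deployment the resulting strings would be wrapped as `Text` values and passed to `TableOperations.addSplits`; that step is left out here so the sketch runs without an Accumulo dependency.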
12. Accumulo across versions
• Write speed improved!
• Read speed about the same.
• Something weird happens writing 1000 rows.
13. Accumulo across versions
So, what happens at 1000 rows…? Nothing.
[Chart: Throughput (MB/sec) vs. # of Rows; x-axis from 10 to 100000, y-axis from 100 to 700]
Problem is at 100 rows.
22. Performance Tweaks – Client Side
• Number of rows/columns
• Batch Writer Threads
• Batch Writer Buffer Size
  • Use large buffer for small values
  • Use small buffer for large values
• ACCUMULO-2766 possible fix
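One way to reason about the buffer-size bullets above is to estimate how many mutations a given buffer holds before the client has to flush. The sketch below is plain arithmetic under stated assumptions: the 50 MB buffer and the two value sizes are illustrative, per-mutation key and serialization overhead is ignored, and the helper name `mutationsPerBuffer` is hypothetical. The talk does not give the buffer sizes it tested.

```java
class BufferMath {
    // Rough number of mutations that fit in a client-side write buffer.
    // Ignores row/column key bytes and serialization overhead, so real
    // counts will be somewhat lower.
    static long mutationsPerBuffer(long bufferBytes, long valueBytes) {
        return Math.max(1, bufferBytes / valueBytes);
    }

    public static void main(String[] args) {
        long buffer = 50L * 1024 * 1024; // assumed 50 MB buffer

        // Small values (1 KB): tens of thousands of mutations per flush.
        System.out.println(mutationsPerBuffer(buffer, 1024));             // prints 51200

        // Large values (10 MB): only a handful of mutations per flush.
        System.out.println(mutationsPerBuffer(buffer, 10L * 1024 * 1024)); // prints 5
    }
}
```

In Accumulo 1.5 and later, the buffer size and thread count are set on the client through `BatchWriterConfig.setMaxMemory` and `BatchWriterConfig.setMaxWriteThreads`; the values to use depend on the workload and should be measured rather than taken from this sketch.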
23. Performance Tweaks – Server Side
• Apply table splits liberally
• Increase automatic split threshold
• Some properties to play with:
  • table.compaction.minor.logs.threshold
  • tserver.compaction.minor.concurrent.max
  • tserver.walog.max.size
• If running with dfs.datanode.synconclose, also enable dfs.datanode.sync.behind.writes