Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?

Speaker: Mike Drob. Apache Accumulo has long held a reputation for enabling high-throughput operations in write-heavy workloads. In this talk, we use the Yahoo! Cloud Serving Benchmark (YCSB) to put real numbers on Accumulo performance. We then compare these numbers to previous versions and to other databases, and wrap up with a discussion of parameters that can be tweaked to improve them.

Benchmarking Accumulo:
How Fast is Fast?
Mike Drob
Software Engineer, Cloudera

Me
• Cloudera Engineer
• Accumulo Committer
• Perpetual Tinkerer
Victor Grigas CC-BY-SA 3.0

Agenda
• Methodology
• Accumulo 1.4 to 1.6
• Accumulo to HBase
• Conclusions
Reuvenk CC-BY-SA 2.5

Methodology
• Measuring Performance
• Task Latency (time)
• Throughput (bps)
• Workloads
• Read
• Write
• Mixed
AngMoKio CC-BY-SA 2.5
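
The two metrics on this slide can be captured with a few lines of timing code. The sketch below is illustrative only and is not how YCSB measures them internally; `measure` and its `operation` callback are hypothetical names, and throughput is reported in MB/sec (as on the later charts) rather than bps.

```python
import time


def measure(operation, payloads):
    """Time operation(payload) for each payload.

    Returns (per-task latencies in seconds, aggregate throughput in MB/sec).
    Hypothetical helper for illustration; not part of YCSB or Accumulo.
    """
    latencies = []
    total_bytes = 0
    start = time.perf_counter()
    for payload in payloads:
        t0 = time.perf_counter()
        operation(payload)  # the read or write being benchmarked
        latencies.append(time.perf_counter() - t0)
        total_bytes += len(payload)
    elapsed = time.perf_counter() - start
    # 1 MB is taken as 10**6 bytes here (an assumption).
    throughput = (total_bytes / 1e6) / elapsed if elapsed > 0 else float("inf")
    return latencies, throughput


if __name__ == "__main__":
    sink = []
    lat, mbps = measure(sink.append, [b"x" * 1000] * 100)
    print(f"mean latency: {sum(lat) / len(lat):.9f} s, throughput: {mbps:.1f} MB/sec")
```

Latency is recorded per task and throughput over the whole run, so a single slow task shows up in the first number without dominating the second.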

Methodology
• Yahoo! Cloud Serving Benchmark
• Workloads
• Connectors
• Highly configurable
• # of Rows/Columns
• Size of Value
• # of Threads
• Parallelizable number of clients
Sfoskett CC BY-SA 3.0
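
As a rough illustration of how these knobs map onto a YCSB run, the sketch below builds a properties dictionary using the core workload property names (`recordcount`, `fieldcount`, `fieldlength`, `threadcount`, `readproportion`, `updateproportion`). The helper names are hypothetical, and treating "write" as `updateproportion` is an assumption; the slides do not say which write operation was used.

```python
def workload_properties(rows, columns, value_bytes, threads, read_fraction):
    """Build YCSB-style core workload properties (hypothetical helper).

    read_fraction = 1.0 is a pure read workload, 0.0 a pure write workload,
    and anything in between a mixed workload.
    """
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    return {
        "recordcount": rows,          # number of rows
        "fieldcount": columns,        # number of columns per row
        "fieldlength": value_bytes,   # size of each value in bytes
        "threadcount": threads,       # client threads
        "readproportion": read_fraction,
        # Assumption: writes are modeled as updates.
        "updateproportion": round(1.0 - read_fraction, 6),
    }


def to_properties_text(props):
    """Render the dictionary in key=value form, one property per line."""
    return "\n".join(f"{key}={value}" for key, value in props.items())


if __name__ == "__main__":
    # Example values only; they are not the settings used in the talk.
    print(to_properties_text(workload_properties(1000, 2000, 100, 5, 0.5)))
```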

Accumulo across versions
• Accumulo 1.4.4-cdh4.5.0
• Accumulo 1.6.0-cdh4.6.0-beta-1
• YCSB 0.14+50
• 80 node cluster
• 10 clients
• 5 racks
Public Domain via USAF

Accumulo across versions
The Data:
• 200 GB
• 2k Columns
• Pre-Split Table 80x
• Vary # of rows
• Vary value size
(we actually did a lot more,
but it was hard to graph)
Morio CC BY-SA 3.0
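
One way to read "vary # of rows, vary value size" is that the value size shrinks as the row count grows so the total stays at 200 GB. The arithmetic below assumes exactly that, takes 1 GB as 10^9 bytes, ignores key and metadata overhead, and uses the row counts from the chart axes; none of this is stated outright on the slides.

```python
TOTAL_BYTES = 200 * 10**9   # 200 GB, assuming decimal gigabytes
COLUMNS = 2000              # "2k Columns"


def value_bytes(rows, columns=COLUMNS, total=TOTAL_BYTES):
    """Bytes per value if `total` is spread evenly over rows x columns."""
    return total // (rows * columns)


if __name__ == "__main__":
    for rows in (10, 100, 1000, 10000, 100000):
        print(f"{rows:>6} rows -> {value_bytes(rows):>10} bytes per value")
    # 10 rows      -> 10000000 bytes per value (10 MB)
    # 100000 rows  ->     1000 bytes per value (1 KB)
```

Under these assumptions the sweep covers four orders of magnitude in value size, from about 10 MB down to about 1 KB.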

Accumulo across versions
[Chart: Aggregate Read. Throughput (MB/sec, 0 to 1400) vs. # of rows (10 to 100000); series: Accumulo 1.4, Accumulo 1.6]

Accumulo across versions
[Chart: Aggregate Mixed. Throughput (MB/sec, 0 to 2000) vs. # of rows (10 to 100000); series: Accumulo 1.4, Accumulo 1.6]

Accumulo across versions
[Chart: Aggregate Write. Throughput (MB/sec, 0 to 450) vs. # of rows (10 to 100000); series: Accumulo 1.4, Accumulo 1.6]

Accumulo across versions
• Write speed improved!
• Read speed about the same.
• Something weird happens writing 1000 rows.
Christopher Foster CC BY-SA 3.0

Accumulo across versions
So, what happens at 1000 rows…? Nothing.
[Chart: Throughput (MB/sec, 100 to 700) vs. # of rows (10 to 100000)]
Problem is at 100 rows.

Accumulo and HBase
• Accumulo 1.6.0-cdh4.6.0-beta-1
• HBase 0.94.15-cdh4.6.0
• YCSB 0.14+50
• 5 worker nodes
• 5 split points
• 5G Heap, 3G mem map
Abdullah AlBargan CC BY-ND 2.0

Accumulo and HBase
• Single client (5 threads)
• Workload sizes
• In memory (15G)
• Force disk activity (30G)
• Constant # of rows
• Vary # of columns
• Activity
• 100% Write
• 100% Read
nahtanoj CC-BY-2.0
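
The two workload sizes line up with the cluster setup if "3G mem map" is per worker node: 5 nodes x 3G = 15G, so the 15G workload can sit in memory while the 30G workload cannot. That reading is an interpretation, not something the slides state. A small check of the arithmetic, using decimal gigabytes and the row counts from the chart titles:

```python
GB = 10**9  # assuming decimal gigabytes

WORKER_NODES = 5
IN_MEMORY_MAP_PER_NODE = 3 * GB   # "3G mem map", assumed to be per node


def fits_in_memory(dataset_bytes, nodes=WORKER_NODES, per_node=IN_MEMORY_MAP_PER_NODE):
    """True if the dataset is no larger than the aggregate in-memory map."""
    return dataset_bytes <= nodes * per_node


def bytes_per_row(dataset_bytes, rows):
    """Average row size when the dataset is spread evenly across rows."""
    return dataset_bytes // rows


if __name__ == "__main__":
    for size_gb, rows in ((15, 500), (30, 1000)):
        size = size_gb * GB
        print(size_gb, "GB:", "fits" if fits_in_memory(size) else "spills to disk",
              "|", bytes_per_row(size, rows), "bytes per row")
```

Both configurations work out to the same average row size (30 MB), so the number of columns, not the row size, is what changes along the x-axis of the charts.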

Accumulo and HBase
[Chart: Reading 15GB (500 rows). Throughput (MB/sec, 0 to 600) vs. # of columns (100 to 1000000); series: Accumulo, HBase]

Accumulo and HBase
[Chart: Reading 30GB (1000 rows). Throughput (MB/sec, 0 to 600) vs. # of columns (100 to 1000000); series: Accumulo, HBase]

Accumulo and HBase
[Chart: Writing 15GB (500 rows). Throughput (MB/sec, 0 to 80) vs. # of columns (100 to 1000000); series: Accumulo, HBase]

Accumulo and HBase
[Chart: Writing 30GB (1000 rows). Throughput (MB/sec, 0 to 80) vs. # of columns (100 to 1000000); series: Accumulo, HBase]

Performance Tweaks – Client Side
• Number of rows/columns
• Batch Writer Threads
• Batch Writer Buffer Size
• Use large buffer for small values
• Use small buffer for large values
• ACCUMULO-2766 possible fix
Public Domain via USN
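
The buffer-size guidance above can be written down as a simple rule. The sketch below only encodes the slide's two bullets; the threshold and both buffer sizes are placeholder numbers chosen for illustration, not values from the talk, and `suggest_buffer_bytes` is a hypothetical helper. In the Accumulo Java client these knobs live on `BatchWriterConfig` (`setMaxMemory`, `setMaxWriteThreads`).

```python
KIB = 1024
MIB = 1024 * KIB


def suggest_buffer_bytes(value_bytes,
                         threshold_bytes=64 * KIB,   # placeholder cut-off
                         small_buffer=1 * MIB,       # placeholder size
                         large_buffer=64 * MIB):     # placeholder size
    """Pick a batch writer buffer size following the slide's rule of thumb:
    large buffer for small values, small buffer for large values."""
    if value_bytes <= 0:
        raise ValueError("value_bytes must be positive")
    return large_buffer if value_bytes < threshold_bytes else small_buffer


if __name__ == "__main__":
    for size in (1 * KIB, 10 * MIB):
        print(size, "byte values ->", suggest_buffer_bytes(size), "byte buffer")
```

The right cut-off depends on the cluster and workload, so the numbers are best found by benchmarking rather than taken from this sketch.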

Performance Tweaks – Server Side
• Apply table splits liberally
• Increase automatic split threshold
• Some properties to play with:
• table.compaction.minor.logs.threshold
• tserver.compaction.minor.concurrent.max
• tserver.walog.max.size
• If running with dfs.datanode.synconclose, also enable dfs.datanode.sync.behind.writes
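
Pre-splitting needs a list of split points before any data arrives. A minimal sketch of generating evenly spaced ones follows, assuming row keys are fixed-width, zero-padded integers; that key format and the `split_points` helper are hypothetical, and the benchmark's actual keys may differ. The resulting strings could then be applied with the Accumulo shell's `addsplits` command or `TableOperations.addSplits` in the Java API.

```python
def split_points(num_splits, key_space, width):
    """Return num_splits evenly spaced split points over [0, key_space).

    Keys are assumed to be zero-padded decimal strings of `width` digits,
    so lexicographic order matches numeric order.
    """
    if num_splits < 1 or key_space <= num_splits:
        raise ValueError("need 1 <= num_splits < key_space")
    return [
        str(key_space * i // (num_splits + 1)).zfill(width)
        for i in range(1, num_splits + 1)
    ]


if __name__ == "__main__":
    print(split_points(4, 1000, 4))  # ['0200', '0400', '0600', '0800']
```

N split points yield N + 1 tablets, which is worth keeping in mind when matching the split count to the number of tablet servers.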

Thank You! Please visit our booth!
Mike Drob – madrob@cloudera.com
@mikhaildrob
