HBase and Accumulo | Washington DC Hadoop User Group
1. HBase and Accumulo
Washington DC Hadoop User Group
Jan 25th, 2012
Todd Lipcon
Software Engineer, Cloudera
todd@cloudera.com / @tlipcon
Copyright 2011 Cloudera Inc. All rights reserved
2. Background – Overview
• HBase and Accumulo are both open-source, Apache
2.0 licensed implementations of Google’s BigTable
infrastructure, running on Apache Hadoop
• Scalable, distributed storage
• Scalable data storage at petabyte scale, storing trillions of
rows distributed across hundreds or thousands of machines
• Automatic fault tolerance and data distribution as machines
crash or rejoin the cluster
• Linear scaling of IOPS and data capacity by adding servers
• Data model is a big sorted hierarchical map
3. Sorted Map Datastores
• Each row has a row key (like a Primary Key in RDBMS
terms)
• Users may query by exact row key or by range of row keys
• Data is always stored and returned in sorted order
• Each row has some number of columns
• Each column has a qualifier and some piece of data. Like a
Map<byte[], byte[]>
• Different rows may have different sets of columns
• Each cell has an associated timestamp and may retain a
history of previous values
• Columns are grouped into column families and locality
groups
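The row-key access pattern described on this slide can be sketched with a toy in-memory table. This is only an illustration of the sorted-map idea, not the HBase or Accumulo client API; the `SortedTable` class, its method names, and the `alice` row are hypothetical.

```python
import bisect


class SortedTable:
    """Toy in-memory sorted map: row key -> {column qualifier: value}."""

    def __init__(self):
        self._keys = []   # row keys, always kept in sorted order
        self._rows = {}   # row key -> dict of column qualifier -> value

    def put(self, row, column, value):
        if row not in self._rows:
            bisect.insort(self._keys, row)  # keep the key list sorted
            self._rows[row] = {}
        self._rows[row][column] = value

    def get(self, row):
        """Exact lookup by row key; returns None for an unknown row."""
        return self._rows.get(row)

    def scan(self, start, stop):
        """Yield (row, columns) for start <= row < stop, in sorted order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]


table = SortedTable()
table.put(b"tlipcon", b"height", b"5ft7")
table.put(b"cutting", b"height", b"9ft")
table.put(b"cutting", b"state", b"CA")
table.put(b"alice", b"state", b"NY")  # hypothetical extra row

# Range query: rows come back sorted regardless of insertion order.
rows_c_to_z = [row for row, _ in table.scan(b"c", b"z")]
```

Different rows hold different sets of columns here, which mirrors the sparse-table property; cell timestamps and column families are left out to keep the sketch short.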
4. Sorted Map Datastore
(logical view as “records”)
Row key   Data
cutting   info: { ‘height’: ‘9ft’, ‘state’: ‘CA’ }
          roles: { ‘ASF’: ‘Director’, ‘Hadoop’: ‘Founder’ }
tlipcon   info: { ‘height’: ‘5ft7’, ‘state’: ‘CA’ }
          roles: { ‘Hadoop’: ‘Committer’@ts=2010,
                   ‘Hadoop’: ‘PMC’@ts=2011,
                   ‘Hive’: ‘Contributor’ }
• The row key is an implicit PRIMARY KEY in RDBMS terms
• Data is all byte[] in HBase
• Different types of data are separated into different “column families”
• Different rows may have different sets of columns (table is sparse)
• A single cell might have different values at different timestamps
• Useful for *-To-Many mappings
5. Locality Groups
• Different sets of columns may have different properties
and access patterns
• Perhaps a few columns are accessed all the time, whereas
others are large and rarely needed
• For example, a user’s metadata (1kb, accessed frequently) and
their photo (1MB, cached by CDN and accessed rarely)
• Put metadata in one locality group and photos in
another
• Locality groups stored separately on disk: access just the
metadata without reading the photo
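A rough sketch of why separate storage helps, assuming a hypothetical `LocalityGroupStore` that keeps one container per locality group and counts the bytes a read touches. The class, the column names, and the byte counter are illustrative only; real systems keep each group in separate on-disk files.

```python
class LocalityGroupStore:
    """Toy store that keeps each locality group in its own container."""

    def __init__(self, column_to_group):
        self.column_to_group = dict(column_to_group)
        self.groups = {group: {} for group in set(self.column_to_group.values())}
        self.bytes_read = 0  # how many value bytes reads have touched so far

    def put(self, row, column, value):
        group = self.column_to_group[column]
        self.groups[group][(row, column)] = value

    def read_group(self, row, group):
        """Read one row's columns from a single locality group only."""
        result = {}
        for (stored_row, column), value in self.groups[group].items():
            if stored_row == row:
                self.bytes_read += len(value)
                result[column] = value
        return result


store = LocalityGroupStore(
    {"name": "metadata", "email": "metadata", "photo": "photos"}
)
store.put("tlipcon", "name", b"Todd")
store.put("tlipcon", "email", b"todd@cloudera.com")
store.put("tlipcon", "photo", b"\x00" * 1_000_000)  # stand-in for a 1MB photo

# Reading the metadata group never touches the photo bytes.
metadata = store.read_group("tlipcon", "metadata")
```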
6. Sorted Map Datastore
(physical view as “cells”)
info Column Family / Locality Group
Row key   Column key     Timestamp       Cell value
cutting   info:height    1273516197868   9ft
cutting   info:state     1043871824184   CA
tlipcon   info:height    1273878447049   5ft7
tlipcon   info:state     1273616297446   CA
roles Column Family / Locality Group
Row key   Column key     Timestamp       Cell value
cutting   roles:ASF      1273871823022   Director
cutting   roles:Hadoop   1183746289103   Founder
tlipcon   roles:Hadoop   1300062064923   PMC
tlipcon   roles:Hadoop   1293388212294   Committer
tlipcon   roles:Hive     1273616297446   Contributor
• Sorted on disk by row key, column key, descending timestamp
• Timestamps are milliseconds since the Unix epoch
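The on-disk ordering of the roles cells can be reproduced with an ordinary sort. This is a minimal sketch using the cell values from the slide; the tuple layout and the `sort_key` and `latest` helpers are assumptions made for illustration, not either system's storage code.

```python
# (row key, column key, timestamp in ms since the Unix epoch, cell value)
cells = [
    ("tlipcon", "roles:Hadoop", 1293388212294, "Committer"),
    ("cutting", "roles:Hadoop", 1183746289103, "Founder"),
    ("tlipcon", "roles:Hive", 1273616297446, "Contributor"),
    ("tlipcon", "roles:Hadoop", 1300062064923, "PMC"),
    ("cutting", "roles:ASF", 1273871823022, "Director"),
]


def sort_key(cell):
    """Order by row key, then column key, then newest timestamp first."""
    row, column, timestamp, _value = cell
    return (row, column, -timestamp)


sorted_cells = sorted(cells, key=sort_key)


def latest(sorted_cells, row, column):
    """Return the newest value of a cell, or None if the cell is absent.

    Because timestamps are sorted in descending order, the first match
    for (row, column) is always the most recent version.
    """
    for cell_row, cell_column, _timestamp, value in sorted_cells:
        if (cell_row, cell_column) == (row, column):
            return value
    return None


current_role = latest(sorted_cells, "tlipcon", "roles:Hadoop")
```

Storing timestamps in descending order means a reader that only wants the current value can stop at the first matching cell.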
8. Accumulo/HBase Terminology
Accumulo                          HBase              Definition
Tablet                            Region             A partition of a table (e.g. email inboxes starting with ‘a’-’c’)
TabletServer                      RegionServer       A server in the cluster which hosts a number of tablets/regions, providing read/write access
Log/WAL                           HLog/WAL           Write-ahead log – used for durably logging edits
Minor compaction                  Flush              Writing data from memory to disk
Major compaction                  Minor compaction   Merging several on-disk files into a larger one
Major compaction with all files   Major compaction   Merging all of the on-disk files into a larger one
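The flush and compaction rows of this table can be illustrated with a small sketch. It uses the HBase names for the operations and models each on-disk file as a sorted Python list; `flush`, `compact`, and the sample data are hypothetical, and deletes, timestamps, and the write-ahead log are all omitted.

```python
import heapq


def flush(memtable):
    """Write the in-memory map out as one sorted, immutable 'file' (a list)."""
    return sorted(memtable.items())


def compact(files):
    """Merge sorted files into one larger sorted file.

    `files` is ordered oldest to newest; when the same key appears in
    several files, the value from the newest file wins.
    """
    # Tag each entry with the negated file index so that, for equal keys,
    # the entry from the newest file sorts first in the merge.
    tagged = (
        [(key, -index, value) for key, value in f]
        for index, f in enumerate(files)
    )
    merged = []
    for key, _neg_index, value in heapq.merge(*tagged):
        if not merged or merged[-1][0] != key:
            merged.append((key, value))
    return merged


file1 = flush({"a": "1", "c": "3"})
file2 = flush({"b": "2", "c": "30"})  # newer value for key "c"
file3 = flush({"d": "4"})

minor = compact([file1, file2])   # merge several files into a larger one
major = compact([minor, file3])   # merge all remaining files
```

Because every input is already sorted, the merge is a single sequential pass over the files, which is why compaction is cheap relative to re-sorting the data.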
9. That’s all the intro we have time for…
• Check out the excellent Accumulo manual at
http://incubator.apache.org/accumulo
• And the HBase manual at
http://hbase.apache.org/book.html
• Also some longer intro videos on Cloudera’s website,
and an excellent O’Reilly book
10. Commonalities (the non-controversial stuff)
• Both systems scale well
• Clusters with >1000 nodes, >1PB
• Example HBase users: StumbleUpon, TrendMicro, Facebook,
eBay, Flurry, ngmoco, Mozilla, Adobe, etc.
• Example Accumulo users: ??????? (I don’t have clearance but
I’m told they’re big and important)
• Both systems perform well
• Depending on tuning, one might beat the other at any given
benchmark, but overall results seem comparable
• Both open source with active development
11. Commonalities (the non-controversial stuff)
• Storage formats are very similar
• Used to be the same, then diverged, then re-converged!
• Multi-level BTrees, bloom filters, compression
• Prefix compression currently missing in HBase, 95% complete
for 0.94.0
• Caching code very similar
• Accumulo uses an older version of HBase’s LRUBlockCache
• HBase has some recent improvements (off-heap cache), but I
imagine Accumulo will grab them soon enough.
12. General features
• Both have good MapReduce integration
• Both have a command-line shell
• Both have a pretty good test suite
• Accumulo used to be ahead here, but we traded off some
ideas and use similar testing strategies now
• Both use ZooKeeper for fault tolerant metadata storage,
and support failover Masters
13. Now for the fun part… BigTable shootout 2012
• Warning: I am necessarily biased as an HBase
committer.
• I will be comparing the very latest versions
• HBase 0.92.0 (released only 2 days ago!)
• Accumulo 1.4 (not yet released, due out mid Feb?)
• Please feel free to loudly disagree after the talk during
the time allotted for questions – I am happy to be
proven wrong! I’ll invite Aaron Cordova and John Vines
up to help answer questions.
14. Differences – Active contributors and users
(plus various contractors thereof)
(I ran out of space)
15. Differences – User Mailing list activity
• 500-600 messages per month (peak 1088)
• 50-100 messages per month (peak 105) [but it’s new at Apache]
Winner:
16. Differences – Access Control
• Accumulo has per-cell visibility labels as well as table
ACLs
• Each cell has an ACL of which users may see it (e.g.
(TS|(SECRET&PROJECTX)))
• Users who don’t have access can’t tell the cell even exists
• Very useful for classified information!
• HBase has column family ACLs but no built-in per-cell
visibility support
• Some early work to add visibility labels, but not done yet
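To make the per-cell label idea concrete, here is a simplified evaluator for expressions such as (TS|(SECRET&PROJECTX)). It is a sketch under stated assumptions, not Accumulo's implementation: the `is_visible` function is hypothetical, labels are assumed to be alphanumeric tokens, `&` is assumed to bind tighter than `|` when parentheses are omitted, and characters outside the token set are ignored.

```python
import re


def is_visible(expression, authorizations):
    """Return True if a user holding `authorizations` may see the cell."""
    tokens = re.findall(r"[A-Za-z0-9_]+|[()&|]", expression)
    pos = 0

    def parse_or():
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            rhs = parse_and()  # always parse, so tokens are consumed
            result = result or rhs
        return result

    def parse_and():
        nonlocal pos
        result = parse_atom()
        while pos < len(tokens) and tokens[pos] == "&":
            pos += 1
            rhs = parse_atom()  # always parse, so tokens are consumed
            result = result and rhs
        return result

    def parse_atom():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        token = tokens[pos]
        pos += 1
        if token == "(":
            result = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return result
        if token in (")", "&", "|"):
            raise ValueError(f"unexpected token {token!r}")
        return token in authorizations  # a plain label

    visible = parse_or()
    if pos != len(tokens):
        raise ValueError("trailing tokens in expression")
    return visible


label = "(TS|(SECRET&PROJECTX))"
analyst_can_see = is_visible(label, {"SECRET", "PROJECTX"})
intern_can_see = is_visible(label, {"SECRET"})
```

A server applying such a check on the read path would simply skip cells for which the result is false, so a user without access cannot tell that the cell exists.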
Winner:
17. Differences – Authentication
• Accumulo has a built-in user database
• Users are authenticated by username/password
• Passed in plaintext over the wire
• HBase optionally uses Kerberos
• Central administration (e.g. via Active Directory)
• Key-based secure credential exchange
• Temporary delegation tokens are created for MR jobs, so even
if a job’s data leaks, credentials are not compromised
• Consistent with rest of Hadoop ecosystem
Winner:
18. Differences – Locality Groups
• HBase has a 1:1 correspondence of Column Families
and Locality Groups
• Moving columns from one locality group to another after data
has been inserted is impossible
• Accumulo has a proper distinction and allows online
reassignment of column-to-locality-group mappings
Winner:
19. Differences – extensibility frameworks
• Accumulo has iterators
• Allows custom processing to be inserted in the read path as
well as into the table maintenance code. Provides neat
features like automatic summary maintenance, for example.
• HBase has coprocessors
• Much more general framework that also subsumes triggers,
stored procedures, and cluster management hooks (e.g.
Access Control is an HBase coprocessor).
• Generality has its cost: very difficult to do some things that
are simple with iterators
• Some iterator use cases can be done with HBase filters
• I’ll call this one a tie
20. Differences – Web UI and Monitoring
Winner:
21. Differences – Write-ahead logging
• HBase uses HDFS files as a WAL
• Takes advantage of HDFS performance improvements as they
are developed
• Same trusted replication and checksumming schemes as HDFS
• Accumulo has its own Logger implementation
• Extra daemons to run
• Does not leverage improvements in HDFS
• Won’t re-replicate if loggers go down
Winner:
22. Differences – Other features
• Accumulo has a nice mock Accumulo implementation
• Nice for testing user software
• Accumulo supports isolated scans on super-wide rows
• HBase supports wide rows but isolation properties are lost
• Accumulo supports tablet merging
• If tablets get too small, they’ll merge with neighbors
• Accumulo supports table snapshotting/cloning
• Other sundry features: logical clocks, RPC tracing, RPC
wire compatibility, and more.
23. Differences – Other features
• HBase has RPM and Debian packages as part of Apache
BigTop
• Integrated (and integration-tested) with Hive, Pig, and others
• HBase has commercial support available from Cloudera,
as well as several vendors and other projects building
on top (Lily, OMID, etc)
• HBase has first-class support for REST clients and thin
Thrift clients
• HBase has inter-cluster wide-area replication
• HBase has significantly more advanced bloom filters
and other such optimizations (thanks Facebook!)
24. Summary
• Neither system is better!
• One system may very well be better for your use case,
or for the community you want to interact with
• Over time, the feature sets are converging
• RFile vs HFile v2, Security, Caching, Compaction policies,
Iterators/Coprocessors
• Now that both projects are in Apache, open dialogue,
code sharing, and friendly competition will help make
both projects better!
25. Thanks!
Aaron Cordova and John Vines
(Accumulo committers) will now join
me for some discussion / questions
Email: todd@cloudera.com
Twitter: @tlipcon
Editor's notes
Earlier, I said that HBase is a big sorted map. Here is an example of a table. The map key is (row key + column + timestamp). The value is the cell contents. The rows in the map are sorted by key. In this example, Row1 has 3 columns in the "info" column family. Row2 only has a single column. A column can also be empty. Each cell has a timestamp. By default, the timestamp is set to the current time (in milliseconds since the Unix epoch, January 1st 1970) when the cell is written. A client can specify a timestamp when inserting or retrieving data, and specify how many versions of each cell should be maintained. Data in HBase is non-typed; everything is an array of bytes. Rows are sorted lexicographically. This order is maintained on disk, so Row1 and Row2 can be read together in just one disk seek.
Given that HBase stores a large sorted map, the API looks similar to a map. You can get or put individual rows, or scan a range of rows. There is also a very efficient way of incrementing a particular cell; this can be useful for maintaining high-performance counters or statistics. Lastly, it’s possible to write MapReduce jobs that analyze the data in HBase.