This document discusses HBase, an open-source, non-relational, distributed database built on top of Hadoop. It provides an overview of why HBase is useful, examples of how Navteq uses HBase at scale, and considerations for designing HBase schemas and deploying HBase clusters, including hardware requirements and configuration tuning. The document also outlines some desired future features for HBase like better tools, secondary indexes, and security improvements.
2. Topics
Why HBase?
HBase Usecases – HBase @Navteq
Design Considerations
Hardware/Deployment Considerations
Practical Tips (Tuning/Optimization)
Wanted Features
Ravi Veeramachaneni HBase – In Practice 2
3. Hadoop Benefits
• Stores (HDFS) and processes (MapReduce) large amounts of data
• Scales to 100s and 1,000s of nodes
• Inexpensive (no license cost, commodity hardware)
• Fast (1 TB sorted in 62 s, 1 PB in 16.25 h*)
• Availability (failover built into the platform)
• Data recoverability (a failure should not result in any data loss)
• Replication (3-way out of the box, and configurable)
• Better throughput (time to read the whole dataset matters more than latency to the first record)
• Write-once, read-many-times access pattern
• Works well with structured, unstructured, or semi-structured data
*YDN Blog: Jim Gray’s Benchmark @ http://developer.yahoo.com/blogs/hadoop/posts/2009/05/hadoop_sorts_a_petabyte_in_162/
4. But …
Not so good at, or does not support:
• Random access
• Updating data in place (writes always go to the end of the file)
• Applications that need low-latency access to data
• Lots of small files
• Multiple writers
• Not a solution for every data problem
5. Featuring HBase
HBase scales (runs on top of Hadoop)
HBase provides fast table scans for time ranges and fast key-based lookups
HBase stores null values for free
• Saves both disk space and disk I/O time
HBase supports unstructured/semi-structured data through column families
HBase has built-in version management
As MapReduce input:
• Tables are sorted and have unique keys
• The reducer is often optional
• No combiner needed
Strong community support and wide adoption
6. HBase Usecases
To solve Big Data problems:
Sparse data (un- or semi-structured)
Cost-effective scalability
Versioned data
Other features that may interest you:
Linear distribution of data across the data nodes
Rows stored in byte-lexicographic sorted order
Atomic read/write/update
Data access: random or sequential reads and writes
Automatic replication of data for HA
But not for every data problem
7. Navteq’s Usecase
Content is
– Constantly growing (well into the terabytes)
– Sparse and unstructured
– Provided in multiple data formats
– Ingested, processed and delivered in transactional and batch mode
Content Breadth
– 100s of millions of content records
– 100s of content suppliers + community input
Content Depth
– On average, a content record has 120 attributes
– Certain types of content have more than 400 attributes
– Content classified across 270+ categories
8. Content Processing High-level Overview
[Diagram: bulk content sources and customer/community UGC (merchant data; community, user, and merchant media) flow through a batch and transactional API into Source & Blended Record Management and a Tiered Quality System. Place IDs come from the Place Registry and Location IDs from Location Referencing. Publishing is real-time and on-demand: bulk content delivery, search, and other mobile devices.]
9. HBase @ NAVTEQ
Started in 2009 with HBase 0.19.x (Apache)
• 8-node VMware sandbox cluster
• Flaky, unstable, RegionServer failures
• Switched to CDH
Early 2010, HBase 0.20.x (CDH2)
• 10-node physical sandbox cluster
• Still a lot of challenges: RegionServer failures, .META. corruption
• Cluster expanded significantly, with multiple environments
Current (HBase 0.90.3)
• Moved to the CDH3u1 official release
• Multiple teams/projects using the Hadoop/HBase implementation
• Working on Hive/HBase integration, Oozie, Lucene/Solr integration, Cloudera Enterprise, and a few others
10. Measured Business Value
Scalability & Deployment
• Spikes are handled by simply adding nodes
• No code changes or redeployment needed
• From 15 to 30 to 60 nodes and more, as data grows
• Deployments are well managed and controlled (from 12-16 hours down to < 2 hours)
Speed to Market
• Supports real-time transactions (instead of quarterly updates)
• Batch updates are handled more efficiently (from days to hours)
Faster Supplier On-boarding
• Flexible, externally managed business rules
Cheaper than the existing solution
• <$2M vs. $12M (based on projected growth)
11. HBase & Zookeeper
ZK – Distributed coordination service
• Coordinates messages sent between nodes across the network (handles network failures, etc.)
HBase depends on ZK and delegates cluster-state management to it
HBase keeps key information in ZK:
• The location of the root catalog table
• The address of the current cluster master
Bootstrapping a client connection to an HBase cluster:
• The client connects to the ZK quorum first, to learn the location of -ROOT-
• It consults -ROOT- to find the location of the .META. region
• It then looks up the found .META. region to find the hosting user-space region and its location
• The client caches all of the above so later requests skip the traversal
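The lookup chain above can be modeled in a few lines. This is a toy sketch of the 0.90-era client behavior, not the real HBase client API: each `ask…` method stands in for an RPC, and the class/method names are ours.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the HBase 0.90-era client lookup chain:
// ZK -> -ROOT- -> .META. -> user-space region, with the result
// cached so later lookups skip the whole traversal.
public class LookupChain {
    private final Map<String, String> cache = new HashMap<>();

    public String locateRegion(String table, String rowKey) {
        String cacheKey = table + "/" + rowKey;
        String cached = cache.get(cacheKey);
        if (cached != null) {
            return cached; // cache hit: no ZK, -ROOT-, or .META. round-trips
        }
        String rootLocation = askZooKeeper();                 // 1. where is -ROOT-?
        String metaLocation = askRoot(rootLocation);          // 2. where is .META.?
        String region = askMeta(metaLocation, table, rowKey); // 3. which RS hosts the row?
        cache.put(cacheKey, region);
        return region;
    }

    // Stand-ins for network calls; the returned addresses are made up.
    private String askZooKeeper() { return "rs1:60020/-ROOT-"; }
    private String askRoot(String root) { return "rs2:60020/.META."; }
    private String askMeta(String meta, String table, String key) {
        return "rs3:60020/" + table;
    }
}
```

The cache is what makes the three-hop chain cheap in practice: only the first request for a region pays for the full traversal.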
12. Design Considerations
Database/schema design
• Transition to Column-oriented or flat schema
Understand your access pattern
Row-key design/implementation
• Sequential keys
• Suffer from uneven load distribution, but make good use of the block cache
• Can be mitigated by pre-splitting the regions
• Randomized keys give better distribution
• Achieved by hashing key attributes (SHA-1 or MD5)
• But range scans suffer
Too many column families (NOT good)
• Initially we had about 30; now reduced to 8
Compression
• LZO or Snappy (about 20% better than LZO), block-level (the default)
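The key-randomization idea can be sketched as a salted row key: prefix the natural key with a few hex characters of its MD5 digest so that sequential source keys land on different regions. `saltedKey` and the 2-byte salt width are our own choices, not an HBase API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch of "randomize keys to get better distribution": an MD5-derived
// salt spreads writes, at the cost of range scans on the natural key.
public class RowKeys {
    public static String saltedKey(String naturalKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(naturalKey.getBytes(StandardCharsets.UTF_8));
            // A 2-byte (4 hex char) salt is plenty to spread load; the full
            // natural key is kept so the row stays directly addressable.
            String salt = String.format("%02x%02x", digest[0] & 0xff, digest[1] & 0xff);
            return salt + "-" + naturalKey;
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError("MD5 is a mandatory JDK algorithm", e);
        }
    }
}
```

The trade-off in the bullets is visible here: because the salt is deterministic, point gets still work (re-hash the natural key), but rows that were adjacent in natural-key order are now scattered, which is why range scans suffer.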
13. Design Considerations
Serialization
• Avro didn't work well for us (deserialization issues)
• Developed a configurable serialization mechanism that uses JSON for everything except the Date type
Secondary Indexes
• Were using ITHBase and IHBase from contrib; neither worked well
• Redesigned the schema to avoid needing an index
• We still need them, though
Performance
• Several tunable parameters
• Hadoop, HBase, OS, JVM, networking, hardware
Scalability
• Interfacing a batch-oriented system with real-time (interactive) systems
15. Hardware/Deployment Considerations
Hardware (Hadoop+HBase)
• Data node: 24 GB RAM, 8 cores, 4x1 TB (newer spec: 64 GB, 24 cores, 8x2 TB)
• 6 mappers and 6 reducers per node (16 mappers, 4 reducers)
• Memory allocation by process:
• DataNode – 1 GB (2 GB)
• TaskTracker – 1 GB (2 GB)
• Map tasks – 6x1 GB (16x1.5 GB)
• Reduce tasks – 6x1 GB (4x1.5 GB)
• RegionServer – 8 GB (24 GB)
• Total allocation: 24 GB (64 GB)
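As a sanity check on the budget above: the per-process figures sum to 22 GB on the 24 GB node and 58 GB on the 64 GB node, so the stated totals correspond to the node's physical RAM, with the remainder left for the OS and file-system cache. A small helper (ours, not an HBase tool) makes the arithmetic explicit:

```java
// Sums a per-node memory budget in GB: DataNode + TaskTracker
// + map slots + reduce slots + RegionServer. Parameters mirror the
// per-process lines on the slide.
public class MemoryBudget {
    public static double totalGb(double dataNode, double taskTracker,
                                 int mapSlots, double gbPerMap,
                                 int reduceSlots, double gbPerReduce,
                                 double regionServer) {
        return dataNode + taskTracker
                + mapSlots * gbPerMap
                + reduceSlots * gbPerReduce
                + regionServer;
    }
}
```

Keeping the sum a couple of GB under physical RAM is deliberate: an over-committed node swaps, and (as the tuning slides note) the JVM hates swapping.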
Deployment
• Do not run ZK instances on data nodes; use a separate ZK quorum (3 minimum)
• Do not run the HMaster on the NameNode
• Avoid a SPOF for the HMaster (run additional backup master(s))
16. HBase Configuration/Tuning
Configuring HBase
• Configuration is key
• Many moving parts: typos, files out of sync
• Operating system
• Raise the open-file limit (ulimit -n) to 32K or even higher (/etc/security/limits.conf)
• Lower vm.swappiness, or set it to 0
• HDFS
• Adjust the block size to the use case
• Increase xceivers to 2047 (dfs.datanode.max.xcievers; the property name's misspelling is historical)
• Set the socket write timeout to 0 (dfs.datanode.socket.write.timeout)
• HBase
• Needs more memory
• No swapping; the JVM hates it
• GC pauses can cause timeouts or RegionServer failures (read Todd Lipcon's articles on avoiding full GCs)
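The OS- and HDFS-level knobs above can be collected into a checklist. This is a plain map for illustration, not code that applies the settings; the key prefixes just record where each knob lives, and values are the slide's, not universal advice.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Checklist of the OS/HDFS settings named on the slide.
public class OsHdfsTuning {
    public static Map<String, String> checklist() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("/etc/security/limits.conf: nofile", "32768"); // open-file ulimit, 32K+
        m.put("sysctl: vm.swappiness", "0");                 // keep the JVM out of swap
        m.put("hdfs-site.xml: dfs.datanode.max.xcievers", "2047");
        m.put("hdfs-site.xml: dfs.datanode.socket.write.timeout", "0");
        return m;
    }
}
```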
17. HBase Configuration/Tuning
HBase
• Per-cluster
• Turn off the block cache if the hit ratio is low (hfile.block.cache.size, default 20%)
• Per-table
• MemStore flush size (hbase.hregion.memstore.flush.size, default 64 MB, and hbase.hregion.memstore.block.multiplier, default 2)
• Max file size (hbase.hregion.max.filesize, default 256 MB)
• Per-CF
• Compression
• Bloom filters
• Per-RS
• Fraction of each RegionServer's heap reserved for all MemStores (hbase.regionserver.global.memstore.upperLimit, default 0.4)
• MemStore flush size
• Max file size
• Per-store
• Maximum number of store files per store before writes block (hbase.hstore.blockingStoreFiles, default 7)
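Two of the per-table knobs above combine into concrete thresholds. The sketch below (our own helper, using the slide's defaults) shows the arithmetic: with a 64 MB flush size and a block multiplier of 2, a region's writes block once its MemStore reaches 128 MB, and a region is split once a store file exceeds hbase.hregion.max.filesize. The real RegionServer logic has more conditions; this captures only the headline rule.

```java
// Derived thresholds from the per-table MemStore/region settings.
public class Thresholds {
    static final long MB = 1024L * 1024L;

    // Updates to a region block once its MemStore reaches
    // flush.size * block.multiplier bytes.
    public static long memstoreBlockAt(long flushSizeBytes, int blockMultiplier) {
        return flushSizeBytes * blockMultiplier;
    }

    // A region becomes a split candidate once a store file
    // exceeds hbase.hregion.max.filesize.
    public static boolean shouldSplit(long storeFileBytes, long maxFileSizeBytes) {
        return storeFileBytes > maxFileSizeBytes;
    }
}
```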
18. HBase Configuration/Tuning
• HBase
• Write (put) optimization (Ryan Rawson's HUG8 presentation on HBase importing)
– hbase.regionserver.global.memstore.upperLimit=0.3
– hbase.regionserver.global.memstore.lowerLimit=0.15
– hbase.regionserver.handler.count=256
– hbase.hregion.memstore.block.multiplier=8
– hbase.hstore.blockingStoreFiles=25
• Control the number of store files (hbase.hregion.max.filesize)
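The write-heavy settings quoted above would normally go into hbase-site.xml; here they are collected as a plain Properties object so the names and values are easy to check programmatically. The values are the slide's (from the HUG8 import talk), not universal recommendations.

```java
import java.util.Properties;

// The put-optimization overrides from the slide, as key/value pairs
// exactly as they would appear in hbase-site.xml.
public class WriteTuning {
    public static Properties forBulkPuts() {
        Properties p = new Properties();
        p.setProperty("hbase.regionserver.global.memstore.upperLimit", "0.3");
        p.setProperty("hbase.regionserver.global.memstore.lowerLimit", "0.15");
        p.setProperty("hbase.regionserver.handler.count", "256");
        p.setProperty("hbase.hregion.memstore.block.multiplier", "8");
        p.setProperty("hbase.hstore.blockingStoreFiles", "25");
        return p;
    }
}
```

The pattern in these overrides: lower the global MemStore ceiling slightly, but raise the per-region blocking thresholds (multiplier 8, 25 store files) and the handler count, so a heavy import stream is absorbed rather than repeatedly blocked.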
Security
• Still in flux; robust RBAC is needed
Reliability
• The NameNode is a SPOF
• HBase is sensitive to failures underneath it
• RegionServer failures
19. Desired Features
Better operational tools for using Hadoop and HBase
• Job management, backup, restore, user provisioning, general administrative tasks, etc.
Support for secondary indexes
Full-text indexing and search (Lucene/Solr integration?)
HA support for the NameNode
Data replication for HA & DR
Security at the table, CF, and row level
Good documentation (it's getting better; Lars George's book is now out)