5. Data Assignment in HBase Classic
Data is range-partitioned and each key belongs to exactly one RegionServer.
[Diagram: keys within an HBase table divided among different RegionServers]
6. Data Assignment with HBase HA
Each key has a primary RegionServer and a backup RegionServer.
[Diagram: keys within an HBase table divided among different RegionServers, each key served by a primary and a backup]
7. Differences between Primary and Standby
• Primary:
–Handles reads and writes.
–"Owns" the data and has the latest value.
• Standby:
–Handles only reads.
–Data may be stale to some degree.
–Data read from a Standby is marked as potentially stale.
8. HBase HA: Warm Standby RegionServers
Redundant RegionServers provide read availability with near-zero downtime during failures.
[Diagram: a client reads from RegionServer RS 1 or from its standby replica RS 1*; one (or more) standby RegionServers; data replicated via HDFS]
9. HBase HA Delivered in 2 Phases
HBase HA Phase 1:
• Standby RegionServers.
• Primary RegionServers configured to flush every 5 minutes or less.
• Standbys serve reads in < 5 s; data at most 5 minutes stale.
HBase HA Phase 2:
• Standbys serve reads in under 1 s; stale reads mostly eliminated.
• Write-Ahead Log per RegionServer.
• Active WAL tailing in standby RegionServers.
• Faster recovery of failed RegionServers.
Note: HA covers read availability. Writes are still coordinated by primaries.
10. What is Timeline Consistency?
• All readers agree on the current value whenever it can be read from the Primary.
• When reading from a Secondary, clients see all updates in the same order.
• Result:
–Eliminates different clients making decisions on different data.
–Simplifies programming logic and avoids the complex corner cases of eventual consistency.
–Lower latency than quorum-based strong consistency.
11. Configuring HBase HA: Server Side
<property>
<name>hbase.regionserver.storefile.refresh.period</name>
<value>0</value>
<description>
The period (in milliseconds) for refreshing the store files for the secondary
regions. 0 means this feature is disabled. Secondary regions see new files (from
flushes and compactions) from the primary once the secondary region refreshes the list
of files in the region (there is no notification mechanism). But too frequent
refreshes might cause extra NameNode pressure.
</description>
</property>
<property>
<name>hbase.master.loadbalancer.class</name>
<value>org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer</value>
<description>
Only StochasticLoadBalancer is supported for using region replicas
</description>
</property>
Suggested value for the refresh period: 300000 (300 seconds / 5 minutes).
This leads to a maximum data staleness of about 5 minutes.
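Server-side configuration alone is not enough: region replication is also enabled per table. A minimal HBase shell sketch (table and column family names are illustrative):
# Create a table whose regions each get one additional (secondary) replica
create 'usertable', 'cf1', {REGION_REPLICATION => 2}
With the StochasticLoadBalancer in place, the primary and secondary replicas of a region are placed on different RegionServers.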
12. Configuring HBase HA: Client Side
Page 12
<property>
<name>hbase.ipc.client.allowsInterrupt</name>
<value>true</value>
<description>
Whether to enable interruption of RPC threads at the client side. This is required
for region replicas with fallback RPCs to secondary regions.
</description>
</property>
<property>
<name>hbase.client.primaryCallTimeout.get</name>
<value>10000</value>
<description>
The timeout (in microseconds) before secondary fallback RPCs are submitted for
get requests with Consistency.TIMELINE to the secondary replicas of the regions.
Defaults to 10ms. Setting this lower will increase the number of RPCs, but will
lower the p99 latencies.
</description>
</property>
Reaching out to secondary RegionServers is an option per request
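The per-request choice is made by the client. A minimal HBase shell sketch (table name, row key, and limit are illustrative):
# Point read that is allowed to fall back to a secondary replica
get 'usertable', 'user1000', {CONSISTENCY => 'TIMELINE'}
# Scans can opt in per request as well
scan 'usertable', {CONSISTENCY => 'TIMELINE', LIMIT => 10}
Results served by a secondary replica are flagged as potentially stale, matching the Timeline Consistency semantics described earlier.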
14. Off-Heap Support
• Using off-heap memory allows RegionServers to scale beyond the traditional 16 GB heap barrier.
• Benefits:
–Eliminates latency hiccups related to garbage-collection pauses.
–Makes it easier to run HBase on servers with large RAM.
–Certified up to 96 GB of off-heap memory in one RegionServer.
[Diagram: RegionServer process with on-heap memory managed by the JVM (garbage collection) and off-heap memory managed by HBase]
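A minimal hbase-site.xml sketch for putting the BucketCache off-heap; the size shown is illustrative, and the JVM must also be granted direct memory (for example via HBASE_OFFHEAPSIZE in hbase-env.sh):
<property>
<name>hbase.bucketcache.ioengine</name>
<value>offheap</value>
</property>
<property>
<name>hbase.bucketcache.size</name>
<value>16384</value>
<description>
Illustrative value: the off-heap BucketCache capacity in megabytes.
</description>
</property>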
15. HBase Off-Heap Reduces Latency
Latency Comparison: On-Heap versus Off-Heap
Point Lookups, 400GB Dataset, 75% of data in memory, 50 concurrent clients. Latency in ms:
                         Avg    Median  95%  99%  99.9%  Max
On-Heap                  4.485  3       12   19   29     610
Off-Heap (BucketCache)   4.458  3       11   18   27     134
16. Fast Access to Big Data with Off-Heap
Latency Measures using Off-Heap
Point Lookups, 3TB Dataset, 100% of data in memory, 50 concurrent clients. Latency in ms:
         Median  Average  95%   99%   99.9%  99.99%  99.999%
Latency  1.3     4.1      14.3  20.0  27.7   38.7    265.0
Throughput = 1095 reqs/s
17. More to come…
• SlabCache, BucketCache (HBASE-7404)
• HBASE-9535: network interface
• HBASE-10191: new read/write pipeline with end-to-end off-heap
19. Analytics over HBase Snapshots
• What is it?
• Introduces the ability to run Hive queries over HBase snapshots.
• Why is this important?
• Hive can read the data directly from disk rather than over the network through the RegionServers.
• More performant and less disruptive to other HBase clients.
• When to use it?
• Use this feature when you have full-table scans over all data in HBase.
• Not appropriate for analytics of small subsets of data in HBase.
• Note:
• Snapshot data may not be the latest.
• Tradeoff between performance and data freshness.
20. Hive over HBase Snapshot: About 2.5x Faster
Query                 Run  Workload  Snapshot Time (s)  Direct Time (s)  Speedup
count(*)              1    a         191.019            488.915          2.56x
count(*)              2    a         200.641            480.837          2.40x
Aggregate 1 field     1    a         214.452            499.304          2.33x
Aggregate 1 field     2    a         217.744            500.07           2.30x
Aggregate 9 fields    1    a         281.514            802.799          2.85x
Aggregate 9 fields    2    a         272.358            785.816          2.89x
Aggregate 1 with GBY  1    a         248.874            558.143          2.24x
Aggregate 1 with GBY  2    a         269.658            533.562          1.98x
count(*)              1    b         194.739            482.261          2.48x
count(*)              2    b         195.178            481.437          2.47x
Aggregate 1 field     1    b         220.325            498.956          2.26x
Aggregate 1 field     2    b         227.117            489.27           2.15x
Aggregate 9 fields    1    b         276.939            817.118          2.95x
Aggregate 9 fields    2    b         290.288            876.753          3.02x
Aggregate 1 with GBY  1    b         244.025            563.884          2.31x
Aggregate 1 with GBY  2    b         225.431            570.723          2.53x
count(*)              1    c         194.568            502.79           2.58x
count(*)              2    c         205.418            508.319          2.47x
Aggregate 1 field     1    c         209.709            531.39           2.53x
Aggregate 1 field     2    c         217.551            526.878          2.42x
Aggregate 9 fields    1    c         267.93             756.476          2.82x
Aggregate 9 fields    2    c         273.107            723.459          2.65x
Aggregate 1 with GBY  1    c         240.991            526.053          2.18x
Aggregate 1 with GBY  2    c         258.06             527.845          2.05x
Test Scenario:
• YCSB Data Load.
• 180 million rows.
• 20-node cluster, 6 disks/node, 10 Gb network.
• Query run while simultaneously
running a YCSB workload.
• Direct time = query via HBase API.
• Snapshot time = query by reading
snapshot.
• Query over snapshot ~2.5x faster.
21. Analytics over HBase Snapshot: Usage Patterns
Co-located Analytics:
[Diagram: HBase clients and Tez / MapReduce jobs over snapshots share the same cluster]
Note: If using co-located analytics, consider tuning these values:
• hbase.client.retries.number
• hbase.rpc.timeout
• zookeeper.session.timeout
• zookeeper.recovery.retry
Operational Reporting:
[Diagram: clients access HBase 1, which replicates to HBase 2; Tez / MapReduce jobs run over snapshots on HBase 2]
Better for strict SLAs.
22. Analytics over HBase Snapshots: Example
# Create a snapshot in the HBase shell or via the API.
snapshot 'usertable', 'snapshot_2014_08_03'
# Refer to the same snapshot in the Hive shell.
set hive.hbase.snapshot.name=snapshot_2014_08_03;
set hive.hbase.snapshot.restoredir=/tmp/restore;
select count(*) from hbase_table;
# You can "unset hive.hbase.snapshot.name" to stop using the snapshot.
Note: Be sure to delete your snapshots after you're done with them.
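Snapshots keep references to HFiles, so cleanup matters. For example, the snapshot created above can be removed in the HBase shell once the Hive queries are finished:
# Remove the snapshot when it is no longer needed
delete_snapshot 'snapshot_2014_08_03'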
24. Deploying HBase with Slider
• What is it?
• Deploy HBase into the Hadoop cluster using YARN.
Benefits:
• Simplified Deployment: No need to deploy HBase or its configuration to individual cluster nodes.
• Lifecycle Management: Start / stop / process management handled automatically.
• Multitenancy: Different users can run HBase clusters within one Hadoop cluster.
• Multiple Versions: Run different versions of HBase (e.g. 0.98 and 1.0) on the same cluster.
• Elasticity: Cluster size is a parameter and easily changed.
• Co-located Analytics: HBase resource usage is known to YARN, so nodes running HBase will not be used as heavily to satisfy MapReduce or Tez jobs.
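A minimal sketch of the Slider workflow, assuming the Slider HBase application package; the instance name and file names are illustrative:
# Create and start an HBase instance on YARN from an application package
slider create hbase1 --template appConfig.json --resources resources.json
# Elasticity: change the number of RegionServer containers of the running instance
slider flex hbase1 --component HBASE_REGIONSERVER 5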
27. HBase Cell Level Security
• Table/Column Family ACLs since 0.92
• HBASE-6222: per-KeyValue (cell-level) security, added in 0.98
• APIs stable as of 1.0
28. Security in Hadoop with HDP + XA Secure
HDP 2.1:
• Authentication (Who am I / prove it?):
–Kerberos in native Apache Hadoop.
–HTTP/REST API secured with Apache Knox Gateway.
• Authorization (Restrict access to explicit data):
–MapReduce Access Control Lists.
–HDFS permissions, HDFS ACLs.
–Hive ATZ-NG.
–Cell-level access control in Apache Accumulo.
• Audit (Understand who did what):
–Audit logs in HDFS & MapReduce.
• Data Protection (Encrypt data at rest & in motion):
–Wire encryption in Hadoop.
–Orchestrated encryption with 3rd-party tools.
XA Secure (Centralized Security Administration):
• Authentication: As-is, works with current authentication methods.
• Authorization: RBAC for HDFS, Hive & HBase.
• Audit: Centralized audit reporting; policy and access history.
• Data Protection: Future roadmap; strategy to be finalized.
29. XA Secure Integration with Hadoop
[Architecture diagram: enterprise users and legacy tools reach the XA Administration Portal through an integration API; the portal works with the XA Policy Server and XA Audit Server (backed by RDBMS, HDFS, and Search). XA Agents are embedded in HDFS, HBase, and Hive Server2; agents for Storm, Falcon, and other components running on YARN (the data operating system) are marked as future integrations.]
30. Simplified Workflow - HBase
1. An admin sets policies for HBase tables / column families / columns in the XA Policy Manager.
2. Users access HBase: a data scientist runs a MapReduce job, IT users access HBase via the HBase shell, and user applications use the Java API.
3. HBase authorizes the request with the XA Agent.
4. The HBase server provides data access to the users.
5. Audit logs are pushed to the Audit Database.
32. Stability: Co-Locate Meta with Master
(HBASE-10569)
• Simplify and improve region assignment reliability
– Fewer components involved in updating "truth" (ZK-less region assignment, HBASE-11059)
• Master embeds a RegionServer
– Will host only system tables
– Baby step towards combining RS/Master into a single hbase daemon
• Backup masters unchanged
– Can be configured to host user tables while in standby
• Plumbing is all there, OFF by default
– Jira: HBASE-10569.
33. Availability: Region Replicas
• Multiple RegionServers host a Region
– One is primary, others are replicas
– Only primary accepts writes
• Client reads against primary only or any
– Results marked as appropriate
• Baby step towards quorum reads, writes
• Plumbing is all there, OFF by default
– Jira: HBASE-10070.
34. New and Noteworthy
• Client API cleanup: jira HBASE-10602
• Automatic tuning of global MemStore and BlockCache
sizes
• BucketCache easier to configure
• Compressed BlockCache
• Pluggable replication endpoint
• A Dockerfile to easily build and run HBase from source
…
35. Under the Covers
• Zookeeper abstractions
• Meta table used for assignment
• Cell-based read/write path
• Combining mvcc/seqid
• Sundry security, tags, labels improvements
36. Groundwork for 2.0
• More, Smaller Regions
– Millions, 1G or less (HBASE-11165)
– Less write amplification
– Splitting hbase:meta
• Performance
– More off-heap
– Less resource contention
– Faster region failover/recovery
– Multiple WALs
– QoS/Quotas/Multi-tenancy
• Rigging
– Faster, more intelligent assignment
– Procedure bus (HBASE-12439)
– Resumable, query-able operations
• Other possibilities
– Quorum/consensus reads, writes?
– HydraBase, multi-DC consensus?
– Streaming RPCs?
– High level coprocessor API?
37. References
• Enis Soztutar: HBase Read High Availability Using Timeline Consistent Region Replicas
• Nick Dimiduk: Apache HBase 1.0 Release
• …
Editor's notes:
• Create a replicated table
• Insert some data into it
• Flush it
• Kill the primary
• Attempt a write – fails
• Read a value
• Test performed using 6 AWS nodes (i2.8xlarge) + 5 client nodes (m2.4xlarge)