HBase with MapR
- 1. Running HBase with the MapR distribution
Tomer Shiran
Director of Product Management, MapR Technologies
7/23/2012 ©MapR Technologies 1
- 2. Agenda
• The HBase volume
• HBase backups with snapshots
• Mirroring
• Tuning memory settings
• Architecting applications with many objects
- 3. MapR
• Complete Hadoop distribution
• Makes it easy to deploy HBase
• MapR 1.2 includes HBase 0.90.4 + 15 patches
• Seeing huge growth in HBase adoption
• Thanks to everyone in this room!
• MapR expands the market for HBase
• Enterprises require HA, data protection and disaster recovery
• MapR makes it easier to run HBase in production
One minute to set up hourly snapshots
One minute to set up cross-datacenter mirroring
No need to worry about NameNode
- 4. Volumes – easy data management
• MapR makes data management easier with volumes
• Volumes are directories with management policies
• Replication, snapshots, mirroring, data placement control, quotas, usage tracking, …
• Each user/project directory should be a volume
• 100K volumes not a problem
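Volume management is scriptable as well as GUI-driven. A hedged sketch (the volume name, path, and quota values are illustrative, not from the deck; `maprcli` must run against a live cluster):

```shell
# Create a volume for a project directory and attach policies to it
# (replication factor and quota are examples, not recommendations).
maprcli volume create -name project.analytics -path /projects/analytics \
    -replication 3 -quota 100G
# Check usage tracking for the new volume
maprcli volume info -name project.analytics
```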
- 5. The HBase volume
• All HBase data should be in one volume
• HBase WALs are per RegionServer, so can’t create per-table volumes
• A volume for HBase data is created on installation
• Name: hbase.volume
• Mount path: /hbase
• Replication optimized for low latency
• Star replication beats chain replication for HBase
• For bulk load, create the HFiles in the HBase volume (/hbase)
# cd /mapr/default/hbase
# ls -la
total 7
drwxrwxrwx 13 root root 12 2012-01-16 11:44 .
drwxrwxrwx 6 root root 7 2012-01-13 16:08 ..
drwxrwxrwx 3 root root 1 2012-01-15 11:30 AdImpressions
-rwxrwxrwx 1 root root 3 2011-12-16 13:03 hbase.version
drwxrwxrwx 5 root root 3 2012-01-12 15:28 .logs
drwxrwxrwx 3 root root 1 2011-12-16 13:03 .META.
drwxrwxrwx 2 root root 0 2012-01-13 14:29 .oldlogs
drwxrwxrwx 3 root root 1 2011-12-16 13:03 -ROOT-
drwxrwxrwx 3 root root 1 2012-01-16 11:44 Users
Reminder: a MapR cluster can be mounted via NFS, so cd and ls just work.
Note: all WALs are in .logs, not in the user table directories (AdImpressions, Users).
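For the bulk-load tip above, the finish step might look like this (`completebulkload` is the standard HBase tool of that era; the jar path is an assumption for a MapR 1.2 install). Generating the HFiles under /hbase means the final hand-off stays within one volume:

```shell
# Hand staged HFiles to HBase; staging dir was created under the
# HBase volume (/hbase) so the final move is a cheap rename.
hadoop jar /opt/mapr/hbase/hbase-0.90.4/hbase-0.90.4.jar completebulkload \
    /hbase/bulkload/AdImpressions AdImpressions
```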
- 6. HBase backups with snapshots
• Why snapshots?
• Consistent – HFiles and HLogs at the same point in time
• No downtime – snapshot a live HBase cluster, no performance impact
• No data duplication – takes seconds to snapshot petabytes
• Short RPOs – snapshot hourly or more frequently
• Access HBase snapshots in /hbase/.snapshot:
# cd .snapshot
# pwd
/mapr/default/hbase/.snapshot
# ls -la
total 3
drwxr-xr-x 5 root root 3 Jan 16 16:02 .
drwxrwxrwx 7 root root 6 Jan 16 11:46 ..
drwxrwxrwx 7 root root 6 Jan 16 11:46 2012-01-16.14-02-02
drwxrwxrwx 7 root root 6 Jan 16 11:46 2012-01-16.15-02-02
drwxrwxrwx 7 root root 6 Jan 16 11:46 2012-01-16.16-02-02
# ls -a 2012-01-16.16-02-02
. .. AdImpressions hbase.version .logs .META. .oldlogs -ROOT-
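Because snapshots appear as ordinary read-only directories, restoring a table is a plain copy over NFS (paths follow the listing above; the destination directory is illustrative, and a full restore of a live table would normally be done with HBase stopped or the table disabled):

```shell
# Pull the Users table out of the 15:02 hourly snapshot
# into a scratch area for inspection or restore.
cp -r /mapr/default/hbase/.snapshot/2012-01-16.15-02-02/Users \
      /mapr/default/restore/Users
```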
- 8. Choose a snapshot schedule for HBase
Use the GUI dialog shown here, the CLI, or the REST API to choose a snapshot schedule for the volume.
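The CLI route might look like this (the schedule id is illustrative; `maprcli` also supports one-off snapshots outside any schedule):

```shell
# List the predefined schedules, attach one to the HBase volume,
# then take a one-off snapshot before a risky change.
maprcli schedule list
maprcli volume modify -name mapr.hbase -schedule 2
maprcli volume snapshot create -volume mapr.hbase -snapshotname pre-upgrade
```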
- 9. Mirroring
Mirror to…
• Research cluster
• Failover (DR) cluster
• Remote backup cluster
• Same cluster!
•…
Fast (and easy): differential (deltas), compressed
Safe: consistent (snapshot), checksummed
Flexible: scheduled or on-demand; Intranet, WAN or Sneakernet
- 10. Mirroring the HBase volume
1. Create a new volume on the destination cluster; choose the Remote Mirroring Volume type
2. Choose the source cluster and volume (mapr.hbase)
3. Choose a mirroring schedule
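The same setup from the CLI might look like this (the destination volume name and cluster name are illustrative; run on the destination cluster):

```shell
# Create a mirror volume whose source is the HBase volume on the
# source cluster, then kick off the first sync.
maprcli volume create -name hbase.mirror -path /hbase.mirror \
    -type mirror -source mapr.hbase@source-cluster
maprcli volume mirror start -name hbase.mirror
```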
- 11. Mirroring vs. HBase master/slave replication
• Block level
• No need to run HBase on sink cluster
• Only the latest update to a block needs to be sent
With master/slave, every operation is sent
• MapR mirroring is practically stateless
• Each sink cluster keeps one integer – a serial number
When asking for the next update, the sink provides the most recently seen serial number
• Master cluster does not keep any state
No resources consumed on the master cluster
• No ZooKeeper involved
• Master/slave replication is challenging when it gets out of sync
• One system for mirroring both HBase and file/directories
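The serial-number handshake can be simulated in a few lines of shell (a toy model of the idea, not MapR code): the sink persists a single integer and asks the source for everything newer.

```shell
# Toy model of stateless mirroring: the sink keeps one serial number;
# the source just answers "what changed after serial N?".
serial_file=$(mktemp)
echo 0 > "$serial_file"
source_updates="1 2 3 4 5"     # serials of changed blocks on the source

sync_once() {
  last=$(cat "$serial_file")
  for s in $source_updates; do
    if [ "$s" -gt "$last" ]; then
      echo "apply block update $s"
      echo "$s" > "$serial_file"    # sink advances its one integer
    fi
  done
}

sync_once    # first sync applies all five updates
sync_once    # second sync prints nothing: sink is already at serial 5
```

Note that the "source" here holds no per-sink state at all; each sink can fall arbitrarily far behind and still resync with nothing but its own integer.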
- 12. Warden
• Warden runs on each server
• /etc/init.d/mapr-warden start
• Warden starts/manages services on the node
• Warden decides how much memory to give each
service based on settings in warden.conf
# cat /opt/mapr/conf/warden.conf
…
service.command.hbregion.heapsize.percent=25
service.command.hbregion.heapsize.max=4000
service.command.hbregion.heapsize.min=1000
service.command.mfs.heapsize.percent=20
service.command.mfs.heapsize.min=512
…
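One plausible reading of the settings above: the heap is a percentage of physical RAM, clamped to the min/max bounds. The clamping logic here is our assumption, not MapR source code:

```shell
# Resolve the RegionServer heap from the warden.conf values above,
# assuming: heap = percent of total RAM, clamped to [min, max].
total_mb=16384    # physical RAM on the node (example)
percent=25        # service.command.hbregion.heapsize.percent
min_mb=1000       # service.command.hbregion.heapsize.min
max_mb=4000       # service.command.hbregion.heapsize.max

heap=$(( total_mb * percent / 100 ))
if [ "$heap" -lt "$min_mb" ]; then heap=$min_mb; fi
if [ "$heap" -gt "$max_mb" ]; then heap=$max_mb; fi
echo "hbregion heap: ${heap} MB"    # 4096 clamped down to 4000
```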
- 13. Tuning memory settings
• The defaults are suitable in most cases
• Guidelines:
• Don’t exceed 100-200 regions per server
• Don’t give RegionServer more than 16GB RAM
Garbage collection might kill you
• Give spare memory to FileServer
Written in C/C++ (unlike HDFS DataNode)
Advanced caching and prefetching
• Don’t enable TaskTracker unless you need it
Or Warden will reserve memory for tasks
If TaskTracker is not enabled and mfs.heapsize.max is not in warden.conf, Warden assigns the spare memory to FileServer
- 14. Architecting applications with many objects
• MapR supports up to 1 trillion files (small files OK)
• Fully distributed metadata
No NameNode or block reports
• Extremely fast random I/O (10-1000x compared to HDFS)
• With HDFS Federation and the upcoming HA NameNode, you would need 20K NameNodes and an HA NetApp :-)
• Keep smaller objects in HBase and larger objects (> 100KB) in MapR storage services
HBase: metadata (IDs, attributes, etc.)
MapR storage services: content (messages, attachments, etc.)
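The split by object size boils down to a trivial routing rule (the 100KB cutoff is from the guideline above; the function name is ours):

```shell
# Route an object by size: small objects (IDs, attributes, short
# messages) go to HBase rows; large blobs (attachments) go to files.
cutoff=$((100 * 1024))    # 100KB, per the guideline above

route() {
  if [ "$1" -le "$cutoff" ]; then
    echo "hbase"
  else
    echo "mapr-file"
  fi
}

route 2048        # 2KB of user metadata  -> hbase
route 5242880     # 5MB attachment        -> mapr-file
```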
- 15. Three ways to access the files
• NFS
• Mount the cluster over NFS
• NFS HA ensures availability – MapR assigns and manages virtual IPs
• No client library, works with any language
$ mount -o … mycluster:/mapr /mapr
$ python
>>> with open(r'/mapr/mycluster/images/asdfghjkl', 'w') as f:
... f.write(…)
• Java – Hadoop FileSystem API
FileSystem fs = FileSystem.get(new Configuration());
FSDataOutputStream out = fs.create(…);
out.write(…)
• C/C++ – native libhdfs library (MapR 1.2+)
• Same API (header file) as libhdfs, but no Java involved
hdfsFS fs = hdfsConnect(...);
hdfsFile f = hdfsOpenFile(fs, ...);
hdfsWrite(fs, f, ...);