HBase with MapR
- 1. Running HBase with the MapR distribution
Tomer Shiran
Director of Product Management, MapR Technologies
7/23/2012 ©MapR Technologies 1
- 2. Agenda
• The HBase volume
• HBase backups with snapshots
• Mirroring
• Tuning memory settings
• Architecting applications with many objects
- 3. MapR
• Complete Hadoop distribution
• Makes it easy to deploy HBase
• MapR 1.2 includes HBase 0.90.4 + 15 patches
• Seeing huge growth in HBase adoption
• Thanks to everyone in this room!
• MapR expands the market for HBase
• Enterprises require HA, data protection and disaster recovery
• MapR makes it easier to run HBase in production
One minute to set up hourly snapshots
One minute to set up cross-datacenter mirroring
No need to worry about NameNode
- 4. Volumes – easy data management
• MapR makes data management easier with volumes
• Volumes are directories with management policies
• Replication, snapshots, mirroring, data placement control, quotas, usage tracking, …
• Each user/project directory should be a volume
• 100K volumes not a problem
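Volume management is scriptable as well as GUI-driven. A hedged sketch (the volume name, path, and quota values are illustrative, not from the deck; `maprcli` must run against a live cluster):

```shell
# Create a volume for a project directory and attach policies to it
# (replication factor and quota are examples, not recommendations).
maprcli volume create -name project.analytics -path /projects/analytics \
    -replication 3 -quota 100G
# Check usage tracking for the new volume
maprcli volume info -name project.analytics
```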
- 5. The HBase volume
• All HBase data should be in one volume
• HBase WALs are per RegionServer, so can’t create per-table volumes
• A volume for HBase data is created on installation
• Name: hbase.volume
• Mount path: /hbase
• Replication optimized for low latency
• Star replication beats chain replication for HBase
• For bulk load, create the HFiles in the HBase volume (/hbase)
# cd /mapr/default/hbase
# ls -la
total 7
drwxrwxrwx 13 root root 12 2012-01-16 11:44 .
drwxrwxrwx 6 root root 7 2012-01-13 16:08 ..
drwxrwxrwx 3 root root 1 2012-01-15 11:30 AdImpressions
-rwxrwxrwx 1 root root 3 2011-12-16 13:03 hbase.version
drwxrwxrwx 5 root root 3 2012-01-12 15:28 .logs
drwxrwxrwx 3 root root 1 2011-12-16 13:03 .META.
drwxrwxrwx 2 root root 0 2012-01-13 14:29 .oldlogs
drwxrwxrwx 3 root root 1 2011-12-16 13:03 -ROOT-
drwxrwxrwx 3 root root 1 2012-01-16 11:44 Users
Reminder: a MapR cluster can be mounted via NFS, so cd and ls just work.
Note: all WALs are in .logs, not in the user table directories (AdImpressions, Users).
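For the bulk-load tip above, the finish step might look like this (`completebulkload` is the standard HBase tool of that era; the jar path is an assumption for a MapR 1.2 install). Generating the HFiles under /hbase means the final hand-off stays within one volume:

```shell
# Hand staged HFiles to HBase; staging dir was created under the
# HBase volume (/hbase) so the final move is a cheap rename.
hadoop jar /opt/mapr/hbase/hbase-0.90.4/hbase-0.90.4.jar completebulkload \
    /hbase/bulkload/AdImpressions AdImpressions
```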
- 6. HBase backups with snapshots
• Why snapshots?
• Consistent – HFiles and HLogs at the same point in time
• No downtime – snapshot a live HBase cluster, no performance impact
• No data duplication – takes seconds to snapshot petabytes
• Short RPOs – snapshot hourly or more frequently
• Access HBase snapshots in /hbase/.snapshot:
# cd .snapshot
# pwd
/mapr/default/hbase/.snapshot
# ls -la
total 3
drwxr-xr-x 5 root root 3 Jan 16 16:02 .
drwxrwxrwx 7 root root 6 Jan 16 11:46 ..
drwxrwxrwx 7 root root 6 Jan 16 11:46 2012-01-16.14-02-02
drwxrwxrwx 7 root root 6 Jan 16 11:46 2012-01-16.15-02-02
drwxrwxrwx 7 root root 6 Jan 16 11:46 2012-01-16.16-02-02
# ls -a 2012-01-16.16-02-02
. .. AdImpressions hbase.version .logs .META. .oldlogs -ROOT-
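Because snapshots appear as ordinary read-only directories, restoring a table is a plain copy over NFS (paths follow the listing above; the destination directory is illustrative, and a full restore of a live table would normally be done with HBase stopped or the table disabled):

```shell
# Pull the Users table out of the 15:02 hourly snapshot
# into a scratch area for inspection or restore.
cp -r /mapr/default/hbase/.snapshot/2012-01-16.15-02-02/Users \
      /mapr/default/restore/Users
```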
- 8. Choose a snapshot schedule for HBase
Use the GUI dialog shown here, the CLI, or the REST API to choose a snapshot schedule for the volume.
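The CLI route might look like this (the schedule id is illustrative; `maprcli` also supports one-off snapshots outside any schedule):

```shell
# List the predefined schedules, attach one to the HBase volume,
# then take a one-off snapshot before a risky change.
maprcli schedule list
maprcli volume modify -name mapr.hbase -schedule 2
maprcli volume snapshot create -volume mapr.hbase -snapshotname pre-upgrade
```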
- 9. Mirroring
Mirror to…
• Research cluster
• Failover (DR) cluster
• Remote backup cluster
• Same cluster!
•…
Fast (and easy): differential (deltas), compressed
Safe: consistent (snapshot), checksummed
Flexible: scheduled or on-demand; Intranet, WAN or Sneakernet
- 10. Mirroring the HBase volume
1. Create a new volume on the destination cluster; choose the Remote Mirroring Volume type
2. Choose the source cluster and volume (mapr.hbase)
3. Choose a mirroring schedule
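The same setup from the CLI might look like this (the destination volume name and cluster name are illustrative; run on the destination cluster):

```shell
# Create a mirror volume whose source is the HBase volume on the
# source cluster, then kick off the first sync.
maprcli volume create -name hbase.mirror -path /hbase.mirror \
    -type mirror -source mapr.hbase@source-cluster
maprcli volume mirror start -name hbase.mirror
```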
- 11. Mirroring vs. HBase master/slave replication
• Block level
• No need to run HBase on sink cluster
• Only the latest update to a block needs to be sent
With master/slave, every operation is sent
• MapR mirroring is practically stateless
• Each sink cluster keeps one integer – a serial number
When asking for the next update, the sink provides the most recently seen serial number
• Master cluster does not keep any state
No resources consumed on the master cluster
• No ZooKeeper involved
• Master/slave replication is challenging when it gets out of sync
• One system for mirroring both HBase and file/directories
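The serial-number handshake can be simulated in a few lines of shell (a toy model of the idea, not MapR code): the sink persists a single integer and asks the source for everything newer.

```shell
# Toy model of stateless mirroring: the sink keeps one serial number;
# the source just answers "what changed after serial N?".
serial_file=$(mktemp)
echo 0 > "$serial_file"
source_updates="1 2 3 4 5"     # serials of changed blocks on the source

sync_once() {
  last=$(cat "$serial_file")
  for s in $source_updates; do
    if [ "$s" -gt "$last" ]; then
      echo "apply block update $s"
      echo "$s" > "$serial_file"    # sink advances its one integer
    fi
  done
}

sync_once    # first sync applies all five updates
sync_once    # second sync prints nothing: sink is already at serial 5
```

Note that the "source" here holds no per-sink state at all; each sink can fall arbitrarily far behind and still resync with nothing but its own integer.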
- 12. Warden
• Warden runs on each server
• /etc/init.d/mapr-warden start
• Warden starts/manages services on the node
• Warden decides how much memory to give each
service based on settings in warden.conf
# cat /opt/mapr/conf/warden.conf
…
service.command.hbregion.heapsize.percent=25
service.command.hbregion.heapsize.max=4000
service.command.hbregion.heapsize.min=1000
service.command.mfs.heapsize.percent=20
service.command.mfs.heapsize.min=512
…
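One plausible reading of the settings above: the heap is a percentage of physical RAM, clamped to the min/max bounds. The clamping logic here is our assumption, not MapR source code:

```shell
# Resolve the RegionServer heap from the warden.conf values above,
# assuming: heap = percent of total RAM, clamped to [min, max].
total_mb=16384    # physical RAM on the node (example)
percent=25        # service.command.hbregion.heapsize.percent
min_mb=1000       # service.command.hbregion.heapsize.min
max_mb=4000       # service.command.hbregion.heapsize.max

heap=$(( total_mb * percent / 100 ))
if [ "$heap" -lt "$min_mb" ]; then heap=$min_mb; fi
if [ "$heap" -gt "$max_mb" ]; then heap=$max_mb; fi
echo "hbregion heap: ${heap} MB"    # 4096 clamped down to 4000
```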
- 13. Tuning memory settings
• The defaults are suitable in most cases
• Guidelines:
• Don’t exceed 100-200 regions per server
• Don’t give RegionServer more than 16GB RAM
Garbage collection might kill you
• Give spare memory to FileServer
Written in C/C++ (unlike HDFS DataNode)
Advanced caching and prefetching
• Don’t enable TaskTracker unless you need it
Or Warden will reserve memory for tasks
If TaskTracker is not enabled and mfs.heapsize.max is not in warden.conf, Warden assigns the spare memory to FileServer
- 14. Architecting applications with many objects
• MapR supports up to 1 trillion files (small files OK)
• Fully distributed metadata
No NameNode or block reports
• Extremely fast random I/O (10-1000x compared to HDFS)
• With HDFS Federation and the upcoming HA NameNode, you would need 20K NameNodes and an HA NetApp :-)
• Keep smaller objects in HBase and larger objects (> 100KB) in MapR storage services
HBase: metadata (IDs, attributes, etc.)
MapR storage services: content (messages, attachments, etc.)
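The split by object size boils down to a trivial routing rule (the 100KB cutoff is from the guideline above; the function name is ours):

```shell
# Route an object by size: small objects (IDs, attributes, short
# messages) go to HBase rows; large blobs (attachments) go to files.
cutoff=$((100 * 1024))    # 100KB, per the guideline above

route() {
  if [ "$1" -le "$cutoff" ]; then
    echo "hbase"
  else
    echo "mapr-file"
  fi
}

route 2048        # 2KB of user metadata  -> hbase
route 5242880     # 5MB attachment        -> mapr-file
```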
- 15. Three ways to access the files
• NFS
• Mount the cluster over NFS
• NFS HA ensures availability – MapR assigns and manages virtual IPs
• No client library, works with any language
$ mount -o … mycluster:/mapr /mapr
$ python
>>> with open(r'/mapr/mycluster/images/asdfghjkl', 'w') as f:
... f.write(…)
• Java – Hadoop FileSystem API
FileSystem fs = FileSystem.get(new Configuration());
FSDataOutputStream out = fs.create(…);
out.write(…)
• C/C++ – native libhdfs library (MapR 1.2+)
• Same API (header file) as libhdfs, but no Java involved
hdfsFS fs = hdfsConnect(...);
hdfsFile f = hdfsOpenFile(fs, ...);
hdfsWrite(fs, f, ...);