Running HBase with the MapR distribution
                                                       Tomer Shiran
                  Director of Product Management, MapR Technologies

      7/23/2012           ©MapR Technologies                          1
Agenda
•   The HBase volume
•   HBase backups with snapshots
•   Mirroring
•   Tuning memory settings
•   Architecting applications with many objects




MapR
• Complete Hadoop distribution
   • Makes it easy to deploy HBase
   • MapR 1.2 includes HBase 0.90.4 + 15 patches

• Seeing huge growth in HBase adoption
   • Thanks to everyone in this room!

• MapR expands the market for HBase
   • Enterprises require HA, data protection and disaster recovery
   • MapR makes it easier to run HBase in production
       One minute to set up hourly snapshots
       One minute to set up cross-datacenter mirroring
       No need to worry about NameNode

Volumes – easy data management
• MapR makes data
  management easier with
  volumes
• Volumes are directories
  with management policies
   • Replication, snapshots,
     mirroring, data placement
     control, quotas, usage
     tracking, …
• Each user/project
  directory should be a
  volume
   • 100K volumes not a
     problem


The HBase volume
•   All HBase data should be in one volume
     •   HBase WALs are per RegionServer, so can’t create per-table volumes
•   A volume for HBase data is created on installation
     •   Name: hbase.volume
     •   Mount path: /hbase
•   Replication optimized for low latency
     •   Star replication beats chain replication for HBase
•   For bulk load, create the HFiles in the HBase volume (/hbase)

# cd /mapr/default/hbase
# ls -la
total 7
drwxrwxrwx 13 root root 12 2012-01-16 11:44 .
drwxrwxrwx  6 root root  7 2012-01-13 16:08 ..
drwxrwxrwx  3 root root  1 2012-01-15 11:30 AdImpressions
-rwxrwxrwx  1 root root  3 2011-12-16 13:03 hbase.version
drwxrwxrwx  5 root root  3 2012-01-12 15:28 .logs
drwxrwxrwx  3 root root  1 2011-12-16 13:03 .META.
drwxrwxrwx  2 root root  0 2012-01-13 14:29 .oldlogs
drwxrwxrwx  3 root root  1 2011-12-16 13:03 -ROOT-
drwxrwxrwx  3 root root  1 2012-01-16 11:44 Users

Reminder: A MapR cluster can be mounted via NFS, so cd and ls just work.
All WALs are in .logs, not in the user table directories (AdImpressions, Users).

HBase backups with snapshots
• Why snapshots?
   •    Consistent – HFiles and HLogs at the same point in time
   •    No downtime – snapshot a live HBase cluster, no performance impact
   •    No data duplication – takes seconds to snapshot petabytes
   •    Short RPOs – snapshot hourly or more frequently

• Access HBase snapshots in /hbase/.snapshot:
       # cd .snapshot
       # pwd
       /mapr/default/hbase/.snapshot
       # ls -la
       total 3
       drwxr-xr-x 5 root root 3 Jan 16 16:02 .
       drwxrwxrwx 7 root root 6 Jan 16 11:46 ..
       drwxrwxrwx 7 root root 6 Jan 16 11:46 2012-01-16.14-02-02
       drwxrwxrwx 7 root root 6 Jan 16 11:46 2012-01-16.15-02-02
       drwxrwxrwx 7 root root 6 Jan 16 11:46 2012-01-16.16-02-02
       # ls -a 2012-01-16.16-02-02
       . .. AdImpressions hbase.version .logs .META. .oldlogs      -ROOT-
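
Because the cluster is mounted over NFS, snapshots can be inspected with ordinary file operations. The following is a minimal Python sketch, assuming snapshot directories follow the timestamp naming shown in the listing above (for example 2012-01-16.16-02-02). The function names `list_snapshots` and `latest_snapshot` are hypothetical helpers written for this illustration, not part of any MapR tool.

```python
import os
from datetime import datetime

# Assumed naming convention, taken from the listing above (2012-01-16.16-02-02).
SNAPSHOT_NAME_FORMAT = "%Y-%m-%d.%H-%M-%S"


def list_snapshots(snapshot_dir):
    """Return (timestamp, name) pairs for timestamp-named snapshots, oldest first."""
    found = []
    for name in os.listdir(snapshot_dir):
        if not os.path.isdir(os.path.join(snapshot_dir, name)):
            continue
        try:
            stamp = datetime.strptime(name, SNAPSHOT_NAME_FORMAT)
        except ValueError:
            continue  # skip snapshots whose names are not timestamps
        found.append((stamp, name))
    return sorted(found)


def latest_snapshot(snapshot_dir):
    """Return the name of the newest timestamp-named snapshot, or None."""
    snapshots = list_snapshots(snapshot_dir)
    return snapshots[-1][1] if snapshots else None
```

With the mount used in this deck, a call such as `latest_snapshot("/mapr/default/hbase/.snapshot")` would return the most recent snapshot directory name, which a restore or verification script could then read from.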

Manage your schedules




Choose a snapshot schedule for HBase

• Use this GUI dialog, or the CLI or REST API
• Choose a snapshot schedule for this volume

Mirroring

Mirror to…
• Research cluster
• Failover (DR) cluster
• Remote backup cluster
• Same cluster!
• …

Fast (and easy)
• Differential (deltas)
• Compressed

Safe
• Consistent (snapshot)
• Checksummed

Flexible
• Scheduled or on-demand
• Intranet, WAN or Sneakernet


Mirroring the HBase volume

• Create a new volume on the destination cluster. Choose the Remote Mirroring Volume type
• Choose the source cluster and volume (mapr.hbase)
• Choose a mirroring schedule




Mirroring vs. HBase master/slave replication
• Block level
    • No need to run HBase on sink cluster
    • Only the latest update to a block needs to be sent
         With master/slave every operation is sent


• MapR mirroring is practically stateless
    • Each sink cluster keeps one integer – a serial number
         When asking for the next update, sink provides most recently seen serial
          number
    • Master cluster does not keep any state
         No resources consumed on the master cluster
    • No ZooKeeper involved
    • Master/slave replication is challenging when it gets out of sync

• One system for mirroring both HBase and files/directories
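
The serial-number idea can be illustrated with a toy model. This is only a sketch of the general technique (differential sync driven by a single integer held by the sink), not MapR's implementation: the class names, the per-block version stamp, and the in-memory dictionaries are all assumptions made for the example.

```python
class SourceVolume:
    """Toy source volume: each block write is stamped with a growing serial number."""

    def __init__(self):
        self.serial = 0
        self.blocks = {}  # block_id -> (serial of last write, data)

    def write(self, block_id, data):
        self.serial += 1
        self.blocks[block_id] = (self.serial, data)

    def changes_since(self, last_seen):
        """Return the current serial and the latest contents of blocks changed after last_seen."""
        delta = {
            block_id: data
            for block_id, (serial, data) in self.blocks.items()
            if serial > last_seen
        }
        return self.serial, delta


class MirrorVolume:
    """Toy sink: the only sync state it keeps is the last serial number it saw."""

    def __init__(self):
        self.last_seen = 0
        self.blocks = {}

    def sync(self, source):
        serial, delta = source.changes_since(self.last_seen)
        self.blocks.update(delta)
        self.last_seen = serial
        return len(delta)  # number of blocks transferred


source = SourceVolume()
mirror = MirrorVolume()
for version in range(3):
    source.write("block-1", f"v{version}")  # three updates to the same block
source.write("block-2", "x")
transferred = mirror.sync(source)  # only the latest version of each block is sent
```

In this model the source holds nothing about any particular sink, and three writes to the same block cost one transfer, which is the contrast with operation-level master/slave replication drawn in the bullets above.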


Warden
• Warden runs on each server
   • /etc/init.d/mapr-warden start
• Warden starts/manages services on the node
• Warden decides how much memory to give each
  service based on settings in warden.conf

 # cat /opt/mapr/conf/warden.conf
 …
 service.command.hbregion.heapsize.percent=25
 service.command.hbregion.heapsize.max=4000
 service.command.hbregion.heapsize.min=1000
 service.command.mfs.heapsize.percent=20
 service.command.mfs.heapsize.min=512
 …
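
To reason about what these settings mean on a given node, the sketch below parses the lines shown above and estimates a heap size. It assumes that `percent` is a percentage of the node's physical memory and that `min` and `max` are bounds in MB; that interpretation, and the helper names `parse_warden_conf` and `heap_size_mb`, are assumptions for illustration rather than Warden's documented algorithm.

```python
SAMPLE_CONF = """\
service.command.hbregion.heapsize.percent=25
service.command.hbregion.heapsize.max=4000
service.command.hbregion.heapsize.min=1000
service.command.mfs.heapsize.percent=20
service.command.mfs.heapsize.min=512
"""


def parse_warden_conf(text):
    """Parse key=value lines, ignoring blanks, comments and anything without '='."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def heap_size_mb(settings, service, total_memory_mb):
    """Estimate a service heap in MB: percent of RAM, clamped to min/max when present."""
    prefix = f"service.command.{service}.heapsize."
    size = total_memory_mb * int(settings[prefix + "percent"]) // 100
    if prefix + "min" in settings:
        size = max(size, int(settings[prefix + "min"]))
    if prefix + "max" in settings:
        size = min(size, int(settings[prefix + "max"]))
    return size


settings = parse_warden_conf(SAMPLE_CONF)
region_server_heap = heap_size_mb(settings, "hbregion", 32768)  # capped by max
file_server_heap = heap_size_mb(settings, "mfs", 32768)         # no max configured
```

Under these assumptions, a 32 GB node would give the RegionServer its 4000 MB cap, while the FileServer, with no `max` set, scales with the machine.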



Tuning memory settings
• The defaults are suitable in most cases

• Guidelines:
   • Don’t exceed 100-200 regions per server
   • Don’t give RegionServer more than 16GB RAM
       Garbage collection might kill you
   • Give spare memory to FileServer
       Written in C/C++ (unlike HDFS DataNode)
       Advanced caching and prefetching
   • Don’t enable TaskTracker unless you need it
       Or Warden will reserve memory for tasks
       If TaskTracker not enabled and mfs.heapsize.max not in
        warden.conf, Warden assigns spare memory to FileServer



Architecting applications with many objects
• MapR supports up to 1 trillion files (small files OK)
    • Fully distributed metadata
          No NameNode or block reports
    • Extremely fast random I/O (10-1000x compared to HDFS)
    • With HDFS Federation and the upcoming HA NameNode, you would need 20K
      NameNodes and an HA NetApp :-)

• Keep smaller objects in HBase and larger objects (> 100KB) in MapR
  storage services

 • HBase: metadata (IDs, attributes, etc.)
 • MapR storage services: content (messages, attachments, etc.)
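
The split between metadata and content can be sketched as a small routing layer. In this hedged example a plain dictionary stands in for the HBase table and a local directory stands in for the NFS-mounted cluster; the `ObjectStore` class and its fields are hypothetical, and the object ID is assumed to be safe to use as a file name.

```python
import os

SIZE_THRESHOLD = 100 * 1024  # 100KB, the guideline from the slide


class ObjectStore:
    """Keep metadata (and small payloads) in a table; write large payloads to files."""

    def __init__(self, table, content_root):
        self.table = table                # dict standing in for an HBase table
        self.content_root = content_root  # e.g. a directory on the NFS-mounted cluster

    def put(self, object_id, data, attributes):
        row = dict(attributes, size=len(data))
        if len(data) > SIZE_THRESHOLD:
            path = os.path.join(self.content_root, object_id)
            with open(path, "wb") as f:
                f.write(data)
            row["path"] = path            # the table row only points at the content
        else:
            row["data"] = data            # small objects live in the row itself
        self.table[object_id] = row

    def get(self, object_id):
        row = self.table[object_id]
        if "path" in row:
            with open(row["path"], "rb") as f:
                return f.read()
        return row["data"]
```

A real application would replace the dictionary with HBase client calls and would probably shard the content directory, but the decision point (payload size against the threshold) stays the same.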

Three ways to access the files
• NFS
   • Mount the cluster over NFS
   • NFS HA ensures availability – MapR assigns and manages virtual IPs
   • No client library, works with any language
   $ mount -o … mycluster:/mapr /mapr
   $ python
   >>> with open(r'/mapr/mycluster/images/asdfghjkl', 'w') as f:
   ...     f.write(…)

• Java – Hadoop FileSystem API
   FileSystem fs = FileSystem.get(new Configuration());
   FSDataOutputStream out = fs.create(…);
   out.write(…);

• C/C++ – native libhdfs library (MapR 1.2+)
   • Same API (header file) as libhdfs, but no Java involved
   hdfsFS fs = hdfsConnect(...);
   hdfsFile f = hdfsOpenFile(fs, ...);
   hdfsWrite(fs, f, ...);
Questions?



Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 

HBase with MapR

  • 1. Running HBase with the MapR distribution
       Tomer Shiran, Director of Product Management, MapR Technologies
       7/23/2012 ©MapR Technologies
  • 2. Agenda
       • The HBase volume
       • HBase backups with snapshots
       • Mirroring
       • Tuning memory settings
       • Architecting applications with many objects
  • 3. MapR
       • Complete Hadoop distribution
          • Makes it easy to deploy HBase
          • MapR 1.2 includes HBase 0.90.4 + 15 patches
       • Seeing huge growth in HBase adoption
          • Thanks to everyone in this room!
       • MapR expands the market for HBase
          • Enterprises require HA, data protection and disaster recovery
          • MapR makes it easier to run HBase in production
              - One minute to set up hourly snapshots
              - One minute to set up cross-datacenter mirroring
              - No need to worry about NameNode
  • 4. Volumes – easy data management
       • MapR makes data management easier with volumes
       • Volumes are directories with management policies
          • Replication, snapshots, mirroring, data placement control, quotas, usage tracking, …
       • Each user/project directory should be a volume
          • 100K volumes not a problem
  • 5. The HBase volume
       • All HBase data should be in one volume
          • HBase WALs are per RegionServer, so can’t create per-table volumes
       • A volume for HBase data is created on installation
          • Name: hbase.volume
          • Mount path: /hbase
       • Replication optimized for low latency
          • Star replication beats chain replication for HBase
       • For bulk load, create the HFiles in the HBase volume (/hbase)

       # cd /mapr/default/hbase
       # ls -la
       total 7
       drwxrwxrwx 13 root root 12 2012-01-16 11:44 .
       drwxrwxrwx  6 root root  7 2012-01-13 16:08 ..
       drwxrwxrwx  3 root root  1 2012-01-15 11:30 AdImpressions
       -rwxrwxrwx  1 root root  3 2011-12-16 13:03 hbase.version
       drwxrwxrwx  5 root root  3 2012-01-12 15:28 .logs
       drwxrwxrwx  3 root root  1 2011-12-16 13:03 .META.
       drwxrwxrwx  2 root root  0 2012-01-13 14:29 .oldlogs
       drwxrwxrwx  3 root root  1 2011-12-16 13:03 -ROOT-
       drwxrwxrwx  3 root root  1 2012-01-16 11:44 Users

       Reminder: A MapR cluster can be mounted via NFS, so cd and ls just work.
       All WALs are in .logs, not in the user table directories (AdImpressions, Users).
  • 6. HBase backups with snapshots
       • Why snapshots?
          • Consistent – HFiles and HLogs at the same point in time
          • No downtime – snapshot a live HBase cluster, no performance impact
          • No data duplication – takes seconds to snapshot petabytes
          • Short RPOs – snapshot hourly or more frequently
       • Access HBase snapshots in /hbase/.snapshot:

       # cd .snapshot
       # pwd
       /mapr/default/hbase/.snapshot
       # ls -la
       total 3
       drwxr-xr-x 5 root root 3 Jan 16 16:02 .
       drwxrwxrwx 7 root root 6 Jan 16 11:46 ..
       drwxrwxrwx 7 root root 6 Jan 16 11:46 2012-01-16.14-02-02
       drwxrwxrwx 7 root root 6 Jan 16 11:46 2012-01-16.15-02-02
       drwxrwxrwx 7 root root 6 Jan 16 11:46 2012-01-16.16-02-02
       # ls -a 2012-01-16.16-02-02
       . .. AdImpressions hbase.version .logs .META. .oldlogs -ROOT-
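Because snapshots show up as ordinary directories, a restore script can pick one by name. The sketch below is only an illustration: it assumes snapshot names follow the timestamp pattern seen in the listing above (`YYYY-MM-DD.HH-MM-SS`), which may not hold for manually named snapshots, and `latest_snapshot` is a hypothetical helper, not a MapR tool.

```python
from datetime import datetime

# Assumed name pattern, taken from the listing on the slide (e.g. 2012-01-16.16-02-02).
SNAPSHOT_FORMAT = "%Y-%m-%d.%H-%M-%S"


def latest_snapshot(names):
    """Return the most recent snapshot name, skipping entries that do not parse."""
    parsed = []
    for name in names:
        try:
            parsed.append((datetime.strptime(name, SNAPSHOT_FORMAT), name))
        except ValueError:
            continue  # not a timestamp-named snapshot (e.g. "." or a custom name)
    return max(parsed)[1] if parsed else None
```

On an NFS-mounted cluster this could be called as `latest_snapshot(os.listdir("/mapr/default/hbase/.snapshot"))`, using the path from the slide.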
  • 7. Manage your schedules
  • 8. Choose a snapshot schedule for HBase
       • Use this GUI dialog, or the CLI or REST API
       • Choose a snapshot schedule for this volume
  • 9. Mirroring
       Mirror to…
          • Research cluster
          • Failover (DR) cluster
          • Remote backup cluster
          • Same cluster!
          • …
       Fast (and easy)
          • Differential (deltas)
          • Compressed
       Safe
          • Consistent (snapshot)
          • Checksummed
       Flexible
          • Scheduled or on-demand
          • Intranet, WAN or Sneakernet
  • 10. Mirroring the HBase volume
       • Create a new volume on destination cluster
       • Choose Remote Mirroring Volume type
       • Choose source cluster and volume (mapr.hbase)
       • Choose mirroring schedule
  • 11. Mirroring vs. HBase master/slave replication
       • Block level
          • No need to run HBase on sink cluster
          • Only the latest update to a block needs to be sent
              - With master/slave every operation is sent
       • MapR mirroring is practically stateless
          • Each sink cluster keeps one integer – a serial number
              - When asking for the next update, sink provides most recently seen serial number
          • Master cluster does not keep any state
              - No resources consumed on the master cluster
          • No ZooKeeper involved
          • Master/slave replication is challenging when it gets out of sync
       • One system for mirroring both HBase and file/directories
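The serial-number idea can be illustrated with a toy model. This is a sketch of the general technique only, not MapR's implementation: the `Source` and `Sink` classes and their methods are hypothetical, and tagging each block with the serial of its last write is an assumption made for the example. It shows why a sink that remembers a single integer receives only the latest version of each changed block, however many writes happened in between.

```python
class Source:
    """Toy source volume: each block holds its latest data and the serial of its last write."""

    def __init__(self):
        self.serial = 0
        self.blocks = {}  # block_id -> (serial_of_last_write, data)

    def write(self, block_id, data):
        self.serial += 1
        self.blocks[block_id] = (self.serial, data)

    def changes_since(self, last_seen):
        """Return the current serial and the latest data of blocks written after last_seen."""
        delta = {b: d for b, (s, d) in self.blocks.items() if s > last_seen}
        return self.serial, delta


class Sink:
    """Toy mirror: the only sync state it keeps is the last serial number it has seen."""

    def __init__(self):
        self.last_seen = 0
        self.blocks = {}

    def sync(self, source):
        serial, delta = source.changes_since(self.last_seen)
        self.blocks.update(delta)
        self.last_seen = serial
        return len(delta)  # number of blocks transferred
```

Writing the same block three times and then syncing transfers that block once, which is the contrast with operation-level master/slave replication drawn on the slide.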
  • 12. Warden
       • Warden runs on each server
          • /etc/init.d/mapr-warden start
       • Warden starts/manages services on the node
       • Warden decides how much memory to give each service based on settings in warden.conf

       # cat /opt/mapr/conf/warden.conf
       …
       service.command.hbregion.heapsize.percent=25
       service.command.hbregion.heapsize.max=4000
       service.command.hbregion.heapsize.min=1000
       service.command.mfs.heapsize.percent=20
       service.command.mfs.heapsize.min=512
       …
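To get a feel for what these settings imply on a given node, the following sketch parses the key=value lines and estimates a heap size. It rests on an assumption the slide does not spell out: that `percent` is taken of the node's physical memory and the result is clamped between `min` and `max`, with values in MB. The functions `parse_conf` and `heap_mb` are hypothetical helpers, not part of Warden.

```python
def parse_conf(text):
    """Parse key=value lines, skipping blanks, comments and anything without '='."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def heap_mb(settings, service, total_mb):
    """Estimate a service heap in MB: percent of total memory, clamped to min/max if set.

    Assumption: this mirrors how Warden interprets the settings; check the MapR docs.
    """
    prefix = f"service.command.{service}.heapsize."
    size = total_mb * int(settings[prefix + "percent"]) / 100
    if prefix + "min" in settings:
        size = max(size, int(settings[prefix + "min"]))
    if prefix + "max" in settings:
        size = min(size, int(settings[prefix + "max"]))
    return int(size)
```

Under that assumption, a 32000 MB node with the values above would give the RegionServer 25% (8000 MB) capped to the 4000 MB maximum.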
  • 13. Tuning memory settings
       • The defaults are suitable in most cases
       • Guidelines:
          • Don’t exceed 100-200 regions per server
          • Don’t give RegionServer more than 16GB RAM
              - Garbage collection might kill you
          • Give spare memory to FileServer
              - Written in C/C++ (unlike HDFS DataNode)
              - Advanced caching and prefetching
          • Don’t enable TaskTracker unless you need it
              - Or Warden will reserve memory for tasks
              - If TaskTracker not enabled and mfs.heapsize.max not in warden.conf, Warden assigns spare memory to FileServer
  • 14. Architecting applications with many objects
       • MapR supports up to 1 trillion files (small files OK)
          • Fully distributed metadata
              - No NameNode or block reports
          • Extremely fast random I/O (10-1000x compared to HDFS)
          • With HDFS Federation and the upcoming HA NameNode you would need 20K NameNodes and an HA NetApp :-)
       • Keep smaller objects in HBase and larger objects (> 100KB) in MapR storage services
          • HBase: metadata (IDs, attributes, etc.)
          • MapR storage services: content (messages, attachments, etc.)
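The split between HBase and the file system can be expressed as a simple size check at write time. The sketch below is illustrative only: `plan_write`, the `root` directory and the returned dictionary layout are hypothetical, and the slide's "> 100KB" threshold is read here as 100 × 1024 bytes.

```python
import os

SMALL_OBJECT_LIMIT = 100 * 1024  # bytes; the slide's "> 100KB" threshold, read as 100 KiB


def plan_write(object_id, payload, root="/mapr/mycluster/objects"):
    """Decide where an object's content goes; metadata always goes to HBase.

    Returns a dict describing the planned HBase row and, for large objects,
    the file path where the content would be stored. Performs no I/O.
    """
    row = {"id": object_id, "size": len(payload)}
    if len(payload) > SMALL_OBJECT_LIMIT:
        path = os.path.join(root, object_id)
        row["content_path"] = path  # HBase keeps only a pointer to the file
        return {"hbase_row": row, "file_path": path}
    row["content"] = payload  # small object stored inline in HBase
    return {"hbase_row": row, "file_path": None}
```

A small message body ends up inline in the row, while a large attachment yields a row that only points at a file under `root`.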
  • 15. Three ways to access the files
       • NFS
          • Mount the cluster over NFS
          • NFS HA ensures availability – MapR assigns and manages virtual IPs
          • No client library, works with any language

          $ mount -o … mycluster:/mapr /mapr
          $ python
          >>> with open(r'/mapr/mycluster/images/asdfghjkl', 'w') as f:
          ...     f.write(…)

       • Java – Hadoop FileSystem API

          FileSystem fs = FileSystem.get(new Configuration());
          FSDataOutputStream out = fs.create(…);
          out.write(…);

       • C/C++ – native libhdfs library (MapR 1.2+)
          • Same API (header file) as libhdfs, but no Java involved

          hdfsFS fs = hdfsConnect(...);
          hdfsFile f = hdfsOpenFile(fs, ...);
          hdfsWrite(fs, f, ...);
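Since the NFS route needs no client library, storing an object is plain file I/O. The following is a minimal sketch, assuming the cluster is already mounted and that `root` points somewhere under the mount; `write_object` is a hypothetical helper. Writing to a temporary name and renaming is a common pattern to avoid readers seeing a half-written file, though how rename behaves over a particular NFS mount is worth verifying.

```python
import os


def write_object(root, name, data):
    """Write bytes to root/name via a temporary file, then rename into place."""
    path = os.path.join(root, name)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
    os.replace(tmp, path)  # rename over the final name
    return path
```

With the mount from the slide, a call might look like `write_object("/mapr/mycluster", "images/asdfghjkl", payload)`.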
  • 16. Questions?