Operating HBase –
Things You Need to Know
       Christian Gügi
Outline
●   HBase internals
●   Overview of HBase utilities
●   HBase split visualisation with Hannibal
●   Challenges & lessons learned
●   Resources to get started




About me
●   Software Architect @ Sentric
●   Founder and organizer of the Swiss Big
    Data User Group
    http://www.bigdata-usergroup.ch

●   Contact:
    christian.guegi@sentric.ch
    http://www.sentric.ch
    @chrisgugi

HBase Internals




Data Model
●   A sparse, multi-dimensional, sorted map
●   A table consists of rows, each with a row key
●   Each row may have any number of columns
●   Rows are sorted lexicographically based on row key
●   Column = Column Family : Column Qualifier
    –   Cell → {rowkey, column, timestamp}




                                    [Bigtable: A Distributed Storage System for Structured Data]

●   Region: contiguous set of sorted rows
●   Region: unit of distribution and availability
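
A minimal sketch of the data model above, assuming a plain in-memory dictionary in place of HBase. The table contents, the region boundaries, and the helper names `get_cell` and `region_for` are hypothetical and only illustrate the sparse versioned map and the idea of a region as a contiguous row range.

```python
from bisect import bisect_right

# Hypothetical table: row key -> {"family:qualifier" -> {timestamp -> value}}.
# A row simply omits the columns it does not have, which is what makes it sparse.
table = {
    "com.cnn.www": {
        "contents:html": {3: "<html>v3", 2: "<html>v2"},
        "anchor:cnnsi.com": {1: "CNN"},
    },
    "com.example.www": {
        "contents:html": {5: "<html>hello"},
    },
}


def get_cell(row_key, column, timestamp=None):
    """Return the value at (row key, column, timestamp); newest version if no timestamp."""
    versions = table[row_key][column]
    ts = max(versions) if timestamp is None else timestamp
    return versions[ts]


# Made-up region boundaries: each region starts at one of these row keys and
# ends just before the next one. The first region starts at the empty key.
region_start_keys = ["", "com.d", "org."]


def region_for(row_key):
    """Index of the region whose [start, next start) range contains row_key."""
    return bisect_right(region_start_keys, row_key) - 1
```

Because rows are sorted lexicographically, finding the region for a row key is a binary search over the region start keys.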
Physical Data Organization
    [Diagram: a Region holds one Store per column family (here: content and anchor). Each Store has one Memstore and one or more HFiles on HDFS. The HLog (WAL) is also on HDFS.]

●      Column families are stored separately on disk
          –     Each family is a unit of access control and can serve a different access pattern
●      Writes are held (sorted) in memory until flush
●      Sorted on disk in predictable order
          –     By row key, column key, descending timestamp
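
The sort order can be sketched as follows, assuming cells are plain tuples of one column family. The sample writes and the function name `flush_order` are hypothetical; the real flush path is considerably more involved.

```python
# Hypothetical memstore contents for one column family, in arrival order:
# (row key, column, timestamp, value)
writes = [
    ("row2", "content:html", 10, "b"),
    ("row1", "content:html", 10, "a-old"),
    ("row1", "content:title", 12, "x"),
    ("row1", "content:html", 15, "a-new"),
]


def flush_order(cells):
    """Sort by row key, then column key, then timestamp descending (newest first)."""
    return sorted(cells, key=lambda c: (c[0], c[1], -c[2]))


flushed = flush_order(writes)
```

With newest-first ordering, the first cell found for a given row and column is always the latest version.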
Flushes and Compaction
●   Flushing/compaction per Region
    –   One thread (CompactSplitThread) per region
        server
●   Minor compaction
    –   Merges two or more HFiles into one
●   Major compaction
    –   Picks up all HFiles in the region, merges them and
        removes deleted key/values
●   Regions are split when they grow too large
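
A simplified sketch of the difference between the two compaction types, assuming each HFile is a list of cells already sorted by row key, column key, and descending timestamp. The `TOMBSTONE` marker and the `compact` function are hypothetical; HBase's real delete markers and file selection rules are more detailed than this.

```python
import heapq

TOMBSTONE = None  # hypothetical delete marker used only in this sketch


def _sort_key(cell):
    row, column, ts, _ = cell
    return (row, column, -ts)


def compact(hfiles, major=False):
    """Merge sorted HFiles (lists of (row, column, ts, value)) into one sorted list.

    A minor compaction only merges. A major compaction also drops delete
    markers and every cell at or below the newest marker's timestamp.
    """
    merged = list(heapq.merge(*hfiles, key=_sort_key))
    if not major:
        return merged

    # First pass: remember the newest delete marker per (row, column).
    deleted_at = {}
    for row, column, ts, value in merged:
        if value is TOMBSTONE:
            key = (row, column)
            deleted_at[key] = max(ts, deleted_at.get(key, ts))

    # Second pass: keep only live cells newer than any delete marker.
    return [
        (row, column, ts, value)
        for row, column, ts, value in merged
        if value is not TOMBSTONE
        and ts > deleted_at.get((row, column), float("-inf"))
    ]
```

In this sketch a minor compaction keeps delete markers, because older files outside the merge may still hold cells that the markers mask.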

System Architecture

    [Diagram: HBase comprises the API, the Master, and RegionServers (HFile, Memstore, Write-Ahead Log), and sits on top of HDFS and ZooKeeper.]



    [HBase: The Definitive Guide]

Key Design & Distribution
●   Bad idea: monotonically increasing number or
    timestamp (sequential row keys)
    –   RegionServer hot-spotting
●   Better: use hash function and/or composite
    key
    –   Distribute keys over random regions
    –   Uniform reads/writes across key space
●   Proper key design is essential
    –   E.g. reversed URL (Bigtable paper)
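
Two of these key designs can be sketched as below. The function names, the bucket count, and the key layout are assumptions for illustration, not a scheme prescribed by the talk.

```python
import hashlib


def salted_key(row_key, buckets=16):
    """Prefix the key with a stable hash-derived bucket so sequential keys spread out."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    bucket = digest[0] % buckets
    return f"{bucket:02d}-{row_key}"


def reversed_domain_key(host, path=""):
    """Reverse the host name components so pages of one domain sort next to each other."""
    return ".".join(reversed(host.split("."))) + path


# reversed_domain_key("www.cnn.com", "/index.html") gives "com.cnn.www/index.html"
```

Salting trades scan locality for write distribution: a range scan over the original key order has to visit every bucket. The reversed URL keeps related rows adjacent instead.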
Overview of HBase Utilities




Useful Tools
●   hbck – checks and fixes table integrity and
    region consistency
●   HFile – examine contents of HFile
●   HLog – examine contents of HLog file
●   OfflineMetaRepair – rebuild meta table
    from file system
●   HBase web interfaces
    –   Master
    –   RegionServer
Monitoring Tools
●   Ganglia
●   Nagios
●   OpenTSDB
●   …

    All tools use metrics provided through JMX




Manual Splitting
●   Via master web interface
    –   Split
●   HBase shell split command
●   RegionSplitter
    –   Create table with pre-split regions
    –   Rolling split of all regions on existing table
    –   ./bin/hbase org.apache.hadoop.hbase.util.RegionSplitter
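
To illustrate what pre-split regions mean, the sketch below computes evenly spaced split keys, assuming row keys are fixed-width lowercase hex strings. The function `hex_split_points` is hypothetical and does not reproduce RegionSplitter's own split algorithms.

```python
def hex_split_points(num_regions, key_width=8):
    """Return num_regions - 1 evenly spaced split keys over a fixed-width hex key space."""
    if num_regions < 2:
        return []
    space = 16 ** key_width
    return [
        format(space * i // num_regions, f"0{key_width}x")
        for i in range(1, num_regions)
    ]


# hex_split_points(4) gives ["40000000", "80000000", "c0000000"]
```

Even spacing only balances load if the row keys themselves are uniformly distributed, for example because they are hashed.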


Disable Automatic Splitting
●   Automatic splitting is triggered by hbase.hregion.max.filesize
●   To effectively disable it, set the value very high (100 GB at most)
●   OK, but:
    –   How do I monitor my region growth?
    –   Where do I split when I have irregular data
        growth?
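
One possible answer to the second question, sketched under the assumption that per-key (or per-key-range) sizes are available as a sorted list of samples. The function `balanced_split_key` and its input format are hypothetical; the talk does not describe a specific method.

```python
def balanced_split_key(key_sizes):
    """Pick the split key that best balances the data on both sides.

    key_sizes: list of (row_key, size_in_bytes) pairs sorted by row key.
    The returned key becomes the first key of the right-hand region.
    Returns None if there are fewer than two keys.
    """
    total = sum(size for _, size in key_sizes)
    best_key, best_gap = None, None
    left = 0
    for i in range(1, len(key_sizes)):
        left += key_sizes[i - 1][1]   # bytes that would land in the left region
        gap = abs(total - 2 * left)   # |right - left| for this candidate split
        if best_gap is None or gap < best_gap:
            best_key, best_gap = key_sizes[i][0], gap
    return best_key
```

With uniform sizes this lands on the middle key. With skewed sizes it moves the split towards the heavy keys, which a midpoint over the key space would not do.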




HBase Split Visualisation
    with Hannibal




Hannibal
●   Open-source project on GitHub
    – https://github.com/sentric/hannibal
●   Web based
●   Implemented in Scala
●   Compatible with HBase 0.90
●   Support for HBase versions > 0.92 to be added soon
●   Check it out!

How well are regions balanced
over the cluster?




How well are the regions split for
the table?




How did the region evolve over
time?




Future Plans
●   HBase 0.92 client API changes allow querying the
    compaction state of regions through HBaseAdmin,
    which makes it possible to differentiate major
    from minor compactions
●   Add a tool to find the best region split key for
    irregular data growth
●   Expose metrics through JMX



Challenges
& Lessons Learned




Challenges
●   Everyone is still learning
●   Some issues only appear at scale
    –   At scale, nothing works as advertised
●   Production cluster configuration
    –   Hardware issues
    –   Tuning cluster configuration to our workloads
●   HBase stability
●   Monitoring health of HBase
Lessons Learned
●   Schema & key design
    –   What’s queried together should be stored together
●   Monitoring and operational tooling are most important
●   Forget “emergency actions”: everything takes some time
●   You need DevOps in production
●   Steep learning curve: you need to know the whole
    ecosystem
    –   Hadoop, HDFS, MapReduce, ZooKeeper



Resources to get started
●   https://github.com/sentric/hannibal
●   http://hbase.apache.org/book.html
●   https://github.com/jmhsieh/hbase-repair-scripts
●   http://www.sentric.ch/blog/best-practice-why-monitoring-hbase-is-important
●   HBase: The Definitive Guide


Thank you!



       Questions?
             @chrisgugi





ApacheCon Europe 2012: Operating HBase - Things You Need to Know
