SlideShare ist ein Scribd-Unternehmen logo
1 von 21
HBase Metrics: What they mean to you

             David S. Wang
           dsw@cloudera.com
               5/22/12

                     HBaseCon 2012. 5/22/12
          Copyright 2012 Cloudera Inc. All rights reserved
Agenda

•   Motivation
•   Where to get metrics
•   Collection methods
•   Metrics:
    •   HBase Operations
    •   Memory-related
    •   Latency-related
    •   Host-level
• Takeaways

                                 HBaseCon 2012. 5/22/12                  2
                      Copyright 2012 Cloudera Inc. All rights reserved
Motivation

• To understand the steady-state
  behavior of your HBase cluster
• To debug when something bad
  happens
• To help evaluate changes
• To figure out when you need
  to buy more hardware




                             HBaseCon 2012. 5/22/12                  3
                  Copyright 2012 Cloudera Inc. All rights reserved
Where to get metrics
       JVM                                                JVM
            Region server
                                                                  Master
               Region
                Region

                                                         JVM

                                                                 HDFS
    Metrics can be obtained                                    DataNode
    from all of these areas

     Host
                CPUs          Memory                   Disks                Network



                                    HBaseCon 2012. 5/22/12                            4
                         Copyright 2012 Cloudera Inc. All rights reserved
Collection methods

                                          •     Dump to file
                                          •     web UI /jmx
                                          •     jconsole
                                          •     Ganglia
                                          •     Cloudera Manager
                                          •     Many other tools out
                                                there – suggestions
                                                welcome


                         HBaseCon 2012. 5/22/12                        5
              Copyright 2012 Cloudera Inc. All rights reserved
HBase Operations: compactions




  • hbase.regionserver.compactionQueueSize – size of queue of compactions
    waiting to be processed
  • h.r.compactionSize_avg_time – average bytes processed per compaction
  • h.r.compactionTime_avg_time – average time per compaction in
    milliseconds
  • h.r.compactionSize_num_ops, h.r.compactionTime_num_ops – total
    number of compactions so far
  • TIP: Spikes in h.r.compactionQueueSize could mean all of your regions
    are growing at the same rate and need to split/compact at around the
    same time – time to presplit your regions or turn off auto-compactions
                                 HBaseCon 2012. 5/22/12                  6
                      Copyright 2012 Cloudera Inc. All rights reserved
HBase Operations: flushes




  • h.r.flushQueueSize – size of queue of flushes waiting to be
    processed
  • h.r.flushSize_avg_time – average bytes processed per flush
  • h.r.flushTime_avg_time – average time per flush in milliseconds
  • h.r.flushSize_num_ops, h.r.flushTime_num_ops – total number of
    flushes so far
  • TIP: Small h.r.flushSize_avg_time and large flushQueueSize mean
    premature flushes: may indicate that you need more RAM, or you
    are ingesting and flushing faster than your disks can handle
                               HBaseCon 2012. 5/22/12                  7
                    Copyright 2012 Cloudera Inc. All rights reserved
HBase Operations: region splits




  • h.r.regionSplitFailureCount – number of unsuccessful splits
  • h.r.regionSplitSuccessCount – number of successful splits
  • TIP: Spikes in h.r.regionSplitSuccessCount may mean that you are
    spending too much time splitting – think about pre-splitting your
    regions
  • TIP: Sustained high rates of h.r.regionSplitSuccessCount may mean
    that you are ingesting fast enough to exceed your region size
    configuration – change the configuration and/or presplit/disable
    automatic splits
                                HBaseCon 2012. 5/22/12                  8
                     Copyright 2012 Cloudera Inc. All rights reserved
HBase Operations: stores and storefiles




  • h.r.stores – total number of stores
  • h.r.storefiles – total number of storefiles
  • TIP: Large ratio between number of h.r.stores and
    h.r.storefiles may indicate it’s time to compact more
    frequently – otherwise read performance can suffer


                                HBaseCon 2012. 5/22/12                  9
                     Copyright 2012 Cloudera Inc. All rights reserved
Memory: memstores




  • h.r.memstoreSizeMB – sum of all the memstore sizes in megabytes
  • h.r.numPutsWithoutWAL – number of puts with
    setWriteToWAL(false)
  • h.r.mbInMemoryWithoutWAL – amount of data from puts with
    setWriteToWAL(false) in megabytes
  • TIP: Anytime you have non-zero values in h.r.numPutsWithoutWAL
    and h.r.mbInMemoryWithoutWAL, you risk data loss if the RS
    crashes, until the next flush when they go back down to zero.
    Useful to detect applications that setWriteToWAL(false).

                               HBaseCon 2012. 5/22/12                  10
                    Copyright 2012 Cloudera Inc. All rights reserved
Memory: block cache




  • h.r.blockCacheCount - number of blocks in block cache
  • h.r.blockCacheEvictedCount – number of blocks evicted in the last period
  • h.r.blockCacheFree, h.r.blockCacheSize – free/occupied space in block
    cache in bytes
  • TIP: High sustained levels of h.r.blockCacheEvictedCount mean your block
    cache is turning over due to heap size constraints – possibly more GCs
      • Get more memory
      • Repartition heap amongst the various processes on the host – more for RS


                                    HBaseCon 2012. 5/22/12                         11
                         Copyright 2012 Cloudera Inc. All rights reserved
Memory: block cache




  • h.r.blockCacheHitCount, h.r.blockCacheMissCount – number of cache
    hits/misses
  • h.r.blockCacheHitRatio/h.r.blockCacheHitRatioPastNPeriods – hit ratio (cache
    hits/total requests) for past period, past N periods
  • h.r.blockCacheHitCachingRatio/h.r.blockCacheHitCachingRatioPastNPeriods
    – hit caching ratio (cache hits from requests set to use block cache/total
    requests set to use block cache) for past period, past N periods
  • TIP: With sustained traffic, in general you want your block cache to be fully
    utilized for maximum performance – high h.r.blockCacheHitRatio and
    h.r.blockCacheHitCachingRatio

                                   HBaseCon 2012. 5/22/12                      12
                        Copyright 2012 Cloudera Inc. All rights reserved
Memory: JVM-specific metrics




  • GC
     • TIP: Correlate with “concurrent mode failure” or “promotion failed” in the
        GC logs for GC pauses
  • Heap usage
     • TIP: Do not want this to max out – add more memory or assign more heap
        to your RS
  • Number of logs categorized by level
     • TIP: Be careful of any WARN or ERROR logs – helpful first indicator to look
        more deeply in logs
  • Threads blocked/running

                                   HBaseCon 2012. 5/22/12                        13
                        Copyright 2012 Cloudera Inc. All rights reserved
Latency: FS read




  • h.r.fsReadLatency_avg_time – average time per sequential read in ms
  • h.r.fsReadLatency_num_ops – number of sequential reads
  • h.r.fsPreadLatency_avg_time – average time per positional read in ms
  • h.r.fsPreadLatency_num_ops – number of positional reads
  • Various histograms based on this – 75th, 95th percentile, mean, standard
    deviation, etc., in nanoseconds
  • TIP: Spikes in h.r.fsReadLatency_avg_time/h.r.fsPreadLatency_avg_time
    can indicate HDFS/disk/network problems
                                  HBaseCon 2012. 5/22/12                       14
                       Copyright 2012 Cloudera Inc. All rights reserved
Latency: FS write/sync
  • h.r.fsWriteLatency_avg_time – average time per HLog edit write in ms
  • h.r.fsWriteSize_avg_time – average size per HLog edit write in bytes
  • h.r.fsWriteLatency_num_ops, h.r.fsWriteSize_num_ops – number of HLog
    edit writes
  • h.r.fsSyncLatency_avg_time – average time to sync HLogs to the filesystem
    in ms
  • h.r.fsSyncLatency_num_ops – number of HLog syncs
  • h.r.slowHLogAppendCount – number of HLog edits that took longer than 1
    second
      TIP: Search for a WARN-level log message with the text “appending an edit to hlog”;
      use that time to spelunk through logs to see what else is going on
  • Various histograms based on this – 75th, 95th percentile, mean, standard
    deviation, etc., in nanoseconds
  • TIP: Spikes in write or sync latencies can also be due to HDFS/bad
    disks/bad network

                                     HBaseCon 2012. 5/22/12                             15
                          Copyright 2012 Cloudera Inc. All rights reserved
Latency, region: per-operation-type, per-region
  • From regionserver metrics: h.r.*RequestLatency - latency histograms for
    various client operations in nanoseconds
  • Also general RPC metrics for all requests from clients
      • rpc.metrics.RPCQueueTime is time it takes from receipt of the RPC until it starts
        being processed
      • rpc.metrics.RPCProcessingTime is time it takes from start to end of processing
      • TIP: These are some of the first places to look if your client seems to be running
        slow
  • h.r.readRequestsCount – requests for gets, scanners. Not # of rows.
  • h.r.writeRequestsCount – requests for puts, deletes, etc. Not # of rows.
  • hbase.RegionServerDynamicStatistics.* contains some metrics on a per-
    region and per-column-family basis
      • Subset of block cache, compaction, flush, store/storefile sizes, etc.
  • TIP: These metrics can be used to answer the questions:
      • Is the problem only in one region or in all regions?
      • Is the problem only with one operation type or all operations?


                                      HBaseCon 2012. 5/22/12                                 16
                           Copyright 2012 Cloudera Inc. All rights reserved
Host-level metrics
  • CPU:
     •   Load averages
     •   % idle/system/user/wio (waiting for block I/O)
     •   Running/total processes
     •   Correlate with “top”, “sar”, “mpstat” to figure out which process is
         doing what.
           • TIP: If a process is eating a lot of CPU, that is a clue to look at its logs
  • Memory:
     • swap_free/swap_total
     • cached/free/shared
     • TIP: Compare with logs, other metrics to help determine causes of
         OOMs


                                      HBaseCon 2012. 5/22/12                                17
                           Copyright 2012 Cloudera Inc. All rights reserved
Host-level metrics
  • Network:
     • Bytes/packets sent/received
     • TIP: Correlate with ZK timeouts from logs if RSes are going down
     • TIP: Also look at Ethernet frame errors from ifconfig if you are
       experiencing unexpected dips in network traffic
  • Disk I/O:
     • Read/write latencies
         • Slow disks make HBase performance very bad, especially on .META.
         • TIP: Check dmesg for SCSI errors
     • disk space available
     • swap usage
         • TIP: Should be 0 – otherwise you will have timeouts and RSes going
            down. Add RAM and/or buy more boxes.
                                  HBaseCon 2012. 5/22/12                        18
                       Copyright 2012 Cloudera Inc. All rights reserved
Takeaways

• Know what you are looking for
   • Know what metrics are available, what they mean
   • Collect a lot of different metrics, but don’t drown in them
• Metrics are not all you need
   • Correlate metrics with logs, workload
   • Metrics are another tool in your toolbox. You still have to do
     the work to monitor/debug/tune.




                                HBaseCon 2012. 5/22/12                  19
                     Copyright 2012 Cloudera Inc. All rights reserved
Takeaways
                                        • Take a baseline of your
                                          system in steady-state
                                                • For later comparison if things
                                                  go bad
                                                • Try to make this as apples-to-
                                                  apples as possible, e.g. same
                                                  hardware/workload/config
                                        • Spikes or dips from
                                          baseline can indicate
                                          problems
                                                • But also depends on which
                                                  metric and if you can explain
                                                  it (e.g. increased workload)

                       HBaseCon 2012. 5/22/12                                     20
            Copyright 2012 Cloudera Inc. All rights reserved
Thank you
 (Especially to Jon Hsieh and Todd Lipcon
for their helpful reviews and suggestions)




         Cloudera is hiring
     E-mail me: dsw@cloudera.com


                   HBaseCon 2012. 5/22/12                  21
        Copyright 2012 Cloudera Inc. All rights reserved

Weitere ähnliche Inhalte

Was ist angesagt?

Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache Impala
Cloudera, Inc.
 
Hadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayHadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox Gateway
DataWorks Summit
 
Data Security at Scale through Spark and Parquet Encryption
Data Security at Scale through Spark and Parquet EncryptionData Security at Scale through Spark and Parquet Encryption
Data Security at Scale through Spark and Parquet Encryption
Databricks
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
Chandler Huang
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
DataWorks Summit
 

Was ist angesagt? (20)

Introduction to memcached
Introduction to memcachedIntroduction to memcached
Introduction to memcached
 
Apache Sparkにおけるメモリ - アプリケーションを落とさないメモリ設計手法 -
Apache Sparkにおけるメモリ - アプリケーションを落とさないメモリ設計手法 -Apache Sparkにおけるメモリ - アプリケーションを落とさないメモリ設計手法 -
Apache Sparkにおけるメモリ - アプリケーションを落とさないメモリ設計手法 -
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive Queries
 
HBase Low Latency
HBase Low LatencyHBase Low Latency
HBase Low Latency
 
Hive: Loading Data
Hive: Loading DataHive: Loading Data
Hive: Loading Data
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache Impala
 
HBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ Salesforce
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Troubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the BeastTroubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the Beast
 
HBase Advanced - Lars George
HBase Advanced - Lars GeorgeHBase Advanced - Lars George
HBase Advanced - Lars George
 
Securing Hadoop with Apache Ranger
Securing Hadoop with Apache RangerSecuring Hadoop with Apache Ranger
Securing Hadoop with Apache Ranger
 
Cloudera Impala Internals
Cloudera Impala InternalsCloudera Impala Internals
Cloudera Impala Internals
 
Hadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayHadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox Gateway
 
Apache HBase at Airbnb
Apache HBase at Airbnb Apache HBase at Airbnb
Apache HBase at Airbnb
 
Data Security at Scale through Spark and Parquet Encryption
Data Security at Scale through Spark and Parquet EncryptionData Security at Scale through Spark and Parquet Encryption
Data Security at Scale through Spark and Parquet Encryption
 
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBaseHBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
 
Apache HBase Improvements and Practices at Xiaomi
Apache HBase Improvements and Practices at XiaomiApache HBase Improvements and Practices at Xiaomi
Apache HBase Improvements and Practices at Xiaomi
 
Terraform
TerraformTerraform
Terraform
 

Andere mochten auch

HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
enissoz
 

Andere mochten auch (20)

HBaseCon 2015: Running ML Infrastructure on HBase
HBaseCon 2015: Running ML Infrastructure on HBaseHBaseCon 2015: Running ML Infrastructure on HBase
HBaseCon 2015: Running ML Infrastructure on HBase
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
 
HBaseCon 2012 | Leveraging HBase for the World’s Largest Curated Genomic Data...
HBaseCon 2012 | Leveraging HBase for the World’s Largest Curated Genomic Data...HBaseCon 2012 | Leveraging HBase for the World’s Largest Curated Genomic Data...
HBaseCon 2012 | Leveraging HBase for the World’s Largest Curated Genomic Data...
 
Tales from the Cloudera Field
Tales from the Cloudera FieldTales from the Cloudera Field
Tales from the Cloudera Field
 
HBaseCon 2015: DeathStar - Easy, Dynamic, Multi-tenant HBase via YARN
HBaseCon 2015: DeathStar - Easy, Dynamic,  Multi-tenant HBase via YARNHBaseCon 2015: DeathStar - Easy, Dynamic,  Multi-tenant HBase via YARN
HBaseCon 2015: DeathStar - Easy, Dynamic, Multi-tenant HBase via YARN
 
HBaseCon 2013: Being Smarter Than the Smart Meter
HBaseCon 2013: Being Smarter Than the Smart MeterHBaseCon 2013: Being Smarter Than the Smart Meter
HBaseCon 2013: Being Smarter Than the Smart Meter
 
HBase Read High Availability Using Timeline-Consistent Region Replicas
HBase Read High Availability Using Timeline-Consistent Region ReplicasHBase Read High Availability Using Timeline-Consistent Region Replicas
HBase Read High Availability Using Timeline-Consistent Region Replicas
 
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.
 
HBaseCon 2015: Trafodion - Integrating Operational SQL into HBase
HBaseCon 2015: Trafodion - Integrating Operational SQL into HBaseHBaseCon 2015: Trafodion - Integrating Operational SQL into HBase
HBaseCon 2015: Trafodion - Integrating Operational SQL into HBase
 
HBaseCon 2013: 1500 JIRAs in 20 Minutes
HBaseCon 2013: 1500 JIRAs in 20 MinutesHBaseCon 2013: 1500 JIRAs in 20 Minutes
HBaseCon 2013: 1500 JIRAs in 20 Minutes
 
Cross-Site BigTable using HBase
Cross-Site BigTable using HBaseCross-Site BigTable using HBase
Cross-Site BigTable using HBase
 
HBaseCon 2013: Rebuilding for Scale on Apache HBase
HBaseCon 2013: Rebuilding for Scale on Apache HBaseHBaseCon 2013: Rebuilding for Scale on Apache HBase
HBaseCon 2013: Rebuilding for Scale on Apache HBase
 
HBaseCon 2013: Evolving a First-Generation Apache HBase Deployment to Second...
HBaseCon 2013:  Evolving a First-Generation Apache HBase Deployment to Second...HBaseCon 2013:  Evolving a First-Generation Apache HBase Deployment to Second...
HBaseCon 2013: Evolving a First-Generation Apache HBase Deployment to Second...
 
HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics
HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics
HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics
 
HBaseCon 2012 | Relaxed Transactions for HBase - Francis Liu, Yahoo!
HBaseCon 2012 | Relaxed Transactions for HBase - Francis Liu, Yahoo!HBaseCon 2012 | Relaxed Transactions for HBase - Francis Liu, Yahoo!
HBaseCon 2012 | Relaxed Transactions for HBase - Francis Liu, Yahoo!
 
HBaseCon 2012 | Building Mobile Infrastructure with HBase
HBaseCon 2012 | Building Mobile Infrastructure with HBaseHBaseCon 2012 | Building Mobile Infrastructure with HBase
HBaseCon 2012 | Building Mobile Infrastructure with HBase
 
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...
 
HBaseCon 2012 | HBase for the Worlds Libraries - OCLC
HBaseCon 2012 | HBase for the Worlds Libraries - OCLCHBaseCon 2012 | HBase for the Worlds Libraries - OCLC
HBaseCon 2012 | HBase for the Worlds Libraries - OCLC
 
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
 
HBaseCon 2012 | Scaling GIS In Three Acts
HBaseCon 2012 | Scaling GIS In Three ActsHBaseCon 2012 | Scaling GIS In Three Acts
HBaseCon 2012 | Scaling GIS In Three Acts
 

Ähnlich wie HBaseCon 2012 | Base Metrics: What They Mean to You - Cloudera

TriHUG - Beyond Batch
TriHUG - Beyond BatchTriHUG - Beyond Batch
TriHUG - Beyond Batch
boorad
 
Facing enterprise specific challenges – utility programming in hadoop
Facing enterprise specific challenges – utility programming in hadoopFacing enterprise specific challenges – utility programming in hadoop
Facing enterprise specific challenges – utility programming in hadoop
fann wu
 

Ähnlich wie HBaseCon 2012 | Base Metrics: What They Mean to You - Cloudera (20)

Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
HBase tales from the trenches
HBase tales from the trenchesHBase tales from the trenches
HBase tales from the trenches
 
Apache hbase for the enterprise (Strata+Hadoop World 2012)
Apache hbase for the enterprise (Strata+Hadoop World 2012)Apache hbase for the enterprise (Strata+Hadoop World 2012)
Apache hbase for the enterprise (Strata+Hadoop World 2012)
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Strata + Hadoop World 2012: Apache HBase Features for the Enterprise
Strata + Hadoop World 2012: Apache HBase Features for the EnterpriseStrata + Hadoop World 2012: Apache HBase Features for the Enterprise
Strata + Hadoop World 2012: Apache HBase Features for the Enterprise
 
HBase with MapR
HBase with MapRHBase with MapR
HBase with MapR
 
Hadoop Operations
Hadoop OperationsHadoop Operations
Hadoop Operations
 
TriHUG - Beyond Batch
TriHUG - Beyond BatchTriHUG - Beyond Batch
TriHUG - Beyond Batch
 
Hadoop Backup and Disaster Recovery
Hadoop Backup and Disaster RecoveryHadoop Backup and Disaster Recovery
Hadoop Backup and Disaster Recovery
 
HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014
 
Strata London 2019 Scaling Impala.pptx
Strata London 2019 Scaling Impala.pptxStrata London 2019 Scaling Impala.pptx
Strata London 2019 Scaling Impala.pptx
 
Strata London 2019 Scaling Impala
Strata London 2019 Scaling ImpalaStrata London 2019 Scaling Impala
Strata London 2019 Scaling Impala
 
Inside MapR's M7
Inside MapR's M7Inside MapR's M7
Inside MapR's M7
 
Inside MapR's M7
Inside MapR's M7Inside MapR's M7
Inside MapR's M7
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
 
HDFS- What is New and Future
HDFS- What is New and FutureHDFS- What is New and Future
HDFS- What is New and Future
 
HBase: Where Online Meets Low Latency
HBase: Where Online Meets Low LatencyHBase: Where Online Meets Low Latency
HBase: Where Online Meets Low Latency
 
Flume and HBase
Flume and HBase Flume and HBase
Flume and HBase
 
Facing enterprise specific challenges – utility programming in hadoop
Facing enterprise specific challenges – utility programming in hadoopFacing enterprise specific challenges – utility programming in hadoop
Facing enterprise specific challenges – utility programming in hadoop
 

Mehr von Cloudera, Inc.

Mehr von Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Kürzlich hochgeladen

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Kürzlich hochgeladen (20)

"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 

HBaseCon 2012 | Base Metrics: What They Mean to You - Cloudera

  • 1. HBase Metrics: What they mean to you David S. Wang dsw@cloudera.com 5/22/12 HBaseCon 2012. 5/22/12 Copyright 2012 Cloudera Inc. All rights reserved
  • 2. Agenda • Motivation • Where to get metrics • Collection methods • Metrics: • HBase Operations • Memory-related • Latency-related • Host-level • Takeaways HBaseCon 2012. 5/22/12 2 Copyright 2012 Cloudera Inc. All rights reserved
  • 3. Motivation • To understand the steady-state behavior of your HBase cluster • To debug when something bad happens • To help evaluate changes • To figure out when you need to buy more hardware HBaseCon 2012. 5/22/12 3 Copyright 2012 Cloudera Inc. All rights reserved
  • 4. Where to get metrics JVM JVM Region server Master Region Region JVM HDFS Metrics can be obtained DataNode from all of these areas Host CPUs Memory Disks Network HBaseCon 2012. 5/22/12 4 Copyright 2012 Cloudera Inc. All rights reserved
  • 5. Collection methods • Dump to file • web UI /jmx • jconsole • Ganglia • Cloudera Manager • Many other tools out there – suggestions welcome HBaseCon 2012. 5/22/12 5 Copyright 2012 Cloudera Inc. All rights reserved
  • 6. HBase Operations: compactions • hbase.regionserver.compactionQueueSize – size of queue of compactions waiting to be processed • h.r.compactionSize_avg_time – average bytes processed per compaction • h.r.compactionTime_avg_time – average time per compaction in milliseconds • h.r.compactionSize_num_ops, h.r.compactionTime_num_ops – total number of compactions so far • TIP: Spikes in h.r.compactionQueueSize could mean all of your regions are growing at the same rate and need to split/compact at around the same time – time to presplit your regions or turn off auto-compactions HBaseCon 2012. 5/22/12 6 Copyright 2012 Cloudera Inc. All rights reserved
  • 7. HBase Operations: flushes • h.r.flushQueueSize – size of queue of flushes waiting to be processed • h.r.flushSize_avg_time – average bytes processed per flush • h.r.flushTime_avg_time – average time per flush in milliseconds • h.r.flushSize_num_ops, h.r.flushTime_num_ops – total number of flushes so far • TIP: Small h.r.flushSize_avg_time and large flushQueueSize mean premature flushes: may indicate that you need more RAM, or you are ingesting and flushing faster than your disks can handle HBaseCon 2012. 5/22/12 7 Copyright 2012 Cloudera Inc. All rights reserved
  • 8. HBase Operations: region splits • h.r.regionSplitFailureCount – number of unsuccessful splits • h.r.regionSplitSuccessCount – number of successful splits • TIP: Spikes in h.r.regionSplitSuccessCount may mean that you are spending too much time splitting – think about pre-splitting your regions • TIP: Sustained high rates of h.r.regionSplitSuccessCount may mean that you are ingesting fast enough to exceed your region size configuration – change the configuration and/or presplit/disable automatic splits HBaseCon 2012. 5/22/12 8 Copyright 2012 Cloudera Inc. All rights reserved
  • 9. HBase Operations: stores and storefiles • h.r.stores – total number of stores • h.r.storefiles – total number of storefiles • TIP: Large ratio between number of h.r.stores and h.r.storefiles may indicate it’s time to compact more frequently – otherwise read performance can suffer HBaseCon 2012. 5/22/12 9 Copyright 2012 Cloudera Inc. All rights reserved
  • 10. Memory: memstores • h.r.memstoreSizeMB – sum of all the memstore sizes in megabytes • h.r.numPutsWithoutWAL – number of puts with setWriteToWAL(false) • h.r.mbInMemoryWithoutWAL – amount of data from puts with setWriteToWAL(false) in megabytes • TIP: Anytime you have non-zero values in h.r.numPutsWithoutWAL and h.r.mbInMemoryWithoutWAL, you risk data loss if the RS crashes, until the next flush when they go back down to zero. Useful to detect applications that setWriteToWAL(false). HBaseCon 2012. 5/22/12 10 Copyright 2012 Cloudera Inc. All rights reserved
  • 11. Memory: block cache • h.r.blockCacheCount - number of blocks in block cache • h.r.blockCacheEvictedCount – number of blocks evicted in the last period • h.r.blockCacheFree, h.r.blockCacheSize – free/occupied space in block cache in bytes • TIP: High sustained levels of h.r.blockCacheEvictedCount mean your block cache is turning over due to heap size constraints – possibly more GCs • Get more memory • Repartition heap amongst the various processes on the host – more for RS HBaseCon 2012. 5/22/12 11 Copyright 2012 Cloudera Inc. All rights reserved
  • 12. Memory: block cache • h.r.blockCacheHitCount, h.r.blockCacheMissCount – number of cache hits/misses • h.r.blockCacheHitRatio/h.r.blockCacheHitRatioPastNPeriods – hit ratio (cache hits/total requests) for past period, past N periods • h.r.blockCacheHitCachingRatio/h.r.blockCacheHitCachingRatioPastNPeriods – hit caching ratio (cache hits from requests set to use block cache/total requests set to use block cache) for past period, past N periods • TIP: With sustained traffic, in general you want your block cache to be fully utilized for maximum performance – high h.r.blockCacheHitRatio and h.r.blockCacheHitCachingRatio HBaseCon 2012. 5/22/12 12 Copyright 2012 Cloudera Inc. All rights reserved
  • 13. Memory: JVM-specific metrics • GC • TIP: Correlate with “concurrent mode failure” or “promotion failed” in the GC logs for GC pauses • Heap usage • TIP: Do not want this to max out – add more memory or assign more heap to your RS • Number of logs categorized by level • TIP: Be careful of any WARN or ERROR logs – helpful first indicator to look more deeply in logs • Threads blocked/running HBaseCon 2012. 5/22/12 13 Copyright 2012 Cloudera Inc. All rights reserved
  • 14. Latency: FS read • h.r.fsReadLatency_avg_time – average time per sequential read in ms • h.r.fsReadLatency_num_ops – number of sequential reads • h.r.fsPreadLatency_avg_time – average time per positional read in ms • h.r.fsPreadLatency_num_ops – number of positional reads • Various histograms based on this – 75th, 95th percentile, mean, standard deviation, etc., in nanoseconds • TIP: Spikes in h.r.fsReadLatency_avg_time/h.r.fsPreadLatency_avg_time can indicate HDFS/disk/network problems HBaseCon 2012. 5/22/12 14 Copyright 2012 Cloudera Inc. All rights reserved
  • 15. Latency: FS write/sync • h.r.fsWriteLatency_avg_time – average time per HLog edit write in ms • h.r.fsWriteSize_avg_time – average size per HLog edit write in bytes • h.r.fsWriteLatency_num_ops, h.r.fsWriteSize_num_ops – number of HLog edit writes • h.r.fsSyncLatency_avg_time – average time to sync HLogs to the filesystem in ms • h.r.fsSyncLatency_num_ops – number of HLog syncs • h.r.slowHLogAppendCount – number of HLog edits that took longer than 1 second TIP: Search for a WARN-level log message with the text “appending an edit to hlog”; use that time to spelunk through logs to see what else is going on • Various histograms based on this – 75th, 95th percentile, mean, standard deviation, etc., in nanoseconds • TIP: Spikes in write or sync latencies can also be due to HDFS/bad disks/bad network HBaseCon 2012. 5/22/12 15 Copyright 2012 Cloudera Inc. All rights reserved
  • 16. Latency, region: per-operation-type, per-region • From regionserver metrics: h.r.*RequestLatency - latency histograms for various client operations in nanoseconds • Also general RPC metrics for all requests from clients • rpc.metrics.RPCQueueTime is time it takes from receipt of the RPC until it starts being processed • rpc.metrics.RPCProcessingTime is time it takes from start to end of processing • TIP: These are some of the first places to look if your client seems to be running slow • h.r.readRequestsCount – requests for gets, scanners. Not # of rows. • h.r.writeRequestsCount – requests for puts, deletes, etc. Not # of rows. • hbase.RegionServerDynamicStatistics.* contains some metrics on a per- region and per-column-family basis • Subset of block cache, compaction, flush, store/storefile sizes, etc. • TIP: These metrics can be used to answer the questions: • Is the problem only in one region or in all regions? • Is the problem only with one operation type or all operations? HBaseCon 2012. 5/22/12 16 Copyright 2012 Cloudera Inc. All rights reserved
  • 17. Host-level metrics • CPU: • Load averages • % idle/system/user/wio (waiting for block I/O) • Running/total processes • Correlate with “top”, “sar”, “mpstat” to figure out which process is doing what. • TIP: If a process is eating a lot of CPU, that is a clue to look at its logs • Memory: • swap_free/swap_total • cached/free/shared • TIP: Compare with logs, other metrics to help determine causes of OOMs HBaseCon 2012. 5/22/12 17 Copyright 2012 Cloudera Inc. All rights reserved
  • 18. Host-level metrics • Network: • Bytes/packets sent/received • TIP: Correlate with ZK timeouts from logs if RSes are going down • TIP: Also look at Ethernet frame errors from ifconfig if you are experiencing unexpected dips in network traffic • Disk I/O: • Read/write latencies • Slow disks make HBase performance very bad, especially on .META. • TIP: Check dmesg for SCSI errors • disk space available • swap usage • TIP: Should be 0 – otherwise you will have timeouts and RSes going down. Add RAM and/or buy more boxes. HBaseCon 2012. 5/22/12 18 Copyright 2012 Cloudera Inc. All rights reserved
  • 19. Takeaways • Know what you are looking for • Know what metrics are available, what they mean • Collect a lot of different metrics, but don’t drown in them • Metrics are not all you need • Correlate metrics with logs, workload • Metrics are another tool in your toolbox. You still have to do the work to monitor/debug/tune. HBaseCon 2012. 5/22/12 19 Copyright 2012 Cloudera Inc. All rights reserved
  • 20. Takeaways • Take a baseline of your system in steady-state • For later comparison if things go bad • Try to make this as apples-to- apples as possible, e.g. same hardware/workload/config • Spikes or dips from baseline can indicate problems • But also depends on which metric and if you can explain it (e.g. increased workload) HBaseCon 2012. 5/22/12 20 Copyright 2012 Cloudera Inc. All rights reserved
  • 21. Thank you (Especially to Jon Hsieh and Todd Lipcon for their helpful reviews and suggestions) Cloudera is hiring E-mail me: dsw@cloudera.com HBaseCon 2012. 5/22/12 21 Copyright 2012 Cloudera Inc. All rights reserved

Hinweis der Redaktion

  1. To understand the steady-state behavior of your HBase cluster:Establish a baselineTo debug when something bad happens:Compare with baselineTo help evaluate changes:ConfigurationWorkloadHardwareHelps point out bottlenecks
  2. Most metrics reside in the region serverMetrics for various categories (e.g. Stores, compactions, flushes)Metrics per operation type (e.g. get, put, delete)Remember HDFS,JVM, and host metrics as well
  3. Use FileContext in hadoop-metrics.properties to have metrics dumped to a file periodicallyweb UIs’ /jmx web pages dump metrics in JSONjconsole for interactive browsingGanglia: has built-in support in Hadoop/HBase, use GangliaContext or GangliaContext31
  4. avg_time metrics name is confusing when you are measuring something that isn’t time. Multiple num_ops metrics that mean the same thing are also confusing. Both are legacy of how code is writtenor use time-based keys, salted keys, reduce data (less columns/smaller column names, compression)
  5. again can change data schema (salted keys, row key composite of time and sha1/md5)
  6. Unsuccessful splits are rolled back. If the rollback fails, the RS aborts.
  7. Stores represent a CF, and can contain one or more StoreFiles. Reads have to go through all of the StoreFiles to get an overall view of the dataset for reads.Bloom filters and time range predicates may preclude storefiles from being culled.
  8. Remember that there is one memstore per store/CF
  9. http://hbase.apache.org/book/regionserver.arch.html contains a thorough explanation of how the block cache is used and what you can do to utilize it better.Turn off block caching for tables that have full table scans
  10. http://hbase.apache.org/book/trouble.log.html#trouble.log.gc explains how to configure GC logging and what they meanhttp://hbase.apache.org/book/jvm.html for what to do about GCs
  11. Network problems can come into play here if HDFS needs to fetch data for the read from somewhere else.
  12. HFile latencies are handled by the compaction and flush metricsHLog edits affect client-facing system latency more visiblyHDFS will have a pipeline per write, so any hiccups along that pipeline in either network or disk will affect latency.
  13. Hotspotting can be solved by methods such as hashing your rowkeys and pre-splitting regionshdfsBlocksLocalityIndex (HBASE-4114) may also be useful as higher indices normally mean lower latencies
  14. Easier to screen out metrics if you know what each of them mean
  15. Keep a library of baselines, changing one thing at a time