SlideShare ist ein Scribd-Unternehmen logo
1 von 45
Chapter 10: Operations
HBase In Action
Overview: Operations
Monitoring your cluster
Performance of your HBase cluster
Cluster Management
Backup and Replication
Summary
09/24/15
10.1 Monitoring Your Cluster
 A critical aspect of any production system is the ability of its
operators to monitor its state and behavior.
 In this section, we’ll talk about how HBase exposes metrics and the
frameworks that are available to you to capture these metrics and
use them to make sense of how your cluster is performing.
 How HBase exposes metrics
 Collecting and graphing the metrics
 The metrics HBase exposes
 Application-side monitoring
09/24/15
10.1.1 How HBase exposes metrics
 The metrics framework is another of the many ways that HBase depends on
Hadoop.
 HBase is tightly integrated with Hadoop and uses Hadoop’s underlying
metrics framework to expose its metrics.
 The metrics framework works by outputting metrics based on a context
implementation that implements the MetricsContext interface.
 Ganglia context and File context.
 HBase also exposes metrics using Java Management Extensions
Hbase Course
Data Manipulation at Scale: Systems and
Algorithms
Using HBase for Real-time Access to Your Big
Data
09/24/15
10.1.2 Collecting and graphing the metrics
 Metrics solutions involve two aspects: collection and graphing.
 Collection frameworks collect the metrics being generated by the system that is being
monitored and store them efficiently so they can be used later.
 Graphing tools use the data captured and stored by collection frameworks and make it
easily consumable for the end user in the form of graphs and pretty pictures.
 Numerous collection and graphing tools are available. But not all of them
are tightly integrated with how Hadoop and HBase expose metrics.
 GANGLIA
 JMX
09/24/15
10.1.2 Collecting and graphing the metrics
 GANGLIA
 Ganglia
(http://ganglia.sourceforge.net/)
5 is a distributed monitoring
framework designed to monitor
clusters.
 It was developed at UC Berkeley
and open-sourced.
 Configure HBase to output
metrics to Ganglia
 Set the parameters in the hadoop-
metrics.properties file, which
resides in the
$HBASE_HOME/conf/ directory.
09/24/15
10.1.2 Collecting and graphing the metrics
 JMX
 Several open source tools such as Cacti and OpenTSDB can be used to collect metrics
via JMX. JMX metrics can also be viewed as JSON from the Master and RegionServer
web UI:
 JMX metrics from the Master: http://master_ip_address:port/jmx
 JMX metrics from a particular RegionServer: http://region_server_ip
_address:port/jmx
 The default port for the Master is 60010 and for the RegionServer is 60030.
 FILE BASED
 HBase can also be configured to output metrics into a flat file.
 File-based metrics aren’t a useful way of recording metrics because they’re hard to
consume thereafter.
09/24/15
10.1.3 The metrics HBase exposes
The Master and RegionServers expose metrics. The metrics of
interest depend on the workload the cluster is sustaining, and
we’ll categorize them accordingly.
 GENERAL METRICS
 HDFS throughput and latency
 HDFS usage
 Underlying disk throughput
 Network throughput and latency from each node
 WRITE-RELATED METRICS
 To understand the system state during writes, the metrics of interest are
the ones that are collected as data is written into the system.
 READ-RELATED METRICS
 Reads are different than writes, and so are the metrics you should monitor to
understand them.
09/24/15
10.1.3 The metrics HBase exposes(con't)
09/24/15
10.1.3 The metrics HBase exposes(con't)
09/24/15
10.1.3 The metrics HBase exposes(con't)
Hbase Course
Data Manipulation at Scale: Systems and
Algorithms
Using HBase for Real-time Access to Your Big
Data
09/24/15
10.1.4 Application-side monitoring
 In a production environment, we recommend that you add to the
system-level monitoring that Ganglia and other tools provide and
also monitor how HBase looks from your application’s perspective.
 Put performance as seen by the client (the application) for every
RegionServer
 Get performance as seen by the client for every RegionServer
 Scan performance as seen by the client for every RegionServer
 Connectivity to all RegionServers
 Network latencies between the application tier and the HBase cluster
 Number of concurrent clients opening to HBase at any point in time
 Connectivity to ZooKeeper
09/24/15
10.2 Performance of your HBase cluster
 Performance of any database is measured in terms of the response
times of the operations that it supports.
 This is important to measure in the context of your application so you can set
the right expectations for users.
 To make sure your HBase cluster is performing within the expected
SLAs, you must test performance thoroughly and tune the cluster to
extract the maximum performance you can get out of it.
 Performance testing
 What impacts HBase’s performance?
 Tuning dependency systems
 Tuning HBase
09/24/15
10.2.1 Performance testing
 There are different ways you can
test the performance of your
HBase cluster.
 PERFORMANCEEVALUATION
TOOL—BUNDLED WITH HBASE
 HBase ships with a tool called
PerformanceEvaluation, which you can
use to evaluate the performance of your
HBase cluster in terms of various
operations.
Examples:
To run a single evaluation client:
$ bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1
$ hbase org.apache.hadoop.hbase.PerformanceEvaluation --rows=10
sequentialWrite 1
09/24/15
10.2.1 Performance testing (con't)
 YCSB—YAHOO! CLOUD SERVING BENCHMARK7
 YCSB is the closest we have come to having a standard benchmarking tool that can be used
to measure and compare the performance of different distributed databases.
 YCSB is available from the project’s GitHub repository
(http://github.com/brianfrankcooper/YCSB/).
 Before running the workload, you need to create the HBase table YCSB will
write to. You can do that from the shell:
 hbase(main):002:0> create 'mytable', 'myfamily'
 $ bin/ycsb load hbase -P workloads/workloada -p columnfamily=myfamily  -p
table=mytable
 You can do all sorts of fancy stuff with YCSB workloads, including configuring
multiple clients, configuring multiple threads, and running mixed workloads
with different statistical distributions of the data.
09/24/15
10.2.2 What impacts HBase’s performance?
 HBase is a distributed database and is
tightly coupled with Hadoop. That
makes it susceptible to the entire stack
under it (figure 10.8) when it comes to
performance.
09/24/15
10.2.3 Tuning dependency systems
 Tuning an HBase cluster to extract maximum performance involves tuning
all dependencies.
 HARDWARE CHOICES
 NETWORK CONFIGURATION
 OPERATING SYSTEM
 LOCAL FILE SYSTEM
 HDFS
09/24/15
10.2.4 Tuning HBase
 Tuning an HBase cluster typically involves tuning multiple different
configuration parameters to suit the workload that you plan to put on the
cluster.
 Random-read-heavy
 Sequential-read-heavy
 Write-heavy
 Mixed
 Each of these workloads demands a different kind of configuration tuning
09/24/15
10.2.4 Tuning HBase
 RANDOM-READ-HEAVY : For
random-read-heavy workloads,
effective use of the cache and
better indexing will get you higher
performance.
09/24/15
10.2.4 Tuning HBase
 For sequential-read-heavy
workloads, the read cache doesn’t
buy you a lot; chances are you’ll be
hitting the disk more often than
not unless the sequential reads are
small in size and are limited to a
particular key range.
09/24/15
10.2.4 Tuning HBase
 WRITE-HEAVY : Write-heavy
workloads need different tuning
than read-heavy ones. The cache
doesn’t play an important role
anymore. Writes always go into the
MemStore and are flushed to form
new HFiles, which later are
compacted.
 The way to get good write
performance is by not flushing,
compacting, or splitting too often
because the I/O load goes up
during that time, slowing the
system.
09/24/15
10.2.4 Tuning HBase
09/24/15
10.2.4 Tuning HBase
 MIXED : With completely mixed workloads, tuning becomes slightly
trickier. You have to tweak a mix of the parameters described earlier to
achieve the optimal combination. Iterate over various combinations, and
run performance tests to see where you get the best results.
 Compression
 Rowkey design
 Major compactions
 RegionServer handler count
09/24/15
10.3 Cluster management
 During the course of running a production system, management
tasks need to be performed at different stages.
 Things like starting or stopping the cluster, upgrading the OS on the nodes,
replacing bad hardware, and backing up data are important tasks and need to be
done right to keep the cluster running smoothly.
 This section highlights some of the important tasks you may need to
perform and teaches how to do them.
09/24/15
10.3.1 Starting and stopping HBase
 The order in which the HBase daemons are stopped and started
matters only to the extent that the dependency systems (HDFS and
ZooKeeper) need to be up before HBase is started and should be
shut down only after HBase has shut down.
 SCRIPTS : in the $HBASE_HOME/bin directory
 CENTRALIZED MANAGEMENT : Cluster-management frameworks like Puppet
and Chef can be used to manage the starting and stopping of daemons from a
central location.
09/24/15
10.3.2 Graceful stop and decommissioning nodes
 When you need to shut down daemons on individual servers for any
management purpose (upgrading, replacing hardware, and so on), you need
to ensure that the rest of the cluster keeps working fine and there is
minimal outage as seen by client applications.
 The script follows these steps (in order) to gracefully stop a RegionServer:
 Disable the region balancer.
 Move the regions off the RegionServer, and randomly assign them to other servers in the
cluster
 Stop the REST and Thrift services if they’re active.
 Stop the RegionServer process.
$ bin/graceful_stop.sh
Usage: graceful_stop.sh [--config <conf-dir>] [--restart] [--reload]
[--thrift] [--rest] <hostname>
thrift If we should stop/start thrift before/after the
09/24/15
10.3.3 Adding nodes
 As your application gets more successful or more use cases crop up, chances
are you’ll need to scale up your HBase cluster.
 It could also be that you’re replacing a node for some reason. The process to
add a node to the HBase cluster is the same in both cases.
09/24/15
10.3.4 Rolling restarts and upgrading
 It’s not rare to patch or upgrade Hadoop and HBase releases in running
clusters.
 In production systems, upgrades can be tricky. Often, it isn’t possible to
take downtime on the cluster to do upgrades.
 But not all upgrades are between major releases and require downtime.
 To do upgrades without taking a downtime, follow these steps:
 Deploy the new HBase version to all nodes in the cluster, including the new ZooKeeper if
that needs an update as well.
 Turn off the balancer process. One by one, gracefully stop the RegionServers and bring them
back up.
 Restart the HBase Masters one by one.
 If ZooKeeper requires a restart, restart all the nodes in the quorum one by one.
 Upgrade the clients.
 You can use the same steps to do a rolling restart for any other purpose as well.
09/24/15
10.3.5 bin/hbase and the HBase shell
 The script basically runs the Java
class associated with the command
you choose to pass it:
09/24/15
10.3.5 bin/hbase and the HBase shell
09/24/15
 We’ll focus on the tools group of
commands (shown in bold). To get a
description for any command, you can
run help 'command_name' in the shell
like this
 ZK_DUMP : You can use the zk_dump
command to find out the current state
of ZooKeeper:
 STATUS COMMAND : You can use the
status command to determine the
status of the cluster.
 COMPACTIONS
 BALANCER
 SPLITTING TABLES OR REGIONS
 ALTERING TABLE SCHEMAS
 TRUNCATING TABLES
10.3.5 bin/hbase and the HBase shell
09/24/15
10.3.6 Maintaining consistency—hbck
 HBase comes with a tool called hbck (or HBaseFsck) that checks for the
consistency and integrity of the HBase cluster.
 Hbck recently underwent an overhaul, and the resulting tool was nicknamed uberhbck.
 Hbck is a tool that helps in checking for inconsistencies in HBase clusters.
Inconsistencies can occur at two levels:
 Region inconsistencies
 Table inconsistencies
 Hbck performs two primary functions: detect inconsistencies and fix
inconsistencies.
 DETECTING INCONSISTENCIES :
 $ $HBASE_HOME/bin/hbase hbck
 $ $HBASE_HOME/bin/hbase hbck -details
 FIXING INCONSISTENCIES :
 Incorrect assignments
 Missing or extra regions
09/24/15
10.3.7 Viewing HFiles and HLogs
 HBase provides utilities to
examine the HFiles and HLogs
(WAL) that are being created at
write time.
 The HLogs are located in the .logs
directory in the HBase root
directory on the file system. You
can examine them by using the
hlog command of the bin/hbase
script, like this:
09/24/15
10.3.7 Viewing HFiles and HLogs
 The script has a similar utility for
examining the HFiles. To print the
help for the command, run the
command without any arguments:
 You can see that there is a lot of
information about the HFile.
Other options can be used to get
different bits of information.
09/24/15
10.3.8 Presplitting tables
 Table splitting during heavy write
loads can result in increased latencies.
Splitting is typically followed by
regions moving around to balance the
cluster, which adds to the overhead.
 Presplitting tables is also desirable for
bulk loads, which we cover later in the
chapter. If the key distribution is well
known, you can split the table into the
desired number of regions at the time
of table creation.
09/24/15
10.4 Backup and replication
 Inter-cluster replication
 Backup using MapReduce jobs
 Backing up the root directory
09/24/15
10.4.1 Inter-cluster replication
 Inter-cluster replication can be of
three types:
 Master-slave
 Master-master
 Cyclic
09/24/15
10.4.2 Backup using MapReduce jobs
 MapReduce jobs can be configured to use HBase tables as the source and
sink, as we covered in chapter 3. This ability can come in handy to do point-
in-time backups of tables by scanning through them and outputting the
data into flat files or other HBase tables.
 This is different from inter-cluster replication, which the last section
described.
 Inter-cluster replication is a push mechanism.
 Running MapReduce jobs over tables is a pull mechanism
 EXPORT/IMPORT
 The prebundled Export MapReduce job can be used to export data from HBase tables into
flat files.
 That data can then later be imported into another HBase table on the same or a different
cluster using the Import job.
09/24/15
10.4.2 Backup using MapReduce jobs
09/24/15
10.4.2 Backup using MapReduce jobs
 ADVANCED IMPORT WITH
IMPORTTSV
 ImportTsv is more feature-rich.
 It allows you to load data from newline-
terminated, delimited text files.
09/24/15
10.4.3 Backing up the root directory
 HBase stores its data in the directory specified by the hbase.rootdir
configuration property. This directory contains all the region information,
all the HFiles for the tables, as well as the WALs for all RegionServers.
 When an HBase cluster is up and running, several things are going on:
MemStore flushes, region splits, compactions, and so on.
 But if you stop the HBase daemons cleanly, the MemStore is flushed and
the root directory isn’t altered by any process.
Hbase Course
Data Manipulation at Scale: Systems and
Algorithms
Using HBase for Real-time Access to Your Big
Data
09/24/15
10.5 Summary
Production-quality operations of any software system are
learned over time. This chapter covered several aspects of
operating HBase in production with the intention of getting
you started on the path to understanding the concepts.
New tools and scripts probably will be developed by HBase
users and will benefit you.
 The first aspect of operations is instrumenting and monitoring the system.
 From monitoring, the chapter transitioned into talking about performance
testing, measuring performance, and tuning HBase for different kinds of
workloads.
 From there we covered a list of common management tasks and how and
when to do them.
 Mastering HBase operations requires an understanding of the internals and
experience gained by working with the system.

Weitere ähnliche Inhalte

Was ist angesagt?

(Aaron myers) hdfs impala
(Aaron myers)   hdfs impala(Aaron myers)   hdfs impala
(Aaron myers) hdfs impala
NAVER D2
 

Was ist angesagt? (20)

How Impala Works
How Impala WorksHow Impala Works
How Impala Works
 
Presentation day1oracle 12c
Presentation day1oracle 12cPresentation day1oracle 12c
Presentation day1oracle 12c
 
Administer Hadoop Cluster
Administer Hadoop ClusterAdminister Hadoop Cluster
Administer Hadoop Cluster
 
Real-time Big Data Analytics Engine using Impala
Real-time Big Data Analytics Engine using ImpalaReal-time Big Data Analytics Engine using Impala
Real-time Big Data Analytics Engine using Impala
 
Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017
 
What's New in PostgreSQL 9.3
What's New in PostgreSQL 9.3What's New in PostgreSQL 9.3
What's New in PostgreSQL 9.3
 
Sql server backup internals
Sql server backup internalsSql server backup internals
Sql server backup internals
 
Adding Value to HBase with IBM InfoSphere BigInsights and BigSQL
Adding Value to HBase with IBM InfoSphere BigInsights and BigSQLAdding Value to HBase with IBM InfoSphere BigInsights and BigSQL
Adding Value to HBase with IBM InfoSphere BigInsights and BigSQL
 
Oracle database 12c new features
Oracle database 12c new featuresOracle database 12c new features
Oracle database 12c new features
 
Oracle 12c PDB insights
Oracle 12c PDB insightsOracle 12c PDB insights
Oracle 12c PDB insights
 
(Aaron myers) hdfs impala
(Aaron myers)   hdfs impala(Aaron myers)   hdfs impala
(Aaron myers) hdfs impala
 
Hadoop DB
Hadoop DBHadoop DB
Hadoop DB
 
Presentationday3oracle12c
Presentationday3oracle12cPresentationday3oracle12c
Presentationday3oracle12c
 
Oracle 12c and its pluggable databases
Oracle 12c and its pluggable databasesOracle 12c and its pluggable databases
Oracle 12c and its pluggable databases
 
Inside HDFS Append
Inside HDFS AppendInside HDFS Append
Inside HDFS Append
 
Non-Relational Postgres
Non-Relational PostgresNon-Relational Postgres
Non-Relational Postgres
 
Cross-Site BigTable using HBase
Cross-Site BigTable using HBaseCross-Site BigTable using HBase
Cross-Site BigTable using HBase
 
Introduction to Apache Accumulo
Introduction to Apache AccumuloIntroduction to Apache Accumulo
Introduction to Apache Accumulo
 
Big Data: HBase and Big SQL self-study lab
Big Data:  HBase and Big SQL self-study lab Big Data:  HBase and Big SQL self-study lab
Big Data: HBase and Big SQL self-study lab
 
Presentation day2 oracle12c
Presentation day2 oracle12cPresentation day2 oracle12c
Presentation day2 oracle12c
 

Andere mochten auch

Session 1 Tp1
Session 1 Tp1Session 1 Tp1
Session 1 Tp1
phanleson
 
Hibernate Tutorial
Hibernate TutorialHibernate Tutorial
Hibernate Tutorial
Ram132
 
Introduction to hibernate
Introduction to hibernateIntroduction to hibernate
Introduction to hibernate
hr1383
 

Andere mochten auch (17)

Learning spark ch10 - Spark Streaming
Learning spark ch10 - Spark StreamingLearning spark ch10 - Spark Streaming
Learning spark ch10 - Spark Streaming
 
Mobile Security - Wireless hacking
Mobile Security - Wireless hackingMobile Security - Wireless hacking
Mobile Security - Wireless hacking
 
Authentication in wireless - Security in Wireless Protocols
Authentication in wireless - Security in Wireless ProtocolsAuthentication in wireless - Security in Wireless Protocols
Authentication in wireless - Security in Wireless Protocols
 
Learning spark ch05 - Loading and Saving Your Data
Learning spark ch05 - Loading and Saving Your DataLearning spark ch05 - Loading and Saving Your Data
Learning spark ch05 - Loading and Saving Your Data
 
Learning spark ch01 - Introduction to Data Analysis with Spark
Learning spark ch01 - Introduction to Data Analysis with SparkLearning spark ch01 - Introduction to Data Analysis with Spark
Learning spark ch01 - Introduction to Data Analysis with Spark
 
Session 1 Tp1
Session 1 Tp1Session 1 Tp1
Session 1 Tp1
 
COM Introduction
COM IntroductionCOM Introduction
COM Introduction
 
Learning spark ch04 - Working with Key/Value Pairs
Learning spark ch04 - Working with Key/Value PairsLearning spark ch04 - Working with Key/Value Pairs
Learning spark ch04 - Working with Key/Value Pairs
 
enterprise java bean
enterprise java beanenterprise java bean
enterprise java bean
 
Hibernate Tutorial
Hibernate TutorialHibernate Tutorial
Hibernate Tutorial
 
Firewall - Network Defense in Depth Firewalls
Firewall - Network Defense in Depth FirewallsFirewall - Network Defense in Depth Firewalls
Firewall - Network Defense in Depth Firewalls
 
Hacking web applications
Hacking web applicationsHacking web applications
Hacking web applications
 
JPA and Hibernate
JPA and HibernateJPA and Hibernate
JPA and Hibernate
 
Introduction to hibernate
Introduction to hibernateIntroduction to hibernate
Introduction to hibernate
 
Intro To Hibernate
Intro To HibernateIntro To Hibernate
Intro To Hibernate
 
Hibernate performance tuning
Hibernate performance tuningHibernate performance tuning
Hibernate performance tuning
 
Hibernate tutorial for beginners
Hibernate tutorial for beginnersHibernate tutorial for beginners
Hibernate tutorial for beginners
 

Ähnlich wie HBase In Action - Chapter 10 - Operations

Optimization on Key-value Stores in Cloud Environment
Optimization on Key-value Stores in Cloud EnvironmentOptimization on Key-value Stores in Cloud Environment
Optimization on Key-value Stores in Cloud Environment
Fei Dong
 
Hadoop cluster configuration
Hadoop cluster configurationHadoop cluster configuration
Hadoop cluster configuration
prabakaranbrick
 
Benchmarking Scalability and Elasticity of DistributedDataba.docx
Benchmarking Scalability and Elasticity of DistributedDataba.docxBenchmarking Scalability and Elasticity of DistributedDataba.docx
Benchmarking Scalability and Elasticity of DistributedDataba.docx
jasoninnes20
 
Big_SQL_3.0_Whitepaper
Big_SQL_3.0_WhitepaperBig_SQL_3.0_Whitepaper
Big_SQL_3.0_Whitepaper
Scott Gray
 

Ähnlich wie HBase In Action - Chapter 10 - Operations (20)

hbase lab
hbase labhbase lab
hbase lab
 
Performance Analysis of HBASE and MONGODB
Performance Analysis of HBASE and MONGODBPerformance Analysis of HBASE and MONGODB
Performance Analysis of HBASE and MONGODB
 
The Shared Elephant - Hadoop as a Shared Service for Multiple Departments – I...
The Shared Elephant - Hadoop as a Shared Service for Multiple Departments – I...The Shared Elephant - Hadoop as a Shared Service for Multiple Departments – I...
The Shared Elephant - Hadoop as a Shared Service for Multiple Departments – I...
 
Optimization on Key-value Stores in Cloud Environment
Optimization on Key-value Stores in Cloud EnvironmentOptimization on Key-value Stores in Cloud Environment
Optimization on Key-value Stores in Cloud Environment
 
Splice Machine Overview
Splice Machine OverviewSplice Machine Overview
Splice Machine Overview
 
Hadoop cluster configuration
Hadoop cluster configurationHadoop cluster configuration
Hadoop cluster configuration
 
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
 
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
 
Resource balancing comparison: VMware vSphere 6 vs. Red Hat Enterprise Virtua...
Resource balancing comparison: VMware vSphere 6 vs. Red Hat Enterprise Virtua...Resource balancing comparison: VMware vSphere 6 vs. Red Hat Enterprise Virtua...
Resource balancing comparison: VMware vSphere 6 vs. Red Hat Enterprise Virtua...
 
Benchmarking Scalability and Elasticity of DistributedDataba.docx
Benchmarking Scalability and Elasticity of DistributedDataba.docxBenchmarking Scalability and Elasticity of DistributedDataba.docx
Benchmarking Scalability and Elasticity of DistributedDataba.docx
 
Schema-based multi-tenant architecture using Quarkus &amp; Hibernate-ORM.pdf
Schema-based multi-tenant architecture using Quarkus &amp; Hibernate-ORM.pdfSchema-based multi-tenant architecture using Quarkus &amp; Hibernate-ORM.pdf
Schema-based multi-tenant architecture using Quarkus &amp; Hibernate-ORM.pdf
 
Working with Hive Analytics
Working with Hive AnalyticsWorking with Hive Analytics
Working with Hive Analytics
 
SAP HANA SPS09 - Multitenant Database Containers
SAP HANA SPS09 - Multitenant Database ContainersSAP HANA SPS09 - Multitenant Database Containers
SAP HANA SPS09 - Multitenant Database Containers
 
Hadoop project design and a usecase
Hadoop project design and  a usecaseHadoop project design and  a usecase
Hadoop project design and a usecase
 
Big Data and Hadoop Guide
Big Data and Hadoop GuideBig Data and Hadoop Guide
Big Data and Hadoop Guide
 
Cool features 7.4
Cool features 7.4Cool features 7.4
Cool features 7.4
 
Big_SQL_3.0_Whitepaper
Big_SQL_3.0_WhitepaperBig_SQL_3.0_Whitepaper
Big_SQL_3.0_Whitepaper
 
CDS Views.pptx
CDS Views.pptxCDS Views.pptx
CDS Views.pptx
 
Hypertable Distilled by edydkim.github.com
Hypertable Distilled by edydkim.github.comHypertable Distilled by edydkim.github.com
Hypertable Distilled by edydkim.github.com
 
Hadoop Integration with Microstrategy
Hadoop Integration with Microstrategy Hadoop Integration with Microstrategy
Hadoop Integration with Microstrategy
 

Mehr von phanleson

Lecture 1 - Getting to know XML
Lecture 1 - Getting to know XMLLecture 1 - Getting to know XML
Lecture 1 - Getting to know XML
phanleson
 

Mehr von phanleson (20)

E-Commerce Security - Application attacks - Server Attacks
E-Commerce Security - Application attacks - Server AttacksE-Commerce Security - Application attacks - Server Attacks
E-Commerce Security - Application attacks - Server Attacks
 
Learning spark ch11 - Machine Learning with MLlib
Learning spark ch11 - Machine Learning with MLlibLearning spark ch11 - Machine Learning with MLlib
Learning spark ch11 - Machine Learning with MLlib
 
Learning spark ch09 - Spark SQL
Learning spark ch09 - Spark SQLLearning spark ch09 - Spark SQL
Learning spark ch09 - Spark SQL
 
Learning spark ch07 - Running on a Cluster
Learning spark ch07 - Running on a ClusterLearning spark ch07 - Running on a Cluster
Learning spark ch07 - Running on a Cluster
 
Learning spark ch06 - Advanced Spark Programming
Learning spark ch06 - Advanced Spark ProgrammingLearning spark ch06 - Advanced Spark Programming
Learning spark ch06 - Advanced Spark Programming
 
Learning spark ch01 - Introduction to Data Analysis with Spark
Learning spark ch01 - Introduction to Data Analysis with SparkLearning spark ch01 - Introduction to Data Analysis with Spark
Learning spark ch01 - Introduction to Data Analysis with Spark
 
Hướng Dẫn Đăng Ký LibertaGia - A guide and introduciton about Libertagia
Hướng Dẫn Đăng Ký LibertaGia - A guide and introduciton about LibertagiaHướng Dẫn Đăng Ký LibertaGia - A guide and introduciton about Libertagia
Hướng Dẫn Đăng Ký LibertaGia - A guide and introduciton about Libertagia
 
Lecture 1 - Getting to know XML
Lecture 1 - Getting to know XMLLecture 1 - Getting to know XML
Lecture 1 - Getting to know XML
 
Lecture 4 - Adding XTHML for the Web
Lecture  4 - Adding XTHML for the WebLecture  4 - Adding XTHML for the Web
Lecture 4 - Adding XTHML for the Web
 
Lecture 2 - Using XML for Many Purposes
Lecture 2 - Using XML for Many PurposesLecture 2 - Using XML for Many Purposes
Lecture 2 - Using XML for Many Purposes
 
SOA Course - SOA governance - Lecture 19
SOA Course - SOA governance - Lecture 19SOA Course - SOA governance - Lecture 19
SOA Course - SOA governance - Lecture 19
 
Lecture 18 - Model-Driven Service Development
Lecture 18 - Model-Driven Service DevelopmentLecture 18 - Model-Driven Service Development
Lecture 18 - Model-Driven Service Development
 
Lecture 15 - Technical Details
Lecture 15 - Technical DetailsLecture 15 - Technical Details
Lecture 15 - Technical Details
 
Lecture 10 - Message Exchange Patterns
Lecture 10 - Message Exchange PatternsLecture 10 - Message Exchange Patterns
Lecture 10 - Message Exchange Patterns
 
Lecture 9 - SOA in Context
Lecture 9 - SOA in ContextLecture 9 - SOA in Context
Lecture 9 - SOA in Context
 
Lecture 07 - Business Process Management
Lecture 07 - Business Process ManagementLecture 07 - Business Process Management
Lecture 07 - Business Process Management
 
Lecture 04 - Loose Coupling
Lecture 04 - Loose CouplingLecture 04 - Loose Coupling
Lecture 04 - Loose Coupling
 
Lecture 2 - SOA
Lecture 2 - SOALecture 2 - SOA
Lecture 2 - SOA
 
Lecture 3 - Services
Lecture 3 - ServicesLecture 3 - Services
Lecture 3 - Services
 
Lecture 01 - Motivation
Lecture 01 - MotivationLecture 01 - Motivation
Lecture 01 - Motivation
 

Kürzlich hochgeladen

Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
ciinovamais
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
SoniaTolstoy
 

Kürzlich hochgeladen (20)

Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 

HBase In Action - Chapter 10 - Operations

  • 2. Overview: Operations
    Monitoring your cluster
    Performance of your HBase cluster
    Cluster Management
    Backup and Replication
    Summary
  • 3. 09/24/15 10.1 Monitoring Your Cluster
     A critical aspect of any production system is the ability of its operators to monitor its state and behavior.
     In this section, we’ll talk about how HBase exposes metrics and the frameworks that are available to you to capture these metrics and use them to make sense of how your cluster is performing.
     How HBase exposes metrics
     Collecting and graphing the metrics
     The metrics HBase exposes
     Application-side monitoring
  • 4. 10.1.1 How HBase exposes metrics
     The metrics framework is another of the many ways in which HBase depends on Hadoop.
     HBase is tightly integrated with Hadoop and uses Hadoop’s underlying metrics framework to expose its metrics.
     The framework outputs metrics through a context implementation that implements the MetricsContext interface.
     Out of the box, there are a Ganglia context and a File context.
     HBase also exposes metrics using Java Management Extensions (JMX).
  • 5. HBase courses
    Data Manipulation at Scale: Systems and Algorithms
    Using HBase for Real-time Access to Your Big Data
  • 6. 10.1.2 Collecting and graphing the metrics
     Metrics solutions involve two aspects: collection and graphing.
     Collection frameworks collect the metrics generated by the system being monitored and store them efficiently so they can be used later.
     Graphing tools take the data captured and stored by collection frameworks and make it easily consumable for the end user in the form of graphs and pretty pictures.
     Numerous collection and graphing tools are available, but not all of them are tightly integrated with how Hadoop and HBase expose metrics.
     GANGLIA
     JMX
  • 7. 10.1.2 Collecting and graphing the metrics
     GANGLIA: Ganglia (http://ganglia.sourceforge.net/) is a distributed monitoring framework designed to monitor clusters. It was developed at UC Berkeley and open-sourced.
     To configure HBase to output metrics to Ganglia, set the parameters in the hadoop-metrics.properties file, which resides in the $HBASE_HOME/conf/ directory.
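As a sketch of that configuration, a hypothetical Ganglia context setup might look like the following. The host, port, and period are assumptions, and GangliaContext31 targets Ganglia 3.1+; older Ganglia versions use GangliaContext instead.

```properties
# $HBASE_HOME/conf/hadoop-metrics.properties (sketch, not verbatim from the book)
# Emit HBase and JVM metrics to a Ganglia gmond every 10 seconds.
hbase.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
hbase.period=10
hbase.servers=ganglia-host.example.com:8649

jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
jvm.period=10
jvm.servers=ganglia-host.example.com:8649
```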
  • 8. 10.1.2 Collecting and graphing the metrics
     JMX: Several open source tools, such as Cacti and OpenTSDB, can be used to collect metrics via JMX. JMX metrics can also be viewed as JSON from the Master and RegionServer web UIs:
    JMX metrics from the Master: http://master_ip_address:port/jmx
    JMX metrics from a particular RegionServer: http://region_server_ip_address:port/jmx
    The default port is 60010 for the Master and 60030 for the RegionServers.
     FILE BASED: HBase can also be configured to output metrics into a flat file. File-based metrics aren’t a useful way of recording metrics because they’re hard to consume thereafter.
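To pull those JSON metrics programmatically, a quick check with curl might look like this (hostnames are placeholders; the qry filter is a feature of Hadoop’s JMX JSON servlet):

```shell
# Fetch all JMX metrics as JSON from the Master and from one RegionServer.
curl http://master_ip_address:60010/jmx
curl http://region_server_ip_address:60030/jmx

# Narrow the output to a subset of MBeans with the qry parameter.
curl 'http://master_ip_address:60010/jmx?qry=hadoop:*'
```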
  • 9. 10.1.3 The metrics HBase exposes
    The Master and RegionServers expose metrics. The metrics of interest depend on the workload the cluster is sustaining, and we’ll categorize them accordingly.
     GENERAL METRICS: HDFS throughput and latency; HDFS usage; underlying disk throughput; network throughput and latency from each node.
     WRITE-RELATED METRICS: To understand the system state during writes, the metrics of interest are the ones collected as data is written into the system.
     READ-RELATED METRICS: Reads are different from writes, and so are the metrics you should monitor to understand them.
  • 10. 10.1.3 The metrics HBase exposes (cont’d)
  • 11. 10.1.3 The metrics HBase exposes (cont’d)
  • 12. 10.1.3 The metrics HBase exposes (cont’d)
  • 14. 10.1.4 Application-side monitoring
     In a production environment, we recommend that you add to the system-level monitoring that Ganglia and other tools provide and also monitor how HBase looks from your application’s perspective:
    Put performance as seen by the client (the application) for every RegionServer
    Get performance as seen by the client for every RegionServer
    Scan performance as seen by the client for every RegionServer
    Connectivity to all RegionServers
    Network latencies between the application tier and the HBase cluster
    Number of concurrent client connections open to HBase at any point in time
    Connectivity to ZooKeeper
  • 15. 10.2 Performance of your HBase cluster
     Performance of any database is measured in terms of the response times of the operations it supports.
     This is important to measure in the context of your application so you can set the right expectations for users.
     To make sure your HBase cluster is performing within the expected SLAs, you must test performance thoroughly and tune the cluster to extract the maximum performance you can get out of it.
     Performance testing
     What impacts HBase’s performance?
     Tuning dependency systems
     Tuning HBase
  • 16. 10.2.1 Performance testing
     There are different ways you can test the performance of your HBase cluster.
     PERFORMANCEEVALUATION TOOL—BUNDLED WITH HBASE: HBase ships with a tool called PerformanceEvaluation, which you can use to evaluate the performance of your HBase cluster in terms of various operations. For example, to run a single sequential-write evaluation client:
    $ bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1
    $ hbase org.apache.hadoop.hbase.PerformanceEvaluation --rows=10 sequentialWrite 1
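Other evaluation types follow the same pattern. A hedged sketch (run the class with no arguments to see the exact commands your HBase version supports):

```shell
# Random reads with a single client.
bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation randomRead 1

# With more than one client, the tool runs the test as a MapReduce job,
# so the load is generated from multiple mappers.
bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 4
```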
  • 17. 10.2.1 Performance testing (cont’d)
     YCSB—YAHOO! CLOUD SERVING BENCHMARK: YCSB is the closest we have come to a standard benchmarking tool that can be used to measure and compare the performance of different distributed databases.
     YCSB is available from the project’s GitHub repository (http://github.com/brianfrankcooper/YCSB/).
     Before running a workload, you need to create the HBase table YCSB will write to. You can do that from the shell, then run the load phase:
    hbase(main):002:0> create 'mytable', 'myfamily'
    $ bin/ycsb load hbase -P workloads/workloada -p columnfamily=myfamily -p table=mytable
     You can do all sorts of fancy stuff with YCSB workloads, including configuring multiple clients, configuring multiple threads, and running mixed workloads with different statistical distributions of the data.
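After the load phase, the run phase measures the workload itself. A hedged example with multiple client threads (flag names per the YCSB of that era):

```shell
# Run workload A (a mixed read/update workload) against the table created
# above, using 10 client threads and printing status while running.
bin/ycsb run hbase -P workloads/workloada \
  -p columnfamily=myfamily -p table=mytable \
  -threads 10 -s
```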
  • 18. 10.2.2 What impacts HBase’s performance?
     HBase is a distributed database and is tightly coupled with Hadoop. That makes its performance susceptible to the entire stack under it (figure 10.8).
  • 19. 10.2.3 Tuning dependency systems
     Tuning an HBase cluster to extract maximum performance involves tuning all dependencies:
    HARDWARE CHOICES
    NETWORK CONFIGURATION
    OPERATING SYSTEM
    LOCAL FILE SYSTEM
    HDFS
  • 20. 10.2.4 Tuning HBase
     Tuning an HBase cluster typically involves tuning multiple configuration parameters to suit the workload you plan to put on the cluster:
    Random-read-heavy
    Sequential-read-heavy
    Write-heavy
    Mixed
     Each of these workloads demands a different kind of configuration tuning.
  • 21. 10.2.4 Tuning HBase
     RANDOM-READ-HEAVY: For random-read-heavy workloads, effective use of the cache and better indexing will get you higher performance.
  • 22. 10.2.4 Tuning HBase
     SEQUENTIAL-READ-HEAVY: For sequential-read-heavy workloads, the read cache doesn’t buy you a lot; chances are you’ll be hitting the disk more often than not unless the sequential reads are small in size and limited to a particular key range.
  • 23. 10.2.4 Tuning HBase
     WRITE-HEAVY: Write-heavy workloads need different tuning than read-heavy ones. The cache no longer plays an important role. Writes always go into the MemStore and are flushed to form new HFiles, which are later compacted.
     The way to get good write performance is by not flushing, compacting, or splitting too often, because the I/O load goes up during that time, slowing the system.
  • 25. 10.2.4 Tuning HBase
     MIXED: With completely mixed workloads, tuning becomes slightly trickier. You have to tweak a mix of the parameters described earlier to achieve the optimal combination. Iterate over various combinations, and run performance tests to see where you get the best results. Relevant knobs include:
    Compression
    Rowkey design
    Major compactions
    RegionServer handler count
  • 26. 10.3 Cluster management
     During the course of running a production system, management tasks need to be performed at different stages.
     Things like starting or stopping the cluster, upgrading the OS on the nodes, replacing bad hardware, and backing up data are important tasks and need to be done right to keep the cluster running smoothly.
     This section highlights some of the important tasks you may need to perform and teaches you how to do them.
  • 27. 10.3.1 Starting and stopping HBase
     The order in which the HBase daemons are stopped and started matters only to the extent that the dependency systems (HDFS and ZooKeeper) need to be up before HBase is started and should be shut down only after HBase has shut down.
     SCRIPTS: in the $HBASE_HOME/bin directory.
     CENTRALIZED MANAGEMENT: Cluster-management frameworks like Puppet and Chef can be used to manage the starting and stopping of daemons from a central location.
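For reference, the bundled scripts can be driven as in the following sketch (these are the standard script names shipped in $HBASE_HOME/bin; paths may differ in packaged installs):

```shell
# Start and stop the full cluster from the Master node; the scripts SSH
# to the hosts listed in conf/regionservers.
bin/start-hbase.sh
bin/stop-hbase.sh

# Manage a single daemon on the local machine.
bin/hbase-daemon.sh start regionserver
bin/hbase-daemon.sh stop regionserver
```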
  • 28. 10.3.2 Graceful stop and decommissioning nodes
     When you need to shut down daemons on individual servers for any management purpose (upgrading, replacing hardware, and so on), you need to ensure that the rest of the cluster keeps working fine and there is minimal outage as seen by client applications.
     The graceful_stop.sh script follows these steps (in order) to gracefully stop a RegionServer:
    1. Disable the region balancer.
    2. Move the regions off the RegionServer, and randomly assign them to other servers in the cluster.
    3. Stop the REST and Thrift services if they’re active.
    4. Stop the RegionServer process.
    $ bin/graceful_stop.sh
    Usage: graceful_stop.sh [--config <conf-dir>] [--restart] [--reload] [--thrift] [--rest] <hostname>
     thrift   If we should stop/start thrift before/after the
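Putting the usage above together, a typical invocation might look like this (the hostname is hypothetical):

```shell
# Gracefully stop the RegionServer on rs1.example.com, restart it, and
# reload the regions it was serving back onto it.
bin/graceful_stop.sh --restart --reload rs1.example.com
```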
  • 29. 10.3.3 Adding nodes
     As your application gets more successful or more use cases crop up, chances are you’ll need to scale up your HBase cluster.
     It could also be that you’re replacing a node for some reason. The process to add a node to the HBase cluster is the same in both cases.
  • 30. 10.3.4 Rolling restarts and upgrading
     It’s not rare to patch or upgrade Hadoop and HBase releases in running clusters.
     In production systems, upgrades can be tricky. Often, it isn’t possible to take downtime on the cluster to do upgrades. But not all upgrades are between major releases and require downtime.
     To do an upgrade without taking downtime, follow these steps:
    1. Deploy the new HBase version to all nodes in the cluster, including the new ZooKeeper if that needs an update as well.
    2. Turn off the balancer process.
    3. One by one, gracefully stop the RegionServers and bring them back up.
    4. Restart the HBase Masters one by one.
    5. If ZooKeeper requires a restart, restart all the nodes in the quorum one by one.
    6. Upgrade the clients.
     You can use the same steps to do a rolling restart for any other purpose as well.
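The RegionServer portion of those steps can be sketched as a loop (hedged; assumes conf/regionservers lists the hosts and graceful_stop.sh is run from $HBASE_HOME):

```shell
# Disable the balancer so regions don't move mid-restart.
echo "balance_switch false" | bin/hbase shell

# Gracefully restart each RegionServer in turn.
for rs in $(cat conf/regionservers); do
  bin/graceful_stop.sh --restart --reload "$rs"
done

# Re-enable the balancer once all RegionServers are back.
echo "balance_switch true" | bin/hbase shell
```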
  • 31. 10.3.5 bin/hbase and the HBase shell
     The bin/hbase script basically runs the Java class associated with the command you pass it.
  • 32. 10.3.5 bin/hbase and the HBase shell
  • 33. 10.3.5 bin/hbase and the HBase shell
     We’ll focus on the tools group of commands (shown in bold). To get a description for any command, run help 'command_name' in the shell.
     ZK_DUMP: you can use the zk_dump command to find out the current state of ZooKeeper.
     STATUS COMMAND: you can use the status command to determine the status of the cluster.
     COMPACTIONS
     BALANCER
     SPLITTING TABLES OR REGIONS
     ALTERING TABLE SCHEMAS
     TRUNCATING TABLES
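For illustration, here is roughly how those commands look in the shell (the table name is hypothetical, and on older HBase versions alter may require the table to be disabled first):

```
hbase(main):001:0> status 'detailed'
hbase(main):002:0> zk_dump
hbase(main):003:0> major_compact 'mytable'
hbase(main):004:0> balance_switch true
hbase(main):005:0> split 'mytable'
hbase(main):006:0> alter 'mytable', {NAME => 'myfamily', VERSIONS => 1}
hbase(main):007:0> truncate 'mytable'
```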
  • 34. 10.3.6 Maintaining consistency—hbck
     HBase comes with a tool called hbck (or HBaseFsck) that checks for the consistency and integrity of the HBase cluster.
     hbck recently underwent an overhaul, and the resulting tool was nicknamed uberhbck.
     hbck helps in checking for inconsistencies in HBase clusters. Inconsistencies can occur at two levels:
    Region inconsistencies
    Table inconsistencies
     hbck performs two primary functions: detecting inconsistencies and fixing them.
     DETECTING INCONSISTENCIES:
    $ $HBASE_HOME/bin/hbase hbck
    $ $HBASE_HOME/bin/hbase hbck -details
     FIXING INCONSISTENCIES:
    Incorrect assignments
    Missing or extra regions
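The fix side is driven by flags. A hedged sketch (flag names per the uberhbck of that era; run hbase hbck -h to confirm what your version supports):

```shell
# Repair incorrect region assignments.
$HBASE_HOME/bin/hbase hbck -fixAssignments

# Repair missing or extra regions recorded in the meta table.
$HBASE_HOME/bin/hbase hbck -fixMeta
```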
  • 35. 10.3.7 Viewing HFiles and HLogs
     HBase provides utilities to examine the HFiles and HLogs (WAL) that are created at write time.
     The HLogs are located in the .logs directory under the HBase root directory on the file system. You can examine them using the hlog command of the bin/hbase script.
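An example invocation, with a placeholder path (pick an actual file from the .logs directory in your HBase root):

```shell
# Dump the contents of one WAL file in human-readable form.
bin/hbase hlog /hbase/.logs/<regionserver-dir>/<log-file>
```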
  • 36. 10.3.7 Viewing HFiles and HLogs (cont’d)
     The script has a similar utility for examining HFiles. To print the help for the command, run it without any arguments.
     The output contains a lot of information about the HFile; other options can be used to get different bits of information.
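A hedged example (the HFile path is a placeholder; per the tool’s help, -m prints metadata, -p prints the KeyValues, and -f names the file):

```shell
# Print the help.
bin/hbase hfile

# Examine one HFile: its metadata plus its KeyValues.
bin/hbase hfile -m -p -f /hbase/mytable/<region>/<family>/<hfile>
```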
  • 37. 10.3.8 Presplitting tables
     Table splitting during heavy write loads can result in increased latencies. Splitting is typically followed by regions moving around to balance the cluster, which adds to the overhead.
     Presplitting tables is also desirable for bulk loads, which we cover later in the chapter. If the key distribution is well known, you can split the table into the desired number of regions at the time of table creation.
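From the shell, presplitting at creation time might look like this (table, family, and split points are hypothetical):

```
hbase(main):001:0> create 'mytable', 'myfamily', SPLITS => ['0250', '0500', '0750']
```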
  • 38. 10.4 Backup and replication
    Inter-cluster replication
    Backup using MapReduce jobs
    Backing up the root directory
  • 39. 10.4.1 Inter-cluster replication
     Inter-cluster replication can be of three types:
    Master-slave
    Master-master
    Cyclic
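As a hedged sketch, master-slave replication in HBase of that era is wired up roughly like this. The peer id and the slave cluster’s ZooKeeper quorum are hypothetical, hbase.replication must be set to true on both clusters, and the column family must be created with REPLICATION_SCOPE => 1:

```
hbase(main):001:0> create 'mytable', {NAME => 'myfamily', REPLICATION_SCOPE => 1}
hbase(main):002:0> add_peer '1', 'slave-zk1,slave-zk2,slave-zk3:2181:/hbase'
```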
  • 40. 10.4.2 Backup using MapReduce jobs
     MapReduce jobs can be configured to use HBase tables as the source and sink, as we covered in chapter 3. This ability can come in handy for doing point-in-time backups of tables by scanning through them and outputting the data into flat files or other HBase tables.
     This is different from inter-cluster replication, which the last section described:
    Inter-cluster replication is a push mechanism.
    Running MapReduce jobs over tables is a pull mechanism.
     EXPORT/IMPORT: The prebundled Export MapReduce job can be used to export data from HBase tables into flat files. That data can later be imported into another HBase table on the same or a different cluster using the Import job.
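A hedged example of the two jobs (the table names and the HDFS path are hypothetical):

```shell
# Export 'mytable' to files under /backup/mytable on HDFS.
bin/hbase org.apache.hadoop.hbase.mapreduce.Export mytable /backup/mytable

# Import the exported data into an existing table 'mytable_copy'.
bin/hbase org.apache.hadoop.hbase.mapreduce.Import mytable_copy /backup/mytable
```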
  • 42. 10.4.2 Backup using MapReduce jobs
     ADVANCED IMPORT WITH IMPORTTSV: ImportTsv is more feature-rich than Import. It allows you to load data from newline-terminated, delimited text files.
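A hedged example mapping a two-column TSV file into a table (names and paths are hypothetical; the special HBASE_ROW_KEY token marks which input column is the rowkey):

```shell
bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.columns=HBASE_ROW_KEY,myfamily:mycol \
  mytable /input/data.tsv
```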
  • 43. 10.4.3 Backing up the root directory
     HBase stores its data in the directory specified by the hbase.rootdir configuration property. This directory contains all the region information, all the HFiles for the tables, and the WALs for all RegionServers.
     When an HBase cluster is up and running, several things are going on: MemStore flushes, region splits, compactions, and so on.
     But if you stop the HBase daemons cleanly, the MemStore is flushed and the root directory isn’t altered by any process.
  • 45. 10.5 Summary
    Production-quality operations of any software system are learned over time. This chapter covered several aspects of operating HBase in production with the intention of getting you started on the path to understanding the concepts. New tools and scripts probably will be developed by HBase users and will benefit you.
     The first aspect of operations is instrumenting and monitoring the system.
     From monitoring, the chapter transitioned into performance testing, measuring performance, and tuning HBase for different kinds of workloads.
     From there we covered a list of common management tasks and how and when to do them.
     Mastering HBase operations requires an understanding of the internals and experience gained by working with the system.

Editor's notes

  1. When issues happen, the last thing an operator wants to do is to sift through GBs and TBs of logs to make sense of the state of the system and the root cause of the issue. Not many people are champions at reading thousands of log lines across multiple servers to make sense of what’s going on. That’s where recording detailed metrics comes into play. Many things are happening in a production-quality database like HBase, and each of them can be measured in different ways. These measurements are exposed by the system and can be captured by external frameworks that are designed to record them and make them available to operators in a consumable fashion. We recommend that you set up your full metrics collection, graphing, and monitoring stack even in the prototyping stage of your HBase adoption. This will enable you to become familiar with the various aspects of operating HBase and will make the transition to production much smoother.
  3. One interesting metric to keep an eye on is the CPU I/O wait percentage. This indicates the amount of time the CPU spends waiting for disk I/O and is a good indicator of whether your system is I/O bound.
  5. The limitation of this testing utility is that you can’t run mixed workloads without coding it up yourself. The test has to be one of the bundled ones, and they have to be run individually as separate runs. If your workload consists of Scans and Gets and Puts happening at the same time, this tool doesn’t give you the ability to truly test your cluster by mixing it all up. That brings us to our next testing utility.
  6. Once YCSB is compiled, put your HBase cluster’s configuration in hbase/src/main/conf/hbase-site.xml. You only need to put the hbase.zookeeper.quorum property in the config file so YCSB can use it as the entry point for the cluster. Now you’re ready to run workloads to test your cluster. YCSB comes with a few sample workloads that you can find in the workloads directory.
  7. Performance is affected by everything from the underlying hardware that makes up the boxes in the cluster to the network connecting them to the OS (specifically the file system) to the JVM to HDFS. The state of the HBase system matters too. For instance, performance is different during a compaction or during MemStore flushes compared to when nothing is going on in the cluster. Your application’s performance depends on how it interacts with HBase, and your schema design plays an integral role as much as anything else. When looking at HBase performance, all of these factors matter; and when you tune your cluster, you need to look into all of them. Going into tuning each of those layers is beyond the scope of this text. We covered JVM tuning (garbage collection specifically) in chapter 9. We’ll discuss some key aspects of tuning your HBase cluster next.
  8. Although Import is a simple complement to Export, ImportTsv is more feature-rich. It allows you to load data from newline-terminated, delimited text files. Most commonly, this is a tab-separated format, but the delimiter is configurable (for loading comma-separated files). You specify a destination table and provide it with a mapping from columns in your data file(s) to columns in HBase: