HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop Tutorial | Simplilearn

What’s in it for you?
What is HBase?
1

What is HBase?
1
HBase Use Case
2

What is HBase?
Applications of HBase
1
HBase Use Case
2
3

What is HBase?
HBase vs RDBMS
1
HBase Use Case
2
3
4

What is HBase?
HBase Storage HBase vs RDBMS
1
HBase Use Case
2
3
45

What is HBase?
HBase Storage
HBase Architectural
Components
HBase vs RDBMS
1
HBase Use Case
2
3
45
6

What is HBase?
HBase Storage
HBase Architectural
Components
Demo on HBase
HBase vs RDBMS
1
HBase Use Case
2
3
45
6
7

Introduction to HBaseIntroduction to HBase

Introduction to HBase
This data could be easily
stored in a Relational
Database (RDMS)
Structured data
Back in the days, data
used to be less and was
mostly structured

Then, Internet evolved and
huge volumes of
structured and semi-
structured data got
generated
Storing and processing this
data on RDBMS became a
major problem
Semi-structured
data

Apache HBASE was
the solution for this
Semi-structured
data
SolutionThen, Internet evolved and
huge volumes of
structured and semi-
structured data got
generated

Introduction to HBaseHBase History

HBase History
1
Google released the
paper on BigTable
Nov 2006

HBase History
1
2
Google released the
paper on BigTable HBase prototype was
created as a Hadoop
contribution
Nov 2006
Feb 2007

HBase History
1
2
3
Google released the
created as a Hadoop
contribution
First usable HBase
along with Hadoop
0.15.0 was released
Nov 2006
Feb 2007
Oct 2007

HBase History
1
2
3
4
Google released the
created as a Hadoop
contribution
First usable HBase
along with Hadoop
0.15.0 was released HBase became the
subproject of Hadoop
Nov 2006
Feb 2007
Oct 2007
Jan 2008

HBase History
1
2
3
4
5
Google released the
created as a Hadoop
contribution
First usable HBase
along with Hadoop
HBase 0.81.1, 0.19.0
and 0.20.0 was
released between Oct
2008 – Sep 2009
Nov 2006
Feb 2007
Oct 2007
Jan 2008
Oct 2008 – Sep 2009

HBase History
1
2
3
4
5
6
Google released the
created as a Hadoop
contribution
First usable HBase
along with Hadoop
HBase 0.81.1, 0.19.0
and 0.20.0 was
released between Oct
2008 – Sep 2009 HBase became Apache
top-level project
Nov 2006
Feb 2007
Oct 2007
Jan 2008
Oct 2008 – Sep 2009
May 2010

Introduction to HBaseWhat is HBase?

What is HBase?
HBase is a column oriented database management system derived from Google’s NoSQL database
BigTable that runs on top of HDFS

What is HBase?
Open source project that is horizontally scalable1

What is HBase?
Open source project that is horizontally scalable1
2
NoSQL database written in JAVA which performs
faster querying

What is HBase?
Open source project that is horizontally scalable
NoSQL database written in JAVA which performs
faster querying
Well suited for sparse data sets
(can contain missing or NA values)
1
2
3

Introduction to HBaseCompanies using HBase

Introduction to HBaseHBase Use Case

HBase Use Case
Telecommunication company
that provides mobile voice and
multimedia services across
China

HBase Use Case
China
Generated billions of Call
Detail Records (CDR)

HBase Use Case
China
Traditional database systems were
unable to scale up to the vast volumes
of data and provide a cost-effective
solution

HBase Use Case
China
Storing and real-time analysis of billions
of call records was a major problem

HBase Use Case
China
HBase stores billions of rows of detailed
call records
Solution

HBase Use Case
China
HBase performs fast processing of
records using SQL queries
Generates billions of Call Detail
Records (CDR)

Introduction to HBaseApplications of HBase

Medical
HBase is used for storing genome
sequences
Storing disease history of people
or an area

Medical E-Commerce
sequences
or an area
HBase is used for storing logs about
customer search history
Performs analytics and target
advertisement for better business insights

Medical E-Commerce Sports
sequences
or an area
HBase is used for storing logs about
customer search history
Performs analytics and target
advertisement for better business insights
HBase stores match details and history of
each match
Uses this data for better prediction

Introduction to HBaseHBase vs RDBMS

HBase vs RDBMS
Does not have a fixed schema (schema-less). Defines only
column families
Has a fixed schema which describes the structure of the
tables
HBase RDBMS

HBase vs RDBMS
column families
tables
Works well with structured and semi-structured data Works well with structured data
HBase RDBMS

HBase vs RDBMS
column families
tables
RDBMS can store only normalized data
HBase RDBMS
It can have de-normalized data

HBase vs RDBMS
column families
tables
It can have de-normalized data
RDBMS can store only normalized data
Built for wide tables that can be scaled horizontally Built for thin tables that is hard to scale
HBase RDBMS

Introduction to HBaseFeatures of HBase

Features of HBase
Scalable
Data can be scaled
across various
nodes as it is stored
in HDFS

Features of HBase
Scalable
Data can be scaled
across various
in HDFS
Automatic failure
support
Write Ahead Log
across clusters
which provides
automatic support
against failure

Features of HBase
Scalable
Data can be scaled
across various
in HDFS
Consistent read and
write
HBase provides
consistent read and
write of data
Automatic failure
support
Write Ahead Log
across clusters
which provides
automatic support
against failure

Features of HBase
Scalable
Data can be scaled
across various
in HDFS
Consistent read and
write
HBase provides
consistent read and
write of data
JAVA API for client
access
Provides easy to use
JAVA API for clients
Automatic failure
support
Write Ahead Log
across clusters
which provides
automatic support
against failure

Features of HBase
Scalable
Data can be scaled
across various
in HDFS
Consistent read and
write
HBase provides
consistent read and
write of data
JAVA API for client
access
Provides easy to use
JAVA API for clients
Automatic failure
support
Write Ahead Log
across clusters
which provides
automatic support
against failure
Block cache and
bloom filters
Supports block
cache and bloom
filters for high
volume query
optimization

Introduction to HBaseHBase Storage

HBase column oriented storage
Column Family 1 Column Family 2 Column Family 3
Rowid
Col 1 Col 2
Row 1
Row 2
Row 3
Col 3 Col 3Col 1 Col 2 Col 3Col 1 Col 2
Row Key Column Family
Column
Qualifiers
Cells

HBase column oriented storage
Personal data Professional dataRowid
name
1
2
3
Row Key Column Family
Column
Qualifiers
Cells
city age salaryempid
Angela
Dwayne
David
Chicago
Boston
Seattle
31
35
29
Data
Analyst
Web
Developer
Big Data
Architect
$70,000
$65,000
$55,000
designation

Introduction to HBaseHBase Architecture

HBase Architectural Components
Region Server
HLog
MemStore
StoreFile StoreFile
HFile HFile
StoreRegion
Region Server
HLog
MemStore
StoreFile StoreFile
HFile HFile
StoreRegion
Region Server
HLog
MemStore
StoreFile StoreFile
HFile HFile
StoreRegion
HDFS
HMaster
HBase Master assigns
regions and load
balancing
ZooKeeper is used for
monitoring
Region server serves data
for read and write

HBase Architectural Components - Regions
Key col col
xxx val val
xxx val val
Key col col
xxx val val
xxx val val
Key col col
xxx val Val
xxx val Val
Key col col
xxx val val
xxx val val
Region 1 Region 2 Region 3 Region 4
……... ….….
startKey
endKey endKey
Client
HBase tables are divided horizontally by row key
range into “Regions”
A region contains all rows in the table between the
region’s start key and end key
Regions are assigned to the nodes in the cluster,
called “Region Servers”
These servers serve data for read and write
startKey
get
Region Server 1 Region Server 2

HBase Architectural Components - HMaster
Key col col
xxx val val
xxx val val
Key col col
xxx val val
xxx val val
Key col col
xxx val Val
xxx val Val
Key col col
xxx val val
xxx val val
……... ….….
ClientRegion assignment, Data Definition Language operation
(create, delete) are handled by HMaster
Assigning and re-assigning regions for recovery or
load balancing and monitoring all servers
HMaster
create, delete, update
table

Key col col
xxx val val
xxx val val
Key col col
xxx val val
xxx val val
Key col col
xxx val Val
xxx val Val
Key col col
xxx val val
xxx val val
……... ….….
HMaster
table
Monitors region
servers

Key col col
xxx val val
xxx val val
Key col col
xxx val val
xxx val val
Key col col
xxx val Val
xxx val Val
Key col col
xxx val val
xxx val val
……... ….….
HMaster
table
Monitors region
servers
Assigns regions to
region servers
HBase has a distributed environment where HMaster alone is not sufficient to
manage everything. Hence, ZooKeeper was introduced
Assigns regions to
region servers

Inactive
HMaster
HBase Architectural Components - ZooKeeper
Key col col
xxx val val
xxx val val
Key col col
xxx val val
xxx val val
Key col col
xxx val Val
xxx val Val
Key col col
xxx val val
xxx val val
……... ….….
ZooKeeper is a distributed coordination service to
maintain server state in the cluster
Zookeeper maintains which servers are alive and
available, and provides server failure notification
Active
HMaster
ZooKeeper
Active HMaster sends a heartbeat signal to ZooKeeper indicating that
its active

Inactive
HMaster
Key col col
xxx val val
xxx val val
Key col col
xxx val val
xxx val val
Key col col
xxx val Val
xxx val Val
Key col col
xxx val val
xxx val val
……... ….….
Active
HMaster
heartbeat
Region servers send their status to ZooKeeper indicating they are
ready for read and write operation
ZooKeeper

Inactive
HMaster
Key col col
xxx val val
xxx val val
Key col col
xxx val val
xxx val val
Key col col
xxx val Val
xxx val Val
Key col col
xxx val val
xxx val val
……... ….….
Active
HMaster
heartbeat
Inactive server acts as a backup. If the active HMaster fails, it will come
to rescue
ZooKeeper

How the components work together?
Key col col
xxx val val
xxx val val
Key col col
xxx val val
xxx val val
Key col col
xxx val Val
xxx val Val
Key col col
xxx val val
xxx val val
……... ….….
HMaster
ZooKeeper
1 master is
active
• Active HMaster selection
• Region Server session
Active HMaster and Region Servers connect with a session to ZooKeeper

Key col col
xxx val val
xxx val val
Key col col
xxx val val
xxx val val
Key col col
xxx val Val
xxx val Val
Key col col
xxx val val
xxx val val
……... ….….
HMaster
heartbeat
1 master is
active
Active HMaster and Region Servers connect with a session to ZooKeeper
ZooKeeper

Key col col
xxx val val
xxx val val
Key col col
xxx val val
xxx val val
Key col col
xxx val Val
xxx val Val
Key col col
xxx val val
xxx val val
……... ….….
HMaster
heartbeat
1 master is
active
Ephemeral
node
Ephemeral
node
ZooKeeper maintains ephemeral nodes for active sessions via
heartbeats to indicate that region servers are up and running
ZooKeeper

Introduction to HBaseHBase Read or Write

HBase Read or Write
ZooKeeper
.META location is stored in
ZooKeeper
There is a special HBase Catalog table called the META table, which holds the location of the
regions in the cluster
Here is what happens the first time a client reads or writes data to HBase
Client
Region Server Region Server
DataNode DataNode
The client gets the Region Server
that hosts the META table from
ZooKeeper
Request for
Region Server

HBase Read or Write
ZooKeeper
ZooKeeper
Client
DataNode DataNode
Meta table
location
The client gets the Region Server
that hosts the META table from
ZooKeeper
Request for
Region Server

HBase Read or Write
ZooKeeper
Client
Meta Cache
DataNode DataNode
The client will query the .META
server to get the region server
corresponding to the row key it
wants to access
The client caches this information
along with the META table
location
Meta table
location
Request for
Region Server
Get region server for row key
from meta table
ZooKeeper

HBase Read or Write
ZooKeeper
Client
DataNode DataNode
Put row
Meta Cache
It will get the Row from the
corresponding Region Server
Get region server for row key
from meta table
ZooKeeper
Meta table
location
Request for
Region Server
Get row

Introduction to HBaseHBase Meta Table

HBase Meta Table
Meta Table
Row key value
table, key, region region server
Key col col
xxx val val
xxx val val
Key col col
xxx val val
xxx val val
Region 1 Region 2
Region Server
Key col col
xxx val val
xxx val val
Key col col
xxx val val
xxx val val
Region 3 Region 4
Region Server
Special HBase catalog table that
maintains a list of all the Region
Servers in the HBase storage
system
META table is used to find the
Region for a given Table key

Introduction to HBaseHBase Write Mechanism

HBase Write Mechanism
WAL
Region Server
Region
MemStore MemStore
HFile HFile
HDFS DataNodeClient
1
When client issues a put request, it will write the data to the write-ahead log (WAL)1
Write Ahead Log (WAL) is a file
used to store new data that is yet to
be put on permanent storage. It is
used for recovery is the case of
failure.

WAL
Region Server
Region
MemStore MemStore
HFile HFile
HDFS DataNodeClient
1
2
Once data is written to the WAL, it is then copied to the MemStore2
MemStore is the write cache that
stores new data that has not yet
been written to disk. There is one
MemStore per column family per
region.

WAL
Region Server
Region
MemStore MemStore
HFile HFile
HDFS DataNodeClient
1
3 ACK
2
Once the data is placed in MemStore, the client then receives the acknowledgment3

WAL
Region Server
Region
MemStore MemStore
HFile HFile
HDFS DataNodeClient
1
3 ACK
2
4 4
When the MemStore reaches the threshold, it dumps or commits the data into a HFile4
Hfiles store the rows of data as
sorted KeyValue on disk

Introduction to HBaseDemo on HBase

HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop Tutorial | Simplilearn

HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop Tutorial | Simplilearn

Empfohlen

Empfohlen

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Ähnlich wie HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop Tutorial | Simplilearn

Ähnlich wie HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop Tutorial | Simplilearn (20)

Mehr von Simplilearn

Mehr von Simplilearn (20)

Kürzlich hochgeladen

Kürzlich hochgeladen (20)

HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop Tutorial | Simplilearn

Hinweis der Redaktion