HBase
Introduction
● HBase is a distributed, column-oriented database built on top of the Hadoop
Distributed File System (HDFS).
● It is an open-source project and is horizontally scalable.
● Its data model is similar to Google's Bigtable and is designed to provide
quick random access to huge amounts of structured data.
● It is part of the Hadoop ecosystem and provides random, real-time read/write
access to data stored in HDFS.
HBase Architecture and Data Model
● An HBase table consists of rows and columns and has a third dimension,
version, which keeps the different values of a row and column intersection over
time.
● Example: a customer doing online shopping
● This type of application requires real-time access.
● Batch processing with Pig, Hive, or Hadoop's MapReduce is therefore
not a reasonable implementation approach.
● HBase stores the data and provides real-time read and write access.
HBase Architecture and Data Model (cont’d)
● HBase uses a key/value structure to store the contents of an HBase table:
● (row key, column family, column qualifier, timestamp) -> value
● Each value is the data stored at the intersection of the row, column, and
version.
● Each key consists of the following elements:
○ Row length
○ Row (sometimes called the row key)
○ Column family length
○ Column family
○ Column qualifier
○ Version
○ Key type
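The key structure above can be sketched in plain Python. This is a minimal illustration, not HBase code: the `CellKey` type, the `sort_key` helper, and the sample values are assumptions made for the example, and real HBase compares raw bytes rather than strings. It shows the usual ordering of cells: row, column family, and qualifier ascending, with the newest timestamp first.

```python
from typing import NamedTuple


class CellKey(NamedTuple):
    """Hypothetical stand-in for the variable parts of an HBase cell key."""
    row: str
    family: str
    qualifier: str
    timestamp: int  # version, in milliseconds since the epoch


def sort_key(key: CellKey):
    # Row, family and qualifier ascend; the timestamp is negated so that
    # the newest version of a cell sorts first.
    return (key.row, key.family, key.qualifier, -key.timestamp)


# Sample cells (illustrative values only).
cells = {
    CellKey("000700", "cf1", "cq1", 1393866138714): "data1",
    CellKey("000700", "cf1", "cq1", 1393866139000): "data1-updated",
    CellKey("000700", "cf2", "cq3", 1393866138714): "data3",
}

ordered = sorted(cells, key=sort_key)
newest_cq1 = cells[ordered[0]]  # newest version of cf1:cq1
```

Because the newest version sorts first, a read that wants only the current value can stop at the first matching key.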
HBase Architecture and Data Model (cont’d)
● A table is a collection of rows.
● A row is a collection of column families.
● A column family is a collection of columns.
● A column is a collection of key/value pairs.
HBase Architecture and Data Model (cont’d)
● Create an HBase table
$ hbase shell
hbase> create 'my_table', 'cf1', 'cf2',
  {SPLITS => ['250000', '500000', '750000']}
● my_table as stored in HDFS by HBase
$ hadoop fs -ls -R /hbase
● Add data to the table
hbase> put 'my_table', '000700', 'cf1:cq1', 'data1'
hbase> put 'my_table', '000700', 'cf1:cq2', 'data2'
hbase> put 'my_table', '000700', 'cf2:cq3', 'data3'
HBase Architecture and Data Model (cont’d)
● Retrieve data from the table
○ hbase> get 'my_table', '000700', 'cf2:cq3'
● Scan a range of rows
○ hbase> scan 'my_table', {STARTROW => '000600', STOPROW => '000800'}
● Delete the oldest entry for a column by giving its timestamp
○ hbase> delete 'my_table', '000700', 'cf2:cq3', 1393866138714
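The range semantics of the scan command can be sketched in plain Python. This is an illustration only, not the HBase client: the `scan` function and the sample row keys are assumptions. It relies on two properties of HBase scans: rows are kept in sorted order, and the start row is inclusive while the stop row is exclusive.

```python
from bisect import bisect_left


def scan(sorted_rows, start_row, stop_row):
    """Return row keys in [start_row, stop_row), given keys in sorted order."""
    lo = bisect_left(sorted_rows, start_row)  # first key >= start_row
    hi = bisect_left(sorted_rows, stop_row)   # first key >= stop_row (excluded)
    return sorted_rows[lo:hi]


# Hypothetical row keys, zero-padded so that string order matches numeric order.
rows = ["000100", "000600", "000700", "000800", "000900"]
result = scan(rows, "000600", "000800")
```

Zero-padding the keys matters here: without it, '1000' would sort before '700' when compared as strings.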
Use Cases for HBase
● A common use case for a data store such as HBase is storing the results from
a web crawler.
○ The row com.cnn.www corresponds to a website URL, www.cnn.com.
○ A column family, called anchor, is defined to capture the website URLs that provide links to the
row's website.
○ The anchoring website URLs are used as the column qualifiers.
○ Additional websites that provide links to www.cnn.com appear as additional column qualifiers.
○ The value stored in the cell is simply the text on the website that provides the link.
○ hbase> get 'web_table', 'com.cnn.www', {VERSIONS => 2}
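The row key com.cnn.www is the host name with its labels reversed. A minimal Python sketch of that transformation follows; the `reverse_domain` helper and every host other than www.cnn.com are assumptions chosen for illustration. Since rows are stored in sorted order, reversing the labels places pages from the same domain next to each other.

```python
def reverse_domain(host: str) -> str:
    """Turn 'www.cnn.com' into 'com.cnn.www' for use as a row key."""
    return ".".join(reversed(host.split(".")))


# Hypothetical crawl targets; only www.cnn.com comes from the slides.
hosts = ["www.cnn.com", "www.bbc.co.uk", "edition.cnn.com", "money.cnn.com"]

# Sorting mimics how rows are laid out: all cnn.com hosts end up adjacent.
row_keys = sorted(reverse_domain(h) for h in hosts)
```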
Use Cases for HBase (cont’d)
● This use case illustrates several important points:
1. It is possible to reach a billion rows and millions of columns in an HBase
table.
2. The row needs to be defined based on how the data will be accessed.
3. It may be advantageous to use the column qualifiers to actually store the
data of interest, rather than simply storing it in a cell.
● A second use case is the storage and search access of messages.
○ The row was defined to be the user ID.
○ The column qualifier was set to a word that appears in the message.
○ The version was the message ID.
○ The cell's content was the offset of the word in the message.
● This implementation allowed Facebook to provide auto-complete capability in
the search box and to return the results of the query quickly.
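The message-search layout can be sketched with nested dictionaries in Python. This is a toy model under stated assumptions, not Facebook's implementation and not HBase code: the `index_message` and `complete` helpers, the user ID, and the sample messages are all hypothetical. The nesting mirrors the layout described above: user ID (row), word (column qualifier), message ID (version), offset (cell value).

```python
from collections import defaultdict

# index[user_id][word][message_id] = character offset of the word in the message
index = defaultdict(lambda: defaultdict(dict))


def index_message(user_id, message_id, text):
    """Record the offset of each word's first occurrence in the message."""
    position = 0
    for word in text.split():
        offset = text.index(word, position)
        index[user_id][word.lower()].setdefault(message_id, offset)
        position = offset + len(word)


def complete(user_id, prefix):
    """Return the indexed words of one user that start with the prefix."""
    words = index.get(user_id, {})
    return sorted(w for w in words if w.startswith(prefix.lower()))


index_message("user42", 1, "HBase stores messages")
index_message("user42", 2, "Hadoop and HBase")
suggestions = complete("user42", "h")
```

All words for a user live in a single row, so a prefix lookup touches only that row, which is what makes auto-complete fast in this layout.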
Use Cases for HBase (cont’d)
● A key strength of HBase is the ability to add new columns on demand, simply
by writing new column qualifiers.
● In an RDBMS implementation, new columns require the involvement of a DBA to
alter the structure of the table.
Other HBase Usage Considerations
● Java API
○ The shell commands are useful for exploring the data in an HBase environment and for illustrating
how the operations work.
○ In a production environment, the HBase Java API could be used to program the desired
operations and the conditions under which to execute them.
● Column family and column qualifier names
○ Keep the names of the column families and column qualifiers as short as possible.
○ The column family name and the column qualifier are stored as part of the key of each key/value
pair.
○ By default, HDFS keeps three copies of each block across the Hadoop cluster, which triples the
storage requirement.
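The cost of long names can be estimated with simple arithmetic. The Python sketch below is a rough, hedged estimate: the fixed field sizes (2-byte row length, 1-byte family length, 8-byte timestamp, 1-byte key type), the example names, and the cell count are assumptions, and block compression or data block encoding would reduce the real figures.

```python
# Assumed fixed-size key fields: row length (2) + family length (1)
# + timestamp (8) + key type (1). Treat these as illustrative figures.
FIXED_KEY_BYTES = 2 + 1 + 8 + 1


def key_bytes(row: str, family: str, qualifier: str) -> int:
    """Approximate size of one cell key in bytes."""
    variable = len(row.encode()) + len(family.encode()) + len(qualifier.encode())
    return FIXED_KEY_BYTES + variable


def stored_key_bytes(row, family, qualifier, cells, replication=3):
    """Key bytes on disk for `cells` cells, counting every HDFS replica."""
    return key_bytes(row, family, qualifier) * cells * replication


# Hypothetical names: the same one million cells with long and short names.
long_names = stored_key_bytes("000700", "customer_profile", "shipping_address", cells=1_000_000)
short_names = stored_key_bytes("000700", "cf", "sa", cells=1_000_000)
```

Under these assumptions the long names cost 150 MB of key data against 66 MB for the short ones, before any value bytes are counted.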
Other HBase Usage Considerations (cont’d)
● Defining rows
○ The definition of the row is the main mechanism for performing read/write operations on an HBase table.
○ The row needs to be constructed in such a way that the requested columns can be easily and
quickly retrieved.
● Avoid creating sequential rows
○ With sequential row keys, all new users and their data are written to just one region, so the
workload is not distributed across the cluster as intended.
○ Remedy: randomly assign a prefix to the sequential number.
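One way to prefix sequential keys is sketched below in Python. Note the swap: the slide suggests a random prefix, whereas this sketch derives the prefix from a hash of the key, so a reader can recompute it later and find the row again. The `salted_key` helper, the bucket count, and the key format are assumptions for illustration.

```python
import hashlib


def salted_key(row_key: str, buckets: int = 4) -> str:
    """Prefix a row key with a bucket number derived from its hash."""
    digest = hashlib.sha256(row_key.encode("utf-8")).digest()
    salt = digest[0] % buckets  # same key always maps to the same bucket
    return f"{salt:02d}-{row_key}"


# Sequential user IDs (hypothetical) no longer sort next to each other.
sequential = [f"{n:06d}" for n in range(1000)]
salted = [salted_key(k) for k in sequential]
prefixes = {k.split("-", 1)[0] for k in salted}
```

The trade-off is that a range scan over the original sequence now needs one scan per bucket.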
Other HBase Usage Considerations (cont’d)
● Version control
○ Control how long a version of a cell's contents will exist.
○ Time to live (TTL): the age after which older versions are deleted.
○ Minimum and maximum number of versions to maintain.
● ZooKeeper
○ HBase uses Apache ZooKeeper to coordinate and manage the various regions running on the
distributed cluster.
○ ZooKeeper is "a centralized service for maintaining configuration information, naming, providing
distributed synchronization, and providing group services."
○ Instead of building its own coordination service, HBase uses ZooKeeper.
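The version-control settings listed above (TTL, minimum and maximum number of versions) can be sketched as a pruning rule in Python. This is a simplified model under stated assumptions, not HBase's compaction logic: the `retained_versions` function, its parameter names, and the sample timestamps are hypothetical. The rule modelled is: keep versions younger than the TTL, always keep at least the minimum number of newest versions, and never keep more than the maximum.

```python
def retained_versions(versions, now_ms, ttl_ms, min_versions, max_versions):
    """Return the timestamps that survive pruning, newest first.

    versions: dict mapping timestamp in milliseconds to cell value.
    """
    newest_first = sorted(versions, reverse=True)
    kept = [
        ts
        for i, ts in enumerate(newest_first)
        # Keep a version if it is among the newest `min_versions`,
        # or if it has not yet outlived the TTL.
        if i < min_versions or now_ms - ts <= ttl_ms
    ]
    return kept[:max_versions]


# Illustrative cell history (timestamps in milliseconds).
history = {1000: "a", 2000: "b", 3000: "c", 4000: "d"}

expired = retained_versions(history, now_ms=5000, ttl_ms=500, min_versions=1, max_versions=2)
capped = retained_versions(history, now_ms=5000, ttl_ms=10_000, min_versions=1, max_versions=2)
```

In the first call every version has outlived the TTL, yet the newest one is kept because of the minimum; in the second nothing has expired and the maximum caps the result at two versions.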
