2. Contents
• Objective
• What is HBase?
• Why HBase?
• Features of HBase
• HBase architecture (overview)
• HBase architecture (Write-Ahead Log)
• HBase architecture (HLog)
• HBase architecture (HFile)
• HBase Client
• ZooKeeper
• Master
• HBase Region server
• HBase tables and regions
• HBase tables
• HBase Examples
• HBase users
• Conclusion
3. Objective
• To study and understand HBase, one of the growing
technologies of cloud computing and an open-source
clone of Google’s Bigtable.
4. What is HBase?
• Open source project.
• HBase is a Hadoop database.
• It is a distributed, large-scale data store.
• Efficient at random reads/writes.
• Initially modeled after Google’s Bigtable.
5. Why HBase?
• Datasets are reaching petabytes.
• Need for random access and batch processing.
• Traditional databases are expensive to scale
and difficult to manage.
• Commodity hardware is cheap and powerful.
6. Features of HBase
• It supports unstructured and semi-structured
data.
• It has built-in version management.
• Fast key-based lookups.
• It stores null values for free.
11. HBase Client
• The HBase client is responsible for finding
RegionServers that are serving the particular row
range of interest.
• It does this by querying the -ROOT- and .META.
catalog tables; ZooKeeper holds the location of
-ROOT-.
• After locating the required region(s), the client
directly contacts the RegionServer serving that
region.
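The lookup path described on this slide can be sketched as a toy simulation. This is only an illustration, not the HBase client API: the dictionaries, server names (rs0 to rs4), and function names are hypothetical, and each catalog is reduced to a sorted list of (region start key, server) pairs.

```python
import bisect

# Hypothetical, simplified catalogs. Each entry is (region start key, server),
# sorted by start key; an empty start key means "from the start of the table".
ZOOKEEPER = {"root-region-server": "rs0"}                # where -ROOT- is served
ROOT_TABLE = [("", "rs1")]                               # where .META. regions are served
META_TABLE = [("", "rs2"), ("g", "rs3"), ("p", "rs4")]   # where user regions are served


def find_entry(catalog, row_key):
    """Return the entry with the greatest start key that is <= row_key."""
    start_keys = [start for start, _ in catalog]
    index = bisect.bisect_right(start_keys, row_key) - 1
    return catalog[index]


def locate_region_server(row_key):
    """Walk ZooKeeper -> -ROOT- -> .META. and return the servers visited."""
    root_server = ZOOKEEPER["root-region-server"]
    _, meta_server = find_entry(ROOT_TABLE, row_key)
    _, region_server = find_entry(META_TABLE, row_key)
    return [root_server, meta_server, region_server]
```

The last element of the returned list is the RegionServer the client would then contact directly for that row.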
12. ZooKeeper
• ZooKeeper serves as a distributed coordination
service.
• It bootstraps and coordinates clusters.
• It manages Master election and server availability.
• ZooKeeper stores the location of the -ROOT-
catalog table; -ROOT- and .META. themselves are
HBase tables served by RegionServers.
• -ROOT- keeps track of where the .META. table is.
• The .META. table keeps a list of all regions in the
system with their corresponding region server
assignments.
13. Master
• The Master server is responsible for
monitoring all RegionServer instances in the
cluster, and is the interface for all metadata
changes.
• If the active Master shuts down, the
remaining Masters jostle to take over the
Master role through ZooKeeper.
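The failover behaviour on this slide can be illustrated with a small simulation. This is a sketch under assumptions, not ZooKeeper's API: FakeCoordinator is a hypothetical stand-in for ZooKeeper, and the "first caller wins" rule mimics a node that may exist only once.

```python
class FakeCoordinator:
    """Hypothetical stand-in for ZooKeeper: holds at most one active master."""

    def __init__(self):
        self.active_master = None

    def try_become_master(self, name):
        """The first caller wins; later callers are refused until the slot is free."""
        if self.active_master is None:
            self.active_master = name
            return True
        return False

    def master_down(self, name):
        """Free the slot when the active master goes away."""
        if self.active_master == name:
            self.active_master = None


def elect(coordinator, candidates):
    """Let each candidate try in turn and return whoever holds the master role."""
    for name in candidates:
        if coordinator.try_become_master(name):
            break
    return coordinator.active_master
```

Calling elect again after master_down lets one of the standby Masters take over, while calling it with an active Master in place leaves that Master unchanged.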
15. HBase Region Server
• It is responsible for serving and managing
regions.
• It supports both data-oriented and
region-maintenance interfaces:
i) Data (get, put, delete, next, etc.)
ii) Region (splitRegion, compactRegion, etc.)
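The data-oriented methods can be sketched with a toy in-memory region. This is an illustration only: ToyRegion and its method signatures are hypothetical and do not mirror the real RegionServer interface. The scan generator plays the role of repeated next calls.

```python
import bisect


class ToyRegion:
    """Hypothetical in-memory region holding rows in sorted key order."""

    def __init__(self):
        self._keys = []   # row keys, kept sorted
        self._rows = {}   # row key -> value

    def put(self, row_key, value):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
        self._rows[row_key] = value

    def get(self, row_key):
        return self._rows.get(row_key)

    def delete(self, row_key):
        if row_key in self._rows:
            del self._rows[row_key]
            self._keys.remove(row_key)

    def scan(self, start_key=b"", stop_key=None):
        """Yield (row_key, value) pairs in key order from start_key up to stop_key (exclusive)."""
        index = bisect.bisect_left(self._keys, start_key)
        for key in self._keys[index:]:
            if stop_key is not None and key >= stop_key:
                break
            yield key, self._rows[key]
```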
16. HBase Tables and Regions
• An HBase table is made up of roughly equal-sized
regions.
• Each region may live on a different node and
is made up of several HDFS files and blocks,
each of which is replicated by Hadoop.
• A region is specified by its startKey and endKey.
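A minimal sketch of how a key range identifies a region, and how a split yields two regions covering the same range. The Region class and split function are hypothetical; the sketch assumes the start key is inclusive, the end key is exclusive, and an empty key marks the start or end of the table.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    start_key: bytes                            # inclusive; b"" means start of table
    end_key: bytes                              # exclusive; b"" means end of table
    rows: list = field(default_factory=list)    # sorted row keys held by the region

    def contains(self, row_key):
        after_start = row_key >= self.start_key
        before_end = self.end_key == b"" or row_key < self.end_key
        return after_start and before_end


def split(region):
    """Split at the middle row key; the two halves cover the parent's range exactly."""
    middle = region.rows[len(region.rows) // 2]
    left = Region(region.start_key, middle, [k for k in region.rows if k < middle])
    right = Region(middle, region.end_key, [k for k in region.rows if k >= middle])
    return left, right
```

Because the left region's end key equals the right region's start key, every row key belongs to exactly one of the two halves.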
17. HBase Tables
• Tables are sorted by row key in lexicographical order.
• The table schema only defines its column families.
i) Each family consists of any number of columns.
ii) Each column consists of any number of versions.
iii) Columns only exist when inserted; NULLs are free.
iv) Columns within a family are sorted and
stored together.
v) Everything except table names is byte[].
(Table, Row, Family:Column, Timestamp) -> Value
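The mapping above can be sketched as nested dictionaries. This is a toy model, not HBase code: ToyTable and its methods are hypothetical, timestamps are passed in explicitly, and only the column families are fixed up front, as on the slide.

```python
class ToyTable:
    """Hypothetical model of (Row, Family:Column, Timestamp) -> Value."""

    def __init__(self, families):
        self.families = set(families)   # the only part fixed by the schema
        self.rows = {}                  # row -> {column -> {timestamp -> value}}

    def put(self, row, column, value, timestamp):
        family = column.split(b":", 1)[0]
        if family not in self.families:
            raise KeyError("unknown column family")
        self.rows.setdefault(row, {}).setdefault(column, {})[timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest version, or the newest one at or before timestamp."""
        versions = self.rows.get(row, {}).get(column, {})
        candidates = [ts for ts in versions if timestamp is None or ts <= timestamp]
        if not candidates:
            return None   # a column that was never inserted takes no space
        return versions[max(candidates)]
```

A column that was never written simply has no entry, which is the sense in which NULLs are free.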
18. Example
Let us take an example of a user and his
friendship details.
In RDBMS:
21. Conclusion
• HBase is one of the most successful, growing
technologies of cloud computing.
• It has opened the window for further research
in many fields.
• Whenever we need scalability, the
properties and flexibility of HBase can
relieve us from the headaches associated with
scaling an RDBMS.