This deck covers Apache HBase, from an introduction to its configuration.
Let me know if anything else is required; happy to help.
Ping me on Google: #bobrupakroy.
2. Apache HBase
Apache HBase is a column-oriented (more precisely, column-family-oriented), non-relational, distributed database management system that can quickly access single rows in a trillion-row table for real-time needs.
Why HBase?
• Every row is indexed by its row key, and data is stored as key-value pairs, which makes scans across tables faster.
• Single rows can be accessed quickly even in a trillion-row table.
• Fast random, real-time read/write access to data.
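Why a sorted row-key index makes single-row lookups fast can be sketched with Python's bisect module. This is a toy in-memory illustration, not HBase code; the helper name find_row and the sample keys are hypothetical. It also shows that row keys sort lexicographically, so "row10" comes before "row2".

```python
import bisect

def find_row(sorted_keys, row_key):
    """Return the index of row_key in sorted_keys, or -1 if absent.

    Binary search needs only O(log n) comparisons, which is why a sorted
    row-key index can find one row quickly even in a very large table.
    Hypothetical helper for illustration; HBase's own lookup is more involved.
    """
    i = bisect.bisect_left(sorted_keys, row_key)
    if i < len(sorted_keys) and sorted_keys[i] == row_key:
        return i
    return -1

# Hypothetical row keys; sorting is lexicographic, so "row10" lands before "row2".
keys = sorted(["row2", "row10", "row1", "row3"])
print(keys)                      # ['row1', 'row10', 'row2', 'row3']
print(find_row(keys, "row2"))    # 2
print(find_row(keys, "row99"))   # -1
```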
Rupak Roy
3. Apache HBase
• Rows are sorted by row key in lexicographic (byte) order, so "row10" sorts before "row2".
• HBase does not enforce a strict schema on columns: column families are declared when a table is created, but individual columns can be added to any row at any time.
• Facebook chose HBase over Cassandra for its social inbox messages because of HBase's simpler consistency model.
• One drawback is that HBase does not support complex queries, or even SQL (Structured Query Language), natively.
• A value written to HBase is not altered in place; instead, a new version with a more recent timestamp is added.
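The "new version instead of overwrite" behaviour can be sketched as a toy model of a single cell. This is an illustration only, not HBase's API: the class name VersionedCell, the values, and the integer timestamps are hypothetical, and real HBase also caps how many versions a column family keeps, which this sketch ignores.

```python
class VersionedCell:
    """Toy model of a single HBase cell: writes add versions instead of overwriting."""

    def __init__(self):
        self._versions = []  # (timestamp, value) pairs, newest first

    def put(self, value, timestamp):
        # Add a new version rather than altering the existing value.
        self._versions.append((timestamp, value))
        self._versions.sort(key=lambda tv: tv[0], reverse=True)

    def get(self):
        # A plain read returns the version with the most recent timestamp.
        return self._versions[0][1] if self._versions else None

    def history(self):
        # All stored versions, newest first.
        return list(self._versions)


cell = VersionedCell()
cell.put("Delhi", timestamp=100)   # hypothetical values and timestamps
cell.put("Mumbai", timestamp=200)
print(cell.get())       # Mumbai
print(cell.history())   # [(200, 'Mumbai'), (100, 'Delhi')]
```

Ordering by timestamp rather than by arrival means a late write carrying an older timestamp does not become the current value.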
4. Apache HBase
• The row key and column family are stored together with each value.
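A minimal sketch, assuming a simplified cell layout: each stored cell carries its full coordinates (row key, column family, qualifier, timestamp) alongside the value, loosely in the spirit of HBase's key-value format. The Cell type, the storage_order helper, and the sample data are hypothetical; the sort order (row, family, qualifier ascending, newest timestamp first) is meant to mirror how HBase orders cells.

```python
from typing import NamedTuple

class Cell(NamedTuple):
    """One stored cell: the row key and column family travel with the value."""
    row: str
    family: str
    qualifier: str
    timestamp: int
    value: str

def storage_order(cells):
    # Sort by row, then family, then qualifier ascending; newest timestamp first.
    return sorted(cells, key=lambda c: (c.row, c.family, c.qualifier, -c.timestamp))

# Hypothetical sample data: rows "user1"/"user2", column family "info".
cells = [
    Cell("user2", "info", "name", 100, "Bob"),
    Cell("user1", "info", "name", 100, "Ann"),
    Cell("user1", "info", "name", 200, "Anna"),
    Cell("user1", "info", "city", 150, "Pune"),
]
for c in storage_order(cells):
    print(f"{c.row}/{c.family}:{c.qualifier}/{c.timestamp} -> {c.value}")
```

Because every cell repeats its row key and family, all cells of one row end up adjacent once sorted, which is what makes reading a whole row cheap.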
5. Setup
• As with Hive, download the HBase tarball from its official website and extract it with tar -zxvf.
• Repeat the same setup steps we followed for Hive.
Update .bashrc with:
export HBASE_HOME=/cloudera/user/HBase
Then reload it:
source ~/.bashrc
6. Setup
• Just like Pig, HBase can run in standalone or distributed mode.
• Standalone: the default mode, which uses the local file system.
• Distributed mode is further divided into:
1) Pseudo-distributed: can use either the local file system or HDFS, but all daemons run on a single node.
2) Fully distributed: for enterprise setups, where the daemons are spread across all nodes of the cluster; this mode runs only on HDFS.
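The mode rules above can be sketched as a small helper that picks hbase-site.xml properties. hbase.cluster.distributed and hbase.rootdir are real HBase properties; the function name site_properties, the mode strings, and the example rootdir hdfs://localhost:9000/hbase (a hypothetical NameNode address) are assumptions for illustration.

```python
def site_properties(mode, rootdir):
    """Return the hbase-site.xml properties that select a deployment mode.

    Hypothetical helper: the rootdir checks encode the rules from this slide.
    """
    distributed = {
        "standalone": "false",          # single JVM, local file system
        "pseudo-distributed": "true",   # all daemons on one node
        "fully-distributed": "true",    # daemons spread across the cluster
    }
    if mode not in distributed:
        raise ValueError(f"unknown mode: {mode}")
    if mode == "standalone" and not rootdir.startswith("file://"):
        raise ValueError("standalone mode uses the local file system")
    if mode == "fully-distributed" and not rootdir.startswith("hdfs://"):
        raise ValueError("fully distributed mode runs only on HDFS")
    # Pseudo-distributed accepts either a file:// or an hdfs:// rootdir.
    return {"hbase.cluster.distributed": distributed[mode], "hbase.rootdir": rootdir}


print(site_properties("pseudo-distributed", "hdfs://localhost:9000/hbase"))
```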
7. HBase Configuration
• Update hbase-env.sh inside HBase's conf folder with:
# location of the installed Java package
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
# location of the file that lists the region server hostnames
export HBASE_REGIONSERVERS=/cloudera/hbase/conf/regionservers
export HBASE_MANAGES_ZK=true
• HBASE_MANAGES_ZK: indicates whether HBase should manage its own instance of ZooKeeper.
9. HBase Configuration
• Update hbase-site.xml (part 3)
<property>
<name>hbase.zookeeper.quorum</name>
<value>localhost</value>
</property>
# value: the (comma-separated) list of nodes on which the ZooKeeper servers run; use localhost for pseudo-distributed mode.
• Update hbase-site.xml (part 4)
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/cloudera/hbase/zk</value>
</property>
# value: the directory where ZooKeeper stores its snapshots.
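Writing these property blocks by hand is error-prone, so here is a minimal sketch that renders them with Python's standard xml.etree.ElementTree module. The helper name build_site_xml is hypothetical; the values are the pseudo-distributed ones used on this slide, with the property names spelled hbase.zookeeper.quorum and hbase.zookeeper.property.dataDir.

```python
import xml.etree.ElementTree as ET

def build_site_xml(properties):
    """Render {name: value} pairs as an hbase-site.xml <configuration> document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


site_xml = build_site_xml({
    "hbase.zookeeper.quorum": "localhost",
    "hbase.zookeeper.property.dataDir": "/cloudera/hbase/zk",
})
print(site_xml)
```

The output is a single line without indentation; the structure (configuration, property, name, value) is the same as the hand-written blocks.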
10. HBase Configuration
• For pseudo-distributed mode, map the host to the IP address 127.0.0.1 in /etc/hosts (for example, using the vi editor).
Some of the read and write operations are:
• Get: returns the values for a given row key.
• Put: inserts a new entry.
• Scan: returns the values for a range of row keys.
• Delete: deletes a cell value.
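The semantics of the four operations can be sketched with a toy in-memory table. This is not the HBase client API: the class ToyTable and the column name "cf:a" are hypothetical, versions are ignored, and columns are plain "family:qualifier" strings. The half-open scan range (start row inclusive, stop row exclusive) is intended to mirror the default behaviour of HBase scans.

```python
import bisect

class ToyTable:
    """In-memory sketch of Get, Put, Scan and Delete semantics (not HBase's API)."""

    def __init__(self):
        self._keys = []  # row keys, kept sorted like HBase rows
        self._rows = {}  # row key -> {column: value}

    def put(self, row, column, value):
        # Insert a new entry, creating the row if needed.
        if row not in self._rows:
            bisect.insort(self._keys, row)
            self._rows[row] = {}
        self._rows[row][column] = value

    def get(self, row):
        # Return all column values for one row key ({} if the row is absent).
        return dict(self._rows.get(row, {}))

    def scan(self, start_row, stop_row):
        # Return rows in [start_row, stop_row): start inclusive, stop exclusive.
        lo = bisect.bisect_left(self._keys, start_row)
        hi = bisect.bisect_left(self._keys, stop_row)
        return [(key, dict(self._rows[key])) for key in self._keys[lo:hi]]

    def delete(self, row, column):
        # Delete one cell value; drop the row once it has no cells left.
        cells = self._rows.get(row)
        if cells is None or column not in cells:
            return
        del cells[column]
        if not cells:
            del self._rows[row]
            self._keys.remove(row)


table = ToyTable()
for key in ["row1", "row2", "row3"]:
    table.put(key, "cf:a", key.upper())  # hypothetical column "cf:a"
print(table.get("row2"))                               # {'cf:a': 'ROW2'}
print([key for key, _ in table.scan("row1", "row3")])  # ['row1', 'row2']
table.delete("row2", "cf:a")
print(table.get("row2"))                               # {}
```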