Hbasepreso 111116185419-phpapp02

Introduction to HBase
Gokuldas K Pillai
@gokool

HBase - The Hadoop Database
• Based on Google’s BigTable (OSDI’06)
• Runs on top of Hadoop but provides real-time
read/write access
• Distributed Column Oriented Database

HBase Strengths
• Can scale to billions of rows X millions of
columns
• Relatively cheap & easy to scale
• Random, real-time read/write access to
very large data sets
• Support for update, delete

Who is using it
• StumbleUpon / su.pr
– Uses HBase as a real-time data storage and analytics platform
• Twitter
– Distributed read/write backup of all MySQL instances. Powers
“people search”.
• Powerset (Now part of MS)
• Adobe
• Yahoo
• Ning
• Meetup
• More at http://wiki.apache.org/hadoop/Hbase/PoweredBy

Key features
• Column Oriented store
– Table costs only for the data stored
– NULLs in rows are free
• Rows stored in sorted order
• Can scale to Petabytes (At Google)

Comparing to RDBMS
• No Joins
• No Query engine
• No transactions
• No column typing
• No SQL, no ODBC/JDBC (although HBql is available now)

Data Model - Tables
• Tables consisting of rows and columns
• Table cells are versioned (by timestamp)
• Tables are sorted by row keys
• Table access is via primary key
• Row updates lock the row no matter how
many columns are involved
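
The table model above can be sketched in a few lines of Python. This is a toy in-memory stand-in, not the HBase API: the class name ToyTable, its methods, and the max_ts parameter are hypothetical names chosen for illustration. It shows cells versioned by timestamp, access by row key, and row keys returned in sorted order.

```python
from collections import defaultdict


class ToyTable:
    """Toy model only: row key -> column -> {timestamp: value}."""

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, ts):
        # Each write adds a new version of the cell, keyed by timestamp.
        self._rows[row][column][ts] = value

    def get(self, row, column, max_ts=None):
        # Return the newest version, optionally no newer than max_ts.
        versions = self._rows.get(row, {}).get(column, {})
        candidates = [t for t in versions if max_ts is None or t <= max_ts]
        return versions[max(candidates)] if candidates else None

    def scan(self):
        # Row keys come back in sorted (byte) order.
        return sorted(self._rows)


table = ToyTable()
table.put(b"row2", b"info:name", b"old", ts=1)
table.put(b"row2", b"info:name", b"new", ts=2)
table.put(b"row1", b"info:name", b"x", ts=1)
latest = table.get(b"row2", b"info:name")
earlier = table.get(b"row2", b"info:name", max_ts=1)
```

Cells that were never written take no space in this model, which mirrors the "NULLs in rows are free" point from the key features slide.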

Column Families
• Row’s columns are grouped into families
• Column family members identified by a
common ‘printable’ prefix
• Column families must be predefined
– but column family members can be added
dynamically
– member names can be arbitrary bytes
• All column family members are collocated on
disk
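
A small sketch of the family rules, assuming the usual HBase naming convention in which a column is written as family:qualifier. The helper names split_column and ToySchema are hypothetical and exist only for this illustration; this is not HBase client code.

```python
def split_column(name: bytes):
    """Split b"family:qualifier" at the first colon."""
    family, sep, qualifier = name.partition(b":")
    if not sep:
        raise ValueError("column name must look like family:qualifier")
    return family, qualifier


class ToySchema:
    """Toy schema: families are fixed up front, qualifiers are free-form."""

    def __init__(self, families):
        self.families = set(families)

    def check(self, column: bytes):
        family, qualifier = split_column(column)
        if family not in self.families:
            raise KeyError("unknown column family: %r" % family)
        return family, qualifier


schema = ToySchema([b"info"])
# A new member of an existing family needs no schema change.
parsed = schema.check(b"info:\x00\x01binary-qualifier")
```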

Server Architecture
• Similar to HDFS
– HBaseMaster ~ NameNode
– RegionServer ~ DataNode
• HBase stores state via the Hadoop FS API
• Can persist to :
– Local
– Amazon S3
– HDFS (Default)

HBaseMaster
What it does:
• Bootstrapping a new instance
• Assigning regions and handling RegionServer problems
– Each region from every table is assigned to a RegionServer
• When machines fail, move regions
• When regions split, move regions to balance
What it does NOT do:
– Handle write requests (Not a DB Master)
– Handle location finding requests (handled by RegionServer)

RegionServer
• Carry the regions
• Handle client read/write requests
• Manage region splits (inform the Master)

Regions
• Horizontal Partitioning
• Every region has a subset of the table’s rows
• Region identified as
– [table, first row (inclusive), last row (exclusive)]
• Table starts on a single region
• Splits into two roughly equal-sized regions as the
original region grows too large, and so on
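
Horizontal partitioning by row-key range can be sketched as follows. This is a toy model under stated assumptions: ToyRegionMap and its methods are hypothetical names, the split key is passed in by hand, and an end key of None stands for "no upper bound". It only illustrates the inclusive-start, exclusive-end lookup.

```python
import bisect


class ToyRegionMap:
    """Toy model: regions of one table, tracked by sorted start keys."""

    def __init__(self):
        # A table starts as a single region covering every row key.
        self.start_keys = [b""]

    def locate(self, row: bytes):
        # Find the last region whose start key is <= row.
        i = bisect.bisect_right(self.start_keys, row) - 1
        start = self.start_keys[i]
        has_next = i + 1 < len(self.start_keys)
        end = self.start_keys[i + 1] if has_next else None
        return start, end

    def split(self, split_key: bytes):
        # Splitting a region adds one new start key inside its range.
        if split_key not in self.start_keys:
            bisect.insort(self.start_keys, split_key)


regions = ToyRegionMap()
regions.split(b"m")
first_half = regions.locate(b"apple")
second_half = regions.locate(b"m")
```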

ZooKeeper
• Master election and server availability
• Cluster management
– Assignment transaction state management
• Client contacts ZooKeeper to bootstrap its
connection to the HBase cluster
• Region key ranges, region server addresses
• Guarantees consistency of data across clients

Workflow (Client connecting first time)
• Client → ZooKeeper (returns location of -ROOT-)
• Client → -ROOT- (returns location of .META.)
• Client → .META. (returns RegionServer)
• To avoid three lookups every time, the client caches
this info.
– Recache on fault
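
The lookup chain and the client-side cache can be sketched like this. Everything here is a toy stand-in: ToyCluster, ToyClient, the host names, and the per-row cache are assumptions made for illustration (a real client would cache by region rather than by individual row), and none of it is HBase client code.

```python
class ToyCluster:
    """Toy cluster that counts how many lookups a client performs."""

    def __init__(self):
        self.lookups = 0
        self.region_server = "rs1"  # hypothetical server name

    def ask_zookeeper(self):
        self.lookups += 1
        return "root-host"  # where -ROOT- lives (made-up name)

    def ask_root(self, root_host):
        self.lookups += 1
        return "meta-host"  # where .META. lives (made-up name)

    def ask_meta(self, meta_host, row):
        self.lookups += 1
        return self.region_server


class ToyClient:
    def __init__(self, cluster):
        self.cluster = cluster
        self.cache = {}

    def locate(self, row):
        if row not in self.cache:
            root_host = self.cluster.ask_zookeeper()
            meta_host = self.cluster.ask_root(root_host)
            self.cache[row] = self.cluster.ask_meta(meta_host, row)
        return self.cache[row]

    def on_fault(self, row):
        # Drop the stale entry; the next locate() repeats the lookups.
        self.cache.pop(row, None)


cluster = ToyCluster()
client = ToyClient(cluster)
client.locate("row1")  # three lookups
client.locate("row1")  # served from the cache, no new lookups
```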

Write/Read Operation
• Write request: Client → RegionServer
→ commit log (on HDFS), then memstore
• Flush to filesystem when memstore fills
• Read request: Client → RegionServer
– Look up the memstore first
– If not found, look up flush files (in reverse chronological order)
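
The write and read paths above can be sketched as a toy store. The class name ToyRegionStore, the flush_threshold parameter, and the use of plain Python lists and dicts in place of HDFS files are all assumptions for illustration; this is a minimal sketch, not how a RegionServer is implemented.

```python
class ToyRegionStore:
    """Toy model of one region's commit log, memstore and flush files."""

    def __init__(self, flush_threshold=2):
        self.commit_log = []   # stand-in for the commit log on HDFS
        self.memstore = {}     # recent writes held in memory
        self.flush_files = []  # flushed snapshots, oldest first
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        # Log the write first, then apply it to the memstore.
        self.commit_log.append((key, value))
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self._flush()

    def _flush(self):
        # Write the memstore out as a sorted file and start a fresh one.
        self.flush_files.append(dict(sorted(self.memstore.items())))
        self.memstore = {}

    def get(self, key):
        if key in self.memstore:
            return self.memstore[key]
        # Newest flush file first, so the latest value wins.
        for flushed in reversed(self.flush_files):
            if key in flushed:
                return flushed[key]
        return None


store = ToyRegionStore(flush_threshold=2)
store.put("a", 1)
store.put("b", 2)  # memstore is full, so it is flushed
store.put("a", 3)  # newer value for "a" now sits in the memstore
```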

Integration
• Java HBase Client API
• High performance Thrift gateway
• A REST-ful Web service gateway (Stargate)
– Supports XML and binary data encoding options
• Cascading, Hive and Pig integration
• HBase shell (JRuby)
• TableInput/TableOutputFormat for MR

Main Classes
• HBaseAdmin
– Create table, drop table, list and alter table
• HTable
– Put
– Get
– Scan

Alternatives to HBase
• Cassandra (From Facebook)
– Based on Amazon’s Dynamo
– No master-slave architecture; peer-to-peer instead
– Tunable: consistency vs. latency
• Yahoo’s PNUTS
– Not open source
– Works well for multi-DC / geographically dispersed servers

References
• Hadoop – The Definitive Guide
• Cloudera website
• http://wiki.hbase.apache.org
• Lars George
– http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
• Comparing HBase, Cassandra and PNUTS
– http://blog.amandeepkhurana.com/2010/05/comparing-pnuts-hbase-and-cassandra.html
• ACID compliance of HBase
– http://hbase.apache.org/docs/r0.89.20100621/acid-semantics.html
