2. HBase - The Hadoop Database
• Based on Google’s BigTable (OSDI’06)
• Runs on top of Hadoop but provides real-time
read/write access
• Distributed Column Oriented Database
3. HBase Strengths
• Can scale to billions of rows X millions of
columns
• Relatively cheap & easy to scale
• Random, real-time read/write access to
very large data sets
• Support for update, delete
4. Who is using it
• StumbleUpon / su.pr
– Uses HBase as a real-time data storage and analytics platform
• Twitter
– Distributed read/write backup of all MySQL instances. Powers
“people search”.
• Powerset (Now part of MS)
• Adobe
• Yahoo
• Ning
• Meetup
• More at http://wiki.apache.org/hadoop/Hbase/PoweredBy
5. Key features
• Column Oriented store
– Table costs only for the data stored
– NULLs in rows are free
• Rows stored in sorted order
• Can scale to Petabytes (At Google)
6. Comparing to RDBMS
• No Joins
• No Query engine
• No transactions
• No column typing
• No SQL, no ODBC/JDBC (HBql now offers a SQL-like layer)
7. Data Model - Tables
• Tables consisting of rows and columns
• Table cells are versioned (by timestamp)
• Tables are sorted by row keys
• Table access is via primary key
• Row updates lock the row no matter how
many columns are involved
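The data model above can be pictured as a sorted map of versioned cells. The following is a minimal Python sketch of that idea only, not HBase code: the class name ToyTable, its methods, and the max_versions default are assumptions made for illustration.

```python
import time


class ToyTable:
    """Toy in-memory model: rows sorted by key, cells versioned by timestamp.

    Hypothetical illustration only; this is not the HBase client API.
    """

    def __init__(self, max_versions=3):
        self.max_versions = max_versions  # assumed default, for illustration
        self.rows = {}  # row key -> {column -> [(timestamp, value), ...]}

    def put(self, row, column, value, timestamp=None):
        ts = time.time_ns() if timestamp is None else timestamp
        versions = self.rows.setdefault(row, {}).setdefault(column, [])
        versions.append((ts, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)  # newest first
        del versions[self.max_versions:]                   # drop excess versions

    def get(self, row, column, versions=1):
        # Access is by row key; returns the newest `versions` cells.
        return self.rows.get(row, {}).get(column, [])[:versions]

    def scan(self, start_row=None, stop_row=None):
        # Rows come back in sorted key order; stop_row is exclusive.
        for row in sorted(self.rows):
            if start_row is not None and row < start_row:
                continue
            if stop_row is not None and row >= stop_row:
                break
            yield row, self.rows[row]


table = ToyTable()
table.put(b"row2", b"info:name", b"old", timestamp=1)
table.put(b"row2", b"info:name", b"new", timestamp=2)
table.put(b"row1", b"info:name", b"x", timestamp=1)
print(table.get(b"row2", b"info:name"))
print([row for row, _ in table.scan()])
```

Reading a cell returns the newest version unless more versions are requested, and a scan walks rows in key order regardless of insertion order.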
8. Column Families
• Row’s columns are grouped into families
• Column family members identified by a
common ‘printable’ prefix
• Column family should be predefined
– but column family members can be added
dynamically
– member name can be bytes
• All column family members are collocated on
disk
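As a rough sketch of the family rules above, the Python below models a table whose families are fixed at creation while members (qualifiers) appear on first write. ToyFamilyTable and the "family:qualifier" byte-string convention used here are illustrative assumptions, not the HBase API.

```python
class ToyFamilyTable:
    """Toy model: column families are predefined, members are added on write.

    Hypothetical illustration only; names are not taken from HBase.
    """

    def __init__(self, families):
        self.families = set(families)  # fixed when the table is created
        self.rows = {}                 # row -> {family -> {qualifier -> value}}

    def put(self, row, column, value):
        # A column name is "<family>:<qualifier>"; the qualifier may be any bytes.
        family, _, qualifier = column.partition(b":")
        if family not in self.families:
            raise KeyError(f"unknown column family: {family!r}")
        self.rows.setdefault(row, {}).setdefault(family, {})[qualifier] = value


users = ToyFamilyTable([b"info"])
users.put(b"r1", b"info:email", b"a@example.com")  # new member, no schema change
try:
    users.put(b"r1", b"stats:visits", b"1")        # family was never declared
except KeyError as err:
    print("rejected:", err)
```

Grouping values under their family in the nested dictionary loosely mirrors the point that all members of a family are stored together.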
11. Server Architecture
• Similar to HDFS
– HBaseMaster ~ NameNode
– RegionServer ~ DataNode
• HBase stores state via the Hadoop FS API
• Can persist to :
– Local
– Amazon S3
– HDFS (Default)
12. HBaseMaster
What it does:
• Bootstrapping a new instance
• Assignment and handling RegionServer problems
– Each region from every table is assigned to a RegionServer
• When machines fail, move regions
• When regions split, move regions to balance
What it does NOT do:
– Handle write requests (Not a DB Master)
– Handle location finding requests (handled by RegionServer)
13. RegionServer
• Carry the regions
• Handle client read/write requests
• Manage region splits (inform the Master)
14. Regions
• Horizontal Partitioning
• Every region has a subset of the table’s rows
• Region identified as
– [table, first row (inclusive), last row (exclusive)]
• Table starts on a single region
• Splits into two roughly equal-sized regions as the
original region grows too big, and so on
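The partitioning described above can be sketched in a few lines of Python. This is an assumption-laden toy: ToyRegions is a hypothetical name, and it splits on a row count (split_threshold) purely to keep the example small, whereas the slide only says a region splits as it grows bigger.

```python
from bisect import bisect_right


class ToyRegions:
    """Toy model of one table's regions as sorted, half-open row-key ranges."""

    def __init__(self, split_threshold=4):
        self.split_threshold = split_threshold  # assumed: max rows per region
        self.start_keys = [b""]  # the table starts as a single region
        self.regions = [[]]      # regions[i] holds the sorted row keys of region i

    def locate(self, row):
        # The owning region is the one with the greatest start key <= row.
        return bisect_right(self.start_keys, row) - 1

    def insert(self, row):
        i = self.locate(row)
        region = self.regions[i]
        if row not in region:
            region.append(row)
            region.sort()
        if len(region) > self.split_threshold:
            self._split(i)

    def _split(self, i):
        # Cut the region at its middle row into two roughly equal halves.
        region = self.regions[i]
        mid = len(region) // 2
        self.regions[i] = region[:mid]
        self.regions.insert(i + 1, region[mid:])
        self.start_keys.insert(i + 1, region[mid])


regions = ToyRegions(split_threshold=4)
for key in [b"a", b"b", b"c", b"d", b"e"]:
    regions.insert(key)
print(regions.start_keys)
print(regions.regions)
```

Because regions are identified by their start keys, finding the region for a row is a binary search over a sorted list.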
15. Zookeeper
• Master election and server availability
• Cluster management
– Assignment transaction state management
• Client contacts ZooKeeper to bootstrap its
connection to the HBase cluster
• Region key ranges, region server addresses
• Guarantees consistency of data across clients
16. Workflow (Client connecting first time)
• Client → ZooKeeper (returns -ROOT-)
• Client → -ROOT- (returns .META.)
• Client → .META. (returns RegionServer)
• To avoid three lookups every time, the client caches
this info.
– Recache on fault
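The lookup chain and the client-side cache can be sketched as follows. Everything here is a hypothetical stand-in written in Python: ToyCluster, ToyClient, StaleLocation and the server names are invented for illustration, and the toy assumes one region per table so that a table name is enough to find a server.

```python
class StaleLocation(Exception):
    """Raised by the toy cluster when a client uses an outdated location."""


class ToyCluster:
    """Hypothetical stand-in for ZooKeeper, -ROOT-, .META. and region servers."""

    def __init__(self):
        self.root_location = "rs1"                # ZooKeeper knows where -ROOT- is
        self.meta_location = "rs2"                # -ROOT- knows where .META. is
        self.region_locations = {"users": "rs3"}  # .META. maps regions to servers
        self.lookups = 0

    def ask_zookeeper(self):
        self.lookups += 1
        return self.root_location

    def ask_root(self, server):
        self.lookups += 1
        return self.meta_location

    def ask_meta(self, server, table):
        self.lookups += 1
        return self.region_locations[table]

    def serve(self, server, table, row):
        if self.region_locations[table] != server:
            raise StaleLocation(table)
        return f"{row} served by {server}"


class ToyClient:
    def __init__(self, cluster):
        self.cluster = cluster
        self.cache = {}  # table -> RegionServer, filled on first use

    def locate(self, table):
        if table not in self.cache:
            root_server = self.cluster.ask_zookeeper()
            meta_server = self.cluster.ask_root(root_server)
            self.cache[table] = self.cluster.ask_meta(meta_server, table)
        return self.cache[table]

    def read(self, table, row):
        try:
            return self.cluster.serve(self.locate(table), table, row)
        except StaleLocation:
            del self.cache[table]  # recache on fault, then retry once
            return self.cluster.serve(self.locate(table), table, row)


cluster = ToyCluster()
client = ToyClient(cluster)
print(client.read("users", "row1"), "| lookups:", cluster.lookups)
print(client.read("users", "row2"), "| lookups:", cluster.lookups)
```

The second read reuses the cached location, so the lookup counter does not move until a region is reassigned and the cached entry faults.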
17. Write/Read Operation
• Write request: Client → RegionServer →
commit log (on HDFS), then memstore
• Flush to filesystem when memstore fills
• Read request: Client → RegionServer
– Look up the memstore first
– If not found, look up the flush files (newest to oldest)
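A minimal Python sketch of this write and read path, under simplifying assumptions: ToyRegionStore is a hypothetical name, the commit log is a plain list rather than a file on HDFS, the memstore "fills" after a fixed number of rows (memstore_limit), and only the latest value per row is kept.

```python
class ToyRegionStore:
    """Toy model of one region's commit log, memstore and flush files."""

    def __init__(self, memstore_limit=2):
        self.memstore_limit = memstore_limit  # assumed: rows held before a flush
        self.commit_log = []   # stands in for the commit log kept on HDFS
        self.memstore = {}     # in-memory buffer of recent writes
        self.flush_files = []  # immutable snapshots, oldest first

    def write(self, row, value):
        self.commit_log.append((row, value))  # log first, so a crash loses nothing
        self.memstore[row] = value
        if len(self.memstore) >= self.memstore_limit:
            self.flush()

    def flush(self):
        # Persist the memstore as a new flush file and start a fresh one.
        self.flush_files.append(dict(self.memstore))
        self.memstore = {}

    def read(self, row):
        if row in self.memstore:                    # memstore is consulted first
            return self.memstore[row]
        for flushed in reversed(self.flush_files):  # then newest to oldest
            if row in flushed:
                return flushed[row]
        return None


store = ToyRegionStore(memstore_limit=2)
store.write("a", 1)
store.write("b", 2)  # memstore is full, so it is flushed
store.write("a", 3)  # newer value for "a" lives in the memstore
print(store.read("a"), store.read("b"), store.read("missing"))
```

Searching flush files from newest to oldest is what lets a later write shadow an earlier one without rewriting the older file.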
18. Integration
• Java HBase Client API
• High performance Thrift gateway
• A REST-ful Web service gateway (Stargate)
– Supports XML, binary data encoding options
• Cascading, Hive and Pig integration
• HBase shell (jruby)
• TableInput/TableOutputFormat for MR
20. Alternatives to HBase
• Cassandra (From Facebook)
– Based on Amazon’s Dynamo
– No Master-slave but P2P
– Tunable: Consistency Vs Latency
• Yahoo’s PNUTS
– Not Open source
– Works well for multi-DC / geographically dispersed servers
21. References
• Hadoop – The Definitive Guide
• Cloudera website
• http://wiki.hbase.apache.org
• Lars George
– http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
• Comparing HBase, Cassandra and PNUTS
– http://blog.amandeepkhurana.com/2010/05/comparing-pnuts-hbase-and-cassandra.html
• ACID compliance of HBase
– http://hbase.apache.org/docs/r0.89.20100621/acid-semantics.html
Editor's notes
Some are also contributors
Introduce Regions from Tables.
-ROOT- stores the location of the .META. table regions.
.META. stores the location of all user regions.
Entries are keyed by region name, which is made up of [tableName, start row, timestamp, hash(1,2,3)].
Writes arriving at a RegionServer are first appended to a commit log and then added to an in-memory memstore. When a memstore fills, its content is flushed to the filesystem. The commit log is hosted on HDFS, so it remains available through a RegionServer crash.
On reads, the region’s memstore is consulted first. If sufficient versions are found reading the memstore alone, we return. Otherwise, flush files are consulted in order, from newest to oldest, until versions sufficient to satisfy the query are found or until we run out of flush files.
Compaction merges multiple flush files into one, removes versions beyond the configured maximum, and deletes expired cells.
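Compaction as described in this note can be sketched in Python. The function name compact, the (row, timestamp, value) tuple layout for a flush file, and the ttl/now parameters used to model cell expiry are all assumptions for illustration.

```python
from collections import defaultdict


def compact(flush_files, max_versions=3, now=None, ttl=None):
    """Merge flush files into one sorted file.

    Each flush file is a list of (row, timestamp, value) tuples. Cells older
    than `ttl` (relative to `now`) are dropped when both are given, and at
    most `max_versions` newest cells are kept per row.
    """
    by_row = defaultdict(list)
    for flushed in flush_files:
        for row, timestamp, value in flushed:
            by_row[row].append((timestamp, value))

    merged = []
    for row in sorted(by_row):
        versions = sorted(by_row[row], key=lambda tv: tv[0], reverse=True)
        if ttl is not None and now is not None:
            versions = [(ts, v) for ts, v in versions if now - ts <= ttl]
        for ts, v in versions[:max_versions]:
            merged.append((row, ts, v))
    return merged


files = [
    [("a", 1, "v1"), ("b", 1, "x")],
    [("a", 2, "v2")],
    [("a", 3, "v3")],
]
print(compact(files, max_versions=2))
```

After compaction a read has one file to consult instead of three, which is the main benefit on the read path.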
• Add content one row at a time using HTable.put(Put)
– Create an instance of a Put object
– Specify value, target column and optional timestamp
• Read using HTable.get(Get)
– Broad: get all cells in a row
– Narrow: return only a single cell value
• Scan a table using the Scan class
– Cursor-like access
– HTable.getScanner(Scan)
– Invoke next() on the returned object
• Get and Scan return Result objects, each holding a list of KeyValue objects
• Delete using HTable.delete(Delete)
– Remove individual cells, entire families, etc.
• Put, Get and Delete lock the row.
Cassandra’s weak consistency takes the form of eventual consistency: the database eventually reaches a consistent state. Because data is replicated, the latest version of a value may sit on one node in the cluster while older versions remain on other nodes, but eventually all nodes see the latest version.
The CAP theorem (Brewer) states that you can pick only two of Consistency, Availability and Partition tolerance: you can’t have all three at the same time and still get acceptable latency.