HBaseCon 2014-Just the Basics
1. 1
HBase: Just the Basics
Jesse Anderson, Curriculum Developer and Instructor
v2
2. 2
What Is HBase?
©2014 Cloudera, Inc. All rights reserved.
• NoSQL datastore built on top of HDFS (Hadoop)
• An Apache Top Level Project
• Handles the various manifestations of Big Data
• Based on Google's BigTable paper
3. 3
Why Use HBase?
• Storing large amounts of data (TB/PB)
• High throughput for a large number of requests
• Storing unstructured or variable column data
• Big Data with random reads and writes
4. 4
When to Consider Not Using HBase?
• Only use HBase with Big Data problems
• Data is read straight through files
• Data is written all at once or appended as new files
• No random reads or writes are needed
• Access patterns of the data are ill-defined
6. 6
Meet the Daemons
• HBase Master
• RegionServer
• ZooKeeper
• HDFS
  • NameNode/Standby NameNode
  • DataNode
7. 7
Daemon Locations
[Diagram: cluster layout]
Master Nodes: NameNode, Standby NameNode, ZooKeeper (x3), Master (x3)
Slave Nodes: DataNode (x6), RegionServer (x6)
8. 8
Tables and Column Families
Tables are broken into groupings called Column Families.
Column Family "contactinfo": group data frequently accessed together and compress it
Column Family "profilephoto": group photos with different settings
9. 9
Rows and Columns
Row key    Column Family "contactinfo"      Column Family "profilephoto"
adupont    fname: Andre   lname: Dupont
jsmith     fname: John    lname: Smith      image: <smith.jpg>
mrossi     fname: Mario   lname: Rossi      image: <mario.jpg>

Row keys identify a row
No storage penalty for unused columns
Each Column Family can have many columns
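One way to picture the layout above is as nested sorted maps. The following is a minimal sketch in plain Java, not the HBase API; the `SparseTable` class and its methods are hypothetical names introduced only for illustration. It shows why an unused column carries no storage penalty: a cell that was never written simply has no entry.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of the table layout; not the HBase API.
class SparseTable {
    // row key -> column family -> column qualifier -> value bytes
    private final Map<String, Map<String, Map<String, byte[]>>> rows = new TreeMap<>();

    void put(String rowKey, String family, String qualifier, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(family, k -> new TreeMap<>())
            .put(qualifier, value.getBytes(StandardCharsets.UTF_8));
    }

    // Returns null when the cell was never written.
    String get(String rowKey, String family, String qualifier) {
        Map<String, Map<String, byte[]>> families = rows.get(rowKey);
        if (families == null) return null;
        Map<String, byte[]> columns = families.get(family);
        if (columns == null) return null;
        byte[] value = columns.get(qualifier);
        return value == null ? null : new String(value, StandardCharsets.UTF_8);
    }

    // Counts the cells actually stored for a row.
    int cellCount(String rowKey) {
        Map<String, Map<String, byte[]>> families = rows.get(rowKey);
        if (families == null) return 0;
        int n = 0;
        for (Map<String, byte[]> columns : families.values()) n += columns.size();
        return n;
    }

    public static void main(String[] args) {
        SparseTable t = new SparseTable();
        t.put("adupont", "contactinfo", "fname", "Andre");
        t.put("adupont", "contactinfo", "lname", "Dupont");
        t.put("jsmith", "contactinfo", "fname", "John");
        t.put("jsmith", "contactinfo", "lname", "Smith");
        t.put("jsmith", "profilephoto", "image", "<smith.jpg>");
        System.out.println(t.cellCount("adupont")); // two cells: no image cell exists
        System.out.println(t.cellCount("jsmith"));  // three cells
    }
}
```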
10. 10
Regions
Row key    Column Family "contactinfo"
adupont    fname: Andre   lname: Dupont
jsmith     fname: John    lname: Smith

A table is broken into regions

Row key    Column Family "contactinfo"
mrossi     fname: Mario   lname: Rossi
zstevens   fname: Zack    lname: Stevens

Regions are served by RegionServers
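Because rows are sorted by row key and each region covers a contiguous key range, finding the region for a row amounts to finding the greatest region start key that is not larger than the row key. A minimal sketch in plain Java, assuming a hypothetical `RegionLookup` class, made-up server names, and an assumed split point of "m"; it uses string ordering as a stand-in for HBase's byte ordering, which agrees for ASCII keys like these.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of region routing; not the HBase API.
class RegionLookup {
    // region start key -> RegionServer name; "" marks the first region
    private final TreeMap<String, String> regions = new TreeMap<>();

    void addRegion(String startKey, String server) {
        regions.put(startKey, server);
    }

    // The owning region is the one with the greatest start key <= rowKey.
    String serverFor(String rowKey) {
        Map.Entry<String, String> e = regions.floorEntry(rowKey);
        if (e == null) throw new IllegalStateException("no region covers " + rowKey);
        return e.getValue();
    }

    public static void main(String[] args) {
        RegionLookup lookup = new RegionLookup();
        lookup.addRegion("", "regionserver-1");  // keys before "m" (assumed split point)
        lookup.addRegion("m", "regionserver-2"); // keys from "m" onward
        for (String key : new String[] {"adupont", "jsmith", "mrossi", "zstevens"}) {
            System.out.println(key + " -> " + lookup.serverFor(key));
        }
    }
}
```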
11. 11
Write Path
Client:
1. Which RegionServer is serving the Region?
2. Write to RegionServer
12. 12
Read Path
Client:
1. Which RegionServer is serving the Region?
2. Read from RegionServer
14. 14
No SQL Means No SQL
• Data is not accessed over SQL
• You must:
  • Create your own connections
  • Keep track of the type of data in a column
  • Give each row a key
  • Access a row by its key
15. 15
Types of Access
• Gets
  • Gets a row's data based on the row key
• Puts
  • Upserts a row with data based on the row key
• Scans
  • Finds all matching rows based on the row key
  • Scan logic can be extended by using filters
16. 16
Gets
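import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

// Fetch the row by its key, then read one column's bytes and decode them
Get g = new Get(ROW_KEY_BYTES);
Result r = table.get(g);
byte[] byteArray = r.getValue(COLFAM_BYTES, COLDESC_BYTES);
String columnValue = Bytes.toString(byteArray);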
17. 17
Puts
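import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
// Upsert one column value into the row identified by the key
// (HBase 1.0 and later: use p.addColumn(...) in place of p.add(...))
Put p = new Put(ROW_KEY_BYTES);
p.add(COLFAM_BYTES, COLDESC_BYTES, Bytes.toBytes("value"));
table.put(p);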
19. 19
No SQL Means No SQL
• Designing schemas for HBase requires in-depth knowledge
• Schema design is "data-centric", not "relationship-centric"
• You design around how data is accessed
• Row keys are engineered
21. 21
Row Keys
• A row key is more than the glue between two tables
• Engineering time is spent just on constructing a row key
• Contents of a row key vary by access pattern
• Often made up of several pieces of data:
  <group_id><email>
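A composite key such as <group_id><email> can be built by concatenating the encoded pieces. The sketch below is one possible encoding, under assumptions the slide does not state: the group id is taken to be a non-negative int stored as 4 big-endian bytes, and the email is UTF-8 text. `CompositeRowKey` and its methods are hypothetical names. A fixed-width leading field keeps all rows of one group adjacent in sort order, so they can be found by key prefix.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for a <group_id><email> row key; field widths are assumptions.
class CompositeRowKey {
    // group_id as 4 big-endian bytes, followed by the email as UTF-8.
    static byte[] compose(int groupId, String email) {
        byte[] emailBytes = email.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(Integer.BYTES + emailBytes.length)
                .putInt(groupId)
                .put(emailBytes)
                .array();
    }

    // The prefix shared by every row key in one group.
    static byte[] groupPrefix(int groupId) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(groupId).array();
    }

    static boolean hasPrefix(byte[] key, byte[] prefix) {
        if (key.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (key[i] != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] key = compose(7, "jsmith@example.com");
        System.out.println(key.length); // 4 prefix bytes plus the email bytes
        System.out.println(hasPrefix(key, groupPrefix(7)));
        System.out.println(hasPrefix(key, groupPrefix(8)));
    }
}
```

Which field leads the key depends on the access pattern: putting the group first favors "all members of a group" lookups, while putting the email first would favor lookups by person.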
22. 22
Schema Design
• Schema design does not start with an ERD
• Access patterns must be known up front
• Denormalize to improve performance
• Fewer, bigger tables
23. 23
Jesse Anderson
@jessetanderson