Building apps with HBase - Data Days Texas March 2013
1. Building applications with HBase
    Amandeep Khurana | Solutions Architect
    Data Days Texas, Austin, March 2013
2. About me
    • Solutions Architect, Cloudera Inc
    • Amazon Web Services
    • Interested in large scale distributed systems
    • Co-author, HBase In Action
    • Twitter: amansk
3. I'm a wanna-be (author)
    www.hbaseinaction.com
    HBase In Action - Nick Dimiduk and Amandeep Khurana (Manning)
4. About the workshop
    • Install HBase
    • Basic interactions with the shell
    • Use of the HBase Java API
    • Data model
    • HBase and Hadoop integration
5. Get HBase and tutorial instructions
    • Get it here -> http://hbase.apache.org/
    • Downloads -> Choose a mirror -> 0.94.1
    • Download the file hbase-0.94.1.tar.gz
    • Get the tutorial handout -> http://people.apache.org/~ak/ddtx13
6. What's Big Data?
    ... except another marketing term
8. Data has changed
    [Figure: data growth from 1980 to 2012, driven by end-user applications,
     the Internet, mobile devices, and sophisticated machines; today roughly
     90% of data is unstructured and 10% structured]
10. Hadoop - Big Data poster child
    HDFS, MAPREDUCE
11. Hadoop - Big Data poster child
    [Figure: the Hadoop ecosystem stack]
    • File system mount: FUSE-DFS
    • UI framework: HUE
    • Machine learning: APACHE MAHOUT
    • Workflow / scheduling: APACHE OOZIE
    • Metadata: APACHE HIVE
    • Languages / compilers: APACHE PIG, APACHE HIVE, APACHE MAHOUT
    • Data integration: APACHE FLUME, APACHE SQOOP
    • Fast read/write access: APACHE HBASE
    • Coordination: APACHE ZOOKEEPER
    • Core: HDFS, MAPREDUCE
12. What is HBase?
    • Scalable, distributed data store
    • Sorted map of maps
    • Open source avatar of Google's Bigtable
    • Sparse
    • Multi-dimensional
    • Tightly integrated with Hadoop
    • Not an RDBMS
13. So, what's the big deal?
    • Give up system complexity and sophisticated features for ease of scale
      and manageability
    • A simple system != easy to build applications on top of
    • Goal: predictable read and write performance at scale
      • Effective API usage
      • Optimal schema design
      • Understand the internals and leverage them at scale
14. Use cases
    • Canonical use case: storing crawl data and indices for search

    [Figure: web search powered by Bigtable]
    Indexing the Internet:
    1. Crawlers constantly scour the Internet for new pages. Those pages are
       stored as individual records in Bigtable.
    2. A MapReduce job runs over the entire table, generating search indexes
       for the Web Search application.
    Searching the Internet:
    3. The user initiates a Web Search request.
    4. The Web Search application queries the Search Indexes and retrieves
       matching documents directly from Bigtable.
    5. Search results are presented to the user.
15. Use cases (continued)
    • Capturing incremental data
    • Content serving
    • Information exchange
16. Ok, enough of the high level stuff
    • Big data is a buzzword, but there is more to it
      • New products, applications, types of data
    • The Hadoop ecosystem is the poster child of big data architectures
      • Consists of several components
    • HBase is a part of the ecosystem
      • A scalable database
      • Lots of production deployments
17. HBase - Getting Started
    Installation and basic interaction
18. Installing HBase
    • Get it here -> http://hbase.apache.org/
    • Downloads -> Choose a mirror -> 0.94.1
    • Download the file hbase-0.94.1.tar.gz
    • Untar it
      mkdir ~/ddtx
      cp hbase-0.94.1.tar.gz ~/ddtx
      cd ~/ddtx
      tar xvf hbase-0.94.1.tar.gz
19. Basics
    • Modes of operation
      • Standalone
      • Pseudo-distributed
      • Fully distributed
    • Start up in standalone mode
      cd hbase-0.94.1
      bin/start-hbase.sh
    • Logs are in ./logs/
    • Check out http://localhost:60010 in your browser
20. Shell
    • Basic interaction tool
    • Written in JRuby
    • Uses the Java client library
    • Good place to start poking around
21. Shell (continued)
    • bin/hbase shell
    • Create a table
      create 'users', 'info'
    • List tables
      list
    • Describe a table
      describe 'users'
22. Shell (continued)
    • Put a row
      put 'users', 'u1', 'info:name', 'Mr. Rogers'
    • Get a row
      get 'users', 'u1'
    • Put more
      put 'users', 'u1', 'info:email', 'mail@rogers.com'
      put 'users', 'u1', 'info:password', 'foobar'
    • Get a row
      get 'users', 'u1'
    • Scan the table
      scan 'users'
23. Java API
    ... can't really build an application just with the shell
24. Important terms
    • Table
      • Consists of rows and columns
    • Row
      • Has a bunch of columns
      • Identified by a rowkey (primary key)
    • Column qualifier
      • Dynamic column name
    • Column family
      • Column groups - logical and physical (similar access pattern)
    • Cell
      • The actual element that contains the data for a row-column intersection
    • Version
      • Every cell can have multiple versions
25. Introducing TwitBase
    • Twitter clone built on HBase
      • Designed to scale
      • It's an example, not a production app
    • Start with users and twits for now
    • Users
      • Have profiles
      • Post twits
    • Twits
      • Have text posted by users
    • Get sample code at: https://github.com/hbaseinaction/twitbase
26. Connecting to HBase
    • Using default configs:
      HTableInterface usersTable = new HTable("users");
    • Passing a custom config:
      Configuration myConf = HBaseConfiguration.create();
      myConf.set("parameter_name", "parameter_value");
      HTableInterface usersTable = new HTable(myConf, "users");
    • Connection pooling:
      HTablePool pool = new HTablePool();
      HTableInterface usersTable = pool.getTable("users");
      // work with the table
      usersTable.close();
27. Inserting data
    • Put call on HTable
      Put p = new Put(Bytes.toBytes("TheRealMT"));
      p.add(Bytes.toBytes("info"),
            Bytes.toBytes("name"),
            Bytes.toBytes("Mark Twain"));
      p.add(Bytes.toBytes("info"),
            Bytes.toBytes("email"),
            Bytes.toBytes("samuel@clemens.org"));
      p.add(Bytes.toBytes("info"),
            Bytes.toBytes("password"),
            Bytes.toBytes("Langhorne"));
      usersTable.put(p);
28. Data coordinates
    • A row is addressed using the rowkey
    • A cell is addressed using [rowkey + family + qualifier] (plus a version
      to pin down a specific cell version)
29. Updating data
    • Update == Write
      Put p = new Put(Bytes.toBytes("TheRealMT"));
      p.add(Bytes.toBytes("info"),
            Bytes.toBytes("password"),
            Bytes.toBytes("abc123"));
      usersTable.put(p);
30. Write path
    [Figure: write path]
    • A write goes to the write-ahead log (WAL) as well as to the in-memory
      write buffer called the Memstore.
    • The Memstore gets flushed to disk to form a new HFile when it fills up
      with writes.
    • Clients don't interact directly with the underlying HFiles during writes.
31. Reading data
    • Get the entire row
      Get g = new Get(Bytes.toBytes("TheRealMT"));
      Result r = usersTable.get(g);
    • Get selected columns
      Get g = new Get(Bytes.toBytes("TheRealMT"));
      g.addColumn(Bytes.toBytes("info"),
                  Bytes.toBytes("password"));
      Result r = usersTable.get(g);
    • Extract the value out of the Result object
      byte[] b = r.getValue(Bytes.toBytes("info"),
                            Bytes.toBytes("email"));
      String email = Bytes.toString(b);
32. Reading data (continued)
    • Scan
      Scan s = new Scan(startRow, stopRow);
      ResultScanner rs = twits.getScanner(s);
      for(Result r : rs) {
        // iterate over scanner results
      }
    • Single-threaded iteration over a defined set of rows - could be the
      entire table too.
33. What happens when you read data?
    [Figure: read path - the client's reads are served from the BlockCache and
     the Memstore, backed by the HFiles]
34. Deleting data
    • Another simple API call
    • Delete the entire row
      Delete d = new Delete(Bytes.toBytes("TheRealMT"));
      usersTable.delete(d);
    • Delete selected columns
      Delete d = new Delete(Bytes.toBytes("TheRealMT"));
      d.deleteColumns(Bytes.toBytes("info"),
                      Bytes.toBytes("email"));
      usersTable.delete(d);
35. Delete internals
    • Deletes write tombstone markers
    • Actual deletion only happens during major compactions
    • Compactions (a sketch of forcing one follows)
      • Minor
      • Major
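    A minimal sketch of forcing a major compaction from the Java API, assuming
    a standalone instance with default configuration and the 'users' table from
    earlier (the shell equivalent is: major_compact 'users'); the class name is
    illustrative:

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.hbase.HBaseConfiguration;
      import org.apache.hadoop.hbase.client.HBaseAdmin;

      public class CompactUsers {
        public static void main(String[] args) throws Exception {
          Configuration conf = HBaseConfiguration.create();
          HBaseAdmin admin = new HBaseAdmin(conf);
          // Until a major compaction runs, deleted cells are only masked
          // by tombstones; this kicks one off explicitly.
          admin.majorCompact("users"); // asynchronous; returns immediately
          admin.close();
        }
      }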
36. Compactions
    [Figure: two or more HFiles get combined to form a single compacted HFile
     during a compaction, while the Memstore keeps absorbing new writes]
37. Hands-on: Java client to HBase
    • Create a table from the HBase shell
      create 'users', 'info'
    • Write a program to do the following (a sketch of one possible solution
      follows the list)
      • Add 10 users to the 'users' table
        • userid as rowkey, user's name in the column 'info:name'
        • userid could just be user + serial number, e.g. user1, user2, user3
      • Get a user from the 'users' table and display it on screen
      • Scan the entire 'users' table and display it on screen
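    A minimal sketch of one way to do the exercise, assuming a standalone
    HBase with default configuration and the 0.94 client on the classpath; the
    class name and the generated user names are illustrative:

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.hbase.HBaseConfiguration;
      import org.apache.hadoop.hbase.client.*;
      import org.apache.hadoop.hbase.util.Bytes;

      public class UsersExercise {
        public static void main(String[] args) throws Exception {
          Configuration conf = HBaseConfiguration.create();
          HTableInterface users = new HTable(conf, "users");

          // Add 10 users: rowkey = user1..user10, name in info:name
          for (int i = 1; i <= 10; i++) {
            Put p = new Put(Bytes.toBytes("user" + i));
            p.add(Bytes.toBytes("info"), Bytes.toBytes("name"),
                  Bytes.toBytes("Name " + i));
            users.put(p);
          }

          // Get a single user and display it
          Get g = new Get(Bytes.toBytes("user1"));
          Result r = users.get(g);
          System.out.println("user1 -> " + Bytes.toString(
              r.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));

          // Scan the entire table and display it
          ResultScanner scanner = users.getScanner(new Scan());
          try {
            for (Result row : scanner) {
              System.out.println(Bytes.toString(row.getRow()) + " -> " +
                  Bytes.toString(row.getValue(Bytes.toBytes("info"),
                                              Bytes.toBytes("name"))));
            }
          } finally {
            scanner.close();
          }
          users.close();
        }
      }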
38. DAO
    • Like any other application, a data access object layer helps
      • Makes data access management easy
      • Cleaner code
      • Table info in a single place
      • Optimizations like connection pooling etc.
      • Easier management of database configs
    A minimal sketch of such a DAO follows.
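    A hedged sketch of a users DAO over the 'users'/'info' table from the
    exercise; the class and method names are illustrative, not the TwitBase
    code:

      import java.io.IOException;
      import org.apache.hadoop.hbase.client.*;
      import org.apache.hadoop.hbase.util.Bytes;

      public class UsersDAO {
        // Table info lives in a single place
        public static final byte[] TABLE = Bytes.toBytes("users");
        public static final byte[] INFO  = Bytes.toBytes("info");
        public static final byte[] NAME  = Bytes.toBytes("name");

        private final HTablePool pool;

        public UsersDAO(HTablePool pool) { this.pool = pool; }

        public void addUser(String userId, String name) throws IOException {
          HTableInterface t = pool.getTable(TABLE);
          try {
            Put p = new Put(Bytes.toBytes(userId));
            p.add(INFO, NAME, Bytes.toBytes(name));
            t.put(p);
          } finally {
            t.close(); // returns the table to the pool
          }
        }

        public String getUserName(String userId) throws IOException {
          HTableInterface t = pool.getTable(TABLE);
          try {
            Result r = t.get(new Get(Bytes.toBytes(userId)));
            byte[] v = r.getValue(INFO, NAME);
            return v == null ? null : Bytes.toString(v);
          } finally {
            t.close();
          }
        }
      }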
39. Admin
    • HBaseAdmin is the administrative API to the cluster
    • Provides an interface to manage HBase database table metadata and
      general administrative functions
    • Table operations:
      • createTable()
      • deleteTable()
      • addColumn()
      • modifyColumn()
      • deleteColumn()
      • listTables()
    A short example follows.
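    A hedged sketch of creating the 'users' table through HBaseAdmin instead
    of the shell, assuming default configuration and the 0.94 API; the class
    name is illustrative:

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.hbase.HBaseConfiguration;
      import org.apache.hadoop.hbase.HColumnDescriptor;
      import org.apache.hadoop.hbase.HTableDescriptor;
      import org.apache.hadoop.hbase.client.HBaseAdmin;

      public class CreateUsersTable {
        public static void main(String[] args) throws Exception {
          Configuration conf = HBaseConfiguration.create();
          HBaseAdmin admin = new HBaseAdmin(conf);
          if (!admin.tableExists("users")) {
            // Equivalent of the shell's: create 'users', 'info'
            HTableDescriptor desc = new HTableDescriptor("users");
            desc.addFamily(new HColumnDescriptor("info"));
            admin.createTable(desc);
          }
          System.out.println(admin.listTables().length +
              " table(s) in the cluster");
          admin.close();
        }
      }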
40. Data Model
    It's not a relational database system
41. HBase is ...
    • A column family oriented database
      • Tables consisting of rows and columns
    • A persisted map
      • Sparse
      • Multi-dimensional
      • Sorted
      • Indexed by rowkey, column, and timestamp
    • A key-value store
      • [rowkey, col family, col qualifier, timestamp] -> cell value
42. HBase is not ...
    • A relational database
    • No SQL query language
    • No joins
    • No secondary indexing
    • No transactions
43. Tabular representation
    The coordinates used to identify data in an HBase table are:
    (1) rowkey, (2) column family, (3) column qualifier, (4) version

                    Column Family - info
    Rowkey          name                     email                    password
    GrandpaD        Fyodor Dostoyevsky       fyodor@brothers.net      abc123
    HMS_Surprise    Patrick O'Brien          aubrey@sea.com           abc123
    SirDoyle        Sir Arthur Conan Doyle   art@TheQueensMen.co.uk   abc123
    TheRealMT       Mark Twain               samuel@clemens.org       Langhorne (ts1), abc123 (ts2)

    • The table is lexicographically sorted on the rowkeys.
    • Each cell can have multiple versions, typically identified by the
      timestamp of when they were inserted into the table, e.g.
      ts1=1329088321289 and ts2=1329088818321 (ts2 > ts1).
44. Key-Value store
    Keys                                           Values
    [TheRealMT, info, password, 1329088818321]     abc123
    [TheRealMT, info, password, 1329088321289]     Langhorne
    (each line is a single KeyValue instance)
45. Key-Value store
    1. Start with coordinates of full precision:
       [TheRealMT, info, password, 1329088818321] -> "abc123"

    2. Drop the version and you're left with a map of versions to values:
       [TheRealMT, info, password] -> {
         1329088818321 : "abc123",
         1329088321289 : "Langhorne"
       }

    3. Omit the qualifier and you have a map of qualifiers to the previous maps:
       [TheRealMT, info] -> {
         "email"    : { 1329088321289 : "samuel@clemens.org" },
         "name"     : { 1329088321289 : "Mark Twain" },
         "password" : {
           1329088818321 : "abc123",
           1329088321289 : "Langhorne"
         }
       }

    4. Finally, drop the column family and you have a row, a map of maps:
       [TheRealMT] -> {
         "info" : {
           "email"    : { 1329088321289 : "samuel@clemens.org" },
           "name"     : { 1329088321289 : "Mark Twain" },
           "password" : {
             1329088818321 : "abc123",
             1329088321289 : "Langhorne"
           }
         }
       }
47. HFiles and the physical data model
    • HFiles are
      • Immutable
      • Sorted on rowkey + qualifier + timestamp
      • In the context of a column family per region

    HFile for the info column family in the users table:
      "TheRealMT", "info", "email",    1329088321289, "samuel@clemens.org"
      "TheRealMT", "info", "name",     1329088321289, "Mark Twain"
      "TheRealMT", "info", "password", 1329088818321, "abc123"
      "TheRealMT", "info", "password", 1329088321289, "Langhorne"
48. HBase in distributed mode
    ... what really goes on under the hood
49. Distributed mode
    • A table is split into regions

    [Figure: full table T1, rowkeys 00001-00012, split into 3 regions -
     R1 (00001 John ... 00004 Bob), R2 (00005 Carly ... 00008 Lucas),
     R3 (00009 Steve ... 00012 Anne)]
50. Distributed mode (continued)
    • Regions are served by RegionServers
    • RegionServers are Java processes, typically collocated with the DataNodes

    [Figure: Hosts 1-3, each running a DataNode and a RegionServer; regions
     T1R1, T1R2 and T1R3 of table T1 spread across them]
51. Finding your region
    • Two special tables: -ROOT- and .META.

    [Figure: -ROOT-, consisting of only 1 region, hosted on RegionServer RS1;
     .META., consisting of 3 regions M1, M2 and M3, hosted on RS1, RS2 and
     RS3; user tables T1 and T2, consisting of 4 and 3 regions respectively,
     distributed amongst RS1, RS2 and RS3]
52. Regions distributed across the cluster
    [Figure: RegionServer 1 (RS1) hosts region R1 of the user table T1 and
     region M2 of .META.; RS2 hosts regions R2 and R3 of the user table and
     M1 of .META.; RS3 hosts just -ROOT-.
     .META. maps rowkey ranges to regions and their servers, e.g.
       T1:00001-T1:00004 -> R1 on RS1
       T1:00005-T1:00008 -> R2 on RS2
       T1:00009-T1:00012 -> R3 on RS2
     and -ROOT- maps .META. rowkey ranges to the .META. regions, e.g.
       M:T100001-M:T100009 -> M1 on RS2
       M:T100010-M:T100012 -> M2 on RS1]
53-57. What the client does
    [Figure, built up over five slides: to find the row TheRealMT in MyTable,
     the client first asks ZooKeeper for the location of -ROOT-, looks up the
     right .META. region in -ROOT- (one row per META region), looks up the
     right table region in that .META. region (one row per table region), and
     finally talks to the RegionServer holding the MyTable region]
58. HBase and Hadoop
    ... they play well together
59. HBase and Hadoop
    • HBase has tight integration with Hadoop
    • Two separate concerns
      • Access via MapReduce
      • Storage on HDFS
62. HBase and MapReduce
    • HBase is tightly integrated with the Hadoop stack
      • MapReduce
      • HDFS
    • Source for MapReduce jobs
      • Read from an HBase table with a MapReduce job (Export, Row Count, ...)
    • Sink for MapReduce jobs
      • Write to an HBase table from a MapReduce job
    • Lookups during MapReduce jobs
      • Join between HBase and/or another source
63. HBase as MR source
    • One mapper per region

    [Figure: map tasks, one per region; each task takes the key range of its
     region as its input split and scans over it. The regions are being served
     by the RegionServers.]
64. HBase as MR source (continued)
    • Mapper
      protected void map(ImmutableBytesWritable rowkey,
                         Result result,
                         Context context) {
        // mapper logic goes here
      }
    • Job setup
      Scan scan = new Scan();
      scan.addColumn(Bytes.toBytes("twits"), Bytes.toBytes("twit"));
      TableMapReduceUtil.initTableMapperJob("twits", scan,
          Map.class, ImmutableBytesWritable.class, Result.class, job);
65. HBase as MR sink
    • Write to regions via an MR job (a job-setup sketch follows)

    [Figure: reduce tasks writing to HBase regions. Reduce tasks don't
     necessarily write to a region on the same physical host; they write to
     whichever region contains the key range they are writing into. This
     could potentially mean that all reduce tasks talk to all regions on the
     cluster.]
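    A hedged job-setup sketch for the sink side, assuming a reducer that
    writes one Put per key into the 'users' table; the reducer class and the
    counting logic are illustrative, but TableMapReduceUtil.initTableReducerJob
    is the standard entry point:

      import java.io.IOException;
      import org.apache.hadoop.hbase.client.Put;
      import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
      import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
      import org.apache.hadoop.hbase.mapreduce.TableReducer;
      import org.apache.hadoop.hbase.util.Bytes;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;

      // Reducer (nested in the job's driver class) that turns
      // (userid, counts...) pairs into Puts on the 'users' table.
      public static class Reduce
          extends TableReducer<Text, LongWritable, ImmutableBytesWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values,
                              Context context)
            throws IOException, InterruptedException {
          long sum = 0;
          for (LongWritable v : values) sum += v.get();
          Put p = new Put(Bytes.toBytes(key.toString()));
          p.add(Bytes.toBytes("info"), Bytes.toBytes("count"),
                Bytes.toBytes(sum));
          // TableOutputFormat ignores the key and takes the rowkey from the Put
          context.write(null, p);
        }
      }

      // Job setup - the sink-side mirror of initTableMapperJob:
      TableMapReduceUtil.initTableReducerJob("users", Reduce.class, job);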
67. HBase as a shared resource
    • Lookups from MR jobs
      • Map-side joins
      • Simple Get/Scan calls

    [Figure: map tasks reading files from HDFS DataNodes as their source and
     doing random Gets or short Scans against HBase regions served by the
     RegionServers]
68. Integration with HDFS
    • HDFS is a filesystem
      • The underlying storage fabric for HBase
    • HBase is tightly integrated with HDFS and leverages its semantics
      • LSM trees fit well into a write-once filesystem
      • Availability (fault tolerance) and reliability (durability) at scale
69. Availability and fault tolerance
    [Figure: disaster strikes on the host that was serving region R1. Another
     RegionServer picks up the region and starts serving it from the persisted
     HFile in HDFS. The lost copy of the HFile gets re-replicated to another
     DataNode.]
70. Durability
    • The WAL keeps writes intact even if a RegionServer fails
    • HDFS is reliable and doesn't lose data if individual nodes fail
71. Summary
    • Big data = new products, applications
    • The Hadoop ecosystem is the poster child
    • HBase serves many use cases
      • Helps solve aspects of big data problems
      • Simple Java API
      • Tight Hadoop integration
      • Low latency access to large amounts of data
      • Large scale computation using MR
73. Schema design
    ... it's a database after all
74. But isn't HBase schema-less?
    You still have to decide on:
    • Number of tables
    • Rowkey design
    • Number of column families per table; what goes into which column family
    • Column qualifier names
    • What goes into the cells
    • Number of versions
75. Rowkeys
    • Rowkey design is the single most important aspect of HBase table design
    • The rowkey is the only way to address rows in HBase
76. TwitBase relationships
    • Users follow users
      • Relationships need to be persisted for usage later on
    • Model tables for the expected access patterns
    • Read pattern
      • Who does A follow?
      • Who follows A?
      • Does A follow B?
    • Write pattern
      • A follows B
      • A unfollows B
77. Start simple
    • Adjacency list
      rowkey: userid
      column family: follows
      column qualifier: followed user number
      cell value: followed userid

      follows
      TheFakeMT   1:TheRealMT   2:MTFanBoy   3:Olivia   4:HRogers
      TheRealMT   1:HRogers     2:Olivia
78. Optimizing the adjacency list
    • We need a count
    • Where does a new followed user go?

      follows
      TheFakeMT   1:TheRealMT   2:MTFanBoy   3:Olivia   4:HRogers   count:4
      TheRealMT   1:HRogers     2:Olivia     count:2
79. Adding a new user
    Client code (the row that needs to be updated is TheFakeMT's):
    Step 1: Get the current count  -> TheFakeMT : follows : {count -> 4}
    Step 2: Increment the count    -> TheFakeMT : follows : {count -> 5}
    Step 3: Add the new entry      -> TheFakeMT : follows : {5 -> MTFanBoy2, count -> 5}
    Step 4: Write the new data to HBase

      follows
      TheFakeMT   1:TheRealMT   2:MTFanBoy   3:Olivia   4:HRogers   5:MTFanBoy2   count:5
      TheRealMT   1:HRogers     2:Olivia     count:2
80. Transactions == not good
    • HBase doesn't have native support (think scale)
    • Don't want to complicate client side logic
    • Only solution -> simplify the schema (see the sketch after this table)

      follows
      TheFakeMT   TheRealMT:1   MTFanBoy:1   Olivia:1   HRogers:1
      TheRealMT   HRogers:1     Olivia:1
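    A hedged sketch of why the simplified schema sidesteps the transaction
    problem: with the followed userid in the column qualifier, both
    write-pattern operations touch a single row, and single-row operations in
    HBase are atomic without any client-side locking ('followsTable' and the
    ids are illustrative):

      // "A follows B" becomes a single Put - no read-modify-write,
      // no counter to keep consistent.
      Put p = new Put(Bytes.toBytes("TheFakeMT"));
      p.add(Bytes.toBytes("follows"),
            Bytes.toBytes("MTFanBoy2"),   // qualifier = followed userid
            Bytes.toBytes("1"));
      followsTable.put(p);

      // "A unfollows B" is likewise a single Delete of that one column.
      Delete d = new Delete(Bytes.toBytes("TheFakeMT"));
      d.deleteColumns(Bytes.toBytes("follows"), Bytes.toBytes("MTFanBoy2"));
      followsTable.delete(d);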
81. Revisit the questions
    • Read pattern
      • Who all does A follow?
      • Who all follows A?
      • Does A follow B?
    • Write pattern
      • A follows B
      • A unfollows B
83. Denormalization
    • Second table for the reverse relationship
    • Otherwise you'd have to scan across the entire table, which affects
      read performance

    [Figure: a 2x2 design space with write performance and read performance as
     the axes - normalization favors writes, denormalization favors reads,
     "dreamland" gets both, poor design gets neither]
84. More optimizations
    • Convert into a tall-narrow table
    • Leverage rowkey indexing better
    • Gets -> short Scans
    • Keeping the column family and column qualifier names short reduces the
      data transferred over the network back to the client; the KeyValue
      objects become smaller.

      column family: f
      rowkey: follower + followed
      column qualifier: followed user's name
      cell value: 1

    The + in the rowkey refers to concatenating the two values. You could
    delimit using any character you like, e.g. A-B or A,B.
85. Tall-narrow table example
    • Denormalization is the way to go

      f
      TheFakeMT+TheRealMT   Mark Twain:1
      TheFakeMT+MTFanBoy    Amandeep Khurana:1
      TheFakeMT+Olivia      Olivia Clemens:1
      TheFakeMT+HRogers     Henry Rogers:1
      TheRealMT+Olivia      Olivia Clemens:1
      TheRealMT+HRogers     Henry Rogers:1

    Putting the user name in the column qualifier saves you from looking up
    the users table for the name of the user given an id; you can simply list
    out names or ids while looking at relationships just from this table. The
    downside is that you need to update the name in all the cells if the user
    updates their name in their profile. This is classic denormalization.
86. Uniform rowkey length
    • MD5 the userids -> 16 bytes + 16 bytes rowkeys
    • Better distribution of load

      column family: f
      rowkey: md5(follower)md5(followed)
      column qualifier: followed userid
      cell value: followed user's name

    Using MD5 of the user ids gives you fixed lengths instead of variable
    length user ids. You don't need concatenation logic anymore. (A sketch of
    building such a rowkey follows.)
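    A hedged sketch of building such a rowkey with the standard java.security
    MD5 implementation; the class and method names are illustrative:

      import java.security.MessageDigest;
      import org.apache.hadoop.hbase.util.Bytes;

      public class FollowsKey {
        // MD5 always yields 16 bytes, so the concatenated rowkey is a
        // fixed 32 bytes and needs no delimiter.
        public static byte[] rowkey(String follower, String followed)
            throws Exception {
          MessageDigest md5 = MessageDigest.getInstance("MD5");
          byte[] a = md5.digest(Bytes.toBytes(follower)); // 16 bytes
          md5.reset();
          byte[] b = md5.digest(Bytes.toBytes(followed)); // 16 bytes
          return Bytes.add(a, b);                         // 32-byte rowkey
        }
      }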
87. Uniform rowkey length (continued)
      f
      MD5(TheFakeMT)MD5(TheRealMT)   TheRealMT:Mark Twain
      MD5(TheFakeMT)MD5(MTFanBoy)    MTFanBoy:Amandeep Khurana
      MD5(TheFakeMT)MD5(Olivia)      Olivia:Olivia Clemens
      MD5(TheFakeMT)MD5(HRogers)     HRogers:Henry Rogers
      MD5(TheRealMT)MD5(Olivia)      Olivia:Olivia Clemens
      MD5(TheRealMT)MD5(HRogers)     HRogers:Henry Rogers
88. Tall vs. wide tables - storage footprint
    [Figure: logical representation of an HBase table next to its actual
     physical storage, illustrating a Get() of row r5]
    • Logically, the table has rows r1-r5 with columns in families CF1 and CF2.
    • Physically, each column family is stored in its own HFile as flattened
      KeyValue entries:
      HFile for CF1: r1:CF1:c1:t1:v1, r2:CF1:c1:t2:v2, r2:CF1:c3:t3:v6,
                     r3:CF1:c2:t1:v3, r4:CF1:c2:t1:v4, r5:CF1:c1:t2:v1,
                     r5:CF1:c3:t3:v5
      HFile for CF2: r1:CF2:c1:t1:v9, r1:CF2:c6:t4:v2, r3:CF2:c5:t4:v6,
                     r5:CF2:c7:t3:v8
    • The Result object returned for a Get() on row r5 contains the KeyValue
      objects: r5:CF1:c1:t2:v1, r5:CF1:c3:t3:v5, r5:CF2:c7:t3:v8
    • Structure of a KeyValue object: Key = [Row Key, Col Fam, Col Qual,
      Time Stamp] plus the Cell Value.
89. Rowkey design
    • Single most important aspect of designing tables
    • Depends on the expected access patterns
    • HFiles are sorted on the Key part of the KeyValue objects

    HFile for the info column family in the users table:
      "TheRealMT", "info", "email",    1329088321289, "samuel@clemens.org"
      "TheRealMT", "info", "name",     1329088321289, "Mark Twain"
      "TheRealMT", "info", "password", 1329088818321, "abc123"
      "TheRealMT", "info", "password", 1329088321289, "Langhorne"
90. Write optimized
    • Distribute writes across the cluster
    • The issue is most pronounced with time series data
    • Hashing
      hash("TheRealMT") -> random byte[]
    • Salting (the original snippet concatenated byte arrays with +, which
      doesn't compile; Bytes.add is the real three-argument helper)
      int salt = new Long(timestamp).hashCode() % <number of region servers>;
      byte[] rowkey = Bytes.add(Bytes.toBytes(salt),
                                Bytes.toBytes("|"),
                                Bytes.toBytes(timestamp));
91. Read optimized
    • Data to be accessed together should be stored together
    • e.g. twit streams - the last 10 twits by the users I follow
      (a prefix-scan sketch follows)

    Twits keyed by user:   Twits keyed for the stream:
    Olivia1                1Olivia
    Olivia2                1TheRealMT
    Olivia5                2Olivia
    Olivia7                2TheFakeMT
    Olivia9                2TheRealMT
    TheFakeMT2             3TheFakeMT
    TheFakeMT3             4TheFakeMT
    TheFakeMT4             5Olivia
    TheFakeMT5             5TheFakeMT
    TheFakeMT6             5TheRealMT
    TheRealMT1             6TheFakeMT
    TheRealMT2             7Olivia
    TheRealMT5             8TheRealMT
    TheRealMT8             9Olivia
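    A hedged sketch of what this buys you at read time, using the
    keyed-by-user layout: because one user's twits sort together, a short
    prefix Scan fetches them in one contiguous read (the stop-row trick and
    the 'twitsTable' name are illustrative):

      // Scan all rows whose key starts with "Olivia".
      // '~' (0x7E) sorts after letters and digits in ASCII,
      // so it bounds the prefix from above.
      Scan s = new Scan(Bytes.toBytes("Olivia"), Bytes.toBytes("Olivia~"));
      ResultScanner rs = twitsTable.getScanner(s);
      try {
        for (Result r : rs) {
          System.out.println(Bytes.toString(r.getRow()));
        }
      } finally {
        rs.close();
      }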
92. Relational to non-relational
    • Relational concepts
      • Entities
      • Attributes
      • Relationships
    • Entities
      • A table is a table - not much going on there
      • The users table contains... users. Those are entities
      • Good place to start. Denormalization bends that rule a bit
93. Relational to non-relational (continued)
    • Attributes
      • Identifying attributes: primary keys, compound keys
        • Map to rowkeys
      • Non-identifying attributes: other columns
        • Map to column qualifiers and cells
    • Relationships
94. Nested Entities
    • Column qualifiers can contain data instead of just a column name
      (see the sketch below)

    HBase table
      rowkey
        column family
          fixed qualifier -> timestamp -> value
          [nested entities: a repeating entity]
          variable qualifier -> timestamp -> value
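    A hedged sketch of a nested entity write, reusing the follows
    relationship: the followed userid becomes the variable column qualifier
    and the cell carries the nested entity's data ('usersTable' and the names
    are illustrative):

      // The parent entity is the user's row; each followed user is a
      // nested, repeating entity keyed by a variable qualifier.
      Put p = new Put(Bytes.toBytes("TheRealMT"));
      p.add(Bytes.toBytes("follows"),
            Bytes.toBytes("HRogers"),        // variable qualifier: nested entity key
            Bytes.toBytes("Henry Rogers"));  // nested entity data
      usersTable.put(p);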
95. Schema design summary
    • Schema can make or break the performance you get
    • The rowkey is the single most important thing
      • Use tricks like hashing and salting
    • Denormalize to your advantage - there are no joins
    • Isolate access patterns
      • Separate CFs or even separate tables
    • Shorter names -> lower storage footprint
    • Column qualifiers can be used to store data, not just column names
      • Big difference as compared to an RDBMS
97. Components
    • HDFS (DataNodes)
      • Storage
    • ZooKeeper
      • Membership management
    • RegionServers
      • Serve the regions
    • HBase Masters
      • Janitorial work
98. Hardware
    • DataNodes and RegionServers collocated
    • Plenty of RAM and disk
      • 8-12 cores
      • 48 GB RAM
      • 12 x 1 TB disks
99. Hardware (continued)
    • HBase Master and ZooKeeper collocated
      • Don't necessarily need to be though
    • Keep an odd number of ZKs
      • ~3 is typically good enough for up to a 50-70 node cluster
    • Give ZK an independent spindle for log writing
      • I/O latency sensitive
    • 4-6 cores
    • 24 GB RAM
100. Types of deployments
    • Standalone for dev purposes
    • Prototype
      • < 10 nodes, no real SLAs
      • 1 ZK + Master, rest are RS
    • Small production cluster
      • 10-20 nodes
      • 1 ZK + Master, maybe NN HA, rest are RS
    • Medium production cluster
      • 20-50 nodes
      • 3 ZK + Master, NN HA, rest are RS
    • Large production cluster
      • 50+ nodes
      • 3 or 5 ZK + Master, NN HA, rest are RS
    • Hardware choices mostly remain the same
101. Deploying software and configuration
    • Automated deployment of the software bits is highly recommended
      • Chef
      • Puppet
      • Cloudera Manager
    • Centralized configuration management is highly recommended
      • Consistency
      • Ease of management
102. Configurations
    • $HBASE_HOME/conf
      • hbase-env.sh: environment configurations
      • hbase-site.xml: HBase system configurations
    • Linux configurations
      • Number of open files
      • swappiness
    • HDFS configurations
      • $HADOOP_HOME/hdfs-site.xml
103. Monitoring
    • Metrics contexts from Hadoop
      • Ganglia
      • File context
      • JMX
      • Cacti
    • Several metrics available
      • Write path specific
      • Read path specific