Building apps with HBase - Data Days Texas March 2013
1. Building applications with HBase
    Amandeep Khurana | Solutions Architect
    Data Days Texas, Austin, March 2013
2. About me
    • Solutions Architect, Cloudera Inc
    • Amazon Web Services
    • Interested in large scale distributed systems
    • Co-author, HBase In Action
    • Twitter: amansk
3. I'm a wanna-be (author)
    www.hbaseinaction.com
    HBase In Action - Nick Dimiduk and Amandeep Khurana (Manning)
4. About the workshop
    • Install HBase
    • Basic interactions with the shell
    • Use of the HBase Java API
    • Data model
    • HBase and Hadoop integration
5. Get HBase and tutorial instructions
    • Get it here -> http://hbase.apache.org/
    • Downloads -> Choose a mirror -> 0.94.1
    • Download the file hbase-0.94.1.tar.gz
    • Get the tutorial handout -> http://people.apache.org/~ak/ddtx13
6. What's Big Data?
    ... except another marketing term
8. Data has changed
    [Figure: data growth from 1980 to 2012, driven by end-user applications,
     the Internet, mobile devices, and sophisticated machines; today roughly
     90% of data is unstructured and 10% structured]
10. Hadoop - Big Data poster child
    HDFS, MAPREDUCE
11. Hadoop - Big Data poster child
    [Figure: the Hadoop ecosystem stack]
    • File system mount: FUSE-DFS
    • UI framework: HUE
    • Machine learning: APACHE MAHOUT
    • Workflow / scheduling: APACHE OOZIE
    • Metadata: APACHE HIVE
    • Languages / compilers: APACHE PIG, APACHE HIVE, APACHE MAHOUT
    • Data integration: APACHE FLUME, APACHE SQOOP
    • Fast read/write access: APACHE HBASE
    • Coordination: APACHE ZOOKEEPER
    • Core: HDFS, MAPREDUCE
12. What is HBase?
    • Scalable, distributed data store
    • Sorted map of maps
    • Open source avatar of Google's Bigtable
    • Sparse
    • Multi-dimensional
    • Tightly integrated with Hadoop
    • Not an RDBMS
13. So, what's the big deal?
    • Give up system complexity and sophisticated features for ease of scale
      and manageability
    • A simple system != easy to build applications on top of
    • Goal: predictable read and write performance at scale
      • Effective API usage
      • Optimal schema design
      • Understand the internals and leverage them at scale
14. Use cases
    • Canonical use case: storing crawl data and indices for search

    [Figure: web search powered by Bigtable]
    Indexing the Internet:
    1. Crawlers constantly scour the Internet for new pages. Those pages are
       stored as individual records in Bigtable.
    2. A MapReduce job runs over the entire table, generating search indexes
       for the Web Search application.
    Searching the Internet:
    3. The user initiates a Web Search request.
    4. The Web Search application queries the Search Indexes and retrieves
       matching documents directly from Bigtable.
    5. Search results are presented to the user.
15. Use cases (continued)
    • Capturing incremental data
    • Content serving
    • Information exchange
16. Ok, enough of the high level stuff
    • Big data is a buzzword, but there is more to it
      • New products, applications, types of data
    • The Hadoop ecosystem is the poster child of big data architectures
      • Consists of several components
    • HBase is a part of the ecosystem
      • A scalable database
      • Lots of production deployments
17. HBase - Getting Started
    Installation and basic interaction
18. Installing HBase
    • Get it here -> http://hbase.apache.org/
    • Downloads -> Choose a mirror -> 0.94.1
    • Download the file hbase-0.94.1.tar.gz
    • Untar it
      mkdir ~/ddtx
      cp hbase-0.94.1.tar.gz ~/ddtx
      cd ~/ddtx
      tar xvf hbase-0.94.1.tar.gz
19. Basics
    • Modes of operation
      • Standalone
      • Pseudo-distributed
      • Fully distributed
    • Start up in standalone mode
      cd hbase-0.94.1
      bin/start-hbase.sh
    • Logs are in ./logs/
    • Check out http://localhost:60010 in your browser
20. Shell
    • Basic interaction tool
    • Written in JRuby
    • Uses the Java client library
    • Good place to start poking around
21. Shell (continued)
    • bin/hbase shell
    • Create a table
      create 'users', 'info'
    • List tables
      list
    • Describe a table
      describe 'users'
22. Shell (continued)
    • Put a row
      put 'users', 'u1', 'info:name', 'Mr. Rogers'
    • Get a row
      get 'users', 'u1'
    • Put more
      put 'users', 'u1', 'info:email', 'mail@rogers.com'
      put 'users', 'u1', 'info:password', 'foobar'
    • Get a row
      get 'users', 'u1'
    • Scan the table
      scan 'users'
23. Java API
    ... can't really build an application just with the shell
24. Important terms
    • Table
      • Consists of rows and columns
    • Row
      • Has a bunch of columns
      • Identified by a rowkey (primary key)
    • Column qualifier
      • Dynamic column name
    • Column family
      • Column groups - logical and physical (similar access pattern)
    • Cell
      • The actual element that contains the data for a row-column intersection
    • Version
      • Every cell can have multiple versions
25. Introducing TwitBase
    • Twitter clone built on HBase
      • Designed to scale
      • It's an example, not a production app
    • Start with users and twits for now
    • Users
      • Have profiles
      • Post twits
    • Twits
      • Have text posted by users
    • Get sample code at: https://github.com/hbaseinaction/twitbase
26. Connecting to HBase
    • Using default configs:
      HTableInterface usersTable = new HTable("users");
    • Passing a custom config:
      Configuration myConf = HBaseConfiguration.create();
      myConf.set("parameter_name", "parameter_value");
      HTableInterface usersTable = new HTable(myConf, "users");
    • Connection pooling:
      HTablePool pool = new HTablePool();
      HTableInterface usersTable = pool.getTable("users");
      // work with the table
      usersTable.close();
27. Inserting data
    • Put call on HTable
      Put p = new Put(Bytes.toBytes("TheRealMT"));
      p.add(Bytes.toBytes("info"),
            Bytes.toBytes("name"),
            Bytes.toBytes("Mark Twain"));
      p.add(Bytes.toBytes("info"),
            Bytes.toBytes("email"),
            Bytes.toBytes("samuel@clemens.org"));
      p.add(Bytes.toBytes("info"),
            Bytes.toBytes("password"),
            Bytes.toBytes("Langhorne"));
      usersTable.put(p);
28. Data coordinates
    • A row is addressed using the rowkey
    • A cell is addressed using [rowkey + family + qualifier] (plus a version
      to pin down a specific cell version)
29. Updating data
    • Update == Write
      Put p = new Put(Bytes.toBytes("TheRealMT"));
      p.add(Bytes.toBytes("info"),
            Bytes.toBytes("password"),
            Bytes.toBytes("abc123"));
      usersTable.put(p);
30. Write path
    [Figure: write path]
    • A write goes to the write-ahead log (WAL) as well as to the in-memory
      write buffer called the Memstore.
    • The Memstore gets flushed to disk to form a new HFile when it fills up
      with writes.
    • Clients don't interact directly with the underlying HFiles during writes.
31. Reading data
    • Get the entire row
      Get g = new Get(Bytes.toBytes("TheRealMT"));
      Result r = usersTable.get(g);
    • Get selected columns
      Get g = new Get(Bytes.toBytes("TheRealMT"));
      g.addColumn(Bytes.toBytes("info"),
                  Bytes.toBytes("password"));
      Result r = usersTable.get(g);
    • Extract the value out of the Result object
      byte[] b = r.getValue(Bytes.toBytes("info"),
                            Bytes.toBytes("email"));
      String email = Bytes.toString(b);
32. Reading data (continued)
    • Scan
      Scan s = new Scan(startRow, stopRow);
      ResultScanner rs = twits.getScanner(s);
      for(Result r : rs) {
        // iterate over scanner results
      }
    • Single-threaded iteration over a defined set of rows - could be the
      entire table too.
33. What happens when you read data?
    [Figure: read path - the client's reads are served from the BlockCache and
     the Memstore, backed by the HFiles]
34. Deleting data
    • Another simple API call
    • Delete the entire row
      Delete d = new Delete(Bytes.toBytes("TheRealMT"));
      usersTable.delete(d);
    • Delete selected columns
      Delete d = new Delete(Bytes.toBytes("TheRealMT"));
      d.deleteColumns(Bytes.toBytes("info"),
                      Bytes.toBytes("email"));
      usersTable.delete(d);
35. Delete internals
    • Deletes write tombstone markers
    • Actual deletion only happens during major compactions
    • Compactions (a sketch of forcing one follows)
      • Minor
      • Major
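    A minimal sketch of forcing a major compaction from the Java API, assuming
    a standalone instance with default configuration and the 'users' table from
    earlier (the shell equivalent is: major_compact 'users'); the class name is
    illustrative:

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.hbase.HBaseConfiguration;
      import org.apache.hadoop.hbase.client.HBaseAdmin;

      public class CompactUsers {
        public static void main(String[] args) throws Exception {
          Configuration conf = HBaseConfiguration.create();
          HBaseAdmin admin = new HBaseAdmin(conf);
          // Until a major compaction runs, deleted cells are only masked
          // by tombstones; this kicks one off explicitly.
          admin.majorCompact("users"); // asynchronous; returns immediately
          admin.close();
        }
      }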
36. Compactions
    [Figure: two or more HFiles get combined to form a single compacted HFile
     during a compaction, while the Memstore keeps absorbing new writes]
37. Hands-on: Java client to HBase
    • Create a table from the HBase shell
      create 'users', 'info'
    • Write a program to do the following (a sketch of one possible solution
      follows the list)
      • Add 10 users to the 'users' table
        • userid as rowkey, user's name in the column 'info:name'
        • userid could just be user + serial number, e.g. user1, user2, user3
      • Get a user from the 'users' table and display it on screen
      • Scan the entire 'users' table and display it on screen
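    A minimal sketch of one way to do the exercise, assuming a standalone
    HBase with default configuration and the 0.94 client on the classpath; the
    class name and the generated user names are illustrative:

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.hbase.HBaseConfiguration;
      import org.apache.hadoop.hbase.client.*;
      import org.apache.hadoop.hbase.util.Bytes;

      public class UsersExercise {
        public static void main(String[] args) throws Exception {
          Configuration conf = HBaseConfiguration.create();
          HTableInterface users = new HTable(conf, "users");

          // Add 10 users: rowkey = user1..user10, name in info:name
          for (int i = 1; i <= 10; i++) {
            Put p = new Put(Bytes.toBytes("user" + i));
            p.add(Bytes.toBytes("info"), Bytes.toBytes("name"),
                  Bytes.toBytes("Name " + i));
            users.put(p);
          }

          // Get a single user and display it
          Get g = new Get(Bytes.toBytes("user1"));
          Result r = users.get(g);
          System.out.println("user1 -> " + Bytes.toString(
              r.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));

          // Scan the entire table and display it
          ResultScanner scanner = users.getScanner(new Scan());
          try {
            for (Result row : scanner) {
              System.out.println(Bytes.toString(row.getRow()) + " -> " +
                  Bytes.toString(row.getValue(Bytes.toBytes("info"),
                                              Bytes.toBytes("name"))));
            }
          } finally {
            scanner.close();
          }
          users.close();
        }
      }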
38. DAO
    • Like any other application, a data access object layer helps
      • Makes data access management easy
      • Cleaner code
      • Table info in a single place
      • Optimizations like connection pooling etc.
      • Easier management of database configs
    A minimal sketch of such a DAO follows.
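    A hedged sketch of a users DAO over the 'users'/'info' table from the
    exercise; the class and method names are illustrative, not the TwitBase
    code:

      import java.io.IOException;
      import org.apache.hadoop.hbase.client.*;
      import org.apache.hadoop.hbase.util.Bytes;

      public class UsersDAO {
        // Table info lives in a single place
        public static final byte[] TABLE = Bytes.toBytes("users");
        public static final byte[] INFO  = Bytes.toBytes("info");
        public static final byte[] NAME  = Bytes.toBytes("name");

        private final HTablePool pool;

        public UsersDAO(HTablePool pool) { this.pool = pool; }

        public void addUser(String userId, String name) throws IOException {
          HTableInterface t = pool.getTable(TABLE);
          try {
            Put p = new Put(Bytes.toBytes(userId));
            p.add(INFO, NAME, Bytes.toBytes(name));
            t.put(p);
          } finally {
            t.close(); // returns the table to the pool
          }
        }

        public String getUserName(String userId) throws IOException {
          HTableInterface t = pool.getTable(TABLE);
          try {
            Result r = t.get(new Get(Bytes.toBytes(userId)));
            byte[] v = r.getValue(INFO, NAME);
            return v == null ? null : Bytes.toString(v);
          } finally {
            t.close();
          }
        }
      }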
39. Admin
    • HBaseAdmin is the administrative API to the cluster
    • Provides an interface to manage HBase database table metadata and
      general administrative functions
    • Table operations:
      • createTable()
      • deleteTable()
      • addColumn()
      • modifyColumn()
      • deleteColumn()
      • listTables()
    A short example follows.
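    A hedged sketch of creating the 'users' table through HBaseAdmin instead
    of the shell, assuming default configuration and the 0.94 API; the class
    name is illustrative:

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.hbase.HBaseConfiguration;
      import org.apache.hadoop.hbase.HColumnDescriptor;
      import org.apache.hadoop.hbase.HTableDescriptor;
      import org.apache.hadoop.hbase.client.HBaseAdmin;

      public class CreateUsersTable {
        public static void main(String[] args) throws Exception {
          Configuration conf = HBaseConfiguration.create();
          HBaseAdmin admin = new HBaseAdmin(conf);
          if (!admin.tableExists("users")) {
            // Equivalent of the shell's: create 'users', 'info'
            HTableDescriptor desc = new HTableDescriptor("users");
            desc.addFamily(new HColumnDescriptor("info"));
            admin.createTable(desc);
          }
          System.out.println(admin.listTables().length +
              " table(s) in the cluster");
          admin.close();
        }
      }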
40. Data Model
    It's not a relational database system
41. HBase is ...
    • A column family oriented database
      • Tables consisting of rows and columns
    • A persisted map
      • Sparse
      • Multi-dimensional
      • Sorted
      • Indexed by rowkey, column, and timestamp
    • A key-value store
      • [rowkey, col family, col qualifier, timestamp] -> cell value
42. HBase is not ...
    • A relational database
    • No SQL query language
    • No joins
    • No secondary indexing
    • No transactions
43. Tabular representation
    The coordinates used to identify data in an HBase table are:
    (1) rowkey, (2) column family, (3) column qualifier, (4) version

                    Column Family - info
    Rowkey          name                     email                    password
    GrandpaD        Fyodor Dostoyevsky       fyodor@brothers.net      abc123
    HMS_Surprise    Patrick O'Brien          aubrey@sea.com           abc123
    SirDoyle        Sir Arthur Conan Doyle   art@TheQueensMen.co.uk   abc123
    TheRealMT       Mark Twain               samuel@clemens.org       Langhorne (ts1), abc123 (ts2)

    • The table is lexicographically sorted on the rowkeys.
    • Each cell can have multiple versions, typically identified by the
      timestamp of when they were inserted into the table, e.g.
      ts1=1329088321289 and ts2=1329088818321 (ts2 > ts1).
44. Key-Value store
    Keys                                           Values
    [TheRealMT, info, password, 1329088818321]     abc123
    [TheRealMT, info, password, 1329088321289]     Langhorne
    (each line is a single KeyValue instance)
45. Key-Value store
    1. Start with coordinates of full precision:
       [TheRealMT, info, password, 1329088818321] -> "abc123"

    2. Drop the version and you're left with a map of versions to values:
       [TheRealMT, info, password] -> {
         1329088818321 : "abc123",
         1329088321289 : "Langhorne"
       }

    3. Omit the qualifier and you have a map of qualifiers to the previous maps:
       [TheRealMT, info] -> {
         "email"    : { 1329088321289 : "samuel@clemens.org" },
         "name"     : { 1329088321289 : "Mark Twain" },
         "password" : {
           1329088818321 : "abc123",
           1329088321289 : "Langhorne"
         }
       }

    4. Finally, drop the column family and you have a row, a map of maps:
       [TheRealMT] -> {
         "info" : {
           "email"    : { 1329088321289 : "samuel@clemens.org" },
           "name"     : { 1329088321289 : "Mark Twain" },
           "password" : {
             1329088818321 : "abc123",
             1329088321289 : "Langhorne"
           }
         }
       }
47. HFiles and the physical data model
    • HFiles are
      • Immutable
      • Sorted on rowkey + qualifier + timestamp
      • In the context of a column family per region

    HFile for the info column family in the users table:
      "TheRealMT", "info", "email",    1329088321289, "samuel@clemens.org"
      "TheRealMT", "info", "name",     1329088321289, "Mark Twain"
      "TheRealMT", "info", "password", 1329088818321, "abc123"
      "TheRealMT", "info", "password", 1329088321289, "Langhorne"
48. HBase in distributed mode
    ... what really goes on under the hood
49. Distributed mode
    • A table is split into regions

    [Figure: full table T1, rowkeys 00001-00012, split into 3 regions -
     R1 (00001 John ... 00004 Bob), R2 (00005 Carly ... 00008 Lucas),
     R3 (00009 Steve ... 00012 Anne)]
50. Distributed mode (continued)
    • Regions are served by RegionServers
    • RegionServers are Java processes, typically collocated with the DataNodes

    [Figure: Hosts 1-3, each running a DataNode and a RegionServer; regions
     T1R1, T1R2 and T1R3 of table T1 spread across them]
51. Finding your region
    • Two special tables: -ROOT- and .META.

    [Figure: -ROOT-, consisting of only 1 region, hosted on RegionServer RS1;
     .META., consisting of 3 regions M1, M2 and M3, hosted on RS1, RS2 and
     RS3; user tables T1 and T2, consisting of 4 and 3 regions respectively,
     distributed amongst RS1, RS2 and RS3]
52. Regions distributed across the cluster
    [Figure: RegionServer 1 (RS1) hosts region R1 of the user table T1 and
     region M2 of .META.; RS2 hosts regions R2 and R3 of the user table and
     M1 of .META.; RS3 hosts just -ROOT-.
     .META. maps rowkey ranges to regions and their servers, e.g.
       T1:00001-T1:00004 -> R1 on RS1
       T1:00005-T1:00008 -> R2 on RS2
       T1:00009-T1:00012 -> R3 on RS2
     and -ROOT- maps .META. rowkey ranges to the .META. regions, e.g.
       M:T100001-M:T100009 -> M1 on RS2
       M:T100010-M:T100012 -> M2 on RS1]
53-57. What the client does
    [Figure, built up over five slides: to find the row TheRealMT in MyTable,
     the client first asks ZooKeeper for the location of -ROOT-, looks up the
     right .META. region in -ROOT- (one row per META region), looks up the
     right table region in that .META. region (one row per table region), and
     finally talks to the RegionServer holding the MyTable region]
58. HBase and Hadoop
    ... they play well together
59. HBase and Hadoop
    • HBase has tight integration with Hadoop
    • Two separate concerns
      • Access via MapReduce
      • Storage on HDFS
62. HBase and MapReduce
    • HBase is tightly integrated with the Hadoop stack
      • MapReduce
      • HDFS
    • Source for MapReduce jobs
      • Read from an HBase table with a MapReduce job (Export, Row Count, ...)
    • Sink for MapReduce jobs
      • Write to an HBase table from a MapReduce job
    • Lookups during MapReduce jobs
      • Join between HBase and/or another source
63. HBase as MR source
    • One mapper per region

    [Figure: map tasks, one per region; each task takes the key range of its
     region as its input split and scans over it. The regions are being served
     by the RegionServers.]
64. HBase as MR source (continued)
    • Mapper
      protected void map(ImmutableBytesWritable rowkey,
                         Result result,
                         Context context) {
        // mapper logic goes here
      }
    • Job setup
      Scan scan = new Scan();
      scan.addColumn(Bytes.toBytes("twits"), Bytes.toBytes("twit"));
      TableMapReduceUtil.initTableMapperJob("twits", scan,
          Map.class, ImmutableBytesWritable.class, Result.class, job);
65. HBase as MR sink
    • Write to regions via an MR job (a job-setup sketch follows)

    [Figure: reduce tasks writing to HBase regions. Reduce tasks don't
     necessarily write to a region on the same physical host; they write to
     whichever region contains the key range they are writing into. This
     could potentially mean that all reduce tasks talk to all regions on the
     cluster.]
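    A hedged job-setup sketch for the sink side, assuming a reducer that
    writes one Put per key into the 'users' table; the reducer class and the
    counting logic are illustrative, but TableMapReduceUtil.initTableReducerJob
    is the standard entry point:

      import java.io.IOException;
      import org.apache.hadoop.hbase.client.Put;
      import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
      import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
      import org.apache.hadoop.hbase.mapreduce.TableReducer;
      import org.apache.hadoop.hbase.util.Bytes;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;

      // Reducer (nested in the job's driver class) that turns
      // (userid, counts...) pairs into Puts on the 'users' table.
      public static class Reduce
          extends TableReducer<Text, LongWritable, ImmutableBytesWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values,
                              Context context)
            throws IOException, InterruptedException {
          long sum = 0;
          for (LongWritable v : values) sum += v.get();
          Put p = new Put(Bytes.toBytes(key.toString()));
          p.add(Bytes.toBytes("info"), Bytes.toBytes("count"),
                Bytes.toBytes(sum));
          // TableOutputFormat ignores the key and takes the rowkey from the Put
          context.write(null, p);
        }
      }

      // Job setup - the sink-side mirror of initTableMapperJob:
      TableMapReduceUtil.initTableReducerJob("users", Reduce.class, job);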
67. HBase as a shared resource
    • Lookups from MR jobs
      • Map-side joins
      • Simple Get/Scan calls

    [Figure: map tasks reading files from HDFS DataNodes as their source and
     doing random Gets or short Scans against HBase regions served by the
     RegionServers]
68. Integration with HDFS
    • HDFS is a filesystem
      • The underlying storage fabric for HBase
    • HBase is tightly integrated with HDFS and leverages its semantics
      • LSM trees fit well into a write-once filesystem
      • Availability (fault tolerance) and reliability (durability) at scale
69. Availability and fault tolerance
    [Figure: disaster strikes on the host that was serving region R1. Another
     RegionServer picks up the region and starts serving it from the persisted
     HFile in HDFS. The lost copy of the HFile gets re-replicated to another
     DataNode.]
70. Durability
    • The WAL keeps writes intact even if a RegionServer fails
    • HDFS is reliable and doesn't lose data if individual nodes fail
71. Summary
    • Big data = new products, applications
    • The Hadoop ecosystem is the poster child
    • HBase serves many use cases
      • Helps solve aspects of big data problems
      • Simple Java API
      • Tight Hadoop integration
      • Low latency access to large amounts of data
      • Large scale computation using MR
73. Schema design
    ... it's a database after all
74. But isn't HBase schema-less?
    You still have to decide on:
    • Number of tables
    • Rowkey design
    • Number of column families per table; what goes into which column family
    • Column qualifier names
    • What goes into the cells
    • Number of versions
75. Rowkeys
    • Rowkey design is the single most important aspect of HBase table design
    • The rowkey is the only way to address rows in HBase
76. TwitBase relationships
    • Users follow users
      • Relationships need to be persisted for usage later on
    • Model tables for the expected access patterns
    • Read pattern
      • Who does A follow?
      • Who follows A?
      • Does A follow B?
    • Write pattern
      • A follows B
      • A unfollows B
77. Start simple
    • Adjacency list
      rowkey: userid
      column family: follows
      column qualifier: followed user number
      cell value: followed userid

      follows
      TheFakeMT   1:TheRealMT   2:MTFanBoy   3:Olivia   4:HRogers
      TheRealMT   1:HRogers     2:Olivia
78. Optimizing the adjacency list
    • We need a count
    • Where does a new followed user go?

      follows
      TheFakeMT   1:TheRealMT   2:MTFanBoy   3:Olivia   4:HRogers   count:4
      TheRealMT   1:HRogers     2:Olivia     count:2
79. Adding a new user
    Client code (the row that needs to be updated is TheFakeMT's):
    Step 1: Get the current count  -> TheFakeMT : follows : {count -> 4}
    Step 2: Increment the count    -> TheFakeMT : follows : {count -> 5}
    Step 3: Add the new entry      -> TheFakeMT : follows : {5 -> MTFanBoy2, count -> 5}
    Step 4: Write the new data to HBase

      follows
      TheFakeMT   1:TheRealMT   2:MTFanBoy   3:Olivia   4:HRogers   5:MTFanBoy2   count:5
      TheRealMT   1:HRogers     2:Olivia     count:2
80. Transactions == not good
    • HBase doesn't have native support (think scale)
    • Don't want to complicate client side logic
    • Only solution -> simplify the schema (see the sketch after this table)

      follows
      TheFakeMT   TheRealMT:1   MTFanBoy:1   Olivia:1   HRogers:1
      TheRealMT   HRogers:1     Olivia:1
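    A hedged sketch of why the simplified schema sidesteps the transaction
    problem: with the followed userid in the column qualifier, both
    write-pattern operations touch a single row, and single-row operations in
    HBase are atomic without any client-side locking ('followsTable' and the
    ids are illustrative):

      // "A follows B" becomes a single Put - no read-modify-write,
      // no counter to keep consistent.
      Put p = new Put(Bytes.toBytes("TheFakeMT"));
      p.add(Bytes.toBytes("follows"),
            Bytes.toBytes("MTFanBoy2"),   // qualifier = followed userid
            Bytes.toBytes("1"));
      followsTable.put(p);

      // "A unfollows B" is likewise a single Delete of that one column.
      Delete d = new Delete(Bytes.toBytes("TheFakeMT"));
      d.deleteColumns(Bytes.toBytes("follows"), Bytes.toBytes("MTFanBoy2"));
      followsTable.delete(d);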
81. Revisit the questions
    • Read pattern
      • Who all does A follow?
      • Who all follows A?
      • Does A follow B?
    • Write pattern
      • A follows B
      • A unfollows B
83. Denormalization
    • Second table for the reverse relationship
    • Otherwise you'd have to scan across the entire table, which affects
      read performance

    [Figure: a 2x2 design space with write performance and read performance as
     the axes - normalization favors writes, denormalization favors reads,
     "dreamland" gets both, poor design gets neither]
84. More optimizations
    • Convert into a tall-narrow table
    • Leverage rowkey indexing better
    • Gets -> short Scans
    • Keeping the column family and column qualifier names short reduces the
      data transferred over the network back to the client; the KeyValue
      objects become smaller.

      column family: f
      rowkey: follower + followed
      column qualifier: followed user's name
      cell value: 1

    The + in the rowkey refers to concatenating the two values. You could
    delimit using any character you like, e.g. A-B or A,B.
85. Tall-narrow table example
    • Denormalization is the way to go

      f
      TheFakeMT+TheRealMT   Mark Twain:1
      TheFakeMT+MTFanBoy    Amandeep Khurana:1
      TheFakeMT+Olivia      Olivia Clemens:1
      TheFakeMT+HRogers     Henry Rogers:1
      TheRealMT+Olivia      Olivia Clemens:1
      TheRealMT+HRogers     Henry Rogers:1

    Putting the user name in the column qualifier saves you from looking up
    the users table for the name of the user given an id; you can simply list
    out names or ids while looking at relationships just from this table. The
    downside is that you need to update the name in all the cells if the user
    updates their name in their profile. This is classic denormalization.
86. Uniform rowkey length
    • MD5 the userids -> 16 bytes + 16 bytes rowkeys
    • Better distribution of load

      column family: f
      rowkey: md5(follower)md5(followed)
      column qualifier: followed userid
      cell value: followed user's name

    Using MD5 of the user ids gives you fixed lengths instead of variable
    length user ids. You don't need concatenation logic anymore. (A sketch of
    building such a rowkey follows.)
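    A hedged sketch of building such a rowkey with the standard java.security
    MD5 implementation; the class and method names are illustrative:

      import java.security.MessageDigest;
      import org.apache.hadoop.hbase.util.Bytes;

      public class FollowsKey {
        // MD5 always yields 16 bytes, so the concatenated rowkey is a
        // fixed 32 bytes and needs no delimiter.
        public static byte[] rowkey(String follower, String followed)
            throws Exception {
          MessageDigest md5 = MessageDigest.getInstance("MD5");
          byte[] a = md5.digest(Bytes.toBytes(follower)); // 16 bytes
          md5.reset();
          byte[] b = md5.digest(Bytes.toBytes(followed)); // 16 bytes
          return Bytes.add(a, b);                         // 32-byte rowkey
        }
      }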
87. Uniform rowkey length (continued)
      f
      MD5(TheFakeMT)MD5(TheRealMT)   TheRealMT:Mark Twain
      MD5(TheFakeMT)MD5(MTFanBoy)    MTFanBoy:Amandeep Khurana
      MD5(TheFakeMT)MD5(Olivia)      Olivia:Olivia Clemens
      MD5(TheFakeMT)MD5(HRogers)     HRogers:Henry Rogers
      MD5(TheRealMT)MD5(Olivia)      Olivia:Olivia Clemens
      MD5(TheRealMT)MD5(HRogers)     HRogers:Henry Rogers
88. Tall vs. wide tables - storage footprint
    [Figure: logical representation of an HBase table next to its actual
     physical storage, illustrating a Get() of row r5]
    • Logically, the table has rows r1-r5 with columns in families CF1 and CF2.
    • Physically, each column family is stored in its own HFile as flattened
      KeyValue entries:
      HFile for CF1: r1:CF1:c1:t1:v1, r2:CF1:c1:t2:v2, r2:CF1:c3:t3:v6,
                     r3:CF1:c2:t1:v3, r4:CF1:c2:t1:v4, r5:CF1:c1:t2:v1,
                     r5:CF1:c3:t3:v5
      HFile for CF2: r1:CF2:c1:t1:v9, r1:CF2:c6:t4:v2, r3:CF2:c5:t4:v6,
                     r5:CF2:c7:t3:v8
    • The Result object returned for a Get() on row r5 contains the KeyValue
      objects: r5:CF1:c1:t2:v1, r5:CF1:c3:t3:v5, r5:CF2:c7:t3:v8
    • Structure of a KeyValue object: Key = [Row Key, Col Fam, Col Qual,
      Time Stamp] plus the Cell Value.
89. Rowkey design
    • Single most important aspect of designing tables
    • Depends on the expected access patterns
    • HFiles are sorted on the Key part of the KeyValue objects

    HFile for the info column family in the users table:
      "TheRealMT", "info", "email",    1329088321289, "samuel@clemens.org"
      "TheRealMT", "info", "name",     1329088321289, "Mark Twain"
      "TheRealMT", "info", "password", 1329088818321, "abc123"
      "TheRealMT", "info", "password", 1329088321289, "Langhorne"
90. Write optimized
    • Distribute writes across the cluster
    • The issue is most pronounced with time series data
    • Hashing
      hash("TheRealMT") -> random byte[]
    • Salting (the original snippet concatenated byte arrays with +, which
      doesn't compile; Bytes.add is the real three-argument helper)
      int salt = new Long(timestamp).hashCode() % <number of region servers>;
      byte[] rowkey = Bytes.add(Bytes.toBytes(salt),
                                Bytes.toBytes("|"),
                                Bytes.toBytes(timestamp));
91. Read optimized
    • Data to be accessed together should be stored together
    • e.g. twit streams - the last 10 twits by the users I follow
      (a prefix-scan sketch follows)

    Twits keyed by user:   Twits keyed for the stream:
    Olivia1                1Olivia
    Olivia2                1TheRealMT
    Olivia5                2Olivia
    Olivia7                2TheFakeMT
    Olivia9                2TheRealMT
    TheFakeMT2             3TheFakeMT
    TheFakeMT3             4TheFakeMT
    TheFakeMT4             5Olivia
    TheFakeMT5             5TheFakeMT
    TheFakeMT6             5TheRealMT
    TheRealMT1             6TheFakeMT
    TheRealMT2             7Olivia
    TheRealMT5             8TheRealMT
    TheRealMT8             9Olivia
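    A hedged sketch of what this buys you at read time, using the
    keyed-by-user layout: because one user's twits sort together, a short
    prefix Scan fetches them in one contiguous read (the stop-row trick and
    the 'twitsTable' name are illustrative):

      // Scan all rows whose key starts with "Olivia".
      // '~' (0x7E) sorts after letters and digits in ASCII,
      // so it bounds the prefix from above.
      Scan s = new Scan(Bytes.toBytes("Olivia"), Bytes.toBytes("Olivia~"));
      ResultScanner rs = twitsTable.getScanner(s);
      try {
        for (Result r : rs) {
          System.out.println(Bytes.toString(r.getRow()));
        }
      } finally {
        rs.close();
      }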
92. Relational to non-relational
    • Relational concepts
      • Entities
      • Attributes
      • Relationships
    • Entities
      • A table is a table - not much going on there
      • The users table contains... users. Those are entities
      • Good place to start. Denormalization bends that rule a bit
93. Relational to non-relational (continued)
    • Attributes
      • Identifying attributes: primary keys, compound keys
        • Map to rowkeys
      • Non-identifying attributes: other columns
        • Map to column qualifiers and cells
    • Relationships
94. Nested Entities
    • Column qualifiers can contain data instead of just a column name
      (see the sketch below)

    HBase table
      rowkey
        column family
          fixed qualifier -> timestamp -> value
          [nested entities: a repeating entity]
          variable qualifier -> timestamp -> value
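    A hedged sketch of a nested entity write, reusing the follows
    relationship: the followed userid becomes the variable column qualifier
    and the cell carries the nested entity's data ('usersTable' and the names
    are illustrative):

      // The parent entity is the user's row; each followed user is a
      // nested, repeating entity keyed by a variable qualifier.
      Put p = new Put(Bytes.toBytes("TheRealMT"));
      p.add(Bytes.toBytes("follows"),
            Bytes.toBytes("HRogers"),        // variable qualifier: nested entity key
            Bytes.toBytes("Henry Rogers"));  // nested entity data
      usersTable.put(p);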
95. Schema design summary
    • Schema can make or break the performance you get
    • The rowkey is the single most important thing
      • Use tricks like hashing and salting
    • Denormalize to your advantage - there are no joins
    • Isolate access patterns
      • Separate CFs or even separate tables
    • Shorter names -> lower storage footprint
    • Column qualifiers can be used to store data, not just column names
      • Big difference as compared to an RDBMS
97. Components
    • HDFS (DataNodes)
      • Storage
    • ZooKeeper
      • Membership management
    • RegionServers
      • Serve the regions
    • HBase Masters
      • Janitorial work
98. Hardware
    • DataNodes and RegionServers collocated
    • Plenty of RAM and disk
      • 8-12 cores
      • 48 GB RAM
      • 12 x 1 TB disks
99. Hardware (continued)
    • HBase Master and ZooKeeper collocated
      • Don't necessarily need to be though
    • Keep an odd number of ZKs
      • ~3 is typically good enough for up to a 50-70 node cluster
    • Give ZK an independent spindle for log writing
      • I/O latency sensitive
    • 4-6 cores
    • 24 GB RAM
100. Types of deployments
    • Standalone for dev purposes
    • Prototype
      • < 10 nodes, no real SLAs
      • 1 ZK + Master, rest are RS
    • Small production cluster
      • 10-20 nodes
      • 1 ZK + Master, maybe NN HA, rest are RS
    • Medium production cluster
      • 20-50 nodes
      • 3 ZK + Master, NN HA, rest are RS
    • Large production cluster
      • 50+ nodes
      • 3 or 5 ZK + Master, NN HA, rest are RS
    • Hardware choices mostly remain the same
101. Deploying software and configuration
    • Automated deployment of the software bits is highly recommended
      • Chef
      • Puppet
      • Cloudera Manager
    • Centralized configuration management is highly recommended
      • Consistency
      • Ease of management
102. Configurations
    • $HBASE_HOME/conf
      • hbase-env.sh: environment configurations
      • hbase-site.xml: HBase system configurations
    • Linux configurations
      • Number of open files
      • swappiness
    • HDFS configurations
      • $HADOOP_HOME/hdfs-site.xml
103. Monitoring
    • Metrics contexts from Hadoop
      • Ganglia
      • File context
      • JMX
      • Cacti
    • Several metrics available
      • Write path specific
      • Read path specific