SlideShare ist ein Scribd-Unternehmen logo
1 von 75
Downloaden Sie, um offline zu lesen
1©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
Apache	
  HBase:
Overview,	
  Hands-­‐On,	
  and	
  Use	
  
Cases
Apekshit Sharma
Esteban	
  Gutierrez
2©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
Apekshit Sharma
• Distributed	
  Software	
  Engineer,	
  
Cloudera
• Software	
  Engineer,	
  Google
• Apache	
  HBase contributor
• Performance	
  improvements	
  and	
  
configuration	
  framework
Esteban	
  Gutierrez
• Software	
  Engineer,	
  Cloudera
• Apache	
  HBase	
  committer
• +15	
  years	
  of	
  experience	
  on	
  Internet	
  
scale	
  platforms.
Who	
  we	
  are
3©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
Contents
• Motivation
• Introduction	
  to	
  Apache	
  HBase
• Data	
  model
• Hands-­‐On:	
  Installation,	
  HBase	
  shell
• A	
  more	
  in-­‐depth	
  introduction	
  to	
  Apache	
  HBase
• Apache	
  Hadoop	
  HDFS
• HBase	
  Architecture
4©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
Motivation
http://www.internetlivestats.com/total-­‐number-­‐of-­‐websites/
5©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
Motivation
• Indexing	
  the	
  internet	
  has	
  challenges:
• Scale
• Volume
• Rate
• Diversity	
  of	
  content
• URLs
• High-­‐resolution	
  images
• Video
6©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
Motivation
7©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
• What	
  if	
  you’re	
  not	
  trying	
  to	
  index	
  the	
  
internet?
Motivation
8©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
• Data	
  for	
  analytical	
  processing
• User-­‐facing	
  real-­‐time	
  platforms
Motivation
9©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
Apache	
  HBase
• Open	
  source
• Random	
  access,	
  Low	
  latency	
  and	
  
Consistent	
  data	
  store
• Horizontally scalable
• Built	
  on	
  top	
  of	
  Apache	
  Hadoop
10©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
Apache	
  HBase	
  is	
  open	
  source
• Apache	
  2.0	
  License
• A	
  community	
  project	
  with	
  committers	
  and	
  contributors	
  from	
  diverse	
  
organizations
• Cloudera,	
  Facebook,	
  Salesforce.com,	
  Huawei,	
  TrendMicro,	
  eBay,	
  HortonWorks,	
  
Intel,	
  Twitter,	
  …
• Code	
  license	
  means	
  anyone	
  can	
  modify	
  and	
  use	
  the	
  code.
11©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
•CAP	
  theorem
•Consistency
•Availability
•Partition	
  tolerance
Apache	
  HBase	
  is	
  consistent
HBase
12©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
• Adding	
  more	
  servers	
  linearly increases	
  
performance	
  and	
  capacity
• Storage	
  capacity	
  
• Input/output	
   operations
• Store	
  and	
  access	
  data	
  on	
  commodity	
  
servers • Largest	
  cluster:	
  >	
  3000	
  nodes,	
  >	
  100	
  PB
• Average	
  cluster:	
  10-­‐40	
  nodes,	
  100-­‐400TB
Apache	
  HBase	
  is	
  horizontally	
  scalable
0
100
200
300
400
500
600
Performance	
  (IOPs/Storage/Throughput)
#	
  of	
  servers
13©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
• Commodity	
  servers	
  (circa 2015)
• 12-­‐24	
  2-­‐4TB	
  hard disks
• 2	
  (quad/hexa/octa)-­‐core	
  CPUs,	
  2-­‐3
GHz
• 64	
  -­‐ 512 GBs	
  of	
  RAM
• 10	
  Gbps ethernet
• $5,000	
  -­‐ $10,000	
  /	
  machine
Apache	
  HBase	
  is	
  horizontally	
  scalable
14©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
Data	
  model
Row	
  key info:name info:age comp:base comp:stocks
121 ‘tom’ ‘28’ ‘125k’
145 ‘bob’ ‘32’ ‘110k’
‘50’	
  (ts=2012)
‘100’	
   (ts=2014)
Columns
<Column	
   family>:<column	
   qualifier>
Cells
Row	
  keys
15©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
Data	
  model
• Data	
  is	
  stored…	
  in	
  a	
  big	
  table
• Sorted	
  rows with	
  primary	
  row	
  keys.	
  Supports	
  billions	
  of	
  rows.
• Column	
  :	
  <column	
  family>:<column	
  qualifier>;	
  Supports	
  millions	
  of	
  columns
• Cell	
  :	
  intersection	
  of	
  row	
  and	
  column.
• Can	
  be	
  empty.	
  No	
  storage/processing	
  overheads
• Can	
  have	
  different	
  time-­‐stamped	
  values
16©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
Hands	
  On	
  
Apache	
  HBase installation	
  
Play	
  with	
  HBase	
  shell
http://pastebin.com/60EUN75f	
  
17©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
Understanding	
  basic	
  architecture	
  of
Hadoop	
  (HDFS)
18©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
Apache	
  Hadoop
Open	
  source
Commodity servers
Horizontally scalable
Highly fault-­‐tolerant
Massive processing	
  power	
  
19©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
Apache	
  Hadoop
MapReduce	
  +	
  
YARN
2	
  Core	
  Components
HDFS
(Hadoop	
  Distributed	
  File	
  
System)
20©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
History
2003
21©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
• Distributed file	
  system
• Commodity servers
• horizontally scalable
• Highly fault-­‐tolerant
• Proprietary
GFS
22©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
• Distributed file	
  system
• Commodity servers
• Horizontally scalable
• Highly fault-­‐tolerant
• Open	
  source
HDFS
23©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
HDFS	
  API
• File
• Open,	
  Close,	
  Read,	
  Write,	
  Move,	
  etc
• Directories
• Create,	
  Delete,	
  etc
• Permissions
• Owners,	
  Groups,	
  rwx permissions
24©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
Basic	
  Architecture	
  of	
  HDFS
25©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
File
B1 B2 B3
File	
  system	
  will	
  split
the	
  file	
  into	
  blocks
DiskB1
B2
B3
Local	
  file	
  system
26©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
DataNode	
  1 DataNode	
  2 DataNode	
  3 DataNode	
  4
HDFS
File	
  distributed	
   across	
  machines
B1 B2 B3
27©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
B1
DataNode	
  1
B2
DataNode	
  2
B3
DataNode	
  3 DataNode	
  4
HDFS	
  	
  DataNode
28©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
DataNode	
  1 DataNode	
  2 DataNode	
  3 DataNode	
  4
HDFS	
  	
  NameNode
NameNode
B1 B2 B3
29©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
DataNode	
  1 DataNode	
  2 DataNode	
  3 DataNode	
  4
HDFS	
  	
  Reading	
  a	
  file
NameNode
B1 B2 B3
Client
1.	
  File	
  ‘foo’ 2.	
  Verify	
   client	
  has	
  permissions
to	
  read	
  the	
  file
3.	
  List	
  of	
  foo’s
bocks	
   and	
  datanodes
30©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
DataNode	
  1 DataNode	
  2 DataNode	
  3 DataNode	
  4
HDFS	
  	
  Fault	
  tolerance
NameNode
B1 B2 B3
31©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
DataNode	
  1 DataNode	
  2 DataNode	
  3 DataNode	
  4
HDFS	
  	
  Redundancy
NameNode
B1 B2 B3B1 B1B2 B2 B3B3
32©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
DataNode	
  1 DataNode	
  2 DataNode	
  3 DataNode	
  4
HDFS	
  	
  Horizontal	
  Scalability
NameNode
B1 B2 B3B1 B1B2 B2 B3B3
DataNode	
  5
33©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
Let’s	
  look	
  at	
  some	
  existing	
  HDFS	
  systems...
34©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
• Yahoo!	
  HDFS	
  Clusters
40k+	
  servers,	
  100k+	
  CPUs,	
  450PB	
  data
• Spam filtering,	
  web	
  indexing,	
  personaliztion,	
  etc.
• Facebook	
  HDFS	
  Cluster
15TB	
  new	
  data	
  per	
  day
1200+	
  machines,	
  30PB	
  in	
  one	
  cluster
• Datawarehouse,	
  Messages,	
  etc.
35©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
• 1-­‐10PB HDFS	
  Clusters
Production	
  critical	
  workloads	
  for	
  websites,	
  retail,	
  finance,	
  telcom,	
  
government,	
  research	
  …
• Sub	
  PB	
  HDFS	
  Clusters
PoCs,	
  R&D,	
  production	
  in	
  early	
  stages	
  …
36©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
But….	
  there	
  are	
  restrictions!
It’s	
  not	
  a	
  magic	
  wand!
37©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
Files	
  are append	
  only
• Access	
  Model	
  :	
  Write-­‐once-­‐read-­‐many	
  
• Can	
  not	
  change	
  existing	
  contents
38©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
Not	
  designed	
  for	
  small	
  files
• HDFS	
  block	
  sizes	
  are	
  in	
  MB	
  (default	
  128MB)
• Designed	
  for	
  typical	
  GBs	
  /	
  TBs	
  of	
  file	
  sizes
• Normal	
  files	
  system	
  have	
  4kb	
  block	
  size!
• Name	
  node	
  becomes	
  bottleneck	
  because	
  metadata	
  grows	
  too	
  big
39©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
Summary
HDFS	
  is	
  a	
  great	
  distributed	
  file	
  system!
• Store	
  massive	
  data
• Scalable
• High	
  throughput
• Fault	
  tolerance
40©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
MapReduce
• Distributed	
  	
  processing	
  
framework
• Commodity	
  machines
• Fault	
  tolerance
41©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
MapReduce
• HBase	
  uses	
  and	
  extends	
  
MR	
  APIs	
  for	
  data	
  access
Further	
  reading:	
  http://hadoop.apache.org
42©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
43©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
HBase	
  Architecture
Unique	
   id Name price weight store1 store2 store3
“1000000” snickers $9.99 4	
  Oz Yes Yes Yes
“3000000” almonds $9.99 8	
  Oz Yes No Yes
“8000000” coke $9.99 16	
  Oz Yes Yes Yes
44©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
HBase	
  Architecture
Unique	
   id Name price weight store1 store2 store3
“1000000” snickers $9.99 4	
  Oz Yes Yes Yes
“3000000” almonds $9.99 8	
  Oz Yes No Yes
“8000000” coke $9.99 16	
  Oz Yes Yes Yes
“4000000” new $34.63 16	
  Oz No Yes Yes
45©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
HBase	
  Architecture
Unique	
   id Name price weight store1 store2 store3
“1000000” snickers $9.99 4	
  Oz Yes Yes Yes
“3000000” almonds $9.99 8	
  Oz Yes No Yes
“8000000” coke $9.99 16	
  Oz Yes Yes Yes
“4000000” foo $34.63 16	
  Oz No Yes Yes
“5000000” bar $22.54 16	
  Oz Yes Yes Yes
46©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
HBase	
  Architecture
Unique	
   id Name price weight store1 store2 store3
“1000000” snickers $9.99 4	
  Oz Yes Yes Yes
“3000000” almonds $9.99 8	
  Oz Yes No Yes
“8000000” coke $9.99 16	
  Oz Yes Yes Yes
“4000000” foo $34.63 16	
  Oz No Yes Yes
“5000000” bar $22.54 16	
  Oz Yes Yes Yes
“9000000” new1 $2.5 16	
  Oz Yes Yes Yes
“7000000” new2 $6.4 16	
  Oz Yes Yes Yes
“2000000” new3 $6.4 16	
  Oz Yes Yes Yes
47©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
Row Key Name brand price weight store1 store2 store3
“1000000” snickers xxx $9.99 4	
  Oz Yes Yes Yes
“2000000” new3 xxx $6.4 16	
  Oz Yes Yes Yes
“3000000” almonds xxx $9.99 8	
  Oz Yes No Yes
“4000000” foo xxx $34.63 16	
  Oz No Yes Yes
Row	
  Key Name brand price weight store1 store2 store3
“5000000” bar xxx $22.54 16	
  Oz Yes Yes Yes
“7000000” new2 xxx $6.4 16	
  Oz Yes Yes Yes
“8000000” coke xxx $9.99 16	
  Oz Yes Yes Yes
“9000000” new1 xxx $2.5 16	
  Oz Yes Yes Yes
[	
  “”,	
  “5000000”)	
  
[	
  “5000000”,	
   “”)	
  
HBase	
  Architecture	
  |	
  Regions
48©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
Row Key Name price weight ….
“1000000” snickers $9.99 4	
  Oz ….
“2000000” new3 $6.4 16	
  Oz ….
“3000000” almonds $9.99 8	
  Oz ….
“4000000” foo $34.63 16	
  Oz ….
Row Key Name price weight ….
“5000000” bar $22.54 16	
  Oz ….
“7000000” new2 $6.4 16	
  Oz ….
“8000000” coke $9.99 16	
  Oz ….
“9000000” new1 $2.5 16	
  Oz ….
HBase	
  Architecture	
  |	
  RegionServer
Server	
  12
Server	
  7
49©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
HBase	
  Architecture	
  |	
  Regions
• Horizontal	
  split	
  of	
  tables
• Regions	
  are	
  served	
  by	
  RegionServers
• Automatically	
  (based	
  on	
  configurations).	
  Can	
  be	
  done	
  manually	
  too.
Relationship:
• A	
  Region	
  is	
  only	
  served	
  by	
  a	
  single	
  RegionServer	
  at	
  a	
  time.
• A	
  RegionServer	
  can	
  serve	
  multiple	
  Regions.
Region	
  
Server	
  X
Region	
  
[""	
  ,	
  10)
Region	
  
[40,	
  50)
Region	
  
Server	
  Y
Region	
  
[10,	
  30)
Region	
  
[30,	
  40)
Region	
  
[50	
  "")
50©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
Row Key Name price weight store1 store2 store3
“1000000” snickers $9.99 4	
  Oz Yes Yes Yes
“2000000” new3 $6.4 16	
  Oz Yes Yes Yes
“3000000” almonds $9.99 8	
  Oz Yes No Yes
“4000000” foo $34.63 16	
  Oz No Yes Yes
“5000000” bar $22.54 16	
  Oz Yes Yes Yes
“7000000” new2 $6.4 16	
  Oz Yes Yes Yes
“8000000” coke $9.99 16	
  Oz Yes Yes Yes
“9000000” new1 $2.5 16	
  Oz Yes Yes Yes
HBase	
  Architecture
What	
  if	
  you	
  have	
  many	
  more	
  columns?
51©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
Row Key info:
name
info:
price
info:
weight
availability:
store1
availability:
store2
availability:
store3
“1000000” snickers $9.99 4	
  Oz Yes Yes Yes
“2000000” new3 $6.4 16	
  Oz Yes Yes Yes
“3000000” almonds $9.99 8	
  Oz Yes No Yes
“4000000” foo $34.63 16	
  Oz No Yes Yes
“5000000” bar $22.54 16	
  Oz Yes Yes Yes
“7000000” new2 $6.4 16	
  Oz Yes Yes Yes
“8000000” coke $9.99 16	
  Oz Yes Yes Yes
“9000000” new1 $2.5 16	
  Oz Yes Yes Yes
HBase	
  Architecture	
  |	
  Column	
  family
52©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
HBase	
  Architecture	
  |	
  Column	
  family
• Set	
  of	
  columns	
  that	
  have	
  similar	
  access	
  patterns
• Data	
  stored	
  in	
  separate	
  files
• Tune	
  performance
• In-­‐memory
• Compression
• Version	
  retention	
  policies
• Cache	
  priority
• Needs	
  to	
  be	
  specified	
  by	
  the	
  user
53©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
HBase	
  Architecture
info:
name
info:
price
info:
weight
“1000000” snickers $9.99 4	
  Oz
“2000000” new3 $6.4 16	
  Oz
“3000000” almonds $9.99 8	
  Oz
available:	
  
store1
available:	
  
store2
available:	
  
store3
“1000000” Yes Yes Yes
“2000000” Yes Yes Yes
“3000000” Yes No Yes
Region
Column
Family
Column
Family
54©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
HBase	
  Architecture	
  |	
  Put
1. Client	
  creates	
  a	
  row	
  to	
  put.
2. Client	
  checks	
  which	
  RegionServer	
  hosts	
  this	
  row
3. Row	
  is	
  written	
  into	
  write-­‐ahead	
  log	
  (WAL)	
  on	
  disk	
  for	
  persistence
4. Row	
  is	
  written	
  to	
  MemStore	
  (in	
  memory	
  storage)
55©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
HBase	
  Architecture	
  |	
  Put
RegionServer	
  2
Put
Region
Column	
  Family Column	
  Family
MemStore (RAM) MemStore (RAM)
WAL	
  (disk)
56©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
HBase	
  Architecture	
  |	
  HFile
Region
RegionServer	
  2
Put
Column	
  Family Column	
  Family
FlushFlush
MemStore (RAM) MemStore (RAM)
WAL	
  (disk)
On	
  Disk On	
  Disk
HFiles
57©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
HBase	
  Architecture	
  |	
  HFile
• As	
  more	
  rows	
  are	
  added	
  to	
  the	
  table
• Write	
  to	
  WAL
• Write	
  to	
  MemStore
• When	
  MemStore	
  gets	
  full,	
  its	
  contents	
  are	
  flushed	
  to	
  HDFS	
  as	
  Hfiles
58©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
HBase	
  Architecture	
  |	
  HFile
Region
RegionServer	
  2
Column	
  Family Column	
  Family
MemStore (RAM) MemStore (RAM)
On	
  Disk On	
  Disk
• Side	
  effect	
  of	
  more	
  HFiles in	
  
a	
  Region
• More	
  disk	
  seeks	
  on	
  read
• More	
  disk	
  usage	
  
(overwritten	
  value,	
  
deletes,	
  etc)
59©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
HBase	
  Architecture	
  |	
  Compactions
Merging	
  multiple	
  Hfiles into	
  a	
  single	
  Hfile.
• Minor	
  compactions
• Major	
  compactions
60©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
HBase	
  Architecture
Region
RegionServer	
  2
Column	
  Family Column	
  Family
On	
  Disk On	
  Disk
MemStore (RAM) MemStore (RAM)
61©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
HBase	
  Architecture
Region
RegionServer	
  2
Column	
  Family Column	
  Family
On	
  Disk On	
  Disk
MemStore (RAM) MemStore (RAM)
62©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
HBase	
  Architecture
Region
RegionServer	
  2
Column	
  Family Column	
  Family
On	
  Disk On	
  Disk
MemStore (RAM) MemStore (RAM)
63©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
HBase	
  Architecture	
  |	
  Compactions
• Minor	
  compactions
• Can	
  be	
  finely	
  tuned	
  using	
  multiple	
  configurations
• Controlled	
  by	
  policy	
  (pluggable)	
  
• Major	
  compactions
• Periodic	
  (set	
  from	
  configuration)	
  or	
  manually	
  triggered.
• Tend	
  to	
  be	
  run	
  during	
  off-­‐peak	
  times.
64©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
HBase	
  Architecture	
  |	
  Splits
• Eventually,	
  Regions	
  become	
  imbalanced.
• HBase	
  can	
  split	
  a	
  Region	
  into	
  two.
• Daughter	
  Regions	
  are	
  can	
  be	
  moved	
  to	
  a	
  different	
  RegionServer,	
  if	
  necessary.
Gives	
  HBase	
  horizontal	
  scalability!
65©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
HBase	
  Architecture	
  |	
  Splits
Region
RegionServer	
  2
10
20
30
40
50
60
70
80
90
100
• Region	
  has	
  rows	
  [0,	
  100)
• Splitting
• Middle	
  of	
  current	
  Region’s	
  
row	
  range
• All	
  column	
  families	
  are	
  split
66©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
HBase	
  Architecture	
  |	
  Splits
Region
RegionServer	
  2
10
20
30
40
50
• Splits
• [0,	
  50)
• [50,	
  100)
60
70
80
90
100
67©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
HBase	
  Architecture	
  |	
  Bulk	
  Import
• This	
  write	
  path	
  is	
  durable,	
  but	
  if	
  you’re	
  importing	
  a	
  lot	
  of	
  data
• Every	
  put	
  goes	
  into	
  WAL,	
  which	
  means	
  disk	
  seeks.	
  Lots	
  of	
  puts	
  mean	
  lots	
  of	
  disk	
  
seeks.
• Lots	
  of	
  data	
  into	
  MemStores means	
  lots	
  of	
  flushing	
  to	
  disk.
• Lots	
  of	
  flushing	
  to	
  disk	
  might	
  mean	
  lots	
  of	
  compactions.
So	
  we	
  have	
  other	
  better	
  options	
  like….
68©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
HBase	
  Architecture	
  |	
  Bulk	
  Loading
• Bypass	
  conventional	
  write	
  path
• Extract	
  data	
  from	
  source
• Transform	
  data	
  into	
  HFiles (done	
  with	
  MapReduce job)	
  directly
• Tell	
  RegionServers	
  to	
  serve	
  these	
  HFiles
69©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
HBase	
  Architecture	
  |	
  APIs
• Java	
  API
• Most	
  full-­‐featured.
• REST	
  API
• Easily	
  accessible.
• Thrift	
  API
• Support	
  for	
  many	
  languages	
  (e.g.	
  C,	
  C++,	
  Perl,	
  Ruby,	
  Python).
70©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
Enough	
  of	
  
Architecture
71©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
That’s	
  all	
  folks!
72©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.10/17/14  Strata+Hadoop   world  2014.     George   and  Hsieh
Try	
  Hadoop	
  Now
cloudera.com/live
73©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.10/17/14  Strata+Hadoop   world  2014.     George   and  Hsieh
Join	
  the	
  Discussion
Get  community  
help  or  provide  
feedback
cloudera.com/community
74©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
Sources
• A	
  Survey	
  of	
  HBase	
  Application	
  Archetypes
• Lars	
  George,	
  Jon	
  Hsieh
• http://www.slideshare.net/HBaseCon/case-­‐studies-­‐session-­‐7
• Hadoop	
   and	
  HBase:	
  Motivations,	
   Use	
  cases	
  and	
  Trade-­‐offs
• Jon	
  Hsieh’s	
   slides
75©	
  Cloudera,	
  Inc.	
  All	
  rights	
  reserved.
Questions	
  ?

Weitere ähnliche Inhalte

Was ist angesagt?

Hadoop 3 (2017 hadoop taiwan workshop)
Hadoop 3 (2017 hadoop taiwan workshop)Hadoop 3 (2017 hadoop taiwan workshop)
Hadoop 3 (2017 hadoop taiwan workshop)Wei-Chiu Chuang
 
Hadoop Operations - Best practices from the field
Hadoop Operations - Best practices from the fieldHadoop Operations - Best practices from the field
Hadoop Operations - Best practices from the fieldUwe Printz
 
Hadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldHadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldDataWorks Summit
 
Applications on Hadoop
Applications on HadoopApplications on Hadoop
Applications on Hadoopmarkgrover
 
HBase Backups
HBase BackupsHBase Backups
HBase BackupsHBaseCon
 
A brave new world in mutable big data relational storage (Strata NYC 2017)
A brave new world in mutable big data  relational storage (Strata NYC 2017)A brave new world in mutable big data  relational storage (Strata NYC 2017)
A brave new world in mutable big data relational storage (Strata NYC 2017)Todd Lipcon
 
Multi-tenant, Multi-cluster and Multi-container Apache HBase Deployments
Multi-tenant, Multi-cluster and Multi-container Apache HBase DeploymentsMulti-tenant, Multi-cluster and Multi-container Apache HBase Deployments
Multi-tenant, Multi-cluster and Multi-container Apache HBase DeploymentsDataWorks Summit
 
Architecting Applications with Hadoop
Architecting Applications with HadoopArchitecting Applications with Hadoop
Architecting Applications with Hadoopmarkgrover
 
HBase Status Report - Hadoop Summit Europe 2014
HBase Status Report - Hadoop Summit Europe 2014HBase Status Report - Hadoop Summit Europe 2014
HBase Status Report - Hadoop Summit Europe 2014larsgeorge
 
Apache Hadoop 3 updates with migration story
Apache Hadoop 3 updates with migration storyApache Hadoop 3 updates with migration story
Apache Hadoop 3 updates with migration storySunil Govindan
 
Hortonworks.Cluster Config Guide
Hortonworks.Cluster Config GuideHortonworks.Cluster Config Guide
Hortonworks.Cluster Config GuideDouglas Bernardini
 
Improving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux ConfigurationImproving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux ConfigurationAlex Moundalexis
 
NYC HUG - Application Architectures with Apache Hadoop
NYC HUG - Application Architectures with Apache HadoopNYC HUG - Application Architectures with Apache Hadoop
NYC HUG - Application Architectures with Apache Hadoopmarkgrover
 
Intro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application MeetupIntro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application MeetupMike Percy
 
Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Uwe Printz
 
Tales from the Cloudera Field
Tales from the Cloudera FieldTales from the Cloudera Field
Tales from the Cloudera FieldHBaseCon
 
HBase Read High Availability Using Timeline-Consistent Region Replicas
HBase Read High Availability Using Timeline-Consistent Region ReplicasHBase Read High Availability Using Timeline-Consistent Region Replicas
HBase Read High Availability Using Timeline-Consistent Region ReplicasHBaseCon
 
Meet hbase 2.0
Meet hbase 2.0Meet hbase 2.0
Meet hbase 2.0enissoz
 

Was ist angesagt? (20)

Hadoop 3 (2017 hadoop taiwan workshop)
Hadoop 3 (2017 hadoop taiwan workshop)Hadoop 3 (2017 hadoop taiwan workshop)
Hadoop 3 (2017 hadoop taiwan workshop)
 
Hadoop Operations - Best practices from the field
Hadoop Operations - Best practices from the fieldHadoop Operations - Best practices from the field
Hadoop Operations - Best practices from the field
 
Hadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldHadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the Field
 
Applications on Hadoop
Applications on HadoopApplications on Hadoop
Applications on Hadoop
 
HBase Backups
HBase BackupsHBase Backups
HBase Backups
 
A brave new world in mutable big data relational storage (Strata NYC 2017)
A brave new world in mutable big data  relational storage (Strata NYC 2017)A brave new world in mutable big data  relational storage (Strata NYC 2017)
A brave new world in mutable big data relational storage (Strata NYC 2017)
 
Empower Data-Driven Organizations with HPE and Hadoop
Empower Data-Driven Organizations with HPE and HadoopEmpower Data-Driven Organizations with HPE and Hadoop
Empower Data-Driven Organizations with HPE and Hadoop
 
Multi-tenant, Multi-cluster and Multi-container Apache HBase Deployments
Multi-tenant, Multi-cluster and Multi-container Apache HBase DeploymentsMulti-tenant, Multi-cluster and Multi-container Apache HBase Deployments
Multi-tenant, Multi-cluster and Multi-container Apache HBase Deployments
 
Architecting Applications with Hadoop
Architecting Applications with HadoopArchitecting Applications with Hadoop
Architecting Applications with Hadoop
 
HBase Status Report - Hadoop Summit Europe 2014
HBase Status Report - Hadoop Summit Europe 2014HBase Status Report - Hadoop Summit Europe 2014
HBase Status Report - Hadoop Summit Europe 2014
 
Apache Hadoop 3 updates with migration story
Apache Hadoop 3 updates with migration storyApache Hadoop 3 updates with migration story
Apache Hadoop 3 updates with migration story
 
Hortonworks.Cluster Config Guide
Hortonworks.Cluster Config GuideHortonworks.Cluster Config Guide
Hortonworks.Cluster Config Guide
 
Improving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux ConfigurationImproving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux Configuration
 
NYC HUG - Application Architectures with Apache Hadoop
NYC HUG - Application Architectures with Apache HadoopNYC HUG - Application Architectures with Apache Hadoop
NYC HUG - Application Architectures with Apache Hadoop
 
Intro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application MeetupIntro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application Meetup
 
Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?
 
Tales from the Cloudera Field
Tales from the Cloudera FieldTales from the Cloudera Field
Tales from the Cloudera Field
 
HBase Read High Availability Using Timeline-Consistent Region Replicas
HBase Read High Availability Using Timeline-Consistent Region ReplicasHBase Read High Availability Using Timeline-Consistent Region Replicas
HBase Read High Availability Using Timeline-Consistent Region Replicas
 
Meet hbase 2.0
Meet hbase 2.0Meet hbase 2.0
Meet hbase 2.0
 
Introducing Kudu
Introducing KuduIntroducing Kudu
Introducing Kudu
 

Andere mochten auch

Cassandra nice use cases and worst anti patterns
Cassandra nice use cases and worst anti patternsCassandra nice use cases and worst anti patterns
Cassandra nice use cases and worst anti patternsDuyhai Doan
 
Cassandra concepts, patterns and anti-patterns
Cassandra concepts, patterns and anti-patternsCassandra concepts, patterns and anti-patterns
Cassandra concepts, patterns and anti-patternsDave Gardner
 
A Survey of HBase Application Archetypes
A Survey of HBase Application ArchetypesA Survey of HBase Application Archetypes
A Survey of HBase Application ArchetypesHBaseCon
 
Unique ID generation in distributed systems
Unique ID generation in distributed systemsUnique ID generation in distributed systems
Unique ID generation in distributed systemsDave Gardner
 
Etsy Activity Feeds Architecture
Etsy Activity Feeds ArchitectureEtsy Activity Feeds Architecture
Etsy Activity Feeds ArchitectureDan McKinley
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to RedisDvir Volk
 

Andere mochten auch (6)

Cassandra nice use cases and worst anti patterns
Cassandra nice use cases and worst anti patternsCassandra nice use cases and worst anti patterns
Cassandra nice use cases and worst anti patterns
 
Cassandra concepts, patterns and anti-patterns
Cassandra concepts, patterns and anti-patternsCassandra concepts, patterns and anti-patterns
Cassandra concepts, patterns and anti-patterns
 
A Survey of HBase Application Archetypes
A Survey of HBase Application ArchetypesA Survey of HBase Application Archetypes
A Survey of HBase Application Archetypes
 
Unique ID generation in distributed systems
Unique ID generation in distributed systemsUnique ID generation in distributed systems
Unique ID generation in distributed systems
 
Etsy Activity Feeds Architecture
Etsy Activity Feeds ArchitectureEtsy Activity Feeds Architecture
Etsy Activity Feeds Architecture
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 

Ähnlich wie Introduction to HBase

Introduction to Apache Kudu
Introduction to Apache KuduIntroduction to Apache Kudu
Introduction to Apache KuduJeff Holoman
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impalahuguk
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaSwiss Big Data User Group
 
Strata London 2019 Scaling Impala.pptx
Strata London 2019 Scaling Impala.pptxStrata London 2019 Scaling Impala.pptx
Strata London 2019 Scaling Impala.pptxManish Maheshwari
 
Introducing Apache Kudu (Incubating) - Montreal HUG May 2016
Introducing Apache Kudu (Incubating) - Montreal HUG May 2016Introducing Apache Kudu (Incubating) - Montreal HUG May 2016
Introducing Apache Kudu (Incubating) - Montreal HUG May 2016Mladen Kovacevic
 
Building data pipelines with kite
Building data pipelines with kiteBuilding data pipelines with kite
Building data pipelines with kiteJoey Echeverria
 
Kudu: Resolving Transactional and Analytic Trade-offs in Hadoop
Kudu: Resolving Transactional and Analytic Trade-offs in HadoopKudu: Resolving Transactional and Analytic Trade-offs in Hadoop
Kudu: Resolving Transactional and Analytic Trade-offs in Hadoopjdcryans
 
Strata London 2019 Scaling Impala
Strata London 2019 Scaling ImpalaStrata London 2019 Scaling Impala
Strata London 2019 Scaling ImpalaManish Maheshwari
 
Simplifying Hadoop: A Secure and Unified Data Access Path for Computer Framew...
Simplifying Hadoop: A Secure and Unified Data Access Path for Computer Framew...Simplifying Hadoop: A Secure and Unified Data Access Path for Computer Framew...
Simplifying Hadoop: A Secure and Unified Data Access Path for Computer Framew...Dataconomy Media
 
Cloudera Operational DB (Apache HBase & Apache Phoenix)
Cloudera Operational DB (Apache HBase & Apache Phoenix)Cloudera Operational DB (Apache HBase & Apache Phoenix)
Cloudera Operational DB (Apache HBase & Apache Phoenix)Timothy Spann
 
Hive spark-s3acommitter-hbase-nfs
Hive spark-s3acommitter-hbase-nfsHive spark-s3acommitter-hbase-nfs
Hive spark-s3acommitter-hbase-nfsYifeng Jiang
 
[B4]deview 2012-hdfs
[B4]deview 2012-hdfs[B4]deview 2012-hdfs
[B4]deview 2012-hdfsNAVER D2
 
Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016StampedeCon
 
HBase: Just the Basics
HBase: Just the BasicsHBase: Just the Basics
HBase: Just the BasicsHBaseCon
 
Webinar: Productionizing Hadoop: Lessons Learned - 20101208
Webinar: Productionizing Hadoop: Lessons Learned - 20101208Webinar: Productionizing Hadoop: Lessons Learned - 20101208
Webinar: Productionizing Hadoop: Lessons Learned - 20101208Cloudera, Inc.
 
End to End Streaming Architectures
End to End Streaming ArchitecturesEnd to End Streaming Architectures
End to End Streaming ArchitecturesCloudera, Inc.
 
Introduction to Kudu: Hadoop Storage for Fast Analytics on Fast Data - Rüdige...
Introduction to Kudu: Hadoop Storage for Fast Analytics on Fast Data - Rüdige...Introduction to Kudu: Hadoop Storage for Fast Analytics on Fast Data - Rüdige...
Introduction to Kudu: Hadoop Storage for Fast Analytics on Fast Data - Rüdige...Dataconomy Media
 
Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014Jonathan Seidman
 

Ähnlich wie Introduction to HBase (20)

Introduction to Apache Kudu
Introduction to Apache KuduIntroduction to Apache Kudu
Introduction to Apache Kudu
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
 
Strata London 2019 Scaling Impala.pptx
Strata London 2019 Scaling Impala.pptxStrata London 2019 Scaling Impala.pptx
Strata London 2019 Scaling Impala.pptx
 
Introducing Apache Kudu (Incubating) - Montreal HUG May 2016
Introducing Apache Kudu (Incubating) - Montreal HUG May 2016Introducing Apache Kudu (Incubating) - Montreal HUG May 2016
Introducing Apache Kudu (Incubating) - Montreal HUG May 2016
 
Building data pipelines with kite
Building data pipelines with kiteBuilding data pipelines with kite
Building data pipelines with kite
 
Kudu: Resolving Transactional and Analytic Trade-offs in Hadoop
Kudu: Resolving Transactional and Analytic Trade-offs in HadoopKudu: Resolving Transactional and Analytic Trade-offs in Hadoop
Kudu: Resolving Transactional and Analytic Trade-offs in Hadoop
 
Strata London 2019 Scaling Impala
Strata London 2019 Scaling ImpalaStrata London 2019 Scaling Impala
Strata London 2019 Scaling Impala
 
Simplifying Hadoop: A Secure and Unified Data Access Path for Computer Framew...
Simplifying Hadoop: A Secure and Unified Data Access Path for Computer Framew...Simplifying Hadoop: A Secure and Unified Data Access Path for Computer Framew...
Simplifying Hadoop: A Secure and Unified Data Access Path for Computer Framew...
 
Cloudera Operational DB (Apache HBase & Apache Phoenix)
Cloudera Operational DB (Apache HBase & Apache Phoenix)Cloudera Operational DB (Apache HBase & Apache Phoenix)
Cloudera Operational DB (Apache HBase & Apache Phoenix)
 
Hive spark-s3acommitter-hbase-nfs
Hive spark-s3acommitter-hbase-nfsHive spark-s3acommitter-hbase-nfs
Hive spark-s3acommitter-hbase-nfs
 
Hadoop Operations
Hadoop OperationsHadoop Operations
Hadoop Operations
 
[B4]deview 2012-hdfs
[B4]deview 2012-hdfs[B4]deview 2012-hdfs
[B4]deview 2012-hdfs
 
Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016
 
HBase: Just the Basics
HBase: Just the BasicsHBase: Just the Basics
HBase: Just the Basics
 
Webinar: Productionizing Hadoop: Lessons Learned - 20101208
Webinar: Productionizing Hadoop: Lessons Learned - 20101208Webinar: Productionizing Hadoop: Lessons Learned - 20101208
Webinar: Productionizing Hadoop: Lessons Learned - 20101208
 
End to End Streaming Architectures
End to End Streaming ArchitecturesEnd to End Streaming Architectures
End to End Streaming Architectures
 
Introduction to Kudu: Hadoop Storage for Fast Analytics on Fast Data - Rüdige...
Introduction to Kudu: Hadoop Storage for Fast Analytics on Fast Data - Rüdige...Introduction to Kudu: Hadoop Storage for Fast Analytics on Fast Data - Rüdige...
Introduction to Kudu: Hadoop Storage for Fast Analytics on Fast Data - Rüdige...
 
Apache Hadoop 3
Apache Hadoop 3Apache Hadoop 3
Apache Hadoop 3
 
Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014
 

Kürzlich hochgeladen

introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfVishalKumarJha10
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfproinshot.com
 
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...kalichargn70th171
 
How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...software pro Development
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionOnePlan Solutions
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024Mind IT Systems
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfAzure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfryanfarris8
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnAmarnathKambale
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 

Kürzlich hochgeladen (20)

introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdf
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
 
How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfAzure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 

Introduction to HBase

  • 1. 1©  Cloudera,  Inc.  All  rights  reserved. Apache  HBase: Overview,  Hands-­‐On,  and  Use   Cases Apekshit Sharma Esteban  Gutierrez
  • 2. 2©  Cloudera,  Inc.  All  rights  reserved. Apekshit Sharma • Distributed  Software  Engineer,   Cloudera • Software  Engineer,  Google • Apache  HBase contributor • Performance  improvements  and   configuration  framework Esteban  Gutierrez • Software  Engineer,  Cloudera • Apache  HBase  committer • +15  years  of  experience  on  Internet   scale  platforms. Who  we  are
  • 3. 3©  Cloudera,  Inc.  All  rights  reserved. Contents • Motivation • Introduction  to  Apache  HBase • Data  model • Hands-­‐On:  Installation,  HBase  shell • A  more  in-­‐depth  introduction  to  Apache  HBase • Apache  Hadoop  HDFS • HBase  Architecture
  • 4. 4©  Cloudera,  Inc.  All  rights  reserved. Motivation http://www.internetlivestats.com/total-­‐number-­‐of-­‐websites/
  • 5. 5©  Cloudera,  Inc.  All  rights  reserved. Motivation • Indexing  the  internet  has  challenges: • Scale • Volume • Rate • Diversity  of  content • URLs • High-­‐resolution  images • Video
  • 6. 6©  Cloudera,  Inc.  All  rights  reserved. Motivation
  • 7. 7©  Cloudera,  Inc.  All  rights  reserved. • What  if  you’re  not  trying  to  index  the   internet? Motivation
  • 8. 8©  Cloudera,  Inc.  All  rights  reserved. • Data  for  analytical  processing • User-­‐facing  real-­‐time  platforms Motivation
  • 9. 9©  Cloudera,  Inc.  All  rights  reserved. Apache  HBase • Open  source • Random  access,  Low  latency  and   Consistent  data  store • Horizontally scalable • Built  on  top  of  Apache  Hadoop
  • 10. 10©  Cloudera,  Inc.  All  rights  reserved. Apache  HBase  is  open  source • Apache  2.0  License • A  community  project  with  committers  and  contributors  from  diverse   organizations • Cloudera,  Facebook,  Salesforce.com,  Huawei,  TrendMicro,  eBay,  HortonWorks,   Intel,  Twitter,  … • Code  license  means  anyone  can  modify  and  use  the  code.
  • 11. 11©  Cloudera,  Inc.  All  rights  reserved. •CAP  theorem •Consistency •Availability •Partition  tolerance Apache  HBase  is  consistent HBase
  • 12. 12©  Cloudera,  Inc.  All  rights  reserved. • Adding  more  servers  linearly increases   performance  and  capacity • Storage  capacity   • Input/output   operations • Store  and  access  data  on  commodity   servers • Largest  cluster:  >  3000  nodes,  >  100  PB • Average  cluster:  10-­‐40  nodes,  100-­‐400TB Apache  HBase  is  horizontally  scalable 0 100 200 300 400 500 600 Performance  (IOPs/Storage/Throughput) #  of  servers
  • 13. 13©  Cloudera,  Inc.  All  rights  reserved. • Commodity  servers  (circa 2015) • 12-­‐24  2-­‐4TB  hard disks • 2  (quad/hexa/octa)-­‐core  CPUs,  2-­‐3 GHz • 64  -­‐ 512 GBs  of  RAM • 10  Gbps ethernet • $5,000  -­‐ $10,000  /  machine Apache  HBase  is  horizontally  scalable
  • 14. 14©  Cloudera,  Inc.  All  rights  reserved. Data  model Row  key info:name info:age comp:base comp:stocks 121 ‘tom’ ‘28’ ‘125k’ 145 ‘bob’ ‘32’ ‘110k’ ‘50’  (ts=2012) ‘100’   (ts=2014) Columns <Column   family>:<column   qualifier> Cells Row  keys
  • 15. 15©  Cloudera,  Inc.  All  rights  reserved. Data  model • Data  is  stored…  in  a  big  table • Sorted  rows with  primary  row  keys.  Supports  billions  of  rows. • Column  :  <column  family>:<column  qualifier>;  Supports  millions  of  columns • Cell  :  intersection  of  row  and  column. • Can  be  empty.  No  storage/processing  overheads • Can  have  different  time-­‐stamped  values
  • 16. 16©  Cloudera,  Inc.  All  rights  reserved. Hands  On   Apache  HBase installation   Play  with  HBase  shell http://pastebin.com/60EUN75f  
  • 17. 17©  Cloudera,  Inc.  All  rights  reserved. Understanding  basic  architecture  of Hadoop  (HDFS)
  • 18. 18©  Cloudera,  Inc.  All  rights  reserved. Apache  Hadoop Open  source Commodity servers Horizontally scalable Highly fault-­‐tolerant Massive processing  power  
  • 19. 19©  Cloudera,  Inc.  All  rights  reserved. Apache  Hadoop MapReduce  +   YARN 2  Core  Components HDFS (Hadoop  Distributed  File   System)
  • 20. 20©  Cloudera,  Inc.  All  rights  reserved. History 2003
  • 21. 21©  Cloudera,  Inc.  All  rights  reserved. • Distributed file  system • Commodity servers • horizontally scalable • Highly fault-­‐tolerant • Proprietary GFS
  • 22. 22©  Cloudera,  Inc.  All  rights  reserved. • Distributed file  system • Commodity servers • Horizontally scalable • Highly fault-­‐tolerant • Open  source HDFS
  • 23. 23©  Cloudera,  Inc.  All  rights  reserved. HDFS  API • File • Open,  Close,  Read,  Write,  Move,  etc • Directories • Create,  Delete,  etc • Permissions • Owners,  Groups,  rwx permissions
  • 24. 24©  Cloudera,  Inc.  All  rights  reserved. Basic  Architecture  of  HDFS
  • 25. 25©  Cloudera,  Inc.  All  rights  reserved. File B1 B2 B3 File  system  will  split the  file  into  blocks DiskB1 B2 B3 Local  file  system
  • 26. 26©  Cloudera,  Inc.  All  rights  reserved. DataNode  1 DataNode  2 DataNode  3 DataNode  4 HDFS File  distributed   across  machines B1 B2 B3
  • 27. 27©  Cloudera,  Inc.  All  rights  reserved. B1 DataNode  1 B2 DataNode  2 B3 DataNode  3 DataNode  4 HDFS    DataNode
  • 28. 28©  Cloudera,  Inc.  All  rights  reserved. DataNode  1 DataNode  2 DataNode  3 DataNode  4 HDFS    NameNode NameNode B1 B2 B3
  • 29. 29©  Cloudera,  Inc.  All  rights  reserved. DataNode  1 DataNode  2 DataNode  3 DataNode  4 HDFS    Reading  a  file NameNode B1 B2 B3 Client 1.  File  ‘foo’ 2.  Verify   client  has  permissions to  read  the  file 3.  List  of  foo’s bocks   and  datanodes
  • 30. 30©  Cloudera,  Inc.  All  rights  reserved. DataNode  1 DataNode  2 DataNode  3 DataNode  4 HDFS    Fault  tolerance NameNode B1 B2 B3
  • 31. 31©  Cloudera,  Inc.  All  rights  reserved. DataNode  1 DataNode  2 DataNode  3 DataNode  4 HDFS    Redundancy NameNode B1 B2 B3B1 B1B2 B2 B3B3
  • 32. 32©  Cloudera,  Inc.  All  rights  reserved. DataNode  1 DataNode  2 DataNode  3 DataNode  4 HDFS    Horizontal  Scalability NameNode B1 B2 B3B1 B1B2 B2 B3B3 DataNode  5
  • 33. 33©  Cloudera,  Inc.  All  rights  reserved. Let’s  look  at  some  existing  HDFS  systems...
  • 34. 34©  Cloudera,  Inc.  All  rights  reserved. • Yahoo!  HDFS  Clusters 40k+  servers,  100k+  CPUs,  450PB  data • Spam filtering,  web  indexing,  personaliztion,  etc. • Facebook  HDFS  Cluster 15TB  new  data  per  day 1200+  machines,  30PB  in  one  cluster • Datawarehouse,  Messages,  etc.
  • 35. 35©  Cloudera,  Inc.  All  rights  reserved. • 1-­‐10PB HDFS  Clusters Production  critical  workloads  for  websites,  retail,  finance,  telcom,   government,  research  … • Sub  PB  HDFS  Clusters PoCs,  R&D,  production  in  early  stages  …
  • 36. 36©  Cloudera,  Inc.  All  rights  reserved. But….  there  are  restrictions! It’s  not  a  magic  wand!
  • 37. 37©  Cloudera,  Inc.  All  rights  reserved. Files  are append  only • Access  Model  :  Write-­‐once-­‐read-­‐many   • Can  not  change  existing  contents
  • 38. 38©  Cloudera,  Inc.  All  rights  reserved. Not  designed  for  small  files • HDFS  block  sizes  are  in  MB  (default  128MB) • Designed  for  typical  GBs  /  TBs  of  file  sizes • Normal  files  system  have  4kb  block  size! • Name  node  becomes  bottleneck  because  metadata  grows  too  big
  • 39. 39©  Cloudera,  Inc.  All  rights  reserved. Summary HDFS  is  a  great  distributed  file  system! • Store  massive  data • Scalable • High  throughput • Fault  tolerance
  • 40. 40©  Cloudera,  Inc.  All  rights  reserved. MapReduce • Distributed    processing   framework • Commodity  machines • Fault  tolerance
  • 41. 41©  Cloudera,  Inc.  All  rights  reserved. MapReduce • HBase  uses  and  extends   MR  APIs  for  data  access Further  reading:  http://hadoop.apache.org
  • 42. 42©  Cloudera,  Inc.  All  rights  reserved.
  • 43. 43©  Cloudera,  Inc.  All  rights  reserved. HBase  Architecture Unique   id Name price weight store1 store2 store3 “1000000” snickers $9.99 4  Oz Yes Yes Yes “3000000” almonds $9.99 8  Oz Yes No Yes “8000000” coke $9.99 16  Oz Yes Yes Yes
  • 44. 44©  Cloudera,  Inc.  All  rights  reserved. HBase  Architecture Unique   id Name price weight store1 store2 store3 “1000000” snickers $9.99 4  Oz Yes Yes Yes “3000000” almonds $9.99 8  Oz Yes No Yes “8000000” coke $9.99 16  Oz Yes Yes Yes “4000000” new $34.63 16  Oz No Yes Yes
  • 45. 45©  Cloudera,  Inc.  All  rights  reserved. HBase  Architecture Unique   id Name price weight store1 store2 store3 “1000000” snickers $9.99 4  Oz Yes Yes Yes “3000000” almonds $9.99 8  Oz Yes No Yes “8000000” coke $9.99 16  Oz Yes Yes Yes “4000000” foo $34.63 16  Oz No Yes Yes “5000000” bar $22.54 16  Oz Yes Yes Yes
  • 46. 46©  Cloudera,  Inc.  All  rights  reserved. HBase  Architecture Unique   id Name price weight store1 store2 store3 “1000000” snickers $9.99 4  Oz Yes Yes Yes “3000000” almonds $9.99 8  Oz Yes No Yes “8000000” coke $9.99 16  Oz Yes Yes Yes “4000000” foo $34.63 16  Oz No Yes Yes “5000000” bar $22.54 16  Oz Yes Yes Yes “9000000” new1 $2.5 16  Oz Yes Yes Yes “7000000” new2 $6.4 16  Oz Yes Yes Yes “2000000” new3 $6.4 16  Oz Yes Yes Yes
  • 47. 47©  Cloudera,  Inc.  All  rights  reserved. Row Key Name brand price weight store1 store2 store3 “1000000” snickers xxx $9.99 4  Oz Yes Yes Yes “2000000” new3 xxx $6.4 16  Oz Yes Yes Yes “3000000” almonds xxx $9.99 8  Oz Yes No Yes “4000000” foo xxx $34.63 16  Oz No Yes Yes Row  Key Name brand price weight store1 store2 store3 “5000000” bar xxx $22.54 16  Oz Yes Yes Yes “7000000” new2 xxx $6.4 16  Oz Yes Yes Yes “8000000” coke xxx $9.99 16  Oz Yes Yes Yes “9000000” new1 xxx $2.5 16  Oz Yes Yes Yes [  “”,  “5000000”)   [  “5000000”,   “”)   HBase  Architecture  |  Regions
  • 48. 48©  Cloudera,  Inc.  All  rights  reserved. Row Key Name price weight …. “1000000” snickers $9.99 4  Oz …. “2000000” new3 $6.4 16  Oz …. “3000000” almonds $9.99 8  Oz …. “4000000” foo $34.63 16  Oz …. Row Key Name price weight …. “5000000” bar $22.54 16  Oz …. “7000000” new2 $6.4 16  Oz …. “8000000” coke $9.99 16  Oz …. “9000000” new1 $2.5 16  Oz …. HBase  Architecture  |  RegionServer Server  12 Server  7
  • 49. 49©  Cloudera,  Inc.  All  rights  reserved. HBase  Architecture  |  Regions • Horizontal  split  of  tables • Regions  are  served  by  RegionServers • Automatically  (based  on  configurations).  Can  be  done  manually  too. Relationship: • A  Region  is  only  served  by  a  single  RegionServer  at  a  time. • A  RegionServer  can  serve  multiple  Regions. Region   Server  X Region   [""  ,  10) Region   [40,  50) Region   Server  Y Region   [10,  30) Region   [30,  40) Region   [50  "")
  • 50. 50©  Cloudera,  Inc.  All  rights  reserved. Row Key Name price weight store1 store2 store3 “1000000” snickers $9.99 4  Oz Yes Yes Yes “2000000” new3 $6.4 16  Oz Yes Yes Yes “3000000” almonds $9.99 8  Oz Yes No Yes “4000000” foo $34.63 16  Oz No Yes Yes “5000000” bar $22.54 16  Oz Yes Yes Yes “7000000” new2 $6.4 16  Oz Yes Yes Yes “8000000” coke $9.99 16  Oz Yes Yes Yes “9000000” new1 $2.5 16  Oz Yes Yes Yes HBase  Architecture What  if  you  have  many  more  columns?
  • 51. 51©  Cloudera,  Inc.  All  rights  reserved. Row Key info: name info: price info: weight availability: store1 availability: store2 availability: store3 “1000000” snickers $9.99 4  Oz Yes Yes Yes “2000000” new3 $6.4 16  Oz Yes Yes Yes “3000000” almonds $9.99 8  Oz Yes No Yes “4000000” foo $34.63 16  Oz No Yes Yes “5000000” bar $22.54 16  Oz Yes Yes Yes “7000000” new2 $6.4 16  Oz Yes Yes Yes “8000000” coke $9.99 16  Oz Yes Yes Yes “9000000” new1 $2.5 16  Oz Yes Yes Yes HBase  Architecture  |  Column  family
  • 52. 52©  Cloudera,  Inc.  All  rights  reserved. HBase  Architecture  |  Column  family • Set  of  columns  that  have  similar  access  patterns • Data  stored  in  separate  files • Tune  performance • In-­‐memory • Compression • Version  retention  policies • Cache  priority • Needs  to  be  specified  by  the  user
  • 53. 53©  Cloudera,  Inc.  All  rights  reserved. HBase  Architecture info: name info: price info: weight “1000000” snickers $9.99 4  Oz “2000000” new3 $6.4 16  Oz “3000000” almonds $9.99 8  Oz available:   store1 available:   store2 available:   store3 “1000000” Yes Yes Yes “2000000” Yes Yes Yes “3000000” Yes No Yes Region Column Family Column Family
  • 54. 54©  Cloudera,  Inc.  All  rights  reserved. HBase  Architecture  |  Put 1. Client  creates  a  row  to  put. 2. Client  checks  which  RegionServer  hosts  this  row 3. Row  is  written  into  write-­‐ahead  log  (WAL)  on  disk  for  persistence 4. Row  is  written  to  MemStore  (in  memory  storage)
  • 55. 55©  Cloudera,  Inc.  All  rights  reserved. HBase  Architecture  |  Put RegionServer  2 Put Region Column  Family Column  Family MemStore (RAM) MemStore (RAM) WAL  (disk)
  • 56. 56©  Cloudera,  Inc.  All  rights  reserved. HBase  Architecture  |  HFile Region RegionServer  2 Put Column  Family Column  Family FlushFlush MemStore (RAM) MemStore (RAM) WAL  (disk) On  Disk On  Disk HFiles
  • 57. 57©  Cloudera,  Inc.  All  rights  reserved. HBase  Architecture  |  HFile • As  more  rows  are  added  to  the  table • Write  to  WAL • Write  to  MemStore • When  MemStore  gets  full,  its  contents  are  flushed  to  HDFS  as  Hfiles
  • 58. 58©  Cloudera,  Inc.  All  rights  reserved. HBase  Architecture  |  HFile Region RegionServer  2 Column  Family Column  Family MemStore (RAM) MemStore (RAM) On  Disk On  Disk • Side  effect  of  more  HFiles in   a  Region • More  disk  seeks  on  read • More  disk  usage   (overwritten  value,   deletes,  etc)
  • 59. 59©  Cloudera,  Inc.  All  rights  reserved. HBase  Architecture  |  Compactions Merging  multiple  Hfiles into  a  single  Hfile. • Minor  compactions • Major  compactions
  • 60. 60©  Cloudera,  Inc.  All  rights  reserved. HBase  Architecture Region RegionServer  2 Column  Family Column  Family On  Disk On  Disk MemStore (RAM) MemStore (RAM)
  • 61. 61©  Cloudera,  Inc.  All  rights  reserved. HBase  Architecture Region RegionServer  2 Column  Family Column  Family On  Disk On  Disk MemStore (RAM) MemStore (RAM)
  • 62. 62©  Cloudera,  Inc.  All  rights  reserved. HBase  Architecture Region RegionServer  2 Column  Family Column  Family On  Disk On  Disk MemStore (RAM) MemStore (RAM)
  • 63. 63©  Cloudera,  Inc.  All  rights  reserved. HBase  Architecture  |  Compactions • Minor  compactions • Can  be  finely  tuned  using  multiple  configurations • Controlled  by  policy  (pluggable)   • Major  compactions • Periodic  (set  from  configuration)  or  manually  triggered. • Tend  to  be  run  during  off-­‐peak  times.
  • 64. 64©  Cloudera,  Inc.  All  rights  reserved. HBase  Architecture  |  Splits • Eventually,  Regions  become  imbalanced. • HBase  can  split  a  Region  into  two. • Daughter  Regions  are  can  be  moved  to  a  different  RegionServer,  if  necessary. Gives  HBase  horizontal  scalability!
  • 65. 65©  Cloudera,  Inc.  All  rights  reserved. HBase  Architecture  |  Splits Region RegionServer  2 10 20 30 40 50 60 70 80 90 100 • Region  has  rows  [0,  100) • Splitting • Middle  of  current  Region’s   row  range • All  column  families  are  split
  • 66. 66©  Cloudera,  Inc.  All  rights  reserved. HBase  Architecture  |  Splits Region RegionServer  2 10 20 30 40 50 • Splits • [0,  50) • [50,  100) 60 70 80 90 100
  • 67. 67©  Cloudera,  Inc.  All  rights  reserved. HBase  Architecture  |  Bulk  Import • This  write  path  is  durable,  but  if  you’re  importing  a  lot  of  data • Every  put  goes  into  WAL,  which  means  disk  seeks.  Lots  of  puts  mean  lots  of  disk   seeks. • Lots  of  data  into  MemStores means  lots  of  flushing  to  disk. • Lots  of  flushing  to  disk  might  mean  lots  of  compactions. So  we  have  other  better  options  like….
  • 68. 68©  Cloudera,  Inc.  All  rights  reserved. HBase  Architecture  |  Bulk  Loading • Bypass  conventional  write  path • Extract  data  from  source • Transform  data  into  HFiles (done  with  MapReduce job)  directly • Tell  RegionServers  to  serve  these  HFiles
  • 69. 69©  Cloudera,  Inc.  All  rights  reserved. HBase  Architecture  |  APIs • Java  API • Most  full-­‐featured. • REST  API • Easily  accessible. • Thrift  API • Support  for  many  languages  (e.g.  C,  C++,  Perl,  Ruby,  Python).
  • 70. 70©  Cloudera,  Inc.  All  rights  reserved. Enough  of   Architecture
  • 71. 71©  Cloudera,  Inc.  All  rights  reserved. That’s  all  folks!
  • 72. 72©  Cloudera,  Inc.  All  rights  reserved.10/17/14  Strata+Hadoop   world  2014.    George   and  Hsieh Try  Hadoop  Now cloudera.com/live
  • 73. 73©  Cloudera,  Inc.  All  rights  reserved.10/17/14  Strata+Hadoop   world  2014.    George   and  Hsieh Join  the  Discussion Get  community   help  or  provide   feedback cloudera.com/community
  • 74. 74©  Cloudera,  Inc.  All  rights  reserved. Sources • A  Survey  of  HBase  Application  Archetypes • Lars  George,  Jon  Hsieh • http://www.slideshare.net/HBaseCon/case-­‐studies-­‐session-­‐7 • Hadoop   and  HBase:  Motivations,   Use  cases  and  Trade-­‐offs • Jon  Hsieh’s   slides
  • 75. 75©  Cloudera,  Inc.  All  rights  reserved. Questions  ?