Apache HBase Application
Archetypes
Lars George | @larsgeorge | Cloudera EMEA Chief Architect | HBase PMC
Jonathan Hsieh | @jmhsieh | Cloudera HBase Tech lead | HBase PMC
HBaseCon 2014
May 5th, 2014
5/5/14 HBaseCon 2014; Lars George,
Jon Hsieh
About Lars and Jon
Lars George
• EMEA Chief Architect
@Cloudera
• Apache HBase PMC
• O’Reilly Author of HBase – The
Definitive Guide
• Contact
• lars@cloudera.com
• @larsgeorge
Jon Hsieh
• Tech Lead HBase Team
@Cloudera
• Apache HBase PMC
• Apache Flume founder
• Contact:
• jon@cloudera.com
• @jmhsieh
About Supporting HBase at Cloudera
• Supporting Customers using HBase since 2011
• HBase Training
• Professional Services
• Team has experience supporting and running HBase since 2009
• 8 committers on staff
• 2 HBase book authors
• As of Jan 2014, ~20,000 HBase nodes (in aggregate) under
management
• Information in this presentation is either aggregated customer data
or from public sources.
An Apache HBase Timeline
[Timeline figure, 2008 to 2015]
• Summer ’09: StumbleUpon goes production on HBase ~0.20
• Apr ’11: CDH3 GA with HBase 0.90.1
• Summer ’11: Messages on HBase
• Summer ’11: Web Crawl Cache
• Sept ’11: HBase TDG published
• Nov ’11: Cassini on HBase
• May ’12: HBaseCon 2012
• Nov ’12: HBase in Action published
• Jan ’13: Phoenix on HBase
• Jun ’13: HBaseCon 2013
• Aug ’13: Flurry 1k-1k node cluster replication
• Jan ’14: Cloudera has ~20k HBase nodes under management
• May ’14: HBaseCon 2014
• Summer ’14: HBase v1.0.0 released
Apache HBase “Nascar” Slide
Outline
• Definitions
• Archetypes
• The Good
• The Bad
• The Maybe
• Conclusion
A vocabulary for HBase Archetypes
Definitions
Defining HBase Archetypes
• There are a lot of HBase applications
• Some successful, some less so
• They have common architecture patterns
• They have common tradeoffs
• Archetypes are common architecture patterns
• Common across multiple use-cases
• Extracted to be repeatable
• Our Goal: Define patterns à la “Gang of Four” (Gamma, Helm,
Johnson, Vlissides)
So you want to use HBase?
• What data is being stored?
• Entity data
• Event data
• Why is the data being stored?
• Operational use cases
• Analytical use cases
• How does the data get in and out?
• Real time vs. Batch
• Random vs. Sequential
What is being stored?
There are primarily two kinds of big data workloads, and they have
different storage requirements: entities and events.
Entity Centric Data
• Entity data is information about current state
• Generally real time reads and writes
• Examples:
• Accounts
• Users
• Geolocation points
• Click Counts and Metrics
• Current sensor readings
• Scales up with # of Humans and # of Machines/Sensors
• Billions of distinct entities
Event Centric Data
• Event centric data are time-series data points recording successive points
spaced over time intervals.
• Generally real time write, some combination of real time read or batch read
• Examples:
• Sensor data over time
• Historical Stock Ticker data
• Historical Metrics
• Clicks time-series
• Scales up due to finer grained intervals, retention policies, and the passage
of time
Events about Entities
• The majority of big data use cases deal with event-based data
• |Entities| * |Events| = Big data
• When you ask questions, do you home in on the entity first?
• When you ask questions, do you home in on time ranges first?
• Your answer will help you determine where and how to store
your data.
Why are you storing the data?
• So what kind of questions are you asking the data?
• Entity-centric questions
• Give me everything about entity e
• Give me the most recent event v about entity e
• Give me the n most recent events V about entity e
• Give me all events V about e between time [t1,t2]
• Event and Time-centric questions
• Give me an aggregate on each entity between time [t1,t2]
• Give me an aggregate on each time interval for entity e
• Find events V that match some other given criteria
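As a rough illustration of why these two families of questions behave differently on a key-sorted store, the sketch below uses an in-memory sorted list as a stand-in for a table whose row key is (entity, timestamp). The sample data, the key layout, and all function names are assumptions made for this example; none of this is HBase API.

```python
from bisect import bisect_left, bisect_right

# Hypothetical stand-in for a table sorted by row key (entity, timestamp).
rows = sorted([
    (("sensor-a", 100), "v1"),
    (("sensor-a", 200), "v2"),
    (("sensor-a", 300), "v3"),
    (("sensor-b", 150), "w1"),
])
keys = [k for k, _ in rows]

def events_for(entity, t1=float("-inf"), t2=float("inf")):
    """Entity-centric: events about `entity` in [t1, t2] form one contiguous key range."""
    lo = bisect_left(keys, (entity, t1))
    hi = bisect_right(keys, (entity, t2))
    return rows[lo:hi]

def most_recent(entity, n=1):
    """Entity-centric: the n most recent events about `entity`, newest first."""
    return events_for(entity)[-n:][::-1]

def events_in_window(t1, t2):
    """Time-centric: with this key layout every row has to be examined."""
    return [r for r in rows if t1 <= r[0][1] <= t2]
```

With the entity leading the key, the entity-centric questions touch only a small slice of the data, while the time-centric question degenerates into a full scan. Putting time first in the key would reverse that trade-off.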
How does data get in and out of HBase?
• In: HBase Client (Put, Incr, Append), Bulk Import, HBase Replication
• Out: HBase Client (Gets, short scans), HBase Scanner (full scans, MapReduce), HBase Replication
How does data get in and out of HBase?
• Low latency: HBase Client writes (Put, Incr, Append) and reads (Gets, short scans)
• High throughput: Bulk Import in; full scans and MapReduce out via HBase Scanner
• HBase Replication carries data both in and out
What system is most efficient?
• It is all physics
• You have a limited I/O budget
• Use all your I/O by parallelizing access and reading/writing sequentially.
• Choose the system and features that reduce I/O in general
• Pick the system best suited to your workload
[Figure: IOPS per disk]
The physics of Hadoop Storage Systems
Workload         | HBase                             | HDFS
Low latency      | ms, cached                        | mins, MR; + seconds, Impala
Random Read      | primary index                     | - index?, small files problem
Short Scan       | sorted                            | + partition
Full Scan        | 0 live table; + (MR on snapshots) | MR, Hive, Impala
Random Write     | log structured                    | - not supported
Sequential Write | HBase overhead; bulk load         | minimal overhead
Updates          | log structured                    | - not supported
The Archetypes
HBase Applications
HBase application use cases
• The Good
• Simple Entities
• Messaging Store
• Graph Store
• Metrics Store
• The Bad
• Large Blobs
• Naïve RDBMS port
• Analytic Archive
• The Maybe
• Time series DB
• Combined workloads
Archetypes: The Good
HBase, you are my soul mate.
Archetype: Simple Entities
• Purely entity data, no relation between entities
• Batch or real-time, random writes
• Real-time, random reads
• Could be a well-done denormalized RDBMS port.
• Often from many different sources, with poly-structured data
• Schema:
• Row per entity
• Row key => entity ID, or hash of entity ID
• Col qualifier => Property / field, possibly time stamp
• Examples:
• Geolocation data
• Search index building
• Use Solr to make text data searchable.
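A minimal sketch of the row-per-entity schema above, using a plain dictionary in place of a table. The hash-prefix key layout (a short SHA-256 prefix followed by the entity ID), the `put`/`get` helpers, and the sample properties are all assumptions for illustration; the slide only says the row key is the entity ID or a hash of it.

```python
import hashlib

def entity_row_key(entity_id: str, prefix_len: int = 4) -> bytes:
    """Hypothetical row key: short hash prefix + entity ID.

    The prefix spreads sequential IDs across the key space; keeping the ID
    after it means the original ID can still be read back from the key.
    """
    raw = entity_id.encode("utf-8")
    return hashlib.sha256(raw).digest()[:prefix_len] + raw

# One row per entity; each property is a column qualifier -> value pair.
table = {}

def put(entity_id: str, properties: dict) -> None:
    """Random write: merge properties into the entity's row."""
    table.setdefault(entity_row_key(entity_id), {}).update(properties)

def get(entity_id: str) -> dict:
    """Random read: fetch the whole row for one entity."""
    return table.get(entity_row_key(entity_id), {})
```

Because every read and write names exactly one row key, data arriving from different sources can add qualifiers to the same row without coordinating on a fixed set of columns.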
Simple Entities access pattern
[Figure: data paths for this archetype: client Put/Incr/Append and Get/short scan (low latency), Bulk Import and full scan/MapReduce via HBase Scanner (high throughput), HBase Replication in and out, plus Solr]
Archetype: Messaging Store
• Messaging Data:
• Realtime Random writes: Emails, SMS, MMS, IM
• Realtime random updates: Msg read, starred, moved, deleted
• Reading of top-N entries, sorted by time
• Records are of varying size
• Some time series, but mostly random read/write
• Schema:
• Row = users/feed/inbox
• Row key = UID or UID + time
• Column Qualifier = time or conversation id + time.
• Use column families (CFs) for indexes.
• Examples:
• Facebook Messages, Xiaomi Messages
• Telco SMS/MMS services
• Feeds like Tumblr, Pinterest
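One way to serve "top-N entries, sorted by time" from a UID + time row key is to store the timestamp reversed, so that the newest message for a user sorts first. The slides do not say which encoding is used; the reversed timestamp, the zero-byte separator, and the helper names below are assumptions for this sketch.

```python
import struct

MAX_LONG = 2**63 - 1  # largest signed 64-bit integer

def inbox_row_key(uid: str, ts_millis: int) -> bytes:
    """Hypothetical key: UID, a zero-byte separator, then the reversed timestamp."""
    reversed_ts = MAX_LONG - ts_millis
    return uid.encode("utf-8") + b"\x00" + struct.pack(">q", reversed_ts)

def timestamp_of(row_key: bytes) -> int:
    """Recover the original timestamp from the last 8 bytes of the key."""
    (reversed_ts,) = struct.unpack(">q", row_key[-8:])
    return MAX_LONG - reversed_ts

def top_n(sorted_keys, uid: str, n: int):
    """First n keys for `uid` in sorted order, which are the n newest messages."""
    prefix = uid.encode("utf-8") + b"\x00"
    return [k for k in sorted_keys if k.startswith(prefix)][:n]
```

In a real table the prefix filter would be a short scan starting at the user's prefix and stopping after N rows, so reading an inbox page never touches older messages.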
Facebook Messages - Statistics
Source: HBaseCon 2012 - Anshuman Singh
Messages Access Pattern
[Figure: data paths for this archetype: client Put/Incr/Append and Get/short scan (low latency), Bulk Import and full scan/MapReduce via HBase Scanner (high throughput), HBase Replication in and out]
Archetype: Graph Data
• Graph Data: All entities and relations
• Batch or realtime, random writes
• Batch or realtime, random reads
• It’s an entity with relation edges
• Schema:
• Row = Node.
• Row key => Node ID.
• Col qualifier => Edge ID, or properties:values
• Examples:
• Web Caches – Yahoo!, Trend Micro
• Titan Graph DB with HBase storage backend
• Sessionization (financial transactions, click streams, network traffic)
• Government (connect the bad guy)
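The node-per-row schema above can be sketched with a dictionary of rows. The `edge:` and `prop:` qualifier prefixes, the sample graph, and the function names are assumptions chosen for the example, not a layout taken from the slides or from any particular graph database.

```python
# Hypothetical stand-in: row key = node ID; "edge:" qualifiers hold outgoing
# edges, "prop:" qualifiers hold node properties.
graph = {
    "alice": {"edge:bob": "follows", "edge:carol": "follows", "prop:type": "user"},
    "bob": {"edge:carol": "follows", "prop:type": "user"},
    "carol": {"edge:dave": "follows", "prop:type": "user"},
    "dave": {"prop:type": "user"},
}

def neighbors(node: str) -> list:
    """Outgoing edges of one node: a single-row read."""
    row = graph.get(node, {})
    return sorted(q[len("edge:"):] for q in row if q.startswith("edge:"))

def within_hops(start: str, hops: int) -> set:
    """Nodes reachable in at most `hops` edges; one row read per visited node."""
    seen, frontier = {start}, {start}
    for _ in range(hops):
        frontier = {n for node in frontier for n in neighbors(node)} - seen
        seen |= frontier
    return seen - {start}
```

Each traversal step is a batch of random reads by row key, which is why this archetype fits the random read and write profile listed above.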
Graph Data Access Pattern
[Figure: data paths for this archetype: client Put/Incr/Append and Get/short scan (low latency), Bulk Import and full scan/MapReduce via HBase Scanner (high throughput), HBase Replication in and out]
Archetype: Metrics
• Frequently updated Metrics
• Increments
• Roll ups generated by MR and bulk loaded to HBase
• Poor man’s datacubes
• Examples
• Campaign Impression/Click counts (Ad tech)
• Sensor data (Energy, Manufacturing, Auto)
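A small sketch of the increment-based metrics pattern, with a dictionary standing in for counter cells. The key layout (campaign and day in the row key, metric and hour in the qualifier) and every name here are assumptions for illustration; the slides do not prescribe a layout.

```python
from collections import defaultdict
from datetime import datetime, timezone

# (row_key, qualifier) -> count; stands in for counter cells in a table.
counters = defaultdict(int)

def increment(campaign: str, metric: str, ts: float, amount: int = 1) -> int:
    """Bump the hourly counter for one campaign metric and return its new value."""
    t = datetime.fromtimestamp(ts, tz=timezone.utc)
    row_key = f"{campaign}|{t:%Y%m%d}"   # one row per campaign per day
    qualifier = f"{metric}:{t:%H}"       # one column per metric per hour
    counters[(row_key, qualifier)] += amount
    return counters[(row_key, qualifier)]

def daily_total(campaign: str, metric: str, day: str) -> int:
    """Roll up the hourly counters of one row into a daily figure."""
    row_key = f"{campaign}|{day}"
    return sum(v for (r, q), v in counters.items()
               if r == row_key and q.startswith(metric + ":"))
```

Keeping one column per hour in a per-day row is a simple form of the "poor man's datacube": the daily figure comes from reading a single row, and coarser rollups can be computed in batch and bulk loaded.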
Metrics Access Pattern
[Figure: data paths for this archetype: client Put/Incr/Append and Get/short scan (low latency), Bulk Import and full scan/MapReduce via HBase Scanner (high throughput), HBase Replication in and out]
Archetypes: The Bad
These are not the droids you are looking for
Current HBase weak spots
• HBase’s architecture can handle a lot
• We make engineering trade-offs to optimize for the workloads it handles well.
• HBase can still do things it is not optimal for.
• However, other systems are fundamentally more efficient for some
workloads.
• We’ve often seen folks forcing apps into HBase.
• If one of these is your only workload on this data, use another system
• If you are in a mixed workload case, some of these become “maybes”.
• Just because it is not good today doesn’t mean it can’t be better
tomorrow.
Bad Archetype: Large Blob Store
• Saving large objects >3MB per cell
• Schema:
• Normal entity pattern, but with some columns with large cells.
• Examples
• Raw photo or video storage in HBase
• Large frequently updated structs as a single cell
• Problems:
• Will get crushed by write amplification when data is reoptimized for reads
(compactions on large, unchanging data).
• Will crush the write pipeline if large structs have frequently updated subfields.
Cells are atomic, so HBase must rewrite the entire cell.
• Some work is under way on adding LOB support
• This requires new architecture elements
Bad Archetype: Naïve RDBMS port
• A naïve port of an RDBMS onto HBase, directly copying the schema.
• Schema
• Many tables, just like an RDBMS schema.
• Row key: primary key or auto-incrementing key, like RDBMS schema
• Column qualifiers: field names
• Manually do joins, or secondary indexes (not consistent)
• Solution:
• HBase is not a SQL Database.
• No multi-region/multi-table transactions in HBase (yet).
• You must denormalize your schema to use HBase.
Large blob store, Naïve RDBMS port access patterns
[Figure: data paths for these archetypes: client Put/Incr/Append and Get/short scan (low latency), Bulk Import and full scan/MapReduce via HBase Scanner (high throughput), HBase Replication in and out]
Bad Archetype: Analytic archive
• Store purely chronological data, partitioned by time
• Real time writes, chronological time as primary index
• Column-centric aggregations over all rows.
• Bulk reads out, generally for generating periodic reports
• Schema
• Row key: date+xxx or salt+date+xxx
• Column qualifiers: properties with data or counters
• Example
• Machine logs organized by date.
• Full fidelity clickstream
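To make the salt+date+xxx row key concrete, here is a sketch of a salted key and of the read fan-out it implies. The bucket count, the CRC32-based salt, the `|` separator, and the function names are assumptions for this example.

```python
import zlib

NUM_BUCKETS = 8  # hypothetical number of salt buckets

def salted_key(date: str, event_id: str) -> str:
    """Hypothetical key: two-digit salt, then date, then event ID."""
    salt = zlib.crc32(event_id.encode("utf-8")) % NUM_BUCKETS
    return f"{salt:02d}|{date}|{event_id}"

def scan_ranges(date: str) -> list:
    """One (start, stop) key range per salt bucket; stop is exclusive.

    '~' sorts after '|', so every key with prefix '<salt>|<date>|' falls
    inside its bucket's range.
    """
    return [(f"{s:02d}|{date}|", f"{s:02d}|{date}~") for s in range(NUM_BUCKETS)]
```

Without the salt, all writes for the current date land in one contiguous key range. With it, writes spread over the buckets, but reading one date now takes one scan per bucket instead of a single scan.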
Bad Archetype: Analytic archive Problems
• HBase is non-optimal when this is the primary use case.
• Will get crushed by frequent full table scans.
• Will get crushed by large compactions.
• Will get crushed by write-side region hot spotting.
• Instead
• Store in HDFS; Use Parquet columnar data storage + Impala/Hive
• Build rollups in HDFS+MR; store and serve rollups in HBase
Analytic Archive access patterns
[Figure: data paths for this archetype: client Put/Incr/Append and Get/short scan (low latency), Bulk Import and full scan/MapReduce via HBase Scanner (high throughput), HBase Replication in and out]
Archetypes: The Maybe
And this is crazy | But here’s my data, | serve it, maybe!
The Maybes
• For some applications, doing it right gets complicated.
• These more sophisticated or nuanced cases require considering
these questions:
• When do you choose HBase vs HDFS storage for time series data?
• Are there times where bad archetypes are ok?
Time Series: in HBase or HDFS?
• IO Patterns:
• Reads: Collocate related data
• Make reads cheap and fast.
• Writes: Spread writes out as much as possible
• Maximize write throughput
• HBase: Tension between these goals
• Spreading writes spreads data making reads inefficient
• Colocating on write causes hotspots, underutilizes resources by limiting write
throughput
• HDFS: The sweet spot.
• Sequential writes and sequential reads.
• Just write more files in date-dirs; physically spreads writes but logically groups data.
• Reads for time-centric queries just read the files in the date-dir
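The date-dir layout can be sketched as a pair of path helpers. The base path and the `dt=YYYY-MM-DD` directory naming are assumptions for the example (the slides only say "date-dirs"), and the paths are computed without touching any file system.

```python
from datetime import datetime, timedelta, timezone
from pathlib import PurePosixPath

def date_dir(base: str, ts: float) -> PurePosixPath:
    """Directory that receives files for events with timestamp `ts` (UTC day)."""
    day = datetime.fromtimestamp(ts, tz=timezone.utc)
    return PurePosixPath(base) / f"dt={day:%Y-%m-%d}"

def dirs_for_range(base: str, start_ts: float, end_ts: float) -> list:
    """Directories a time-centric query over [start_ts, end_ts] has to read."""
    day = datetime.fromtimestamp(start_ts, tz=timezone.utc).date()
    last = datetime.fromtimestamp(end_ts, tz=timezone.utc).date()
    dirs = []
    while day <= last:
        dirs.append(PurePosixPath(base) / f"dt={day:%Y-%m-%d}")
        day += timedelta(days=1)
    return dirs
```

Many writers can add files to the same date-dir in parallel, so writes are spread physically, while a query for a time range only lists and reads the matching directories.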
Time Series data flows
• Ingest
• Flume or similar direct tool via app
• HDFS
• Batch queries and generate rollups in Hive/MR
• Faster queries in Impala
• Not for user-facing, real-time serving
• HBase for recent, HDFS for historical
• HBase
• Serve individual events
• Serve pre-computed aggregates
Archetype: Entity Time Series
• A time series access pattern suitable for HBase
• Random writes of event data, random reads of specific events or aggregate data
• Generate aggregates via counters; don’t compute aggregates directly at
query time
• HBase is system of record
• Schema:
• Rowkey: entity-timestamp or hash(entity)-timestamp, possibly with salt
added after entity.
• Col qualifiers: property
• Use custom aggregation to consolidate old data
• Use TTLs to bound and age off old data
• Examples:
• OpenTSDB does this well for numeric values; it lazily aggregates cells for
better performance.
• Facebook Insights, ODS
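The hash(entity)-timestamp row key can be sketched as follows. Bucketing timestamps by hour and storing the offset within the hour in the column qualifier is one common refinement, assumed here for illustration; the bucket size, the 4-byte hash prefix, and the function names are hypothetical, and this is not a description of OpenTSDB's actual format. Hash collisions between entities are ignored.

```python
import hashlib
import struct

BUCKET_SECONDS = 3600  # hypothetical: one row per entity per hour

def entity_prefix(entity: str) -> bytes:
    """Fixed-width hash of the entity so all of its rows share one key prefix."""
    return hashlib.sha256(entity.encode("utf-8")).digest()[:4]

def row_key(entity: str, ts: int) -> bytes:
    """hash(entity) followed by the hour-aligned base timestamp (big-endian)."""
    base = ts - ts % BUCKET_SECONDS
    return entity_prefix(entity) + struct.pack(">I", base)

def qualifier(ts: int) -> bytes:
    """Offset of the event within its hour bucket."""
    return struct.pack(">H", ts % BUCKET_SECONDS)

def scan_range(entity: str, t1: int, t2: int):
    """(start, stop) row keys covering [t1, t2] for one entity; stop is exclusive."""
    return row_key(entity, t1), row_key(entity, t2 + BUCKET_SECONDS)
```

"Give me all events about e between [t1,t2]" becomes a single short scan over `scan_range(e, t1, t2)`, while writes for different entities land in different parts of the key space.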
Entity Time Series access pattern
[Figure: data paths for this archetype: Flume or a custom app writing via client Put/Incr/Append, client Get/short scan (low latency), Bulk Import and full scan/MapReduce via HBase Scanner (high throughput), HBase Replication in and out]
Archetype: Hybrid Entity Time Series
• Essentially a combo of the Metric Archetype and Entity Time
Series Archetype, with bulk loads of rollups via HDFS.
• Land data in HDFS and HBase
• Keep all data in HDFS for future use
• Aggregate in HDFS and write to HBase
• HBase can do some aggregates too (counters)
• Keep serve-able data in HBase.
• Use TTL to discard old values from HBase.
Hybrid time series access pattern
[Figure: data paths for this archetype: Flume landing data in HDFS and in HBase via client Put/Incr/Append, Hive or MR producing rollups for Bulk Import, client Get/short scan (low latency), full scan/MapReduce via HBase Scanner (high throughput), HBase Replication in and out]
Meta Archetype: Combined workloads
• In these cases, the use of HBase depends on the workload
• Cases where we have multiple workload styles
• In many cases we want to do multiple things with the same
data
• Primary use case (real time, random access)
• Secondary use case (analytical)
• Pick for your primary; here are some patterns for handling
your secondary.
Real time workloads and Analytical access
[Figure: one cluster serving client Get/Scan alongside high-throughput MapReduce full scans via HBase Scanner; the full scans interfere with latency, so client reads see poor latency]
Real time workloads and Analytical access
[Figure: with HBase Replication to a second cluster, low-latency client Get/Scan traffic is isolated from the high-throughput MapReduce full scans]
MR over Table Snapshots (0.98, CDH5.0)
• Previously, MapReduce jobs over HBase required an online full table scan
• Take a snapshot and run the MR job over the snapshot files
• Doesn’t use the HBase client
• Avoids affecting HBase caches
• 3-5x perf boost
• Still requires more IOPS than raw HDFS files
[Figure: MapReduce map and reduce tasks running over a table snapshot]
Analytic Archive access pattern
[Figure: data paths for this archetype: client Put/Incr/Append and Get/short scan (low latency), Bulk Import and full scan/MapReduce via HBase Scanner (high throughput), HBase Replication in and out]
Analytic Archive Snapshot access pattern
[Figure: the same data paths, with analytic reads moved off the live table: a table snapshot in HDFS is read by snapshot scan or MR at higher throughput, while client Gets and short scans stay low latency]
Multitenancy (in progress)
• We want to run MR for analytics while serving low-latency requests in one
cluster.
• Performance isolation
• Limit the performance impact that load on one table has on others (HBASE-6721)
• Request prioritization and scheduling
• Today the default is FIFO
• Need to schedule some requests before others (HBASE-10994)
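The effect of request scheduling can be illustrated with a toy queue. This is only a sketch of the general idea of serving cheap requests ahead of long scans; it is not how HBASE-10994 or any HBase scheduler is implemented, and the request names, cost units, and the assumption that all requests arrive at once are hypothetical.

```python
import heapq
from itertools import count

def fifo_order(requests):
    """Serve requests strictly in arrival order."""
    return [name for name, _cost in requests]

def shortest_first_order(requests):
    """Serve cheaper requests first; ties keep arrival order."""
    seq = count()
    heap = [(cost, next(seq), name) for name, cost in requests]
    heapq.heapify(heap)
    order = []
    while heap:
        _cost, _seq, name = heapq.heappop(heap)
        order.append(name)
    return order

def wait_before(order, requests, name):
    """Total cost of everything served before `name`."""
    costs = dict(requests)
    return sum(costs[n] for n in order[:order.index(name)])

# (name, cost) pairs in arrival order; one long scan among short gets.
requests = [("get-1", 1), ("scan-1", 50), ("get-2", 1), ("get-3", 1)]
```

Under FIFO, `get-3` waits behind the 50-unit scan; with the cheaper-first ordering it waits only for the two gets ahead of it, at the price of delaying the scan slightly.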
[Figure: request queues for a mixed workload vs. an isolated workload: in one, requests are delayed by long scan requests; in the other, requests are rescheduled so new requests get priority]
Conclusions
Big Data Workloads
[Figure: workload map. Vertical axis: low latency to batch. Horizontal axis: random access, short scan, full scan. Systems placed on the map: HBase; HBase + MR; HBase + Snapshots -> HDFS + MR; HDFS + Impala; HDFS + MR (Hive/Pig)]
Big Data Workloads
[Figure: the same workload map with archetypes overlaid: Current Metrics, Graph data, Simple Entities, Messages, Hybrid Entity Time series + Rollup serving, Hybrid Entity Time series + Rollup generation, Entity Time series, Index building, Analytic archive]
HBase is evolving to be an Operational Database
• Excels at consistent, single-row-centric operations
• Dev efforts aimed at using all machine resources efficiently,
reducing MTTR, and improving latency predictability.
• Projects built on HBase that enable secondary indexing and
multi-row transactions
• Apache Phoenix (incubating) or Impala provide a SQL skin for
simplified application development
• Analytic workloads?
• Can be done but will be beaten by direct HDFS +
MR/Spark/Impala
Questions?
@larsgeorge
@jmhsieh

Weitere ähnliche Inhalte

Was ist angesagt?

Getting Started with HBase
Getting Started with HBaseGetting Started with HBase
Getting Started with HBaseCarol McDonald
 
Hive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilHive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilDatabricks
 
HBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon
 
Designing Structured Streaming Pipelines—How to Architect Things Right
Designing Structured Streaming Pipelines—How to Architect Things RightDesigning Structured Streaming Pipelines—How to Architect Things Right
Designing Structured Streaming Pipelines—How to Architect Things RightDatabricks
 
HBase Storage Internals
HBase Storage InternalsHBase Storage Internals
HBase Storage InternalsDataWorks Summit
 
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...Chester Chen
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Flink Forward
 
Introduction to memcached
Introduction to memcachedIntroduction to memcached
Introduction to memcachedJurriaan Persyn
 
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAApache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAAdam Doyle
 
Apache Con 2021 : Apache Bookkeeper Key Value Store and use cases
Apache Con 2021 : Apache Bookkeeper Key Value Store and use casesApache Con 2021 : Apache Bookkeeper Key Value Store and use cases
Apache Con 2021 : Apache Bookkeeper Key Value Store and use casesShivji Kumar Jha
 
Hudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesHudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesNishith Agarwal
 
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataDataWorks Summit
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive QueriesOwen O'Malley
 
Top 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQLTop 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQLJim Mlodgenski
 
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin Huai
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin HuaiA Deep Dive into Spark SQL's Catalyst Optimizer with Yin Huai
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin HuaiDatabricks
 
TiDB Introduction
TiDB IntroductionTiDB Introduction
TiDB IntroductionMorgan Tocker
 
Best Practice in Accelerating Data Applications with Spark+Alluxio
Best Practice in Accelerating Data Applications with Spark+AlluxioBest Practice in Accelerating Data Applications with Spark+Alluxio
Best Practice in Accelerating Data Applications with Spark+AlluxioAlluxio, Inc.
 
Application Timeline Server - Past, Present and Future
Application Timeline Server - Past, Present and FutureApplication Timeline Server - Past, Present and Future
Application Timeline Server - Past, Present and FutureVARUN SAXENA
 

Was ist angesagt? (20)

Getting Started with HBase
Getting Started with HBaseGetting Started with HBase
Getting Started with HBase
 
Hive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilHive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas Patil
 
HBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ Salesforce
 
MyRocks Deep Dive
MyRocks Deep DiveMyRocks Deep Dive
MyRocks Deep Dive
 
Designing Structured Streaming Pipelines—How to Architect Things Right
Designing Structured Streaming Pipelines—How to Architect Things RightDesigning Structured Streaming Pipelines—How to Architect Things Right
Designing Structured Streaming Pipelines—How to Architect Things Right
 
HBase Storage Internals
HBase Storage InternalsHBase Storage Internals
HBase Storage Internals
 
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
 
Introduction to memcached
Introduction to memcachedIntroduction to memcached
Introduction to memcached
 
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAApache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEA
 
Apache Con 2021 : Apache Bookkeeper Key Value Store and use cases
Apache Con 2021 : Apache Bookkeeper Key Value Store and use casesApache Con 2021 : Apache Bookkeeper Key Value Store and use cases
Apache Con 2021 : Apache Bookkeeper Key Value Store and use cases
 
Hudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesHudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilities
 
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big Data
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive Queries
 
Top 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQLTop 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQL
 
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin Huai
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin HuaiA Deep Dive into Spark SQL's Catalyst Optimizer with Yin Huai
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin Huai
 
YARN Federation
YARN Federation YARN Federation
YARN Federation
 
TiDB Introduction
TiDB IntroductionTiDB Introduction
TiDB Introduction
 
Best Practice in Accelerating Data Applications with Spark+Alluxio
Best Practice in Accelerating Data Applications with Spark+AlluxioBest Practice in Accelerating Data Applications with Spark+Alluxio
Best Practice in Accelerating Data Applications with Spark+Alluxio
 
Application Timeline Server - Past, Present and Future
Application Timeline Server - Past, Present and FutureApplication Timeline Server - Past, Present and Future
Application Timeline Server - Past, Present and Future
 

Andere mochten auch

HBaseCon 2015: Industrial Internet Case Study using HBase and TSDB
HBaseCon 2015: Industrial Internet Case Study using HBase and TSDBHBaseCon 2015: Industrial Internet Case Study using HBase and TSDB
HBaseCon 2015: Industrial Internet Case Study using HBase and TSDBHBaseCon
 
HBase for Architects
HBase for ArchitectsHBase for Architects
HBase for ArchitectsNick Dimiduk
 
Improvements to Apache HBase and Its Applications in Alibaba Search
Improvements to Apache HBase and Its Applications in Alibaba Search Improvements to Apache HBase and Its Applications in Alibaba Search
Improvements to Apache HBase and Its Applications in Alibaba Search HBaseCon
 
HBase杂谈
HBase杂谈HBase杂谈
HBase杂谈Joseph Pan
 
HBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web Archiving
HBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web ArchivingHBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web Archiving
HBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web ArchivingHBaseCon
 
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...Cloudera, Inc.
 
Real-time HBase: Lessons from the Cloud
Real-time HBase: Lessons from the CloudReal-time HBase: Lessons from the Cloud
Real-time HBase: Lessons from the CloudHBaseCon
 
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...HBaseCon
 
A Survey of HBase Application Archetypes

  • 1. Apache HBase Application Archetypes
      • Lars George | @larsgeorge | Cloudera EMEA Chief Architect | HBase PMC
      • Jonathan Hsieh | @jmhsieh | Cloudera HBase Tech Lead | HBase PMC
      • HBaseCon 2014, May 5th, 2014
  • 2. About Lars and Jon
      • Lars George
          • EMEA Chief Architect @Cloudera
          • Apache HBase PMC
          • O’Reilly author of HBase: The Definitive Guide
          • Contact: lars@cloudera.com, @larsgeorge
      • Jon Hsieh
          • Tech Lead, HBase Team @Cloudera
          • Apache HBase PMC
          • Apache Flume founder
          • Contact: jon@cloudera.com, @jmhsieh
  • 3. About Supporting HBase at Cloudera
      • Supporting customers using HBase since 2011
      • HBase Training
      • Professional Services
      • Team has experience supporting and running HBase since 2009
      • 8 committers on staff
      • 2 HBase book authors
      • As of Jan 2014, ~20,000 HBase nodes (in aggregate) under management
      • Information in this presentation is either aggregated customer data or from public sources.
  • 4. An Apache HBase Timeline (2008 to 2015) • Summer ’09: StumbleUpon goes production on HBase ~0.20 • Apr ’11: CDH3 GA with HBase 0.90.1 • Summer ’11: Messages on HBase • Summer ’11: Web Crawl Cache • Sept ’11: HBase TDG published • Nov ’11: Cassini on HBase • May ’12: HBaseCon 2012 • Nov ’12: HBase in Action published • Jan ’13: Phoenix on HBase • Jun ’13: HBaseCon 2013 • Aug ’13: Flurry 1k-1k node cluster replication • Jan ’14: Cloudera has ~20k HBase nodes under management • May ’14: HBaseCon 2014 • Summer ’14: HBase v1.0.0 released
  • 5. Apache HBase “Nascar” Slide
  • 6. Outline • Definitions • Archetypes • The Good • The Bad • The Maybe • Conclusion
  • 7. Definitions: A vocabulary for HBase Archetypes
  • 8. Defining HBase Archetypes • There are a lot of HBase applications • Some successful, some less so • They have common architecture patterns • They have common tradeoffs • Archetypes are common architecture patterns • Common across multiple use-cases • Extracted to be repeatable • Our Goal: Define patterns à la “Gang of Four” (Gamma, Helm, Johnson, Vlissides)
  • 9. So you want to use HBase? • What data is being stored? • Entity data • Event data • Why is the data being stored? • Operational use cases • Analytical use cases • How does the data get in and out? • Real time vs. Batch • Random vs. Sequential
  • 10. What is being stored? There are primarily two kinds of big data workloads. They have different storage requirements. • Entities • Events
  • 11. Entity Centric Data • Entity data is information about current state • Generally real time reads and writes • Examples: • Accounts • Users • Geolocation points • Click Counts and Metrics • Current Sensor Readings • Scales up with # of Humans and # of Machines/Sensors • Billions of distinct entities
  • 12. Event Centric Data • Event centric data are time-series data points recording successive points spaced over time intervals. • Generally real time write, some combination of real time read or batch read • Examples: • Sensor data over time • Historical Stock Ticker data • Historical Metrics • Clicks time-series • Scales up due to finer grained intervals, retention policies, and the passage of time
  • 13. Events about Entities • The majority of Big Data use cases deal with event-based data • |Entities| * |Events| = Big data • When you ask questions, do you hone in on entity first? • When you ask questions, do you hone in on time ranges first? • Your answer will help you determine where and how to store your data.
  • 14. Why are you storing the data? • So what kind of questions are you asking the data? • Entity-centric questions • Give me everything about entity e • Give me the most recent event v about entity e • Give me the n most recent events V about entity e • Give me all events V about e between time [t1,t2] • Event and Time-centric questions • Give me an aggregate on each entity between time [t1,t2] • Give me an aggregate on each time interval for entity e • Find events V that match some other given criteria
  • 15. How does data get in and out of HBase? • Diagram: data in via HBase Client (Put, Incr, Append), Bulk Import, and HBase Replication; data out via HBase Client (gets, short scans), HBase Scanner (full scan, MapReduce), and HBase Replication
  • 16. How does data get in and out of HBase? • Diagram: data in via HBase Client (Put, Incr, Append), Bulk Import, and HBase Replication; data out via HBase Client (Get, Scan: gets, short scans), HBase Scanner (full scan, MapReduce), and HBase Replication • Paths labeled low latency and high throughput
  • 17. What system is most efficient? • It is all physics • You have a limited I/O budget • Use all your I/O by parallelizing access and reading/writing sequentially. • Choose the system and features that reduce I/O in general • Pick the system best suited to your workload • Figure label: IOPs/s/disk
  • 18. The physics of Hadoop Storage Systems • Workload: HBase | HDFS • Low latency: ms, cached | mins, MR + seconds, Impala • Random Read: primary index | - index?, small files problem • Short Scan: sorted | + partition • Full Scan: 0 live table, + (MR on snapshots) | MR, Hive, Impala • Random Write: log structured | - not supported • Sequential Write: HBase overhead, bulk load | minimal overhead • Updates: log structured | - not supported
  • 21. The Archetypes HBase Applications
  • 22. HBase application use cases • The Good • Simple Entities • Messaging Store • Graph Store • Metrics Store • The Bad • Large Blobs • Naïve RDBMS port • Analytic Archive • The Maybe • Time series DB • Combined workloads
  • 23. Archetypes: The Good HBase, you are my soul mate.
  • 24. Archetype: Simple Entities • Purely entity data, no relation between entities • Batch or real-time, random writes • Real-time, random reads • Could be a well-done denormalized RDBMS port. • Often from many different sources, with poly-structured data • Schema: • Row per entity • Row key => entity ID, or hash of entity ID • Col qualifier => Property / field, possibly time stamp • Geolocation data • Search index building • Use Solr to make text data searchable.
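The "hash of entity ID" row key from the Simple Entities schema can be sketched as follows. This is a minimal illustration, not HBase client code: the function name `entity_row_key`, the choice of MD5, and the 4-byte prefix length are all assumptions made for the example.

```python
import hashlib


def entity_row_key(entity_id: str, prefix_len: int = 4) -> bytes:
    """Return <first prefix_len bytes of MD5(entity_id)> + <entity_id>.

    The hash prefix scatters sequential IDs across the key space; keeping the
    raw ID as a suffix means distinct IDs always yield distinct keys.
    (MD5 and the prefix length are illustrative assumptions.)
    """
    raw = entity_id.encode("utf-8")
    return hashlib.md5(raw).digest()[:prefix_len] + raw


if __name__ == "__main__":
    # Sequential IDs no longer sort next to each other once prefixed.
    for key in sorted(entity_row_key(f"user{n:05d}") for n in range(5)):
        print(key.hex())
```

Because the key is computed from the entity ID alone, a random read can still rebuild it and issue a single Get; what is given up is the ability to scan entities in ID order.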
  • 25. Simple Entities access pattern • Diagram: data in via HBase Client (Put, Incr, Append), Bulk Import, and HBase Replication; data out via HBase Client (Get, Scan: gets, short scans), HBase Scanner (full scan, MapReduce), and HBase Replication, with Solr attached • Paths labeled low latency and high throughput
  • 26. Archetype: Messaging Store • Messaging Data: • Realtime random writes: Emails, SMS, MMS, IM • Realtime random updates: Msg read, starred, moved, deleted • Reading of top-N entries, sorted by time • Records are of varying size • Some time series, but mostly random read/write • Schema: • Row = users/feed/inbox • Row key = UID or UID + time • Column Qualifier = time or conversation id + time. • Use CFs (column families) for indexes. • Examples: • Facebook Messages, Xiaomi Messages • Telco SMS/MMS services • Feeds like Tumblr, Pinterest
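One way to serve "top-N entries, sorted by time" with a "UID + time" row key is to store the time reversed, so the newest entry sorts first within a user's prefix. The slide does not say the time is reversed; that is an assumption of this sketch, as are the names `inbox_row_key`, `timestamp_of`, `top_n`, and the `\x00` separator. A sorted Python list stands in for the sorted table.

```python
import bisect
import struct

MAX_TS = 2**63 - 1  # largest signed 64-bit value (assumed upper bound for timestamps)


def inbox_row_key(uid: str, ts_millis: int) -> bytes:
    """Row key = UID + separator + (MAX_TS - timestamp), big-endian."""
    return uid.encode("utf-8") + b"\x00" + struct.pack(">Q", MAX_TS - ts_millis)


def timestamp_of(row_key: bytes) -> int:
    """Recover the original timestamp from the last 8 bytes of the key."""
    return MAX_TS - struct.unpack(">Q", row_key[-8:])[0]


def top_n(sorted_keys: list, uid: str, n: int) -> list:
    """Emulate a short scan: seek to the UID prefix, read at most n rows."""
    prefix = uid.encode("utf-8") + b"\x00"
    start = bisect.bisect_left(sorted_keys, prefix)
    hits = []
    for key in sorted_keys[start:start + n]:
        if not key.startswith(prefix):
            break  # ran past this user's rows
        hits.append(key)
    return hits


if __name__ == "__main__":
    table = sorted(
        [inbox_row_key("alice", t) for t in (1000, 3000, 2000)]
        + [inbox_row_key("bob", 1500)]
    )
    print([timestamp_of(k) for k in top_n(table, "alice", 2)])
```

With this layout the top-N read is a short scan from the start of the user's key range, which matches the low-latency read path rather than a full scan.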
  • 27. Facebook Messages - Statistics Source: HBaseCon 2012 - Anshuman Singh
  • 28. Messages Access Pattern • Diagram: data in via HBase Client (Put, Incr, Append), Bulk Import, and HBase Replication; data out via HBase Client (Get, Scan: gets, short scans), HBase Scanner (full scan, MapReduce), and HBase Replication • Paths labeled low latency and high throughput
  • 29. Archetype: Graph Data • Graph Data: All entities and relations • Batch or realtime, random writes • Batch or realtime, random reads • It’s an entity with relation edges • Schema: • Row = Node. • Row key => Node ID. • Col qualifier => Edge ID, or properties:values • Examples: • Web Caches – Yahoo!, Trend Micro • Titan Graph DB with HBase storage backend • Sessionization (financial transactions, click streams, network traffic) • Government (connect the bad guy)
  • 30. Graph Data Access Pattern • Diagram: data in via HBase Client (Put, Incr, Append), Bulk Import, and HBase Replication; data out via HBase Client (Get, Scan: gets, short scans), HBase Scanner (full scan, MapReduce), and HBase Replication • Paths labeled low latency and high throughput
  • 31. Archetype: Metrics • Frequently updated Metrics • Increments • Roll ups generated by MR and bulk loaded to HBase • Poor man’s datacubes • Examples • Campaign Impression/Click counts (Ad tech) • Sensor data (Energy, Manufacturing, Auto)
  • 32. Metrics Access Pattern • Diagram: data in via HBase Client (Put, Incr, Append), Bulk Import, and HBase Replication; data out via HBase Client (Get, Scan: gets, short scans), HBase Scanner (full scan, MapReduce), and HBase Replication • Paths labeled low latency and high throughput
  • 33. Archetypes: The Bad These are not the droids you are looking for
  • 34. Current HBase weak spots • HBase’s architecture can handle a lot • We make engineering trade-offs to optimize for them. • HBase can still do things it is not optimal for. • However, other systems are fundamentally more efficient for some workloads. • We’ve often seen some folks forcing apps into HBase. • If one of these is your only workload on this data, use another system • If you are in a mixed workload case, some of these become “maybes”. • Just because it is not good today doesn’t mean it can’t be better tomorrow.
  • 35. Bad Archetype: Large Blob Store • Saving large objects >3MB per cell • Schema: • Normal entity pattern, but with some columns with large cells. • Examples • Raw photo or video storage in HBase • Large frequently updated structs as a single cell • Problems: • Will get crushed due to write amplification when reoptimizing data for read. (compactions on large unchanging data) • Will crush write pipeline if there are large structs with frequently updated subfields. Cells are atomic, and HBase must rewrite an entire cell. • Some work adding LOB support • This requires new architecture elements
  • 36. Bad Archetype: Naïve RDBMS port • A naïve port of an RDBMS onto HBase, directly copying the schema. • Schema • Many tables, just like an RDBMS schema. • Row key: primary key or auto-incrementing key, like RDBMS schema • Column qualifiers: field names • Manually do joins, or secondary indexes (not consistent) • Solution: • HBase is not a SQL Database. • No multi-region/multi-table transactions in HBase (yet). • You must denormalize your schema to use HBase.
  • 37. Large blob store, Naïve RDBMS port access patterns • Diagram: data in via HBase Client (Put, Incr, Append), Bulk Import, and HBase Replication; data out via HBase Client (Get, Scan: gets, short scans), HBase Scanner (full scan, MapReduce), and HBase Replication • Paths labeled low latency and high throughput
  • 38. Bad Archetype: Analytic archive • Store purely chronological data, partitioned by time • Real time writes, chronological time as primary index • Column-centric aggregations over all rows. • Bulk reads out, generally for generating periodic reports • Schema • Row key: date+xxx or salt+date+xxx • Column qualifiers: properties with data or counters • Example • Machine logs organized by date. • Full fidelity clickstream
  • 39. Bad Archetype: Analytic archive Problems • HBase is non-optimal when this is the primary use case. • Will get crushed by frequent full table scans. • Will get crushed by large compactions. • Will get crushed by write-side region hot spotting. • Instead • Store in HDFS; Use Parquet columnar data storage + Impala/Hive • Build rollups in HDFS+MR; store and serve rollups in HBase
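The "salt+date+xxx" row key from the Analytic archive schema trades one problem for another, which a small sketch makes visible. Everything named here is an assumption for illustration: the bucket count `NUM_BUCKETS`, CRC32 as the salt function, the `|` separator, and the helper names.

```python
import zlib
from collections import Counter

NUM_BUCKETS = 8  # assumed number of salt buckets


def salted_key(date: str, event_id: str) -> str:
    """Row key = salt + date + id; the salt is derived from the event ID."""
    salt = zlib.crc32(event_id.encode("utf-8")) % NUM_BUCKETS
    return f"{salt:02d}|{date}|{event_id}"


def scan_prefixes(date: str) -> list:
    """A read for one date must issue one prefix scan per salt bucket."""
    return [f"{salt:02d}|{date}|" for salt in range(NUM_BUCKETS)]


if __name__ == "__main__":
    keys = [salted_key("2014-05-05", f"evt{n}") for n in range(1000)]
    print(Counter(k[:2] for k in keys))   # writes spread over the buckets
    print(scan_prefixes("2014-05-05"))    # reads fan out over every bucket
```

The salt relieves write-side hot spotting, but every date-range read now fans out to all buckets, and the full-scan and compaction costs listed above are unchanged.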
  • 40. Analytic Archive access patterns • Diagram: data in via HBase Client (Put, Incr, Append), Bulk Import, and HBase Replication; data out via HBase Client (Get, Scan: gets, short scans), HBase Scanner (full scan, MapReduce), and HBase Replication • Paths labeled low latency and high throughput
  • 41. Archetypes: The Maybe • And this is crazy | But here’s my data, | serve it, maybe!
  • 42. The Maybe’s • For some applications, doing it right gets complicated. • These more sophisticated or nuanced cases require considering these questions: • When do you choose HBase vs HDFS storage for time series data? • Are there times where bad archetypes are ok?
  • 43. Time Series: in HBase or HDFS? • IO Patterns: • Reads: Colocate related data • Make reads cheap and fast. • Writes: Spread writes out as much as possible • Maximize write throughput • HBase: Tension between these goals • Spreading writes spreads data, making reads inefficient • Colocating on write causes hotspots and underutilizes resources by limiting write throughput • HDFS: The sweet spot. • Sequential writes and sequential reads. • Just write more files in date-dirs; this physically spreads writes but logically groups data. • Reads for time-centric queries just read files in the date-dir
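The date-dir layout described for HDFS can be sketched as a pure path computation. The `dt=YYYY-MM-DD` directory naming and the `/data/events` root are assumptions borrowed from common Hive-style partitioning, not something the slide specifies; no HDFS calls are made.

```python
from datetime import date, timedelta
from pathlib import PurePosixPath


def date_dir(root: str, day: date) -> PurePosixPath:
    """Directory that holds all files written for one day (assumed naming)."""
    return PurePosixPath(root) / f"dt={day.isoformat()}"


def dirs_for_range(root: str, start: date, end: date) -> list:
    """Directories a time-centric query over [start, end] needs to read."""
    days = (end - start).days
    return [date_dir(root, start + timedelta(days=i)) for i in range(days + 1)]


if __name__ == "__main__":
    for path in dirs_for_range("/data/events", date(2014, 5, 4), date(2014, 5, 6)):
        print(path)
```

Writers can append any number of files under the current day's directory, while a query over a time range only has to list and read the directories it names.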
  • 44. Time Series data flows • Ingest • Flume or similar direct tool via app • HDFS • Batch queries and generate rollups in Hive/MR • Faster queries in Impala • No user time serving • HBase for recent, HDFS for historical • HBase • Serve individual events • Serve pre-computed aggregates
  • 45. Archetype: Entity Time Series • A time series access pattern suitable for HBase • Random writes of event data, random reads of specific events or aggregate data • Generate aggregates via counters; don’t compute aggregates directly at query time • HBase is the system of record • Schema: • Row key: entity-timestamp or hash(entity)-timestamp, possibly with salt added after entity. • Col qualifiers: property • Use custom aggregation to consolidate old data • Use TTLs to bound and age off old data • Examples: • OpenTSDB does this well for numeric values; lazily aggregates cells for better performance. • Facebook Insights, ODS
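The "hash(entity)-timestamp" row key turns "all events about e between [t1,t2]" into one contiguous range scan. The sketch below assumes an 8-byte MD5 prefix and second-resolution timestamps, ignores hash collisions between entities, and uses a sorted Python list in place of the table; the helper names are hypothetical.

```python
import bisect
import hashlib
import struct


def ts_row_key(entity: str, ts_seconds: int) -> bytes:
    """Row key = fixed-width hash(entity) + big-endian timestamp."""
    prefix = hashlib.md5(entity.encode("utf-8")).digest()[:8]
    return prefix + struct.pack(">Q", ts_seconds)


def scan_bounds(entity: str, t1: int, t2: int) -> tuple:
    """Start (inclusive) and stop (exclusive) keys covering [t1, t2]."""
    return ts_row_key(entity, t1), ts_row_key(entity, t2 + 1)


def range_scan(sorted_keys: list, start: bytes, stop: bytes) -> list:
    """Emulate a scan over sorted row keys with bisect."""
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]


if __name__ == "__main__":
    table = sorted(
        [ts_row_key("sensor-1", t) for t in (100, 200, 300)]
        + [ts_row_key("sensor-2", t) for t in (150, 250)]
    )
    hits = range_scan(table, *scan_bounds("sensor-1", 150, 300))
    print([struct.unpack(">Q", k[8:])[0] for k in hits])
```

Hashing the entity spreads different entities across the key space, while the timestamp suffix keeps each entity's events contiguous and in time order.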
  • 46. Entity Time Series access pattern • Diagram: data in via Flume or a custom app through HBase Client (Put, Incr, Append), Bulk Import, and HBase Replication; data out via HBase Client (Get, Scan: gets, short scans), HBase Scanner (full scan, MapReduce), and HBase Replication • Paths labeled low latency and high throughput
  • 47. Archetypes: Hybrid Entity Time Series • Essentially a combo of the Metric Archetype and Entity Time Series Archetype, with bulk loads of rollups via HDFS. • Land data in HDFS and HBase • Keep all data in HDFS for future use • Aggregate in HDFS and write to HBase • HBase can do some aggregates too (counters) • Keep serve-able data in HBase. • Use TTL to discard old values from HBase.
  • 48. Hybrid time series access pattern • Diagram: Flume lands data in HDFS and in HBase via HBase Client (Put, Incr, Append); Hive or MR feeds Bulk Import; HBase Replication in and out; data out via HBase Client (Get, Scan: gets, short scans) and HBase Scanner (full scan, MapReduce) • Paths labeled low latency and high throughput
  • 49. Meta Archetype: Combined workloads • In these cases, the use of HBase depends on workload • Cases where we have multiple workload styles. • In many cases we want to do multiple things with the same data • primary use case (real time, random access) • secondary use case (analytical) • Pick for your primary; here are some patterns for how to do your secondary.
  • 50. Real time workloads and Analytical access • Diagram: HBase Client (Put, Incr, Append), Bulk Import, and HBase Replication write in; HBase Client (Get, Scan) and HBase Scanner (MapReduce, high throughput) read out • Callouts: poor latency! Full scans interfere with latency!
  • 51. Real time workloads and Analytical access • Diagram: HBase Client (Put, Incr, Append), Bulk Import, and HBase Replication write in; HBase Client (Get, Scan) reads with low latency, isolated from full scans; HBase Replication feeds HBase Scanner (MapReduce) for high throughput
  • 52. MR over Table Snapshots (0.98, CDH5.0) • Previously, MapReduce jobs over HBase required an online full table scan • Take a snapshot and run the MR job over the snapshot files • Doesn’t use the HBase client • Avoids affecting HBase caches • 3-5x perf boost • Still requires more IOPs than raw HDFS files • Diagram: two MapReduce jobs (map and reduce tasks), one labeled snapshot
  • 53. Analytic Archive access pattern • Diagram: data in via HBase Client (Put, Incr, Append), Bulk Import, and HBase Replication; data out via HBase Client (Get, Scan: gets, short scans), HBase Scanner (full scan, MapReduce), and HBase Replication • Paths labeled low latency and high throughput
  • 54. Analytic Archive Snapshot access pattern • Diagram: data in via HBase Client (Put, Incr, Append), Bulk Import, and HBase Replication; low-latency gets and short scans via HBase Client; a table snapshot on HDFS read by snapshot scan / MR (HBase Scanner) for higher throughput; HBase Replication out
  • 55. Multitenancy (in progress) • We want to run MR for analytics while serving low-latency requests in one cluster. • Performance Isolation • Limit the performance impact that load on one table has on others. (HBASE-6721) • Request prioritization and scheduling • Today the default is FIFO • Need to schedule some requests before others (HBASE-10994) • Figure labels: mixed workload vs. isolated workload; delayed by long scan requests; rescheduled so new requests get priority
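Why FIFO hurts short requests queued behind a long scan can be shown with a toy single-worker model. This is only an illustration of the scheduling idea; it is not the mechanism of HBASE-10994 or of any HBase scheduler, and the request names, costs, and the cheapest-first policy are assumptions.

```python
import heapq


def completion_times(order: list) -> dict:
    """Map request name -> finish time when requests run one at a time."""
    clock, done = 0, {}
    for name, cost in order:
        clock += cost
        done[name] = clock
    return done


def fifo(requests: list) -> list:
    """Serve requests in arrival order."""
    return list(requests)


def cheapest_first(requests: list) -> list:
    """Serve the cheapest queued request first; ties keep arrival order."""
    heap = [(cost, seq, name) for seq, (name, cost) in enumerate(requests)]
    heapq.heapify(heap)
    ordered = []
    while heap:
        cost, _, name = heapq.heappop(heap)
        ordered.append((name, cost))
    return ordered


if __name__ == "__main__":
    queue = [("long-scan", 100), ("get-1", 1), ("get-2", 1)]
    print("FIFO:          ", completion_times(fifo(queue)))
    print("cheapest first:", completion_times(cheapest_first(queue)))
```

In this model the two gets finish at times 101 and 102 under FIFO but at 1 and 2 when rescheduled ahead of the scan, while the scan itself finishes only 2 time units later.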
  • 56. Conclusions
  • 57. Big Data Workloads • Chart axes: low latency vs. batch; random access, short scan, full scan • Systems plotted: HBase; HBase + MR; HBase + Snapshots -> HDFS + MR; HDFS + Impala; HDFS + MR (Hive/Pig)
  • 58. Big Data Workloads • Chart axes: low latency vs. batch; random access, short scan, full scan • Systems plotted: HBase; HBase + MR; HBase + Snapshots -> HDFS + MR; HDFS + Impala; HDFS + MR (Hive/Pig) • Archetypes plotted: Current Metrics; Graph data; Simple Entities; Messages; Hybrid Entity Time series + Rollup serving; Hybrid Entity Time series + Rollup generation; Entity Time series; Index building; Analytic archive
  • 59. HBase is evolving to be an Operational Database • Excels at consistent single row centric operations • Dev efforts aimed at using all machine resources efficiently, reducing MTTR, and improving latency predictability. • Projects built on HBase that enable secondary indexing and multi-row transactions • Apache Phoenix (incubating) or Impala provide a SQL skin for simplified application development • Analytic workloads? • Can be done but will be beaten by direct HDFS + MR/Spark/Impala