The State of HBase Replication
Jean-Daniel Cryans
May 5th, 2014
©2014 Cloudera, Inc. All rights reserved.
About me
• Software Engineer at Cloudera, Storage team
• Apache HBase committer since 2008, PMC member
Motivation for HBase Replication
• Even though HBase is:
• distributed;
• fault-tolerant;
• highly available; and
• almost magic.
The Current State
• It’s production-ready.
• It’s used to replicate data between thousands
of nodes across continents.
• It’s used for Disaster Recovery, geo-distributed serving, and more.
Agenda
• Four Years of Replication
• Use Cases in Production
• Roadmap
Design
• Clusters are distinct
• Pull vs. push
• Sync vs. async
Clusters are Distinct
• HBase doesn’t span data centers (DCs) or HDFS instances
• .META. operations aren’t replicated
• Regions can be different
• Security has to be configured for each cluster
[Diagram: a master cluster with 20 region servers (RS) and a slave cluster with 15 RS]
Push instead of Pull
[Diagram: MySQL replication uses pull. The MySQL slave in Cluster B gets the binlog from the MySQL master in Cluster A and applies it locally.]
[Diagram: HBase replication uses push. An RS in Cluster A replicates entries to an RS in Cluster B, which applies them to its cluster.]
Async instead of Sync
[Diagram: Synchronous replication. A Put reaches an RS in Cluster A (HLog, MemStore) and is also sent as a Put to an RS in Cluster B (HLog, MemStore); the steps are numbered 1 to 8, with an Ack from each cluster.]
[Diagram: Asynchronous replication. In Cluster A, a Put to the RS (HLog, MemStore) is acknowledged within steps 1 to 4. Separately, an HLog tailing thread sends the Put to an RS in Cluster B (HLog, MemStore) and gets an Ack, in steps 1 to 5.]
First Release - 0.90.0
• Simple master-slave (only one)
• Disabled by default
• Uses ZK as a metadata store
Original Implementation
[Diagram: On a region server in the master cluster, the Replication Source (with a ZooKeeper watcher) calls replicateLogEntries() on a region server in the slave cluster, where the Replication Sink applies Puts and Deletes through HTable.]
First Lesson Learned
• HDFS doesn’t support tailing files being
written to. It requires:
• open()
• seek() // go where we stopped last time
• while (not (EOF || enoughData))
• read()
• close()
• repeat
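The loop above can be sketched in Python as follows. This is only an illustration of the open/seek/read/close cycle: a local file stands in for the HLog on HDFS, and the names read_new_bytes, demo_tail and max_bytes are made up for this sketch rather than taken from HBase.

```python
import os
import tempfile


def read_new_bytes(path, offset, max_bytes=64 * 1024):
    """Open, seek to the saved offset, read until EOF or max_bytes, close.

    Returns (data, new_offset). A local file stands in for the HLog (assumption).
    """
    chunks = []
    total = 0
    with open(path, "rb") as f:                 # open()
        f.seek(offset)                          # seek(): go where we stopped last time
        while total < max_bytes:                # stop once we have enough data
            chunk = f.read(min(4096, max_bytes - total))  # read()
            if not chunk:                       # EOF, at least for now
                break
            chunks.append(chunk)
            total += len(chunk)
    # close() happens when the with-block exits
    return b"".join(chunks), offset + total


def demo_tail():
    """Write, tail, write again, tail again from the remembered offset."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "ab") as writer:
            writer.write(b"edit-1\n")
        first, offset = read_new_bytes(path, 0)
        with open(path, "ab") as writer:
            writer.write(b"edit-2\n")
        second, offset = read_new_bytes(path, offset)   # repeat
        return first, second, offset
    finally:
        os.remove(path)
```

The caller has to remember the offset between passes, which is the extra bookkeeping that tailing a file this way requires.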
Second Lesson Learned
• Single threaded, non-batched ZK is slow
• ZK didn’t have an atomic move operation
• Doubles # ops needed, race conditions
[Diagram: moving queue 1 from RS1 to RS2 in ZooKeeper]
/hbase/replication/RS1/1/hlog1
/hbase/replication/RS1/1/hlog2
...
/hbase/replication/RS2/1-RS1/hlog1
1. create new hlog2
2. delete old hlog2
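A rough sketch of why the missing atomic move hurts. A plain dict stands in for the znode tree, and move_queue and ops_log are hypothetical names for this illustration, not ZooKeeper or HBase APIs. Each hlog needs a create and then a delete, and a failure between the two leaves the hlog listed under both region servers.

```python
def move_queue(znodes, old_rs, new_rs, peer_id, ops_log):
    """Move every hlog znode of old_rs's queue under new_rs, one at a time."""
    old_prefix = f"/hbase/replication/{old_rs}/{peer_id}/"
    new_prefix = f"/hbase/replication/{new_rs}/{peer_id}-{old_rs}/"
    # Materialize the list first so the dict can be modified while looping.
    for path in sorted(p for p in znodes if p.startswith(old_prefix)):
        hlog = path[len(old_prefix):]
        znodes[new_prefix + hlog] = znodes[path]    # 1. create new
        ops_log.append(("create", new_prefix + hlog))
        # A crash at this point leaves the hlog under both region servers.
        del znodes[path]                            # 2. delete old
        ops_log.append(("delete", path))
    return znodes


znodes = {
    "/hbase/replication/RS1/1/hlog1": b"position",
    "/hbase/replication/RS1/1/hlog2": b"position",
}
ops = []
move_queue(znodes, "RS1", "RS2", "1", ops)
```

Two hlogs cost four operations here, which is the doubling the slide refers to.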
Second Release - 0.92.0
• Cyclic replication
• Multi-slave (scope LOCAL or GLOBAL)
• Enable / disable peer
• Special configurations
Cyclic Replication
[Diagram: Clusters 1, 2, and 3 replicate in a cycle. A Put on Row X at Cluster 1 is replicated to Cluster 2, then to Cluster 3. Before it would go back to Cluster 1: "Row X is from 1. Don’t replicate!"]
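A minimal sketch of the idea, assuming each edit carries the ID of the cluster that first accepted it. The Edit and Cluster classes are invented for this illustration, and replication is modeled as a direct synchronous call for brevity, whereas the real mechanism is asynchronous.

```python
from dataclasses import dataclass, field


@dataclass
class Edit:
    row: str
    value: str
    origin: int   # ID of the cluster that first accepted the write


@dataclass
class Cluster:
    cluster_id: int
    peers: list = field(default_factory=list)
    rows: dict = field(default_factory=dict)
    received: int = 0

    def put(self, row, value):
        """A client write: tag the edit with this cluster's ID."""
        self.apply(Edit(row, value, origin=self.cluster_id))

    def apply(self, edit):
        self.received += 1
        self.rows[edit.row] = edit.value
        for peer in self.peers:
            if peer.cluster_id == edit.origin:
                continue          # the row is from that cluster: don't replicate
            peer.apply(edit)


c1, c2, c3 = Cluster(1), Cluster(2), Cluster(3)
c1.peers, c2.peers, c3.peers = [c2], [c3], [c1]   # 1 -> 2 -> 3 -> 1
c1.put("X", "v1")
```

Each cluster sees the edit exactly once; without the origin check the Put would circulate forever.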
Multi-Slave
[Diagram: A Put on Row X at Cluster 1 is replicated to both Cluster 2 and Cluster 3.]
Enable / Disable Peers
> disable_peer '2'
[Diagram: On Cluster 1, the tailing thread asks "Is the peer enabled?" before shipping to the RS on Cluster 2. While the peer is disabled, HLogs keep piling up on Cluster 1.]
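A toy model of that behavior, under the assumption that the tailing thread simply skips shipping while the peer is disabled. ReplicationSource, roll_hlog and run_once are names made up for this sketch and do not mirror the HBase classes.

```python
from collections import deque


class ReplicationSource:
    """Toy tailing thread: ships queued HLogs only while the peer is enabled."""

    def __init__(self):
        self.peer_enabled = True
        self.queue = deque()    # HLogs waiting to be shipped
        self.shipped = []

    def roll_hlog(self, name):
        """A new HLog was created on the source cluster."""
        self.queue.append(name)

    def run_once(self):
        """One pass of the tailing thread; returns how many HLogs were shipped."""
        if not self.peer_enabled:       # Is the peer enabled?
            return 0                    # nothing is dropped, it just waits
        count = 0
        while self.queue:
            self.shipped.append(self.queue.popleft())
            count += 1
        return count


source = ReplicationSource()
source.peer_enabled = False             # disable_peer
for name in ("hlog1", "hlog2", "hlog3"):
    source.roll_hlog(name)
backlog_while_disabled = (source.run_once(), len(source.queue))
source.peer_enabled = True              # enable_peer
shipped_after_enable = source.run_once()
```

The backlog is the point of the diagram: disabling a peer pauses shipping but the source still has to keep every HLog until the peer comes back.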
Special Configurations
• KEEP_DELETED_CELLS
• Must be used on slaves with replication when
deleting data.
• MIN_VERSIONS
• With TTL, makes it easy to configure a slave that
contains only the last few days of data.
Third Lesson Learned
• It’s easy to DDoS yourself.
• Replication was using the normal handlers...
• ... and using them to write back!
[Diagram: the handlers of a region server. Handler1: Put, Handler2: Delete, Handler3: Replicate, Handler4: Get, Handler5: Put. The replicated Put goes in the queue.]
Fourth Lesson Learned
• Instinctively, what would something called
stop_replication do?
• Good intentions, bad outcomes, HBASE-8861
[Slide graphic: start/stop_replication, crossed out]
Third Release - 0.96.0 / 0.98.0
• Replication enabled by default!
• Completely refactored for readability/extensibility (Chris Trezzo)
• ReplicationSyncUp tool (HBASE-9047)
• Throttling (HBASE-9501)
• Finer grained replication controls (HBASE-8751)
ReplicationSyncUp Tool
• Works on an offline cluster
• Can finish replicating the queues in ZK
• Useful to finish draining a master cluster
[Diagram: two clusters, each with HBase, HDFS, and ZooKeeper, with ReplicationSyncUp between them]
Finer Grained Replication Controls
> set_peer_tableCFs '2', "table1; table2:cf1,cf2; table3:cfA,cfB"
• Meaning: enable replication to peer #2 for:
• All of table1
• cf1 and cf2 from table2
• cfA and cfB from table3
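To make the format of that string concrete, here is a small parser for it. This is an illustrative sketch only: parse_table_cfs and should_replicate are hypothetical helpers, not HBase code, and a value of None is this sketch's way of saying "all column families".

```python
def parse_table_cfs(spec):
    """Parse 'table1; table2:cf1,cf2' into {table: [families] or None}."""
    result = {}
    for entry in spec.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        table, sep, cfs = entry.partition(":")
        families = [cf.strip() for cf in cfs.split(",") if cf.strip()] if sep else []
        result[table.strip()] = families or None   # None means every family
    return result


def should_replicate(table_cfs, table, family):
    """True if edits for this table and column family go to the peer."""
    if table not in table_cfs:
        return False
    families = table_cfs[table]
    return families is None or family in families


peer_2 = parse_table_cfs("table1; table2:cf1,cf2; table3:cfA,cfB")
```

With this mapping, should_replicate(peer_2, "table2", "cf3") is False while any family of table1 passes.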
Flurry
• Two data centers, coast to coast
• Three clusters, in master-master pairs
• 1200 nodes
• 800 nodes
• 30 nodes
• Replication traffic: 2 Gbps
• Latency between DCs: 85 ms
Opower
• Two clusters, same data center
• Master: tens of nodes
• Slave: tens of nodes
• Replication traffic: 1GB/day
• Bulk load replication traffic: 180GB/day
• Recent use case
Lily HBase Indexer
• Collaboration between NGData & Cloudera.
• NGData are the creators of the Lily data management platform.
• Lily HBase Indexer
• Service which acts as an HBase replication listener.
• Custom sink writes to SolrCloud.
• Integrates Cloudera Morphlines library for ETL of rows.
Stop Relying on Permanent Znodes
• Current rule is to never rely on znodes to survive cluster restarts, upgrades, etc.
• State data should be kept in an HBase table.
• Notification done through a new mechanism
• See: https://issues.apache.org/jira/browse/HBASE-10295
Define a Replication Interface
• Replication is somewhat extensible but it lacks stable interfaces.
• The HBase Indexer is such an extension and it required surgery every time a committer sneezed.
• See: https://issues.apache.org/jira/browse/HBASE-10504
Distributed Counters
• Incrementing consists of:
1. Taking a lock;
2. Get’ing the current value; and
3. Put’ing the newly incremented value.
• This breaks in Master-Master because the Puts are overwriting each other.
• See https://issues.apache.org/jira/browse/HBASE-2804
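The lost update can be shown with two dicts standing in for two master clusters. This is a simplified sketch: the lock, timestamps and versions are left out, and increment is a hypothetical helper rather than the HBase Increment API.

```python
def increment(store, row, amount):
    """Get the current value, then Put the incremented value (lock omitted)."""
    current = store.get(row, 0)         # Get
    new_value = current + amount
    store[row] = new_value              # Put
    return (row, new_value)             # the Put that replication will ship


cluster_a = {"hits": 10}
cluster_b = {"hits": 10}

put_from_a = increment(cluster_a, "hits", 1)    # A now holds 11
put_from_b = increment(cluster_b, "hits", 5)    # B now holds 15

# Replication ships the resulting Puts, and each one overwrites the remote value.
cluster_b[put_from_a[0]] = put_from_a[1]        # B: 15 -> 11
cluster_a[put_from_b[0]] = put_from_b[1]        # A: 11 -> 15
```

Neither cluster ends up with 16, the value both increments together should have produced, because what travels is the final value and not the delta.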
More Tooling
• Replication management console, one shell to rule all the clusters!
• Replication bootstrapping tool.
• Tool that can move queues between region servers.
• Tool that can throttle replication on a live cluster.
Questions?
• Or ping me async:
• @jdcryans
• jdcryans@cloudera.com
• jdcryans on #hbase irc.freenode.net
Machine Learning Software Engineering Patterns and Their EngineeringHironori Washizaki
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfFerryKemperman
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfMarharyta Nedzelska
 
Post Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on IdentityPost Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on Identityteam-WIBU
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Mater
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Velvetech LLC
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)jennyeacort
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprisepreethippts
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZABSYZ Inc
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationBradBedford3
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...OnePlan Solutions
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作qr0udbr0
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfStefano Stabellini
 

Kürzlich hochgeladen (20)

Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive Goal
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. Salesforce
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their Engineering
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdf
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdf
 
Post Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on IdentityPost Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on Identity
 
2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprise
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZ
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
Advantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your BusinessAdvantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your Business
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdf
 

The State of HBase Replication

  • 1. The State of HBase Replication • Jean-Daniel Cryans • May 5th, 2014
  • 2. ©2014 Cloudera, Inc. All rights reserved. About me • Software Engineer at Cloudera, Storage team • Apache HBase committer since 2008, PMC member
  • 3-8. Motivation for HBase Replication • Even though HBase is: • distributed; • fault-tolerant; • highly available; and • almost magic.
  • 9-11. The Current State • It’s production-ready. • It’s used to replicate data between thousands of nodes across continents. • It’s used for Disaster Recovery, geo-distributed serving, and more.
  • 12. Agenda • Four Years of Replication • Use Cases in Production • Roadmap
  • 13. Design • Clusters are distinct • Pull vs. push • Sync vs. async
  • 14-17. Clusters are Distinct • HBase doesn’t span DCs, HDFSs • .META. operations aren’t replicated • Regions can be different • Security has to be configured for each cluster • [Diagram: a master cluster with 20 RS and a slave cluster with 15 RS]
  • 18. Push instead of Pull • MySQL replication uses pull • [Diagram: MySQL Master in Cluster A, MySQL Slave in Cluster B; labels: Get binlog, Apply locally]
  • 19. Push instead of Pull • HBase replication uses push • [Diagram: an RS in Cluster A and an RS in Cluster B; labels: replicate entries, Apply to cluster]
  • 20-22. Async instead of Sync • Synchronous replication • [Diagram: Cluster A and Cluster B, each with RS, HLog, and MemStore; a Put and an Ack on each cluster, numbered as steps 1 to 8]
  • 23-25. Async instead of Sync • Asynchronous replication • [Diagram: Cluster A (RS, HLog, MemStore) with Put and Ack, steps 1 to 4; Cluster B (RS, HLog, MemStore) with HLog Tailing Thread, Put, and Ack, steps 1 to 5]
  • 26. First Release - 0.90.0 • Simple master-slave (only one) • Disabled by default • Uses ZK as a metadata store
  • 27. Original Implementation • [Diagram: Replication Source and ZooKeeper Watcher on a region server of the master cluster; a replicateLogEntries() call to a region server of the slave cluster, where Replication Sink applies Put and Delete through HTable]
  • 28. First Lesson Learned • HDFS doesn’t support tailing files being written to. It requires: • open() • seek() // go where we stopped last time • while (not EOF || enoughData) • read() • close() • repeat
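
The open/seek/read/close cycle listed on slide 28 can be sketched as follows. This is a minimal illustration, not the HBase code: a local file stands in for an HLog on HDFS, and the names `read_new_bytes`, `max_bytes`, and `demo` are hypothetical. Each pass reads until it reaches EOF or has collected enough data, then returns the offset to resume from.

```python
import os
import tempfile

def read_new_bytes(path, offset, max_bytes=64 * 1024):
    """One pass: open, seek to where we stopped, read until EOF or enough data, close."""
    chunks = []
    total = 0
    with open(path, "rb") as f:              # open()
        f.seek(offset)                       # seek(): go where we stopped last time
        while total < max_bytes:             # stop once we have enough data
            chunk = f.read(min(4096, max_bytes - total))  # read()
            if not chunk:                    # EOF for this pass
                break
            chunks.append(chunk)
            total += len(chunk)
    # close() happens when the with-block exits
    return b"".join(chunks), offset + total

def demo():
    """Write, read, append, read again: the second pass only sees the new bytes."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "hlog")       # stand-in for a log file being written to
        with open(path, "wb") as f:
            f.write(b"edit1\n")
        first, pos = read_new_bytes(path, 0)
        with open(path, "ab") as f:
            f.write(b"edit2\n")
        second, pos = read_new_bytes(path, pos)   # repeat
        return first, second, pos
```

The caller has to remember the returned offset between passes, which is the "where we stopped last time" state that the seek() step depends on.
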
  • 29-30. Second Lesson Learned • Single threaded, non-batched ZK is slow • ZK didn’t have an atomic move operation • Doubles # ops needed, race conditions • [Diagram: znodes /hbase/replication/RS1/1/hlog1, hlog2, ... and /hbase/replication/RS2/1-RS1/hlog1; labels: 1. create new hlog2, 2. delete old hlog2]
  • 31. Second Release - 0.92.0 • Cyclic replication • Multi-slave (scope LOCAL or GLOBAL) • Enable / disable peer • Special configurations
  • 32-35. Cyclic Replication • [Diagram: Cluster 1, Cluster 2, Cluster 3; a Put Row X label is added on each build; final label: Row X is from 1, Don’t replicate!]
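
The rule on the Cyclic Replication slides ("Row X is from 1, Don’t replicate!") can be sketched with a toy model. This is an assumption-laden illustration: each edit carries the ID of the cluster where it originated, and a cluster skips any peer whose ID matches that origin. The `Cluster` class, its fields, and `build_ring` are hypothetical, and shipping is shown as a direct synchronous call, whereas the deck describes replication as asynchronous.

```python
from dataclasses import dataclass, field

@dataclass
class Cluster:
    cluster_id: int
    peers: list = field(default_factory=list)
    rows: dict = field(default_factory=dict)
    applied: int = 0  # how many times this cluster applied an edit

    def put(self, row, value, origin=None):
        # A client write originates here; a replicated write keeps its origin.
        origin = self.cluster_id if origin is None else origin
        self.rows[row] = value
        self.applied += 1
        for peer in self.peers:
            if peer.cluster_id == origin:
                continue  # the edit came from this peer: don't replicate it back
            peer.put(row, value, origin)

def build_ring():
    """Three clusters replicating in a cycle: 1 -> 2 -> 3 -> 1."""
    c1, c2, c3 = Cluster(1), Cluster(2), Cluster(3)
    c1.peers, c2.peers, c3.peers = [c2], [c3], [c1]
    return c1, c2, c3
```

With `c1, c2, c3 = build_ring()` and `c1.put("X", "v")`, every cluster ends up holding row X and Cluster 1 applies the edit only once, because Cluster 3 sees that the edit originated at Cluster 1 and does not send it back.
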
  • 36-38. Multi-Slave • [Diagram: Cluster 1, Cluster 2, Cluster 3; a Put Row X label is added on each build, three in total]
  • 39-45. Enable / Disable Peers • > disable_peer '2' • [Diagram: Cluster 1 (RS, HLog, HLog Tailing Thread) and Cluster 2 (RS); label: Is the peer enabled?; one more HLog is added on each build]
  • 46-47. Special Configurations • KEEP_DELETED_CELLS • Must be used on slaves with replication when deleting data. • MIN_VERSIONS • With TTL, makes it easy to configure a slave that contains only the last few days of data.
  • 48. Third Lesson Learned • It’s easy to DDoS yourself. • Replication was using the normal handlers... • ... and using them to write back! • [Diagram: Handler1: Put, Handler2: Delete, Handler3: Replicate, Handler4: Get, Handler5: Put; label: Replicated Put goes in the queue]
  • 49-50. Fourth Lesson Learned • Instinctively, what would something called stop_replication do? • Good intentions, bad outcomes, HBASE-8861 • [Diagram: start/stop_replication marked with an X]
  • 51. Third Release - 0.96.0 / 0.98.0 • Replication enabled by default! • Completely refactored for readability/extensibility (Chris Trezzo) • ReplicationSyncUp tool (HBASE-9047) • Throttling (HBASE-9501) • Finer grained replication controls (HBASE-8751)
  • 52. ReplicationSyncUp Tool • Works on an offline cluster • Can finish replicating the queues in ZK • Useful to finish draining a master cluster • [Diagram: two stacks of HBase, HDFS, and ZooKeeper, with ReplicationSyncUp between them]
  • 53. Finer Grained Replication Controls • > set_peer_tableCFs '2', "table1; table2:cf1,cf2; table3:cfA,cfB" • Meaning: enable replication to peer #2 for: • All of table1 • cf1 and cf2 from table2 • cfA and cfB from table3
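
To make the table/column-family string on slide 53 concrete, here is a small sketch that parses it and answers "is this family replicated to the peer?". It is not HBase's own parser: `parse_table_cfs` and `is_replicated` are hypothetical helpers, and using `None` to mean "all column families" is an assumption of this sketch.

```python
def parse_table_cfs(spec):
    """Parse 'table1; table2:cf1,cf2' into {table: [families] or None for all}."""
    result = {}
    for entry in spec.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        table, _, cfs = entry.partition(":")
        families = [cf.strip() for cf in cfs.split(",") if cf.strip()]
        result[table.strip()] = families or None  # None: the whole table
    return result

def is_replicated(table_cfs, table, family):
    """True if edits to table:family would be sent to the peer."""
    if table not in table_cfs:
        return False
    families = table_cfs[table]
    return families is None or family in families

TABLE_CFS = parse_table_cfs("table1; table2:cf1,cf2; table3:cfA,cfB")
```

For the string from the slide, `is_replicated(TABLE_CFS, "table1", "anything")` is true, `is_replicated(TABLE_CFS, "table2", "cf3")` is false, and tables that are not listed are not replicated to that peer.
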
  • 54. Agenda • Four Years of Replication • Use Cases in Production • Roadmap
  • 55. Flurry • Two data centers, coast to coast • Three clusters, in master-master pairs • 1200 nodes • 800 nodes • 30 nodes • Replication traffic: 2 Gbps • Latency between DCs: 85 ms
  • 56. Opower • Two clusters, same data center • Master: tens of nodes • Slave: tens of nodes • Replication traffic: 1 GB/day • Bulk load replication traffic: 180 GB/day • Recent use case
  • 57. Lily HBase Indexer • Collaboration between NGData & Cloudera. • NGData are the creators of the Lily data management platform. • Lily HBase Indexer • Service which acts as an HBase replication listener. • Custom sink writes to SolrCloud. • Integrates Cloudera Morphlines library for ETL of rows.
  • 58. Agenda • Four Years of Replication • Use Cases in Production • Roadmap
  • 59. Stop Relying on Permanent Znodes • Current rule is to never rely on znodes to survive cluster restarts, upgrades, etc. • State data should be kept in an HBase table. • Notification done through a new mechanism • See: https://issues.apache.org/jira/browse/HBASE-10295
  • 60. Define a Replication Interface • Replication is somewhat extendable but it lacks stable interfaces. • The HBase Indexer is such an extension and it required surgery every time a committer sneezed. • See: https://issues.apache.org/jira/browse/HBASE-10504
  • 61-66. Distributed Counters • Incrementing consists of: 1. Taking a lock; 2. Get’ing the current value; and 3. Put’ing the newly incremented value. • This breaks in Master-Master because the Puts are overwriting each other. • See https://issues.apache.org/jira/browse/HBASE-2804
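
The overwrite problem on the Distributed Counters slides can be reproduced with a toy simulation. This is a simplified model, not HBase code: each cluster is a plain dict, a Put is a `(row, value, timestamp)` tuple, and the newest timestamp wins when a Put is applied. The names `apply_put`, `increment`, and `simulate` are hypothetical.

```python
def apply_put(store, put):
    """Apply a Put; the cell with the newest timestamp wins."""
    row, value, ts = put
    if row not in store or ts >= store[row][1]:
        store[row] = (value, ts)

def increment(store, row, amount, ts):
    """Get the current value, add to it, and Put the result (the lock is omitted)."""
    current = store[row][0] if row in store else 0
    put = (row, current + amount, ts)
    apply_put(store, put)
    return put  # this Put is what gets shipped to the other cluster

def simulate():
    a, b = {}, {}                            # two clusters in master-master
    put_a = increment(a, "hits", 1, ts=1)    # a client increments on cluster A
    put_b = increment(b, "hits", 1, ts=2)    # another client increments on cluster B
    apply_put(b, put_a)                      # A's Put is replicated to B
    apply_put(a, put_b)                      # B's Put is replicated to A
    return a["hits"][0], b["hits"][0]
```

Two increments were issued, yet `simulate()` leaves both clusters holding 1 instead of 2: each Put carries an absolute value computed from local state, so the Put with the newer timestamp replaces the other rather than adding to it.
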
  • 67. More Tooling • Replication management console, one shell to rule all the clusters! • Replication bootstrapping tool. • Tool that can move queues between region servers. • Tool that can throttle replication on a live cluster.
  • 68. Questions? • Or ping me async: • @jdcryans • jdcryans@cloudera.com • jdcryans on #hbase irc.freenode.net