© Cloudera, Inc. All rights reserved.
Multi-tenant, Multi-cluster and Multi-container Apache HBase Deployments
Hadoop Summit EU, 16 Apr 2015
Jonathan Hsieh | HBase Tech Lead @ Cloudera, Apache HBase PMC
Dima Spivak | HBase QE Lead @ Cloudera
Who are we?
• Jonathan Hsieh
• Tech Lead, HBase Team @ Cloudera
• Apache HBase PMC Member
• Apache Flume founder
• Contact: jon@cloudera.com, @jmhsieh
• Dima Spivak
• QE Lead, HBase Team @ Cloudera
• Contact: dspivak@cloudera.com, @dimaspivak
16 Apr 2015. Hadoop Summit EU '15. Hsieh and Spivak
What is Apache HBase?
Apache HBase is a consistent, low-latency, random-access, non-relational database built on Apache Hadoop.
Some HBase Contributors, Users, and Providers
Challenges as usage increases
• How does one:
• Isolate different application workloads.
• Share datasets between different workloads.
• Prepare for geographic redundancy and availability.
• Manage cluster migrations.
• Test and prototype (multi-)cluster deployments.
• There are multiple solutions!
Multiple Multi- Solutions
• Multi-Cluster: Using more than one cluster for an application.
• Multi-Tenant: Using one cluster for more than one application.
• Multi-Container: Using one machine to run [one or more] multi-node clusters.
Multi-Cluster
Safety in numbers
Multi-Cluster Deployments
• Deploy multiple HBase cluster instances.
• Motivation:
• Isolating different workloads from each other.
• Geographic disaster recovery, redundancy, and availability.
Isolation
• Isolation is usually done where many apps share one data center.
• Two different workloads on the same dataset.
• Perform latency-sensitive workloads on the same set of data as an analytic MR workload.
• Two disjoint application workloads and datasets.
• Deploy OpenTSDB on HBase in the same data center, but as a separate cluster, to monitor the production HBase cluster.
Isolation: Operational with Analytical access pattern
[Figure: two HBase clusters connected by HBase Replication (high throughput). Low-latency side, isolated from full scans: HBase Client (Get, Scan). High-throughput side: MapReduce via HBase Scanner. Ingest paths: HBase Client (Put, Incr, Append) and Bulk Import.]
Geographic Recovery, Redundancy, and Availability
• Run multiple HBase clusters in multiple data centers.
• Often using “Podding” schemes.
• Primarily for backups of data in case of data center outages.
• Locality for Performance.
• Locality for Compliance.
• Availability while a data center is down.
• Deploy with:
• HBase replication - master-master, master-slave.
• Multicluster clients.
Master-Master Replication
[Figure: clusters shipping logs to one another.]
Replicating data reduces chances of data loss.
HBase Multi-Cluster Client
• High Availability with Eventual Consistency when using replication.
• Simple implementation.
• Hedged operations. If primary takes too long, go to the failover cluster.
• Same HConnection interface, just a different factory: HConnectionManagerMultiClusterWrapper.getConnection(conf)
• HBase.MCC to be available in Cloudera Labs.
Work by Ted Malaska (Cloudera Solution Architect)
https://github.com/tmalaska/HBase.MCC
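The "hedged operations" bullet can be illustrated with a small, library-free sketch. This is not the HBase.MCC implementation: `hedged_get`, `hedge_after_s`, and the two callables standing in for the primary and failover clusters are hypothetical names chosen for illustration. The sketch only shows the pattern of asking the primary first and, if it is slow, also asking the failover and taking whichever answers first.

```python
import concurrent.futures as cf


def hedged_get(primary, failover, key, hedge_after_s=0.05):
    """Return (value, source) for key.

    primary and failover are callables taking a key; they stand in for
    reads against two replicated clusters (an assumption of this sketch).
    Error handling is omitted: an exception from the primary propagates.
    """
    pool = cf.ThreadPoolExecutor(max_workers=2)
    try:
        first = pool.submit(primary, key)
        try:
            # Fast path: the primary answers within the hedge window.
            return first.result(timeout=hedge_after_s), "primary"
        except cf.TimeoutError:
            pass  # primary is slow, so hedge with the failover cluster

        second = pool.submit(failover, key)
        done, _ = cf.wait([first, second], return_when=cf.FIRST_COMPLETED)
        # Prefer the primary if it finished in the meantime.
        winner = first if first in done else second
        source = "primary" if winner is first else "failover"
        return winner.result(), source
    finally:
        # Do not block on the slower call; let it finish in the background.
        pool.shutdown(wait=False)
```

Because the failover cluster is fed by replication, a value returned from it may lag the primary, which is the eventual-consistency trade-off the slide mentions.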
Multi-Tenant
We’re all in this together
Multi-tenant deployments
• Deploy multiple workloads on one cluster.
• Motivation:
• Better Resource utilization.
• Cost efficiency.
• Simpler operations.
• Shared data.
• Multiple services on one cluster.
• Running HBase, Spark, Impala and MR on the same cluster.
Security and namespaces
• Challenges:
• Resource management, prioritizing and fairness.
• Authentication and Authorization.
• Mechanisms:
• HBase Security – Authentication, Authorization for commands via ACLs.
• Namespaces – Isolate administrative domains for ACLs.
• Proxy Impersonation – Thrift proxy doAs, and REST proxy doAs.
Request Throttling
• Idea: some tables or users get a limited budget of ops or throughput, while others do not.
• Multiple workloads on one dataset.
• Production/real-time user: unthrottled.
• Analytic/ad hoc workloads user: throttled.
• Caveat: if all users are throttled, we may not use all machine resources.
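One common way to give a user "a limited budget of ops" is a token bucket. The sketch below uses that technique purely for illustration; it is not claimed to be how HBase implements throttling, and `TokenBucket`, `Throttle`, `set_quota`, and `allow` are hypothetical names. Users without a quota are never throttled, matching the production vs. analytic split on the slide.

```python
import time


class TokenBucket:
    """Refills at rate_per_s tokens per second, up to burst tokens."""

    def __init__(self, rate_per_s, burst, clock=time.monotonic):
        self.rate = float(rate_per_s)
        self.capacity = float(burst)
        self.tokens = float(burst)
        self.clock = clock
        self.last = clock()

    def try_acquire(self, cost=1.0):
        # Credit the tokens earned since the last call, capped at capacity.
        now = self.clock()
        earned = (now - self.last) * self.rate
        self.tokens = min(self.capacity, self.tokens + earned)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class Throttle:
    """Per-user quotas; a user with no quota is unthrottled."""

    def __init__(self):
        self.buckets = {}

    def set_quota(self, user, rate_per_s, burst, clock=time.monotonic):
        self.buckets[user] = TokenBucket(rate_per_s, burst, clock)

    def allow(self, user, cost=1.0):
        bucket = self.buckets.get(user)
        return True if bucket is None else bucket.try_acquire(cost)
```

The caveat on the slide shows up directly here: if every user has a bucket and all buckets are empty, requests are rejected even when the machines are idle.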
Request Scheduling
• Idea: gets should have high priority while scans should get deprioritized the more they are used (HBASE-10994).
• Multiple workloads on one dataset.
• Production real-time gets: immediately scheduled.
• Analytic scan workloads: delay scheduled.
• All resources are used.
• Caveat: requires manual tuning.
[Figure: two request timelines, one delayed by long scan requests and one rescheduled so new requests get priority.]
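A minimal sketch of "scans get deprioritized the more they are used", assuming a deadline-ordered queue: every request gets a deadline of arrival time plus a penalty, gets have no penalty, and a scanner's penalty grows with the number of calls it has already made. The class name, the square-root penalty, and the `scan_weight` and `max_delay` knobs are illustrative assumptions, not the HBase scheduler's actual code or configuration.

```python
import heapq
import itertools
import math


class DeadlineScheduler:
    def __init__(self, scan_weight=1.0, max_delay=50.0):
        self.scan_weight = scan_weight   # tuning knob (assumed)
        self.max_delay = max_delay       # cap on the penalty (assumed)
        self._heap = []
        self._seq = itertools.count()    # tie-breaker keeps FIFO order
        self._scan_calls = {}            # scanner id -> calls seen so far

    def submit(self, now, kind, request_id, scanner_id=None):
        delay = 0.0
        if kind == "scan":
            used = self._scan_calls.get(scanner_id, 0)
            self._scan_calls[scanner_id] = used + 1
            # The longer a scanner has been running, the later its deadline.
            delay = min(self.max_delay, math.sqrt(used * self.scan_weight))
        heapq.heappush(self._heap, (now + delay, next(self._seq), request_id))

    def next_request(self):
        """Pop the request with the earliest deadline, or None if empty."""
        return heapq.heappop(self._heap)[2] if self._heap else None
```

With this ordering a get that arrives after several calls from a long-running scanner can still be served before the scanner's latest call, and no capacity is left idle because delayed scans run as soon as nothing with an earlier deadline is waiting. The two knobs are where the "requires manual tuning" caveat applies.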
Performance Isolation inside a cluster
• Region Server Groups (under review).
• Limit the performance impact that load on one table has on others (HBASE-6721).
• Multiple workloads on multiple data sets on one HBase cluster.
• Two separate apps on one cluster.
[Figure labels: Mixed workload; Isolated workload.]
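The grouping idea can be sketched as a placement rule: each table belongs to a group, each group owns a set of region servers, and a table's regions are only placed on its group's servers. The function and argument names below are hypothetical, and round-robin placement is an assumption made to keep the example small; it is not the balancer proposed in HBASE-6721.

```python
from itertools import cycle


def place_regions(regions_by_table, group_of_table, servers_of_group):
    """Return {region: server}, keeping each table on its group's servers.

    Tables without an explicit group fall into the "default" group.
    """
    placement = {}
    # One round-robin iterator per group, over that group's servers.
    rotors = {g: cycle(sorted(s)) for g, s in servers_of_group.items()}
    for table in sorted(regions_by_table):
        group = group_of_table.get(table, "default")
        for region in regions_by_table[table]:
            placement[region] = next(rotors[group])
    return placement
```

Placing a latency-sensitive table in its own group means heavy load on tables in the mixed group never lands on the isolated group's servers.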
Service Isolation
• Today, the easiest strategy for isolating a latency-sensitive HBase deployment from other services is static partitioning.
• Future:
• Improve IO isolation via YARN/Slider/Mesos.
• Separate HBase actions into separate processes.
• e.g. externalize compaction for better resource management.
[Figure: Multi-service deployment: each of six nodes runs Yarn NM/MR, HBase RS, impalad, and HDFS DN. Statically partitioned service deployment: three nodes run HBase RS and HDFS DN; three nodes run Yarn NM/MR, impalad, and HDFS DN.]
Multi-Container
My name is Jonah
Multi-container deployments
• Run a distributed HBase cluster on a single host.
• Testing applications.
• Use cases requiring quick cluster stand-up.
Linux containers
• cgroups (2.6.24+).
• Isolating resources (memory, CPU, networking).
• Namespace isolation (filesystems, process trees).
Virtual Machines vs Linux Containers
[Figure: Virtual Machines: a hypervisor on the host operating system runs four guest OSes, each with its own libraries and user processes. Containers: four sets of user processes share the host operating system and its libraries.]
Docker
• User front-end for containers.
• Container management (start, stop, pause).
• docker run
• Images (templates for containers).
• docker commit
• Registries (repository for images).
• docker push
Integration testing
• Automate long-running tests from hbase-it module.
• $ hbase org.apache.hadoop.hbase.IntegrationTest…
• Integration with fault injection framework (Chaos Monkey).
Starting container cluster
[Figure: Start cluster.]
• DNS server: dnsserver (10.0.0.2)
• Master: node-1 (10.0.0.3)
• Slave: node-2 (10.0.0.4)
• Slave: node-3 (10.0.0.5)
• Slave: node-4 (10.0.0.6)
Automation
• Replace fragile infrastructure.
• Set up a distributed cluster as part of test execution.
In progress
• Extend this workflow to upstream Apache HBase (HBASE-12721)
• Upstream integration testing (builds.apache.org)
• Multi-cluster use cases (e.g. MCC, replication)
• Upgrades
Conclusions
Multi multi multi
Summary
• Fancy table that summarizes our talk
Goal | Multi-Cluster | Multi-Tenant | Multi-Container
Isolate workloads | One cluster per workload. | Region Server Groups. | cgroups.
Multiple workloads on same dataset (real-time vs analytic workload) | Separate cluster per workload. | Request throttling, request scheduling. | Containers as “VMs” or microservices.
Reliability and Availability | Disaster recovery, master-master replication, multi-cluster client. | Multiple tables with Region Server Groups. | More realistic testing.
Cost Savings | Disaster recovery. | One cluster, multiple use cases. | One machine, multiple nodes.
Futures
• We are seeing more and more deployments that are multi-cluster and/or multi-tenant.
• Traditional workflows are giving way to hybrid ones.
• More knobs to turn to optimize for performance and value.
• Multi-container deployments are a way forward to make prototyping and testing these deployments easier.
Thank you!
HBaseCon 2015 is Coming!
Thurs., May 7, in San Francisco
Presentations from the world’s biggest HBase operators:
Bloomberg, Dropbox, eBay, Facebook, Google, Pinterest, Xiaomi, Yahoo!, more!
Seats are limited; register at hbasecon.com
Community Sponsor

Trends in Supporting Production Apache HBase ClustersTrends in Supporting Production Apache HBase Clusters
Trends in Supporting Production Apache HBase Clusters
 
Apache HBase: Where We've Been and What's Upcoming
Apache HBase: Where We've Been and What's UpcomingApache HBase: Where We've Been and What's Upcoming
Apache HBase: Where We've Been and What's Upcoming
 

More from DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 

Recently uploaded (20)

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 

Multi-tenant, Multi-cluster and Multi-container Apache HBase Deployments

  • 1. Multi-tenant, Multi-cluster and Multi-container Apache HBase Deployments. Hadoop Summit EU, 16 Apr 2015. Jonathan Hsieh | HBase Tech Lead @ Cloudera, Apache HBase PMC. Dima Spivak | HBase QE Lead @ Cloudera. © Cloudera, Inc. All rights reserved.
  • 2. Who are we? • Jonathan Hsieh • Tech Lead, HBase Team @ Cloudera • Apache HBase PMC Member • Apache Flume founder • Contact • jon@cloudera.com • @jmhsieh • Dima Spivak • QE Lead, HBase Team @ Cloudera • Contact • dspivak@cloudera.com • @dimaspivak 16 Apr 2015. Hadoop Summit EU '15. Hsieh and Spivak
  • 3. What is Apache HBase? Apache HBase is a consistent, low-latency, random-access, non-relational database built on Apache Hadoop.
  • 4. Some HBase Contributors, Users, and Providers
  • 5. Challenges as usage increases • How does one: • Isolate different application workloads. • Share datasets between different workloads. • Prepare for geographic redundancy and availability. • Manage cluster migrations. • Test and prototype (multi-)cluster deployments. • There are multiple solutions!
  • 6. Multiple Multi- Solutions • Multi-Cluster: using more than one cluster for an application. • Multi-Tenant: using one cluster for more than one application. • Multi-Container: using one machine to run [one or more] multi-node clusters.
  • 7. Multi-Cluster: Safety in numbers
  • 8. Multi-Cluster Deployments • Deploy multiple HBase cluster instances. • Motivation: • Isolating different workloads from each other. • Geographic disaster recovery, redundancy, and availability.
  • 9. Isolation • Isolation is usually done where many apps share one data center. • Two different workloads on the same dataset. • Perform latency-sensitive workloads on the same set of data as an analytic MR workload. • Two disjoint application workloads and datasets. • Deploy OpenTSDB on HBase in the same data center, but as a separate cluster, to monitor the production HBase cluster.
  • 10. Isolation: Operational with Analytical access pattern [Diagram labels: HBase Client (Get, Scan); HBase Client (Put, Incr, Append); HBase Replication; low latency, isolated from full scans; high throughput; MapReduce; HBase Scanner; Bulk Import.]
  • 11. Geographic Recovery, Redundancy, and Availability • Run multiple HBase clusters in multiple data centers. • Often using “Podding” schemes. • Primarily for backups of data in case of data center outages. • Locality for Performance. • Locality for Compliance. • Availability while a data center is down. • Deploy with: • HBase replication: master-master, master-slave. • Multi-cluster clients.
  • 12. Master-Master Replication [Diagram label: logs.] Replicating data reduces the chance of data loss.
  • 13. HBase Multi-Cluster Client • High Availability with Eventual Consistency when using replication. • Simple implementation. • Hedged operations: if the primary takes too long, go to the failover cluster. • Same HConnection interface, just a different factory: HConnectionManagerMultiClusterWrapper.getConnection(conf) • HBase.MCC to be available in Cloudera Labs. Work by Ted Malaska (Cloudera Solution Architect) https://github.com/tmalaska/HBase.MCC
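The hedged operation described on slide 13 can be illustrated with a small, generic sketch. This is not the HBase.MCC implementation: `hedged_call`, `slow_primary`, `fast_failover`, and the `hedge_after_s` parameter are hypothetical names chosen for illustration, and the two callables stand in for reads against a primary and a failover cluster.

```python
import concurrent.futures as cf
import time


def hedged_call(primary, failover, hedge_after_s):
    """Call primary(); if it is slow or fails, also call failover().

    Returns (source, result) for the first call that succeeds.
    Illustrative only: names and structure are assumptions, not HBase.MCC.
    """
    pool = cf.ThreadPoolExecutor(max_workers=2)
    try:
        futures = {pool.submit(primary): "primary"}
        # Give the primary a head start of hedge_after_s seconds.
        done, _ = cf.wait(futures, timeout=hedge_after_s)
        if not done or next(iter(done)).exception() is not None:
            # Primary is too slow (or already failed): hedge with the failover.
            futures[pool.submit(failover)] = "failover"
        errors = []
        for fut in cf.as_completed(futures):
            if fut.exception() is None:
                return futures[fut], fut.result()
            errors.append(fut.exception())
        raise errors[-1]
    finally:
        # Do not block on the slower call; its result is simply discarded.
        pool.shutdown(wait=False)


def slow_primary():
    time.sleep(0.5)  # simulate a primary cluster that is slow to respond
    return "row-from-primary"


def fast_failover():
    return "row-from-failover"


print(hedged_call(slow_primary, fast_failover, hedge_after_s=0.05))
```

Because replication is asynchronous, a value served by the failover cluster may lag the primary, which is the eventual-consistency trade-off the slide mentions.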
  • 14. Multi-Tenant: We’re all in this together
  • 15. Multi-tenant deployments • Deploy multiple workloads on one cluster. • Motivation: • Better resource utilization. • Cost efficiency. • Simpler operations. • Shared data. • Multiple services on one cluster. • Running HBase, Spark, Impala and MR on the same cluster.
  • 16. Security and namespaces • Challenges: • Resource management, prioritizing and fairness. • Authentication and Authorization. • Mechanisms: • HBase Security – Authentication, Authorization for commands via ACLs. • Namespaces – Isolate administrative domains for ACLs. • Proxy Impersonation – Thrift proxy doAs, and REST proxy doAs.
  • 17. Request Throttling • Idea: some tables or users get a limited budget of ops or throughput, while others do not. • Multiple workloads on one dataset. • Production/real-time user: unthrottled. • Analytic/ad hoc workloads user: throttled. • Caveat: if all users are throttled, we may not use all machine resources.
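The throttling idea on slide 17 can be sketched as a per-user token bucket. This is a generic illustration rather than HBase's own mechanism: the `Throttle` class, its `limits` mapping, and the user names are hypothetical, and the clock is injectable only so the behavior can be checked deterministically.

```python
import time


class Throttle:
    """Per-user token bucket; users without a limit are never throttled.

    Hypothetical sketch: class and parameter names are assumptions.
    """

    def __init__(self, limits, clock=time.monotonic):
        self._limits = dict(limits)   # user -> allowed ops per second
        self._clock = clock
        self._tokens = dict(limits)   # each bucket starts full
        self._last = {user: clock() for user in limits}

    def allow(self, user):
        rate = self._limits.get(user)
        if rate is None:
            return True               # unthrottled (e.g. production user)
        now = self._clock()
        elapsed = now - self._last[user]
        self._last[user] = now
        # Refill in proportion to elapsed time, capped at the bucket size.
        self._tokens[user] = min(rate, self._tokens[user] + elapsed * rate)
        if self._tokens[user] >= 1:
            self._tokens[user] -= 1
            return True
        return False


fake_now = [0.0]
throttle = Throttle({"analytics": 2}, clock=lambda: fake_now[0])
print([throttle.allow("analytics") for _ in range(3)])
print(throttle.allow("production"))
```

The slide's caveat is visible here: a rejected request is turned away even when the servers are idle, so throttling every user can leave capacity unused.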
  • 18. Request Scheduling • Idea: gets should have high priority while scans should get deprioritized the more they are used (HBASE-10994). • Multiple workloads on one dataset. • Production real-time gets: immediately scheduled. • Analytic scan workloads: delay scheduled. • All resources are used. • Caveat: requires manual tuning. [Diagram: two request queues, labeled “Delayed by long scan requests” and “Rescheduled so new requests get priority”.]
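The scheduling idea on slide 18 can be sketched with a priority queue in which gets always carry the best priority and each scanner's requests are pushed further back the more requests that scanner has already issued. This is an illustrative model only, not the HBASE-10994 code; `RequestScheduler`, `submit`, and `next_request` are hypothetical names.

```python
import heapq
import itertools


class RequestScheduler:
    """Gets run first; a scanner is deprioritized the more it is used.

    Hypothetical sketch: names and the penalty rule are assumptions.
    """

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order
        self._scan_calls = {}          # scanner id -> requests seen so far

    def submit(self, kind, name, scanner_id=None):
        if kind == "get":
            penalty = 0
        else:
            penalty = self._scan_calls.get(scanner_id, 0) + 1
            self._scan_calls[scanner_id] = penalty
        heapq.heappush(self._heap, (penalty, next(self._seq), name))

    def next_request(self):
        return heapq.heappop(self._heap)[2]


sched = RequestScheduler()
sched.submit("scan", "scan-a#1", scanner_id="a")
sched.submit("scan", "scan-a#2", scanner_id="a")
sched.submit("scan", "scan-a#3", scanner_id="a")
sched.submit("get", "get-1")
sched.submit("scan", "scan-b#1", scanner_id="b")
print([sched.next_request() for _ in range(5)])
```

A pure priority rule like this one can starve long scans under a steady stream of gets, which is one reason a real scheduler would bound the delay and, as the slide notes, need tuning.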
  • 19. Performance Isolation inside a cluster • Region Server Groups (under review). • Limit the performance impact that load on one table has on others (HBASE-6721). • Multiple workloads on multiple data sets on one HBase cluster. • Two separate apps on one cluster. [Diagram: mixed workload vs. isolated workload.]
  • 20. Service Isolation • Today, the easiest strategy for isolating a latency-sensitive HBase deployment from other services is static partitioning. • Future: • Improve IO isolation via YARN/Slider/Mesos. • Separate HBase actions into separate processes. • e.g. externalize compaction for better resource management. [Diagram: multi-service deployment, where every node runs Yarn NM/MR, HBase RS, impalad, and HDFS DN, vs. statically partitioned service deployment, where some nodes run HBase RS and HDFS DN and the others run Yarn NM/MR, impalad, and HDFS DN.]
  • 21. Multi-Container: My name is Jonah
  • 22. Multi-container deployments • Run a distributed HBase cluster on a single host. • Testing applications. • Use cases requiring quick cluster stand-up.
  • 23. Linux containers • cgroups (2.6.24+). • Isolating resources (memory, CPU, networking). • Namespace isolation (filesystems, process trees).
  • 24. Virtual Machines vs Linux Containers [Diagram: virtual machines run a hypervisor on the host operating system, with a separate guest OS, libraries, and user processes per VM; containers run user processes and libraries directly on the host operating system.]
  • 25. Docker • User front-end for containers. • Container management (start, stop, pause). • docker run • Images (templates for containers). • docker commit • Registries (repository for images). • docker push
  • 26. Integration testing • Automate long-running tests from hbase-it module. • $ hbase org.apache.hadoop.hbase.IntegrationTest… • Integration with fault injection framework (Chaos Monkey).
  • 27. Starting container cluster [Diagram: DNS server dnsserver (10.0.0.2); node-1 (10.0.0.3), master; node-2 (10.0.0.4), slave; node-3 (10.0.0.5), slave; node-4 (10.0.0.6), slave; “Start cluster” step.]
  • 28. Automation • Replace fragile infrastructure. • Set up a distributed cluster as part of test execution.
  • 29. In progress • Extend this workflow to upstream Apache HBase (HBASE-12721) • Upstream integration testing (builds.apache.org) • Multi-cluster use cases (e.g. MCC, replication) • Upgrades
  • 30. Conclusions: Multi multi multi
  • 31. Summary • Fancy table that summarizes our talk
      Goal | Multi Cluster | Multi Tenant | Multi-Container
      Isolate workloads | One cluster per workload. | Region Server Groups. | cgroups.
  • 32. Summary • Fancy table that summarizes our talk
      Goal | Multi Cluster | Multi Tenant | Multi-Container
      Isolate workloads | One cluster per workload. | Region Server Groups. | cgroups.
      Multiple workloads on same dataset (real-time vs analytic workload) | Separate cluster per workload. | Request throttling, request scheduling. | Containers as “VMs” or microservices.
  • 33. 33© Cloudera, Inc. All rights reserved. Summary • Fancy table that summarizes our talk 16 Apr 2015. Hadoop Summit EU '15. Hsieh and Spivak Goal Multi Cluster Multi Tenant Multi-Container Isolate workloads One cluster per workload. Region Server Groups. cgroups. Multiple workloads on same dataset (real-time vs analytic workload) Separate cluster per workload. Request throttling, request scheduling. Containers as “VMs” or microservices. Reliability and Availability Disaster recovery, master-master replication, multi-cluster client. Multiple tables with Region Server Groups. More realistic testing.
  • 34. Summary • Fancy table that summarizes our talk
      Goal | Multi-Cluster | Multi-Tenant | Multi-Container
      Isolate workloads | One cluster per workload. | Region Server Groups. | cgroups.
      Multiple workloads on same dataset (real-time vs analytic workload) | Separate cluster per workload. | Request throttling, request scheduling. | Containers as “VMs” or microservices.
      Reliability and Availability | Disaster recovery, master-master replication, multi-cluster client. | Multiple tables with Region Server Groups. | More realistic testing.
      Cost Savings | Disaster recovery. | One cluster, multiple use cases. | One machine, multiple nodes.
  • 35. Futures • We are seeing more and more deployments that are multi-cluster and/or multi-tenant. • Traditional workflows are giving way to hybrid ones. • More knobs to turn to optimize for performance and value. • Multi-container deployments are a way forward to make prototyping and testing these deployments easier.
  • 36. Thank you!
  • 37. HBaseCon 2015 is Coming! Thurs., May 7, in San Francisco. Presentations from the world’s biggest HBase operators: Bloomberg, Dropbox, eBay, Facebook, Google, Pinterest, Xiaomi, Yahoo!, and more. Seats are limited; register at hbasecon.com.

Editor's Notes

  1. HBase is a project that solves this problem. In a sentence, HBase is an open source, distributed, sorted map modeled after Google’s BigTable. Open source: Apache HBase is an open source project with an Apache 2.0 license. Distributed: HBase is designed to use multiple machines to store and serve data. Sorted map: HBase stores data as a map and guarantees that adjacent keys are stored next to each other on disk. HBase is modeled after BigTable, a system that is used for hundreds of applications at Google.
  2. Given that HBase stores a large sorted map, the API looks similar to a map. You can get or put individual rows, or scan a range of rows. There is also a very efficient way of incrementing a particular cell, which is useful for maintaining high-performance counters or statistics. Lastly, it’s possible to write MapReduce jobs that analyze the data in HBase.
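
The map-like operations described in these notes (get, put, scan over a key range, and incrementing a cell) can be illustrated with a toy in-memory model. This is only a sketch of the data model, not the HBase client API: the class name SortedMapTable, its method names, and the sample row keys are all hypothetical and chosen for illustration.

```python
import bisect


class SortedMapTable:
    """Toy in-memory sorted map of row key -> {column: value}.

    Hypothetical illustration of the data model only; this is NOT
    the HBase client API.
    """

    def __init__(self):
        self._keys = []   # row keys, always kept in sorted order
        self._rows = {}   # row key -> dict of column -> value

    def put(self, row, column, value):
        # Insert a new row key at its sorted position, then set the cell.
        if row not in self._rows:
            bisect.insort(self._keys, row)
            self._rows[row] = {}
        self._rows[row][column] = value

    def get(self, row):
        # Return a copy of the row's cells, or an empty dict if absent.
        return dict(self._rows.get(row, {}))

    def scan(self, start, stop):
        # Rows with start <= key < stop; adjacent keys are contiguous,
        # so a range scan is a single slice of the sorted key list.
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(key, dict(self._rows[key])) for key in self._keys[lo:hi]]

    def increment(self, row, column, amount=1):
        # Read-modify-write of one cell; a missing cell counts as 0.
        new_value = self.get(row).get(column, 0) + amount
        self.put(row, column, new_value)
        return new_value


table = SortedMapTable()
table.put("user#002", "name", "bob")
table.put("user#001", "name", "alice")
table.put("video#001", "title", "intro")

# Range scan over every row key that starts with "user#".
user_rows = table.scan("user#", "user#~")

# Maintain a simple counter in a single cell.
table.increment("video#001", "views")
views = table.increment("video#001", "views", 4)
```

Because the row keys are kept sorted, all "user#" rows sit next to each other and the scan touches only that contiguous range, which is the property the first note describes. The toy increment is a plain read-modify-write in one process; it does not model how a distributed store performs the update.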