1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Toward Better Multi-Tenancy Support from HDFS
Xiaoyu Yao
Email: xyao@hortonworks.com
About myself
⬢ Member of Technical Staff at Hortonworks since 2014
⬢ Apache Hadoop Committer and PMC member.
⬢ Currently working on HDFS.
⬢ This talk aims to give a better understanding of HDFS multi-tenancy support and of the ongoing work toward better resource management.
Agenda
⬢ Overview
⬢ Hadoop multi-tenancy features
⬢ HDFS resources and multi-tenancy offerings
⬢ HDFS resource management via resource coupon
⬢ Q&A
Overview
⬢ Centrally managed infrastructure
–Consolidate to simplify management and lower TCO
–Better utilization and efficiency
⬢ Requirement
–Resource Sharing
–Resource Isolation
–Resource Control
Multi-Tenancy Support from Hadoop
         Resource Sharing   Resource Isolation               Resource Management
HBASE    Y                  Namespace, Region Server Group   Quota
YARN     Y                  Queue, Node Label, ...           Capacity Scheduler, ...
HDFS     Y                  Federation                       Quota, FairCallQueue, Backoff
HDFS Resources
⬢ Capacity
–Namespace
–Storage Space
–Storage Type
⬢ Operational Resources
–Namenode
•RPC
–Datanode
•Disk & Network
HDFS Resource Sharing/Isolation – Federation
HDFS Capacity Management – Quota
⬢ Quota
–Namespace
–StorageSpace
–HDFS-7584 Quota by Storage Types
⬢ Limitations
–Static
–Per directory
–No per user/job control
HDFS Operational Resource Management – Namenode RPC Isolation (1)
⬢ Internal RPC
–DN->NN block report, heartbeat, etc.
–ZKFC->NN liveness check
⬢ External RPC
–Client RPCs from HDFS clients such as MR jobs/Hive queries/HBase
[Figure: Namenode RPC path: Client -> Listener -> Readers -> Call Queue -> Handlers -> FSN]
HDFS Operational Resource Management – Namenode RPC Isolation (2)
⬢ Use case:
–HDFS access from normal jobs is impacted by offending jobs
–Internal RPCs are impacted by external RPCs
–One blocked RPC method can affect others
⬢ Protect HDFS internal RPCs:
–Dedicated service RPC server/port
•Isolate DN->NN block report, heartbeat, etc.
–Dedicated lifeline RPC server/port
•Protect ZKFC->NN liveness check
⬢ All external RPCs go to the default port (e.g., 8020)
HDFS Resource Management – Name Node RPC Call Queue
⬢ In a multi-tenant scenario, the call queue should act like a shock absorber that accommodates different workloads, converting bursty arrivals into smooth, steady departures.
⬢ Good call queue
–queue without call bloat
–catches and handles bursts with no more than a temporary increase of queue delay
–maximum server utilization
⬢ Bad call queue
–queue that exhibits call bloat
–queue fills up and stays full after bursts
–low utilization and high queue latency
HDFS Resource Management - Fair Call Queue
⬢ Before HADOOP-9640: LinkedBlockingQueue
–Single queue
–Clients block and then time out or fail when the queue is full
⬢ HADOOP-9640 - Fair Call Queue
–Multiple priority levels and call queues with different processing priority
–Each RPC is assigned a priority by the scheduler
–High-priority RPC calls are put into call queues that have a higher probability of being served.
[Figure: Fair Call Queue: Scheduler -> Queue 0, Queue ..., Queue 2 -> Multiplexer (WRR)]
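The multi-level queue and its multiplexer can be pictured with a small Python sketch. This is an illustration only, not Hadoop's implementation: the class name, the weights, and the per-level capacity are assumptions chosen for the example. Calls are placed into one queue per priority level, and a weighted round-robin multiplexer drains more calls per round from higher-priority levels without starving the lower ones.

```python
from collections import deque


class FairCallQueueSketch:
    """Toy multi-level call queue drained in weighted round-robin order."""

    def __init__(self, weights=(8, 4, 2, 1), capacity_per_level=100):
        # weights[i]: calls taken from level i per turn (assumed values).
        self.weights = list(weights)
        self.queues = [deque() for _ in self.weights]
        self.capacity = capacity_per_level
        self._level = 0                      # level currently being served
        self._remaining = self.weights[0]    # calls left in this level's turn

    def put(self, call, priority):
        """Enqueue at the level chosen by the scheduler (0 = highest)."""
        queue = self.queues[priority]
        if len(queue) >= self.capacity:
            return False                     # caller decides: block or back off
        queue.append(call)
        return True

    def take(self):
        """Return the next call in WRR order, or None if every level is empty."""
        if not any(self.queues):
            return None
        while True:
            queue = self.queues[self._level]
            if queue and self._remaining > 0:
                self._remaining -= 1
                return queue.popleft()
            # Level is empty or has used up its share: move to the next one.
            self._level = (self._level + 1) % len(self.queues)
            self._remaining = self.weights[self._level]
```

With weights `(2, 1)`, four calls at level 0 and two at level 1 are served as two high, one low, two high, one low, so the low-priority queue still makes progress.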
HDFS Resource Management – Namenode RPC Throttling <1>
⬢ HADOOP-10597 Backoff when the call queue is full
–Send back a Retriable exception
–Let the client wait exponentially and retry instead of blocking, timing out, or failing the call.
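The client side of this backoff can be sketched as follows. It is a minimal Python illustration, not the Hadoop client's retry policy: `RetriableError`, `call_with_backoff`, and all delay values are hypothetical names and numbers chosen for the example. The server signals a full queue with a retriable error, and the client waits for an exponentially growing, jittered interval before trying again.

```python
import random
import time


class RetriableError(Exception):
    """Stand-in for the retriable exception sent when the call queue is full."""


def call_with_backoff(rpc, max_retries=5, base_delay=0.1, max_delay=5.0,
                      sleep=time.sleep):
    """Invoke rpc(); on RetriableError wait exponentially longer and retry."""
    for attempt in range(max_retries + 1):
        try:
            return rpc()
        except RetriableError:
            if attempt == max_retries:
                raise                        # retries exhausted: surface the error
            # Upper bound doubles on each attempt, capped at max_delay.
            bound = min(max_delay, base_delay * (2 ** attempt))
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(random.uniform(0, bound))
```

Passing `sleep` as a parameter is a design choice for the sketch: it lets a test record the waits instead of actually sleeping.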
HDFS Resource Management – Namenode RPC Throttling <2>
⬢ HADOOP-12916 Backoff based on response time
–The basic idea: back off earlier to avoid call queue overload so that the namenode can recover quickly.
–Low-priority calls get backed off if the response time of high-priority calls exceeds a predefined threshold.
–More per-user/per-queue metrics added for troubleshooting.
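The response-time rule can be written as a one-line check. The sketch below is an assumption-laden illustration, not the Hadoop code: the function name and the per-level lists are hypothetical, and it additionally assumes that a call is backed off when its own level is over threshold, not only when a higher-priority level is.

```python
def should_back_off(priority, avg_response_ms, threshold_ms):
    """Decide whether a call at `priority` (0 = highest) should be backed off.

    avg_response_ms[i] is the recent average response time of level i and
    threshold_ms[i] is its configured limit (both hypothetical inputs).
    The call is backed off if its own level or any higher-priority level
    is currently slower than its threshold.
    """
    return any(
        avg_response_ms[level] > threshold_ms[level]
        for level in range(priority + 1)
    )
```

For example, with thresholds of 10, 20 and 30 ms, a slow level 1 backs off calls at levels 1 and 2 while level 0 calls are still accepted.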
HDFS Resource Management – Namenode RPC Throttling <3>
⬢ Abstract the scheduler interface out of the call queue for pluggable RPC priority assignment
–DefaultRpcScheduler: all RPC calls get the same priority
–DecayRpcScheduler: taken from the original FairCallQueue; priority is assigned based on users' previous call volumes.
–Other experimental schedulers: a configurable list of high-priority users/groups for low-latency jobs, medium-priority users/groups for normal jobs, and low-priority users/groups for batch jobs.
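Priority assignment from previous call volumes can be sketched in a few lines of Python. This is a simplified model, not the DecayRpcScheduler source: the class name, the share thresholds, and the decay factor are assumptions made for the example. Each user's call count is decayed periodically, and a user whose share of recent calls crosses more thresholds lands in a lower-priority level.

```python
from collections import defaultdict


class DecaySchedulerSketch:
    """Toy scheduler: heavier recent callers get lower priority."""

    def __init__(self, thresholds=(0.125, 0.25, 0.5), decay_factor=0.5):
        self.thresholds = thresholds      # share cut-offs (assumed values)
        self.decay_factor = decay_factor  # applied once per decay period
        self.counts = defaultdict(float)

    def record_call(self, user):
        self.counts[user] += 1.0

    def decay(self):
        """Shrink every count so that old calls matter less over time."""
        for user in self.counts:
            self.counts[user] *= self.decay_factor

    def priority(self, user):
        """Return the level for user: 0 is highest, len(thresholds) is lowest."""
        total = sum(self.counts.values())
        if total == 0:
            return 0
        share = self.counts.get(user, 0.0) / total
        # Each threshold the share reaches pushes the user one level down.
        return sum(1 for cutoff in self.thresholds if share >= cutoff)
```

A user responsible for 90% of recent calls ends up at level 3, while a user with 10% stays at level 0; once the heavy user goes quiet, decay gradually restores its priority.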
HDFS resource management - QoS
⬢ Use case:
–Allow high performance QoS mechanism with minimum decoding effort on server side
⬢ HADOOP-9194 QoS support for Hadoop RPC
–One byte in the RPC header to facilitate QoS mechanisms
–E.g., differentiate OLTP/OLAP, batch/streaming against the same HDFS
⬢ Limitation
–No mechanism level implementation yet
HDFS resource management with YARN
⬢ Use Case
–Priority inversion without centralized resource management (e.g., RPC calls from high-priority YARN jobs may be put into a low-priority HDFS namenode call queue)
–Identify and manage “bad” callers effectively
⬢ Namenode – RPC handler
–FairCallQueue offers fair use of namenode RPC handlers
–No guarantee of differentiation
⬢ Datanode – I/O bandwidth
–No differentiation between writers and readers or by bandwidth usage.
–Datanode allows static throttling of balancer I/O.
HDFS Namenode Resource Reservation
⬢ HADOOP-13128 proposes HDFS namenode resource reservation via resource coupons
–Moves from throttling to management
–Similar to a delegation token in many aspects
–Works for both Kerberos and non-Kerberos clusters
–Allows only privileged service users to request resource coupons from the namenode.
–A coupon can be serialized/deserialized for use within a container.
–A coupon can be renewed for long-running jobs or canceled after the intended job is finished.
HDFS Namenode Resource Coupon
⬢ Coupon Identifier
–Finer-grained owner (MR job ID, Hive Query ID) to help identify and manage “good” and “bad” callers
–Resource type (Namenode RPC or Datanode I/O bandwidth)
–Flexible management unit for different resources.
•Min/Max percentage (e.g. Namenode RPC)
•Absolute value (Datanode I/O bandwidth)
HDFS Namenode Resource Coupon Manager (RCM)
⬢ Grant/Renew/Cancel resource coupon
⬢ Monitor and report resource usage
⬢ Check and validate resource use requests
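The coupon lifecycle described on the last few slides can be sketched as a small in-memory model. Everything here is hypothetical: HADOOP-13128 is a proposal, and the class names, field names, resource-type labels, and lifetime are invented for illustration and do not reflect any actual API. The sketch shows only the three manager duties listed above: grant to privileged users, renew or cancel, and validate on use.

```python
import itertools
from dataclasses import dataclass


@dataclass
class ResourceCoupon:
    coupon_id: int
    owner: str          # fine-grained owner, e.g. an MR job ID or Hive query ID
    resource_type: str  # hypothetical labels: "NAMENODE_RPC" or "DATANODE_IO"
    amount: float       # percentage for RPC, absolute bandwidth for I/O
    expires_at: float   # timestamp in seconds


class ResourceCouponManagerSketch:
    """Toy manager that grants, renews, cancels and validates coupons."""

    def __init__(self, privileged_users, lifetime_s=600.0):
        self.privileged = set(privileged_users)
        self.lifetime_s = lifetime_s
        self.active = {}
        self._ids = itertools.count(1)

    def grant(self, requester, owner, resource_type, amount, now):
        # Only privileged service users may request coupons.
        if requester not in self.privileged:
            raise PermissionError(f"{requester} may not request coupons")
        coupon = ResourceCoupon(next(self._ids), owner, resource_type,
                                amount, now + self.lifetime_s)
        self.active[coupon.coupon_id] = coupon
        return coupon

    def renew(self, coupon_id, now):
        """Extend a coupon for a long-running job."""
        coupon = self.active[coupon_id]
        coupon.expires_at = now + self.lifetime_s
        return coupon

    def cancel(self, coupon_id):
        """Drop a coupon once the intended job has finished."""
        self.active.pop(coupon_id, None)

    def is_valid(self, coupon_id, now):
        coupon = self.active.get(coupon_id)
        return coupon is not None and now < coupon.expires_at
```

Timestamps are passed in explicitly so the behavior is deterministic; a real service would read a clock and would also need persistence and authentication, which this sketch leaves out.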
HDFS Namenode Resource Pool
[Figure: the HDFS Namenode Resource Pool is divided into a Fairness Pool and a Managed Pool. Labels: applications supporting Resource Coupon (YARN/HBASE); legacy applications without Resource Coupon.]
HDFS Namenode Resource Coupon Manager (RCM)
[Figure: architecture diagram with one component marked NEW. Components shown: Client, YARN Resource Manager, HDFS Namenode with RCM, HDFS Datanode, YARN Node Manager, YARN Container.]
HDFS Resource Management – Datanode
⬢ Use case:
–When a client writes to HDFS faster than the disk bandwidth of the DNs, it saturates the disk bandwidth and puts the DNs into an unresponsive state.
–The client can only back off by aborting/recovering the pipeline, which causes failed writes and unnecessary pipeline recovery.
⬢ Static I/O Throttling
–HDFS-7265 Support HDFS IO throttling
–HDFS-9796 Use a throttler for replica write in datanode
–HDFS-4412 Add throttler for datanode bandwidth
–HADOOP-10410 datanode Qos via ioprio_set on DataXceiver thread
⬢ Dynamic I/O Throttling
–HDFS-7270 Add congestion signaling capability to DataNode write pipeline (ECN)
⬢ Future work: I/O bandwidth reservation with resource coupon
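Static I/O throttling of the kind listed above can be illustrated with a token bucket. This is a generic technique used here as a stand-in, not the datanode's actual throttler or the code from any of the JIRAs: the class name, parameters, and the debt-based delay are assumptions made for the sketch. The writer asks how long to wait before sending a chunk, and the answer grows once the configured bytes-per-second budget is used up.

```python
class BandwidthThrottlerSketch:
    """Token-bucket throttle with a fixed (static) bytes-per-second limit."""

    def __init__(self, bytes_per_sec, burst_bytes=None):
        self.rate = float(bytes_per_sec)
        # Largest burst allowed after an idle period (defaults to 1 second).
        self.capacity = float(bytes_per_sec if burst_bytes is None
                              else burst_bytes)
        self.tokens = self.capacity
        self.last = 0.0

    def delay_for(self, num_bytes, now):
        """Return seconds to wait before sending num_bytes at time `now`."""
        # Refill for the elapsed time, never beyond the burst capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        # Spend tokens; a negative balance is debt the caller must wait out.
        self.tokens -= num_bytes
        if self.tokens >= 0:
            return 0.0
        return -self.tokens / self.rate
```

Because the limit is fixed at construction time, this corresponds to the static case; a dynamic scheme would adjust the rate from a congestion signal instead.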
Thank you!
Q&A

Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 

Kürzlich hochgeladen (20)

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 

Toward Better Multi-Tenancy Support from HDFS

  • 1. Toward Better Multi-Tenancy Support from HDFS
    Xiaoyu Yao
    Email: xyao@hortonworks.com
  • 2. About myself
    ⬢ Member of Technical Staff at Hortonworks since 2014
    ⬢ Apache Hadoop Committer and PMC member
    ⬢ Currently working on HDFS
    ⬢ This talk aims to give a better understanding of HDFS multi-tenancy support and ongoing work for better resource management.
  • 3. Agenda
    ⬢ Overview
    ⬢ Hadoop multi-tenancy features
    ⬢ HDFS resources and multi-tenancy offerings
    ⬢ HDFS resource management via resource coupon
    ⬢ Q&A
  • 4. Overview
    ⬢ Centrally managed infrastructure
      – Consolidate to simplify management and lower TCO
      – Better utilization and efficiency
    ⬢ Requirements
      – Resource Sharing
      – Resource Isolation
      – Resource Control
  • 5. Multi-Tenancy Support from Hadoop
    Component | Resource Sharing | Resource Isolation             | Resource Management
    HBASE     | Y                | Namespace, Region Server Group | Quota
    YARN      | Y                | Queue, Node Label, ...         | Capacity Scheduler, ...
    HDFS      | Y                | Federation                     | Quota, FairCallQueue, Backoff
  • 6. HDFS Resources
    ⬢ Capacity
      – Namespace
      – Storage Space
      – Storage Type
    ⬢ Operational Resources
      – Namenode
        • RPC
      – Datanode
        • Disk & Network
  • 7. HDFS Resource Sharing/Isolation – Federation
  • 8. HDFS Capacity Management – Quota
    ⬢ Quota
      – Namespace
      – StorageSpace
      – HDFS-7584 Quota by Storage Types
    ⬢ Limitations
      – Static
      – Per directory
      – No per user/job control
  • 9. HDFS Operational Resource Management – Namenode RPC Isolation (1)
    ⬢ Internal RPC
      – DN->NN block report, heartbeat, etc.
      – ZKFC->NN liveness check
    ⬢ External RPC
      – Client RPCs from HDFS clients such as MR jobs/Hive queries/HBase
    [Diagram labels: Client, Listener, Readers, Call Queue, Handlers, FSN]
  • 10. HDFS Operational Resource Management – Namenode RPC Isolation (2)
    ⬢ Use case:
      – HDFS access from normal jobs is impacted by offending jobs
      – Internal RPCs are impacted by external RPCs
      – One blocked RPC method could affect others
    ⬢ Protect HDFS internal RPCs:
      – Dedicated service RPC server/port
        • Isolates DN->NN block report, heartbeat, etc.
      – Dedicated lifeline RPC server/port
        • Protects ZKFC->NN liveness check
    ⬢ All external RPCs go to the default port (e.g., 8020)
  • 11. HDFS Resource Management – Namenode RPC Call Queue
    ⬢ In a multi-tenancy scenario, the call queue should act like a shock absorber that accommodates different workloads, converting bursty arrivals into smooth, steady departures.
    ⬢ Good call queue
      – Queue without call bloat
      – Catches and handles bursts with no more than a temporary increase in queue delay
      – Maximum server utilization
    ⬢ Bad call queue
      – Queue that exhibits call bloat
      – Queue fills up and stays full after bursts
      – Low utilization and high queue latency
  • 12. HDFS Resource Management – Fair Call Queue
    ⬢ Before HADOOP-9640: LinkedBlockingQueue
      – Single queue
      – Client blocks and times out/fails when the queue is full
    ⬢ HADOOP-9640 – Fair Call Queue
      – Multiple priority levels and call queues with different processing priority
      – Each RPC is assigned a priority by the scheduler
      – High priority RPC calls are put into a call queue with a higher probability of being executed.
    [Diagram labels: Scheduler, Queue 0, Queue ..., Queue 2, Multiplexer (WRR)]
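The scheduler/queues/multiplexer arrangement on the Fair Call Queue slide can be illustrated with a small sketch. This is not the Hadoop implementation: the class name `FairCallQueueSketch`, the method names, and the weights `(4, 2, 1)` are assumptions chosen for illustration. It only shows how a weighted round-robin (WRR) multiplexer can drain several priority queues so that higher-priority queues are served more often without starving the others.

```python
from collections import deque


class FairCallQueueSketch:
    """Toy multi-level call queue drained by a weighted round-robin multiplexer.

    Hypothetical illustration only; names and weights are assumptions.
    Priority 0 is the highest priority. All weights must be positive.
    """

    def __init__(self, weights=(4, 2, 1)):
        self.weights = weights
        self.queues = [deque() for _ in weights]
        self._current = 0             # index of the queue being served
        self._remaining = weights[0]  # draws left before moving to the next queue

    def put(self, call, priority):
        """Enqueue a call at the priority level chosen by some scheduler."""
        self.queues[priority].append(call)

    def _advance(self):
        """Move the multiplexer to the next queue and reset its draw budget."""
        self._current = (self._current + 1) % len(self.queues)
        self._remaining = self.weights[self._current]

    def take(self):
        """Return the next call, or None if every queue is empty."""
        for _ in range(len(self.queues) + 1):
            if self._remaining == 0:
                self._advance()
            queue = self.queues[self._current]
            if queue:
                self._remaining -= 1
                return queue.popleft()
            self._advance()  # empty queue: hand its turn to the next one
        return None
```

With weights `(2, 1)`, for example, the multiplexer would serve two calls from queue 0 for every call it serves from queue 1 while both queues are non-empty.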
  • 13. HDFS Resource Management – Namenode RPC Throttling <1>
    ⬢ HADOOP-10597 Backoff when the call queue is full
      – Send back a Retriable exception
      – Let the client wait exponentially and retry instead of blocking, timing out, or failing the call.
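The client side of this backoff can be sketched as follows. This is a minimal illustration, not the Hadoop client's retry policy: `RetriableError`, `call_with_backoff`, and all parameter names and default values are assumptions. The `sleep` and `rng` parameters are injectable only so the sketch can be exercised without real waiting.

```python
import random
import time


class RetriableError(Exception):
    """Stand-in for the retriable exception a full call queue sends back."""


def call_with_backoff(rpc, max_retries=5, base_delay=0.1, max_delay=5.0,
                      sleep=time.sleep, rng=random.random):
    """Invoke rpc(); on RetriableError wait exponentially longer, then retry.

    The delay doubles with each attempt, is capped at max_delay, and is
    scaled by a jitter factor in [0.5, 1.0] so clients do not retry in step.
    """
    for attempt in range(max_retries + 1):
        try:
            return rpc()
        except RetriableError:
            if attempt == max_retries:
                raise  # retries exhausted: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * (0.5 + rng() / 2))
```

The design point is that the server rejects quickly and cheaply, while the waiting happens on the client, so handler threads and queue slots are not tied up by callers that would otherwise block.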
  • 14. HDFS Resource Management – Namenode RPC Throttling <2>
    ⬢ HADOOP-12916 Backoff based on response time
      – The basic idea: back off earlier to avoid call queue overload so that the namenode can recover quickly.
      – Low priority calls get backed off if the response time of high priority calls is over a predefined threshold.
      – More per-user/queue metrics added for troubleshooting.
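One way to read the response-time rule is sketched below. The function name, the argument layout, and the exact comparison are assumptions made for illustration; the slide only states that low priority calls are backed off when high priority calls respond too slowly.

```python
def should_back_off(call_priority, avg_response_ms, thresholds_ms):
    """Decide whether a call at call_priority should be backed off.

    Hypothetical sketch: priority 0 is the highest. A call is backed off
    when its own level, or any higher-priority level, has an average
    response time above that level's threshold.
    """
    return any(
        avg_response_ms[level] > thresholds_ms[level]
        for level in range(call_priority + 1)
    )
```

Under this reading, a slow level 1 causes levels 1 and 2 to back off while level 0 calls are still admitted.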
  • 15. HDFS Resource Management – Namenode RPC Throttling <3>
    ⬢ Abstract the scheduler interface from the call queue for pluggable RPC priority assignment
      – DefaultRpcScheduler: all RPC calls have the same priority
      – DecayRpcScheduler: from the original FairCallQueue; priority is assigned based on users' previous call volumes.
      – Other experimental schedulers: configurable list of high priority users/groups for low latency jobs, medium priority users/groups for normal jobs, and low priority users/groups for batch jobs.
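The idea of assigning priority from decayed call volumes can be sketched as below. This is an illustration rather than the DecayRpcScheduler source: the class name, the decay factor of 0.5, and the share thresholds are assumptions. Each user's call count is periodically multiplied by a decay factor, and a user's priority level follows from that user's share of all recent calls.

```python
from collections import defaultdict


class DecaySchedulerSketch:
    """Toy scheduler: heavier recent callers get lower priority.

    Hypothetical illustration; the decay factor and thresholds are assumed.
    Priority 0 is the highest; larger numbers mean lower priority.
    """

    def __init__(self, decay_factor=0.5, thresholds=(0.125, 0.25, 0.5)):
        self.decay_factor = decay_factor
        self.thresholds = thresholds  # ascending shares of total call volume
        self.counts = defaultdict(float)

    def record_call(self, user, n=1):
        """Count n calls made by user."""
        self.counts[user] += n

    def decay(self):
        """Age all counts; meant to be called once per decay period."""
        for user in list(self.counts):
            self.counts[user] *= self.decay_factor

    def priority(self, user):
        """Return the priority level for the user's next call."""
        total = sum(self.counts.values())
        if total == 0:
            return 0
        share = self.counts.get(user, 0.0) / total
        # One level lower for every share threshold the user has reached.
        return sum(share >= t for t in self.thresholds)
```

Because old calls fade with each decay period, a user who stops flooding the namenode gradually returns to a higher priority level.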
  • 16. HDFS Resource Management – QoS
    ⬢ Use case:
      – Allow a high performance QoS mechanism with minimum decoding effort on the server side
    ⬢ HADOOP-9194 QoS support for Hadoop RPC
      – One byte in the RPC header to facilitate the QoS mechanism
      – E.g., differentiate OLTP/OLAP, batch/streaming against the same HDFS
    ⬢ Limitation
      – No mechanism-level implementation yet
  • 17. HDFS Resource Management with YARN
    ⬢ Use case
      – Priority inversion without centralized resource management (e.g., RPC calls from high priority YARN jobs may be put into a low priority HDFS namenode call queue)
      – Identify and manage “bad” callers effectively
    ⬢ Namenode – RPC handler
      – FairCallQueue offers fair use of namenode RPC handlers
      – No guarantee of differentiation
    ⬢ Datanode – I/O bandwidth
      – No differentiation of writers/readers and bandwidth usage
      – Datanode allows static throttling of balancer I/O
  • 18. HDFS Namenode Resource Reservation
    ⬢ HADOOP-13128 proposes HDFS namenode resource reservation via resource coupon
      – From throttling to managing
      – Similar to a delegation token in many aspects
      – Works for both Kerberos and non-Kerberos clusters
      – Allows only privileged service users to request resource coupons from the namenode
      – A coupon can be serialized/deserialized for use within a container
      – A coupon can be renewed for long-running jobs or canceled after the intended job is finished
  • 19. HDFS Namenode Resource Coupon
    ⬢ Coupon Identifier
      – Finer-grained owner (MR job ID, Hive query ID) to help identify and manage “good” and “bad” callers
      – Resource type (Namenode RPC or Datanode I/O bandwidth)
      – Flexible management unit for different resources
        • Min/max percentage (e.g., Namenode RPC)
        • Absolute value (Datanode I/O bandwidth)
  • 20. HDFS Namenode Resource Coupon Manager (RCM)
    ⬢ Grant/renew/cancel resource coupons
    ⬢ Monitor and report resource usage
    ⬢ Check and validate resource use requests
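Since the resource coupon is described only as a proposal, the following is a purely hypothetical sketch of the grant/renew/cancel/validate lifecycle listed on the coupon slides. None of the names (`Coupon`, `CouponManagerSketch`, the `"NN_RPC"` label, the 60-second lifetime) come from HADOOP-13128; they are assumptions used to make the lifecycle concrete. Usage monitoring and reporting are left out.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Coupon:
    """Hypothetical coupon record; field names are assumptions."""
    coupon_id: int
    owner: str          # fine-grained owner, e.g. an MR job ID or Hive query ID
    resource_type: str  # e.g. "NN_RPC" or "DN_IO" (made-up labels)
    amount: float       # percentage or absolute value, depending on the type
    expires_at: float   # time after which the coupon is no longer honored


class CouponManagerSketch:
    """Toy coupon manager: only privileged service users may request coupons."""

    def __init__(self, privileged_users, lifetime_s=60.0):
        self.privileged_users = set(privileged_users)
        self.lifetime_s = lifetime_s
        self.coupons = {}
        self._ids = itertools.count(1)

    def grant(self, requester, owner, resource_type, amount, now):
        """Issue a coupon for owner on behalf of a privileged requester."""
        if requester not in self.privileged_users:
            raise PermissionError(f"{requester} may not request coupons")
        coupon = Coupon(next(self._ids), owner, resource_type, amount,
                        now + self.lifetime_s)
        self.coupons[coupon.coupon_id] = coupon
        return coupon

    def renew(self, coupon_id, now):
        """Extend a coupon, e.g. for a long-running job."""
        self.coupons[coupon_id].expires_at = now + self.lifetime_s

    def cancel(self, coupon_id):
        """Drop a coupon once the intended job has finished."""
        self.coupons.pop(coupon_id, None)

    def is_valid(self, coupon_id, now):
        """Check a resource use request against known, unexpired coupons."""
        coupon = self.coupons.get(coupon_id)
        return coupon is not None and now < coupon.expires_at
```

In this sketch a service such as YARN would be the privileged requester, and the job it launches would present the coupon with its requests.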
  • 21. HDFS Namenode Resource Pool
    [Diagram labels: HDFS Namenode Resource Pool; Fairness Pool; Managed Pool; Applications supporting Resource Coupon (YARN/HBASE); Legacy Applications without Resource Coupon]
  • 22. HDFS Namenode Resource Coupon Manager (RCM)
    [Diagram labels: Client, YARN Resource Manager, HDFS Namenode, RCM (new), HDFS Datanode, YARN Node Manager, YARN Container]
  • 23. HDFS Resource Management – Datanode
    ⬢ Use case:
      – When a client writes to HDFS faster than the disk bandwidth of the DNs, it saturates the disk bandwidth and puts the DNs into an unresponsive state.
      – The client only backs off by aborting/recovering the pipeline, which causes failed writes and unnecessary pipeline recovery.
    ⬢ Static I/O throttling
      – HDFS-7265 Support HDFS IO throttling
      – HDFS-9796 Use a throttler for replica write in datanode
      – HDFS-4412 Add throttler for datanode bandwidth
      – HADOOP-10410 datanode QoS via ioprio_set on DataXceiver thread
    ⬢ Dynamic I/O throttling
      – HDFS-7270 Add congestion signaling capability to DataNode write pipeline (ECN)
    ⬢ Future work: I/O bandwidth reservation with resource coupon
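Static I/O throttling of the kind listed on the Datanode slide can be pictured with a token bucket. This sketch is not the datanode's throttler: the class name, the method `delay_for`, and the explicit `now` argument are assumptions. The caller is expected to sleep for the returned delay before writing.

```python
class BandwidthThrottlerSketch:
    """Toy token-bucket throttler for a fixed bytes-per-second budget.

    Hypothetical illustration; not the HDFS implementation.
    """

    def __init__(self, bytes_per_second, burst_bytes=None):
        self.rate = float(bytes_per_second)
        self.capacity = float(burst_bytes or bytes_per_second)
        self.tokens = self.capacity  # start with a full bucket
        self.last = 0.0              # time of the previous request

    def delay_for(self, num_bytes, now):
        """Return how many seconds to wait before sending num_bytes."""
        elapsed = now - self.last
        # Refill for the time that has passed, up to the burst capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        self.tokens -= num_bytes
        if self.tokens >= 0:
            return 0.0
        # A negative balance is a debt that must be waited out.
        return -self.tokens / self.rate
```

A fixed rate like this is what makes the throttling static; the dynamic and reservation-based approaches on the slide would adjust the budget at run time.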
  • 24. Thank you! Q&A

Editor's notes

  1. Move the YARN picture here.
  2. Server/client
  3. Bandwidth via ioprio for the DFSClient and xceiver thread; there may be no standard across operating systems.
  4. Reservation-based dynamic throttling utilizes the existing DataXceiver bandwidth throttling.