Toward Better Multi-Tenancy Support from HDFS

© Hortonworks Inc. 2011 – 2016. All Rights Reserved
Xiaoyu Yao
Email: xyao@hortonworks.com

About myself
⬢ Member of Technical Staff at Hortonworks since 2014
⬢ Apache Hadoop Committer and PMC member.
⬢ Currently working on HDFS.
⬢ This talk aims to give a better understanding of HDFS multi-tenancy support and of
ongoing work toward better resource management.

Agenda
⬢ Overview
⬢ Hadoop multi-tenancy features
⬢ HDFS resources and multi-tenancy offerings
⬢ HDFS resource management via resource coupon
⬢ Q&A

Overview
⬢ Centrally managed infrastructure
–Consolidate to simplify management and lower TCO
–Better utilization and efficiency
⬢ Requirement
–Resource Sharing
–Resource Isolation
–Resource Control

Multi-Tenancy Support from Hadoop
         Resource Sharing   Resource Isolation               Resource Management
HBASE    Y                  Namespace, Region Server Group   Quota
YARN     Y                  Queue, Node Label, ...           Capacity Scheduler, ...
HDFS     Y                  Federation                       Quota, FairCallQueue, Backoff

HDFS Resources
⬢ Capacity
–Namespace
–Storage Space
–Storage Type
⬢ Operational Resources
–Namenode
•RPC
–Datanode
•Disk & Network

HDFS Resource Sharing/Isolation – Federation

HDFS Capacity Management – Quota
⬢ Quota
–Namespace
–StorageSpace
–HDFS-7584 Quota by Storage Types
⬢ Limitations
–Static
–Per directory
–No per user/job control

HDFS Operational Resource Management – Namenode RPC Isolation (1)
⬢Internal RPC
–DN->NN block report, heartbeat, etc.
–ZKFC->NN liveness check
⬢External RPC
–Client RPCs from HDFSClients such as MR jobs/Hive queries/HBase
[Diagram: Client -> Listener -> Readers -> Call Queue -> Handlers -> FSN]

HDFS Operational Resource Management – Namenode RPC Isolation (2)
⬢Use case:
–HDFS access from normal jobs impacted by offending jobs
–Internal RPCs impacted by External RPCs
–One blocked RPC method could affect others
⬢Protect HDFS internal RPCs:
–Dedicated service RPC server/port
•Isolate DN->NN block report, heartbeat, etc.
–Dedicated lifeline RPC server/port
•Protect ZKFC->NN liveness check
⬢All external RPCs go to the default port (e.g., 8020)

HDFS Resource Management – Name Node RPC Call Queue
⬢ In a multi-tenant scenario, the call queue should act like a shock absorber that
accommodates different workloads, converting bursty arrivals into smooth,
steady departures.
⬢ Good call queue
–queue without call bloat
–catches and handles bursts with no more than a temporary increase of queue delay
–maximum server utilization
⬢ Bad call queue
–queue that exhibits call bloat
–queue that fills up and stays full after bursts
–low utilization and high queue latency

HDFS Resource Management - Fair Call Queue
⬢ Before HADOOP-9640: LinkedBlockingQueue
–Single queue
–Clients block, then time out or fail, when the queue is full
⬢ HADOOP-9640 - Fair Call Queue
–Multiple priority levels and call queues with different processing priority
–Each RPC is assigned a priority by scheduler
–High-priority RPC calls are put into call queues that have a higher probability of being served.
[Diagram: Scheduler -> Queue 0, Queue ..., Queue 2 -> Multiplexer (WRR)]
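The weighted round-robin (WRR) multiplexing described on this slide can be illustrated with a small sketch. This is not the Hadoop implementation; the class name, method names, and the weights (4, 2, 1) are assumptions chosen for illustration. It shows how a handler draining several priority queues still serves low-priority calls, only less often.

```python
from collections import deque


class WeightedRoundRobinMux:
    """Illustrative WRR multiplexer (hypothetical, not Hadoop's class).

    Queue i is read up to weights[i] times before moving to the next queue.
    """

    def __init__(self, weights):
        self.weights = list(weights)
        self.queues = [deque() for _ in self.weights]
        self.current = 0                   # queue currently being served
        self.remaining = self.weights[0]   # reads left for it in this round

    def put(self, priority, call):
        # The scheduler has already chosen the priority level for this call.
        self.queues[priority].append(call)

    def _advance(self):
        self.current = (self.current + 1) % len(self.queues)
        self.remaining = self.weights[self.current]

    def take(self):
        """Return the next call, or None if every queue is empty."""
        # n + 1 attempts are enough to come back to the starting queue
        # with a fresh budget.
        for _ in range(len(self.queues) + 1):
            if self.remaining > 0 and self.queues[self.current]:
                self.remaining -= 1
                return self.queues[self.current].popleft()
            self._advance()
        return None


# Usage: with all queues backlogged, one round serves 4, 2, then 1 call.
mux = WeightedRoundRobinMux([4, 2, 1])
for level in range(3):
    for i in range(10):
        mux.put(level, (level, i))
served_levels = [mux.take()[0] for _ in range(7)]
```

An empty high-priority queue does not stall the multiplexer: it skips ahead to the next non-empty queue, so handlers stay busy.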

HDFS Resource Management – Namenode RPC Throttling <1>
⬢ HADOOP-10597 Backoff when the call queue is full
–Send back a Retriable exception
–Let the client retry with exponential backoff instead of blocking, timing out, or
failing the call.

⬢ HADOOP-12916 Backoff based on response time
–The basic idea: back off earlier to avoid call queue overload so that the namenode
can recover quickly.
–Low-priority calls get backed off if the response time of high-priority calls exceeds
a predefined threshold.
–More per-user/per-queue metrics added for troubleshooting.

⬢ Abstract scheduler interface from call queue for pluggable RPC priority assignment
–DefaultRpcScheduler: all RPC calls with same priority
–DecayRpcScheduler: taken from the original FairCallQueue; priority is assigned based on
each user's previous call volume.
–Other experimental schedulers: configurable list of high priority user/group for
low latency jobs, medium priority user/group for normal jobs and low priority
user/group for batch jobs.
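The idea behind assigning priority from previous call volume can be sketched as follows. This is a simplified illustration, not the DecayRpcScheduler code: the class name, the decay factor of 0.5, and the share thresholds (12.5%, 25%, 50%) are assumptions. Each user's call count decays periodically, and users with a larger share of recent calls land in lower-priority queues.

```python
class DecayingPriorityScheduler:
    """Hypothetical sketch: priority 0 is highest, levels - 1 is lowest."""

    def __init__(self, levels=4, decay_factor=0.5,
                 thresholds=(0.125, 0.25, 0.5)):
        if len(thresholds) != levels - 1:
            raise ValueError("need one threshold per priority boundary")
        self.decay_factor = decay_factor
        self.thresholds = thresholds
        self.counts = {}  # user -> decayed call count

    def record_call(self, user):
        self.counts[user] = self.counts.get(user, 0.0) + 1.0

    def decay(self):
        """Periodic sweep: shrink all counts so that old traffic fades out."""
        decayed = {u: c * self.decay_factor for u, c in self.counts.items()}
        # Drop users whose count has become negligible.
        self.counts = {u: c for u, c in decayed.items() if c >= 0.01}

    def priority(self, user):
        total = sum(self.counts.values())
        if total == 0:
            return 0
        share = self.counts.get(user, 0.0) / total
        # Every threshold the user's share reaches costs one priority level.
        return sum(1 for t in self.thresholds if share >= t)


# Usage: a heavy batch user is pushed to the lowest-priority queue.
sched = DecayingPriorityScheduler()
for user, calls in (("batch", 60), ("etl", 30), ("adhoc", 10)):
    for _ in range(calls):
        sched.record_call(user)
priorities = {u: sched.priority(u) for u in ("batch", "etl", "adhoc", "new")}
```

Because counts decay, a user who stops flooding the namenode regains high priority after a few sweeps without any operator action.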

HDFS resource management - QoS
⬢ Use case:
–Allow high performance QoS mechanism with minimum decoding effort on server side
⬢ HADOOP-9194 QoS support for Hadoop RPC
–One byte in the RPC header to facilitate QoS mechanisms
–E.g., differentiate OLTP/OLAP, batch/streaming against the same HDFS
⬢ Limitation
–No mechanism level implementation yet

HDFS resource management with YARN
⬢ Use Case
–Priority inversion without centralized resource management (e.g., RPC calls from high priority
YARN jobs may be put into low priority HDFS namenode call queue)
–Identify and manage "bad" callers effectively
⬢ Namenode – RPC handler
–FairCallQueue offers fair use of namenode RPC handlers
–No guarantee of differentiation
⬢ Datanode – I/O bandwidth
–No differentiation of writer/reader and bandwidth usage.
–Datanode supports static throttling of balancer I/O.

HDFS Namenode Resource Reservation
⬢ HADOOP-13128 proposes HDFS namenode resource reservation via resource coupon
–From throttling to management
–Similar to delegation token in many aspects
–Works for both Kerberos and non-Kerberos cluster
–Allows only privileged service user to request resource coupons from namenode.
–Coupon can be serialized/de-serialized for use within container.
–Coupon can be renewed for long running jobs or canceled after the intended job is finished.

HDFS Namenode Resource Coupon
⬢ Coupon Identifier
–Finer grain owner (MR job ID, Hive Query ID) to help identify and manage “good” and “bad”
callers
–Resource type (Namenode RPC or Datanode I/O bandwidth)
–Flexible management unit for different resources.
•Min/Max percentage (e.g. Namenode RPC)
•Absolute value (Datanode I/O bandwidth)

HDFS Namenode Resource Coupon Manager (RCM)
⬢ Grant/Renew/Cancel resource coupon
⬢ Monitor and report resource usage
⬢ Check and validate resource use requests
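To make the grant/renew/cancel lifecycle concrete, here is a minimal sketch of what a coupon manager could look like. Everything in it is hypothetical: the slides describe a proposal, so the class names, fields, the 60-second lifetime, and the rule that reserved minimum shares may not exceed 100% are assumptions for illustration only. The current time is passed in explicitly to keep the example deterministic.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Coupon:
    coupon_id: int
    owner: str           # fine-grained owner, e.g. an MR job ID or Hive query ID
    resource_type: str   # e.g. "NN_RPC" (hypothetical label)
    min_percent: float   # reserved share of the resource
    expires_at: float    # time after which the coupon is no longer valid


class ResourceCouponManager:
    """Hypothetical RCM sketch: grant, renew, cancel, and validate coupons."""

    def __init__(self, privileged_users, lifetime_s=60.0):
        self.privileged = set(privileged_users)
        self.lifetime_s = lifetime_s
        self.coupons = {}
        self._ids = itertools.count(1)

    def _reserved(self, resource_type, now):
        return sum(c.min_percent for c in self.coupons.values()
                   if c.resource_type == resource_type and c.expires_at > now)

    def grant(self, requester, owner, resource_type, min_percent, now):
        # Only privileged service users may request coupons.
        if requester not in self.privileged:
            raise PermissionError(f"{requester} may not request coupons")
        if self._reserved(resource_type, now) + min_percent > 100.0:
            raise ValueError("resource would be over-committed")
        coupon = Coupon(next(self._ids), owner, resource_type,
                        min_percent, now + self.lifetime_s)
        self.coupons[coupon.coupon_id] = coupon
        return coupon

    def renew(self, coupon_id, now):
        coupon = self.coupons[coupon_id]
        if coupon.expires_at <= now:
            raise ValueError("coupon already expired")
        coupon.expires_at = now + self.lifetime_s
        return coupon.expires_at

    def cancel(self, coupon_id):
        self.coupons.pop(coupon_id, None)

    def is_valid(self, coupon_id, now):
        coupon = self.coupons.get(coupon_id)
        return coupon is not None and coupon.expires_at > now


# Usage: a privileged service user reserves 40% of namenode RPC for one job.
rcm = ResourceCouponManager(privileged_users={"yarn"})
coupon = rcm.grant("yarn", "job_0001", "NN_RPC", 40.0, now=0.0)
```

A long-running job would call renew before expiry, and the service user would call cancel once the job finishes so the reserved share returns to the pool.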

HDFS Namenode Resource Pool
[Diagram: the HDFS Namenode resource pool is split into a Fairness Pool and a Managed Pool; the clients shown are applications supporting Resource Coupon (YARN/HBASE) and legacy applications without Resource Coupon.]

HDFS Namenode Resource Coupon Manager (RCM)
[Diagram: Client, YARN Resource Manager, HDFS Namenode with RCM, HDFS Datanode, YARN Node Manager, YARN Container; one component is labeled NEW.]

HDFS Resource Management – Datanode
⬢ Use case:
–When a client writes to HDFS faster than the disk bandwidth of the DNs, it saturates the disk
bandwidth and puts the DNs into an unresponsive state.
–The client only backs off by aborting / recovering the pipeline, which causes failed writes and
unnecessary pipeline recovery.
⬢ Static I/O Throttling
–HDFS-7265 Support HDFS IO throttling
–HDFS-9796 Use a throttler for replica write in datanode
–HDFS-4412 Add throttler for datanode bandwidth
–HADOOP-10410 datanode QoS via ioprio_set on DataXceiver thread
⬢ Dynamic I/O Throttling
–HDFS-7270 Add congestion signaling capability to DataNode write pipeline (ECN)
⬢ Future work: I/O bandwidth reservation with resource coupon
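Static I/O throttling of the kind listed above is commonly built on a rate limiter. The sketch below is a generic token bucket, not the throttler code from the JIRAs on this slide; the class name, parameters, and the explicit clock argument are assumptions for illustration. It shows how a writer that exceeds the configured bandwidth is told to wait instead of saturating the disk.

```python
class TokenBucketThrottler:
    """Generic token-bucket rate limiter (illustrative, hypothetical names)."""

    def __init__(self, bytes_per_sec, burst_bytes):
        self.rate = float(bytes_per_sec)
        self.capacity = float(burst_bytes)
        self.tokens = float(burst_bytes)  # start with a full bucket
        self.last = 0.0                   # time of the last refill

    def _refill(self, now):
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now

    def delay_for(self, num_bytes, now):
        """Charge num_bytes and return how long the caller should wait first."""
        self._refill(now)
        self.tokens -= num_bytes
        if self.tokens >= 0:
            return 0.0
        # The bucket is in debt: wait until the refill covers the deficit.
        return -self.tokens / self.rate


# Usage: 1000 bytes/s with a 1000-byte burst allowance.
throttler = TokenBucketThrottler(bytes_per_sec=1000, burst_bytes=1000)
first = throttler.delay_for(1000, now=0.0)   # fits in the initial burst
second = throttler.delay_for(500, now=0.0)   # exceeds the budget, must wait
third = throttler.delay_for(500, now=1.0)    # budget has been refilled
```

A fixed rate like this is what makes the throttling static; a dynamic scheme would adjust the rate from congestion feedback rather than from configuration.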
