Essential YARN and MapReduce Configurations for Hadoop Cluster Optimization
Presented by
Rohith Sharma, Naganarasimha &
Sunil
About us..
Rohith Sharma K S,
-Hadoop Committer, works for Huawei
-5+ years of experience in the Hadoop ecosystem
Naganarasimha G R,
-Apache Hadoop contributor for YARN, Huawei
-4+ years of experience in the Hadoop ecosystem
Sunil Govind
-Apache Hadoop contributor for YARN and MapReduce
-3+ years of experience in the Hadoop ecosystem
Agenda
➔Overview of general cluster deployment
➔Walk-through of YARN cluster resource configurations
➔Anti-patterns
◆ MapReduce
◆ YARN
● RM Restart/HA
● Queue Planning
➔Summary
Brief Overview: General Cluster Deployment
A sample Hadoop cluster layout with HA: master and backup RM and NN, NodeManagers co-located with DataNodes, a client, the ATS, and a three-node ZooKeeper cluster.
[Cluster layout diagram]
RM - Resource Manager
NM - Node Manager
NN - Name Node
DN - Data Node
ATS - Application Timeline Server
ZK - ZooKeeper
YARN Configuration : An Example
Legacy NodeManagers and DataNodes ran on machines with low resource configurations. Nowadays most systems are high-end, and customers want powerful machines with fewer nodes
(50~100 nodes) to achieve better performance.
A sample NodeManager configuration could look like:
-64 GB of memory
-8/16 CPU cores
-1 Gb network cards
-100 TB of disk (or disk arrays)
We focus on this kind of deployment and will cover anti-patterns and best practices in the
coming slides.
YARN Configuration: Related to Resources
NodeManager:
●yarn.nodemanager.resource.memory-mb
●yarn.nodemanager.resource.cpu-vcores
●yarn.nodemanager.vmem-pmem-ratio
●yarn.nodemanager.log-dirs
●yarn.nodemanager.local-dirs
Scheduler:
●yarn.scheduler.minimum-allocation-mb
●yarn.scheduler.maximum-allocation-mb
MapReduce:
●mapreduce.map/reduce.java.opts
●mapreduce.map/reduce.memory.mb
●mapreduce.map/reduce.cpu.vcores
YARN and MapReduce provide these resource-tuning configurations to enable better resource
allocation.
●With “vmem-pmem-ratio” (2:1 for example), the NodeManager can kill a container if its virtual
memory exceeds twice its configured physical memory.
●It is advised to configure “local-dirs” and “log-dirs” on different mount points.
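As a sketch, the NodeManager settings above for the 64 GB / 16-core node described earlier might look like the following in yarn-site.xml. All values are illustrative, not recommendations; the mount paths are hypothetical:

```xml
<!-- yarn-site.xml: illustrative values for a 64 GB / 16-core node -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>57344</value> <!-- 56 GB for containers, leaving headroom for OS and daemons -->
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>14</value> <!-- reserve a couple of cores for system processes -->
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>2.1</value> <!-- containers killed if virtual memory exceeds 2.1x physical -->
</property>
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/data1/yarn/local,/data2/yarn/local</value> <!-- hypothetical mounts -->
</property>
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/data3/yarn/logs</value> <!-- on a different mount point than local-dirs -->
</property>
```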
Anti-Patterns in
MRAppMaster
Container Memory vs Container Heap Memory
Customer : “Enough container memory is configured, but the job still runs slowly, and sometimes
when there is relatively more data, tasks fail with OOM”
Resolution:
1.Container memory and container heap size are different configurations.
2.Make sure that if mapreduce.map/reduce.memory.mb is configured, a heap size is also configured in
mapreduce.map/reduce.java.opts.
3.Since this was a common user mistake, it is now handled in trunk: 0.8 of the container's
configured/requested memory is set as its heap size.
1. If mapreduce.map/reduce.memory.mb values are specified but no -Xmx is supplied in the
mapreduce.map/reduce.java.opts keys, the -Xmx value is derived from the former's value.
2. For both these conversions, a scaling factor specified by the property mapreduce.job.heap.memory-
mb.ratio is used (default 80%), to account for the overhead between heap usage and actual physical
memory usage.
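A sketch of a consistent pairing in mapred-site.xml, keeping -Xmx at roughly 80% of the container size as the slide suggests (the values themselves are illustrative):

```xml
<!-- mapred-site.xml: container size and heap size configured together -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value> <!-- container memory for map tasks -->
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1638m</value> <!-- ~80% of 2048 MB; leaves room for non-heap JVM memory -->
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>4096</value> <!-- container memory for reduce tasks -->
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx3276m</value> <!-- ~80% of 4096 MB -->
</property>
```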
Shuffle phase is taking a long time
Customer: “A 500 GB job finished in 4 hours, while a 1000 GB job on the same cluster
has been in the reducer phase for 12 hours. I think the job is stuck.”
After enquiring further about the resource configuration:
the same resource configurations were used for both jobs.
Resolution:
1.The job is NOT hung/stuck; the time is being spent copying map output.
2.Increase the task resources.
3.Tune these configurations:
mapreduce.reduce.shuffle.parallelcopies
mapreduce.reduce.shuffle.input.buffer.percent
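A sketch of the shuffle tuning above in mapred-site.xml; the right values depend on reducer heap size and network, so treat these as illustrative starting points:

```xml
<!-- mapred-site.xml: illustrative shuffle tuning -->
<property>
  <name>mapreduce.reduce.shuffle.parallelcopies</name>
  <value>20</value> <!-- default is 5; more copier threads fetch map output in parallel -->
</property>
<property>
  <name>mapreduce.reduce.shuffle.input.buffer.percent</name>
  <value>0.70</value> <!-- fraction of reducer heap used to buffer map output during shuffle -->
</property>
```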
Anti-Patterns in YARN
RM Restart : RMStateStore Limit
Customer: “yarn.resourcemanager.max-completed-applications is configured to 100000.
Completed applications in the cluster have reached the limit, and many applications are
running. We observe that the RM service takes 10-15 seconds to come up.”
Resolution:
1.It is NOT recommended to configure max-completed-applications as high as 100000.
2.Use the Timeline Server for the history of YARN applications instead.
3.The higher the value, the greater the impact on RM recovery.
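A sketch of a more modest setting in yarn-site.xml (the value is illustrative; history is then served by the Timeline Server rather than the RM state store):

```xml
<!-- yarn-site.xml: keep completed applications in the state store modest -->
<property>
  <name>yarn.resourcemanager.max-completed-applications</name>
  <value>1000</value> <!-- fewer entries to replay on recovery; RM comes up faster -->
</property>
```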
Queue planning
Queue planning : Queue Mapping
Queue planning : Queue Capacity Planning and Preemption
Queue planning : Queue Capacity Planning for multiple users
Customer : “I have multiple users submitting apps to a queue, and it seems all the resources have
been taken by a single user’s app(s), even though other apps are activated“
Queue Capacity Planning :
The Capacity Scheduler provides options to control the resources used by different users within a queue. yarn.scheduler.capacity.<queue-
path>.minimum-user-limit-percent and yarn.scheduler.capacity.<queue-path>.user-limit-factor are the configurations that
determine how many resources each user gets.
yarn.scheduler.capacity.<queue-path>.minimum-user-limit-percent defaults to 100%, which means no user limits are imposed.
It defines the minimum share of resources each user is guaranteed when there is contention.
yarn.scheduler.capacity.<queue-path>.user-limit-factor defaults to 1, which means a single user can never take more than the
queue’s configured capacity. Configure it to control how much a particular user can take even when other users are not using
the queue.
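As a sketch in capacity-scheduler.xml, for a hypothetical queue named "analytics" (the queue name and values are illustrative):

```xml
<!-- capacity-scheduler.xml: per-user limits for a hypothetical "analytics" queue -->
<property>
  <name>yarn.scheduler.capacity.root.analytics.minimum-user-limit-percent</name>
  <value>25</value> <!-- with 4+ active users, each is guaranteed at least 25% -->
</property>
<property>
  <name>yarn.scheduler.capacity.root.analytics.user-limit-factor</name>
  <value>2</value> <!-- one user may take up to 2x the queue's configured capacity when others are idle -->
</property>
```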
Queue planning : AM Resource Limit
Customer: “Hey buddy, most of my jobs are stuck in the ACCEPTED state and never start to run.
What could be the problem?”
“All my jobs were running fine, but after an RM switchover, a few jobs didn’t resume their work.
Why is the RM not able to allocate new containers to these jobs?”
Resolution:
1.Users need to ensure that the AM resource limit is properly configured w.r.t. their deployment needs.
The maximum resource limit for running AM containers needs to be analyzed and configured correctly to
ensure applications make effective progress.
a. Refer to yarn.scheduler.capacity.maximum-am-resource-percent
2.If some NMs do not register back after an RM switchover, the cluster size can shrink
compared to what it was prior to the failover. This lowers the AM resource limit, so fewer AMs
will be activated after restart.
3.For analytical workloads : a higher AM limit. For batch queries : a lower AM limit.
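A sketch of the cluster-wide AM limit in capacity-scheduler.xml; the value is illustrative, and the same property can also be set per queue path:

```xml
<!-- capacity-scheduler.xml: cap the share of resources used by AM containers -->
<property>
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.2</value> <!-- at most 20% of cluster resources may run AM containers -->
</property>
```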
Queue planning : Application Priority within Queue
Customer : “I have many applications running in my cluster, and a few are very important jobs
that have to execute fast. I now use separate queues to run some very important
applications. The configuration seems very complex, and I feel cluster resources are not
utilized well because of this.”
Resolution:
Example queue layout:
root
├─ sales (50%): high 20%, med 40%, low 40%
└─ inventory (50%): high 20%, med 40%, low 40%
Configuration is very complex for this case, and
cluster resources may not be utilized very well.
We suggest using Application Priority instead.
Resolution:
Application Priority is available in YARN from the 2.8 release onwards. A brief heads-up
about this feature:
1.Configure “yarn.cluster.max-application-priority” in yarn-site.xml. This is the maximum
priority any user/application can be given.
2.Within a queue, applications are currently selected by an OrderingPolicy (FIFO/Fair). If
applications are submitted with a priority, the Capacity Scheduler also considers the priority of
the application in the FifoOrderingPolicy. Hence the application with the highest priority is always
picked first for resource allocation.
3.For MapReduce, use “mapreduce.job.priority” to set the priority.
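A sketch of enabling priorities cluster-wide (the value is illustrative); a MapReduce job can then request a priority at submit time via mapreduce.job.priority:

```xml
<!-- yarn-site.xml: allow application priorities up to 10 (illustrative) -->
<property>
  <name>yarn.cluster.max-application-priority</name>
  <value>10</value> <!-- any higher requested priority is capped at this value -->
</property>
```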
Application Priority within Queue (contd..)
Resource Request Limits
Customer: “I am not very sure about the capacity of the node managers and the maximum-allocation
resource configuration. But my application is not getting any containers, or it is getting killed.”
Resolution/Suggestion:
Suppose no NM has more than 6 GB of memory. If a container request demands more memory/CPU than
any single NodeManager can provide, but less than the default “maximum-allocation-mb”, the
request will never be served by the RM. Unfortunately, this is not surfaced as an error to the user,
and the application will wait for an allocation indefinitely. On the other hand, the scheduler will also be waiting for
some node that can meet this heavy resource request.
Set yarn.scheduler.maximum-allocation-mb and yarn.scheduler.maximum-allocation-vcores appropriately by looking up
the NodeManagers' memory/CPU limits.
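A sketch of aligning the scheduler's maximum allocation with the biggest NodeManager in the cluster (values illustrative, matching the 64 GB / 16-core example node):

```xml
<!-- yarn-site.xml: cap single-container requests at what a real NM can provide -->
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>57344</value> <!-- no larger than the biggest NM's resource.memory-mb -->
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>14</value> <!-- no larger than the biggest NM's resource.cpu-vcores -->
</property>
```

With these caps in place, an oversized request is rejected up front instead of waiting forever for a node that can never satisfy it.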
Reservation Issue
Customer : “My application has a reserved container on a node and is never able to get new
containers.”
Resolution:
The reservation feature in the Capacity Scheduler goes a long way toward ensuring better linear resource
allocation. However, there are a few possible corner cases. For example, an application has
made a reservation on a node, but that node is running various long-lived containers, so the chance of
getting free resources from this node in the immediate time frame is minimal.
Configurations like the one below can help in having time-bounded reservations for effective cluster usage:
●yarn.scheduler.capacity.reservations-continue-look-all-nodes allows the scheduler to look for a suitable resource on other
nodes too.
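A sketch of the setting above in capacity-scheduler.xml; in recent releases this is already the default, so the entry mainly documents intent:

```xml
<!-- capacity-scheduler.xml: keep scanning other nodes despite an outstanding reservation -->
<property>
  <name>yarn.scheduler.capacity.reservations-continue-look-all-nodes</name>
  <value>true</value> <!-- don't block on a single reserved node -->
</property>
```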
Suggestions in Resource Configuration
Thank you
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 

Essential YARN and MapReduce configurations for Hadoop cluster optimization

  • 1. Presented by Rohith Sharma, Naganarasimha & Sunil
  • 2. About us Rohith Sharma K S - Hadoop Committer, works for Huawei - 5+ years of experience in Hadoop ecosystems Naganarasimha G R - Apache Hadoop Contributor for YARN, Huawei - 4+ years of experience in Hadoop ecosystems Sunil Govind - Apache Hadoop Contributor for YARN and MapReduce - 3+ years of experience in Hadoop ecosystems
  • 3. Agenda ➔ Overview of general cluster deployment ➔ YARN cluster resource configurations walk-through ➔ Anti-Patterns ◆ MapReduce ◆ YARN ● RM Restart/HA ● Queue Planning ➔ Summary
  • 4. Brief Overview: General Cluster Deployment. A sample Hadoop cluster layout with HA: RM (Master) and RM (Backup), NN (Master) and NN (Backup), several NM/DN worker nodes, a Client, an ATS, and a 3-node ZooKeeper cluster. Legend: RM - Resource Manager, NM - Node Manager, NN - Name Node, DN - Data Node, ATS - Application Timeline Server, ZK - ZooKeeper
  • 5. YARN Configuration: An Example Legacy NodeManagers and DataNodes had low resource configurations. Nowadays most systems have high-end capability, and customers want high-end machines with fewer nodes (50~100 nodes) to achieve better performance. A sample NodeManager configuration could be: - 64 GB of memory - 8/16 CPU cores - 1 Gb network cards - 100 TB of disk (or disk arrays) We focus on this kind of deployment and will cover anti-patterns and best usages in the coming slides.
  • 6. YARN Configuration: Related to Resources NodeManager: ●yarn.nodemanager.resource.memory-mb ●yarn.nodemanager.resource.cpu-vcores ●yarn.nodemanager.vmem-pmem-ratio ●yarn.nodemanager.log-dirs ●yarn.nodemanager.local-dirs Scheduler: ●yarn.scheduler.minimum-allocation-mb ●yarn.scheduler.maximum-allocation-mb MapReduce: ●mapreduce.map/reduce.java.opts ●mapreduce.map/reduce.memory.mb ●mapreduce.map/reduce.cpu.vcores YARN and MR provide these various resource tuning configurations to help achieve better resource allocation. ●With “vmem-pmem-ratio” (2:1 for example), the NodeManager can kill a container if its virtual memory usage shoots to twice its configured physical memory. ●It is advised to configure “local-dirs” and “log-dirs” on different mount points.
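To make the list above concrete, the NodeManager and scheduler knobs could be set in yarn-site.xml as sketched below. The values are illustrative placeholders for the 64 GB / 16-core node profile from the previous slide, not recommendations:

```xml
<!-- yarn-site.xml: illustrative values for a 64 GB / 16-core NodeManager -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>57344</value> <!-- leave headroom for the OS and Hadoop daemons -->
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>16</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>2.1</value> <!-- container is killed if vmem exceeds this ratio -->
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value> <!-- smallest container the RM will hand out -->
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>16384</value> <!-- largest single-container request served -->
</property>
```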
  • 8. Container Memory vs Container Heap Memory Customer: “Enough container memory is configured, but the job still runs slowly, and sometimes when there is relatively more data, tasks fail with OOM.” Resolution: 1. Container memory and container heap size are two different configurations. 2. Make sure that if mapreduce.map/reduce.memory.mb is configured, mapreduce.map/reduce.java.opts is also configured for the heap size. 3. Since this was a common mistake by users, the scenario is now handled in trunk: the heap will be set to 0.8 of the container’s configured/requested memory. 1. If mapreduce.map/reduce.memory.mb values are specified but no -Xmx is supplied in the mapreduce.map/reduce.java.opts keys, then the -Xmx value will be derived from the former’s value. 2. For this conversion, a scaling factor specified by the property mapreduce.job.heap.memory-mb.ratio is used (default 80%) to account for the overhead between heap usage and actual physical memory usage.
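For example, a task's container memory and heap can be configured together in mapred-site.xml as below. The sizes are illustrative; each -Xmx is roughly 0.8 of its container size, matching the default mapreduce.job.heap.memory-mb.ratio described above:

```xml
<!-- mapred-site.xml: keep heap (-Xmx) below the container size -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value> <!-- container size requested from YARN -->
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1638m</value> <!-- ~0.8 of 2048 MB; leaves room for non-heap memory -->
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>4096</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx3276m</value> <!-- ~0.8 of 4096 MB -->
</property>
```

Setting -Xmx equal to (or above) memory.mb is the classic mistake: the JVM's non-heap usage then pushes total physical memory over the container limit and the NodeManager kills the task.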
  • 9. Shuffle phase is taking a long time Customer: “A 500 GB data job finished in 4 hours, but on the same cluster a 1000 GB data job has been running in the reduce phase for 12 hours. I think the job is stuck.” After enquiring further about the resource configuration: the same resource configuration was used for both jobs. Resolution: 1. The job is NOT hung/stuck; the time is being spent copying map output. 2. Increase the task resources. 3. Tune configurations: mapreduce.reduce.shuffle.parallelcopies, mapreduce.reduce.shuffle.input.buffer.percent
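A sketch of the two shuffle knobs named above, shown with their stock defaults as a starting point. Raising them trades reducer memory and fetch connections for faster copying, so treat any change as a tuning experiment rather than a recommendation:

```xml
<!-- mapred-site.xml: shuffle-phase tuning -->
<property>
  <name>mapreduce.reduce.shuffle.parallelcopies</name>
  <value>5</value> <!-- default; parallel map-output fetchers per reducer -->
</property>
<property>
  <name>mapreduce.reduce.shuffle.input.buffer.percent</name>
  <value>0.70</value> <!-- default; fraction of reducer heap used to buffer shuffle data -->
</property>
```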
  • 11. RM Restart: RMStateStore Limit Customer: “yarn.resourcemanager.max-completed-applications is configured to 100000. Completed applications in the cluster have reached the limit and many applications are running. The observation is that the RM service takes 10-15 seconds to come up.” Resolution: 1. It is NOT recommended to configure max-completed-applications to 100000. 2. Use the TimelineServer for the history of YARN applications. 3. The higher the value, the greater the impact on RM recovery.
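In yarn-site.xml this resolution translates to keeping the completed-application counters modest; the defaults below are a reasonable baseline, since the state-store limit caps what the RM must reload on recovery:

```xml
<!-- yarn-site.xml: completed applications retained in memory and in the state store -->
<property>
  <name>yarn.resourcemanager.max-completed-applications</name>
  <value>10000</value> <!-- default; very large values slow RM recovery -->
</property>
<property>
  <name>yarn.resourcemanager.state-store.max-completed-applications</name>
  <value>10000</value> <!-- apps persisted for recovery; keep at or below the value above -->
</property>
```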
  • 13. Queue planning : Queue Mapping
  • 14. Queue planning : Queue Capacity Planning and Preemption
  • 15. Queue planning: Queue Capacity Planning for multiple users Customer: “I have multiple users submitting apps to a queue, but it seems all the resources have been taken by a single user’s app(s) even though other apps are activated.“ Queue Capacity Planning: CS provides options to control the resources used by different users under a queue. yarn.scheduler.capacity.<queue-path>.minimum-user-limit-percent and yarn.scheduler.capacity.<queue-path>.user-limit-factor are the configurations that determine what amount of resources each user gets. yarn.scheduler.capacity.<queue-path>.minimum-user-limit-percent defaults to 100%, which implies no user limits are imposed; it defines the minimum resource each user is going to get. yarn.scheduler.capacity.<queue-path>.user-limit-factor defaults to 1, which implies that a single user can never take more than the queue’s configured capacity; it needs to be configured to control how much a particular user can take even when other users are not using the queue.
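As an illustration, the two properties could be set for a hypothetical queue root.sales in capacity-scheduler.xml; the queue name and values are made up for the example:

```xml
<!-- capacity-scheduler.xml: per-user limits for the hypothetical queue root.sales -->
<property>
  <name>yarn.scheduler.capacity.root.sales.minimum-user-limit-percent</name>
  <value>25</value> <!-- with 4 or more active users, each is guaranteed at least 25% -->
</property>
<property>
  <name>yarn.scheduler.capacity.root.sales.user-limit-factor</name>
  <value>2</value> <!-- one user may grow to 2x the queue capacity when the queue is idle -->
</property>
```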
  • 16. Queue planning: AM Resource Limit Customer: “Hey buddy, most of my jobs are in ACCEPTED state and never start to run. What could be the problem?” “All my jobs were running fine, but after an RM switchover a few jobs didn’t resume their work. Why is the RM not able to allocate new containers to these jobs?” Resolution: 1. Users need to ensure that the AM Resource Limit is properly configured w.r.t. their deployment needs. The maximum resource limit for running AM containers needs to be analyzed and configured correctly to ensure effective progress of applications. a. Refer to yarn.scheduler.capacity.maximum-am-resource-percent 2. If a few NMs do not register back after an RM switchover, the cluster size can change compared to what it was prior to the failover. This affects the AM Resource Limit, and hence fewer AMs will be activated after restart. 3. For analytical workloads: a higher AM limit; for batch queries: a lower AM limit.
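A minimal sketch of the AM-limit knob in capacity-scheduler.xml. The cluster-wide default can be overridden per queue; root.sales here is a hypothetical queue and the values are illustrative:

```xml
<!-- capacity-scheduler.xml: cap on resources consumed by AM containers -->
<property>
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.1</value> <!-- default: at most 10% of resources go to AMs -->
</property>
<property>
  <name>yarn.scheduler.capacity.root.sales.maximum-am-resource-percent</name>
  <value>0.3</value> <!-- per-queue override, e.g. for many small analytical jobs -->
</property>
```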
  • 17. Queue planning: Application Priority within Queue Customer: “I have many applications running in my cluster, and a few are very important jobs which have to execute fast. I now use separate queues to run some very important applications. The configuration seems very complex, and I feel cluster resources are not utilized well because of this.” Resolution: An example of such a queue hierarchy: root, with sales (50%) and inventory (50%), each split into low (40%), med (40%) and high (20%) sub-queues. Such a configuration is very complex for this case, and cluster resources may not be utilized very well. We suggest using Application Priority instead.
  • 18. Application Priority within Queue (contd.) Resolution: Application Priority is available in YARN from the 2.8 release onwards. A brief heads-up about this feature: 1. Configure “yarn.cluster.max-application-priority” in yarn-site.xml. This is the maximum priority that can be configured for any user/application. 2. Within a queue, applications are currently selected using an OrderingPolicy (FIFO/Fair). If applications are submitted with a priority, the Capacity Scheduler will also consider the priority of the application in FifoOrderingPolicy. Hence the application with the highest priority will always be picked for resource allocation. 3. For MapReduce, use “mapreduce.job.priority” to set the priority.
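Putting the steps together, a sketch with illustrative values. The integer form of mapreduce.job.priority assumes the 2.8-era support the slide describes; older releases accepted only named levels such as HIGH:

```xml
<!-- yarn-site.xml: enable a cluster-wide priority range of 0..10 -->
<property>
  <name>yarn.cluster.max-application-priority</name>
  <value>10</value>
</property>

<!-- job configuration (e.g. mapred-site.xml or per-job -D): run this job at priority 8 -->
<property>
  <name>mapreduce.job.priority</name>
  <value>8</value>
</property>
```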
  • 19. Resource Request Limits Customer: “I am not very sure about the capacity of the node managers and the maximum-allocation resource configuration. But my application is not getting any containers, or it is getting killed.” Resolution/Suggestion: Suppose no NM has more than 6 GB of memory. If a container request demands more memory/CPU than any NodeManager has, but less than the default “maximum-allocation-mb”, the request will never be served by the RM. Unfortunately this is not surfaced as an error to the user, and the application will wait for an allocation indefinitely; on the other hand, the scheduler will also keep waiting for some node that can meet this heavy resource request. Use yarn.scheduler.maximum-allocation-mb and yarn.scheduler.maximum-allocation-vcores effectively by looking up the NodeManager memory/cpu limits.
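The suggested fix in config form: align the scheduler's maximum allocation with the largest NodeManager, so oversized requests fail fast instead of waiting forever. The 6144 MB here matches the 6 GB NMs of the example:

```xml
<!-- yarn-site.xml: cap single-container requests at what a real NM can hold -->
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>6144</value> <!-- no NM in the example has more than 6 GB -->
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>16</value> <!-- match the largest NM's vcore count -->
</property>
```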
  • 20. Reservation Issue Customer: “My application has a reserved container on a node and is never able to get new containers.” Resolution: The reservation feature in the Capacity Scheduler goes a long way toward ensuring better linear resource allocation. However, there are a few corner cases. For example, an application has made a reservation on a node, but this node has various long-lived containers running, so the chances of getting free resources from this node in an immediate time frame are minimal. Configurations like the following can help in having time-framed reservations for effective cluster usage. ●yarn.scheduler.capacity.reservations-continue-look-all-nodes will help in looking for a suitable resource on other nodes too.
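The knob mentioned above, as it would appear in capacity-scheduler.xml:

```xml
<!-- capacity-scheduler.xml: keep scanning other nodes even when an
     application already holds a reservation on one node -->
<property>
  <name>yarn.scheduler.capacity.reservations-continue-look-all-nodes</name>
  <value>true</value>
</property>
```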
  • 21. Suggestions in Resource Configuration