SlideShare ist ein Scribd-Unternehmen logo
1 von 33
Yarns about YARN:
Migrating to MapReduce v2	
  
Kathleen	
  Ting	
  |	
  @kate_0ng	
  	
  
Technical	
  Account	
  Manager,	
  Cloudera	
  |	
  Sqoop	
  PMC	
  Member	
  	
  
Big	
  Data	
  Camp	
  LA	
  
June	
  14,	
  2014	
  
Who Am I?
•  Started 3 yr ago as 1st Cloudera Support Eng
•  Now manages Cloudera’s 2 largest customers
•  Sqoop Committer, PMC Member
•  Co-Author of the Apache Sqoop Cookbook
•  MRv1 misconfig talk viewed 20k on slideshare
Agenda
•  MapReduce Example
•  MR2 Motivation
•  Support Ticket Categorization
•  What are Misconfigurations?
•  Memory Misconfigurations
•  Thread Misconfigurations
•  Federation Misconfigurations
•  YARN Memory Misconfigurations
Agenda
•  MapReduce Example
•  MR2 Motivation
•  Support Ticket Categorization
•  What are Misconfigurations?
•  Memory Misconfigurations
•  Thread Misconfigurations
•  Federation Misconfigurations
•  YARN Memory Misconfigurations
Agenda
•  MapReduce Example
•  MR2 Motivation
•  Support Ticket Categorization
•  What are Misconfigurations?
•  Memory Misconfigurations
•  Thread Misconfigurations
•  Federation Misconfigurations
•  YARN Memory Misconfigurations
MR2 Motivation
•  Higher cluster utilization
•  Recommended MRv1 run only at 70% cap
•  Resources not used can be consumed by another
•  Lower operational costs
•  One cluster running MR, Spark, Impala, etc
•  Don’t need to transfer data between clusters
•  Not restricted to < 5k cluster
MRv2 Architecture
http://blog.cloudera.com/blog/2013/11/migrating-to-mapreduce-2-on-yarn-for-operators/
Agenda
•  MapReduce Example
•  MR2 Motivation
•  Support Ticket Categorization
•  What are Misconfigurations?
•  Memory Misconfigurations
•  Thread Misconfigurations
•  Federation Misconfigurations
•  YARN Memory Misconfigurations
10	
  
MapReduce is Central to Hadoop
Agenda
•  MapReduce Example
•  MR2 Motivation
•  Support Ticket Categorization
•  What are Misconfigurations?
•  Memory Misconfigurations
•  Thread Misconfigurations
•  Federation Misconfigurations
•  YARN Memory Misconfigurations
What are Misconfigurations?
•  Issues requiring change to Hadoop or to OS config files
•  Comprises 35% of Cloudera Support Tickets
•  e.g. resource-allocation: memory, file-handles, disk-space
Agenda
•  MapReduce Example
•  MR2 Motivation
•  Support Ticket Categorization
•  What are Misconfigurations?
•  Memory Misconfigurations
•  Thread Misconfigurations
•  Federation Misconfigurations
•  YARN Memory Misconfigurations
1. Task Out Of Memory Error (MRv1)
FATAL org.apache.hadoop.mapred.TaskTracker:
Error running child : java.lang.OutOfMemoryError:
Java heap space
at org.apache.hadoop.mapred.MapTask
$MapOutputBuffer.<init>
•  What does it mean?
o  Memory leak in task code
•  What causes this?
o  MR task heap sizes will not fit
1. Task Out Of Memory Error (MRv1)
o  MRv1 TaskTracker:
o  mapred.child.ulimit > 2*mapred.child.java.opts
o  0.25*mapred.child.java.opts < io.sort.mb < 0.5*mapred.child.java.opts
o  MRv1 DataNode:
o  Use short pathnames for dfs.data.dir names
o  e.g. /data/1, /data/2, /data/3
o  Increase DN heap
o  MRv2:
o  Manual tuning of io.sort.record.percent not needed
o  Tune mapreduce.map|reduce.memory.mb
o  mapred.child.ulimit = yarn.nodemanager.vmem-pmem-ratio
o  Moot if yarn.nodemanager.vmem-check-enabled is disabled
2. JobTracker Out of Memory Error
ERROR org.apache.hadoop.mapred.JobTracker: Job
initialization failed:
java.lang.OutOfMemoryError: Java heap space
at
org.apache.hadoop.mapred.TaskInProgress.<init>(TaskInProg
ress.java:122)
•  What does it mean?
o  Total JT memory usage > allocated RAM
•  What causes this?
o  Tasks too small
o  Too much job history
2. JobTracker Out of Memory Error
•  How can it be resolved?
o  sudo	
  -­‐u	
  mapreduce	
  jmap	
  -­‐histo:live	
  <pid>	
  	
  
o  histogram	
  of	
  what	
  objects	
  the	
  JVM	
  has	
  allocated	
  
o  Increase JT heap
o  Don’t co-locate JT and NN
o  mapred.job.tracker.handler.count = ln(#TT)*20
o  mapred.jobtracker.completeuserjobs.maximum = 5
o  mapred.job.tracker.retiredjobs.cache.size = 100
o  mapred.jobtracker.retirejob.interval = 3600000
o  YARN has Uber AMs (run in single JVM)
o  One AM per MR job
o  Not restricted to keeping 5 jobs in memory
Agenda
•  MapReduce Example
•  MR2 Motivation
•  Support Ticket Categorization
•  What are Misconfigurations?
•  Memory Misconfigurations
•  Thread Misconfigurations
•  Federation Misconfigurations
•  YARN Memory Misconfigurations
Fetch Failures
3. Too Many Fetch-Failures
MR1: INFO org.apache.hadoop.mapred.JobInProgress:
Too many fetch-failures for output of task
MR2: ERROR
org.apache.hadoop.mapred.ShuffleHandler: Shuffle
error:
java.io.IOException: Broken pipe
•  What does it mean?
o  Reducer fetch operations fail to retrieve mapper outputs
o  Too many could blacklist the TT
•  What causes this?
o  DNS issues
o  Not enough http threads on the mapper side
o  Not enough connections
3. Too Many Fetch-Failures
MR1:
o  mapred.reduce.slowstart.completed.maps = 0.80
o  Unblocks other reducers to run while a big job waits on mappers
o  tasktracker.http.threads = 80
o  Increases threads used to serve map output to reducers
o  mapred.reduce.parallel.copies = SQRT(Nodes), floor of 10
o  Allows reducers to fetch map output in parallel
MR2:
o  Set ShuffleHandler configs:
o  yarn.nodemanager.aux-services = mapreduce_shuffle
o  yarn.nodemanager.aux-services.mapreduce_shuffle.class =
org.apache.hadoop.mapred.ShuffleHandler
o  tasktracker.http.threads N/A
o  max # of threads is based on # of processors on machine
o  Uses Netty, allowing up to twice as many threads as there are processors
Agenda
•  MapReduce Example
•  MR2 Motivation
•  Support Ticket Categorization
•  What are Misconfigurations?
•  Memory Misconfigurations
•  Thread Misconfigurations
•  Federation Misconfigurations
•  YARN Memory Misconfigurations
4. Federation: Just (Don’t) Do It
= spreads FS metadata across NNs
= is stable (but ViewFS isn’t)
= is meant for 1k+ nodes
≠ multi-tenancy
≠ horizontally scale namespaces
è NN HA + YARN
è RPC QoS
Agenda
•  MapReduce Example
•  MR2 Motivation
•  Support Ticket Categorization
•  What are Misconfigurations?
•  Memory Misconfigurations
•  Thread Misconfigurations
•  Federation Misconfigurations
•  YARN Memory Misconfigurations
5. Optimizing YARN Virtual Memory Usage
Problem:	
  	
  
Current usage: 337.6 MB of 1 GB physical
memory used; 2.2 GB of 2.1 GB virtual memory
used. Killing container.
	
  
Solution:	
  
•  Set yarn.nodemanager.vmem-check-enabled = false
•  Determine AM container size:
yarn.app.mapreduce.am.resource.cpu-vcores
yarn.app.mapreduce.am.resource.mb
•  Sizing the AM: 1024mb (-Xmx768m)
•  Can be smaller because only storing one job unless using Uber
6. CPU Isolation in YARN Containers
•  mapreduce.map.cpu.vcores
mapreduce.reduce.cpu.vcores
(per-job config)
•  yarn.nodemanager.resource.cpu-vcores
(slave service side resource config)
•  yarn.scheduler.minimum-allocation-vcores
yarn.scheduler.maximum-allocation-vcores
(scheduler allocation control configs)
•  yarn.nodemanager.linux-container-
executor.resources-handler.class
(turn on cgroups in NM)
7. Understanding YARN Virtual Memory	
  
Situation:
yarn.nodemanager.resource.cpu-vcores
> actual cores
yarn.nodemanager.resource.memory-mb
> RAM
Effect:
- Exceeding cores = sharing existing cores, slower
-  Exceeding RAM = swapping, OOM
Bonus: Fair Scheduler Errors
ERROR
org.apache.hadoop.yarn.server.resourc
emanager.scheduler.fair.FairScheduler
: Request for appInfo of unknown
attemptappattempt_1395214170909_0059_
000001
	
  
Harmless	
  message	
  fixed	
  in	
  YARN-­‐1785	
  
YARN Resources
•  Migrating to MR2 on YARN:
•  For Operators:
http://blog.cloudera.com/blog/2013/11/migrating-to-
mapreduce-2-on-yarn-for-operators/
•  For Users:
http://blog.cloudera.com/blog/2013/11/migrating-to-
mapreduce-2-on-yarn-for-users/
•  http://blog.cloudera.com/blog/2014/04/apache-hadoop-
yarn-avoiding-6-time-consuming-gotchas/
•  Getting MR2 Up to Speed:
•  http://blog.cloudera.com/blog/2014/02/getting-
mapreduce-2-up-to-speed/
Takeaways
•  Want to DIY?
•  Take Cloudera’s Admin Training - now with 4x the labs
•  Get it right the first time with monitoring tools.
•  "Yep - we were able to download/install/configure/
setup a Cloudera Manager cluster from scratch in
minutes :)”
•  Want misconfig updates?
•  Follow @kate_ting

Weitere ähnliche Inhalte

Was ist angesagt?

Hadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your Application
Hadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your ApplicationHadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your Application
Hadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your Application
Yahoo Developer Network
 
Hw09 Monitoring Best Practices
Hw09   Monitoring Best PracticesHw09   Monitoring Best Practices
Hw09 Monitoring Best Practices
Cloudera, Inc.
 
Introduction to hadoop administration jk
Introduction to hadoop administration   jkIntroduction to hadoop administration   jk
Introduction to hadoop administration jk
Edureka!
 
Improving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux ConfigurationImproving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux Configuration
DataWorks Summit
 
Optimizing MapReduce Job performance
Optimizing MapReduce Job performanceOptimizing MapReduce Job performance
Optimizing MapReduce Job performance
DataWorks Summit
 

Was ist angesagt? (20)

Hadoop 2.0 handout 5.0
Hadoop 2.0 handout 5.0Hadoop 2.0 handout 5.0
Hadoop 2.0 handout 5.0
 
ha_module5
ha_module5ha_module5
ha_module5
 
Introduction to hadoop high availability
Introduction to hadoop high availability Introduction to hadoop high availability
Introduction to hadoop high availability
 
Hadoop Performance at LinkedIn
Hadoop Performance at LinkedInHadoop Performance at LinkedIn
Hadoop Performance at LinkedIn
 
Hanborq Optimizations on Hadoop MapReduce
Hanborq Optimizations on Hadoop MapReduceHanborq Optimizations on Hadoop MapReduce
Hanborq Optimizations on Hadoop MapReduce
 
Hadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your Application
Hadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your ApplicationHadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your Application
Hadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your Application
 
Hw09 Monitoring Best Practices
Hw09   Monitoring Best PracticesHw09   Monitoring Best Practices
Hw09 Monitoring Best Practices
 
Real-time Data Pipeline: Kafka Streams / Kafka Connect versus Spark Streaming
Real-time Data Pipeline: Kafka Streams / Kafka Connect versus Spark StreamingReal-time Data Pipeline: Kafka Streams / Kafka Connect versus Spark Streaming
Real-time Data Pipeline: Kafka Streams / Kafka Connect versus Spark Streaming
 
Introduction to hadoop administration jk
Introduction to hadoop administration   jkIntroduction to hadoop administration   jk
Introduction to hadoop administration jk
 
How to Actually Tune Your Spark Jobs So They Work
How to Actually Tune Your Spark Jobs So They WorkHow to Actually Tune Your Spark Jobs So They Work
How to Actually Tune Your Spark Jobs So They Work
 
Webinar: Top 5 Hadoop Admin Tasks
Webinar: Top 5 Hadoop Admin TasksWebinar: Top 5 Hadoop Admin Tasks
Webinar: Top 5 Hadoop Admin Tasks
 
Hadoop Summit 2012 | Optimizing MapReduce Job Performance
Hadoop Summit 2012 | Optimizing MapReduce Job PerformanceHadoop Summit 2012 | Optimizing MapReduce Job Performance
Hadoop Summit 2012 | Optimizing MapReduce Job Performance
 
Database Research on Modern Computing Architecture
Database Research on Modern Computing ArchitectureDatabase Research on Modern Computing Architecture
Database Research on Modern Computing Architecture
 
Improving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux ConfigurationImproving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux Configuration
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
 
Improving Hadoop Performance via Linux
Improving Hadoop Performance via LinuxImproving Hadoop Performance via Linux
Improving Hadoop Performance via Linux
 
Inside MapR's M7
Inside MapR's M7Inside MapR's M7
Inside MapR's M7
 
Optimizing MapReduce Job performance
Optimizing MapReduce Job performanceOptimizing MapReduce Job performance
Optimizing MapReduce Job performance
 
How to Increase Performance of Your Hadoop Cluster
How to Increase Performance of Your Hadoop ClusterHow to Increase Performance of Your Hadoop Cluster
How to Increase Performance of Your Hadoop Cluster
 
Spark performance tuning - Maksud Ibrahimov
Spark performance tuning - Maksud IbrahimovSpark performance tuning - Maksud Ibrahimov
Spark performance tuning - Maksud Ibrahimov
 

Andere mochten auch

Andere mochten auch (20)

Big Data Day LA 2015 - Solr Search with Spark for Big Data Analytics in Actio...
Big Data Day LA 2015 - Solr Search with Spark for Big Data Analytics in Actio...Big Data Day LA 2015 - Solr Search with Spark for Big Data Analytics in Actio...
Big Data Day LA 2015 - Solr Search with Spark for Big Data Analytics in Actio...
 
La big datacamp2014_vikram_dixit
La big datacamp2014_vikram_dixitLa big datacamp2014_vikram_dixit
La big datacamp2014_vikram_dixit
 
Summit v4 dave wolcott
Summit v4 dave wolcottSummit v4 dave wolcott
Summit v4 dave wolcott
 
Aziksa hadoop for buisness users2 santosh jha
Aziksa hadoop for buisness users2 santosh jhaAziksa hadoop for buisness users2 santosh jha
Aziksa hadoop for buisness users2 santosh jha
 
Ag big datacampla-06-14-2014-ajay_gopal
Ag big datacampla-06-14-2014-ajay_gopalAg big datacampla-06-14-2014-ajay_gopal
Ag big datacampla-06-14-2014-ajay_gopal
 
Big datacamp june14_alex_liu
Big datacamp june14_alex_liuBig datacamp june14_alex_liu
Big datacamp june14_alex_liu
 
Big Data Day LA 2015 - NoSQL: Doing it wrong before getting it right by Lawre...
Big Data Day LA 2015 - NoSQL: Doing it wrong before getting it right by Lawre...Big Data Day LA 2015 - NoSQL: Doing it wrong before getting it right by Lawre...
Big Data Day LA 2015 - NoSQL: Doing it wrong before getting it right by Lawre...
 
20140614 introduction to spark-ben white
20140614 introduction to spark-ben white20140614 introduction to spark-ben white
20140614 introduction to spark-ben white
 
140614 bigdatacamp-la-keynote-jon hsieh
140614 bigdatacamp-la-keynote-jon hsieh140614 bigdatacamp-la-keynote-jon hsieh
140614 bigdatacamp-la-keynote-jon hsieh
 
2014 bigdatacamp asya_kamsky
2014 bigdatacamp asya_kamsky2014 bigdatacamp asya_kamsky
2014 bigdatacamp asya_kamsky
 
Kiji cassandra la june 2014 - v02 clint-kelly
Kiji cassandra la   june 2014 - v02 clint-kellyKiji cassandra la   june 2014 - v02 clint-kelly
Kiji cassandra la june 2014 - v02 clint-kelly
 
Big Data Day LA 2015 - HBase at Factual: Real time and Batch Uses by Molly O'...
Big Data Day LA 2015 - HBase at Factual: Real time and Batch Uses by Molly O'...Big Data Day LA 2015 - HBase at Factual: Real time and Batch Uses by Molly O'...
Big Data Day LA 2015 - HBase at Factual: Real time and Batch Uses by Molly O'...
 
Hadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapRHadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapR
 
Big Data Day LA 2015 - Lessons Learned from Designing Data Ingest Systems by ...
Big Data Day LA 2015 - Lessons Learned from Designing Data Ingest Systems by ...Big Data Day LA 2015 - Lessons Learned from Designing Data Ingest Systems by ...
Big Data Day LA 2015 - Lessons Learned from Designing Data Ingest Systems by ...
 
Hadoop Innovation Summit 2014
Hadoop Innovation Summit 2014Hadoop Innovation Summit 2014
Hadoop Innovation Summit 2014
 
Big Data Day LA 2015 - Introducing N1QL: SQL for Documents by Jeff Morris of ...
Big Data Day LA 2015 - Introducing N1QL: SQL for Documents by Jeff Morris of ...Big Data Day LA 2015 - Introducing N1QL: SQL for Documents by Jeff Morris of ...
Big Data Day LA 2015 - Introducing N1QL: SQL for Documents by Jeff Morris of ...
 
Big Data Day LA 2015 - Deep Learning Human Vocalized Animal Sounds by Sabri S...
Big Data Day LA 2015 - Deep Learning Human Vocalized Animal Sounds by Sabri S...Big Data Day LA 2015 - Deep Learning Human Vocalized Animal Sounds by Sabri S...
Big Data Day LA 2015 - Deep Learning Human Vocalized Animal Sounds by Sabri S...
 
Big Data Day LA 2016/ Data Science Track - Decision Making and Lambda Archite...
Big Data Day LA 2016/ Data Science Track - Decision Making and Lambda Archite...Big Data Day LA 2016/ Data Science Track - Decision Making and Lambda Archite...
Big Data Day LA 2016/ Data Science Track - Decision Making and Lambda Archite...
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Introduction to Kafka - Je...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Introduction to Kafka - Je...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Introduction to Kafka - Je...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Introduction to Kafka - Je...
 
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
 

Ähnlich wie Yarn cloudera-kathleenting061414 kate-ting

Top 5 Mistakes When Writing Spark Applications by Mark Grover and Ted Malaska
Top 5 Mistakes When Writing Spark Applications by Mark Grover and Ted MalaskaTop 5 Mistakes When Writing Spark Applications by Mark Grover and Ted Malaska
Top 5 Mistakes When Writing Spark Applications by Mark Grover and Ted Malaska
Spark Summit
 
Apache Spark At Scale in the Cloud
Apache Spark At Scale in the CloudApache Spark At Scale in the Cloud
Apache Spark At Scale in the Cloud
Databricks
 
In-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteIn-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great Taste
DataWorks Summit
 
Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)
mundlapudi
 

Ähnlich wie Yarn cloudera-kathleenting061414 kate-ting (20)

Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - ClouderaHadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
 
Yarn Resource Management Using Machine Learning
Yarn Resource Management Using Machine LearningYarn Resource Management Using Machine Learning
Yarn Resource Management Using Machine Learning
 
Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsTop 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark Applications
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
 
Hadoop fault-tolerance
Hadoop fault-toleranceHadoop fault-tolerance
Hadoop fault-tolerance
 
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark ApplicationsTop 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
 
Top 5 Mistakes When Writing Spark Applications by Mark Grover and Ted Malaska
Top 5 Mistakes When Writing Spark Applications by Mark Grover and Ted MalaskaTop 5 Mistakes When Writing Spark Applications by Mark Grover and Ted Malaska
Top 5 Mistakes When Writing Spark Applications by Mark Grover and Ted Malaska
 
Spark Tips & Tricks
Spark Tips & TricksSpark Tips & Tricks
Spark Tips & Tricks
 
Apache Spark At Scale in the Cloud
Apache Spark At Scale in the CloudApache Spark At Scale in the Cloud
Apache Spark At Scale in the Cloud
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
 
[db tech showcase Tokyo 2014] C32: Hadoop最前線 - 開発の現場から by NTT 小沢健史
[db tech showcase Tokyo 2014] C32: Hadoop最前線 - 開発の現場から  by NTT 小沢健史[db tech showcase Tokyo 2014] C32: Hadoop最前線 - 開発の現場から  by NTT 小沢健史
[db tech showcase Tokyo 2014] C32: Hadoop最前線 - 開発の現場から by NTT 小沢健史
 
Spark on YARN
Spark on YARNSpark on YARN
Spark on YARN
 
In-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteIn-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great Taste
 
Hadoop performance optimization tips
Hadoop performance optimization tipsHadoop performance optimization tips
Hadoop performance optimization tips
 
Yarn
YarnYarn
Yarn
 
Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)
 
Review of Calculation Paradigm and its Components
Review of Calculation Paradigm and its ComponentsReview of Calculation Paradigm and its Components
Review of Calculation Paradigm and its Components
 
Emr spark tuning demystified
Emr spark tuning demystifiedEmr spark tuning demystified
Emr spark tuning demystified
 
isca22-feng-menda_for sparse transposition and dataflow.pptx
isca22-feng-menda_for sparse transposition and dataflow.pptxisca22-feng-menda_for sparse transposition and dataflow.pptx
isca22-feng-menda_for sparse transposition and dataflow.pptx
 

Mehr von Data Con LA

Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA
 
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA
 
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA
 

Mehr von Data Con LA (20)

Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 Keynotes
 
Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 Keynotes
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 Keynote
 
Data Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup Showcase
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 Keynote
 
Data Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendations
 
Data Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA 2022 - AI Ethics
Data Con LA 2022 - AI Ethics
 
Data Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learning
 
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
 
Data Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentation
 
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
 
Data Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWS
 
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
 
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
 
Data Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data Science
 
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
 
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
 
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
 
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
 
Data Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with Kafka
 

Kürzlich hochgeladen

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Kürzlich hochgeladen (20)

Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 

Yarn cloudera-kathleenting061414 kate-ting

  • 1. Yarns about YARN: Migrating to MapReduce v2   Kathleen  Ting  |  @kate_0ng     Technical  Account  Manager,  Cloudera  |  Sqoop  PMC  Member     Big  Data  Camp  LA   June  14,  2014  
  • 2. Who Am I? •  Started 3 yr ago as 1st Cloudera Support Eng •  Now manages Cloudera’s 2 largest customers •  Sqoop Committer, PMC Member •  Co-Author of the Apache Sqoop Cookbook •  MRv1 misconfig talk viewed 20k on slideshare
  • 3. Agenda •  MapReduce Example •  MR2 Motivation •  Support Ticket Categorization •  What are Misconfigurations? •  Memory Misconfigurations •  Thread Misconfigurations •  Federation Misconfigurations •  YARN Memory Misconfigurations
  • 4. Agenda •  MapReduce Example •  MR2 Motivation •  Support Ticket Categorization •  What are Misconfigurations? •  Memory Misconfigurations •  Thread Misconfigurations •  Federation Misconfigurations •  YARN Memory Misconfigurations
  • 5.
  • 6. Agenda •  MapReduce Example •  MR2 Motivation •  Support Ticket Categorization •  What are Misconfigurations? •  Memory Misconfigurations •  Thread Misconfigurations •  Federation Misconfigurations •  YARN Memory Misconfigurations
  • 7. MR2 Motivation •  Higher cluster utilization •  Recommended MRv1 run only at 70% cap •  Resources not used can be consumed by another •  Lower operational costs •  One cluster running MR, Spark, Impala, etc •  Don’t need to transfer data between clusters •  Not restricted to < 5k cluster
  • 9. Agenda •  MapReduce Example •  MR2 Motivation •  Support Ticket Categorization •  What are Misconfigurations? •  Memory Misconfigurations •  Thread Misconfigurations •  Federation Misconfigurations •  YARN Memory Misconfigurations
  • 10. 10  
  • 11. MapReduce is Central to Hadoop
  • 12. Agenda •  MapReduce Example •  MR2 Motivation •  Support Ticket Categorization •  What are Misconfigurations? •  Memory Misconfigurations •  Thread Misconfigurations •  Federation Misconfigurations •  YARN Memory Misconfigurations
  • 13. What are Misconfigurations? •  Issues requiring change to Hadoop or to OS config files •  Comprises 35% of Cloudera Support Tickets •  e.g. resource-allocation: memory, file-handles, disk-space
  • 14. Agenda •  MapReduce Example •  MR2 Motivation •  Support Ticket Categorization •  What are Misconfigurations? •  Memory Misconfigurations •  Thread Misconfigurations •  Federation Misconfigurations •  YARN Memory Misconfigurations
  • 15. 1. Task Out Of Memory Error (MRv1) FATAL org.apache.hadoop.mapred.TaskTracker: Error running child : java.lang.OutOfMemoryError: Java heap space at org.apache.hadoop.mapred.MapTask $MapOutputBuffer.<init> •  What does it mean? o  Memory leak in task code •  What causes this? o  MR task heap sizes will not fit
  • 16. 1. Task Out Of Memory Error (MRv1) o  MRv1 TaskTracker: o  mapred.child.ulimit > 2*mapred.child.java.opts o  0.25*mapred.child.java.opts < io.sort.mb < 0.5*mapred.child.java.opts o  MRv1 DataNode: o  Use short pathnames for dfs.data.dir names o  e.g. /data/1, /data/2, /data/3 o  Increase DN heap o  MRv2: o  Manual tuning of io.sort.record.percent not needed o  Tune mapreduce.map|reduce.memory.mb o  mapred.child.ulimit = yarn.nodemanager.vmem-pmem-ratio o  Moot if yarn.nodemanager.vmem-check-enabled is disabled
  • 17.
  • 18. 2. JobTracker Out of Memory Error ERROR org.apache.hadoop.mapred.JobTracker: Job initialization failed: java.lang.OutOfMemoryError: Java heap space at org.apache.hadoop.mapred.TaskInProgress.<init>(TaskInProg ress.java:122) •  What does it mean? o  Total JT memory usage > allocated RAM •  What causes this? o  Tasks too small o  Too much job history
  • 19. 2. JobTracker Out of Memory Error •  How can it be resolved? o  sudo  -­‐u  mapreduce  jmap  -­‐histo:live  <pid>     o  histogram  of  what  objects  the  JVM  has  allocated   o  Increase JT heap o  Don’t co-locate JT and NN o  mapred.job.tracker.handler.count = ln(#TT)*20 o  mapred.jobtracker.completeuserjobs.maximum = 5 o  mapred.job.tracker.retiredjobs.cache.size = 100 o  mapred.jobtracker.retirejob.interval = 3600000 o  YARN has Uber AMs (run in single JVM) o  One AM per MR job o  Not restricted to keeping 5 jobs in memory
  • 20. Agenda •  MapReduce Example •  MR2 Motivation •  Support Ticket Categorization •  What are Misconfigurations? •  Memory Misconfigurations •  Thread Misconfigurations •  Federation Misconfigurations •  YARN Memory Misconfigurations
  • 22.
  • 23. 3. Too Many Fetch-Failures MR1: INFO org.apache.hadoop.mapred.JobInProgress: Too many fetch-failures for output of task MR2: ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle error: java.io.IOException: Broken pipe •  What does it mean? o  Reducer fetch operations fail to retrieve mapper outputs o  Too many could blacklist the TT •  What causes this? o  DNS issues o  Not enough http threads on the mapper side o  Not enough connections
  • 24. 3. Too Many Fetch-Failures MR1: o  mapred.reduce.slowstart.completed.maps = 0.80 o  Unblocks other reducers to run while a big job waits on mappers o  tasktracker.http.threads = 80 o  Increases threads used to serve map output to reducers o  mapred.reduce.parallel.copies = SQRT(Nodes), floor of 10 o  Allows reducers to fetch map output in parallel MR2: o  Set ShuffleHandler configs: o  yarn.nodemanager.aux-services = mapreduce_shuffle o  yarn.nodemanager.aux-services.mapreduce_shuffle.class = org.apache.hadoop.mapred.ShuffleHandler o  tasktracker.http.threads N/A o  max # of threads is based on # of processors on machine o  Uses Netty, allowing up to twice as many threads as there are processors
  • 25. Agenda •  MapReduce Example •  MR2 Motivation •  Support Ticket Categorization •  What are Misconfigurations? •  Memory Misconfigurations •  Thread Misconfigurations •  Federation Misconfigurations •  YARN Memory Misconfigurations
  • 26. 4. Federation: Just (Don’t) Do It = spreads FS metadata across NNs = is stable (but ViewFS isn’t) = is meant for 1k+ nodes ≠ multi-tenancy ≠ horizontally scale namespaces è NN HA + YARN è RPC QoS
  • 27. Agenda •  MapReduce Example •  MR2 Motivation •  Support Ticket Categorization •  What are Misconfigurations? •  Memory Misconfigurations •  Thread Misconfigurations •  Federation Misconfigurations •  YARN Memory Misconfigurations
  • 28. 5. Optimizing YARN Virtual Memory Usage Problem:     Current usage: 337.6 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container.   Solution:   •  Set yarn.nodemanager.vmem-check-enabled = false •  Determine AM container size: yarn.app.mapreduce.am.resource.cpu-vcores yarn.app.mapreduce.am.resource.mb •  Sizing the AM: 1024mb (-Xmx768m) •  Can be smaller because only storing one job unless using Uber
  • 29. 6. CPU Isolation in YARN Containers •  mapreduce.map.cpu.vcores mapreduce.reduce.cpu.vcores (per-job config) •  yarn.nodemanager.resource.cpu-vcores (slave service side resource config) •  yarn.scheduler.minimum-allocation-vcores yarn.scheduler.maximum-allocation-vcores (scheduler allocation control configs) •  yarn.nodemanager.linux-container- executor.resources-handler.class (turn on cgroups in NM)
  • 30. 7. Understanding YARN Virtual Memory   Situation: yarn.nodemanager.resource.cpu-vcores > actual cores yarn.nodemanager.resource.memory-mb > RAM Effect: - Exceeding cores = sharing existing cores, slower -  Exceeding RAM = swapping, OOM
  • 31. Bonus: Fair Scheduler Errors ERROR org.apache.hadoop.yarn.server.resourc emanager.scheduler.fair.FairScheduler : Request for appInfo of unknown attemptappattempt_1395214170909_0059_ 000001   Harmless  message  fixed  in  YARN-­‐1785  
  • 32. YARN Resources •  Migrating to MR2 on YARN: •  For Operators: http://blog.cloudera.com/blog/2013/11/migrating-to- mapreduce-2-on-yarn-for-operators/ •  For Users: http://blog.cloudera.com/blog/2013/11/migrating-to- mapreduce-2-on-yarn-for-users/ •  http://blog.cloudera.com/blog/2014/04/apache-hadoop- yarn-avoiding-6-time-consuming-gotchas/ •  Getting MR2 Up to Speed: •  http://blog.cloudera.com/blog/2014/02/getting- mapreduce-2-up-to-speed/
  • 33. Takeaways •  Want to DIY? •  Take Cloudera’s Admin Training - now with 4x the labs •  Get it right the first time with monitoring tools. •  "Yep - we were able to download/install/configure/ setup a Cloudera Manager cluster from scratch in minutes :)” •  Want misconfig updates? •  Follow @kate_ting