Diese Präsentation wurde erfolgreich gemeldet.
Wir verwenden Ihre LinkedIn Profilangaben und Informationen zu Ihren Aktivitäten, um Anzeigen zu personalisieren und Ihnen relevantere Inhalte anzuzeigen. Sie können Ihre Anzeigeneinstellungen jederzeit ändern.
Introduction to YARN and
MapReduce 2

© Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without pr...
Course Objectives
 Intended Audience
–Developers, data analysts and system administrators familiar with
MapReduce 1

 Af...
Course Topics
Introduction to YARN and
MapReduce 2
 Overview of MapReduce 1 and 2
 YARN Architecture
 MapReduce v2
 Ma...
MRv1 and MRv2
 MapReduce 1 (“Classic”) has three main components
–API – for user-level programming of MR applications
–Fr...
MapReduce2 History
 Originally architected at Yahoo in 2008
 “Alpha” in Hadoop 2 pre-GA
–Included in CDH 4
 YARN promot...
Chapter Topics
Introduction to YARN and
MapReduce 2
 Introduction: MapReduce 1 and 2
 YARN Architecture
 MapReduce 2
 ...
Why is YARN needed? (1)
 MRv1 resource management issues
–Inflexible “slots” configured on nodes – Map or Reduce, not bot...
Why is YARN needed? (2)
 YARN solutions
–No slots
–Nodes have “resources” – memory and CPU cores – which are
allocated to...
YARN Daemons
 Resource Manager (RM)
–Runs on master node
–Global resource scheduler
–Arbitrates system resources between ...
Running an Application in YARN
 Containers
–Created by the RM upon request
–Allocate a certain amount of resources
(memor...
YARN Cluster
NodeManager

NodeManager

NodeManager

Resource
Manager
NodeManager

© Copyright 2010-2013 Cloudera. All righ...
YARN Cluster: Running an Application
NodeManager

Client

NodeManager

Application:
MyApp

Launch

NodeManager
Application...
YARN Cluster: Running an Application
NodeManager

Client

NodeManager

Application:
MyApp

NodeManager

Resource
Manager

...
YARN Cluster: Running an Application
Client

Application:
MyApp

NodeManager

NodeManager
MyApp
Launch

NodeManager
Applic...
YARN Cluster: Running an Application
Client

Application:
MyApp

Client

Application:
YourApp

NodeManager

NodeManager
Ap...
YARN Cluster: Running an Application
Client

Client

NodeManager
YourApp

Application:
MyApp

Application:
YourApp

NodeMa...
YARN Schedulers (1)
 Pluggable in Resource Manager
 YARN includes two schedulers
–CapacityScheduler
–FairScheduler
 How...
YARN Schedulers (2)
 Hierarchical queues
–Queues can contain subqueues
–Sub-queues share the
resources assigned to
queues...
Resource Manager Things to Know
 What it does
Resource
–Manages nodes
Manager
–Tracks heartbeats from NodeManagers
–Manag...
Node Manager Things to Know
NodeManager
 What it does
–Communicates with the RM
–Registers and provides info on node reso...
Resource Requests
Resource Request
• Resource name (hostname, rackname or *)
• Priority (within this application, not betw...
Launch Container

Container Launch Context
• Container ID
• Commands (to start application)
• Environment (configuration)
...
Non-MR2 YARN Applications
 Distributed Shell
 Impala
 Apache Giraph
 Spark
 Others
–http://wiki.apache.org/hadoop/Pow...
Chapter Topics
Introductions to YARN and
MapReduce 2
 Overview of MapReduce 1 and 2
 YARN Architecture
 MapReduce 2
 M...
YARN and MapReduce
 YARN does not know or care what kind of application it is running
–Could be MR or something else (e.g...
Running a MapReduce Application in MRv2
NodeManager

DataNode

NodeManager

DataNode

MR Job
History
Server

NodeManager

...
Running a MapReduce Application in MRv2
NodeManager

HDFS

DataNode
Block1

NodeManager

DataNode
Block2

NodeManager

Dat...
Running a MapReduce Application in MRv2
$ hadoop jar wc.jar
WordCount mydata output

NodeManager

DataNode
Block1

Client
...
Running a MapReduce Application in MRv2
$ hadoop jar wc.jar
WordCount mydata output

NodeManager

DataNode
Block1

Client
...
Running a MapReduce Application in MRv2
$ hadoop jar wc.jar
WordCount mydata output

NodeManager

DataNode
Block1

Client
...
Running a MapReduce Application in MRv2
$ hadoop jar wc.jar
WordCount mydata output

NodeManager
WordCount
Map Task

DataN...
Running a MapReduce Application in MRv2
$ hadoop jar wc.jar
WordCount mydata output

NodeManager
WordCount
Map Task

DataN...
Running a MapReduce Application in MRv2
$ hadoop jar wc.jar
WordCount mydata output

NodeManager

DataNode
Block1

Client
...
Running a MapReduce Application in MRv2
$ hadoop jar wc.jar
WordCount mydata output

NodeManager

DataNode
Block1

Client
...
Running a MapReduce Application in MRv2
$ hadoop jar wc.jar
WordCount mydata output

NodeManager

DataNode
Block1

Client
...
The MapReduce Framework on YARN
 In YARN, Shuffle is run as an auxiliary service
–Runs in the NodeManager JVM as a persis...
Fault Tolerance
 Any of the following can fail
–Task (Container) – Handled just like in MRv1
–MRAppMaster will re-attempt...
Fault Tolerance
 Any of the following can fail
–Task (Container) – Handled just like in MRv1
–MRAppMaster will re-attempt...
Fault Tolerance
 Any of the following can fail
–NodeManager
–If NM stops sending heartbeats to RM, it is removed from lis...
Chapter Topics
Introduction to YARN and
MapReduce 2
 Overview of MapReduce 1 and 2
 YARN Architecture
 MapReduce 2
 Ma...
Resource Manager UI: Nodes
http://rmhost:8088/cluster/nodes

Cluster Overview

link to Node
Manager UI

List of each node
...
Resource Manager UI: Applications
http://rmhost:8088/cluster/apps

Cluster Overview

Link to
Application
Details…

List of...
Resource Manager UI: Application Detail
http://rmhost:8088/cluster/app/appid

Link to
Application
Master…

© Copyright 201...
MRAppMaster UI: Jobs
http://rmhost:8088/proxy/appid

Link to Job
Details…

© Copyright 2010-2013 Cloudera. All rights rese...
MRAppMaster UI: Tasks
http://rmhost:8088/proxy/appid/mapreduce/job/jobid

© Copyright 2010-2013 Cloudera. All rights reser...
MR Job History Server
 YARN does not keep track of job history
 MapReduce Job History Server
–Archives job’s metrics and...
Cloudera Manager
 Full support for MR2 added in CM 5

© Copyright 2010-2013 Cloudera. All rights reserved. Not to be repr...
Hue
 Hue supports browsing MRv2 jobs
–Connects to Job History Server for “retired” jobs

© Copyright 2010-2013 Cloudera. ...
Chapter Topics
Introduction to YARN and
MapReduce 2
 Overview of MapReduce 1 and 2
 YARN Architecture
 MapReduce 2
 Ma...
CDH 4 and CDH 5
CDH 4

 CDH 4 includes MRv2 “preview”

 CDH 5 MRv2 is “production ready”
–Cloudera officially recommends...
Compatibility
 CDH
–Complete binary compatibility – programs compiled for MRv1 will run
without recompilation
–Source com...
MR2 and the Hadoop Ecosystem
Cloudera Enterprise 4
Cloudera Manager
MRv1
 Cloudera 5 includes MR2 support for:
(productio...
Chapter Topics
Introduction to YARN and
MapReduce 2
 Overview of MapReduce 1 and 2
 YARN Architecture
 MapReduce 2
 Ma...
Key Points
 MRv2 is based on YARN
–YARN replaces JobTracker/TaskTracker architecture of MRv1
–ResourceManager handles res...
Where To Learn More

 Hadoop: The Definitive Guide, 3rd Edition
–Chapter 6 – focuses on how MR is implemented on YARN

 ...
© Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent.

56
• Submit questions in the Q&A panel
• Download Cloudera Enterprise 5 beta:
http://go.cloudera.com/c5-beta
• Follow Clouder...
Nächste SlideShare
Wird geladen in …5
×

Introduction to YARN and MapReduce 2

68.546 Aufrufe

Veröffentlicht am

As part of the recent release of Hadoop 2 by the Apache Software Foundation, YARN and MapReduce 2 deliver significant upgrades to scheduling, resource management, and execution in Hadoop.

At their core, YARN and MapReduce 2’s improvements separate cluster resource management capabilities from MapReduce-specific logic. YARN enables Hadoop to share resources dynamically between multiple parallel processing frameworks such as Cloudera Impala, allows more sensible and finer-grained resource configuration for better cluster utilization, and scales Hadoop to accommodate more and larger jobs.

Veröffentlicht in: Technologie, Business
  • Unlock Her Legs(Official) $69 | Get 90% Off + 8 Special Bonus? ◆◆◆ http://scamcb.com/unlockher/pdf
       Antworten 
    Sind Sie sicher, dass Sie …  Ja  Nein
    Ihre Nachricht erscheint hier
  • How to start a wildly profitable 7 figure marketing business and get your first commission check tonight, click here ♣♣♣ https://bit.ly/2kS5a5J
       Antworten 
    Sind Sie sicher, dass Sie …  Ja  Nein
    Ihre Nachricht erscheint hier
  • This is very good article. It will be very helpful if this is available to download.
       Antworten 
    Sind Sie sicher, dass Sie …  Ja  Nein
    Ihre Nachricht erscheint hier
  • More than 5000 registered IT consultants and Corporates.Search for IT online training Providers at http://www.todaycourses.com
       Antworten 
    Sind Sie sicher, dass Sie …  Ja  Nein
    Ihre Nachricht erscheint hier
  • how can i download it?
       Antworten 
    Sind Sie sicher, dass Sie …  Ja  Nein
    Ihre Nachricht erscheint hier

Introduction to YARN and MapReduce 2

  1. 1. Introduction to YARN and MapReduce 2 © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 1 201310b
  2. 2. Course Objectives  Intended Audience –Developers, data analysts and system administrators familiar with MapReduce 1  After attending this course, you will understand –The motive behind YARN and MapReduce 2 –Key differences between MapReduce 1 and 2 –How YARN manages resources in a cluster –The lifecycle of an MRv2 job on a YARN cluster –Tools for managing a YARN cluster –How MapReduce 2 fits into Cloudera Enterprise and the Enterprise Data Hub © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 2
  3. 3. Course Topics Introduction to YARN and MapReduce 2  Overview of MapReduce 1 and 2  YARN Architecture  MapReduce v2  Managing a YARN Cluster  Cloudera and MRv2  Conclusion © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 3
  4. 4. MRv1 and MRv2  MapReduce 1 (“Classic”) has three main components –API – for user-level programming of MR applications –Framework – runtime services for running Map and Reduce processes, shuffling and sorting, etc. –Resource management – infrastructure to monitor nodes, allocate resources, and schedule jobs  MapReduce 2 (“NextGen”) moves Resource Management into YARN MapReduce 2 MapReduce 1 API MR API Framework Framework Resource Management YARN YARN API Resource Management © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 4
  5. 5. MapReduce2 History  Originally architected at Yahoo in 2008  “Alpha” in Hadoop 2 pre-GA –Included in CDH 4  YARN promoted to Apache Hadoop sub-project –summer 2013  “Production ready” in Hadoop 2 GA –Included in CDH5 (Beta in Oct 2013) Hadoop 0.20 HDFS MRv1 Hadoop Common Hadoop 2.0 (pre-GA) HDFS MRv2/YARN Hadoop Common Hadoop 2.2 (GA) HDFS MRv2 YARN Hadoop Common © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 5
  6. 6. Chapter Topics Introduction to YARN and MapReduce 2  Introduction: MapReduce 1 and 2  YARN Architecture  MapReduce 2  Managing a YARN Cluster  Cloudera and MRv2  Conclusion © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 6
  7. 7. Why is YARN needed? (1)  MRv1 resource management issues –Inflexible “slots” configured on nodes – Map or Reduce, not both –Underutilization of cluster when more map or reduce tasks are running –Can’t share resources with non-MR applications running on Hadoop cluster (e.g. Impala, Giraph) –Scalability – one JobTracker per cluster – limit of about 4000 nodes per cluster © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 7
  8. 8. Why is YARN needed? (2)  YARN solutions –No slots –Nodes have “resources” – memory and CPU cores – which are allocated to applications when requested –Supports MR and non-MR applications running on the same cluster –Most Job Tracker functions moved to Application Master – one cluster can have many Application Masters Resource Manager Job Tracker Application Application Master Application Master Master © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 8
  9. 9. YARN Daemons  Resource Manager (RM) –Runs on master node –Global resource scheduler –Arbitrates system resources between competing applications  Node Manager (NM) –Runs on slave nodes –Communicates with RM Resource Manager NodeManager NodeManager © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 9
  10. 10. Running an Application in YARN  Containers –Created by the RM upon request –Allocate a certain amount of resources (memory, CPU) on a slave node –Applications run in one or more containers  Application Master (AM) –One per application –Framework/application specific –Runs in a container –Requests more containers to run application tasks NodeManager 1 Gb 1 core 3 Gb 1 core NodeManager Application Master © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 10
  11. 11. YARN Cluster NodeManager NodeManager NodeManager Resource Manager NodeManager © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 11
  12. 12. YARN Cluster: Running an Application NodeManager Client NodeManager Application: MyApp Launch NodeManager Application Master Resource Manager NodeManager © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 12
  13. 13. YARN Cluster: Running an Application NodeManager Client NodeManager Application: MyApp NodeManager Resource Manager Resource Request Application Master Container IDs NodeManager © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 13
  14. 14. YARN Cluster: Running an Application Client Application: MyApp NodeManager NodeManager MyApp Launch NodeManager Application Master Resource Manager Launch NodeManager MyApp © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 14
  15. 15. YARN Cluster: Running an Application Client Application: MyApp Client Application: YourApp NodeManager NodeManager Application Master MyApp NodeManager Application Master Resource Manager NodeManager MyApp © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 15
  16. 16. YARN Cluster: Running an Application Client Client NodeManager YourApp Application: MyApp Application: YourApp NodeManager Application Master MyApp NodeManager Resource Manager YourApp Application Master NodeManager MyApp © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 16
  17. 17. YARN Schedulers (1)  Pluggable in Resource Manager  YARN includes two schedulers –CapacityScheduler –FairScheduler  How are these different than MRv1 schedulers? –Support any YARN application, not just MR –No more “slots” – tasks are allocated based on resources (memory and CPU for now) –FairScheduler: pools are now called queues © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 17
  18. 18. YARN Schedulers (2)  Hierarchical queues –Queues can contain subqueues –Sub-queues share the resources assigned to queues Website Logs ETL Product A Product B Fast Lane Regular Data Anal Engineering weight=2 Marketing weight=1 © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 18
  19. 19. Resource Manager Things to Know  What it does Resource –Manages nodes Manager –Tracks heartbeats from NodeManagers –Manages containers –Handles AM requests for resources –De-allocates containers when they expire or the application completes –Manages ApplicationMasters –Creates a container for AMs and tracks heartbeats –Manages security –Supports Kerberos © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 19
  20. 20. Node Manager Things to Know NodeManager  What it does –Communicates with the RM –Registers and provides info on node resources –Sends heartbeats and container status –Manages processes in containers –Launches AMs on request from the RM –Launches application processes on request from AM –Monitors resource usage by containers; kills run-away processes –Provides logging services to applications –Aggregates logs for an application and saves them to HDFS –Runs auxiliary services –Maintains node level security via ACLs © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 20
  21. 21. Resource Requests Resource Request • Resource name (hostname, rackname or *) • Priority (within this application, not between applications) • Resource requirements • memory (MB) • CPU (# of cores) • more to come, e.g. disk and network I/O, GPUs, etc. • Number of containers NodeManager Application Master Resource Manager Container(s) • ID • Node © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 21
  22. 22. Launch Container Container Launch Context • Container ID • Commands (to start application) • Environment (configuration) • Local Resources (e.g. application binary, HDFS files) NodeManager MyApp NodeManager Application Master © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 22
  23. 23. Non-MR2 YARN Applications  Distributed Shell  Impala  Apache Giraph  Spark  Others –http://wiki.apache.org/hadoop/PoweredByYarn © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 23
  24. 24. Chapter Topics Introductions to YARN and MapReduce 2  Overview of MapReduce 1 and 2  YARN Architecture  MapReduce 2  Managing a YARN Cluster  Cloudera and MRv2  Conclusion © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 24
  25. 25. YARN and MapReduce  YARN does not know or care what kind of application it is running –Could be MR or something else (e.g. Impala)  MR2 uses YARN –Hadoop includes a MapReduce ApplicationMaster (MRAppMaster) to manage MR jobs –Each MapReduce job is an a new instance of an application MapReduce 2 MR API NodeManager MRAppMaster Framework YARN YARN API Resource Management © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 25
  26. 26. Running a MapReduce Application in MRv2 NodeManager DataNode NodeManager DataNode MR Job History Server NodeManager DataNode Name Node(s) NodeManager DataNode Resource Manager © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 26
  27. 27. Running a MapReduce Application in MRv2 NodeManager HDFS DataNode Block1 NodeManager DataNode Block2 NodeManager DataNode NodeManager mydata MR Job History Server DataNode Resource Manager Name Node(s) © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 27
  28. 28. Running a MapReduce Application in MRv2 $ hadoop jar wc.jar WordCount mydata output NodeManager DataNode Block1 Client NodeManager DataNode Block2 Launch NodeManager Resource Manager DataNode MRAppMaster NodeManager MR Job History Server Name Node(s) DataNode © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 28
  29. 29. Running a MapReduce Application in MRv2 $ hadoop jar wc.jar WordCount mydata output NodeManager DataNode Block1 Client NodeManager DataNode Block2 Resource Request: - 1 x Node1/1GB/1 core NodeManager - 1 x Node2/1GB/1 core Resource Manager DataNode MRAppMaster NodeManager MR Job History Server Name Node(s) DataNode © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 29
  30. 30. Running a MapReduce Application in MRv2 $ hadoop jar wc.jar WordCount mydata output NodeManager DataNode Block1 Client NodeManager DataNode Block2 NodeManager Resource Manager DataNode MRAppMaster MR Job History Server Name Node(s) “Here are your containers” NodeManager DataNode © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 30
  31. 31. Running a MapReduce Application in MRv2 $ hadoop jar wc.jar WordCount mydata output NodeManager WordCount Map Task DataNode Block1 Client NodeManager WordCount Map Task NodeManager Resource Manager DataNode Block2 DataNode MRAppMaster NodeManager MR Job History Server Name Node(s) DataNode © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 31
  32. 32. Running a MapReduce Application in MRv2 $ hadoop jar wc.jar WordCount mydata output NodeManager WordCount Map Task DataNode Block1 Client NodeManager WordCount Map Task Resource Request: NodeManager - 2 x */1GB/1 core Resource Manager DataNode Block2 DataNode MRAppMaster NodeManager MR Job History Server Name Node(s) DataNode © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 32
  33. 33. Running a MapReduce Application in MRv2 $ hadoop jar wc.jar WordCount mydata output NodeManager DataNode Block1 Client NodeManager DataNode Block2 NodeManager Resource Manager DataNode MRAppMaster MR Job History Server Name Node(s) “Here are your containers” NodeManager DataNode © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 33
  34. 34. Running a MapReduce Application in MRv2 $ hadoop jar wc.jar WordCount mydata output NodeManager DataNode Block1 Client NodeManager DataNode Block2 WordCount Reduce Task NodeManager Resource Manager DataNode MRAppMaster NodeManager MR Job History Server Name Node(s) DataNode WordCount Reduce Task © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 34
  35. 35. Running a MapReduce Application in MRv2 $ hadoop jar wc.jar WordCount mydata output NodeManager DataNode Block1 Client NodeManager DataNode Block2 WordCount Reduce Task NodeManager Resource Manager DataNode MRAppMaster “I’m done!” NodeManager MR Job History Server Name Node(s) DataNode WordCount Reduce Task © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 35
  36. 36. The MapReduce Framework on YARN  In YARN, Shuffle is run as an auxiliary service –Runs in the NodeManager JVM as a persistent service NodeManager HDFS input Map Task Shuffle Service Intermediate Data (local) NodeManager Reduce Task HDFS output NodeManager Reduce Task HDFS output © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 36
  37. 37. Fault Tolerance  Any of the following can fail –Task (Container) – Handled just like in MRv1 –MRAppMaster will re-attempt tasks that complete with exceptions or stop responding (4 times by default) –Applications with too many failed tasks are considered failed © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 37
  38. 38. Fault Tolerance  Any of the following can fail –Task (Container) – Handled just like in MRv1 –MRAppMaster will re-attempt tasks that complete with exceptions or stop responding (4 times by default) –Applications with too many failed tasks are considered failed –Application Master –If application fails or if AM stops sending heartbeats, RM will reattempt the whole application (2 times by default ) –MRAppMaster optional setting: Job recovery • if false, all tasks will re-run • if true, MRAppMaster retrieves state of tasks when it restarts; only incomplete tasks will be re-run © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 38
  39. 39. Fault Tolerance  Any of the following can fail –NodeManager –If NM stops sending heartbeats to RM, it is removed from list of active nodes –Tasks on the node will be treated as failed by MRAppMaster –If the App Master node fails, it will be treated as a failed application –ResourceManager –No applications or tasks can be launched if RM is unavailable –Can be configured with High Availability Resource Manager (active) Resource Manager (standby) © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 39
  40. 40. Chapter Topics Introduction to YARN and MapReduce 2  Overview of MapReduce 1 and 2  YARN Architecture  MapReduce 2  Managing a YARN Cluster  Migrating from MRv1 to MRv2  Conclusion © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 40
  41. 41. Resource Manager UI: Nodes http://rmhost:8088/cluster/nodes Cluster Overview link to Node Manager UI List of each node in cluster © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 41
  42. 42. Resource Manager UI: Applications http://rmhost:8088/cluster/apps Cluster Overview Link to Application Details… List of running and recent applications © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 42
  43. 43. Resource Manager UI: Application Detail http://rmhost:8088/cluster/app/appid Link to Application Master… © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 43
  44. 44. MRAppMaster UI: Jobs http://rmhost:8088/proxy/appid Link to Job Details… © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 44
  45. 45. MRAppMaster UI: Tasks http://rmhost:8088/proxy/appid/mapreduce/job/jobid © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 45
  46. 46. MR Job History Server  YARN does not keep track of job history  MapReduce Job History Server –Archives job’s metrics and metadata –Can be accessed through Job History UI or Hue MR Job History Server http://rmhost:19888/jobhistory © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 46
  47. 47. Cloudera Manager  Full support for MR2 added in CM 5 © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 47
  48. 48. Hue  Hue supports browsing MRv2 jobs –Connects to Job History Server for “retired” jobs © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 48
  49. 49. Chapter Topics Introduction to YARN and MapReduce 2  Overview of MapReduce 1 and 2  YARN Architecture  MapReduce 2  Managing a YARN cluster  Cloudera and MRv2  Conclusion © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 49
  50. 50. CDH 4 and CDH 5 CDH 4  CDH 4 includes MRv2 “preview”  CDH 5 MRv2 is “production ready” –Cloudera officially recommends using MRv2 in CDH5  CDH 5 supports both MRv1 and MRv2 –Running both on the same cluster is not supported MRv1 (production) MRv2 / YARN (preview) CDH 5 MRv2 MRv1 YARN © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 50
  51. 51. Compatibility  CDH –Complete binary compatibility – programs compiled for MRv1 will run without recompilation –Source compatibility for almost all programs –Small number of exceptions noted here: blog.cloudera.com: Migrating to MapReduce 2 on YARN Migration Path CDH4/MR2  CDH5/MR2 CDH4/MR1  CDH5/MR2 CDH5/MR1  CDH5/MR2 Binary Support? ✓ ✓ ✓ © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 51
  52. 52. MR2 and the Hadoop Ecosystem Cloudera Enterprise 4 Cloudera Manager MRv1  Cloudera 5 includes MR2 support for: (production) –Cloudera Manager and Hue –All ecosystem projects that use MR –Hive, Pig, Mahout, Crunch, etc. Cloudera Enterprise 5 –Impala Cloudera Manager MRv2 / YARN (preview) Cloudera Manager MR applications MRv1 MRv2 Impala YARN © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 52
  53. 53. Chapter Topics Introduction to YARN and MapReduce 2  Overview of MapReduce 1 and 2  YARN Architecture  MapReduce 2  Managing a YARN cluster  Cloudera and MRv2  Conclusion © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 53
  54. 54. Key Points  MRv2 is based on YARN –YARN replaces JobTracker/TaskTracker architecture of MRv1 –ResourceManager handles resource allocation and scheduling for the cluster –Multiple ApplicationMasters handle application-specific job processing  Why? –More scalable –More efficient –More flexible  User-visible changes? –Almost none! Same API, same CLI, very similar web UIs  MapReduce v2 is production ready in Hadoop 2.2 / CDH 5 –As of CDH 5 Cloudera officially recommends using MRv2 © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 54
  55. 55. Where To Learn More  Hadoop: The Definitive Guide, 3rd Edition –Chapter 6 – focuses on how MR is implemented on YARN  Cloudera Blog posts – blog.cloudera.com/blog/category/yarn –Migrating to MapReduce 2 on YARN –Writing Hadoop Programs That Work Across Releases –and more… © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 55
  56. 56. © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 56
  57. 57. • Submit questions in the Q&A panel • Download Cloudera Enterprise 5 beta: http://go.cloudera.com/c5-beta • Follow Cloudera University @ClouderaU • Learn about the Enterprise Data Hub: http://tinyurl.com/edh-webinar • Watch on-demand video of this webinar and many more at http://cloudera.com Register now for Cloudera training at http://university.cloudera.com Use discount code YARN_10 to save 10% on new enrollments in training classes delivered by Cloudera until February ‘14* Use discount code 15off2 to save 15% on enrollments in two or more classes delivered by Cloudera until February ‘14* • Thank you for attending! * Excludes classes sold or delivered by Cloudera partners © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 57

×