
Apache Hadoop YARN

19,475 views

Introduction to Apache Hadoop YARN at Warsaw Hadoop User Group (WHUG)

Published in: Technology
  • Hi Vignesh

    For each TaskTracker, you configure the number of map and reduce slots (map slots run map tasks, and reduce slots run reduce tasks). Map slots cannot be used for running reduce tasks (and vice versa).

    The general rule of thumb is to assign 60-70% of slots to map slots, and the remaining ones to reduce slots (but obviously, it depends on your workload). At Spotify, we currently use something around 68%-32%.

    Hope this helps!
  • And do you suggest the capacity for map tasks and reduce tasks should be similar?
  • Hi Adam, thanks a lot for this.
    I have one doubt about slide no. 9. I know that Hadoop calculates the numbers of map and reduce tasks via different calculations, so how can all these map slots be replaced with reducers?
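The MRv1 slot split discussed in the comments above can be sketched as a mapred-site.xml fragment. The 8/4 split below is an illustrative assumption for a 12-slot TaskTracker (roughly the 60-70% rule of thumb), not Spotify's actual values:

```xml
<!-- mapred-site.xml on each TaskTracker (MRv1 only; YARN removes
     these slots). Example split for a hypothetical 12-slot node:
     8 map slots (~67%) and 4 reduce slots. TaskTrackers must be
     restarted for a change to take effect. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>8</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
</property>
```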

Apache Hadoop YARN

  1. Introduction To YARN. Adam Kawa, Spotify. The 9th Meeting of Warsaw Hadoop User Group, 2/23/13.
  2. About Me. Data Engineer at Spotify, Sweden. Hadoop Instructor at Compendium (Cloudera Training Partner). 2.5+ years of experience with Hadoop.
  3. “Classical” Apache Hadoop Cluster. Can consist of one, five, a hundred, or 4,000 nodes.
  4. Hadoop Golden Era. One of the most exciting open-source projects on the planet. Becoming the standard for large-scale processing. Successfully deployed by hundreds/thousands of companies. Like a salutary virus... but it has some little drawbacks.
  5. HDFS Main Limitations. A single NameNode keeps all metadata in RAM, performs all metadata operations, and becomes a single point of failure (SPOF). (Not the topic of this presentation...)
  6. “Classical” MapReduce Limitations. Limited scalability. Poor resource utilization. Lack of support for alternative frameworks. Lack of wire-compatible protocols.
  7. Limited Scalability. Scales only to ~4,000-node clusters and ~40,000 concurrent tasks. Smaller and less powerful clusters must be created. Image source: http://www.flickr.com/photos/rsms/660332848/sizes/l/in/set-72157607427138760/
  8. Poor Cluster Resource Utilization. Hard-coded slot counts; TaskTrackers must be restarted after a change.
  9. Poor Cluster Resource Utilization. Map tasks are waiting for slots which are NOT currently used by reduce tasks. Hard-coded values; TaskTrackers must be restarted after a change.
  10. Resource Needs That Vary Over Time. How do you hard-code the number of map and reduce slots efficiently?
  11. (Secondary) Benefits Of Better Utilization. The same calculation on a more efficient Hadoop cluster: less data center space, less silicon waste, less power utilization, less carbon emissions = less disastrous environmental consequences. Image source: http://globeattractions.com/wp-content/uploads/2012/01/green-leaf-drops-green-hd-leaf-nature-wet.jpg
  12. “Classical” MapReduce Daemons.
  13. JobTracker Responsibilities. Manages the computational resources (map and reduce slots). Schedules all user jobs. Schedules all tasks that belong to a job. Monitors task execution. Restarts failed tasks and speculatively runs slow tasks. Calculates job counter totals.
  14. Simple Observation. JobTracker has lots of tasks...
  15. JobTracker Redesign Idea. Reduce the responsibilities of JobTracker: separate cluster resource management from job coordination, and use slaves (many of them!) to manage job life-cycles. Scale to (at least) 10K nodes, 10K jobs, 100K tasks.
  16. Disappearance Of The Centralized JobTracker.
  17. YARN – Yet Another Resource Negotiator.
  18. Resource Manager and Application Master.
  19. YARN Applications. MRv2 (simply MRv1 rewritten to run on top of YARN): no need to rewrite existing MapReduce jobs. Distributed shell. Spark, Apache S4 (real-time processing). Hamster (MPI on Hadoop). Apache Giraph (graph processing). Apache HAMA (matrix, graph and network algorithms)... and more: http://wiki.apache.org/hadoop/PoweredByYarn
  20. NodeManager. More flexible and efficient than TaskTracker. Executes any computation that makes sense to the ApplicationMaster, not only map or reduce tasks. Containers with variable resource sizes (e.g. RAM, CPU, network, disk); no hard-coded split into map and reduce slots.
  21. NodeManager Containers. NodeManager creates a container for each task. A container holds variable resource sizes, e.g. 2 GB RAM, 1 CPU, 1 disk. The number of created containers is limited by the total sum of resources available on the NodeManager.
  22. Application Submission In YARN.
  23. Resource Manager Components.
  24. Uberization. Running small jobs in the same JVM as the ApplicationMaster:
      mapreduce.job.ubertask.enable      true (default: false)
      mapreduce.job.ubertask.maxmaps     9*
      mapreduce.job.ubertask.maxreduces  1*
      mapreduce.job.ubertask.maxbytes    dfs.block.size*
      * Users may override these values, but only downward.
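Written out as a mapred-site.xml fragment, a sketch using the values quoted on the slide (the maxmaps/maxreduces values shown are the defaults):

```xml
<!-- mapred-site.xml: enable uberization so that sufficiently small
     jobs (here at most 9 maps and 1 reduce) run inside the
     ApplicationMaster's JVM instead of requesting new containers. -->
<property>
  <name>mapreduce.job.ubertask.enable</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.job.ubertask.maxmaps</name>
  <value>9</value>
</property>
<property>
  <name>mapreduce.job.ubertask.maxreduces</name>
  <value>1</value>
</property>
```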
  25. Interesting Differences. Task progress, status, and counters are passed directly to the Application Master. Shuffle handlers (long-running auxiliary services in Node Managers) serve map outputs to reduce tasks. YARN does not support JVM reuse for map/reduce tasks.
  26. Fault-Tolerance. Failure of running tasks or NodeManagers is handled similarly to MRv1. Applications can be retried several times, and the Resource Manager can start the Application Master in a new container:
      yarn.resourcemanager.am.max-retries        1
      yarn.app.mapreduce.am.job.recovery.enable  false
  27. Resource Manager Failure. Two interesting features: the RM restarts without the need to re-run currently running/submitted applications [YARN-128], and ZK-based High Availability (HA) for the RM [YARN-149] (still in progress).
  28. Wire-Compatible Protocol. Client and cluster may use different versions. More manageable upgrades: rolling upgrades without disrupting the service; active and standby NameNodes upgraded independently. Protocol buffers chosen for serialization (instead of Writables).
  29. YARN Maturity. Still considered both production-ready and not production-ready. The code is already promoted to trunk. Yahoo! runs it on 2,000- and 6,000-node clusters, using Apache Hadoop 2.0.2 (Alpha). Anyway, it will replace MRv1 sooner rather than later.
  30. How To Quickly Set Up The YARN Cluster.
  31. Basic YARN Configuration Parameters.
      yarn.nodemanager.resource.memory-mb       8192
      yarn.nodemanager.resource.cpu-cores       8
      yarn.scheduler.minimum-allocation-mb      1024
      yarn.scheduler.maximum-allocation-mb      8192
      yarn.scheduler.minimum-allocation-vcores  1
      yarn.scheduler.maximum-allocation-vcores  32
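These parameters live in yarn-site.xml; a sketch of the same values follows. The property names are kept as on the slide, which reflects Hadoop 2.0 alpha naming (later releases renamed yarn.nodemanager.resource.cpu-cores to cpu-vcores):

```xml
<!-- yarn-site.xml: per-NodeManager resources and the scheduler's
     allowed container allocation range, as on slide 31. -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-cores</name>
  <value>8</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-vcores</name>
  <value>1</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>32</value>
</property>
```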
  32. Basic MR Apps Configuration Parameters.
      mapreduce.map.cpu.vcores     1
      mapreduce.reduce.cpu.vcores  1
      mapreduce.map.memory.mb      1536
      mapreduce.map.java.opts      -Xmx1024M
      mapreduce.reduce.memory.mb   3072
      mapreduce.reduce.java.opts   -Xmx2560M
      mapreduce.task.io.sort.mb    512
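As a mapred-site.xml sketch with the slide's values. Note the pattern: mapreduce.map.memory.mb is the container size requested from YARN, so it is set larger than the JVM heap (-Xmx) to leave headroom for off-heap memory; the reduce side follows the same rule:

```xml
<!-- mapred-site.xml: per-task container sizes and JVM heaps,
     as on slide 32. Container size > heap size in each case. -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1536</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1024M</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>3072</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx2560M</value>
</property>
<property>
  <name>mapreduce.task.io.sort.mb</name>
  <value>512</value>
</property>
```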
  33. Apache Whirr Configuration File.
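The slide shows the file on screen; the sketch below is a plausible minimal hadoop-yarn.properties, with role names and keys assumed from the Apache Whirr 0.8 era, not copied from the slide:

```properties
# hadoop-yarn.properties: a minimal sketch (role and key names are
# assumptions based on Whirr 0.8; the talk's actual file may differ).
whirr.cluster-name=hadoop-yarn
# One master (NameNode + ResourceManager + history server), two slaves.
whirr.instance-templates=1 hadoop-namenode+yarn-resourcemanager+mapreduce-historyserver,2 hadoop-datanode+yarn-nodemanager
whirr.provider=aws-ec2
whirr.identity=${env:AWS_ACCESS_KEY_ID}
whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
```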
  34. Running And Destroying The Cluster.
      $ whirr launch-cluster --config hadoop-yarn.properties
      $ whirr destroy-cluster --config hadoop-yarn.properties
  35. It Might Be A Demo. But what about running production applications on a real cluster consisting of hundreds of nodes? Join Spotify! jobs@spotify.com
  36. Image source: http://allthingsd.com/files/2012/07/10Questions.jpeg
  37. Thank You!