NextGen Apache Hadoop MapReduce

25,556 views

Arun C Murthy, Founder and Architect at Hortonworks Inc., talks about the upcoming Next Generation Apache Hadoop MapReduce framework at the Hadoop Summit, 2011.

Published in: Technology
  • http://dbmanagement.info/Tutorials/MapReduce.htm
  • how to download?????????????
  • It's correct, of course.
    It means that computers nowadays can do more than those from 2009 could.
  • slide 10 7th line:
    6,000 2012 machines > 12,000 2009 machines
    Is '2009' correct? It seems it should be '2013' or later.

NextGen Apache Hadoop MapReduce

  1. Next Generation of Apache Hadoop MapReduce
     Arun C. Murthy - Hortonworks Founder and Architect
     @acmurthy (@hortonworks)
     Formerly Architect, MapReduce @ Yahoo!
     8 years @ Yahoo!
     © Hortonworks Inc. 2011
     June 29, 2011
  2. Hello! I’m Arun…
     • Architect & Lead, Apache Hadoop MapReduce Development Team at Hortonworks (formerly at Yahoo!)
     • Apache Hadoop Committer and Member of PMC
     • Full-time contributor to Apache Hadoop since early 2006
  3. Hadoop MapReduce Today
     • JobTracker
       • Manages cluster resources and job scheduling
     • TaskTracker
       • Per-node agent
       • Manage tasks
  4. Current Limitations
     • Scalability
       • Maximum Cluster size – 4,000 nodes
       • Maximum concurrent tasks – 40,000
       • Coarse synchronization in JobTracker
     • Single point of failure
       • Failure kills all queued and running jobs
       • Jobs need to be re-submitted by users
     • Restart is very tricky due to complex state
     • Hard partition of resources into map and reduce slots
  5. Current Limitations
     • Lacks support for alternate paradigms
       • Iterative applications implemented using MapReduce are 10x slower.
       • Example: K-Means, PageRank
     • Lack of wire-compatible protocols
       • Client and cluster must be of same version
       • Applications and workflows cannot migrate to different clusters
  6. Requirements
     • Reliability
     • Availability
     • Scalability - Clusters of 6,000-10,000 machines
       • Each machine with 16 cores, 48G/96G RAM, 24TB/36TB disks
       • 100,000+ concurrent tasks
       • 10,000 concurrent jobs
     • Wire Compatibility
     • Agility & Evolution – Ability for customers to control upgrades to the grid software stack.
  7. Design Centre
     • Split up the two major functions of JobTracker
       • Cluster resource management
       • Application life-cycle management
     • MapReduce becomes user-land library
  8. Architecture
  9. Architecture
     • Resource Manager
       • Global resource scheduler
       • Hierarchical queues
     • Node Manager
       • Per-machine agent
       • Manages the life-cycle of container
       • Container resource monitoring
     • Application Master
       • Per-application
       • Manages application scheduling and task execution
       • E.g. MapReduce Application Master
  10. Improvements vis-à-vis current MapReduce
      • Scalability
        • Application life-cycle management is very expensive
        • Partition resource management and application life-cycle management
        • Application management is distributed
        • Hardware trends - Currently run clusters of 4,000 machines
          • 6,000 2012 machines > 12,000 2009 machines
          • <16+ cores, 48/96G, 24TB> v/s <8 cores, 16G, 4TB>
  11. Improvements vis-à-vis current MapReduce
      • Fault Tolerance and Availability
        • Resource Manager
          • No single point of failure – state saved in ZooKeeper
          • Application Masters are restarted automatically on RM restart
          • Applications continue to progress with existing resources during restart; new resources aren’t allocated
        • Application Master
          • Optional failover via application-specific checkpoint
          • MapReduce applications pick up where they left off via state saved in HDFS
  12. Improvements vis-à-vis current MapReduce
      • Wire Compatibility
        • Protocols are wire-compatible
        • Old clients can talk to new servers
        • Rolling upgrades
  13. Improvements vis-à-vis current MapReduce
      • Innovation and Agility
        • MapReduce now becomes a user-land library
        • Multiple versions of MapReduce can run in the same cluster (a la Apache Pig)
        • Faster deployment cycles for improvements
        • Customers upgrade MapReduce versions on their schedule
        • Users can customize MapReduce e.g. HOP without affecting everyone!
  14. Improvements vis-à-vis current MapReduce
      • Utilization
        • Generic resource model
          • Memory
          • CPU
          • Disk b/w
          • Network b/w
        • Remove fixed partition of map and reduce slots
  15. Improvements vis-à-vis current MapReduce
      • Support for programming paradigms other than MapReduce
        • MPI
        • Master-Worker
        • Machine Learning
        • Iterative processing
      • Enabled by allowing use of paradigm-specific Application Master
      • Run all on the same Hadoop cluster
  16. Summary
      • MapReduce .Next takes Hadoop to the next level
        • Scale-out even further
        • High availability
        • Cluster Utilization
        • Support for paradigms other than MapReduce
  17. Status – June, 2011
      • Feature complete
      • Rigorous testing cycle underway
        • Scale testing at ~500 nodes
        • Sort/Scan/Shuffle benchmarks
        • GridMixV3!
        • Integration testing
          • Pig integration complete!
      • Coming in the next release of Apache Hadoop!
      • Beta deployments of next release of Apache Hadoop at Yahoo! in Q4, 2011
  18. Questions?
      http://developer.yahoo.com/blogs/hadoop/posts/2011/02/mapreduce-nextgen/
  19. Thank You.
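
Slides 4 and 14 contrast the hard partition of resources into map and reduce slots with a generic resource model. The toy Python sketch below may help make that difference concrete. It is not Hadoop code: the `Node` and `Request` classes, both scheduler functions, the slot counts (6 map, 4 reduce) and the per-task sizes (4 GB, 1 core) are all hypothetical values chosen for illustration. Only the 48 GB / 16 core machine shape comes from the slides.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical machine description; sizes follow the slide 6 hardware."""
    memory_gb: int
    cores: int


@dataclass
class Request:
    """Hypothetical task request. `kind` is only used by the slot model."""
    kind: str  # "map" or "reduce"
    memory_gb: int
    cores: int


def schedule_with_slots(map_slots, reduce_slots, requests):
    """Fixed partition: a task may only occupy a free slot of its own kind."""
    free = {"map": map_slots, "reduce": reduce_slots}
    placed = []
    for req in requests:
        if free[req.kind] > 0:
            free[req.kind] -= 1
            placed.append(req)
    return placed


def schedule_with_resources(node, requests):
    """Generic model: a task fits whenever enough memory and CPU remain."""
    memory_left, cores_left = node.memory_gb, node.cores
    placed = []
    for req in requests:
        if req.memory_gb <= memory_left and req.cores <= cores_left:
            memory_left -= req.memory_gb
            cores_left -= req.cores
            placed.append(req)
    return placed


# A map-heavy phase: ten map tasks and no reduce tasks (assumed workload).
node = Node(memory_gb=48, cores=16)
requests = [Request("map", memory_gb=4, cores=1) for _ in range(10)]

slots_placed = schedule_with_slots(map_slots=6, reduce_slots=4, requests=requests)
generic_placed = schedule_with_resources(node, requests)

print(len(slots_placed), "tasks placed with fixed slots")
print(len(generic_placed), "tasks placed with the generic resource model")
```

Under these assumed numbers, the slot model leaves the four reduce slots idle during the map-heavy phase, while the generic model places every task that fits in the machine's memory and cores. Disk and network bandwidth, which slide 14 also lists, are left out to keep the sketch short.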

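The hardware comparison on slide 10 ("6,000 2012 machines > 12,000 2009 machines"), which the comments above ask about, can be checked with simple arithmetic. The sketch below multiplies the per-machine figures from the slide by the machine counts. It takes the lower bound of each stated range (16 cores for "16+", 48 GB for "48/96G"), which is an assumption made here, and the `cluster_totals` helper is a hypothetical name.

```python
# Per-machine specs from slide 10, using the lower bound of each range.
SPEC_2012 = {"cores": 16, "ram_gb": 48, "disk_tb": 24}
SPEC_2009 = {"cores": 8, "ram_gb": 16, "disk_tb": 4}


def cluster_totals(machines, spec):
    """Aggregate capacity of a cluster of identical machines."""
    return {name: machines * per_machine for name, per_machine in spec.items()}


totals_2012 = cluster_totals(6_000, SPEC_2012)
totals_2009 = cluster_totals(12_000, SPEC_2009)

for name in SPEC_2012:
    print(f"{name}: 2012 cluster = {totals_2012[name]:,}, "
          f"2009 cluster = {totals_2009[name]:,}")
```

With these lower-bound figures, the 6,000-machine cluster matches the 12,000-machine cluster on cores (96,000 each) and exceeds it on RAM (288,000 GB vs 192,000 GB) and disk (144,000 TB vs 48,000 TB). Any machine with more than 16 cores or with 96 GB of RAM widens the gap further.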