SlideShare ist ein Scribd-Unternehmen logo
1 von 17
The Next Generation of
 Hadoop Map-Reduce
        Sharad Agarwal
     sharadag@yahoo-inc.com
        sharad@apache.org
About Me

   Hadoop Committer and PMC member
   Architect at Yahoo!
Hadoop Map-Reduce Today
   JobTracker
    - Manages cluster resources
      and job scheduling
   TaskTracker
    - Per-node agent
    - Manage tasks
Current Limitations
   Scalability
    - Maximum Cluster size – 4,000 nodes
    - Maximum concurrent tasks – 40,000
    - Coarse synchronization in JobTracker
   Single point of failure
    - Failure kills all queued and running jobs
    - Jobs need to be re-submitted by users
   Restart is very tricky due to complex state
   Hard partition of resources into map and reduce
    slots
Current Limitations

   Lacks support for alternate paradigms
    - Iterative applications implemented using Map-Reduce
      are 10x slower.
    - Example: K-Means, PageRank
   Lack of wire-compatible protocols
    - Client and cluster must be of same version
    - Applications and workflows cannot migrate to
      different clusters
Next Generation Map-Reduce Requirements
   Reliability
   Availability
   Scalability - Clusters of 6,000 machines
    - Each machine with 16 cores, 48G RAM, 24TB disks
    - 100,000 concurrent tasks
    - 10,000 concurrent jobs
   Wire Compatibility
   Agility & Evolution – Ability for customers to
    control upgrades to the grid software stack.
Next Generation Map-Reduce – Design
Centre

   Split up the two major functions of JobTracker
    - Cluster resource management
    - Application life-cycle management
   Map-Reduce becomes user-land library
Architecture
Architecture
   Resource Manager
    - Global resource scheduler
    - Hierarchical queues
   Node Manager
    - Per-machine agent
    - Manages the life-cycle of container
    - Container resource monitoring
   Application Master
    - Per-application
    - Manages application scheduling and task execution
    - E.g. Map-Reduce Application Master
Improvements vis-à-vis current Map-Reduce
     Scalability
      - Application life-cycle management is very
        expensive
      - Partition resource management and application
        life-cycle management
      - Application management is distributed
      - Hardware trends - Currently run clusters of 4,000
        machines
          • 6,000 2012 machines > 12,000 2009 machines
          • <8 cores, 16G, 4TB> v/s <16+ cores, 48/96G, 24TB>
Improvements vis-à-vis current Map-Reduce
     Availability
      - Application Master
          • Optional failover via application-specific checkpoint
          • Map-Reduce applications pick up where they left off
      - Resource Manager
          • No single point of failure - failover via ZooKeeper
          • Application Masters are restarted automatically
Improvements vis-à-vis current Map-Reduce
     Wire Compatibility
      - Protocols are wire-compatible
      - Old clients can talk to new servers
      - Rolling upgrades
Improvements vis-à-vis current Map-Reduce
     Agility / Evolution
      - Map-Reduce now becomes a user-land library
      - Multiple versions of Map-Reduce can run in the
        same cluster (ala Apache Pig)
          • Faster deployment cycles for improvements
      - Customers upgrade Map-Reduce versions on their
        schedule
Improvements vis-à-vis current Map-Reduce
     Utilization
      - Generic resource model
          •   Memory
          •   CPU
          •   Disk b/w
          •   Network b/w
      - Remove fixed partition of map and reduce slots
Improvements vis-à-vis current Map-Reduce
     Support for programming paradigms other
      than Map-Reduce
      - MPI
      - Master-Worker
      - Machine Learning
      - Iterative processing
      - Enabled by allowing use of paradigm-specific
        Application Master
      - Run all on the same Hadoop cluster
Summary
   The next generation of Map-Reduce takes
    Hadoop to the next level
    -   Scale-out even further
    -   High availability
    -   Cluster Utilization
    -   Support for paradigms other than Map-Reduce
Questions?

Weitere ähnliche Inhalte

Was ist angesagt?

Spark Overview and Performance Issues
Spark Overview and Performance IssuesSpark Overview and Performance Issues
Spark Overview and Performance IssuesAntonios Katsarakis
 
Extending Spark Streaming to Support Complex Event Processing
Extending Spark Streaming to Support Complex Event ProcessingExtending Spark Streaming to Support Complex Event Processing
Extending Spark Streaming to Support Complex Event ProcessingOh Chan Kwon
 
Back to School - St. Louis Hadoop Meetup September 2016
Back to School - St. Louis Hadoop Meetup September 2016Back to School - St. Louis Hadoop Meetup September 2016
Back to School - St. Louis Hadoop Meetup September 2016Adam Doyle
 
YARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute PlatformYARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute PlatformBikas Saha
 
Apache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingApache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingBikas Saha
 
Enterprise Scale Topological Data Analysis Using Spark
Enterprise Scale Topological Data Analysis Using SparkEnterprise Scale Topological Data Analysis Using Spark
Enterprise Scale Topological Data Analysis Using SparkAlpine Data
 
Migrating to Riak at Shareaholic
Migrating to Riak at ShareaholicMigrating to Riak at Shareaholic
Migrating to Riak at ShareaholicShareaholic
 
Cassandra Summit 2014: Cassandra Compute Cloud: An elastic Cassandra Infrastr...
Cassandra Summit 2014: Cassandra Compute Cloud: An elastic Cassandra Infrastr...Cassandra Summit 2014: Cassandra Compute Cloud: An elastic Cassandra Infrastr...
Cassandra Summit 2014: Cassandra Compute Cloud: An elastic Cassandra Infrastr...DataStax Academy
 
Взгляд на облака с точки зрения HPC
Взгляд на облака с точки зрения HPCВзгляд на облака с точки зрения HPC
Взгляд на облака с точки зрения HPCOlga Lavrentieva
 
Hadoop Scheduling - a 7 year perspective
Hadoop Scheduling - a 7 year perspectiveHadoop Scheduling - a 7 year perspective
Hadoop Scheduling - a 7 year perspectiveJoydeep Sen Sarma
 
Apache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data ApplicationsApache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data ApplicationsHortonworks
 
Anti patterns in hadoop cluster deployment
Anti patterns in hadoop cluster deploymentAnti patterns in hadoop cluster deployment
Anti patterns in hadoop cluster deploymentNaganarasimha Garla
 
CaffeOnSpark: Deep Learning On Spark Cluster
CaffeOnSpark: Deep Learning On Spark ClusterCaffeOnSpark: Deep Learning On Spark Cluster
CaffeOnSpark: Deep Learning On Spark ClusterJen Aman
 
C* Summit 2013: Large Scale Data Ingestion, Processing and Analysis: Then, No...
C* Summit 2013: Large Scale Data Ingestion, Processing and Analysis: Then, No...C* Summit 2013: Large Scale Data Ingestion, Processing and Analysis: Then, No...
C* Summit 2013: Large Scale Data Ingestion, Processing and Analysis: Then, No...DataStax Academy
 
Apache Spark on Kubernetes
Apache Spark on KubernetesApache Spark on Kubernetes
Apache Spark on Kubernetesharidasnss
 
Scalable Acceleration of XGBoost Training on Apache Spark GPU Clusters
Scalable Acceleration of XGBoost Training on Apache Spark GPU ClustersScalable Acceleration of XGBoost Training on Apache Spark GPU Clusters
Scalable Acceleration of XGBoost Training on Apache Spark GPU ClustersDatabricks
 
Spark and Deep Learning frameworks with distributed workloads
Spark and Deep Learning frameworks with distributed workloadsSpark and Deep Learning frameworks with distributed workloads
Spark and Deep Learning frameworks with distributed workloadsS N
 

Was ist angesagt? (20)

Spark Overview and Performance Issues
Spark Overview and Performance IssuesSpark Overview and Performance Issues
Spark Overview and Performance Issues
 
Extending Spark Streaming to Support Complex Event Processing
Extending Spark Streaming to Support Complex Event ProcessingExtending Spark Streaming to Support Complex Event Processing
Extending Spark Streaming to Support Complex Event Processing
 
Back to School - St. Louis Hadoop Meetup September 2016
Back to School - St. Louis Hadoop Meetup September 2016Back to School - St. Louis Hadoop Meetup September 2016
Back to School - St. Louis Hadoop Meetup September 2016
 
Resource scheduling
Resource schedulingResource scheduling
Resource scheduling
 
YARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute PlatformYARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute Platform
 
Apache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingApache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query Processing
 
Enterprise Scale Topological Data Analysis Using Spark
Enterprise Scale Topological Data Analysis Using SparkEnterprise Scale Topological Data Analysis Using Spark
Enterprise Scale Topological Data Analysis Using Spark
 
Migrating to Riak at Shareaholic
Migrating to Riak at ShareaholicMigrating to Riak at Shareaholic
Migrating to Riak at Shareaholic
 
Cassandra Summit 2014: Cassandra Compute Cloud: An elastic Cassandra Infrastr...
Cassandra Summit 2014: Cassandra Compute Cloud: An elastic Cassandra Infrastr...Cassandra Summit 2014: Cassandra Compute Cloud: An elastic Cassandra Infrastr...
Cassandra Summit 2014: Cassandra Compute Cloud: An elastic Cassandra Infrastr...
 
Взгляд на облака с точки зрения HPC
Взгляд на облака с точки зрения HPCВзгляд на облака с точки зрения HPC
Взгляд на облака с точки зрения HPC
 
Philly DB MapR Overview
Philly DB MapR OverviewPhilly DB MapR Overview
Philly DB MapR Overview
 
Hadoop Scheduling - a 7 year perspective
Hadoop Scheduling - a 7 year perspectiveHadoop Scheduling - a 7 year perspective
Hadoop Scheduling - a 7 year perspective
 
Apache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data ApplicationsApache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data Applications
 
Anti patterns in hadoop cluster deployment
Anti patterns in hadoop cluster deploymentAnti patterns in hadoop cluster deployment
Anti patterns in hadoop cluster deployment
 
CaffeOnSpark: Deep Learning On Spark Cluster
CaffeOnSpark: Deep Learning On Spark ClusterCaffeOnSpark: Deep Learning On Spark Cluster
CaffeOnSpark: Deep Learning On Spark Cluster
 
C* Summit 2013: Large Scale Data Ingestion, Processing and Analysis: Then, No...
C* Summit 2013: Large Scale Data Ingestion, Processing and Analysis: Then, No...C* Summit 2013: Large Scale Data Ingestion, Processing and Analysis: Then, No...
C* Summit 2013: Large Scale Data Ingestion, Processing and Analysis: Then, No...
 
Apache Spark on Kubernetes
Apache Spark on KubernetesApache Spark on Kubernetes
Apache Spark on Kubernetes
 
Scalable Acceleration of XGBoost Training on Apache Spark GPU Clusters
Scalable Acceleration of XGBoost Training on Apache Spark GPU ClustersScalable Acceleration of XGBoost Training on Apache Spark GPU Clusters
Scalable Acceleration of XGBoost Training on Apache Spark GPU Clusters
 
Big Data Journey
Big Data JourneyBig Data Journey
Big Data Journey
 
Spark and Deep Learning frameworks with distributed workloads
Spark and Deep Learning frameworks with distributed workloadsSpark and Deep Learning frameworks with distributed workloads
Spark and Deep Learning frameworks with distributed workloads
 

Andere mochten auch

Sk rpt matematik tahun 3 by wahyu hidayat
Sk rpt matematik tahun 3 by wahyu hidayatSk rpt matematik tahun 3 by wahyu hidayat
Sk rpt matematik tahun 3 by wahyu hidayatizz_zafran
 
Gulliver al país de Li.liput
Gulliver al país de Li.liputGulliver al país de Li.liput
Gulliver al país de Li.liputlelescd
 
Skoleni golfovych rozhodcich III. tridy
Skoleni golfovych rozhodcich  III. tridySkoleni golfovych rozhodcich  III. tridy
Skoleni golfovych rozhodcich III. tridyBoleslav Bobcik
 
Lensfree Microscopy and Tomography
Lensfree Microscopy and TomographyLensfree Microscopy and Tomography
Lensfree Microscopy and TomographySERHAN ISIKMAN
 
Scvaf 2011 rooms presentation
Scvaf 2011 rooms presentationScvaf 2011 rooms presentation
Scvaf 2011 rooms presentationmduhe2
 
Organizational Wholeness and Growth
Organizational Wholeness and GrowthOrganizational Wholeness and Growth
Organizational Wholeness and GrowthChery Gegelman
 
Tax Assist Budget Summary2011
Tax Assist Budget Summary2011Tax Assist Budget Summary2011
Tax Assist Budget Summary2011Paul_Chillman
 
Grade 6_Kiểm tra trình độ
Grade 6_Kiểm tra trình độGrade 6_Kiểm tra trình độ
Grade 6_Kiểm tra trình độTa Hien
 
Speaker Kit - Gift Spotter
Speaker Kit - Gift SpotterSpeaker Kit - Gift Spotter
Speaker Kit - Gift SpotterKristyn Haywood
 
The global environment (2)
The global environment (2)The global environment (2)
The global environment (2)hi2mcfly
 
Describing trends
Describing trendsDescribing trends
Describing trendsTa Hien
 
New Accounting Standards Mcr Cpa 2009
New Accounting Standards Mcr Cpa 2009New Accounting Standards Mcr Cpa 2009
New Accounting Standards Mcr Cpa 2009mrittmayer
 
B&A Consumer Confidence Barometer Sept 2013
B&A Consumer Confidence Barometer Sept 2013B&A Consumer Confidence Barometer Sept 2013
B&A Consumer Confidence Barometer Sept 2013lukereaper
 
100% Funding Exec Summary
100%  Funding Exec Summary100%  Funding Exec Summary
100% Funding Exec Summarydgc_finance
 
Downtown orlando, florida
Downtown orlando, floridaDowntown orlando, florida
Downtown orlando, floridaJennifer Degan
 
Nutricion celular mi herbalife
Nutricion celular mi herbalifeNutricion celular mi herbalife
Nutricion celular mi herbalifeafernandezh
 

Andere mochten auch (20)

Sk rpt matematik tahun 3 by wahyu hidayat
Sk rpt matematik tahun 3 by wahyu hidayatSk rpt matematik tahun 3 by wahyu hidayat
Sk rpt matematik tahun 3 by wahyu hidayat
 
Gulliver al país de Li.liput
Gulliver al país de Li.liputGulliver al país de Li.liput
Gulliver al país de Li.liput
 
Skoleni golfovych rozhodcich III. tridy
Skoleni golfovych rozhodcich  III. tridySkoleni golfovych rozhodcich  III. tridy
Skoleni golfovych rozhodcich III. tridy
 
Lensfree Microscopy and Tomography
Lensfree Microscopy and TomographyLensfree Microscopy and Tomography
Lensfree Microscopy and Tomography
 
Asadsa s asd
Asadsa s asdAsadsa s asd
Asadsa s asd
 
Scvaf 2011 rooms presentation
Scvaf 2011 rooms presentationScvaf 2011 rooms presentation
Scvaf 2011 rooms presentation
 
Organizational Wholeness and Growth
Organizational Wholeness and GrowthOrganizational Wholeness and Growth
Organizational Wholeness and Growth
 
Tax Assist Budget Summary2011
Tax Assist Budget Summary2011Tax Assist Budget Summary2011
Tax Assist Budget Summary2011
 
Grade 6_Kiểm tra trình độ
Grade 6_Kiểm tra trình độGrade 6_Kiểm tra trình độ
Grade 6_Kiểm tra trình độ
 
Speaker Kit - Gift Spotter
Speaker Kit - Gift SpotterSpeaker Kit - Gift Spotter
Speaker Kit - Gift Spotter
 
The global environment (2)
The global environment (2)The global environment (2)
The global environment (2)
 
Describing trends
Describing trendsDescribing trends
Describing trends
 
New Accounting Standards Mcr Cpa 2009
New Accounting Standards Mcr Cpa 2009New Accounting Standards Mcr Cpa 2009
New Accounting Standards Mcr Cpa 2009
 
Steve Ventre
Steve VentreSteve Ventre
Steve Ventre
 
B&A Consumer Confidence Barometer Sept 2013
B&A Consumer Confidence Barometer Sept 2013B&A Consumer Confidence Barometer Sept 2013
B&A Consumer Confidence Barometer Sept 2013
 
100% Funding Exec Summary
100%  Funding Exec Summary100%  Funding Exec Summary
100% Funding Exec Summary
 
Downtown orlando, florida
Downtown orlando, floridaDowntown orlando, florida
Downtown orlando, florida
 
Nutricion celular mi herbalife
Nutricion celular mi herbalifeNutricion celular mi herbalife
Nutricion celular mi herbalife
 
Murata power supply hotsell
Murata power supply hotsellMurata power supply hotsell
Murata power supply hotsell
 
Heavy metal
Heavy metalHeavy metal
Heavy metal
 

Ähnlich wie YARN Hadoop Summit Bangalore 2011

Apache Hadoop India Summit 2011 talk "The Next Generation of Hadoop MapReduce...
Apache Hadoop India Summit 2011 talk "The Next Generation of Hadoop MapReduce...Apache Hadoop India Summit 2011 talk "The Next Generation of Hadoop MapReduce...
Apache Hadoop India Summit 2011 talk "The Next Generation of Hadoop MapReduce...Yahoo Developer Network
 
Next Generation of Hadoop MapReduce
Next Generation of Hadoop MapReduceNext Generation of Hadoop MapReduce
Next Generation of Hadoop MapReducehuguk
 
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopHortonworks
 
Hadoop World 2011, Apache Hadoop MapReduce Next Gen
Hadoop World 2011, Apache Hadoop MapReduce Next GenHadoop World 2011, Apache Hadoop MapReduce Next Gen
Hadoop World 2011, Apache Hadoop MapReduce Next GenHortonworks
 
YARN: Future of Data Processing with Apache Hadoop
YARN: Future of Data Processing with Apache HadoopYARN: Future of Data Processing with Apache Hadoop
YARN: Future of Data Processing with Apache HadoopHortonworks
 
Apache Hadoop MapReduce: What's Next
Apache Hadoop MapReduce: What's NextApache Hadoop MapReduce: What's Next
Apache Hadoop MapReduce: What's NextDataWorks Summit
 
A sdn based application aware and network provisioning
A sdn based application aware and network provisioningA sdn based application aware and network provisioning
A sdn based application aware and network provisioningStanley Wang
 
Bryan Thompson, Chief Scientist and Founder, SYSTAP, LLC at MLconf ATL
Bryan Thompson, Chief Scientist and Founder, SYSTAP, LLC at MLconf ATLBryan Thompson, Chief Scientist and Founder, SYSTAP, LLC at MLconf ATL
Bryan Thompson, Chief Scientist and Founder, SYSTAP, LLC at MLconf ATLMLconf
 
February 2014 HUG : Tez Details and Insides
February 2014 HUG : Tez Details and InsidesFebruary 2014 HUG : Tez Details and Insides
February 2014 HUG : Tez Details and InsidesYahoo Developer Network
 
Distributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology OverviewDistributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology OverviewKonstantin V. Shvachko
 
Building big data pipelines with Kafka and Kubernetes
Building big data pipelines with Kafka and KubernetesBuilding big data pipelines with Kafka and Kubernetes
Building big data pipelines with Kafka and KubernetesVenu Ryali
 
Apache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingApache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingTeddy Choi
 
Big Data Analytics Chapter3-6@2021.pdf
Big Data Analytics Chapter3-6@2021.pdfBig Data Analytics Chapter3-6@2021.pdf
Big Data Analytics Chapter3-6@2021.pdfWasyihunSema2
 
YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez Hortonworks
 
Hhm 3474 mq messaging technologies and support for high availability and acti...
Hhm 3474 mq messaging technologies and support for high availability and acti...Hhm 3474 mq messaging technologies and support for high availability and acti...
Hhm 3474 mq messaging technologies and support for high availability and acti...Pete Siddall
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query ProcessingApache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query ProcessingHortonworks
 
Stinger Initiative: Leveraging Hive & Yarn for High-Performance/Interactive Q...
Stinger Initiative: Leveraging Hive & Yarn for High-Performance/Interactive Q...Stinger Initiative: Leveraging Hive & Yarn for High-Performance/Interactive Q...
Stinger Initiative: Leveraging Hive & Yarn for High-Performance/Interactive Q...Caserta
 

Ähnlich wie YARN Hadoop Summit Bangalore 2011 (20)

Apache Hadoop India Summit 2011 talk "The Next Generation of Hadoop MapReduce...
Apache Hadoop India Summit 2011 talk "The Next Generation of Hadoop MapReduce...Apache Hadoop India Summit 2011 talk "The Next Generation of Hadoop MapReduce...
Apache Hadoop India Summit 2011 talk "The Next Generation of Hadoop MapReduce...
 
Next Generation of Hadoop MapReduce
Next Generation of Hadoop MapReduceNext Generation of Hadoop MapReduce
Next Generation of Hadoop MapReduce
 
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with Hadoop
 
Hadoop World 2011, Apache Hadoop MapReduce Next Gen
Hadoop World 2011, Apache Hadoop MapReduce Next GenHadoop World 2011, Apache Hadoop MapReduce Next Gen
Hadoop World 2011, Apache Hadoop MapReduce Next Gen
 
YARN: Future of Data Processing with Apache Hadoop
YARN: Future of Data Processing with Apache HadoopYARN: Future of Data Processing with Apache Hadoop
YARN: Future of Data Processing with Apache Hadoop
 
Apache Hadoop MapReduce: What's Next
Apache Hadoop MapReduce: What's NextApache Hadoop MapReduce: What's Next
Apache Hadoop MapReduce: What's Next
 
Mantle for Developers
Mantle for DevelopersMantle for Developers
Mantle for Developers
 
A sdn based application aware and network provisioning
A sdn based application aware and network provisioningA sdn based application aware and network provisioning
A sdn based application aware and network provisioning
 
Bryan Thompson, Chief Scientist and Founder, SYSTAP, LLC at MLconf ATL
Bryan Thompson, Chief Scientist and Founder, SYSTAP, LLC at MLconf ATLBryan Thompson, Chief Scientist and Founder, SYSTAP, LLC at MLconf ATL
Bryan Thompson, Chief Scientist and Founder, SYSTAP, LLC at MLconf ATL
 
Parallel Computing on the GPU
Parallel Computing on the GPUParallel Computing on the GPU
Parallel Computing on the GPU
 
February 2014 HUG : Tez Details and Insides
February 2014 HUG : Tez Details and InsidesFebruary 2014 HUG : Tez Details and Insides
February 2014 HUG : Tez Details and Insides
 
Distributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology OverviewDistributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology Overview
 
Building big data pipelines with Kafka and Kubernetes
Building big data pipelines with Kafka and KubernetesBuilding big data pipelines with Kafka and Kubernetes
Building big data pipelines with Kafka and Kubernetes
 
Apache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingApache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query Processing
 
Yarn
YarnYarn
Yarn
 
Big Data Analytics Chapter3-6@2021.pdf
Big Data Analytics Chapter3-6@2021.pdfBig Data Analytics Chapter3-6@2021.pdf
Big Data Analytics Chapter3-6@2021.pdf
 
YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez
 
Hhm 3474 mq messaging technologies and support for high availability and acti...
Hhm 3474 mq messaging technologies and support for high availability and acti...Hhm 3474 mq messaging technologies and support for high availability and acti...
Hhm 3474 mq messaging technologies and support for high availability and acti...
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query ProcessingApache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
 
Stinger Initiative: Leveraging Hive & Yarn for High-Performance/Interactive Q...
Stinger Initiative: Leveraging Hive & Yarn for High-Performance/Interactive Q...Stinger Initiative: Leveraging Hive & Yarn for High-Performance/Interactive Q...
Stinger Initiative: Leveraging Hive & Yarn for High-Performance/Interactive Q...
 

Kürzlich hochgeladen

Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 

Kürzlich hochgeladen (20)

Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 

YARN Hadoop Summit Bangalore 2011

  • 1. The Next Generation of Hadoop Map-Reduce Sharad Agarwal sharadag@yahoo-inc.com sharad@apache.org
  • 2. About Me  Hadoop Committer and PMC member  Architect at Yahoo!
  • 3. Hadoop Map-Reduce Today  JobTracker - Manages cluster resources and job scheduling  TaskTracker - Per-node agent - Manage tasks
  • 4. Current Limitations  Scalability - Maximum Cluster size – 4,000 nodes - Maximum concurrent tasks – 40,000 - Coarse synchronization in JobTracker  Single point of failure - Failure kills all queued and running jobs - Jobs need to be re-submitted by users  Restart is very tricky due to complex state  Hard partition of resources into map and reduce slots
  • 5. Current Limitations  Lacks support for alternate paradigms - Iterative applications implemented using Map-Reduce are 10x slower. - Example: K-Means, PageRank  Lack of wire-compatible protocols - Client and cluster must be of same version - Applications and workflows cannot migrate to different clusters
  • 6. Next Generation Map-Reduce Requirements  Reliability  Availability  Scalability - Clusters of 6,000 machines - Each machine with 16 cores, 48G RAM, 24TB disks - 100,000 concurrent tasks - 10,000 concurrent jobs  Wire Compatibility  Agility & Evolution – Ability for customers to control upgrades to the grid software stack.
  • 7. Next Generation Map-Reduce – Design Centre  Split up the two major functions of JobTracker - Cluster resource management - Application life-cycle management  Map-Reduce becomes user-land library
  • 9. Architecture  Resource Manager - Global resource scheduler - Hierarchical queues  Node Manager - Per-machine agent - Manages the life-cycle of container - Container resource monitoring  Application Master - Per-application - Manages application scheduling and task execution - E.g. Map-Reduce Application Master
  • 10. Improvements vis-à-vis current Map-Reduce  Scalability - Application life-cycle management is very expensive - Partition resource management and application life-cycle management - Application management is distributed - Hardware trends - Currently run clusters of 4,000 machines • 6,000 2012 machines > 12,000 2009 machines • <8 cores, 16G, 4TB> v/s <16+ cores, 48/96G, 24TB>
  • 11. Improvements vis-à-vis current Map-Reduce  Availability - Application Master • Optional failover via application-specific checkpoint • Map-Reduce applications pick up where they left off - Resource Manager • No single point of failure - failover via ZooKeeper • Application Masters are restarted automatically
  • 12. Improvements vis-à-vis current Map-Reduce  Wire Compatibility - Protocols are wire-compatible - Old clients can talk to new servers - Rolling upgrades
  • 13. Improvements vis-à-vis current Map-Reduce  Agility / Evolution - Map-Reduce now becomes a user-land library - Multiple versions of Map-Reduce can run in the same cluster (ala Apache Pig) • Faster deployment cycles for improvements - Customers upgrade Map-Reduce versions on their schedule
  • 14. Improvements vis-à-vis current Map-Reduce  Utilization - Generic resource model • Memory • CPU • Disk b/w • Network b/w - Remove fixed partition of map and reduce slots
  • 15. Improvements vis-à-vis current Map-Reduce  Support for programming paradigms other than Map-Reduce - MPI - Master-Worker - Machine Learning - Iterative processing - Enabled by allowing use of paradigm-specific Application Master - Run all on the same Hadoop cluster
  • 16. Summary  The next generation of Map-Reduce takes Hadoop to the next level - Scale-out even further - High availability - Cluster Utilization - Support for paradigms other than Map-Reduce