SlideShare ist ein Scribd-Unternehmen logo
1 von 37
Downloaden Sie, um offline zu lesen
Introduction To YARN
Adam Kawa, Spotify
          th
The 9 Meeting of Warsaw Hadoop User Group




2/23/13
About Me
Data Engineer at Spotify, Sweden
Hadoop Instructor at Compendium (Cloudera Training Partner)
+2.5 year of experience in Hadoop




2/23/13
“Classical” Apache Hadoop Cluster
Can consist of one, five, hundred or 4000 nodes




2/23/13
Hadoop Golden Era
One of the most exciting open-source projects on the planet
Becomes the standard for large-scale processing
Successfully deployed by hundreds/thousands of companies
Like a salutary virus
... But it has some little drawbacks



2/23/13
HDFS Main Limitations

Single NameNode that
   keeps all metadata in RAM
   performs all metadata operations
   becomes a single point of failure (SPOF)
(not the topic of this presentation… )

2/23/13
“Classical” MapReduce Limitations
Limited scalability
Poor resource utilization
Lack of support for the alternative frameworks
Lack of wire-compatible protocols




2/23/13
Limited Scalability
Scales to only
    ~4,000-node cluster
    ~40,000 concurrent tasks
Smaller and less-powerful clusters must be created




2/23/13
Image source: http://www.flickr.com/photos/rsms/660332848/sizes/l/in/set-72157607427138760/
Poor Cluster Resources Utilization




                       Hard-coded values. TaskTrackers
                       must be restarted after a change.



2/23/13
Poor Cluster Resources Utilization




   Map tasks are waiting for the slots   Hard-coded values. TaskTrackers
   which are                             must be restarted after a change.
   NOT currently used by Reduce tasks

2/23/13
Resources Needs That Varies Over Time
                             How to
                             hard-code the
                             number of map
                             and reduce slots
                             efficiently?




2/23/13
(Secondary) Benefits Of Better Utilization
The same calculation on more efficient Hadoop cluster
    Less data center space
    Less silicon waste
    Less power utilizations
    Less carbon emissions
    =
    Less disastrous environmental consequences
2/23/13
Image source: http://globeattractions.com/wp-content/uploads/2012/01/green-leaf-drops-green-hd-leaf-nature-wet.jpg
“Classical” MapReduce Daemons




2/23/13
JobTracker Responsibilities
Manages the computational resources (map and reduce slots)
Schedules all user jobs
   Schedules all tasks that belongs to a job
   Monitors tasks executions
   Restarts failed and speculatively runs slow tasks
   Calculates job counters totals

2/23/13
Simple Observation
JobTracker has lots tasks...




2/23/13
JobTracker Redesign Idea
Reduce responsibilities of JobTracker
   Separate cluster resource management from job
   coordination
   Use slaves (many of them!) to manage jobs life-cycle
Scale to (at least) 10K nodes, 10K jobs, 100K tasks



2/23/13
Disappearance Of The Centralized JobTracker




2/23/13
YARN – Yet Another Resource Negotiator




2/23/13
Resource Manager and Application Master




2/23/13
YARN Applications
MRv2 (simply MRv1 rewritten to run on top of YARN)
   No need to rewrite existing MapReduce jobs
Distributed shell
Spark, Apache S4 (a real-time processing)
Hamster (MPI on Hadoop)
Apache Giraph (a graph processing)
Apache HAMA (matrix, graph and network algorithms)
   ... and http://wiki.apache.org/hadoop/PoweredByYarn
2/23/13
NodeManager
More flexible and efficient than TaskTracker
Executes any computation that makes sense to ApplicationMaster
   Not only map or reduce tasks
Containers with variable resource sizes (e.g. RAM, CPU, network, disk)
   No hard-coded split into map and reduce slots




2/23/13
NodeManager Containers
NodeManager creates a container for each task
Container contains variable resources sizes
   e.g. 2GB RAM, 1 CPU, 1 disk
Number of created containers is limited
   By total sum of resources available on NodeManager



2/23/13
Application Submission In YARN




2/23/13
Resource Manager Components




2/23/13
Uberization
Running the small jobs in the same JVM as AplicationMaster

mapreduce.job.ubertask.enable                     true (false is default)
mapreduce.job.ubertask.maxmaps                    9*
mapreduce.job.ubertask.maxreduces 1*
mapreduce.job.ubertask.maxbytes                   dfs.block.size*


* Users may override these values, but only downward

2/23/13
Interesting Differences
Task progress, status, counters are passed directly to the
  Application Master
Shuffle handlers (long-running auxiliary services in Node
  Managers) serve map outputs to reduce tasks
YARN does not support JVM reuse for map/reduce tasks




2/23/13
Fault-Tollerance
Failure of the running tasks or NodeManagers is similar as in MRv1
Applications can be retried several times
      yarn.resourcemanager.am.max-retries         1
      yarn.app.mapreduce.am.job.recovery.enable   false
Resource Manager can start Application Master in a new container
      yarn.app.mapreduce.am.job.recovery.enable   false



2/23/13
Resource Manager Failure
Two interesting features
   RM restarts without the need to re-run currently
   running/submitted applications [YARN-128]
   ZK-based High Availability (HA) for RM [YARN-149] (still in
   progress)




2/23/13
Wire-Compatible Protocol
Client and cluster may use different versions
More manageable upgrades
   Rolling upgrades without disrupting the service
   Active and standby NameNodes upgraded independently
Protocol buffers chosen for serialization (instead of Writables)



2/23/13
YARN Maturity
Still considered as both production and not production-ready
   The code is already promoted to the trunk
   Yahoo! runs it on 2000 and 6000-node clusters
      Using Apache Hadoop 2.0.2 (Alpha)
Anyway, it will replace MRv1 sooner than later



2/23/13
How To Quickly Setup The YARN Cluster




2/23/13
Basic YARN Configuration Parameters
yarn.nodemanager.resource.memory-mb        8192
yarn.nodemanager.resource.cpu-cores        8


yarn.scheduler.minimum-allocation-mb       1024
yarn.scheduler.maximum-allocation-mb       8192
yarn.scheduler.minimum-allocation-vcores   1
yarn.scheduler.maximum-allocation-vcores   32



2/23/13
Basic MR Apps Configuration Parameters
mapreduce.map.cpu.vcores     1
mapreduce.reduce.cpu.vcores 1


mapreduce.map.memory.mb      1536
mapreduce.map.java.opts      -Xmx1024M
mapreduce.reduce.memory.mb   3072
mapreduce.reduce.java.opts   -Xmx2560M
mapreduce.task.io.sort.mb    512

2/23/13
Apache Whirr Configuration File




2/23/13
Running And Destroying Cluster
$ whirr launch-cluster --config hadoop-yarn.properties
$ whirr destroy-cluster --config hadoop-yarn.properties




2/23/13
It Might Be A Demo
But what about running
production applications
on the real cluster
consisting of hundreds of nodes?


Join Spotify!
jobs@spotify.com
2/23/13
2/23/13
Image source: http://allthingsd.com/files/2012/07/10Questions.jpeg
Thank You!




2/23/13

Weitere ähnliche Inhalte

Was ist angesagt?

Towards SLA-based Scheduling on YARN Clusters
Towards SLA-based Scheduling on YARN ClustersTowards SLA-based Scheduling on YARN Clusters
Towards SLA-based Scheduling on YARN ClustersDataWorks Summit
 
Hive+Tez: A performance deep dive
Hive+Tez: A performance deep diveHive+Tez: A performance deep dive
Hive+Tez: A performance deep divet3rmin4t0r
 
Streaming Data Pipelines With Apache Beam
Streaming Data Pipelines With Apache BeamStreaming Data Pipelines With Apache Beam
Streaming Data Pipelines With Apache BeamAll Things Open
 
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Simplilearn
 
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?sudhakara st
 
Introduction to Hadoop Technology
Introduction to Hadoop TechnologyIntroduction to Hadoop Technology
Introduction to Hadoop TechnologyManish Borkar
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation HadoopVarun Narang
 
Mapreduce by examples
Mapreduce by examplesMapreduce by examples
Mapreduce by examplesAndrea Iacono
 
Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...
Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...
Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...Simplilearn
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingTill Rohrmann
 
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Simplilearn
 

Was ist angesagt? (20)

Towards SLA-based Scheduling on YARN Clusters
Towards SLA-based Scheduling on YARN ClustersTowards SLA-based Scheduling on YARN Clusters
Towards SLA-based Scheduling on YARN Clusters
 
Hive+Tez: A performance deep dive
Hive+Tez: A performance deep diveHive+Tez: A performance deep dive
Hive+Tez: A performance deep dive
 
Streaming Data Pipelines With Apache Beam
Streaming Data Pipelines With Apache BeamStreaming Data Pipelines With Apache Beam
Streaming Data Pipelines With Apache Beam
 
1. Apache HIVE
1. Apache HIVE1. Apache HIVE
1. Apache HIVE
 
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
 
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?
 
Spark graphx
Spark graphxSpark graphx
Spark graphx
 
Hadoop Map Reduce
Hadoop Map ReduceHadoop Map Reduce
Hadoop Map Reduce
 
Introduction to Hadoop Technology
Introduction to Hadoop TechnologyIntroduction to Hadoop Technology
Introduction to Hadoop Technology
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
 
Mapreduce by examples
Mapreduce by examplesMapreduce by examples
Mapreduce by examples
 
Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...
Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...
Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processing
 
Hadoop technology
Hadoop technologyHadoop technology
Hadoop technology
 
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
 
Intro to HBase
Intro to HBaseIntro to HBase
Intro to HBase
 
HDFS Internals
HDFS InternalsHDFS Internals
HDFS Internals
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
 

Andere mochten auch

Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2Cloudera, Inc.
 
Apache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data ApplicationsApache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data ApplicationsHortonworks
 
Apache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of HadoopApache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of HadoopHortonworks
 
Apache Hadoop In Theory And Practice
Apache Hadoop In Theory And PracticeApache Hadoop In Theory And Practice
Apache Hadoop In Theory And PracticeAdam Kawa
 
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)Adam Kawa
 
Apache Hadoop YARN, NameNode HA, HDFS Federation
Apache Hadoop YARN, NameNode HA, HDFS FederationApache Hadoop YARN, NameNode HA, HDFS Federation
Apache Hadoop YARN, NameNode HA, HDFS FederationAdam Kawa
 
Apache Hadoop Java API
Apache Hadoop Java APIApache Hadoop Java API
Apache Hadoop Java APIAdam Kawa
 
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)Adam Kawa
 
Apache Hadoop YARN: best practices
Apache Hadoop YARN: best practicesApache Hadoop YARN: best practices
Apache Hadoop YARN: best practicesDataWorks Summit
 
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureDataWorks Summit
 
Introduction To Elastic MapReduce at WHUG
Introduction To Elastic MapReduce at WHUGIntroduction To Elastic MapReduce at WHUG
Introduction To Elastic MapReduce at WHUGAdam Kawa
 
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...StampedeCon
 
Introduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUGIntroduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUGAdam Kawa
 
Introduction to Yarn
Introduction to YarnIntroduction to Yarn
Introduction to YarnApache Apex
 
Multi risk factor model
Multi risk factor model Multi risk factor model
Multi risk factor model Stefan Duprey
 
Hadoop Migration from 0.20.2 to 2.0
Hadoop Migration from 0.20.2 to 2.0Hadoop Migration from 0.20.2 to 2.0
Hadoop Migration from 0.20.2 to 2.0Jabir Ahmed
 

Andere mochten auch (20)

Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2
 
Apache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data ApplicationsApache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data Applications
 
Apache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of HadoopApache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of Hadoop
 
Apache Hadoop In Theory And Practice
Apache Hadoop In Theory And PracticeApache Hadoop In Theory And Practice
Apache Hadoop In Theory And Practice
 
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
 
Apache Hadoop YARN, NameNode HA, HDFS Federation
Apache Hadoop YARN, NameNode HA, HDFS FederationApache Hadoop YARN, NameNode HA, HDFS Federation
Apache Hadoop YARN, NameNode HA, HDFS Federation
 
Apache Hadoop Java API
Apache Hadoop Java APIApache Hadoop Java API
Apache Hadoop Java API
 
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
 
Apache Hadoop YARN: best practices
Apache Hadoop YARN: best practicesApache Hadoop YARN: best practices
Apache Hadoop YARN: best practices
 
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future
 
Introduction To Elastic MapReduce at WHUG
Introduction To Elastic MapReduce at WHUGIntroduction To Elastic MapReduce at WHUG
Introduction To Elastic MapReduce at WHUG
 
Map Reduce v2 and YARN - CHUG - 20120604
Map Reduce v2 and YARN - CHUG - 20120604Map Reduce v2 and YARN - CHUG - 20120604
Map Reduce v2 and YARN - CHUG - 20120604
 
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
 
Introduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUGIntroduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUG
 
Introduction to Yarn
Introduction to YarnIntroduction to Yarn
Introduction to Yarn
 
Multi risk factor model
Multi risk factor model Multi risk factor model
Multi risk factor model
 
Yarn spark next_gen_hadoop_8_jan_2014
Yarn spark next_gen_hadoop_8_jan_2014Yarn spark next_gen_hadoop_8_jan_2014
Yarn spark next_gen_hadoop_8_jan_2014
 
Yarn
YarnYarn
Yarn
 
Hadoop Migration from 0.20.2 to 2.0
Hadoop Migration from 0.20.2 to 2.0Hadoop Migration from 0.20.2 to 2.0
Hadoop Migration from 0.20.2 to 2.0
 
Hadoop YARN overview
Hadoop YARN overviewHadoop YARN overview
Hadoop YARN overview
 

Ähnlich wie Apache Hadoop YARN

Hadoop mapreduce and yarn frame work- unit5
Hadoop mapreduce and yarn frame work-  unit5Hadoop mapreduce and yarn frame work-  unit5
Hadoop mapreduce and yarn frame work- unit5RojaT4
 
Map Reduce along with Amazon EMR
Map Reduce along with Amazon EMRMap Reduce along with Amazon EMR
Map Reduce along with Amazon EMRABC Talks
 
Presentation on Large Scale Data Management
Presentation on Large Scale Data ManagementPresentation on Large Scale Data Management
Presentation on Large Scale Data ManagementChris Bunch
 
Report Hadoop Map Reduce
Report Hadoop Map ReduceReport Hadoop Map Reduce
Report Hadoop Map ReduceUrvashi Kataria
 
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.02013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0Adam Muise
 
XML Parsing with Map Reduce
XML Parsing with Map ReduceXML Parsing with Map Reduce
XML Parsing with Map ReduceEdureka!
 
Introduction to map reduce
Introduction to map reduceIntroduction to map reduce
Introduction to map reduceM Baddar
 
Advanced Hadoop Tuning and Optimization - Hadoop Consulting
Advanced Hadoop Tuning and Optimization - Hadoop ConsultingAdvanced Hadoop Tuning and Optimization - Hadoop Consulting
Advanced Hadoop Tuning and Optimization - Hadoop ConsultingImpetus Technologies
 
Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...
Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...
Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...Yahoo Developer Network
 
Inside MapR's M7
Inside MapR's M7Inside MapR's M7
Inside MapR's M7Ted Dunning
 
Hadoop 2.0 yarn arch training
Hadoop 2.0 yarn arch trainingHadoop 2.0 yarn arch training
Hadoop 2.0 yarn arch trainingNandan Kumar
 
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopHortonworks
 
Taming Latency: Case Studies in MapReduce Data Analytics
Taming Latency: Case Studies in MapReduce Data AnalyticsTaming Latency: Case Studies in MapReduce Data Analytics
Taming Latency: Case Studies in MapReduce Data AnalyticsEMC
 
MapReduce Scheduling Algorithms
MapReduce Scheduling AlgorithmsMapReduce Scheduling Algorithms
MapReduce Scheduling AlgorithmsLeila panahi
 
Hadoop with Lustre WhitePaper
Hadoop with Lustre WhitePaperHadoop with Lustre WhitePaper
Hadoop with Lustre WhitePaperDavid Luan
 

Ähnlich wie Apache Hadoop YARN (20)

E031201032036
E031201032036E031201032036
E031201032036
 
Hadoop mapreduce and yarn frame work- unit5
Hadoop mapreduce and yarn frame work-  unit5Hadoop mapreduce and yarn frame work-  unit5
Hadoop mapreduce and yarn frame work- unit5
 
H04502048051
H04502048051H04502048051
H04502048051
 
Map Reduce along with Amazon EMR
Map Reduce along with Amazon EMRMap Reduce along with Amazon EMR
Map Reduce along with Amazon EMR
 
Presentation on Large Scale Data Management
Presentation on Large Scale Data ManagementPresentation on Large Scale Data Management
Presentation on Large Scale Data Management
 
Yarns About Yarn
Yarns About YarnYarns About Yarn
Yarns About Yarn
 
Report Hadoop Map Reduce
Report Hadoop Map ReduceReport Hadoop Map Reduce
Report Hadoop Map Reduce
 
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.02013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
 
XML Parsing with Map Reduce
XML Parsing with Map ReduceXML Parsing with Map Reduce
XML Parsing with Map Reduce
 
Introduction to map reduce
Introduction to map reduceIntroduction to map reduce
Introduction to map reduce
 
Advanced Hadoop Tuning and Optimization - Hadoop Consulting
Advanced Hadoop Tuning and Optimization - Hadoop ConsultingAdvanced Hadoop Tuning and Optimization - Hadoop Consulting
Advanced Hadoop Tuning and Optimization - Hadoop Consulting
 
Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...
Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...
Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...
 
YARN (2).pptx
YARN (2).pptxYARN (2).pptx
YARN (2).pptx
 
Inside MapR's M7
Inside MapR's M7Inside MapR's M7
Inside MapR's M7
 
Hadoop 2.0 yarn arch training
Hadoop 2.0 yarn arch trainingHadoop 2.0 yarn arch training
Hadoop 2.0 yarn arch training
 
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with Hadoop
 
Taming Latency: Case Studies in MapReduce Data Analytics
Taming Latency: Case Studies in MapReduce Data AnalyticsTaming Latency: Case Studies in MapReduce Data Analytics
Taming Latency: Case Studies in MapReduce Data Analytics
 
Inside MapR's M7
Inside MapR's M7Inside MapR's M7
Inside MapR's M7
 
MapReduce Scheduling Algorithms
MapReduce Scheduling AlgorithmsMapReduce Scheduling Algorithms
MapReduce Scheduling Algorithms
 
Hadoop with Lustre WhitePaper
Hadoop with Lustre WhitePaperHadoop with Lustre WhitePaper
Hadoop with Lustre WhitePaper
 

Kürzlich hochgeladen

presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfOverkill Security
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 

Kürzlich hochgeladen (20)

presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 

Apache Hadoop YARN

  • 1. Introduction To YARN Adam Kawa, Spotify th The 9 Meeting of Warsaw Hadoop User Group 2/23/13
  • 2. About Me Data Engineer at Spotify, Sweden Hadoop Instructor at Compendium (Cloudera Training Partner) +2.5 year of experience in Hadoop 2/23/13
  • 3. “Classical” Apache Hadoop Cluster Can consist of one, five, hundred or 4000 nodes 2/23/13
  • 4. Hadoop Golden Era One of the most exciting open-source projects on the planet Becomes the standard for large-scale processing Successfully deployed by hundreds/thousands of companies Like a salutary virus ... But it has some little drawbacks 2/23/13
  • 5. HDFS Main Limitations Single NameNode that keeps all metadata in RAM performs all metadata operations becomes a single point of failure (SPOF) (not the topic of this presentation… ) 2/23/13
  • 6. “Classical” MapReduce Limitations Limited scalability Poor resource utilization Lack of support for the alternative frameworks Lack of wire-compatible protocols 2/23/13
  • 7. Limited Scalability Scales to only ~4,000-node cluster ~40,000 concurrent tasks Smaller and less-powerful clusters must be created 2/23/13 Image source: http://www.flickr.com/photos/rsms/660332848/sizes/l/in/set-72157607427138760/
  • 8. Poor Cluster Resources Utilization Hard-coded values. TaskTrackers must be restarted after a change. 2/23/13
  • 9. Poor Cluster Resources Utilization Map tasks are waiting for the slots Hard-coded values. TaskTrackers which are must be restarted after a change. NOT currently used by Reduce tasks 2/23/13
  • 10. Resources Needs That Varies Over Time How to hard-code the number of map and reduce slots efficiently? 2/23/13
  • 11. (Secondary) Benefits Of Better Utilization The same calculation on more efficient Hadoop cluster Less data center space Less silicon waste Less power utilizations Less carbon emissions = Less disastrous environmental consequences 2/23/13 Image source: http://globeattractions.com/wp-content/uploads/2012/01/green-leaf-drops-green-hd-leaf-nature-wet.jpg
  • 13. JobTracker Responsibilities Manages the computational resources (map and reduce slots) Schedules all user jobs Schedules all tasks that belongs to a job Monitors tasks executions Restarts failed and speculatively runs slow tasks Calculates job counters totals 2/23/13
  • 14. Simple Observation JobTracker has lots tasks... 2/23/13
  • 15. JobTracker Redesign Idea Reduce responsibilities of JobTracker Separate cluster resource management from job coordination Use slaves (many of them!) to manage jobs life-cycle Scale to (at least) 10K nodes, 10K jobs, 100K tasks 2/23/13
  • 16. Disappearance Of The Centralized JobTracker 2/23/13
  • 17. YARN – Yet Another Resource Negotiator 2/23/13
  • 18. Resource Manager and Application Master 2/23/13
  • 19. YARN Applications MRv2 (simply MRv1 rewritten to run on top of YARN) No need to rewrite existing MapReduce jobs Distributed shell Spark, Apache S4 (a real-time processing) Hamster (MPI on Hadoop) Apache Giraph (a graph processing) Apache HAMA (matrix, graph and network algorithms) ... and http://wiki.apache.org/hadoop/PoweredByYarn 2/23/13
  • 20. NodeManager More flexible and efficient than TaskTracker Executes any computation that makes sense to ApplicationMaster Not only map or reduce tasks Containers with variable resource sizes (e.g. RAM, CPU, network, disk) No hard-coded split into map and reduce slots 2/23/13
  • 21. NodeManager Containers NodeManager creates a container for each task Container contains variable resources sizes e.g. 2GB RAM, 1 CPU, 1 disk Number of created containers is limited By total sum of resources available on NodeManager 2/23/13
  • 24. Uberization Running the small jobs in the same JVM as AplicationMaster mapreduce.job.ubertask.enable true (false is default) mapreduce.job.ubertask.maxmaps 9* mapreduce.job.ubertask.maxreduces 1* mapreduce.job.ubertask.maxbytes dfs.block.size* * Users may override these values, but only downward 2/23/13
  • 25. Interesting Differences Task progress, status, counters are passed directly to the Application Master Shuffle handlers (long-running auxiliary services in Node Managers) serve map outputs to reduce tasks YARN does not support JVM reuse for map/reduce tasks 2/23/13
  • 26. Fault-Tollerance Failure of the running tasks or NodeManagers is similar as in MRv1 Applications can be retried several times yarn.resourcemanager.am.max-retries 1 yarn.app.mapreduce.am.job.recovery.enable false Resource Manager can start Application Master in a new container yarn.app.mapreduce.am.job.recovery.enable false 2/23/13
  • 27. Resource Manager Failure Two interesting features RM restarts without the need to re-run currently running/submitted applications [YARN-128] ZK-based High Availability (HA) for RM [YARN-149] (still in progress) 2/23/13
  • 28. Wire-Compatible Protocol Client and cluster may use different versions More manageable upgrades Rolling upgrades without disrupting the service Active and standby NameNodes upgraded independently Protocol buffers chosen for serialization (instead of Writables) 2/23/13
  • 29. YARN Maturity Still considered as both production and not production-ready The code is already promoted to the trunk Yahoo! runs it on 2000 and 6000-node clusters Using Apache Hadoop 2.0.2 (Alpha) Anyway, it will replace MRv1 sooner than later 2/23/13
  • 30. How To Quickly Setup The YARN Cluster 2/23/13
  • 31. Basic YARN Configuration Parameters yarn.nodemanager.resource.memory-mb 8192 yarn.nodemanager.resource.cpu-cores 8 yarn.scheduler.minimum-allocation-mb 1024 yarn.scheduler.maximum-allocation-mb 8192 yarn.scheduler.minimum-allocation-vcores 1 yarn.scheduler.maximum-allocation-vcores 32 2/23/13
  • 32. Basic MR Apps Configuration Parameters mapreduce.map.cpu.vcores 1 mapreduce.reduce.cpu.vcores 1 mapreduce.map.memory.mb 1536 mapreduce.map.java.opts -Xmx1024M mapreduce.reduce.memory.mb 3072 mapreduce.reduce.java.opts -Xmx2560M mapreduce.task.io.sort.mb 512 2/23/13
  • 34. Running And Destroying Cluster $ whirr launch-cluster --config hadoop-yarn.properties $ whirr destroy-cluster --config hadoop-yarn.properties 2/23/13
  • 35. It Might Be A Demo But what about running production applications on the real cluster consisting of hundreds of nodes? Join Spotify! jobs@spotify.com 2/23/13