Hadoop Nextgen/MRv2/YARN

Sharad Agarwal
sharad@apache.org
  
About me

•  Apache Foundation
    –  Hadoop Committer and PMC member
    –  Hadoop MR contributor ~ 4 years
    –  Author of Hadoop Nextgen core
•  Head of Technology Platforms @InMobi
    –  Formerly Architect @Yahoo!
  
Hadoop Map-Reduce Today

•  JobTracker
    –  Manages cluster resources and job scheduling
•  TaskTracker
    –  Per-node agent
    –  Manages tasks
  
Current Limitations

•  Scalability
    –  Maximum cluster size – 4,000 nodes
    –  Maximum concurrent tasks – 40,000
    –  Coarse synchronization in JobTracker
•  Single point of failure
    –  Failure kills all queued and running jobs
    –  Jobs need to be re-submitted by users
•  Restart is very tricky due to complex state
•  Hard partition of resources into map and reduce slots
  
Current Limitations

•  Lacks support for alternate paradigms
    –  Iterative applications implemented using Map-Reduce are 10x slower
    –  Example: K-Means, PageRank
•  Lack of wire-compatible protocols
    –  Client and cluster must be of same version
    –  Applications and workflows cannot migrate to different clusters
  
Next Generation Map-Reduce Requirements

•  Reliability
•  Availability
•  Scalability - Clusters of 6,000 machines
    –  Each machine with 16 cores, 48G RAM, 24TB disks
    –  100,000 concurrent tasks
    –  10,000 concurrent jobs
•  Wire Compatibility
•  Agility & Evolution – Ability for customers to control upgrades to the grid software stack
  
Next Generation Map-Reduce Architecture

•  Split up the two major functions of JobTracker
    –  Cluster resource management
    –  Application life-cycle management
•  Map-Reduce becomes user-land library
  
Architecture

[Figure: architecture diagram. Clients, a Resource Manager, and Node Managers hosting App Mstr and Container processes. Arrow legend: MapReduce Status, Job Submission, Node Status, Resource Request.]

Architecture

•  Resource Manager
    –  Global resource scheduler
    –  Hierarchical queues
•  Node Manager
    –  Per-machine agent
    –  Manages the life-cycle of container
    –  Container resource monitoring
•  Application Master
    –  Per-application
    –  Manages application scheduling and task execution
    –  E.g. Map-Reduce Application Master
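
The division of labour among the three components can be sketched as a toy model. This is only an illustration under stated assumptions, not the Hadoop API: the class and method names (`allocate`, `launch`, `run`), the memory-only bookkeeping, and the first-fit placement are all hypothetical and chosen for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Per-machine agent: tracks free memory and the containers it hosts."""
    name: str
    free_mb: int
    containers: list = field(default_factory=list)

    def launch(self, app_id: str, mb: int) -> str:
        # Reserve memory and record the new container on this node.
        self.free_mb -= mb
        container_id = f"{self.name}/{app_id}/{len(self.containers)}"
        self.containers.append(container_id)
        return container_id


class ResourceManager:
    """Global scheduler: hands out containers, knows nothing about tasks."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def allocate(self, app_id: str, mb: int):
        # First-fit placement (an assumption; the slides mention hierarchical queues).
        for node in self.nodes:
            if node.free_mb >= mb:
                return node.launch(app_id, mb)
        return None  # no node has room for a request of this size


class ApplicationMaster:
    """Per-application: decides which tasks to run and asks for containers."""

    def __init__(self, app_id: str, rm: ResourceManager):
        self.app_id = app_id
        self.rm = rm

    def run(self, task_sizes_mb):
        # One container request per task; None marks a task that was not placed.
        return [self.rm.allocate(self.app_id, mb) for mb in task_sizes_mb]


nodes = [NodeManager("node1", 4096), NodeManager("node2", 2048)]
rm = ResourceManager(nodes)
am = ApplicationMaster("app_1", rm)
placements = am.run([2048, 2048, 2048, 1024])
print(placements)
```

In this sketch the fourth request comes back as `None` because both nodes are full; the point is only that the scheduler never looks inside the application, and the application never tracks cluster state.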
  
Improvements vis-à-vis current Map-Reduce

•  Scalability
    –  Application life-cycle management is very expensive
    –  Partition resource management and application life-cycle management
    –  Application management is distributed
    –  Hardware trends - Currently run clusters of 4,000 machines
         •  6,000 2012 machines > 12,000 2009 machines
         •  <8 cores, 16G, 4TB> vs. <16+ cores, 48/96G, 24TB>
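
The hardware comparison can be checked with a few lines of arithmetic. The sketch below uses the per-machine figures from the slide and assumes the lower bound for the 2012 machines (16 cores, 48G); the function name `cluster_totals` is made up for this example.

```python
# Per-machine specs taken from the slide: (cores, RAM in GB, disk in TB).
SPEC_2009 = (8, 16, 4)
SPEC_2012 = (16, 48, 24)  # lower bound of "16+ cores, 48/96G"


def cluster_totals(machines, spec):
    """Multiply a per-machine spec by the machine count."""
    cores, ram_gb, disk_tb = spec
    return machines * cores, machines * ram_gb, machines * disk_tb


old = cluster_totals(12_000, SPEC_2009)
new = cluster_totals(6_000, SPEC_2012)
print("12,000 x 2009 machines:", old)
print(" 6,000 x 2012 machines:", new)
```

Under these assumptions the 6,000-machine cluster matches the older one on cores (96,000 each) and exceeds it on RAM (288,000 GB vs. 192,000 GB) and disk (144,000 TB vs. 48,000 TB).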
  
Improvements vis-à-vis current Map-Reduce

•  Availability
    –  Application Master
         •  Optional failover via application-specific checkpoint
         •  Map-Reduce applications pick up where they left off
    –  Resource Manager
         •  No single point of failure - failover via ZooKeeper
         •  Application Masters are restarted automatically
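
The "pick up where they left off" idea can be illustrated with a toy checkpoint. This is a sketch under assumptions, not how the Map-Reduce Application Master is implemented: the class name, the `store` dictionary standing in for durable storage, and the `crash_after` switch are all hypothetical.

```python
import json


class MapReduceAppMaster:
    """Toy application master that records finished tasks in a checkpoint."""

    def __init__(self, tasks, store):
        self.tasks = list(tasks)
        self.store = store  # stands in for durable storage (assumption)
        self.done = set(json.loads(store.get("done", "[]")))
        self.executed = []  # tasks run by this instance only

    def run(self, crash_after=None):
        for task in self.tasks:
            if task in self.done:
                continue  # finished before the restart, skip it
            if crash_after is not None and len(self.executed) >= crash_after:
                raise RuntimeError("application master crashed")
            self.executed.append(task)
            self.done.add(task)
            # Persist progress after every task so a restart can resume.
            self.store["done"] = json.dumps(sorted(self.done))


tasks = ["map-0", "map-1", "map-2", "reduce-0"]
store = {}

first = MapReduceAppMaster(tasks, store)
try:
    first.run(crash_after=2)
except RuntimeError:
    pass  # the restart below plays the role of automatic recovery

second = MapReduceAppMaster(tasks, store)
second.run()
print(second.executed)
```

The restarted instance runs only the tasks that are missing from the checkpoint, which is the behaviour the slide describes at a much larger scale.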
  
Improvements vis-à-vis current Map-Reduce

•  Wire Compatibility
    –  Protocols are wire-compatible
    –  Old clients can talk to new servers
    –  Rolling upgrades
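
One common way to get "old clients can talk to new servers" is to default missing fields and ignore unknown ones. The sketch below shows that idea only; the slides do not name the serialization format, so JSON is used purely for illustration, and the field names (`job_name`, `queue`, `state`) are hypothetical.

```python
import json


def old_client_request(job_name):
    """An older client that predates the 'queue' field."""
    return json.dumps({"job_name": job_name})


def new_server_handle(raw):
    """A newer server: supplies a default for fields old clients omit."""
    msg = json.loads(raw)
    queue = msg.get("queue", "default")
    return json.dumps(
        {"job_name": msg["job_name"], "state": "ACCEPTED", "queue": queue}
    )


def old_client_parse(raw):
    """The older client reads only the fields it knows and ignores the rest."""
    msg = json.loads(raw)
    return {key: msg[key] for key in ("job_name", "state")}


reply = old_client_parse(new_server_handle(old_client_request("wordcount")))
print(reply)
```

Because neither side fails on a field it does not recognise, servers can be upgraded one at a time while older clients keep working, which is what makes rolling upgrades possible.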
  
Improvements vis-à-vis current Map-Reduce

•  Agility / Evolution
    –  Map-Reduce now becomes a user-land library
    –  Multiple versions of Map-Reduce can run in the same cluster (à la Apache Pig)
         •  Faster deployment cycles for improvements
    –  Customers upgrade Map-Reduce versions on their schedule
  
Improvements vis-à-vis current Map-Reduce

•  Utilization
    –  Generic resource model
         •  Memory
         •  CPU
         •  Disk b/w
         •  Network b/w
    –  Remove fixed partition of map and reduce slots
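
A small calculation shows why removing the fixed slot partition helps utilization. The node size, slot split, and workload below are made-up numbers, and only memory is modelled; the slide's resource model also lists CPU, disk, and network bandwidth.

```python
def slot_capacity(map_slots, reduce_slots, map_tasks, reduce_tasks):
    """Fixed partition: a map task may only use a map slot, and vice versa."""
    return min(map_slots, map_tasks) + min(reduce_slots, reduce_tasks)


def resource_capacity(node_mb, task_mb, map_tasks, reduce_tasks):
    """Generic model: any task fits wherever enough memory is free."""
    return min(node_mb // task_mb, map_tasks + reduce_tasks)


# Hypothetical node: 8 GB, configured as 6 map slots + 2 reduce slots of 1 GB.
# Workload during a map-heavy phase: 8 map tasks waiting, no reduce tasks.
with_slots = slot_capacity(6, 2, map_tasks=8, reduce_tasks=0)
with_resources = resource_capacity(8192, 1024, map_tasks=8, reduce_tasks=0)
print(with_slots, with_resources)
```

With fixed slots, two reduce slots sit idle while map tasks wait; with a generic memory request, all eight tasks run at once on the same hardware.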
  
Improvements vis-à-vis current Map-Reduce

•  Support for programming paradigms other than Map-Reduce
    –  MPI
    –  Master-Worker
    –  Machine Learning
    –  Iterative processing
    –  Enabled by allowing use of paradigm-specific Application Master
    –  Run all on the same Hadoop cluster
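
The paradigm-specific Application Master idea can be sketched as one small interface with several implementations. The interface below (`container_requests`) and the memory sizes are assumptions made for illustration; they are not the YARN API.

```python
from abc import ABC, abstractmethod


class ApplicationMaster(ABC):
    """What the cluster sees: something that asks for containers."""

    @abstractmethod
    def container_requests(self):
        """Return a list of (task_name, memory_mb) requests."""


class MapReduceMaster(ApplicationMaster):
    def __init__(self, maps, reduces):
        self.maps, self.reduces = maps, reduces

    def container_requests(self):
        return ([(f"map-{i}", 1024) for i in range(self.maps)]
                + [(f"reduce-{i}", 2048) for i in range(self.reduces)])


class MasterWorkerMaster(ApplicationMaster):
    def __init__(self, workers):
        self.workers = workers

    def container_requests(self):
        return ([("master", 1024)]
                + [(f"worker-{i}", 1024) for i in range(self.workers)])


def total_memory_mb(masters):
    """Cluster-side code treats every paradigm the same way."""
    return sum(mb for m in masters for _, mb in m.container_requests())


apps = [MapReduceMaster(maps=2, reduces=1), MasterWorkerMaster(workers=3)]
print(total_memory_mb(apps))
```

The cluster-side function never checks which paradigm it is dealing with, which is what lets different application types share one cluster.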
  
Summary

•  The next generation of Map-Reduce takes Hadoop to the next level
    –  Scale-out even further
    –  High availability
    –  Cluster Utilization
    –  Support for paradigms other than Map-Reduce
  
Status

•  Apache Hadoop 0.23 release is out
     –  HDFS Federation
     –  MRv2
•  Currently undergoing tests on small scale ~ 500 nodes
•  Alpha
     –  2000 nodes
     –  Q1 2012
•  Beta/Production
     –  Variety of applications and loads
     –  4000+ nodes
     –  Q2 2012
  
Questions?

Follow me on @twitter: sharad_ag
  

Hadoop bangalore-meetup-dec-2011-hadoop nextgen
