High Availability in YARN



      ID2219 Project Presentation
          Arinto Murdopo (arinto@gmail.com)
The team!
•   Mário A. (site – 4khnahs #at# gmail)
•   Arinto M. (site – arinto #at# gmail)
•   Strahinja L. (strahinja1984 #at# gmail)
•   Umit C.B. (ucbuyuksahin #at# gmail)

• Special thanks
      – Jim Dowling (SICS, supervisor)
      – Vasiliki Kalavri (EMJD-DC, supervisor)
      – Johan Montelius (Course teacher)

12/6/2012                                        2
Outline
•   Define: YARN
•   Why is it not highly available (H.A.)?
•   Providing H.A. in YARN
•   What storage to use?
•   Here comes NDB
•   What have we done so far?
•   Experiment results
•   What’s next?
•   Conclusions

Define: YARN
• YARN = Yet Another Resource Negotiator

• Is NOT ONLY MapReduce 2.0, but also…

• Framework to develop and/or execute
  distributed processing applications

• Example: MapReduce, Spark, Apache
  HAMA, Apache Giraph


Define: YARN

[Diagram: JobTracker’s responsibilities are split, with a per-app AppMaster and generic containers]
Why is it not highly available (H.A.)?

[Diagram: the ResourceManager is a Single Point of Failure (SPoF)]
Providing H.A. in YARN
Proposed approach:
• Store and reload state
• Three failure models:
   1. Recovery
   2. Failover
   3. Stateless
Failure Model #1: Recovery

[Diagram: RM stores states to, and loads states from, storage]

1. RM stores states when needed
2. RM failure happens
3. Clients keep retrying
4. RM restarts and loads states
5. Clients successfully connect to the resurrected RM
6. Downtime exists!
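The recovery sequence above can be sketched as a minimal simulation. Class and method names here are illustrative stand-ins, not the real YARN RMStateStore API:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of Failure Model #1 (Recovery): the RM persists its state to a
// store and reloads it after a restart. The in-memory StateStore stands
// in for the real backend (NDB/ZK/HDFS).
public class RecoverySketch {
    static class StateStore {
        private final Map<String, String> appState = new HashMap<>();
        void store(String appId, String state) { appState.put(appId, state); }
        Map<String, String> loadAll() { return new HashMap<>(appState); }
    }

    static class ResourceManager {
        final Map<String, String> runningApps = new HashMap<>();
        final StateStore store;
        ResourceManager(StateStore store) { this.store = store; }
        void submitApp(String appId) {
            runningApps.put(appId, "RUNNING");
            store.store(appId, "RUNNING");   // step 1: store state when needed
        }
        void recover() {                      // step 4: restart and load states
            runningApps.putAll(store.loadAll());
        }
    }

    public static void main(String[] args) {
        StateStore store = new StateStore();
        ResourceManager rm = new ResourceManager(store);
        rm.submitApp("app_001");

        rm = new ResourceManager(store);      // step 2: RM fails, fresh instance starts
        rm.recover();                         // memory was lost, state reloaded from store
        System.out.println("recovered apps: " + rm.runningApps.keySet());
    }
}
```

The downtime in step 6 corresponds to the window between the crash and `recover()` completing.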
Failure Model #2: Failover

• Utilize a standby RM
• Little downtime

[Diagram: the active ResourceManager stores state; the standby ResourceManager loads it on failover]
Failure Model #3: Stateless

Store all states in storage, for example:
1. NM lists
2. App lists

[Diagram: Client, NodeManager and AppMaster can reach either of two ResourceManagers that share the storage]
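The stateless model can be sketched with two RM front-ends sharing one store. A minimal illustration with hypothetical names, not YARN code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of Failure Model #3 (Stateless): all state (NM lists, app lists)
// lives in shared storage, so any ResourceManager instance can serve any
// request and no recovery step is needed after a failure.
public class StatelessSketch {
    // Shared storage stands in for NDB; both RMs read and write it directly.
    static final Map<String, String> sharedStore = new ConcurrentHashMap<>();

    static class StatelessRM {
        void registerNodeManager(String nmId) { sharedStore.put("nm/" + nmId, "ACTIVE"); }
        String lookupNodeManager(String nmId) { return sharedStore.get("nm/" + nmId); }
    }

    public static void main(String[] args) {
        StatelessRM rm1 = new StatelessRM();
        StatelessRM rm2 = new StatelessRM();
        rm1.registerNodeManager("nm-42");   // client talks to RM 1
        // RM 1 dies; RM 2 answers immediately, no state reload required.
        System.out.println("rm2 sees nm-42: " + rm2.lookupNodeManager("nm-42"));
    }
}
```

The catch, as the experiments later suggest, is that every request now hits the storage, so the store must sustain a high read/write rate.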
What storage to use?
Apache proposed
• Hadoop Distributed File System (HDFS)
      – Fault-tolerant, large datasets, streaming
        access to data and more


• ZooKeeper
      – Highly reliable distributed coordination
      – Wait-free, FIFO client ordering,
        linearizable writes and more

Here comes NDB
NDB MySQL Cluster is a scalable, ACID-
compliant transactional database

Some features
• Designed for availability (No SPoF)
• In-memory distributed database
• Horizontal scalability (auto-sharding, no downtime
  when adding new nodes)
• Fast R/W rate
• Fine-grained locking
• SQL and NoSQL Interface

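Auto-sharding, listed above, can be pictured as hashing a partition key onto data nodes. A toy sketch of the idea (NDB’s actual partitioning scheme differs in detail):

```java
// Simplified picture of horizontal auto-sharding: rows are spread over
// data nodes by hashing the partition key, so load balances without
// manual placement. Purely illustrative, not NDB's real algorithm.
public class ShardingSketch {
    static int nodeFor(String partitionKey, int nodeCount) {
        // Math.floorMod keeps the result non-negative even for negative hashes.
        return Math.floorMod(partitionKey.hashCode(), nodeCount);
    }

    public static void main(String[] args) {
        int nodes = 3;  // e.g. a three-node cluster, as in the experiments later
        for (String appId : new String[] {"app_001", "app_002", "app_003"}) {
            System.out.println(appId + " -> data node " + nodeFor(appId, nodes));
        }
    }
}
```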
Here comes NDB

[Diagram: NDB MySQL Cluster architecture, with clients connecting to the cluster]
Here comes NDB

MySQL Cluster version 7.2:
• Linear horizontal scalability
• Up to 4.3 billion reads/minute!
What have we done so far?
• Phase 1: The Ndb-storage-class
      – Based on the Apache-proposed failure model
      – We developed NdbRMStateStore, which is H.A.!

• Phase 2: The Framework
      – Apache created ZK and FS storage classes
      – We developed a framework for storage
        benchmarking
Phase 1: The Ndb-storage-class
Apache
      – Implemented a memory store for ResourceManager
        (RM) recovery (MemoryRMStateStore)
      – Application State and Application Attempt are
        stored
      – Apps restart when the RM is resurrected
      – It’s not really H.A.!

We
      – Implemented the NDB MySQL Cluster store
        (NdbRMStateStore) using ClusterJ
      – Implemented TestNdbRMRestart, to prove H.A.
        in YARN
Phase 1: The Ndb-storage-class

[Diagram: TestNdbRMRestart restarts all unfinished jobs]
Phase 2: The Framework
Apache
      – Implemented Zookeeper Store
        (ZKRMStateStore)
      – Implemented File System Store
        (FileSystemRMStateStore)

We
      – Developed a storage-benchmark framework to
        compare their performance with our store
      – https://github.com/4knahs/zkndb

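A storage benchmark of this kind boils down to a timed request loop across worker threads. A minimal sketch of the idea (not zkndb’s actual code; the in-memory counter stands in for real store calls against ZK/HDFS/NDB):

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of a throughput benchmark like zkndb: N threads hammer a store
// for a fixed duration, and throughput is completed requests divided by
// elapsed time.
public class BenchmarkSketch {
    public static double run(int threads, long durationMillis) {
        AtomicLong completed = new AtomicLong();
        long deadline = System.currentTimeMillis() + durationMillis;
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                while (System.currentTimeMillis() < deadline) {
                    completed.incrementAndGet();   // stand-in for one store() call
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try { w.join(); } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return completed.get() / (durationMillis / 1000.0);  // requests per second
    }

    public static void main(String[] args) {
        // e.g. Load Setup #1 uses 12 threads; shortened duration here
        double rps = run(12, 200);
        System.out.printf("throughput: %.0f req/s%n", rps);
    }
}
```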
Phase 2: The Framework
zkndb = framework for storage benchmarking




Phase 2: The Framework
zkndb extensibility




Experiment Setup
• ZooKeeper
      – Three nodes in SICS cluster
      – Each ZK process has max memory of 5GB

• HDFS
      – Three DataNodes and one NameNode
      – Each HDFS DN and NN process has max
        memory of 5GB

• NDB
      – Three-node cluster
Experiment Result #1

Load Setup #1: 1 node, 12 threads, 60 seconds.
Each node: dual six-core CPUs @2.6 GHz. All clusters consist of 3 nodes. Utilizes Hadoop code for ZK and HDFS.

[Chart: throughput per store. ZK is limited by its store implementation; HDFS is not good for small files!]
Experiment Result #2

Load Setup #2: 3 nodes @12 threads, 30 seconds.
Each node: dual six-core CPUs @2.6 GHz. All clusters consist of 3 nodes. Utilizes Hadoop code for ZK and HDFS.

[Chart: throughput per store. ZK could scale a bit more; HDFS gets even worse due to the root lock in the NameNode!]
What’s next?
• Scheduler and ResourceTracker
  Analysis

• Stateless Architecture

• Study the overhead of writing state
  to NDB


Conclusions
• NDB has higher throughput than ZK
  and HDFS

• NDB is a suitable storage for the Stateless
  Failure Model

• but ZK and HDFS are not suitable for the
  Stateless Failure Model!

Weitere ähnliche Inhalte

Was ist angesagt?

Reduce planned database down time with Oracle technology
Reduce planned database down time with Oracle technologyReduce planned database down time with Oracle technology
Reduce planned database down time with Oracle technologyKirill Loifman
 
Distributed Resource Scheduling Frameworks, Is there a clear Winner ?
Distributed Resource Scheduling Frameworks, Is there a clear Winner ?Distributed Resource Scheduling Frameworks, Is there a clear Winner ?
Distributed Resource Scheduling Frameworks, Is there a clear Winner ?Naganarasimha Garla
 
DB12c: All You Need to Know About the Resource Manager
DB12c: All You Need to Know About the Resource ManagerDB12c: All You Need to Know About the Resource Manager
DB12c: All You Need to Know About the Resource ManagerAndrejs Vorobjovs
 
High Availability for HBase Tables - Past, Present, and Future
High Availability for HBase Tables - Past, Present, and FutureHigh Availability for HBase Tables - Past, Present, and Future
High Availability for HBase Tables - Past, Present, and FutureDataWorks Summit
 
Dipesh Singh 01112016
Dipesh Singh 01112016Dipesh Singh 01112016
Dipesh Singh 01112016Dipesh Singh
 
Introduction to Cloudera's Administrator Training for Apache Hadoop
Introduction to Cloudera's Administrator Training for Apache HadoopIntroduction to Cloudera's Administrator Training for Apache Hadoop
Introduction to Cloudera's Administrator Training for Apache HadoopCloudera, Inc.
 
Hadoop Summit San Jose 2015: Towards SLA-based Scheduling on YARN Clusters
Hadoop Summit San Jose 2015: Towards SLA-based Scheduling on YARN Clusters Hadoop Summit San Jose 2015: Towards SLA-based Scheduling on YARN Clusters
Hadoop Summit San Jose 2015: Towards SLA-based Scheduling on YARN Clusters Sumeet Singh
 
Hadoop 3.0 features
Hadoop 3.0 featuresHadoop 3.0 features
Hadoop 3.0 featuresanand murari
 
Scale 12 x Efficient Multi-tenant Hadoop 2 Workloads with Yarn
Scale 12 x   Efficient Multi-tenant Hadoop 2 Workloads with YarnScale 12 x   Efficient Multi-tenant Hadoop 2 Workloads with Yarn
Scale 12 x Efficient Multi-tenant Hadoop 2 Workloads with YarnDavid Kaiser
 
Jun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step BeyondJun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step BeyondYahoo Developer Network
 
HBase Operations and Best Practices
HBase Operations and Best PracticesHBase Operations and Best Practices
HBase Operations and Best PracticesVenu Anuganti
 
Building Apps with Distributed In-Memory Computing Using Apache Geode
Building Apps with Distributed In-Memory Computing Using Apache GeodeBuilding Apps with Distributed In-Memory Computing Using Apache Geode
Building Apps with Distributed In-Memory Computing Using Apache GeodePivotalOpenSourceHub
 
Oracle database upgrade to 12c and available methods
Oracle database upgrade to 12c and available methodsOracle database upgrade to 12c and available methods
Oracle database upgrade to 12c and available methodsSatishbabu Gunukula
 
Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017Junping Du
 
Database Consolidation using the Oracle Multitenant Architecture
Database Consolidation using the Oracle Multitenant ArchitectureDatabase Consolidation using the Oracle Multitenant Architecture
Database Consolidation using the Oracle Multitenant ArchitecturePini Dibask
 
Apache Hadoop YARN 3.x in Alibaba
Apache Hadoop YARN 3.x in AlibabaApache Hadoop YARN 3.x in Alibaba
Apache Hadoop YARN 3.x in AlibabaDataWorks Summit
 
Facebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconFacebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconYiwei Ma
 

Was ist angesagt? (20)

Reduce planned database down time with Oracle technology
Reduce planned database down time with Oracle technologyReduce planned database down time with Oracle technology
Reduce planned database down time with Oracle technology
 
Distributed Resource Scheduling Frameworks, Is there a clear Winner ?
Distributed Resource Scheduling Frameworks, Is there a clear Winner ?Distributed Resource Scheduling Frameworks, Is there a clear Winner ?
Distributed Resource Scheduling Frameworks, Is there a clear Winner ?
 
HBase internals
HBase internalsHBase internals
HBase internals
 
DB12c: All You Need to Know About the Resource Manager
DB12c: All You Need to Know About the Resource ManagerDB12c: All You Need to Know About the Resource Manager
DB12c: All You Need to Know About the Resource Manager
 
High Availability for HBase Tables - Past, Present, and Future
High Availability for HBase Tables - Past, Present, and FutureHigh Availability for HBase Tables - Past, Present, and Future
High Availability for HBase Tables - Past, Present, and Future
 
Dipesh Singh 01112016
Dipesh Singh 01112016Dipesh Singh 01112016
Dipesh Singh 01112016
 
Hadoop scheduler
Hadoop schedulerHadoop scheduler
Hadoop scheduler
 
Introduction to Cloudera's Administrator Training for Apache Hadoop
Introduction to Cloudera's Administrator Training for Apache HadoopIntroduction to Cloudera's Administrator Training for Apache Hadoop
Introduction to Cloudera's Administrator Training for Apache Hadoop
 
Yarn
YarnYarn
Yarn
 
Hadoop Summit San Jose 2015: Towards SLA-based Scheduling on YARN Clusters
Hadoop Summit San Jose 2015: Towards SLA-based Scheduling on YARN Clusters Hadoop Summit San Jose 2015: Towards SLA-based Scheduling on YARN Clusters
Hadoop Summit San Jose 2015: Towards SLA-based Scheduling on YARN Clusters
 
Hadoop 3.0 features
Hadoop 3.0 featuresHadoop 3.0 features
Hadoop 3.0 features
 
Scale 12 x Efficient Multi-tenant Hadoop 2 Workloads with Yarn
Scale 12 x   Efficient Multi-tenant Hadoop 2 Workloads with YarnScale 12 x   Efficient Multi-tenant Hadoop 2 Workloads with Yarn
Scale 12 x Efficient Multi-tenant Hadoop 2 Workloads with Yarn
 
Jun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step BeyondJun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step Beyond
 
HBase Operations and Best Practices
HBase Operations and Best PracticesHBase Operations and Best Practices
HBase Operations and Best Practices
 
Building Apps with Distributed In-Memory Computing Using Apache Geode
Building Apps with Distributed In-Memory Computing Using Apache GeodeBuilding Apps with Distributed In-Memory Computing Using Apache Geode
Building Apps with Distributed In-Memory Computing Using Apache Geode
 
Oracle database upgrade to 12c and available methods
Oracle database upgrade to 12c and available methodsOracle database upgrade to 12c and available methods
Oracle database upgrade to 12c and available methods
 
Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017
 
Database Consolidation using the Oracle Multitenant Architecture
Database Consolidation using the Oracle Multitenant ArchitectureDatabase Consolidation using the Oracle Multitenant Architecture
Database Consolidation using the Oracle Multitenant Architecture
 
Apache Hadoop YARN 3.x in Alibaba
Apache Hadoop YARN 3.x in AlibabaApache Hadoop YARN 3.x in Alibaba
Apache Hadoop YARN 3.x in Alibaba
 
Facebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconFacebook keynote-nicolas-qcon
Facebook keynote-nicolas-qcon
 

Andere mochten auch

Netcare csi kelvin's talk aug 2015
Netcare csi kelvin's talk aug 2015Netcare csi kelvin's talk aug 2015
Netcare csi kelvin's talk aug 2015Kelvin Glen
 
Arviointi ja palaute 2011
Arviointi ja palaute 2011Arviointi ja palaute 2011
Arviointi ja palaute 2011Marko Havu
 
The counting system for small animals in japanese
The counting system for small animals in japaneseThe counting system for small animals in japanese
The counting system for small animals in japaneseCheyanneStotlar
 
Maailmassa on parempia pankkeja
Maailmassa on parempia pankkejaMaailmassa on parempia pankkeja
Maailmassa on parempia pankkejaPankki2
 
how to say foods and drinks in japanese
how to say foods and drinks in japanesehow to say foods and drinks in japanese
how to say foods and drinks in japaneseCheyanneStotlar
 
Pankki 2.0-hankkeen esittely
Pankki 2.0-hankkeen esittelyPankki 2.0-hankkeen esittely
Pankki 2.0-hankkeen esittelyPankki2
 
Architecting a Cloud-Scale Identity Fabric
Architecting a Cloud-Scale Identity FabricArchitecting a Cloud-Scale Identity Fabric
Architecting a Cloud-Scale Identity FabricArinto Murdopo
 
Why File Sharing is Dangerous?
Why File Sharing is Dangerous?Why File Sharing is Dangerous?
Why File Sharing is Dangerous?Arinto Murdopo
 
Distributed Computing - What, why, how..
Distributed Computing - What, why, how..Distributed Computing - What, why, how..
Distributed Computing - What, why, how..Arinto Murdopo
 
Queens Parh Rangers AD410 น.ส.ฐิติมา ประเสริฐชัย เลขที่8
Queens Parh Rangers AD410 น.ส.ฐิติมา  ประเสริฐชัย เลขที่8Queens Parh Rangers AD410 น.ส.ฐิติมา  ประเสริฐชัย เลขที่8
Queens Parh Rangers AD410 น.ส.ฐิติมา ประเสริฐชัย เลขที่8yaying-yingg
 
Intelligent Placement of Datacenter for Internet Services
Intelligent Placement of Datacenter for Internet Services Intelligent Placement of Datacenter for Internet Services
Intelligent Placement of Datacenter for Internet Services Arinto Murdopo
 
Cultura mites
Cultura mitesCultura mites
Cultura mitesComalat1D
 
153 test plan
153 test plan153 test plan
153 test plan< <
 
An Integer Programming Representation for Data Center Power-Aware Management ...
An Integer Programming Representation for Data Center Power-Aware Management ...An Integer Programming Representation for Data Center Power-Aware Management ...
An Integer Programming Representation for Data Center Power-Aware Management ...Arinto Murdopo
 
Uso correto de epi´s abafadores
Uso correto de epi´s   abafadoresUso correto de epi´s   abafadores
Uso correto de epi´s abafadoresPaulo Carvalho
 
Moodboards eda
Moodboards edaMoodboards eda
Moodboards edaedaozdemir
 

Andere mochten auch (20)

Netcare csi kelvin's talk aug 2015
Netcare csi kelvin's talk aug 2015Netcare csi kelvin's talk aug 2015
Netcare csi kelvin's talk aug 2015
 
Arviointi ja palaute 2011
Arviointi ja palaute 2011Arviointi ja palaute 2011
Arviointi ja palaute 2011
 
The counting system for small animals in japanese
The counting system for small animals in japaneseThe counting system for small animals in japanese
The counting system for small animals in japanese
 
Maailmassa on parempia pankkeja
Maailmassa on parempia pankkejaMaailmassa on parempia pankkeja
Maailmassa on parempia pankkeja
 
how to say foods and drinks in japanese
how to say foods and drinks in japanesehow to say foods and drinks in japanese
how to say foods and drinks in japanese
 
Pankki 2.0-hankkeen esittely
Pankki 2.0-hankkeen esittelyPankki 2.0-hankkeen esittely
Pankki 2.0-hankkeen esittely
 
Architecting a Cloud-Scale Identity Fabric
Architecting a Cloud-Scale Identity FabricArchitecting a Cloud-Scale Identity Fabric
Architecting a Cloud-Scale Identity Fabric
 
Why File Sharing is Dangerous?
Why File Sharing is Dangerous?Why File Sharing is Dangerous?
Why File Sharing is Dangerous?
 
UX homework4
UX homework4UX homework4
UX homework4
 
Distributed Computing - What, why, how..
Distributed Computing - What, why, how..Distributed Computing - What, why, how..
Distributed Computing - What, why, how..
 
Queens Parh Rangers AD410 น.ส.ฐิติมา ประเสริฐชัย เลขที่8
Queens Parh Rangers AD410 น.ส.ฐิติมา  ประเสริฐชัย เลขที่8Queens Parh Rangers AD410 น.ส.ฐิติมา  ประเสริฐชัย เลขที่8
Queens Parh Rangers AD410 น.ส.ฐิติมา ประเสริฐชัย เลขที่8
 
 
Intelligent Placement of Datacenter for Internet Services
Intelligent Placement of Datacenter for Internet Services Intelligent Placement of Datacenter for Internet Services
Intelligent Placement of Datacenter for Internet Services
 
Sam houston chess team
Sam houston chess teamSam houston chess team
Sam houston chess team
 
Facebook
FacebookFacebook
Facebook
 
Cultura mites
Cultura mitesCultura mites
Cultura mites
 
153 test plan
153 test plan153 test plan
153 test plan
 
An Integer Programming Representation for Data Center Power-Aware Management ...
An Integer Programming Representation for Data Center Power-Aware Management ...An Integer Programming Representation for Data Center Power-Aware Management ...
An Integer Programming Representation for Data Center Power-Aware Management ...
 
Uso correto de epi´s abafadores
Uso correto de epi´s   abafadoresUso correto de epi´s   abafadores
Uso correto de epi´s abafadores
 
Moodboards eda
Moodboards edaMoodboards eda
Moodboards eda
 

Ähnlich wie High Availability in YARN

High-Availability of YARN (MRv2)
High-Availability of YARN (MRv2)High-Availability of YARN (MRv2)
High-Availability of YARN (MRv2)Mário Almeida
 
Strata + Hadoop World 2012: Apache HBase Features for the Enterprise
Strata + Hadoop World 2012: Apache HBase Features for the EnterpriseStrata + Hadoop World 2012: Apache HBase Features for the Enterprise
Strata + Hadoop World 2012: Apache HBase Features for the EnterpriseCloudera, Inc.
 
Apache hbase for the enterprise (Strata+Hadoop World 2012)
Apache hbase for the enterprise (Strata+Hadoop World 2012)Apache hbase for the enterprise (Strata+Hadoop World 2012)
Apache hbase for the enterprise (Strata+Hadoop World 2012)jmhsieh
 
Hadoop 2 - Going beyond MapReduce
Hadoop 2 - Going beyond MapReduceHadoop 2 - Going beyond MapReduce
Hadoop 2 - Going beyond MapReduceUwe Printz
 
An Active and Hybrid Storage System for Data-intensive Applications
An Active and Hybrid Storage System for Data-intensive ApplicationsAn Active and Hybrid Storage System for Data-intensive Applications
An Active and Hybrid Storage System for Data-intensive ApplicationsXiao Qin
 
MySQL Cluster Schema management (2014)
MySQL Cluster Schema management (2014)MySQL Cluster Schema management (2014)
MySQL Cluster Schema management (2014)Frazer Clement
 
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...Big Data Montreal
 
Retour d'expérience d'un environnement base de données multitenant
Retour d'expérience d'un environnement base de données multitenantRetour d'expérience d'un environnement base de données multitenant
Retour d'expérience d'un environnement base de données multitenantSwiss Data Forum Swiss Data Forum
 
Infrastructure Around Hadoop
Infrastructure Around HadoopInfrastructure Around Hadoop
Infrastructure Around HadoopDataWorks Summit
 
IEEE SRDS'12: From Backup to Hot Standby: High Availability for HDFS
IEEE SRDS'12: From Backup to Hot Standby: High Availability for HDFSIEEE SRDS'12: From Backup to Hot Standby: High Availability for HDFS
IEEE SRDS'12: From Backup to Hot Standby: High Availability for HDFSAndré Oriani
 
Moving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, OathMoving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, OathYahoo Developer Network
 
Introduction to Spark
Introduction to SparkIntroduction to Spark
Introduction to SparkDavid Smelker
 
Database as a Service on the Oracle Database Appliance Platform
Database as a Service on the Oracle Database Appliance PlatformDatabase as a Service on the Oracle Database Appliance Platform
Database as a Service on the Oracle Database Appliance PlatformMaris Elsins
 

Ähnlich wie High Availability in YARN (20)

High-Availability of YARN (MRv2)
High-Availability of YARN (MRv2)High-Availability of YARN (MRv2)
High-Availability of YARN (MRv2)
 
Strata + Hadoop World 2012: Apache HBase Features for the Enterprise
Strata + Hadoop World 2012: Apache HBase Features for the EnterpriseStrata + Hadoop World 2012: Apache HBase Features for the Enterprise
Strata + Hadoop World 2012: Apache HBase Features for the Enterprise
 
Apache hbase for the enterprise (Strata+Hadoop World 2012)
Apache hbase for the enterprise (Strata+Hadoop World 2012)Apache hbase for the enterprise (Strata+Hadoop World 2012)
Apache hbase for the enterprise (Strata+Hadoop World 2012)
 
Hadoop 2 - Going beyond MapReduce
Hadoop 2 - Going beyond MapReduceHadoop 2 - Going beyond MapReduce
Hadoop 2 - Going beyond MapReduce
 
An Active and Hybrid Storage System for Data-intensive Applications
An Active and Hybrid Storage System for Data-intensive ApplicationsAn Active and Hybrid Storage System for Data-intensive Applications
An Active and Hybrid Storage System for Data-intensive Applications
 
MySQL Cluster Schema management (2014)
MySQL Cluster Schema management (2014)MySQL Cluster Schema management (2014)
MySQL Cluster Schema management (2014)
 
ha_module5
ha_module5ha_module5
ha_module5
 
HBase with MapR
HBase with MapRHBase with MapR
HBase with MapR
 
HugNov14
HugNov14HugNov14
HugNov14
 
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
 
Retour d'expérience d'un environnement base de données multitenant
Retour d'expérience d'un environnement base de données multitenantRetour d'expérience d'un environnement base de données multitenant
Retour d'expérience d'un environnement base de données multitenant
 
Infrastructure Around Hadoop
Infrastructure Around HadoopInfrastructure Around Hadoop
Infrastructure Around Hadoop
 
IEEE SRDS'12: From Backup to Hot Standby: High Availability for HDFS
IEEE SRDS'12: From Backup to Hot Standby: High Availability for HDFSIEEE SRDS'12: From Backup to Hot Standby: High Availability for HDFS
IEEE SRDS'12: From Backup to Hot Standby: High Availability for HDFS
 
Moving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, OathMoving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, Oath
 
BIG DATA Session 6
BIG DATA Session 6BIG DATA Session 6
BIG DATA Session 6
 
Hadoop fault-tolerance
Hadoop fault-toleranceHadoop fault-tolerance
Hadoop fault-tolerance
 
Hadoop
HadoopHadoop
Hadoop
 
Introduction to Spark
Introduction to SparkIntroduction to Spark
Introduction to Spark
 
Database as a Service on the Oracle Database Appliance Platform
Database as a Service on the Oracle Database Appliance PlatformDatabase as a Service on the Oracle Database Appliance Platform
Database as a Service on the Oracle Database Appliance Platform
 
Introduction to Hadoop Administration
Introduction to Hadoop AdministrationIntroduction to Hadoop Administration
Introduction to Hadoop Administration
 

Mehr von Arinto Murdopo

Distributed Decision Tree Learning for Mining Big Data Streams
Distributed Decision Tree Learning for Mining Big Data StreamsDistributed Decision Tree Learning for Mining Big Data Streams
Distributed Decision Tree Learning for Mining Big Data StreamsArinto Murdopo
 
Distributed Decision Tree Learning for Mining Big Data Streams
Distributed Decision Tree Learning for Mining Big Data StreamsDistributed Decision Tree Learning for Mining Big Data Streams
Distributed Decision Tree Learning for Mining Big Data StreamsArinto Murdopo
 
An Integer Programming Representation for Data Center Power-Aware Management ...
An Integer Programming Representation for Data Center Power-Aware Management ...An Integer Programming Representation for Data Center Power-Aware Management ...
An Integer Programming Representation for Data Center Power-Aware Management ...Arinto Murdopo
 
Quantum Cryptography and Possible Attacks-slide
Quantum Cryptography and Possible Attacks-slideQuantum Cryptography and Possible Attacks-slide
Quantum Cryptography and Possible Attacks-slideArinto Murdopo
 
Quantum Cryptography and Possible Attacks
Quantum Cryptography and Possible AttacksQuantum Cryptography and Possible Attacks
Quantum Cryptography and Possible AttacksArinto Murdopo
 
Parallelization of Smith-Waterman Algorithm using MPI
Parallelization of Smith-Waterman Algorithm using MPIParallelization of Smith-Waterman Algorithm using MPI
Parallelization of Smith-Waterman Algorithm using MPIArinto Murdopo
 
Megastore - ID2220 Presentation
Megastore - ID2220 PresentationMegastore - ID2220 Presentation
Megastore - ID2220 PresentationArinto Murdopo
 
Flume Event Scalability
Flume Event ScalabilityFlume Event Scalability
Flume Event ScalabilityArinto Murdopo
 
Large Scale Distributed Storage Systems in Volunteer Computing - Slide
Large Scale Distributed Storage Systems in Volunteer Computing - SlideLarge Scale Distributed Storage Systems in Volunteer Computing - Slide
Large Scale Distributed Storage Systems in Volunteer Computing - SlideArinto Murdopo
 
Large-Scale Decentralized Storage Systems for Volunter Computing Systems
Large-Scale Decentralized Storage Systems for Volunter Computing SystemsLarge-Scale Decentralized Storage Systems for Volunter Computing Systems
Large-Scale Decentralized Storage Systems for Volunter Computing SystemsArinto Murdopo
 
Rise of Network Virtualization
Rise of Network VirtualizationRise of Network Virtualization
Rise of Network VirtualizationArinto Murdopo
 
Consistency Tradeoffs in Modern Distributed Database System Design
Consistency Tradeoffs in Modern Distributed Database System DesignConsistency Tradeoffs in Modern Distributed Database System Design
Consistency Tradeoffs in Modern Distributed Database System DesignArinto Murdopo
 
Distributed Storage System for Volunteer Computing
Distributed Storage System for Volunteer ComputingDistributed Storage System for Volunteer Computing
Distributed Storage System for Volunteer ComputingArinto Murdopo
 
Why Use “REST” Architecture for Web Services?
Why Use “REST” Architecture for Web Services?Why Use “REST” Architecture for Web Services?
Why Use “REST” Architecture for Web Services?Arinto Murdopo
 

Mehr von Arinto Murdopo (17)

Distributed Decision Tree Learning for Mining Big Data Streams
Distributed Decision Tree Learning for Mining Big Data StreamsDistributed Decision Tree Learning for Mining Big Data Streams
Distributed Decision Tree Learning for Mining Big Data Streams
 
Distributed Decision Tree Learning for Mining Big Data Streams
Distributed Decision Tree Learning for Mining Big Data StreamsDistributed Decision Tree Learning for Mining Big Data Streams
Distributed Decision Tree Learning for Mining Big Data Streams
 
An Integer Programming Representation for Data Center Power-Aware Management ...
High Availability in YARN

  • 1. High Availability in YARN ID2219 Project Presentation Arinto Murdopo (arinto@gmail.com)
  • 2. The team! • Mário A. (site – 4khnahs #at# gmail) • Arinto M. (site – arinto #at# gmail) • Strahinja L. (strahinja1984 #at# gmail) • Umit C.B. (ucbuyuksahin #at# gmail) • Special thanks – Jim Dowling (SICS, supervisor) – Vasiliki Kalavri (EMJD-DC, supervisor) – Johan Montelius (Course teacher) 12/6/2012 2
  • 3. Outline • Define: YARN • Why is it not highly available (H.A.)? • Providing H.A. in YARN • What storage to use? • Here comes NDB • What have we done so far? • Experiment results • What’s next? • Conclusions 12/6/2012 3
  • 4. Define: YARN • YARN = Yet Another Resource Negotiator • Is NOT ONLY MapReduce 2.0, but also… • Framework to develop and/or execute distributed processing applications • Example: MapReduce, Spark, Apache HAMA, Apache Giraph 12/6/2012 4
  • 5. Define: YARN • Split JobTracker’s responsibilities • Generic containers • Per-app AppMaster 12/6/2012 5
  • 6. Why is it not highly available (H.A.)? ResourceManager is a Single Point of Failure (SPoF) 12/6/2012 6
  • 7. Providing H.A. in YARN Proposed approach • store and reload state • failure models: 1. Recovery 2. Failover 3. Stateless 12/6/2012 7
  • 8. Failure Model #1: Recovery • Store states • Load states 1. RM stores states when needed 2. RM failure happens 3. Clients keep retrying 4. RM restarts and loads states 5. Clients successfully connect to resurrected RM 6. Downtime exists! 12/6/2012 8
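The recovery cycle above can be sketched in plain Java. This is a minimal illustration, not YARN’s actual RMStateStore API: all class and method names here are invented, and a HashMap stands in for the durable store.

```java
import java.util.HashMap;
import java.util.Map;

// Failure Model #1 (Recovery) in miniature. Illustrative names only;
// not YARN's actual RMStateStore API.
public class RecoveryDemo {
    // Stands in for durable storage (NDB/ZK/HDFS); survives RM "restarts".
    static final Map<String, String> durableStore = new HashMap<>();

    static class ResourceManager {
        final Map<String, String> apps = new HashMap<>();

        void submitApp(String appId, String state) {
            apps.put(appId, state);
            durableStore.put(appId, state); // 1. RM stores state when needed
        }

        void recover() {
            apps.putAll(durableStore);      // 4. RM restarts and loads states
        }
    }

    public static int demo() {
        ResourceManager rm = new ResourceManager();
        rm.submitApp("app-1", "RUNNING");
        rm = new ResourceManager();         // 2. failure: in-memory state is gone
        rm.recover();                       // 4. resurrected RM reloads it
        return rm.apps.size();              // 5. clients reconnect and find app-1
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 1
    }
}
```

The downtime in step 6 corresponds to the window between the two `new ResourceManager()` calls: until `recover()` completes, clients can only retry.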
  • 9. Failure Model #2: Failover • Utilize a standby RM • Little downtime • Active RM stores state; standby RM loads it on failover 12/6/2012 9
  • 10. Failure Model #3: Stateless Store all states in storage, e.g.: 1. NM lists 2. App lists (diagram: Client, two ResourceManagers, NodeManager, AppMaster sharing one store) 12/6/2012 10
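For contrast, the stateless model keeps no state inside the RM at all, so any RM instance can serve any request. Again a minimal sketch with invented names; the shared map stands in for the external storage holding the NM and App lists.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Failure Model #3 (Stateless) in miniature: the RM holds NO local state.
// Illustrative names only; not YARN's actual classes.
public class StatelessDemo {
    // Stands in for the shared store keeping the NM and App lists.
    static final Map<String, List<String>> store = new HashMap<>();
    static {
        store.put("nmList", new ArrayList<>());
        store.put("appList", new ArrayList<>());
    }

    static class ResourceManager { // deliberately has no fields at all
        void registerNode(String nm) { store.get("nmList").add(nm); }
        void submitApp(String app)   { store.get("appList").add(app); }
        int knownApps()              { return store.get("appList").size(); }
    }

    public static void main(String[] args) {
        new ResourceManager().submitApp("app-1");              // served by RM instance A
        System.out.println(new ResourceManager().knownApps()); // RM instance B sees it: 1
    }
}
```

Because every request touches the store, this model trades RM-side simplicity for extra load on the storage layer, which is why the choice of storage (next slides) matters so much.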
  • 11. What storage to use? Apache proposed • Hadoop Distributed File System (HDFS) – Fault-tolerant, large datasets, streaming access to data and more • ZooKeeper – Highly reliable distributed coordination – Wait-free, FIFO client ordering, linearizable writes and more 12/6/2012 11
  • 12. Here comes NDB MySQL Cluster (NDB) is a scalable, ACID-compliant transactional database. Some features: • Designed for availability (no SPoF) • In-memory distributed database • Horizontal scalability (auto-sharding, no downtime when adding new nodes) • Fast R/W rate • Fine-grained locking • SQL and NoSQL interfaces 12/6/2012 12
  • 13. Here comes NDB (NDB architecture diagram: clients, application/API nodes, data nodes, management nodes) 12/6/2012 13
  • 14. Here comes NDB MySQL Cluster version 7.2 Linear horizontal scalability Up to 4.3 Billion reads/minute! 12/6/2012 14
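As a quick sanity check on the headline figure, a simple unit conversion (not part of the benchmark itself):

```java
// Unit conversion for the slide's headline claim. The 4.3 billion
// reads/minute figure is from the MySQL Cluster 7.2 benchmark cited above.
public class ThroughputCheck {
    public static long readsPerSecond(long readsPerMinute) {
        return readsPerMinute / 60;
    }

    public static void main(String[] args) {
        // 4.3 billion reads/minute is roughly 71.7 million reads/second
        System.out.println(readsPerSecond(4_300_000_000L)); // 71666666
    }
}
```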
  • 15. What have we done so far? • Phase 1: The Ndb-storage-class – Follows Apache’s proposed failure model – We developed NdbRMStateStore, which has H.A.! • Phase 2: The Framework – Apache created ZK and FS storage classes – We developed a framework for storage benchmarking 12/6/2012 15
  • 16. Phase 1: The Ndb-storage-class Apache – implemented a Memory Store for ResourceManager (RM) recovery (MemoryRMStateStore) – Application State and Application Attempt are stored – Restarts apps when the RM is resurrected – It’s not really H.A.! We – implemented an NDB MySQL Cluster Store (NdbRMStateStore) using ClusterJ – implemented TestNdbRMRestart, to prove the H.A. in YARN 12/6/2012 16
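The per-application state mentioned here (AppId plus cluster timestamp, submit time, and a collection of attempts; see the editor’s notes) can be modeled roughly as below. Field names and the ID format are illustrative simplifications, not Hadoop’s real ApplicationState classes.

```java
import java.util.ArrayList;
import java.util.List;

// Rough model of the per-application state an RMStateStore persists,
// based on the fields listed in the editor's notes. Illustrative only.
public class AppStateSketch {
    static class ApplicationState {
        final int appId;
        final long clusterTimeStamp; // appId + clusterTimeStamp form the ApplicationId
        final long submitTime;
        final List<String> attempts = new ArrayList<>(); // application attempts

        ApplicationState(int appId, long clusterTimeStamp, long submitTime) {
            this.appId = appId;
            this.clusterTimeStamp = clusterTimeStamp;
            this.submitTime = submitTime;
        }

        // Simplified ID rendering; the real YARN format may differ.
        String applicationId() {
            return "application_" + clusterTimeStamp + "_" + appId;
        }
    }

    public static void main(String[] args) {
        ApplicationState s = new ApplicationState(1, 1354752000000L, 1354752001000L);
        s.attempts.add("attempt-1");
        System.out.println(s.applicationId()); // application_1354752000000_1
    }
}
```

Persisting exactly this record on every change is what lets TestNdbRMRestart bring all unfinished jobs back after an RM failure.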
  • 17. Phase 1: The Ndb-storage-class TestNdbRMRestart restarts all unfinished jobs 12/6/2012 18
  • 18. Phase 2: The Framework Apache – implemented a ZooKeeper Store (ZKRMStateStore) – implemented a File System Store (FileSystemRMStateStore) We – developed a storage-benchmark framework to benchmark their performance against our store – https://github.com/4knahs/zkndb 12/6/2012 19
  • 19. Phase 2: The Framework zkndb = framework for storage benchmarking 12/6/2012 20
  • 20. Phase 2: The Framework zkndb extensibility 12/6/2012 21
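One plausible shape for that extension point, as a hedged sketch (this is not zkndb’s actual API; the interface and method names are invented): each storage backend implements a common read/write interface, and the benchmark driver is written once against it.

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch of a zkndb-style extension point (NOT the project's
// actual API; all names invented here).
public class ZkndbSketch {
    interface StorageImpl {
        void write(String key, byte[] value);
        byte[] read(String key);
    }

    // Toy in-memory backend, standing in for the ZK/HDFS/NDB implementations.
    static class MemoryStorage implements StorageImpl {
        private final Map<String, byte[]> data = new HashMap<>();
        public void write(String key, byte[] value) { data.put(key, value); }
        public byte[] read(String key) { return data.get(key); }
    }

    // Driver: issue n writes and count successes (the throughput metric).
    public static int runWrites(StorageImpl storage, int n) {
        int ok = 0;
        for (int i = 0; i < n; i++) {
            storage.write("app-" + i, new byte[] {1});
            ok++;
        }
        return ok;
    }

    public static void main(String[] args) {
        System.out.println(runWrites(new MemoryStorage(), 100)); // 100
    }
}
```

Plugging in a new backend then only means implementing `StorageImpl`; the metrics and result handling stay shared, which is the extensibility the slide refers to.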
  • 21. Experiment Setup • ZooKeeper – Three nodes in the SICS cluster – Each ZK process has max memory of 5GB • HDFS – Three DataNodes and one NameNode – Each HDFS DN and NN process has max memory of 5GB • NDB – Three-node cluster 12/6/2012 22
  • 22. Experiment Result #1 Load Setup #1: 1 node, 12 threads, 60 seconds. Each node: dual six-core CPUs @ 2.6GHz. All clusters consist of 3 nodes. Utilizes Hadoop code for ZK and HDFS. Findings: ZK is limited by its store implementation; HDFS is not good for small files! 12/6/2012 23
  • 23. Experiment Result #2 Load Setup #2: 3 nodes @ 12 threads, 30 seconds. Each node: dual six-core CPUs @ 2.6GHz. All clusters consist of 3 nodes. Utilizes Hadoop code for ZK and HDFS. Findings: ZK could scale a bit more! HDFS gets even worse due to the root lock in the NameNode! 12/6/2012 24
  • 24. What’s next? • Scheduler and ResourceTracker Analysis • Stateless Architecture • Study the overhead of writing state to NDB 12/6/2012 25
  • 25. Conclusions • NDB has higher throughput than ZK and HDFS • NDB is a suitable storage for the Stateless Failure Model • ZK and HDFS are not! 12/6/2012 26

Editor’s Notes

  1. Today I am going to present the results of our project, titled High Availability in YARN. The main motivation of this project is YARN’s shortcomings in terms of availability. Although Apache regards YARN as the next-gen MR, it still has a single point of failure, hence it has availability problems to a certain extent.
  2. MR = MapReduce. Spark = MR-like cluster computing framework for low-latency iterative jobs and interactive use of an interpreter. HAMA = computing framework on top of HDFS -> matrix, graph and network algorithms. Giraph = Apache’s graph processing platform.
  3. Split the responsibilities of the JobTracker: Resource Management -> Scheduler and ResourceTracker; Job Scheduling and Monitoring -> AppMaster. Each application has its own AppMaster. Containers are now generic and can be used to execute distributed applications.
  4. When a Container fails: When the AppMaster fails: When the NM fails: When the RM fails:
  5. Persist RM state. 1 out of 3 failure models.
  6. HDFS is good for: fault tolerance -> data replicated onto DataNodes; large datasets -> divides huge data into smaller blocks and distributes them across HDFS; streaming access to file system data; designed to run on commodity hardware. ZooKeeper: wait-free = lock-free + bounded number of steps to finish an operation; FIFO client ordering = all requests from a given client are executed in the order they were sent by the client; linearizable writes = all writes are linearizable: all steps can be viewed as valid atomic operations.
  7. NDB: MySQL Cluster integrates the standard MySQL server with an in-memory clustered storage engine called NDB. Designed for availability. In-memory DB -> good for session management. Horizontal scalability -> adding a new node means new capacity. Fast R/W rate -> 4.3 billion reads, 1.2 billion writes (updates). Fine-grained locking -> locks applied to individual rows.
  8. Application nodes provide connectivity from the application logic to the data nodes. Multiple APIs are presented to the application. MySQL provides a standard SQL interface, including connectivity to all of the leading web development languages and frameworks. There is also a whole range of NoSQL interfaces, including Memcached, REST/HTTP, C++ (NDB API), Java and JPA. Data nodes manage the storage of and access to data. Tables are automatically sharded across the data nodes, which also transparently handle load balancing, replication, failover and self-healing. Management nodes are used to configure the cluster and provide arbitration in the event of network partitioning.
  9. 20 million updates per second = 1.2 billion updates/minute. Experiment settings: FlexAsynch benchmark suite. The benchmark reads or updates an entire row from the database as part of its test operation. All UPDATE operations are fully transactional. In these tests, each row is 100 bytes total, comprising 25 columns, each 4 bytes in size, though the size and number of columns are fully configurable.
  10. ClusterJ is up to 10.5x faster than OpenJPA-JDBC. AppState: AppId -> Int; ClusterTimeStamp -> Long (AppId + ClusterTimeStamp = ApplicationId class); SubmitTime -> Long; AppSubmissionContext -> Priority, AppName, Queue, User, ContainerLaunchContext (requested resources), some flags; collection of AppAttempts. AppAttempt: AppId; AppAttemptId; MasterContainer -> ContainerPBImpl (first container allocated from RM to AM).
  11. Extensibility in implementing the storage (StorageImpl), in defining the metrics, and in defining how the results are stored.
  12. Flexibility in implementing the storage (StorageImpl); flexibility in defining the metrics; flexibility in defining how the results are stored.
  13. Store implementation => fixed data access time, since our code does synchronous writes. HDFS is not good for small files -> too much overhead. Furthermore, HDFS is not geared up to efficiently accessing small files: it is primarily designed for streaming access of large files. Reading through small files normally causes lots of seeks and lots of hopping from DataNode to DataNode to retrieve each small file, all of which is an inefficient data access pattern in storing small files. http://blog.cloudera.com/blog/2009/02/the-small-files-problem/ The NN is bloated by tracking file metadata. Numbers: 3900 15500 1400 / 3850 11500 1000 / 3850 13250 1400
  14. Put numbers here: Data Type, ZooKeeper, NDB, HDFS: 10993.69, 42665.2, 5328.62 / 9858.92, 28256.27, 534.692 / 10035.97, 37607.8, 1079.077