High Availability YARN with NDB MySQL Cluster
Project presentation by Mário Almeida
Implementation of Distributed Systems
EMDC @ KTH
Outline
 What is YARN?
 Why is YARN not Highly Available?
 How to make it Highly Available?
 What storage to use?
 What about NDB?
 Our Contribution
 Results
 Future work
 Conclusions
 Our Team
What is YARN?
 YARN, or MapReduce v2, is a complete overhaul of the original MapReduce.
[Diagram: the monolithic JobTracker is split into a ResourceManager and a per-application AppMaster; no more fixed M/R containers]
Is YARN Highly Available?
[Diagram: when the ResourceManager fails, all jobs are lost!]
How to make it H.A.?
 Store application states!
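All of the options that follow revolve around the same store/load contract. Here is a minimal sketch of that contract in Java (illustrative names only, not YARN's actual RMStateStore API):

```java
import java.util.Map;

// Minimal sketch of the idea: the ResourceManager persists per-application
// state on every transition, so a restarted or standby RM can reload it.
public interface RMStateStoreSketch {

    /** Persist the state of one application whenever it changes. */
    void storeApplicationState(String appId, byte[] serializedState);

    /** Load the state of every known application on RM (re)start. */
    Map<String, byte[]> loadAllApplicationStates();
}
```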
How to make it H.A.?
 Failure recovery
[Diagram: RM1 stores its state; after a crash there is downtime until RM1 restarts and loads the stored state]
How to make it H.A.?
 Failure recovery -> Fail-over chain
[Diagram: RM1 stores its state; on failure, a standby RM2 loads it and takes over with no downtime]
How to make it H.A.?
 Failure recovery -> Fail-over chain -> Stateless RM
[Diagram: RM1, RM2 and RM3 all serve requests against shared state; the Scheduler would have to be synchronized!]
What storage to use?
 Hadoop proposed:
   Hadoop Distributed File System (HDFS).
     Fault-tolerant, large datasets, streaming access to data, and more.
   ZooKeeper, for highly reliable distributed coordination.
     Wait-free, FIFO client ordering, linearizable writes, and more.
What about NDB?
 NDB MySQL Cluster is a scalable, ACID-compliant transactional database.
 Some features:
   Auto-sharding for R/W scalability;
   SQL and NoSQL interfaces;
   No single point of failure;
   In-memory data;
   Load balancing;
   Adding nodes = no downtime;
   Fast R/W rates;
   Fine-grained locking;
   Now GA!
What about NDB?
[Diagram: MySQL Cluster architecture; API nodes are connected to all clustered storage nodes, while management nodes handle configuration and network partitioning]
What about NDB?
[Chart: linear horizontal scalability, up to 4.3 billion reads per minute!]
Our Contribution
 Two phases, dependent on YARN patch releases.
 Phase 1 (not really H.A. yet!)
   Apache
     Implemented Resource Manager recovery using a Memory Store (MemoryRMStateStore).
     Stores the Application State and Application Attempt State.
   We
     Implemented an NDB MySQL Cluster Store (NdbRMStateStore) using ClusterJ; up to 10.5x faster than openjpa-jdbc. See the sketch below.
     Implemented TestNdbRMRestart to prove the H.A. of YARN.
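As a rough illustration of what writing and reading application state through ClusterJ looks like: the table and column names below are hypothetical, as is the connect string; only the ClusterJ calls themselves (the annotated interface, SessionFactory, savePersistent, find) are the library's real API.

```java
import java.util.Properties;
import com.mysql.clusterj.ClusterJHelper;
import com.mysql.clusterj.Session;
import com.mysql.clusterj.SessionFactory;
import com.mysql.clusterj.annotation.PersistenceCapable;
import com.mysql.clusterj.annotation.PrimaryKey;

public class NdbStoreSketch {

    // ClusterJ maps a plain Java interface onto an existing NDB table
    // (here a hypothetical `applicationstate` table).
    @PersistenceCapable(table = "applicationstate")
    public interface ApplicationStateRow {
        @PrimaryKey
        String getApplicationId();
        void setApplicationId(String id);

        byte[] getAppState();
        void setAppState(byte[] state);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Address of the NDB management node and the target database (assumed).
        props.setProperty("com.mysql.clusterj.connectstring", "localhost:1186");
        props.setProperty("com.mysql.clusterj.database", "yarn");

        SessionFactory factory = ClusterJHelper.getSessionFactory(props);
        Session session = factory.getSession();

        // Store the serialized state of one application.
        ApplicationStateRow row = session.newInstance(ApplicationStateRow.class);
        row.setApplicationId("application_1355213049010_0001");
        row.setAppState(new byte[0] /* serialized ApplicationState */);
        session.savePersistent(row);   // insert-or-update

        // Reload it, as a restarted RM would.
        ApplicationStateRow loaded =
                session.find(ApplicationStateRow.class, "application_1355213049010_0001");
        System.out.println("recovered " + loaded.getApplicationId());
        session.close();
    }
}
```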
Our Contribution
 testNdbRMRestart
[Diagram: after an RM restart, all unfinished jobs are restarted from the persisted state]
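A heavily condensed sketch of what the test verifies, assuming YARN's MockRM test helper; the exact helper signatures and the package of NdbRMStateStore depend on the Hadoop version and on this project's patch, so treat the details as illustrative.

```java
import static org.junit.Assert.assertTrue;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.resourcemanager.MockRM;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMApp;
import org.junit.Test;

public class TestNdbRMRestartSketch {

    @Test
    public void testRestartRecoversUnfinishedApps() throws Exception {
        Configuration conf = new YarnConfiguration();
        // Point the RM at the NDB-backed store (class/package assumed).
        conf.set(YarnConfiguration.RM_STORE,
            "org.apache.hadoop.yarn.server.resourcemanager.recovery.NdbRMStateStore");

        MockRM rm1 = new MockRM(conf);
        rm1.start();
        RMApp app = rm1.submitApp(200);   // submit an app; leave it unfinished
        rm1.stop();                       // "crash" the first ResourceManager

        MockRM rm2 = new MockRM(conf);    // second RM, same NDB-backed store
        rm2.start();                      // recovery reloads the persisted state

        // The unfinished application must be known to the new RM.
        assertTrue(rm2.getRMContext().getRMApps()
                      .containsKey(app.getApplicationId()));
    }
}
```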
Our Contribution
 Phase 2:
   Apache
     Implemented ZooKeeper Store (ZKRMStateStore).
     Implemented FileSystem Store (FileSystemRMStateStore).
   We
     Developed a storage benchmark framework to benchmark both of them against our store (see the sketch below):
      https://github.com/4knahs/zkndb
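The framework's actual interfaces live in the zkndb repository; the following is only a minimal sketch of the measurement loop such a benchmark implements: a fixed number of threads write small state records against a pluggable store for a fixed time window, and throughput is the count of successful writes per second. The StateStorage name is hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class ThroughputBenchmarkSketch {

    /** Minimal stand-in for the benchmark's storage abstraction (hypothetical). */
    public interface StateStorage {
        void writeState(String appId, byte[] state) throws Exception;
    }

    /** Run `threads` writers against `storage` for `seconds`; return ops/sec. */
    public static double run(StateStorage storage, int threads, int seconds)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong ops = new AtomicLong();
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(seconds);

        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                byte[] payload = new byte[64]; // small record, like RM app state
                while (System.nanoTime() < deadline) {
                    try {
                        storage.writeState("app_" + ops.get(), payload);
                        ops.incrementAndGet();
                    } catch (Exception e) {
                        // failed writes are not counted
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(seconds + 10L, TimeUnit.SECONDS);
        return ops.get() / (double) seconds;
    }
}
```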
Our Contribution
 Zkndb architecture:
[Architecture diagram]
Our Contribution
 Zkndb extensibility:
[Extensibility diagram; a sketch of plugging in a new backend follows]
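To illustrate the extension point: adding a backend means implementing the storage interface. Below is a hypothetical ZooKeeper backend for the benchmark sketch above; the only real API used is the standard ZooKeeper client (it assumes the /rmstore root znode already exists, and a real implementation would handle the connection watcher and retries).

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

/** Hypothetical ZooKeeper-backed store plugged into the benchmark sketch. */
public class ZkStateStorage implements ThroughputBenchmarkSketch.StateStorage {

    private final ZooKeeper zk;

    public ZkStateStorage(String connectString) throws Exception {
        // 10 s session timeout; the no-op lambda ignores connection events.
        this.zk = new ZooKeeper(connectString, 10_000, event -> { });
    }

    @Override
    public void writeState(String appId, byte[] state) throws Exception {
        // One znode per application, mirroring how ZKRMStateStore keeps
        // per-app state under a root path.
        zk.create("/rmstore/" + appId, state,
                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    }
}
```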
Results
 Ran multiple experiments: 1 node, 12 threads, 60 seconds.
 Each node: dual six-core CPUs @ 2.6 GHz.
 All clusters with 3 nodes.
 Same code as Hadoop (ZK & HDFS).
[Chart: ZK throughput is limited by the store; HDFS has problems with file creation, so it is not good for small files!]
Results
 Ran multiple experiments: 3 nodes, 12 threads each, 30 seconds.
 Each node: dual six-core CPUs @ 2.6 GHz.
 All clusters with 3 nodes.
 Same code as Hadoop (ZK & HDFS).
[Chart: ZK could scale a bit more; HDFS gets even worse due to the root lock in the NameNode]
Future work
 Implement the stateless architecture.
 Study the overhead of writing state to NDB.
Conclusions
 HDFS and ZooKeeper both have disadvantages for this purpose.
 HDFS performs badly when creating many small files, so it would not be suitable for storing state from the Application Masters.
 ZooKeeper serializes all updates through a single leader (up to ~50K requests). Horizontal scalability?
 NDB throughput outperforms both HDFS and ZK.
 A combination of HDFS and ZK does support Apache's proposal, with a few restrictions.
Our team!
 Mário Almeida (site – 4knahs(at)gmail)
 Arinto Murdopo (site – arinto(at)gmail)
 Strahinja Lazetic (strahinja1984(at)gmail)
 Umit Buyuksahin (ucbuyuksahin(at)gmail)

 Special thanks
   Jim Dowling (SICS, supervisor)
   Vasia Kalavri (EMJD-DC, supervisor)
   Johan Montelius (EMDC coordinator, course teacher)

Speaker notes

  1. Guest talks + student presentations.
  2. Data nodes manage the storage of and access to data. Tables are automatically sharded across the data nodes, which also transparently handle load balancing, replication, failover, and self-healing.
  3. MySQL Cluster is deployed in some of the largest web and telecom systems. The storage nodes (SN) are the main nodes of the system; all data is stored on the storage nodes. Data is replicated between storage nodes to ensure it remains continuously available in case one or more storage nodes fail. The storage nodes handle all database transactions. The management server nodes (MGM) handle the system configuration and are used to change the setup of the system. Usually only one management server node is used, but it is also possible to run several. The management server nodes are only used at startup and system reconfiguration, which means that the storage nodes are operable without the management nodes.