Neo4j
   High Availability
  New Auto-Cluster

Michael Hunger - @mesirii
                            1
High Availability Cluster
  ๏Neo4j Enterprise
  ๏Master-Slave Replication
  ๏read-scaling and fault-tolerance
  ๏eventual consistency
    • write to master (push_factor)

    • write to slaves
                                      2
3 Separate Concerns (I)
๏Cluster Management
  •   Members join/leave/heartbeat
๏Failover
  •   Master Election

  • Distribution of Master-Status


                                     3
3 Separate Concerns (II)
๏Replication
  •synchronized id-generation

  • distributed locks

  • pull, push of transactions

  • initial store synchronization


                                    4
Pre 1.9 - Zookeeper


                  5
Pre 1.9
๏Apache Zookeeper took care of concerns
  •   Cluster Management
      ‣new members register with ZK
  •   Failover
      ‣ZK stores Master and last TX-Id
      ‣ZK uses ZAB to determine new Master
       and distribute information
                                         6
HA Cluster
[Diagram: one Master, two Slaves and one read-only (RO) Slave, plus three Coordinators]
                                                     7
Pre 1.9 - Problems
๏Additional setup and operations of a separate
   component

๏unreliable operation / hiccups
๏long-term stability
๏no dynamic reconfig of the ZK cluster
   (important for cloud setups)

                                         8
Post 1.9 -
Neo4j Auto Cluster


                 9
Replace Zookeeper!?
๏Implement Multi-Paxos ourselves
๏simple, testable code
๏only covers
  • cluster management,

  • master election


                                   10
HA Cluster




             11
What is Paxos?
๏reliable consensus making
๏broadcasting
๏works even with unreliable communication
  • lost messages

  • delays, out-of-order delivery
๏does not guarantee progress
                                       12
What is Paxos?




                 13
Implementation
๏everything is a state machine
  • SM = stateless enums + context

  • Message = type enum + payload

  • State = enum instance

  • Transition = handle() a message:
    switch on msg-type, implement logic


                                          14
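The recipe on this slide can be sketched in Java roughly as follows. This is a minimal illustration, not Neo4j's actual code: `MsgType`, `Message`, `Context`, `MemberState`, the three states and `StateMachineSketch` are all hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical message types for a toy membership protocol (illustration only).
enum MsgType { JOIN, PING, TIMED_OUT }

// Message = type enum + payload.
final class Message {
    final MsgType type;
    final String payload;

    Message(MsgType type, String payload) {
        this.type = type;
        this.payload = payload;
    }
}

// All mutable data lives in the context, so the state enums stay stateless.
final class Context {
    final List<String> log = new ArrayList<>();
}

// State = enum instance; a transition is handle(), which returns the next state.
enum MemberState {
    START {
        @Override
        MemberState handle(Context ctx, Message msg) {
            switch (msg.type) {
                case JOIN:
                    ctx.log.add("joined " + msg.payload);
                    return ALIVE;
                default:
                    return this; // ignore everything else until joined
            }
        }
    },
    ALIVE {
        @Override
        MemberState handle(Context ctx, Message msg) {
            switch (msg.type) {
                case PING:
                    ctx.log.add("ping from " + msg.payload);
                    return this;
                case TIMED_OUT:
                    ctx.log.add("no heartbeat from " + msg.payload);
                    return SUSPECTED;
                default:
                    return this;
            }
        }
    },
    SUSPECTED {
        @Override
        MemberState handle(Context ctx, Message msg) {
            // Any ping brings the member back; other messages are ignored.
            return msg.type == MsgType.PING ? ALIVE : this;
        }
    };

    abstract MemberState handle(Context ctx, Message msg);
}

final class StateMachineSketch {
    // Feed messages through the machine one by one and return the final state.
    static MemberState run(Context ctx, MemberState start, List<Message> messages) {
        MemberState state = start;
        for (Message m : messages) {
            state = state.handle(ctx, m);
        }
        return state;
    }
}
```

Because the enum constants hold no fields, one set of states can serve any number of machine instances; only the `Context` differs per instance.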
Implementation (II)
๏everything is a state machine
  •   use timeouts for reliability

  • handle failing messages

  • decouple network and time
      ‣for testability
  •   listeners react to messages and interact with
        the outside world, sync or async
                                            15
Implementation (II)
๏Paxos (3 roles)
  • Proposer-SM
  • Acceptor-SM
  • Learner-SM
๏Cluster
  • Heartbeat
[Diagram: Paxos (Proposer, Acceptor, Learner), ClusterState, Heartbeat]
                                                           16
Multi-Paxos (happy path)
[Sequence diagram. Roles: Learner, Proposer, Acceptor (2 * f + 1). Messages: PREPARE, PROMISE or REJECT, ACCEPT, ACCEPTED or REJECTED, LEARN. The Proposer guards PREPARE and ACCEPT with timeouts, checks and stores the responses, and cancels the timeout once a quorum is met. The Acceptor checks whether an ACCEPT matches its promise before storing the value. The Learner stores values, handles out-of-order messages and delivers all valid values (atomic broadcast).]
                                                                                 17
Multi-Paxos (happy path)
[Sequence diagram, continued. If a value is missing, the Learner starts a learn timeout and sends LEARN REQ to another Learner, which answers LEARN if it has the value or LEARN FAIL if it does not.]
                                                                      18
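The Learner's out-of-order handling named in the diagram (store, deliver all valid values, notice a missing one) can be sketched as below. This is an assumption-laden illustration, not the deck's implementation: `LearnerSketch` and its methods are hypothetical, instance ids are assumed to start at 0, and values are plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

final class LearnerSketch {
    // Learned values that cannot be delivered yet because an earlier one is missing.
    private final TreeMap<Long, String> buffered = new TreeMap<>();
    private final List<String> delivered = new ArrayList<>();
    private long nextToDeliver = 0;

    // Store a learned value, then deliver every value that is now contiguous.
    void onLearn(long instanceId, String value) {
        if (instanceId < nextToDeliver) {
            return; // already delivered, ignore the duplicate
        }
        buffered.put(instanceId, value);
        while (buffered.containsKey(nextToDeliver)) {
            delivered.add(buffered.remove(nextToDeliver));
            nextToDeliver++;
        }
    }

    // Instance id a learn request should ask another learner for, or -1 if nothing is missing.
    long missingInstance() {
        return buffered.isEmpty() ? -1 : nextToDeliver;
    }

    List<String> delivered() {
        return delivered;
    }
}
```

Delivering strictly in instance order is what gives every member the same sequence of values; the gap reported by `missingInstance()` is what a learn request would ask for after the learn timeout fires.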
Acceptor State Machine




                         19
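An acceptor's reaction to PREPARE and ACCEPT can be sketched as below. The sketch follows textbook Paxos rather than Neo4j's actual acceptor: `AcceptorSketch`, its `Reply` enum and the use of numeric ballots are assumptions made for illustration. It also simplifies one point: in full Paxos a PROMISE carries any previously accepted value back to the proposer, which is only exposed here through `acceptedValue`.

```java
import java.util.HashMap;
import java.util.Map;

final class AcceptorSketch {
    enum Reply { PROMISE, REJECT, ACCEPTED, REJECTED }

    // Multi-Paxos runs one Paxos instance per value, so state is kept per instance id.
    private static final class Instance {
        long promisedBallot = -1;
        String acceptedValue = null;
    }

    private final Map<Long, Instance> instances = new HashMap<>();

    private Instance instance(long instanceId) {
        return instances.computeIfAbsent(instanceId, id -> new Instance());
    }

    // PREPARE: promise only if the ballot is higher than anything promised so far.
    Reply onPrepare(long instanceId, long ballot) {
        Instance in = instance(instanceId);
        if (ballot > in.promisedBallot) {
            in.promisedBallot = ballot;
            return Reply.PROMISE;
        }
        return Reply.REJECT;
    }

    // ACCEPT: store the value only if the ballot does not violate the promise.
    Reply onAccept(long instanceId, long ballot, String value) {
        Instance in = instance(instanceId);
        if (ballot >= in.promisedBallot) {
            in.promisedBallot = ballot;
            in.acceptedValue = value;
            return Reply.ACCEPTED;
        }
        return Reply.REJECTED;
    }

    String acceptedValue(long instanceId) {
        return instance(instanceId).acceptedValue;
    }
}
```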
Heartbeat State Machine




                          20
Implementation (III)
๏HA Implementation uses state machines as
   infrastructure

๏notifications via listeners
๏piggyback heartbeat on messages
๏master election
  • (all - failed) have to agree

  • Paxos broadcast (BC) needs a quorum of the total
                                       21
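The quorum rule comes down to a little arithmetic. The sketch below assumes a simple majority quorum, the usual Paxos choice and consistent with the `2 * f + 1` cluster size on the Multi-Paxos slide; `QuorumSketch` and its method names are invented for this example.

```java
final class QuorumSketch {
    // Smallest number of members that forms a majority of the whole cluster.
    static int quorum(int totalMembers) {
        return totalMembers / 2 + 1;
    }

    // Largest f such that a cluster of this size still has 2 * f + 1 members.
    static int toleratedFailures(int totalMembers) {
        return (totalMembers - 1) / 2;
    }

    // A broadcast can only complete while the live members still form a quorum of the total.
    static boolean canBroadcast(int totalMembers, int failedMembers) {
        return totalMembers - failedMembers >= quorum(totalMembers);
    }
}
```

With three members the quorum is two, so the cluster keeps working after one failure but not after two.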
Multi-Paxos
๏everything is a state machine
  •   use timeouts for reliability

  • handle failing messages

  • decouple network and time
      ‣for testability
  •   listeners react to messages and interact with
        the outside world, sync or async
                                            22
Unit-Testing

•   Mock Time
    ‣fast-running tests despite timeouts
•   Mock Network
    ‣simulate delays, failing messages




                                           23
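Mocking time usually means replacing real timers with a virtual clock that the test advances by hand. The sketch below shows one way to do that under stated assumptions: `FakeTimeouts` is a hypothetical helper, not a class from Neo4j, and it covers only the time half of the slide (a mock network that delays or drops messages would follow the same idea).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

final class FakeTimeouts {
    private static final class Timeout {
        final long dueAt;
        final Runnable action;

        Timeout(long dueAt, Runnable action) {
            this.dueAt = dueAt;
            this.action = action;
        }
    }

    private final List<Timeout> pending = new ArrayList<>();
    private long now = 0;

    // Register an action to run once virtual time has moved forward by delayMillis.
    void schedule(long delayMillis, Runnable action) {
        pending.add(new Timeout(now + delayMillis, action));
    }

    // Move virtual time forward and fire every timeout that became due, without sleeping.
    void advance(long millis) {
        now += millis;
        List<Timeout> due = new ArrayList<>();
        for (Iterator<Timeout> it = pending.iterator(); it.hasNext(); ) {
            Timeout t = it.next();
            if (t.dueAt <= now) {
                it.remove();
                due.add(t);
            }
        }
        // Fire in due-time order so the test sees the same order a real clock would give.
        due.sort(Comparator.comparingLong(t -> t.dueAt));
        for (Timeout t : due) {
            t.action.run();
        }
    }

    long now() {
        return now;
    }
}
```

A test can then schedule a five-second heartbeat timeout and trigger it instantly with `advance(5000)`, which is how the tests stay fast despite timeouts.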
Unit-Test-Example




                    24
Setup
  • Config
  • Video
  • Auto-Setup Script (Demo)




                                     25
Thank You - Questions?



                         26
