New Neo4j Auto HA Cluster
In this talk, Michael Hunger is going to shed some light on the new High Availability architecture for the popular Neo4j graph database. We are going to look at the different variants of the Paxos protocol, master failover strategies, and the handling of cluster management state. This piece of infrastructure poses non-trivial challenges in distributed consensus-finding, which makes the session interesting for anyone into scalable systems.

    New Neo4j Auto HA Cluster: Presentation Transcript

    • Neo4j High Availability: New Auto-Cluster. Michael Hunger - @mesirii
    • High Availability Cluster
        • Neo4j Enterprise
        • Master-Slave Replication
        • read-scaling and fault-tolerance
        • eventual consistency
            • write to master (push_factor)
            • write to slaves
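The write path named on this slide can be sketched as follows. This is a minimal illustration, not Neo4j code: `Member`, `catch_up` and `commit` are hypothetical names, and the sketch assumes that `push_factor` is the number of slaves a transaction is pushed to at commit time, while the remaining slaves become consistent later by pulling.

```python
class Member:
    """Hypothetical cluster member holding an ordered log of transaction ids."""

    def __init__(self, name):
        self.name = name
        self.log = []

    def catch_up(self, master):
        # Pull: apply every transaction the master has that this member lacks.
        self.log.extend(master.log[len(self.log):])


def commit(master, slaves, tx_id, push_factor):
    """Commit on the master, then push to the first `push_factor` slaves."""
    master.log.append(tx_id)
    pushed_to = slaves[:push_factor]
    for slave in pushed_to:
        # Assumption: a push is modelled as an immediate catch-up,
        # which keeps the slave's log in master order.
        slave.catch_up(master)
    return [slave.name for slave in pushed_to]
```

Slaves outside the push set lag behind until they call `catch_up`, which is the eventual consistency the slide refers to.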
    • 3 Separate Concerns (I)
        • Cluster Management
            • Members join/leave/heartbeat
        • Failover
            • Master Election
            • Distribution of Master-Status
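Membership tracking with heartbeats might look roughly like the sketch below. `HeartbeatMonitor` and its methods are hypothetical, and time is passed in explicitly as a number. Treating any received message as a sign of life follows the "piggyback heartbeat on messages" point made later in the talk; the rule for suspecting a member is an assumption.

```python
class HeartbeatMonitor:
    """Hypothetical tracker: members join, leave, and are suspected when silent."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}  # member name -> time of last sign of life

    def on_join(self, member, now):
        self.last_seen[member] = now

    def on_leave(self, member):
        self.last_seen.pop(member, None)

    def on_message(self, member, now):
        # Any message from a known member counts as a heartbeat.
        if member in self.last_seen:
            self.last_seen[member] = now

    def suspected(self, now):
        # Members that have been silent for longer than the timeout.
        return sorted(
            member
            for member, seen in self.last_seen.items()
            if now - seen > self.timeout
        )
```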
    • 3 Separate Concerns (II)
        • Replication
            • synchronized id-generation
            • distributed locks
            • pull, push of transactions
            • initial store synchronization
    • Pre 1.9 - Zookeeper
    • Pre 1.9
        • Apache Zookeeper took care of concerns
            • Cluster Management
                • new members register with ZK
            • Failover
                • ZK stores Master and last TX-Id
                • ZK uses ZAB to determine new Master and distribute information
    • HA Cluster (diagram; labels: three Coordinators, one Master, three Slaves, one of them marked "RO-")
    • Pre 1.9 - Problems
        • Additional setup and operations of a separate component
        • unreliable operation / hiccups
        • longterm stability
        • no dynamic reconfig of the ZK cluster
            • important for cloud setup
    • Post 1.9 - Neo4j Auto Cluster
    • Replace Zookeeper!?
        • Implement Multi-Paxos ourselves
        • simple, testable code
        • only covers
            • cluster management
            • master election
    • HA Cluster
    • What is Paxos?
        • reliable consensus making
        • broadcasting
        • works even with unreliable communication
            • message lost
            • delays, invalid order
        • does not guarantee progress
    • What is Paxos?
    • Implementation
        • everything is a State Machine
            • SM = stateless enums + context
            • Message = type enum + payload
            • State = enum instance
            • Transition = handle() messages: switch on msg-type, implement logic
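The "stateless enums + context" pattern can be illustrated with a small sketch. The original implementation is not shown in the transcript, so this Python version is only an analogy: `MessageType`, `Message`, `Context` and `ClusterState` are hypothetical names, and the join/leave rules are invented for the example. Each state is an enum instance with no fields of its own, all mutable data lives in the context, and `handle()` switches on the message type and returns the next state.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class MessageType(Enum):
    JOIN = auto()
    LEAVE = auto()


@dataclass
class Message:
    type: MessageType  # message = type enum + payload
    payload: str


@dataclass
class Context:
    members: set = field(default_factory=set)  # all mutable data lives here


class ClusterState(Enum):
    START = auto()
    JOINED = auto()

    def handle(self, context, message):
        """Apply one message to the context and return the next state."""
        if message.type is MessageType.JOIN:
            context.members.add(message.payload)
            return ClusterState.JOINED
        if message.type is MessageType.LEAVE:
            context.members.discard(message.payload)
            return ClusterState.JOINED if context.members else ClusterState.START
        return self
```

Because the states carry no data, a test can construct a context by hand, feed it a sequence of messages, and assert on the resulting state and context.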
    • Implementation (II)
        • everything is a State Machine
            • use timeouts for reliability
            • handle failing messages
            • decouple network and time
                • for testability
            • listeners interact on messages with outside world, sync or async
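One way to decouple time from the wall clock is to let the caller advance it explicitly. The `Timeouts` class below is a hypothetical sketch of that idea, not the talk's actual code: timeouts are registered under a key, can be cancelled, and fire only when `tick()` moves the logical clock past their deadline.

```python
class Timeouts:
    """Hypothetical timeout registry driven by explicit tick() calls."""

    def __init__(self):
        self.now = 0
        self.pending = {}  # key -> (deadline, callback)

    def set(self, key, delay, callback):
        self.pending[key] = (self.now + delay, callback)

    def cancel(self, key):
        self.pending.pop(key, None)

    def tick(self, elapsed):
        """Advance logical time and fire every timeout that is now due."""
        self.now += elapsed
        due = [k for k, (deadline, _) in self.pending.items() if deadline <= self.now]
        for key in sorted(due, key=lambda k: self.pending[k][0]):
            entry = self.pending.pop(key, None)
            if entry is not None:  # may have been cancelled by an earlier callback
                entry[1](key)
```

In production the ticks would come from a real timer; in a test they are plain method calls, so a scenario with long timeouts still runs instantly.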
    • Implementation (II)
        • Paxos (3 roles)
            • Proposer-SM
            • Acceptor-SM
            • Learner-SM
        • Cluster
            • Heartbeat
        • (diagram labels: Paxos with Acceptor, Proposer and Learner; ClusterState; Heartbeat)
    • Multi-Paxos (happy path): flow diagram with the roles Proposer, Acceptor and Learner, labelled (2 * f + 1). Recoverable labels: PREPARE, PROMISE, REJECT, ACCEPT, ACCEPTED, REJECTED, STORE VALUE, TIMEOUT, IF QUORUM MET CANCEL TIMEOUT, LEARN, LEARN TIMEOUT, OUT OF ORDER MSG HANDLING, DELIVER, ATOMIC BC.
    • Multi-Paxos (happy path), continued: the same diagram with additional learner labels: LEARN REQ, HAVE VALUE, LEARN FAIL, DONT KNOW.
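The PREPARE / PROMISE / ACCEPT / ACCEPTED exchange named in the diagram can be sketched in a few lines. Note the simplifications: this is single-decree Paxos with synchronous in-process calls, not the Multi-Paxos state machines from the talk, and it has no timeouts and no learner. `Acceptor` and `propose` are hypothetical names.

```python
class Acceptor:
    def __init__(self):
        self.promised = -1    # highest ballot promised so far
        self.accepted = None  # (ballot, value) of the last accepted proposal

    def on_prepare(self, ballot):
        if ballot > self.promised:
            self.promised = ballot
            return ("PROMISE", self.accepted)
        return ("REJECT", None)

    def on_accept(self, ballot, value):
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return "ACCEPTED"
        return "REJECTED"


def propose(acceptors, ballot, value):
    """Run one round; return the chosen value, or None if no quorum was reached."""
    quorum = len(acceptors) // 2 + 1

    # Phase 1: collect promises from the acceptors.
    replies = [a.on_prepare(ballot) for a in acceptors]
    promises = [accepted for kind, accepted in replies if kind == "PROMISE"]
    if len(promises) < quorum:
        return None

    # If any acceptor already accepted a value, adopt the one with the
    # highest ballot instead of our own.
    prior = [p for p in promises if p is not None]
    if prior:
        value = max(prior, key=lambda p: p[0])[1]

    # Phase 2: ask the acceptors to accept the value.
    accepted = sum(a.on_accept(ballot, value) == "ACCEPTED" for a in acceptors)
    return value if accepted >= quorum else None
```

A later proposer with a higher ballot learns the already chosen value in phase 1 and re-proposes it, which is why a decision, once made, does not change.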
    • Acceptor State Machine
    • Heartbeat State Machine
    • Implementation (III)
        • HA Implementation uses state machines as infrastructure
        • notifications via listeners
        • piggyback heartbeat on messages
        • master election
            • (all - failed) have to agree
            • Paxos BC needs quorum of total
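The two counting rules on this slide can be made concrete with a little arithmetic. The helpers below are hypothetical and assume that "quorum" means a strict majority of the configured cluster size, the usual reading for a cluster of 2 * f + 1 members that tolerates f failures.

```python
def quorum(total):
    """Smallest strict majority of the configured cluster size."""
    return total // 2 + 1


def tolerated_failures(total):
    """The f in total = 2 * f + 1 (rounded down for even sizes)."""
    return (total - 1) // 2


def can_broadcast(total, failed):
    """Paxos broadcast needs a quorum of the total, not of the survivors."""
    return total - failed >= quorum(total)


def votes_needed_for_election(total, failed):
    """Master election: every member not marked as failed has to agree."""
    return total - failed
```

For a three-member cluster, one failure still leaves a quorum of two, while two failures block further broadcasts.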
    • Multi-Paxos
        • everything is a State Machine
            • use timeouts for reliability
            • handle failing messages
            • decouple network and time
                • for testability
            • listeners interact on messages with outside world, sync or async
    • Unit-Testing
        • Mock Time
            • fast running tests despite timeouts
        • Mock Network
            • simulate delays, failing messages
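A mock network for such tests could look like the sketch below. `MockNetwork` is a hypothetical helper, not the talk's test harness: messages are queued with a per-message delay in logical time, the test decides which messages get lost, and delivery happens only when the test calls `tick()`.

```python
class MockNetwork:
    """Hypothetical in-memory network with controllable delay and loss."""

    def __init__(self):
        self.now = 0
        self.seq = 0
        self.in_flight = []   # (deliver_at, seq, target, message)
        self.inboxes = {}     # target -> delivered messages, in delivery order
        self.dropped = set()  # messages the test wants to lose

    def send(self, target, message, delay=1):
        if message in self.dropped:
            return  # simulate a lost message
        self.seq += 1
        self.in_flight.append((self.now + delay, self.seq, target, message))

    def tick(self, elapsed=1):
        """Advance logical time and deliver everything that is due."""
        self.now += elapsed
        ready = sorted(e for e in self.in_flight if e[0] <= self.now)
        self.in_flight = [e for e in self.in_flight if e[0] > self.now]
        for _, _, target, message in ready:
            self.inboxes.setdefault(target, []).append(message)
```

Giving an earlier message a longer delay than a later one reproduces out-of-order delivery deterministically, without sleeping in the test.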
    • Unit-Test-Example
    • Setup
        • Config
        • Video
        • Auto-Setup Script (Demo)
    • Thank You - Questions?