In this talk, Michael Hunger is going to shed some light on the new High Availability architecture of the popular Neo4j graph database. We are going to look at the different variants of the Paxos protocol, master failover strategies and the handling of cluster management state. This piece of infrastructure poses non-trivial challenges in distributed consensus, which makes it an interesting session for anyone interested in scalable systems.
6. Pre 1.9
๏Apache ZooKeeper (ZK) took care of two concerns
• Cluster Management
‣new members register with ZK
• Failover
‣ZK stores Master and last TX-Id
‣ZK uses ZAB to determine new Master
and distribute information
8. Pre 1.9 - Problems
๏Additional setup and operations of a separate
component
๏unreliable operation / hiccups
๏longterm stability
๏no dynamic reconfiguration of the ZK cluster,
which is important for cloud setups
12. What is Paxos?
๏reliable consensus making
๏broadcasting
๏works even with unreliable communication
• messages lost
• messages delayed or delivered out of order
๏does not guarantee progress
14. Implementation
๏everything is a state machine
• SM = stateless enums + context
• Message = type enum + payload
• State = enum instance
• switch on msg-type, implement logic
• Transition = handle() a message
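The bullets above can be sketched roughly as follows. This is a minimal illustration, not the Neo4j code: `MsgType`, `Message`, `ClusterContext` and `ClusterState` are hypothetical names chosen for the example. It shows stateless enum states, a separate context that holds all mutable data, and a `handle()` method that switches on the message type and returns the next state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical message types; the real enums are not shown on the slides.
enum MsgType { JOIN, LEAVE, PING }

// Message = type enum + payload.
final class Message {
    final MsgType type;
    final Object payload;

    Message(MsgType type, Object payload) {
        this.type = type;
        this.payload = payload;
    }
}

// All mutable data lives in the context, so the state enums stay stateless.
final class ClusterContext {
    final List<String> members = new ArrayList<>();
}

// State = enum instance. handle() switches on the message type,
// applies the logic to the context and returns the next state.
enum ClusterState {
    START {
        @Override
        ClusterState handle(ClusterContext ctx, Message msg) {
            switch (msg.type) {
                case JOIN:
                    ctx.members.add((String) msg.payload);
                    return JOINED;
                default:
                    return this; // ignore everything else while not joined
            }
        }
    },
    JOINED {
        @Override
        ClusterState handle(ClusterContext ctx, Message msg) {
            switch (msg.type) {
                case JOIN:
                    ctx.members.add((String) msg.payload);
                    return this;
                case LEAVE:
                    ctx.members.remove(msg.payload);
                    return ctx.members.isEmpty() ? START : this;
                default:
                    return this;
            }
        }
    };

    abstract ClusterState handle(ClusterContext ctx, Message msg);
}
```

A transition is then just `state = state.handle(ctx, msg)`, which keeps each state easy to test in isolation.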
15. Implementation (II)
๏everything is a state machine
• use timeouts for reliability
• handle failing messages
• decouple network and time
‣for testability
• listeners interact with the outside world
on messages, sync or async
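One way to decouple the state machines from real time, as the slide suggests for testability, is to drive timeouts from an explicit logical clock. The sketch below is an assumption about how that could look. The `Timeouts` class and its methods are hypothetical and are not taken from Neo4j.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical timeout registry driven by an explicit tick() instead of the wall clock.
final class Timeouts {
    private static final class Entry {
        final Object key;
        final long deadline;
        final Runnable onTimeout;

        Entry(Object key, long deadline, Runnable onTimeout) {
            this.key = key;
            this.deadline = deadline;
            this.onTimeout = onTimeout;
        }
    }

    private final List<Entry> pending = new ArrayList<>();
    private long now = 0;

    // Register a timeout that fires 'delay' logical time units from now.
    void set(Object key, long delay, Runnable onTimeout) {
        pending.add(new Entry(key, now + delay, onTimeout));
    }

    // Cancel a timeout, e.g. because the awaited reply arrived in time.
    void cancel(Object key) {
        pending.removeIf(e -> e.key.equals(key));
    }

    // Advance logical time and fire every timeout whose deadline has passed.
    void tick(long newNow) {
        now = newNow;
        List<Entry> due = new ArrayList<>();
        for (Iterator<Entry> it = pending.iterator(); it.hasNext();) {
            Entry e = it.next();
            if (e.deadline <= now) {
                it.remove();
                due.add(e);
            }
        }
        // Run callbacks after removal so a callback may safely register a new timeout.
        for (Entry e : due) {
            e.onTimeout.run();
        }
    }
}
```

In production a scheduler thread would call `tick(System.currentTimeMillis())`. In a test the same code can be stepped deterministically by calling `tick()` with chosen values.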
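17. Multi-Paxos (happy path)
[Diagram: message flow between the Proposer, Acceptor and Learner state machines, annotated (2 * f + 1). Labels: PREPARE, resent on TIMEOUT; PROMISE on match or REJECT on no match; ACCEPT; the acceptor checks whether the request matches its promise, stores the value and answers ACCEPTED or REJECTED; the proposer stores the responses and cancels the timeout once a quorum is met; LEARN; the learner stores the value, handles out-of-order messages and delivers all valid values (atomic BC); a missing value leads to LEARN TIMEOUT, and another learner may answer that it still doesn't know.]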
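The acceptor side of the PREPARE / PROMISE / ACCEPT exchange can be illustrated with the textbook Paxos rule for a single instance. This is a sketch under assumptions, not the Neo4j implementation: the `Acceptor` class, the numeric ballots and the boolean return values are all invented for the example.

```java
// Hypothetical single-instance acceptor following the textbook Paxos rules.
final class Acceptor {
    private long promised = -1;       // highest ballot promised so far
    private long acceptedBallot = -1; // ballot of the accepted value, -1 if none
    private Object acceptedValue;     // null until a value has been accepted

    // PREPARE: promise only if the ballot is newer than any earlier promise.
    // Returns true for PROMISE and false for REJECT.
    boolean onPrepare(long ballot) {
        if (ballot > promised) {
            promised = ballot;
            return true;
        }
        return false;
    }

    // ACCEPT: store the value only if doing so does not break an earlier promise.
    // Returns true for ACCEPTED and false for REJECTED.
    boolean onAccept(long ballot, Object value) {
        if (ballot >= promised) {
            promised = ballot;
            acceptedBallot = ballot;
            acceptedValue = value;
            return true;
        }
        return false;
    }

    // In full Paxos a PROMISE reply carries these two fields back to the
    // proposer, which must then propose the value with the highest ballot.
    long acceptedBallot() { return acceptedBallot; }
    Object acceptedValue() { return acceptedValue; }
}
```

The proposer counts the `true` replies and moves on once a majority of the acceptors has answered. If the timeout fires first, it retries with a higher ballot.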
18. Multi-Paxos (happy path)
[Diagram, continued: the same flow from ACCEPT onward, with the learner's recovery path added. Labels: on LEARN TIMEOUT the learner sends LEARN REQ to another learner, which answers LEARN if it has the value or LEARN FAIL if it doesn't know.]
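The learner behaviour named in the diagram (out-of-order message handling, in-order delivery, asking a peer for a missing value) could look roughly like this. The `Learner` class, its method names and the convention that instance ids start at 0 are assumptions made for the sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical learner that delivers values strictly in instance order.
final class Learner {
    private final Map<Long, Object> learned = new HashMap<>(); // buffered, not yet delivered
    private final List<Object> delivered = new ArrayList<>();  // index == instance id
    private long nextToDeliver = 0;

    // LEARN: buffer the value, then deliver every consecutive instance now available.
    void onLearn(long instance, Object value) {
        if (instance < nextToDeliver) {
            return; // duplicate of an already delivered instance
        }
        learned.put(instance, value);
        while (learned.containsKey(nextToDeliver)) {
            delivered.add(learned.remove(nextToDeliver));
            nextToDeliver++;
        }
    }

    // On LEARN TIMEOUT: the instance to request from a peer, or -1 if there is no gap.
    long missingInstance() {
        return learned.isEmpty() ? -1 : nextToDeliver;
    }

    // LEARN REQ from a peer: the value if known (reply LEARN), else null (reply LEARN FAIL).
    Object onLearnRequest(long instance) {
        if (instance >= 0 && instance < nextToDeliver) {
            return delivered.get((int) instance);
        }
        return learned.get(instance);
    }

    List<Object> delivered() {
        return delivered;
    }
}
```

Buffering values until the gap is filled is what gives every member the same delivery order, which is the property an atomic broadcast needs.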
21. Implementation (III)
๏HA Implementation uses state machines as
infrastructure
๏notifications via listeners
๏piggyback heartbeat on messages
๏master election
• (all - failed) have to agree
• Paxos broadcast (BC) needs a quorum of the total cluster
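The two agreement rules on this slide can be written down as simple arithmetic. This is one reading of the bullets, not confirmed by the source: the `Quorum` helper and its method names are hypothetical, and "quorum" is assumed to mean a strict majority of the configured cluster size.

```java
// Hypothetical helper expressing the two rules as integer arithmetic.
final class Quorum {
    private Quorum() {}

    // Strict majority of the total configured cluster size.
    static int majority(int total) {
        return total / 2 + 1;
    }

    // Paxos broadcast: the surviving members must still form a majority of the total.
    static boolean canBroadcast(int total, int failed) {
        return total - failed >= majority(total);
    }

    // Master election: every member not marked as failed has to have voted.
    static boolean electionComplete(int total, int failed, int votes) {
        return votes >= total - failed;
    }
}
```

For a cluster of 3, `majority(3)` is 2, so the cluster keeps broadcasting with one failed member but not with two.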
22. Multi-Paxos
๏everything is a state machine
• use timeouts for reliability
• handle failing messages
• decouple network and time
‣for testability
• listeners interact with the outside world
on messages, sync or async