Clock-RSM: Low-Latency Inter-Datacenter State Machine Replication Using Loosely Synchronized Physical Clocks
Jiaqing Du, Daniele Sciascia, Sameh Elnikety
Willy Zwaenepoel, Fernando Pedone
EPFL, University of Lugano, Microsoft Research
2. Replicated State Machines (RSM)
⢠Strong consistency
â Execute same commands in same order
â Reach same state from same initial state
⢠Fault tolerance
â Store data at multiple replicas
â Failure masking / fast failover
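The strong-consistency bullets above can be illustrated with a toy example. The sketch below is not from the slides: the `KVReplica` class and the command tuples are hypothetical, chosen only to show that deterministic replicas applying the same commands in the same order from the same initial state end in the same state.

```python
from dataclasses import dataclass, field


@dataclass
class KVReplica:
    """Toy deterministic state machine (hypothetical): a key-value store."""
    state: dict = field(default_factory=dict)

    def apply(self, command):
        op, key, value = command
        if op == "put":
            self.state[key] = value
        elif op == "delete":
            self.state.pop(key, None)


# One agreed-upon command order, shared by every replica.
log = [("put", "x", 1), ("put", "y", 2), ("delete", "x", None), ("put", "y", 3)]

replicas = [KVReplica() for _ in range(3)]
for replica in replicas:
    for command in log:
        replica.apply(command)

# All replicas started empty and applied the same log, so their states match.
same_state = all(r.state == replicas[0].state for r in replicas)
```

If any replica applied the log in a different order (for example, the final `put` before the `delete`), its state could diverge, which is why the protocol must fix a single order.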
4. Leader-Based Protocols
⢠Order commands by a leader replica
⢠Require extra ordering messages at follower
[Diagram: Leader and Follower timelines between client request and client reply, with message steps labeled Ordering, Replication, Ordering.]
High latency for geo replication
5. Clock-RSM
⢠Orders commands using physical clocks
⢠Overlaps ordering and replication
[Diagram: a single combined step, Ordering + Replication, between client request and client reply.]
Low latency for geo replication
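As a rough illustration of ordering by physical clocks, the sketch below sorts commands by the timestamp assigned at the originating replica. The `Command` type, the timestamp values, and the use of the replica id as a tie-breaker for equal timestamps are assumptions made for this example; the slides only state that commands are ordered using physical clocks.

```python
from typing import NamedTuple


class Command(NamedTuple):
    ts: int          # physical clock reading taken at the originating replica
    replica_id: int  # assumed tie-breaker when two timestamps are equal
    name: str


def execution_order(commands):
    """Order commands by (timestamp, replica id); no leader is consulted."""
    return sorted(commands, key=lambda c: (c.ts, c.replica_id))


# Hypothetical commands issued at two different replicas.
cmds = [Command(24, 0, "cmd1"), Command(23, 4, "cmd2")]
order = [c.name for c in execution_order(cmds)]
```

Because the order follows from the timestamps alone, a replica can start replicating a command in the same message that fixes its position.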
10. Major Message Steps
⢠Prep: Ask everyone to log a command
⢠PrepOK: Tell everyone after logging a command
[Diagram: replicas R0 to R4 exchanging Prep and PrepOK messages. One client request yields cmd1 with cmd1.ts = 24; another yields cmd2 with cmd2.ts = 23. The slide asks: cmd1 committed?]
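The two message steps can be sketched as an in-memory simulation. This is a simplification under stated assumptions: there is no network or delay, the originating replica is treated like any other receiver, and the names `Replica` and `run_prep_round` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    rid: int
    log: dict = field(default_factory=dict)   # command name -> timestamp
    acks: dict = field(default_factory=dict)  # command name -> ids known to have logged it


def run_prep_round(replicas, name, ts):
    for receiver in replicas:
        # Prep: the receiver is asked to log the command.
        receiver.log[name] = ts
        # PrepOK: after logging, the receiver tells everyone it has done so.
        for peer in replicas:
            peer.acks.setdefault(name, set()).add(receiver.rid)


replicas = [Replica(rid) for rid in range(5)]
run_prep_round(replicas, "cmd1", 24)
```

After the round, every replica knows which replicas have logged `cmd1`, which is the information the commit conditions rely on.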
11. Commit Conditions
⢠A command is committed if
â Replicated by a majority
â All commands ordered before are committed
⢠Wait until three conditions hold
C1: Majority replication
C2: Stable order
C3: Prefix replication
12. C1: Majority Replication
⢠More than half replicas log cmd1
[Diagram: replicas R0 to R4. A client request yields cmd1 with cmd1.ts = 24, followed by Prep and PrepOK messages. Replicated by R0, R1, R2.]
1 RTT: between R0 and majority
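Condition C1 reduces to a counting check. A minimal sketch, assuming the replica keeps a set of the replica ids that have logged the command (the function name and arguments are hypothetical):

```python
def majority_replicated(logged_by, num_replicas):
    """C1 sketch: strictly more than half of the replicas have logged the command."""
    return len(set(logged_by)) > num_replicas // 2


# With five replicas, three acknowledgements form a majority; two do not.
c1_three = majority_replicated({"R0", "R1", "R2"}, 5)
c1_two = majority_replicated({"R0", "R1"}, 5)
```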
13. C2: Stable Order
⢠Replica knows all commands ordered before cmd1
â Receives a greater timestamp from every other replica
[Diagram: replicas R0 to R4 exchanging Prep / PrepOK / ClockTime messages. cmd1.ts = 24; message timestamps shown: 23 and 25. cmd1 is stable at R0.]
0.5 RTT: between R0 and farthest peer
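Condition C2 can be sketched as a check over the latest timestamp received from each other replica. The check is only sound under an assumption not spelled out on the slide: each replica sends messages with increasing timestamps and they arrive in that order, so once a peer has sent a timestamp above cmd1.ts, no earlier-ordered command from that peer is still in flight. The function name and the per-peer values below are hypothetical.

```python
def is_stable(cmd_ts, latest_ts_from_peer):
    """C2 sketch: every other replica has sent a timestamp greater than cmd_ts."""
    return all(ts > cmd_ts for ts in latest_ts_from_peer.values())


# Hypothetical view at R0 for cmd1.ts = 24: one peer has only sent 23 so far.
seen_at_r0 = {"R1": 25, "R2": 25, "R3": 25, "R4": 23}
stable_before = is_stable(24, seen_at_r0)

# Once that peer's next message (timestamp 25) arrives, cmd1 becomes stable.
seen_at_r0["R4"] = 25
stable_after = is_stable(24, seen_at_r0)
```

Any message type that carries a timestamp (Prep, PrepOK, or ClockTime) can advance a peer's entry, which is why the wait is bounded by the one-way delay from the farthest peer.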
14. C3: Prefix Replication
⢠All commands ordered before cmd1 are replicated
by a majority
[Diagram: replicas R0 to R4. Client requests yield cmd1 (cmd1.ts = 24) and cmd2 (cmd2.ts = 23), each followed by Prep and PrepOk messages. cmd2 is replicated by R1, R2, R3.]
1 RTT: R4 to majority + majority to R0
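Condition C3 can be sketched as a majority check applied to every known command with a smaller timestamp. The data layout (a dictionary from command name to a timestamp and a set of acknowledging replicas) and the function name are assumptions for illustration.

```python
def prefix_replicated(cmd_ts, known_commands, num_replicas):
    """C3 sketch: every command ordered before cmd_ts is logged by a majority."""
    majority = num_replicas // 2 + 1
    return all(
        len(acks) >= majority
        for ts, acks in known_commands.values()
        if ts < cmd_ts
    )


# Hypothetical view at R0: cmd2 (ts 23) precedes cmd1 (ts 24) but has one ack so far.
known_at_r0 = {"cmd2": (23, {"R4"})}
c3_before = prefix_replicated(24, known_at_r0, 5)

# After R0 learns that R1, R2, and R3 have logged cmd2, the prefix is replicated.
known_at_r0["cmd2"] = (23, {"R1", "R2", "R3"})
c3_after = prefix_replicated(24, known_at_r0, 5)
```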
25. Overlapping vs. Separate Steps
[Diagram: two panels showing sites CA, VA, IR, SG, JP, each with a client request; in the second panel VA is marked (L).]
Clock-RSM latency: max of three
Paxos-bcast latency: sum of three
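The difference between "max of three" and "sum of three" can be made concrete with a small calculation. The three durations below are placeholder values, not measurements, and the same values are fed to both rules only to contrast them; the slide does not say that the three components are identical across the two protocols.

```python
def overlapped_latency(steps_ms):
    """Steps that run concurrently finish when the slowest one finishes."""
    return max(steps_ms)


def sequential_latency(steps_ms):
    """Steps that run one after another add up."""
    return sum(steps_ms)


# Hypothetical durations of three steps, in milliseconds.
steps_ms = [80.0, 60.0, 95.0]

overlapped = overlapped_latency(steps_ms)   # 95.0
sequential = sequential_latency(steps_ms)   # 235.0
```

With overlapping steps, only the slowest component matters, so shortening the other two would not change the total, while with sequential steps every component contributes.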