SlideShare ist ein Scribd-Unternehmen logo
1 von 26
Downloaden Sie, um offline zu lesen
PAXOS (AND A LITTLE ABOUT OTHER CONSENSUSES)
//RUSLAN SHEVCHENKO
Leslie Lamport
The Part-Time Parlament
PAXOS PAPERS
1. Leslie Lamport. The Part-Time Parliament.
ACM Transactions on Computer Systems. 1990 — rejected
resubmitted in 1998 — published
https://www.microsoft.com/en-us/research/publication/part-time-parliament/
2. Leslie Lamport. Paxos Made Simple
ACM SIGACT News (Distributed Computing Column) 32, 4
(Whole Number 121, December 2001) 51-58.
https://lamport.azurewebsites.net/pubs/paxos-simple.pdf
3. Tushar Deepak Chandra, Robert Griesemer, Joshua Redstone
Paxos Made Live - an Engineering Perspective.
ACM, Symposium on Principles of Distributed Computing. 2007
https://ai.google/research/pubs/pub33002
NON-PAXOS PAPERS
// will talk a little, details can be a theme for another PWL.
1. Diego Ongaro and John Ousterhout
In Search of an Understandable Consensus Algorithm
2014 USENIX Annual Technical Conference.
https://www.usenix.org/system/files/conference/atc14/atc14-paper-ongaro.pdf
https://raft.github.io/
2. Miguel Castro and Barbara Liskov
Practical Byzantine Fault Tolerance
Third Symposium on Operating Systems Design and Implementation,
New Orleans, USA, February 1999
http://pmg.csail.mit.edu/papers/osdi99.pdf
https://github.com/luckydonald/pbft
3. Leslie Lamport.
Byzantizing Paxos by Refinement.
Distributed Computing: 25th International Symposium: DISC 2011, David Peleg, editor.  Springer-
Verlag (2011) 211-224.
http://lamport.azurewebsites.net/pubs/web-byzpaxos.pdf
DISTRIBUTED SYSTEMS: FUNDAMENTALS
• N nodes.
• Can send messages via network
• Send(m,i,j)
• Receive: j => (m,i)
NETWORK
ASYNCHRONOUS
ASYNCHRONOUS
WITH FAILURE DETECTOR
PARTIALLY
SYNCHRONOUS
SYNCHRONOUS
We don’t know: is it delivered
know, when node or channel is down
∃δt : ti(send(ping, i, j)) − ti(receive(pong, i)) < δt
i j
ping/pong timeout
send(ping,i,j)
receive(ping,i,j)
send(pong,j,i)
receive(pong,j,i)
universal time.
/∃
⋆
†
CONSENSUS PROBLEM
DISTRIBUTED SYSTEM: FUNDAMENTALS
∃c : ∀i, j ∈ N : result(pi) = c
pi - program on node i.
ASYNCHRONOUS CONSENSUS IS IMPOSSIBLE
DISTRIBUTED SYSTEMS: FUNDAMENTALS
• Fischer, Lynch and Patterson.  ‘Impossibility of Distributed
Consensus with One Faulty Process’, J. ACM 32, 2 (April 1985),
374-382. http://cs-www.cs.yale.edu/homes/arvind/cs425/doc/
fischer.pdf
• Proof Idea:
≈
PART-TIME PARLIAMENT
PAXOS
• Archeologist, describing voting system.
• Parlament ‘work from home/travel’
• Messages are sent by part-time courier
• Any member at any time can fire issue
• When the majority of votes is collected,
the decision had to be done
• members can be non-reliable or reply with long delay
PART-TIME PARLIAMENT
PACTS
Roles:
•Proposer,
•Acceptor
•Learner
On practice, merged in one.
Messages:
• prepare
• promise
• accept
• accepted
• optional, implicit
prepare
promise
accept
accepted accepted
Proposer Acceptor Learner
commit
PART-TIME PARLIAMENT
PACTS
Roles:
•Proposer,
•Acceptor
•Learner
On practice, merged in one.
Messages:
• prepare
• promise
• accept
• accepted
• optional, implicit
prepare
promise
accept
accepted accepted
Proposer Acceptors Learner
commit
Quorum. |i ∈ Q| > N/2
FORMALIZATION
PACTOS
• Nodes: , each node have numberni ∈ N
prepare(nproposer : N, issue : I, round : N)
promise(nacceptor : N, issue : I, round : N, value : Option[V])
accept(nproposer : N, issue : I, round : N, value : V)
accepted(nacceptor : N, issue : I, round : N, value : V)
issue ∈ I
value ∈ V
ballot(message) = (message . value, message . round)
acceptNack(nacceptor : N, issue : I, round : N)
Optional
PROPOSER
PAXOS
State: myProposal: Option[Value], round:Nat,
round <- myN
broadcast(Prepare(me,issue,round))
receive{
case p:Promise if checkRound(p) =>
myProposal=best(myProposal, ballot(p))
if (prepare-quorum-ready) {
value = myProposal.getOrElse(random)
broadcast Accept(me,issue,round,value)
}
if (quorum-failed) {
fail-round
}
case m: Accepted if (checkRound(m)) =>
if (accepted-quorum-ready(m)){
//(local Commit(myProposal,round))
finish(myProposal)
} else if (quorum-failed) {
failRound
}
case m:Commit => finish(m.value)
}
prepare(nproposer : N, issue : I, round : N)
promise(nacceptor : N, issue : I, round : N, value : Optio
accept(nproposer : N, issue : I, round : N, value : V)
accepted(nacceptor : N, issue : I, round : N, value : V)
commit(nproposer : N, issue : I, round : N, value : V)
def fail-round() = {
round = ((round/N)+1)*N + myN
myPropsal = None
wait-some-time
restart()
}
ACCEPTOR
PAXOS
State: myValue: Option[Value], round:N,
receive{
case p:Prepare =>
r = if (myValue.isEmpty) p.round
else this.round
reply Promise(me,p.issue,r,myValue)
case m: Accept if (p.round > round) =>
myValue match {
case None =>
myValue = p.value
round = p.round
reply Accepted(me,m.issue,round,m.value)
case Some(v) =>
if (m.value == v) {
reply Accepted(me,m.issue,round,m.value)
} else {
reply AcceptedNack
}
case m:Commit => finish(m.value)
}
Accepted(v, r) ⇒ ∀k > r : Promise(v, k)
LEARNER
PAXOS
State: myValue: Option[Value], round:N,
receive{
case m:Accepted =>
if (checkQuorum(m)) {
learn(m.issue,m.value)
// generate local commit(m)
}
}
// On practice, Proposal, Acceptor, Learner — the same.
LET WE HAVE 3 NODES, ALL PLAY BOTH ROLES
PAXOS - RUN
A B C
prepare(A,0)
prepare(A,0)
promise(B,0,None) promise(C,0,None)
prepare(A,0)
promise(A,0,None)
accept(A,0,x)
accept(A,0,x)
accept(A,0,x)
accepted(B,0,x)
accepted(C,0,x)
accepted(A,0,x)
Happy path
LET WE HAVE 3 NODES, ALL PLAY BOTH ROLES
PAXOS - RUN
A B C
prepare(A,0)
promise(B,0,None) promise(C,0,None)
prepare(A,0)
promise(A,0,None)
accept(A,0,x)
accept(A,0,x)
accept(A,0,x)
accepted(B,0,x)
accepted(C,0,x)
A fail,
B initiate next round
restart
prepare(A,0)
prepare(A,0)
prepare(B,1)
promise(B,1,None)
promise(C,1,x)
promise(B,1,x)
accept(B,1,x)
accept(B,1,x)
PROOFS
PROPERTIES
• Safety:
• Only a value that has been proposed may be chosen.
• Only a single value is chosen. (non-trivial)
• Only chosen values are learned.
• Liveness impossible (due to FLP), eventual progress instead:
• Proposal will learn previous highest number
ONLY A SINGLE VALUE HAS CHOSEN
PAXOS SAFETY
∀n ∈ N : Accepted(n, round, v) → ∀k > round : Accepted(n, k, v)
Chosen(v, r) → ∀k > r : Chosen(v, k)
∃Q ∈ Qourm(N) : ∀n ∈ Q : Accepted(n, r, v) → Chosen(v, r)
Chosen(v1,r) & Chosen(v2,k) → Chosen(v1,max(r, k)) ∧ Chosen(v2,max(r, k))
Let we have 2 values chosen:
∃Q1 ∈ Quorum(N), Q2 ∈ Quorum(N) : ∀n1 ∈ Q1 ∀n2inQ2 :
Accepted(n1, v1, r) ∧ Accepted(n2, v2, r)
v1 ≠ v2 → Q1 ∩ Q2 = ∅
But any two Quorums should be intersect
Q1 ∈ Qourum(N), Q2 ∈ Qourum(N) → (Q1 ∧ Q2 ≠ ∅) !Contradiction!
ONLY A SINGLE VALUE HAS CHOSEN
PAXOS SAFETY
∀n ∈ N : Accepted(n, round, v) → ∀k > round : Accepted(n, k, v)
Chosen(v, r) → ∀k > r : Chosen(v, k)
∃Q ∈ Qourm(N) : ∀n ∈ Q : Accepted(n, r, v) → Chosen(v, r)
Chosen(v1,r) & Chosen(v2,k) → Chosen(v1,max(r, k)) ∧ Chosen(v2,max(r, k))
Let we have 2 values chosen:
∃Q1 ∈ Quorum(N), Q2 ∈ Quorum(N) : ∀n1 ∈ Q1 ∀n2inQ2 :
Accepted(n1, v1, r) ∧ Accepted(n2, v2, r)
v1 ≠ v2 → Q1 ∩ Q2 = ∅
But any two Quorums should be intersect
Q1 ∈ Qourum(N), Q2 ∈ Qourum(N) → (Q1 ∧ Q2 ≠ ∅) !Contradiction!
PROGRESS
PACTS
fail(n, r) → ∃n ∈ N, r1 > r : Prepare(n, r1)
// scheduling run on randomized timeout
Modifications:
Multi-Paxos: same leader process events until failure
not need in Prepare, until we have no failures.
CHUBBY — GOOGLE LOCK SERVICE
PACT-OS IMPLEMENTATION
3. Tushar Deepak Chandra, Robert Griesemer, Joshua Redstone
Paxos Made Live - an Engineering Perspective.
ACM, Symposium on Principles of Distributed Computing. 2007
https://ai.google/research/pubs/pub33002
2 pages pseudocode => k* 10^2 LOC in C++.
Code generated from own specification language
+ Determenistics variant for testing.
MultiPaxos + optimizations
- lagging.
Performance superior to 3DB
Testing fail-over solutions is hard.
IN SEARCH FOR UNDESTANDABLE CONSENSUS ALGORITHM
(2014)
RAFT
• MultiPaxos usually used as part of state-machine replication.
• One writer (leader), do something and send to followers. When leader
failed, new leader is elected.
• RAFT - state machine replication, simpler (in theory) than paxos with
similar performance characteristics.
STATE MACHINE REPLICATION
RAFT
• https://raft.github.io/
• Instead
• Can be viewed as ‘refactoring’ of PaxOS: merging roles into one,
• use heartbeat for failure detection
• on leader failure, node become a candidate and request other node
to vote (choose one) as leader.
BYZANTINE FAULT TOLERANCE
BPFT
• Replicated state machine
• Some nodes can lie (f):
• Client submit item to leader
• Gather f+1 replies.
• If all replies
• are the same: all ok
• otherwise - resend.
leaderclient followers
ei
r
e
reply(e1
r )
reply(ef+1
r )
3-stage
protocol.
BYZANTINE FAULT TOLERANCE
BPFT
• Replicated state machine
• Some nodes can lie (f):
• Client submit item to leader
• Gather f+1 replies.
• If all replies
• are the same: all ok
• otherwise - resend.
leaderclient followers
pre − prepare(digest(ei
r), seq(ei
r))
e
reply(e1
r )
reply(ef+1
r )
3-stage
protocol.
(assigned − seq(e))
prepare(ei
r, seq(ei
r))
prepared(e, seq)
commit
commited
LESLI LAMPORT, 2011
BYZANTIZING PAXOS BY REFINEMENT
• BPCon — Byzantine fault tolerance (prev. slides)
• PCon — Paxos
• Byzantine: Algorithm(N) => Algorithm(N+f)
• Byzantine(PCon) = BPCon N > 3f
f+1 round (in worst case)
Idea: S → M ⇒ BzQourum(S) → M
S → M ⇒ BzQourum(S) → Proof(BzQourm(S), M)
PAPERS WE LOVE, KYIV, 27.08.2018
THANKS FOR LISTENING
//The Story, told by Ruslan Shevchenko.
ruslan@shevchenko.kiev.ua
@rssh1

Weitere ähnliche Inhalte

Was ist angesagt?

Oit And Indirect Illumination Using Dx11 Linked Lists
Oit And Indirect Illumination Using Dx11 Linked ListsOit And Indirect Illumination Using Dx11 Linked Lists
Oit And Indirect Illumination Using Dx11 Linked ListsHolger Gruen
 
Collections3b datatransferstreams
Collections3b datatransferstreamsCollections3b datatransferstreams
Collections3b datatransferstreamsDilip Krishna
 
Block Cipher vs. Stream Cipher
Block Cipher vs. Stream CipherBlock Cipher vs. Stream Cipher
Block Cipher vs. Stream CipherAmirul Wiramuda
 
Optimizing the graphics pipeline with compute
Optimizing the graphics pipeline with computeOptimizing the graphics pipeline with compute
Optimizing the graphics pipeline with computeWuBinbo
 
CiE 2010 talk
CiE 2010 talkCiE 2010 talk
CiE 2010 talkilyaraz
 
OpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver OverheadOpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver OverheadTristan Lorach
 
Design limitations and its effect in the performance of ZC1-DPLL
Design limitations and its effect in the performance of ZC1-DPLLDesign limitations and its effect in the performance of ZC1-DPLL
Design limitations and its effect in the performance of ZC1-DPLLIDES Editor
 
Using neon for pattern recognition in audio data
Using neon for pattern recognition in audio dataUsing neon for pattern recognition in audio data
Using neon for pattern recognition in audio dataIntel Nervana
 

Was ist angesagt? (15)

Data science
Data scienceData science
Data science
 
Hair in Tomb Raider
Hair in Tomb RaiderHair in Tomb Raider
Hair in Tomb Raider
 
Oit And Indirect Illumination Using Dx11 Linked Lists
Oit And Indirect Illumination Using Dx11 Linked ListsOit And Indirect Illumination Using Dx11 Linked Lists
Oit And Indirect Illumination Using Dx11 Linked Lists
 
Coding matlab
Coding matlabCoding matlab
Coding matlab
 
1 tracking systems1
1 tracking systems11 tracking systems1
1 tracking systems1
 
Collections3b datatransferstreams
Collections3b datatransferstreamsCollections3b datatransferstreams
Collections3b datatransferstreams
 
Block Cipher vs. Stream Cipher
Block Cipher vs. Stream CipherBlock Cipher vs. Stream Cipher
Block Cipher vs. Stream Cipher
 
Optimizing the graphics pipeline with compute
Optimizing the graphics pipeline with computeOptimizing the graphics pipeline with compute
Optimizing the graphics pipeline with compute
 
Ph.D. Defense
Ph.D. DefensePh.D. Defense
Ph.D. Defense
 
CiE 2010 talk
CiE 2010 talkCiE 2010 talk
CiE 2010 talk
 
Advances in Directed Spanners
Advances in Directed SpannersAdvances in Directed Spanners
Advances in Directed Spanners
 
OpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver OverheadOpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
 
Design limitations and its effect in the performance of ZC1-DPLL
Design limitations and its effect in the performance of ZC1-DPLLDesign limitations and its effect in the performance of ZC1-DPLL
Design limitations and its effect in the performance of ZC1-DPLL
 
Using neon for pattern recognition in audio data
Using neon for pattern recognition in audio dataUsing neon for pattern recognition in audio data
Using neon for pattern recognition in audio data
 
Deep parking
Deep parkingDeep parking
Deep parking
 

Ähnlich wie Papers We Love / Kyiv : PAXOS (and little about other consensuses )

Dynamo: Not Just For Datastores
Dynamo: Not Just For DatastoresDynamo: Not Just For Datastores
Dynamo: Not Just For DatastoresSusan Potter
 
01 - DAA - PPT.pptx
01 - DAA - PPT.pptx01 - DAA - PPT.pptx
01 - DAA - PPT.pptxKokilaK25
 
Applied machine learning for search engine relevance 3
Applied machine learning for search engine relevance 3Applied machine learning for search engine relevance 3
Applied machine learning for search engine relevance 3Charles Martin
 
Developing a Multiplayer RTS with the Unreal Engine 3
Developing a Multiplayer RTS with the Unreal Engine 3Developing a Multiplayer RTS with the Unreal Engine 3
Developing a Multiplayer RTS with the Unreal Engine 3Nick Pruehs
 
Robust and Tuneable Family of Gossiping Algorithms
Robust and Tuneable Family of Gossiping AlgorithmsRobust and Tuneable Family of Gossiping Algorithms
Robust and Tuneable Family of Gossiping AlgorithmsVincenzo De Florio
 
Unit 3- Greedy Method.pptx
Unit 3- Greedy Method.pptxUnit 3- Greedy Method.pptx
Unit 3- Greedy Method.pptxMaryJacob24
 
4.3 real time game physics
4.3 real time game physics4.3 real time game physics
4.3 real time game physicsSayed Ahmed
 
Uwe Friedrichsen – Extreme availability and self-healing data with CRDTs - No...
Uwe Friedrichsen – Extreme availability and self-healing data with CRDTs - No...Uwe Friedrichsen – Extreme availability and self-healing data with CRDTs - No...
Uwe Friedrichsen – Extreme availability and self-healing data with CRDTs - No...NoSQLmatters
 
Parallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel applicationParallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel applicationGeoffrey Fox
 
Dsoop (co 221) 1
Dsoop (co 221) 1Dsoop (co 221) 1
Dsoop (co 221) 1Puja Koch
 
Pwl rewal-slideshare
Pwl rewal-slidesharePwl rewal-slideshare
Pwl rewal-slidesharepalvaro
 
S4495-plasma-turbulence-sims-gyrokinetic-tokamak-solver
S4495-plasma-turbulence-sims-gyrokinetic-tokamak-solverS4495-plasma-turbulence-sims-gyrokinetic-tokamak-solver
S4495-plasma-turbulence-sims-gyrokinetic-tokamak-solverPraveen Narayanan
 
Consensus in distributed computing
Consensus in distributed computingConsensus in distributed computing
Consensus in distributed computingRuben Tan
 
Oracle-based algorithms for high-dimensional polytopes.
Oracle-based algorithms for high-dimensional polytopes.Oracle-based algorithms for high-dimensional polytopes.
Oracle-based algorithms for high-dimensional polytopes.Vissarion Fisikopoulos
 
Shor’s algorithm the ppt
Shor’s algorithm the pptShor’s algorithm the ppt
Shor’s algorithm the pptMrinal Mondal
 
Functional Operations - Susan Potter
Functional Operations - Susan PotterFunctional Operations - Susan Potter
Functional Operations - Susan Potterdistributed matters
 
PSO and Its application in Engineering
PSO and Its application in EngineeringPSO and Its application in Engineering
PSO and Its application in EngineeringPrince Jain
 

Ähnlich wie Papers We Love / Kyiv : PAXOS (and little about other consensuses ) (20)

Dynamo: Not Just For Datastores
Dynamo: Not Just For DatastoresDynamo: Not Just For Datastores
Dynamo: Not Just For Datastores
 
01 - DAA - PPT.pptx
01 - DAA - PPT.pptx01 - DAA - PPT.pptx
01 - DAA - PPT.pptx
 
Self healing data
Self healing dataSelf healing data
Self healing data
 
Applied machine learning for search engine relevance 3
Applied machine learning for search engine relevance 3Applied machine learning for search engine relevance 3
Applied machine learning for search engine relevance 3
 
Developing a Multiplayer RTS with the Unreal Engine 3
Developing a Multiplayer RTS with the Unreal Engine 3Developing a Multiplayer RTS with the Unreal Engine 3
Developing a Multiplayer RTS with the Unreal Engine 3
 
Robust and Tuneable Family of Gossiping Algorithms
Robust and Tuneable Family of Gossiping AlgorithmsRobust and Tuneable Family of Gossiping Algorithms
Robust and Tuneable Family of Gossiping Algorithms
 
Unit 3- Greedy Method.pptx
Unit 3- Greedy Method.pptxUnit 3- Greedy Method.pptx
Unit 3- Greedy Method.pptx
 
LSH
LSHLSH
LSH
 
4.3 real time game physics
4.3 real time game physics4.3 real time game physics
4.3 real time game physics
 
Uwe Friedrichsen – Extreme availability and self-healing data with CRDTs - No...
Uwe Friedrichsen – Extreme availability and self-healing data with CRDTs - No...Uwe Friedrichsen – Extreme availability and self-healing data with CRDTs - No...
Uwe Friedrichsen – Extreme availability and self-healing data with CRDTs - No...
 
Parallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel applicationParallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel application
 
Dsoop (co 221) 1
Dsoop (co 221) 1Dsoop (co 221) 1
Dsoop (co 221) 1
 
Pwl rewal-slideshare
Pwl rewal-slidesharePwl rewal-slideshare
Pwl rewal-slideshare
 
S4495-plasma-turbulence-sims-gyrokinetic-tokamak-solver
S4495-plasma-turbulence-sims-gyrokinetic-tokamak-solverS4495-plasma-turbulence-sims-gyrokinetic-tokamak-solver
S4495-plasma-turbulence-sims-gyrokinetic-tokamak-solver
 
Consensus in distributed computing
Consensus in distributed computingConsensus in distributed computing
Consensus in distributed computing
 
Oracle-based algorithms for high-dimensional polytopes.
Oracle-based algorithms for high-dimensional polytopes.Oracle-based algorithms for high-dimensional polytopes.
Oracle-based algorithms for high-dimensional polytopes.
 
Shor’s algorithm the ppt
Shor’s algorithm the pptShor’s algorithm the ppt
Shor’s algorithm the ppt
 
Functional Operations - Susan Potter
Functional Operations - Susan PotterFunctional Operations - Susan Potter
Functional Operations - Susan Potter
 
PSO and Its application in Engineering
PSO and Its application in EngineeringPSO and Its application in Engineering
PSO and Its application in Engineering
 
Dynomite Nosql
Dynomite NosqlDynomite Nosql
Dynomite Nosql
 

Mehr von Ruslan Shevchenko

Embedding Generic Monadic Transformer into Scala. [Tfp2022]
Embedding Generic Monadic Transformer into Scala. [Tfp2022]Embedding Generic Monadic Transformer into Scala. [Tfp2022]
Embedding Generic Monadic Transformer into Scala. [Tfp2022]Ruslan Shevchenko
 
Scala / Technology evolution
Scala  / Technology evolutionScala  / Technology evolution
Scala / Technology evolutionRuslan Shevchenko
 
{co/contr} variance from LSP
{co/contr} variance  from LSP{co/contr} variance  from LSP
{co/contr} variance from LSPRuslan Shevchenko
 
Scala-Gopher: CSP-style programming techniques with idiomatic Scala.
Scala-Gopher: CSP-style programming techniques with idiomatic Scala.Scala-Gopher: CSP-style programming techniques with idiomatic Scala.
Scala-Gopher: CSP-style programming techniques with idiomatic Scala.Ruslan Shevchenko
 
SE 20016 - programming languages landscape.
SE 20016 - programming languages landscape.SE 20016 - programming languages landscape.
SE 20016 - programming languages landscape.Ruslan Shevchenko
 
Few simple-type-tricks in scala
Few simple-type-tricks in scalaFew simple-type-tricks in scala
Few simple-type-tricks in scalaRuslan Shevchenko
 
Why scala is not my ideal language and what I can do with this
Why scala is not my ideal language and what I can do with thisWhy scala is not my ideal language and what I can do with this
Why scala is not my ideal language and what I can do with thisRuslan Shevchenko
 
Java & low latency applications
Java & low latency applicationsJava & low latency applications
Java & low latency applicationsRuslan Shevchenko
 
Jslab rssh: JS as language platform
Jslab rssh:  JS as language platformJslab rssh:  JS as language platform
Jslab rssh: JS as language platformRuslan Shevchenko
 
Behind OOD: domain modelling in post-OO world.
Behind OOD:  domain modelling in post-OO world.Behind OOD:  domain modelling in post-OO world.
Behind OOD: domain modelling in post-OO world.Ruslan Shevchenko
 
scala-gopher: async implementation of CSP for scala
scala-gopher:  async implementation of CSP  for  scalascala-gopher:  async implementation of CSP  for  scala
scala-gopher: async implementation of CSP for scalaRuslan Shevchenko
 
Programming Languages: some news for the last N years
Programming Languages: some news for the last N yearsProgramming Languages: some news for the last N years
Programming Languages: some news for the last N yearsRuslan Shevchenko
 
JDays Lviv 2014: Java8 vs Scala: Difference points & innovation stream
JDays Lviv 2014:  Java8 vs Scala:  Difference points & innovation streamJDays Lviv 2014:  Java8 vs Scala:  Difference points & innovation stream
JDays Lviv 2014: Java8 vs Scala: Difference points & innovation streamRuslan Shevchenko
 

Mehr von Ruslan Shevchenko (20)

Embedding Generic Monadic Transformer into Scala. [Tfp2022]
Embedding Generic Monadic Transformer into Scala. [Tfp2022]Embedding Generic Monadic Transformer into Scala. [Tfp2022]
Embedding Generic Monadic Transformer into Scala. [Tfp2022]
 
Svitla talks 2021_03_25
Svitla talks 2021_03_25Svitla talks 2021_03_25
Svitla talks 2021_03_25
 
Akka / Lts behavior
Akka / Lts behaviorAkka / Lts behavior
Akka / Lts behavior
 
Scala / Technology evolution
Scala  / Technology evolutionScala  / Technology evolution
Scala / Technology evolution
 
{co/contr} variance from LSP
{co/contr} variance  from LSP{co/contr} variance  from LSP
{co/contr} variance from LSP
 
N flavors of streaming
N flavors of streamingN flavors of streaming
N flavors of streaming
 
Scala-Gopher: CSP-style programming techniques with idiomatic Scala.
Scala-Gopher: CSP-style programming techniques with idiomatic Scala.Scala-Gopher: CSP-style programming techniques with idiomatic Scala.
Scala-Gopher: CSP-style programming techniques with idiomatic Scala.
 
SE 20016 - programming languages landscape.
SE 20016 - programming languages landscape.SE 20016 - programming languages landscape.
SE 20016 - programming languages landscape.
 
Few simple-type-tricks in scala
Few simple-type-tricks in scalaFew simple-type-tricks in scala
Few simple-type-tricks in scala
 
Why scala is not my ideal language and what I can do with this
Why scala is not my ideal language and what I can do with thisWhy scala is not my ideal language and what I can do with this
Why scala is not my ideal language and what I can do with this
 
Scala jargon cheatsheet
Scala jargon cheatsheetScala jargon cheatsheet
Scala jargon cheatsheet
 
Java & low latency applications
Java & low latency applicationsJava & low latency applications
Java & low latency applications
 
Csp scala wixmeetup2016
Csp scala wixmeetup2016Csp scala wixmeetup2016
Csp scala wixmeetup2016
 
IDLs
IDLsIDLs
IDLs
 
R ext world/ useR! Kiev
R ext world/ useR!  KievR ext world/ useR!  Kiev
R ext world/ useR! Kiev
 
Jslab rssh: JS as language platform
Jslab rssh:  JS as language platformJslab rssh:  JS as language platform
Jslab rssh: JS as language platform
 
Behind OOD: domain modelling in post-OO world.
Behind OOD:  domain modelling in post-OO world.Behind OOD:  domain modelling in post-OO world.
Behind OOD: domain modelling in post-OO world.
 
scala-gopher: async implementation of CSP for scala
scala-gopher:  async implementation of CSP  for  scalascala-gopher:  async implementation of CSP  for  scala
scala-gopher: async implementation of CSP for scala
 
Programming Languages: some news for the last N years
Programming Languages: some news for the last N yearsProgramming Languages: some news for the last N years
Programming Languages: some news for the last N years
 
JDays Lviv 2014: Java8 vs Scala: Difference points & innovation stream
JDays Lviv 2014:  Java8 vs Scala:  Difference points & innovation streamJDays Lviv 2014:  Java8 vs Scala:  Difference points & innovation stream
JDays Lviv 2014: Java8 vs Scala: Difference points & innovation stream
 

Kürzlich hochgeladen

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 

Kürzlich hochgeladen (20)

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 

Papers We Love / Kyiv : PAXOS (and little about other consensuses )

  • 1. PAXOS (AND A LITTLE ABOUT OTHER CONSENSUSES) //RUSLAN SHEVCHENKO Leslie Lamport The Part-Time Parlament
  • 2. PAXOS PAPERS 1. Leslie Lamport. The Part-Time Parliament. ACM Transactions on Computer Systems. 1990 — rejected resubmitted in 1998 — published https://www.microsoft.com/en-us/research/publication/part-time-parliament/ 2. Leslie Lamport. Paxos Made Simple ACM SIGACT News (Distributed Computing Column) 32, 4 (Whole Number 121, December 2001) 51-58. https://lamport.azurewebsites.net/pubs/paxos-simple.pdf 3. Tushar Deepak Chandra, Robert Griesemer, Joshua Redstone Paxos Made Live - an Engineering Perspective. ACM, Symposium on Principles of Distributed Computing. 2007 https://ai.google/research/pubs/pub33002
  • 3. NON-PAXOS PAPERS // will talk a little, details can be a theme for another PWL. 1. Diego Ongaro and John Ousterhout In Search of an Understandable Consensus Algorithm 2014 USENIX Annual Technical Conference. https://www.usenix.org/system/files/conference/atc14/atc14-paper-ongaro.pdf https://raft.github.io/ 2. Miguel Castro and Barbara Liskov Practical Byzantine Fault Tolerance Third Symposium on Operating Systems Design and Implementation, New Orleans, USA, February 1999 http://pmg.csail.mit.edu/papers/osdi99.pdf https://github.com/luckydonald/pbft 3. Leslie Lamport. Byzantizing Paxos by Refinement. Distributed Computing: 25th International Symposium: DISC 2011, David Peleg, editor.  Springer- Verlag (2011) 211-224. http://lamport.azurewebsites.net/pubs/web-byzpaxos.pdf
  • 4. DISTRIBUTED SYSTEMS: FUNDAMENTALS • N nodes. • Can send messages via network • Send(m,i,j) • Receive: j => (m,i) NETWORK ASYNCHRONOUS ASYNCHRONOUS WITH FAILURE DETECTOR PARTIALLY SYNCHRONOUS SYNCHRONOUS We don’t know: is it delivered know, when node or channel is down ∃δt : ti(send(ping, i, j)) − ti(receive(pong, i)) < δt i j ping/pong timeout send(ping,i,j) receive(ping,i,j) send(pong,j,i) receive(pong,j,i) universal time. /∃ ⋆ †
  • 5. CONSENSUS PROBLEM DISTRIBUTED SYSTEM: FUNDAMENTALS ∃c : ∀i, j ∈ N : result(pi) = c pi - program on node i.
  • 6. ASYNCHRONOUS CONSENSUS IS IMPOSSIBLE DISTRIBUTED SYSTEMS: FUNDAMENTALS • Fischer, Lynch and Patterson.  ‘Impossibility of Distributed Consensus with One Faulty Process’, J. ACM 32, 2 (April 1985), 374-382. http://cs-www.cs.yale.edu/homes/arvind/cs425/doc/ fischer.pdf • Proof Idea: ≈
  • 7. PART-TIME PARLIAMENT PAXOS • Archeologist, describing voting system. • Parlament ‘work from home/travel’ • Messages are sent by part-time courier • Any member at any time can fire issue • When the majority of votes is collected, the decision had to be done • members can be non-reliable or reply with long delay
  • 8. PART-TIME PARLIAMENT PACTS Roles: •Proposer, •Acceptor •Learner On practice, merged in one. Messages: • prepare • promise • accept • accepted • optional, implicit prepare promise accept accepted accepted Proposer Acceptor Learner commit
  • 9. PART-TIME PARLIAMENT PACTS Roles: •Proposer, •Acceptor •Learner On practice, merged in one. Messages: • prepare • promise • accept • accepted • optional, implicit prepare promise accept accepted accepted Proposer Acceptors Learner commit Quorum. |i ∈ Q| > N/2
  • 10. FORMALIZATION PACTOS • Nodes: , each node have numberni ∈ N prepare(nproposer : N, issue : I, round : N) promise(nacceptor : N, issue : I, round : N, value : Option[V]) accept(nproposer : N, issue : I, round : N, value : V) accepted(nacceptor : N, issue : I, round : N, value : V) issue ∈ I value ∈ V ballot(message) = (message . value, message . round) acceptNack(nacceptor : N, issue : I, round : N) Optional
  • 11. PROPOSER PAXOS State: myProposal: Option[Value], round:Nat, round <- myN broadcast(Prepare(me,issue,round)) receive{ case p:Promise if checkRound(p) => myProposal=best(myProposal, ballot(p)) if (prepare-quorum-ready) { value = myProposal.getOrElse(random) broadcast Accept(me,issue,round,value) } if (quorum-failed) { fail-round } case m: Accepted if (checkRound(m)) => if (accepted-quorum-ready(m)){ //(local Commit(myProposal,round)) finish(myProposal) } else if (quorum-failed) { failRound } case m:Commit => finish(m.value) } prepare(nproposer : N, issue : I, round : N) promise(nacceptor : N, issue : I, round : N, value : Optio accept(nproposer : N, issue : I, round : N, value : V) accepted(nacceptor : N, issue : I, round : N, value : V) commit(nproposer : N, issue : I, round : N, value : V) def fail-round() = { round = ((round/N)+1)*N + myN myPropsal = None wait-some-time restart() }
  • 12. ACCEPTOR PAXOS State: myValue: Option[Value], round:N, receive{ case p:Prepare => r = if (myValue.isEmpty) p.round else this.round reply Promise(me,p.issue,r,myValue) case m: Accept if (p.round > round) => myValue match { case None => myValue = p.value round = p.round reply Accepted(me,m.issue,round,m.value) case Some(v) => if (m.value == v) { reply Accepted(me,m.issue,round,m.value) } else { reply AcceptedNack } case m:Commit => finish(m.value) } Accepted(v, r) ⇒ ∀k > r : Promise(v, k)
  • 13. LEARNER PAXOS State: myValue: Option[Value], round:N, receive{ case m:Accepted => if (checkQuorum(m)) { learn(m.issue,m.value) // generate local commit(m) } } // On practice, Proposal, Acceptor, Learner — the same.
  • 14. LET WE HAVE 3 NODES, ALL PLAY BOTH ROLES PAXOS - RUN A B C prepare(A,0) prepare(A,0) promise(B,0,None) promise(C,0,None) prepare(A,0) promise(A,0,None) accept(A,0,x) accept(A,0,x) accept(A,0,x) accepted(B,0,x) accepted(C,0,x) accepted(A,0,x) Happy path
  • 15. LET WE HAVE 3 NODES, ALL PLAY BOTH ROLES PAXOS - RUN A B C prepare(A,0) promise(B,0,None) promise(C,0,None) prepare(A,0) promise(A,0,None) accept(A,0,x) accept(A,0,x) accept(A,0,x) accepted(B,0,x) accepted(C,0,x) A fail, B initiate next round restart prepare(A,0) prepare(A,0) prepare(B,1) promise(B,1,None) promise(C,1,x) promise(B,1,x) accept(B,1,x) accept(B,1,x)
  • 16. PROOFS PROPERTIES • Safety: • Only a value that has been proposed may be chosen. • Only a single value is chosen. (non-trivial) • Only chosen values are learned. • Liveness impossible (due to FLP), eventual progress instead: • Proposal will learn previous highest number
  • 17. ONLY A SINGLE VALUE HAS CHOSEN PAXOS SAFETY ∀n ∈ N : Accepted(n, round, v) → ∀k > round : Accepted(n, k, v) Chosen(v, r) → ∀k > r : Chosen(v, k) ∃Q ∈ Qourm(N) : ∀n ∈ Q : Accepted(n, r, v) → Chosen(v, r) Chosen(v1,r) & Chosen(v2,k) → Chosen(v1,max(r, k)) ∧ Chosen(v2,max(r, k)) Let we have 2 values chosen: ∃Q1 ∈ Quorum(N), Q2 ∈ Quorum(N) : ∀n1 ∈ Q1 ∀n2inQ2 : Accepted(n1, v1, r) ∧ Accepted(n2, v2, r) v1 ≠ v2 → Q1 ∩ Q2 = ∅ But any two Quorums should be intersect Q1 ∈ Qourum(N), Q2 ∈ Qourum(N) → (Q1 ∧ Q2 ≠ ∅) !Contradiction!
  • 18. ONLY A SINGLE VALUE HAS CHOSEN PAXOS SAFETY ∀n ∈ N : Accepted(n, round, v) → ∀k > round : Accepted(n, k, v) Chosen(v, r) → ∀k > r : Chosen(v, k) ∃Q ∈ Qourm(N) : ∀n ∈ Q : Accepted(n, r, v) → Chosen(v, r) Chosen(v1,r) & Chosen(v2,k) → Chosen(v1,max(r, k)) ∧ Chosen(v2,max(r, k)) Let we have 2 values chosen: ∃Q1 ∈ Quorum(N), Q2 ∈ Quorum(N) : ∀n1 ∈ Q1 ∀n2inQ2 : Accepted(n1, v1, r) ∧ Accepted(n2, v2, r) v1 ≠ v2 → Q1 ∩ Q2 = ∅ But any two Quorums should be intersect Q1 ∈ Qourum(N), Q2 ∈ Qourum(N) → (Q1 ∧ Q2 ≠ ∅) !Contradiction!
  • 19. PROGRESS PACTS fail(n, r) → ∃n ∈ N, r1 > r : Prepare(n, r1) // scheduling run on randomized timeout Modifications: Multi-Paxos: same leader process events until failure not need in Prepare, until we have no failures.
  • 20. CHUBBY — GOOGLE LOCK SERVICE PACT-OS IMPLEMENTATION 3. Tushar Deepak Chandra, Robert Griesemer, Joshua Redstone Paxos Made Live - an Engineering Perspective. ACM, Symposium on Principles of Distributed Computing. 2007 https://ai.google/research/pubs/pub33002 2 pages pseudocode => k* 10^2 LOC in C++. Code generated from own specification language + Determenistics variant for testing. MultiPaxos + optimizations - lagging. Performance superior to 3DB Testing fail-over solutions is hard.
  • 21. IN SEARCH FOR UNDESTANDABLE CONSENSUS ALGORITHM (2014) RAFT • MultiPaxos usually used as part of state-machine replication. • One writer (leader), do something and send to followers. When leader failed, new leader is elected. • RAFT - state machine replication, simpler (in theory) than paxos with similar performance characteristics.
  • 22. STATE MACHINE REPLICATION RAFT • https://raft.github.io/ • Instead • Can be viewed as ‘refactoring’ of PaxOS: merging roles into one, • use heartbeat for failure detection • on leader failure, node become a candidate and request other node to vote (choose one) as leader.
  • 23. BYZANTINE FAULT TOLERANCE BPFT • Replicated state machine • Some nodes can lie (f): • Client submit item to leader • Gather f+1 replies. • If all replies • are the same: all ok • otherwise - resend. leaderclient followers ei r e reply(e1 r ) reply(ef+1 r ) 3-stage protocol.
  • 24. BYZANTINE FAULT TOLERANCE BPFT • Replicated state machine • Some nodes can lie (f): • Client submit item to leader • Gather f+1 replies. • If all replies • are the same: all ok • otherwise - resend. leaderclient followers pre − prepare(digest(ei r), seq(ei r)) e reply(e1 r ) reply(ef+1 r ) 3-stage protocol. (assigned − seq(e)) prepare(ei r, seq(ei r)) prepared(e, seq) commit commited
  • 25. LESLI LAMPORT, 2011 BYZANTIZING PAXOS BY REFINEMENT • BPCon — Byzantine fault tolerance (prev. slides) • PCon — Paxos • Byzantine: Algorithm(N) => Algorithm(N+f) • Byzantine(PCon) = BPCon N > 3f f+1 round (in worst case) Idea: S → M ⇒ BzQourum(S) → M S → M ⇒ BzQourum(S) → Proof(BzQourm(S), M)
  • 26. PAPERS WE LOVE, KYIV, 27.08.2018 THANKS FOR LISTENING //The Story, told by Ruslan Shevchenko. ruslan@shevchenko.kiev.ua @rssh1