SlideShare ist ein Scribd-Unternehmen logo
1 von 53
Downloaden Sie, um offline zu lesen
Flexible Paxos:
Reaching agreement
without majorities
Heidi Howard, Dahlia Malkhi, Alexander Spiegelman
University of Cambridge, VMware Research, Technion
heidi.howard@cl.cam.ac.uk
@heidiann360
hh360.user.srcf.net
fpaxos.github.io
1
2
3
TL;DR Majorities are not necessary to safety reach
distributed consensus.
4
Aims
1. Understand the widely adopted Paxos algorithm
and how Flexible Paxos safely generalizes it
2. Introduce a broad spectrum of possibilities for
reaching consensus and
3. Promote the idea that Paxos is not the best point
on this spectrum for most practical systems
5
Defining Consensus
Reaching agreement in an asynchronous distributed
system in the face of crash failures.
More specifically:
• Compromising safety is never an option
• No clock synchronization is assumed
• Participants fail by crashing and messages may be
lost but neither can act arbitrarily
6
Consensus is not a solved
problem
• Consensus is not scalable - often limited to 3, 5 or 7
participants.
• Consensus is slow - each decision requires at least
majority agreement, more when failures occur.
• Consensus is not available - cannot handle majority
failures or many network partitions.
You pay a high price even if no failures occur. Consider by
many as “best avoided”
7
Back in the good old days
…
8
Single server system
Client
Client
Client
Server
9
Single server system
Client
Client
Client
Server
Append
10
Single server system
Client
Client
Client
Server
OK @ 4
11
Single server system
Client
Client
Client
ServerRead 4
12
Single server system
Client
Client
Client
Server
13
Single server system
Client
Client
Client
Server
14
Single server system
Client
Client
Client
Server
15
Distributed systems
came to the rescue
16
Replicate for availability
Client
Client
Client
Server
Server
Server
17
Replicate for availability
Client
Client
Client
Leader
Server
Server
Append
18
Replicate for availability
Client
Client
Client
Leader
Server
Server
19
Replicate for availability
Client
Client
Client
Leader
Server
Server
OK
OK
20
Replicate for availability
Client
Client
Client
Leader
Server
Server
OK
21
How many copies is
enough?
We refer to any group of nodes which is ‘enough’ as a
replication quorum
More copies,
More resilient
Fewer copies,
Faster replication
22
Now what happens when a
server fails?
23
Client
Client
Client
Leader
Server
Server
24
Client
Client
Client
Leader
Server
Server
Append
25
Client
Client
Client
Leader
Server
Server
26
Client
Client
Client
Leader
Server
Server
27
OK
OK
What happens when the
leader fails?
28
Client
Client
Client
Leader
Server
Server
29
Client
Client
Client
Leader
Server
Leader
30
Recall that we never
compromise safety
Thus the new leader has two jobs:
1. To stop the old leader. We cannot say for sure that
they have failed.
2. To learn about which log entries have already
been agreed and not overwrite them.
The order is important here!
31
How to stop past leaders?
They may come back to haunt us.
• Ask them to stop
• Use leases and wait for them to expire
• Ignore them by having nodes promise to stop
replicating for them
32
How many promises are
enough?
Leaders cannot wait forever.
We need a minimum of one node from each
replication quorum to promise not to replicate new
entries.
We refer to this as the leader election quorum.
33
Promises are tricky to safely
break
Let’s assume each leadership term has a unique term
number.
Problem: We don’t always know who the past
leaders are. We need a deterministic scheme for
breaking promises.
Solution: Each node stores the promise as “ignore
all smaller terms”.
34
Learn all committed entries
The new leader must never overwrite committed
entries.
It must learn about the committed entries before it
starts adding new entries without leaving holes.
35
Safely handle in-progress
commands
Any in-progress entries already replicated on the
leader election quorum nodes must be finished by
the new leader.
If there are conflicting in-progress entries from
different leaders, we choose the log entry with the
highest view.
36
Context
This mechanism of views and promises is widely
utilised. This core idea is commonly referred to as
Paxos.
It’s a recurring foundation in the literature:
Viewstamped Replication [PODC’88], Paxos
[TOCS’98], Zab/Zookeeper [ATC’10] and Raft
[ATC’14] and many more.
37
What’s changed?
Traditionally, it was thought that all quorums needed
to intersect, regardless of whether the quorum was
used for replication or leader election.
Majorities are the widely adopted approach to this.
We have just seen that the only requirement is that
leader election quorums intersect with replication
quorums.
38
Implications
In theory:
• Helps us to understand Paxos
• Generalisation & weakening of the requirements
• Orthogonal to any particular algorithm built upon Paxos
In practice:
• Many new quorum schemes now become feasible
• Introduces a new breed of scalable, resilient consensus
algorithms.
39
Quorums Systems
40
Majorities
Replication quorum Leader election quorum
41
Majorities - 1
Replication quorum Leader election quorum
42
Counting
Replication quorum + Leader election quorum =
Number of nodes + 1
More copies,
More resilient
Fewer copies,
Faster replication
43
Replication quorum = 2/6
Replication quorum Leader election quorum
44
Replication quorum = 3/8
Replication quorum Leader election quorum
45
Not all members were
created equal
• Machines and the networks which connect them
are heterogeneous
• Failures are not independent: members are more
likely to simultaneously fail if they are located in the
same host, rack or even datacenter
46
Groups
Replication quorum Leader election quorum
47
Grids
Replication quorum Leader election quorum
48
Any many more…
Weighed Voting
2
2
3
3
1
1
Grid Paths
Combined Schemes Hierarchy
49
Majorities are no longer
‘one size fits all’
Majorities are unlikely to be optimal for most practical
systems
• Requirements of real systems differ. Performance
really matters.
• Leader election is rare so often we optimize for the
common replication phase.
50
Summary
• The only quorum intersection requirement is that
leader election quorums must intersect with the
replication quorums.
• Majorities are not the only option for safely reaching
consensus and are often not best suited to
practical systems.
• Don’t give up on strong consistency guarantees.
Distributed consensus can be performant and
scalable.
51
Ongoing Work
• Adoption into existing production systems
• Building new algorithms offering interesting
compromises
• Practical analysis of quorum systems for
consensus
• Extending to other algorithms and different failures
models e.g. Flexible Fast Paxos and Flexible
Byzantine Paxos
52
Question?
Let’s continue the discussion:
Heidi Howard
heidi.howard@cl.cam.ac.uk
@heidiann360
53

Weitere ähnliche Inhalte

Ähnlich wie Flexible Paxos: Reaching agreement without majorities

Building Reactive Scalable Systems
Building Reactive Scalable SystemsBuilding Reactive Scalable Systems
Building Reactive Scalable SystemsKnoldus Inc.
 
Operating system 27 semaphores
Operating system 27 semaphoresOperating system 27 semaphores
Operating system 27 semaphoresVaibhav Khanna
 
Everything you always wanted to know about Distributed databases, at devoxx l...
Everything you always wanted to know about Distributed databases, at devoxx l...Everything you always wanted to know about Distributed databases, at devoxx l...
Everything you always wanted to know about Distributed databases, at devoxx l...javier ramirez
 
How to Build Your Blockchain Project with Chainstack
How to Build Your Blockchain Project with ChainstackHow to Build Your Blockchain Project with Chainstack
How to Build Your Blockchain Project with ChainstackChainstack
 
DDB_lec_05_Concurrency_Control.pdf
DDB_lec_05_Concurrency_Control.pdfDDB_lec_05_Concurrency_Control.pdf
DDB_lec_05_Concurrency_Control.pdfAhmedImmamImmam
 
Distributed computing for new bloods
Distributed computing for new bloodsDistributed computing for new bloods
Distributed computing for new bloodsRaymond Tay
 
Consensus Algorithms: An Introduction & Analysis
Consensus Algorithms: An Introduction & AnalysisConsensus Algorithms: An Introduction & Analysis
Consensus Algorithms: An Introduction & AnalysisZak Cole
 
Distributed Systems Theory for Mere Mortals - GeeCON Krakow May 2017
Distributed Systems Theory for Mere Mortals -  GeeCON Krakow May 2017Distributed Systems Theory for Mere Mortals -  GeeCON Krakow May 2017
Distributed Systems Theory for Mere Mortals - GeeCON Krakow May 2017Ensar Basri Kahveci
 
CS101- Introduction to Computing- Lecture 45
CS101- Introduction to Computing- Lecture 45CS101- Introduction to Computing- Lecture 45
CS101- Introduction to Computing- Lecture 45Bilal Ahmed
 
Indirect communication is defined as communication between entities in a dist...
Indirect communication is defined as communication between entities in a dist...Indirect communication is defined as communication between entities in a dist...
Indirect communication is defined as communication between entities in a dist...nandepovanhu
 
Distributed Systems Theory for Mere Mortals - Java Day Istanbul May 2017
Distributed Systems Theory for Mere Mortals - Java Day Istanbul May 2017 Distributed Systems Theory for Mere Mortals - Java Day Istanbul May 2017
Distributed Systems Theory for Mere Mortals - Java Day Istanbul May 2017 Ensar Basri Kahveci
 
نظم موزعة Distributed systems slides.01.pdf
نظم موزعة Distributed systems slides.01.pdfنظم موزعة Distributed systems slides.01.pdf
نظم موزعة Distributed systems slides.01.pdfBilal Al-samaee
 
The Role of DevPortals in Digital Transformation
The Role of DevPortals in Digital TransformationThe Role of DevPortals in Digital Transformation
The Role of DevPortals in Digital TransformationPronovix
 
JUG CH December 2022 - Generic or specific?
JUG CH December 2022 - Generic or specific?JUG CH December 2022 - Generic or specific?
JUG CH December 2022 - Generic or specific?Bert Jan Schrijver
 

Ähnlich wie Flexible Paxos: Reaching agreement without majorities (20)

Hyperchains
HyperchainsHyperchains
Hyperchains
 
Building Reactive Scalable Systems
Building Reactive Scalable SystemsBuilding Reactive Scalable Systems
Building Reactive Scalable Systems
 
Operating system 27 semaphores
Operating system 27 semaphoresOperating system 27 semaphores
Operating system 27 semaphores
 
Everything you always wanted to know about Distributed databases, at devoxx l...
Everything you always wanted to know about Distributed databases, at devoxx l...Everything you always wanted to know about Distributed databases, at devoxx l...
Everything you always wanted to know about Distributed databases, at devoxx l...
 
How to Build Your Blockchain Project with Chainstack
How to Build Your Blockchain Project with ChainstackHow to Build Your Blockchain Project with Chainstack
How to Build Your Blockchain Project with Chainstack
 
Lecture_8.ppt
Lecture_8.pptLecture_8.ppt
Lecture_8.ppt
 
Sh ch01
Sh ch01Sh ch01
Sh ch01
 
DDB_lec_05_Concurrency_Control.pdf
DDB_lec_05_Concurrency_Control.pdfDDB_lec_05_Concurrency_Control.pdf
DDB_lec_05_Concurrency_Control.pdf
 
Distributed computing for new bloods
Distributed computing for new bloodsDistributed computing for new bloods
Distributed computing for new bloods
 
Consensus Algorithms: An Introduction & Analysis
Consensus Algorithms: An Introduction & AnalysisConsensus Algorithms: An Introduction & Analysis
Consensus Algorithms: An Introduction & Analysis
 
Distributed Systems Theory for Mere Mortals - GeeCON Krakow May 2017
Distributed Systems Theory for Mere Mortals -  GeeCON Krakow May 2017Distributed Systems Theory for Mere Mortals -  GeeCON Krakow May 2017
Distributed Systems Theory for Mere Mortals - GeeCON Krakow May 2017
 
CS101- Introduction to Computing- Lecture 45
CS101- Introduction to Computing- Lecture 45CS101- Introduction to Computing- Lecture 45
CS101- Introduction to Computing- Lecture 45
 
Indirect communication is defined as communication between entities in a dist...
Indirect communication is defined as communication between entities in a dist...Indirect communication is defined as communication between entities in a dist...
Indirect communication is defined as communication between entities in a dist...
 
Distributed Systems Theory for Mere Mortals - Java Day Istanbul May 2017
Distributed Systems Theory for Mere Mortals - Java Day Istanbul May 2017 Distributed Systems Theory for Mere Mortals - Java Day Istanbul May 2017
Distributed Systems Theory for Mere Mortals - Java Day Istanbul May 2017
 
Introduction
IntroductionIntroduction
Introduction
 
Block Chain Basics
Block Chain BasicsBlock Chain Basics
Block Chain Basics
 
نظم موزعة Distributed systems slides.01.pdf
نظم موزعة Distributed systems slides.01.pdfنظم موزعة Distributed systems slides.01.pdf
نظم موزعة Distributed systems slides.01.pdf
 
The Role of DevPortals in Digital Transformation
The Role of DevPortals in Digital TransformationThe Role of DevPortals in Digital Transformation
The Role of DevPortals in Digital Transformation
 
JUG CH December 2022 - Generic or specific?
JUG CH December 2022 - Generic or specific?JUG CH December 2022 - Generic or specific?
JUG CH December 2022 - Generic or specific?
 
Unit_4_Fault_Tolerance.pptx
Unit_4_Fault_Tolerance.pptxUnit_4_Fault_Tolerance.pptx
Unit_4_Fault_Tolerance.pptx
 

Kürzlich hochgeladen

Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 

Kürzlich hochgeladen (20)

Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 

Flexible Paxos: Reaching agreement without majorities