Clock-RSM: Low-Latency Inter-Datacenter
State Machine Replication Using Loosely
Synchronized Physical Clocks
Jiaqing Du, Daniele Sciascia, Sameh Elnikety
Willy Zwaenepoel, Fernando Pedone
EPFL, University of Lugano, Microsoft Research
Replicated State Machines (RSM)
• Strong consistency
– Execute same commands in same order
– Reach same state from same initial state
• Fault tolerance
– Store data at multiple replicas
– Failure masking / fast failover
Geo-Replication
[Figure: five data centers]
• High latency among replicas
• Messaging dominates replication latency
Leader-Based Protocols
• Order commands by a leader replica
• Require extra ordering messages at follower
[Figure: timeline between a follower and the leader: client request, Ordering, Replication, Ordering, client reply]
High latency for geo replication
Clock-RSM
• Orders commands using physical clocks
• Overlaps ordering and replication
[Figure: timeline from client request to client reply with a single combined Ordering + Replication step]
Low latency for geo replication
Outline
• Clock-RSM
• Comparison with Paxos
• Evaluation
• Conclusion
Property and Assumption
• Provides linearizability
• Tolerates the failure of a minority of replicas
• Assumptions
– Asynchronous FIFO channels
– Non-Byzantine faults
– Loosely synchronized physical clocks
Protocol Overview
[Figure: Clock-RSM overview. Two client requests are timestamped on arrival (cmd1.ts = Clock(), cmd2.ts = Clock()), replicas exchange PrepOK messages, each of the five replicas shows cmd1 and cmd2, and both clients receive a reply]
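The timestamp-based ordering in this overview can be sketched as follows. This is an illustrative Python sketch, not the authors' implementation: the `Command` class, the `stamp` helper, the replica ids, and the use of the replica id as a tie-breaker for equal timestamps are all assumptions introduced here.

```python
import time
from dataclasses import dataclass, field


@dataclass(order=True, frozen=True)
class Command:
    ts: int      # physical-clock timestamp
    origin: int  # id of the stamping replica (tie-breaker, an assumption)
    payload: str = field(default="", compare=False)


def stamp(payload, origin, clock=time.time_ns):
    """Timestamp a client command with the replica's local clock."""
    return Command(ts=clock(), origin=origin, payload=payload)


# Timestamps from the slides' running example; the origin ids are hypothetical.
cmd1 = Command(ts=24, origin=0, payload="cmd1")
cmd2 = Command(ts=23, origin=4, payload="cmd2")

# Every replica sorts by (ts, origin), so all of them derive the same order.
execution_order = sorted([cmd1, cmd2])
print([c.payload for c in execution_order])  # ['cmd2', 'cmd1']
```

Because the order is derived from the timestamps alone, no replica has to ask a leader for a sequence number before replicating a command.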
Major Message Steps
• Prep: Ask everyone to log a command
• PrepOK: Tell everyone after logging a command
[Figure: replicas R0 to R4. A client request is assigned cmd1.ts = 24 and sent out with Prep; replicas respond with PrepOK. A second client request is assigned cmd2.ts = 23. The figure asks: cmd1 committed?]
Commit Conditions
• A command is committed if
– Replicated by a majority
– All commands ordered before are committed
• Wait until three conditions hold
C1: Majority replication
C2: Stable order
C3: Prefix replication
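A minimal sketch of how one replica might test the three conditions against its local view. The function name, its parameters, and the data layout are hypothetical and not taken from the paper; ties between equal timestamps are ignored for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cmd:
    name: str
    ts: int  # timestamp assigned by the originating replica


def can_commit(cmd, self_id, n, logged_by, latest_ts, known_cmds):
    """Check C1-C3 for `cmd` at replica `self_id` out of `n` replicas.

    logged_by:  command name -> set of replica ids known to have logged it
    latest_ts:  replica id -> latest timestamp received from that replica
    known_cmds: commands this replica has seen so far
    """
    majority = n // 2 + 1

    def replicated(c):
        return len(logged_by.get(c.name, set())) >= majority

    # C1: the command itself is logged by a majority.
    c1 = replicated(cmd)
    # C2: every other replica has sent a timestamp greater than cmd.ts.
    c2 = all(latest_ts.get(r, float("-inf")) > cmd.ts
             for r in range(n) if r != self_id)
    # C3: every command ordered before cmd is logged by a majority.
    c3 = all(replicated(c) for c in known_cmds if c.ts < cmd.ts)
    return c1 and c2 and c3


# Values from the slides' running example, seen from R0.
cmd1, cmd2 = Cmd("cmd1", 24), Cmd("cmd2", 23)
logged_by = {"cmd1": {0, 1, 2}, "cmd2": {1, 2, 3}}
latest_ts = {1: 25, 2: 25, 3: 25, 4: 25}
print(can_commit(cmd1, 0, 5, logged_by, latest_ts, [cmd1, cmd2]))  # True
```

If any one of the three inputs falls short, for example cmd2 being logged by fewer than three replicas, the function returns False and the replica keeps waiting.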
C1: Majority Replication
• More than half of the replicas log cmd1
[Figure: replicas R0 to R4. R0 assigns cmd1.ts = 24 to a client request and sends Prep; PrepOK messages come back]
Replicated by R0, R1, R2
1 RTT: between R0 and majority
C2: Stable Order
• Replica knows all commands ordered before cmd1
– Receives a greater timestamp from every other replica
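[Figure: replicas R0 to R4. R0 assigns cmd1.ts = 24 to a client request. Timestamps carried by Prep, PrepOK, or ClockTime messages arrive at R0 from the other replicas (the values shown are 23 and 25)]
0.5 RTT: between R0 and farthest peer
cmd1 is stable at R0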
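The stable-order check can be sketched as a small tracker of the latest timestamp heard from each peer. The class and method names are hypothetical. The sketch assumes that Prep, PrepOK, and ClockTime messages each carry a timestamp and that every replica sends its timestamps in increasing order, so that over a FIFO channel a peer that has sent 25 cannot later deliver a command stamped below 25.

```python
class StableOrderTracker:
    """Tracks the latest timestamp received from every other replica."""

    def __init__(self, self_id, n):
        # No timestamp has been heard from any peer yet.
        self.latest = {r: float("-inf") for r in range(n) if r != self_id}

    def on_message(self, sender, ts):
        # Called for any message that carries a timestamp (assumption:
        # Prep, PrepOK, and ClockTime all do).
        self.latest[sender] = max(self.latest[sender], ts)

    def is_stable(self, cmd_ts):
        # Stable once every peer has sent a timestamp greater than cmd_ts.
        return min(self.latest.values()) > cmd_ts


tracker = StableOrderTracker(self_id=0, n=5)
tracker.on_message(4, 23)          # a peer's earlier command, stamped 23
print(tracker.is_stable(24))       # False: peers 1-3 have not been heard from
for peer in (1, 2, 3, 4):
    tracker.on_message(peer, 25)
print(tracker.is_stable(24))       # True: every peer has passed 24
```

The slowest input is the message from the farthest peer, which is why this condition costs about half a round trip to that peer.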
C3: Prefix Replication
• All commands ordered before cmd1 are replicated by a majority
[Figure: replicas R0 to R4. Client requests produce cmd1.ts = 24 and cmd2.ts = 23; Prep and PrepOK messages are exchanged for cmd2]
cmd2 is replicated by R1, R2, R3
1 RTT: R4 to majority + majority to R0
Overlapping Steps
[Figure: replicas R0 to R4. For cmd1 (cmd1.ts = 24), the Prep, Log(cmd1), and PrepOK messages of majority replication, the timestamps of stable order, and the Prep and PrepOK messages of prefix replication overlap in time between client request and client reply]
Latency of cmd1: about 1 RTT to majority
Commit Latency
Step                  Latency
Majority replication  1 RTT (majority1)
Stable order          0.5 RTT (farthest)
Prefix replication    1 RTT (majority2)

Overall latency = MAX{ 1 RTT (majority1), 0.5 RTT (farthest), 1 RTT (majority2) }
If 0.5 RTT (farthest) < 1 RTT (majority), then overall latency ≈ 1 RTT (majority).
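As a worked example of the MAX formula, the sketch below computes the commit latency at one replica from its round-trip times to the other replicas. The function names are hypothetical, the RTT values are made-up numbers rather than measurements, and the prefix-replication term is passed in directly instead of being derived.

```python
def majority_rtt(rtts_to_others, n):
    """RTT from a replica to its closest majority (the replica counts itself)."""
    others_needed = n // 2  # majority is n // 2 + 1, minus the replica itself
    return sorted(rtts_to_others)[others_needed - 1]


def commit_latency(rtts_to_others, n, prefix_rtt):
    majority1 = majority_rtt(rtts_to_others, n)   # C1: majority replication
    farthest = 0.5 * max(rtts_to_others)          # C2: stable order
    return max(majority1, farthest, prefix_rtt)   # C3 term supplied by caller


# Hypothetical RTTs in milliseconds from one replica to its four peers.
print(commit_latency([80, 90, 170, 150], n=5, prefix_rtt=88))  # 90
# A topology where the farthest peer dominates instead.
print(commit_latency([20, 30, 300, 280], n=5, prefix_rtt=40))  # 150.0
```

In the first case half the RTT to the farthest peer (85) is below the RTT to the majority (90), so the overall latency is the majority RTT. In the second case the farthest peer is so distant that stable order becomes the bottleneck.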
Topology Examples
[Figure: two example topologies of replicas R0 to R4, each with a client request and the labels Majority1 and Farthest]
Outline
• Clock-RSM
• Comparison with Paxos
• Evaluation
• Conclusion
Paxos 1: Multi-Paxos
• Single leader orders commands
– Logical clock: 0, 1, 2, 3, ...
[Figure: replicas R0 to R4 with R2 as leader. Message flow: client request, Forward, Prep, PrepOK, Commit, client reply]
Latency at followers: 2 RTTs (leader & majority)
Paxos 2: Paxos-bcast
• Every replica broadcasts PrepOK
– Trades off message complexity for latency
[Figure: replicas R0 to R4 with R2 as leader. Message flow: client request, Forward, Prep, PrepOK, client reply]
Latency at followers: 1.5 RTTs (leader & majority)
Clock-RSM vs. Paxos
• With realistic topologies, Clock-RSM has
– Lower latency at Paxos follower replicas
– Similar / slightly higher latency at Paxos leader
Protocol     Latency
Clock-RSM    All replicas: 1 RTT (majority), if 0.5 RTT (farthest) < 1 RTT (majority)
Paxos-bcast  Leader: 1 RTT (majority); follower: 1.5 RTTs (leader & majority)
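The comparison in the table can be explored with a small latency model. This is a rough sketch under stated assumptions, not the paper's analysis: delays are symmetric one-way delays (half an RTT), processing time is ignored, the Clock-RSM estimate follows the table's simplification and leaves out the prefix-replication term, and the delay matrix is hypothetical.

```python
def kth_smallest(values, k):
    return sorted(values)[k - 1]


def paxos_bcast_latency(d, leader, origin):
    """d[i][j] is the one-way delay between replicas i and j (d[i][i] == 0)."""
    n = len(d)
    majority = n // 2 + 1
    forward = d[origin][leader]  # zero when the origin is the leader
    # Prep travels leader -> k, then PrepOK travels k -> origin.
    acks = [d[leader][k] + d[k][origin] for k in range(n)]
    return forward + kth_smallest(acks, majority)


def clock_rsm_latency(d, origin):
    n = len(d)
    majority = n // 2 + 1
    rtts = [2 * d[origin][k] for k in range(n)]  # includes 0 for itself
    return max(kth_smallest(rtts, majority), max(d[origin]))


# Hypothetical one-way delays in milliseconds for three replicas.
d = [[0, 10, 40],
     [10, 0, 35],
     [40, 35, 0]]

print(paxos_bcast_latency(d, leader=0, origin=0))  # 20 (at the leader)
print(clock_rsm_latency(d, origin=0))              # 40
print(paxos_bcast_latency(d, leader=0, origin=2))  # 80 (at a distant follower)
print(clock_rsm_latency(d, origin=2))              # 70
```

With these made-up delays, Clock-RSM is slower than Paxos-bcast at the leader and faster at the distant follower. The outcome depends on the topology, so other delay matrices can give a different picture.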
Outline
• Clock-RSM
• Comparison with Paxos
• Evaluation
• Conclusion
Experiment Setup
• Replicated key-value store
• Deployed on Amazon EC2
[Map: California (CA), Virginia (VA), Ireland (IR), Singapore (SG), Japan (JP)]
Latency (1/2)
• All replicas serve client requests
Overlapping vs. Separate Steps
[Figure: the five sites CA, VA, IR, SG, and JP shown twice, each with a client request; in the second, VA is marked as leader (L)]
Clock-RSM latency: max of three
Paxos-bcast latency: sum of three
Latency (2/2)
• Paxos leader is changed to CA
Throughput
• Five replicas on a local cluster
• Message batching is key
Also in the Paper
• A reconfiguration protocol
• Comparison with Mencius
• Latency analysis of protocols
Conclusion
• Clock-RSM: low latency geo-replication
– Uses loosely synchronized physical clocks
– Overlaps ordering and replication
• Leader-based protocols can incur high latency
