High Availability via Asynchronous Virtual Machine Replication
Review by Mário Almeida (EMDC)

Summary
High availability requires the usage of redundancy techniques that are capable of maintaining
and switching to backups in case of failure. Commercial high availability systems generally use
specialized hardware and/or customized software to achieve this purpose.

This paper describes a system called Remus. It provides OS- and application-agnostic high
availability on commodity hardware. It builds on the ability of virtualization to migrate
running VMs between physical hosts, and extends that technique to replicate snapshots of an
entire running OS instance at very high frequencies between a pair of physical machines. It
discretizes the execution of the system into a series of replicated snapshots.

No transmitted network packet is released until the system state that produced it has been
replicated. This allows a single host to execute speculatively and then checkpoint and replicate
its state asynchronously. System state is not made externally visible until the checkpoint is
committed.
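
The output-commit rule above can be sketched as a small buffer that holds packets until the checkpoint covering them is acknowledged. This is only an illustration under stated assumptions, not Remus code: the class `OutputBuffer`, its methods, and the use of integer epoch numbers are hypothetical names chosen for the example.

```python
from collections import deque


class OutputBuffer:
    """Holds outgoing packets until their checkpoint is committed (hypothetical sketch)."""

    def __init__(self):
        self._pending = deque()  # (epoch, packet) pairs in send order
        self.released = []       # packets that have become externally visible

    def send(self, epoch, packet):
        # The guest "sends" a packet, but it is only queued, tagged with
        # the epoch whose state produced it.
        self._pending.append((epoch, packet))

    def commit(self, epoch):
        # The backup has acknowledged the checkpoint for `epoch`:
        # everything produced up to that epoch may now leave the host.
        while self._pending and self._pending[0][0] <= epoch:
            self.released.append(self._pending.popleft()[1])
```

For example, a packet sent during epoch 1 stays queued after `commit(0)` and is released only by `commit(1)`.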

Remus ensures that regardless of the moment at which the primary fails, no externally visible
state is ever lost. It aims to make mission-critical availability accessible to mid- and low-end
systems.

Remus goals:
   ●   Generality - High availability should be provided as a low-level service, with common
       mechanisms that apply regardless of the application being protected or the hardware on
       which it runs.

   ●   Transparency - High availability should not require that OS or application code be
       modified to support facilities such as failure detection or state recovery.

   ●   Seamless failure recovery - No externally visible state should ever be lost in the case
       of single-host failure. Failure recovery should be fast. Established TCP connections
       should not be lost or reset.

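Remus runs paired servers in an active-passive configuration. Speculative execution decouples
external output from synchronization points. Synchronization with the replicated server is
performed asynchronously. Each checkpoint passes through the following basic stages: the running
VM is briefly paused and its changed state is captured; execution resumes while the captured
state is transmitted to the backup; the backup acknowledges the checkpoint; and the buffered
external output is released.
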
Some characteristics:

   ●   VM-based whole-system replication.

   ●   Speculative execution - Replication is achieved by copying the state of the system.
       The state of the replica needs to be synchronized with the primary only when the output
       of the primary becomes externally visible. Remus therefore buffers output until a more
       convenient time, performing computation speculatively ahead of synchronization points.

   ●   Asynchronous replication - Because output is buffered at the primary server, the primary
       host can resume execution as soon as its machine state has been captured, without
       waiting for an acknowledgment from the backup.
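
The asynchronous hand-off described in the last item can be sketched as follows. This is a minimal illustration, not Remus code: `AsyncReplicator`, `capture`, and `wait_for_acks` are hypothetical names, memory is modelled as a dict mapping page numbers to `bytes`, and the backup is a local dict standing in for a remote host.

```python
import queue
import threading


class AsyncReplicator:
    """Copies dirty pages aside, then ships them in the background (hypothetical sketch)."""

    def __init__(self):
        self.backup = {}   # stand-in for the backup host's memory image
        self.acked = []    # epochs the backup has acknowledged
        self._staged = queue.Queue()
        threading.Thread(target=self._sender, daemon=True).start()

    def capture(self, epoch, memory, dirty_pages):
        # Copy only the changed pages into a staging snapshot. Once this
        # returns, the primary may keep running and modifying `memory`.
        snapshot = {page: memory[page] for page in dirty_pages}
        self._staged.put((epoch, snapshot))

    def _sender(self):
        # Background transmission: apply each snapshot at the backup and
        # record its acknowledgment.
        while True:
            epoch, snapshot = self._staged.get()
            self.backup.update(snapshot)
            self.acked.append(epoch)
            self._staged.task_done()

    def wait_for_acks(self):
        # Block until every staged checkpoint has been acknowledged.
        self._staged.join()
```

Because `capture` copies the dirty pages before returning, a write the primary makes afterwards does not leak into the checkpoint that is still in flight.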

Remus's failure model provides the following properties:

   ●   The fail-stop failure of any single host is tolerable.

   ●   Should both the primary and backup hosts fail concurrently, the protected system's data
       will be left in a crash-consistent state.

   ●   No output will be made externally visible until the associated system state has been
       committed to the replica.

Remus uses a simple failure detector integrated into the checkpointing stream. If the backup
times out while responding to commit requests, the primary assumes that the backup has crashed
and disables protection. Similarly, if new checkpoints stop arriving from the primary before a
timeout expires, the backup assumes that the primary has crashed and resumes execution from the
most recent checkpoint.
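
The two timeout rules can be sketched as below. This is an assumption-laden illustration rather than the actual detector: the `Timeout` class, the action strings, and the choice to pass the clock in explicitly (so the logic is easy to test) are all hypothetical.

```python
class Timeout:
    """Tracks when a peer was last heard from (hypothetical sketch)."""

    def __init__(self, timeout_s, now=0.0):
        self.timeout_s = timeout_s
        self.last_seen = now

    def observe(self, now):
        # Call whenever a commit ack (on the primary) or a new
        # checkpoint (on the backup) arrives.
        self.last_seen = now

    def expired(self, now):
        return now - self.last_seen > self.timeout_s


def primary_action(ack_timer, now):
    # Primary side: missing commit acks mean the backup is presumed dead,
    # so the primary keeps running but without protection.
    return "disable_protection" if ack_timer.expired(now) else "keep_replicating"


def backup_action(checkpoint_timer, now):
    # Backup side: missing checkpoints mean the primary is presumed dead,
    # so the backup takes over from the last complete checkpoint.
    if checkpoint_timer.expired(now):
        return "resume_from_last_checkpoint"
    return "keep_waiting"
```

With a one-second timeout started at time 0, `primary_action(timer, 1.5)` yields `"disable_protection"` unless an ack was observed in the meantime.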

Remus also pipelines checkpoints. It uses an epoch-based system in which execution of the
active VM is bounded by brief pauses during which changed state is atomically captured;
external output is released once that state has been propagated to the backup.
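
To make the pipelining concrete, the sketch below computes a timeline for a few epochs. It is a simplified model under stated assumptions, not a measurement: the function `timeline`, the `Epoch` record, and the millisecond values used in the example are hypothetical, and the link to the backup is assumed to carry one checkpoint at a time.

```python
from dataclasses import dataclass


@dataclass
class Epoch:
    index: int
    start: int      # VM begins speculative execution for this epoch
    captured: int   # pause finished, changed state copied aside
    released: int   # backup has the state, buffered output released


def timeline(n_epochs, run_ms, capture_ms, replicate_ms):
    """Model epochs where execution overlaps replication of the previous epoch."""
    epochs = []
    start = 0
    link_free = 0  # time at which the replication link becomes idle
    for i in range(n_epochs):
        captured = start + run_ms + capture_ms
        send_start = max(captured, link_free)
        released = send_start + replicate_ms
        epochs.append(Epoch(i, start, captured, released))
        start = captured      # the VM resumes right after capture
        link_free = released  # next checkpoint waits for the link
    return epochs
```

With `timeline(3, 25, 1, 10)`, epoch 1 starts at 26 ms while epoch 0's output is only released at 36 ms, so execution and replication overlap. In this model a packet produced at the very start of an epoch waits for the rest of the epoch, the capture pause, and the replication time before it leaves the host, which is where the buffering overhead comes from.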


Lesson
High availability is possible through virtual machine replication using existing software and
running on commodity hardware. Remus performs frequent global checkpoints to replicate the
state of a single speculatively executing virtual machine.


Critique
This protection comes at the price of a small performance overhead, due to the network
buffering required to ensure consistent replication.
