Detecting and Correcting Transient Hardware Errors
John Byrne (john.l.byrne@hp.com), Norman P. Jouppi,
Laura Ramirez, Parthasarathy Ranganathan, Bruce J. Walker
HP Labs

Nidhi Aggarwal, Kewal K. Saluja, James E. Smith
University of Wisconsin – Madison




© 2009 Hewlett-Packard Development Company, L.P.
The information contained herein is subject to change without notice
Availability/Reliability Spectrum
• High Availability: restart/failover; lose ongoing work
• Fault Tolerance: no lost work; keep running, even with bad results
• Fault Correction: keep running with correct results




2      24 February 2009
Availability/Reliability Spectrum
• High Availability: restart VM; lose ongoing work
• Fault Tolerance: no lost work; keep running, even with bad results
• Fault Correction: keep running with correct results

Ongoing FT Work:
• Remus, Kemari: checkpoint and restart on complete node failure
• Marathon (lockstep execution): output comparison; pick one if they don’t match
• VMware (lockstep execution): backup takes over on complete node failure

Theme: Deal with Complete Node Failure

No one is detecting or correcting transient processor failures



Transient Hardware Errors
• The International Technology Roadmap for Semiconductors has predicted significant reliability problems
• An Intel study in 2005 indicated a 100-fold increase in transient faults in scaling from 180nm to 16nm

Errors can:
a. Crash the OS;
b. Corrupt data;
c. Cause execution to take a different path;


Goal: Detect and Correct Transient Errors


Detect and Correct Transient Errors
1. Lockstep VMs with ongoing checkpoints
2. Tee input to both VMs
3. Compare output from VMs and re-execute on miscompare
4. Compare checkpoints and re-execute on miscompare
5. Log input, interrupts and non-deterministic instructions to allow completely accurate re-execution
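
Steps 2 and 3 can be illustrated with a toy model. This is only a sketch under loud assumptions, not the Xen implementation described in the talk: `run_with_recheck`, `square` and the `fault_at` parameter are hypothetical names, the two "VMs" are plain function calls, and re-execution reruns the current input instead of restoring a checkpoint.

```python
from typing import Callable, Iterable, List


def run_with_recheck(step: Callable[[int, bool], int],
                     inputs: Iterable[int],
                     fault_at: int = -1) -> List[int]:
    """Feed every input to two replicas; release output only when both agree.

    step(value, inject_fault) stands in for running one replica on one input.
    fault_at is the index of the input at which replica B suffers a simulated,
    one-time bit flip (hypothetical fault model).
    """
    released = []
    for i, value in enumerate(inputs):
        out_a = step(value, False)
        out_b = step(value, i == fault_at)  # transient fault hits at most once
        if out_a != out_b:
            # Miscompare: re-execute both replicas. A transient fault should
            # not recur, so the second attempt is expected to agree.
            out_a = step(value, False)
            out_b = step(value, False)
            if out_a != out_b:
                raise RuntimeError("persistent mismatch; not a transient error")
        released.append(out_a)
    return released


def square(value: int, inject_fault: bool) -> int:
    """Toy workload: square the input, flipping the low bit on a fault."""
    result = value * value
    return result ^ 1 if inject_fault else result


clean = run_with_recheck(square, [1, 2, 3])
faulty = run_with_recheck(square, [1, 2, 3], fault_at=1)
```

Both runs release the same outputs: the corrupted result is never sent, because it is caught by the comparison and replaced by the re-executed value.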



Lockstep VMs
• Create 2 identical images at VM start time;
• Ensure response to each VMexit is identical;
    − Force them to be identical if necessary (rdtsc)
• Deliver each interrupt at the identical instruction in both VMs
    − At each VMexit, deliver interrupts if they are pending;
    − Use the count-down PMU counter to force a synchronization point if no VMexits happen, and deliver interrupts at that point.
• Log VMexit return values as necessary (for replay);
• Log when interrupts are delivered (for replay)
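
The interrupt-delivery rule above can be sketched as a small model. Everything here is an assumption made for illustration: `sync_points` and `delivery_log` are hypothetical helpers, instruction counts are plain integers, and the `countdown` argument stands in for the count-down PMU counter. The sketch also assumes both replicas see the interrupt become pending before the same synchronization point; how the real system guarantees that is not covered by the slides.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple


def sync_points(vmexits: List[int], countdown: int, total: int) -> List[int]:
    """Instruction counts at which pending interrupts may be delivered.

    A sync point is every VMexit, plus a forced point whenever `countdown`
    instructions retire without one.
    """
    points: List[int] = []
    last = 0
    exits = set(vmexits)
    for icount in range(1, total + 1):
        if icount in exits or icount - last == countdown:
            points.append(icount)
            last = icount
    return points


def delivery_log(arrivals: Dict[int, int],
                 points: List[int]) -> List[Tuple[int, int]]:
    """Map each pending interrupt to the next sync point at or after arrival.

    arrivals maps the instruction count at which an IRQ became pending to the
    IRQ number. The returned (instruction count, irq) pairs double as the
    replay log of when interrupts were delivered.
    """
    log = []
    for arrived_at, irq in sorted(arrivals.items()):
        idx = bisect_left(points, arrived_at)
        if idx < len(points):
            log.append((points[idx], irq))
    return log


points = sync_points(vmexits=[4, 30], countdown=10, total=40)
primary = delivery_log({7: 33}, points)    # IRQ 33 pending at instruction 7
secondary = delivery_log({9: 33}, points)  # same IRQ, seen slightly later
```

Although the interrupt becomes pending at different moments in the two replicas, both deliver it at the same forced synchronization point, so their instruction streams stay identical.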


I/O
• Input is sent to both VMs (network and disk);
    − Input is logged as being part of a specific checkpoint so on replay inputs can come from the log;
• Output is compared and, if equal, is sent out;
    − If not equal, re-execute from the last good checkpoint;
    − Blktap driver modified to allow disk output to be compared;
    − Network backend driver modified to allow network output to be compared;
    − Output is counted so on replay the correct number of re-created outputs can be discarded and not output twice.
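
A minimal sketch of the bookkeeping this slide describes, assuming a single in-memory object in place of the modified blktap and network backend drivers. `IOGate` and its method names are hypothetical and do not correspond to any Xen interface.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple


class IOGate:
    """Hypothetical I/O layer between the two VMs and the outside world."""

    def __init__(self) -> None:
        self.epoch = 0  # number of the current checkpoint interval
        self.input_log: DefaultDict[int, List[bytes]] = defaultdict(list)
        self.output_count: DefaultDict[int, int] = defaultdict(int)
        self.sent: List[bytes] = []  # what actually left the node

    def on_input(self, data: bytes) -> Tuple[bytes, bytes]:
        """Log the input under the current checkpoint, then tee it to both VMs."""
        self.input_log[self.epoch].append(data)
        return data, data

    def on_output(self, from_a: bytes, from_b: bytes) -> bool:
        """Release output only if both VMs produced identical bytes."""
        if from_a != from_b:
            return False  # caller must re-execute from the last good checkpoint
        self.sent.append(from_a)
        self.output_count[self.epoch] += 1  # counted so replay can discard it
        return True

    def on_checkpoint(self) -> None:
        """Start a new input log and a new output count."""
        self.epoch += 1


gate = IOGate()
to_a, to_b = gate.on_input(b"req1")
ok = gate.on_output(b"resp1", b"resp1")
bad = gate.on_output(b"resp2", b"resp3")  # simulated miscompare
```

Only the matching response is sent; the mismatching pair is held back and signals that a replay is needed.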

Checkpointing and Comparing
• Incremental, periodic checkpoints of each VM
    − At exactly the same instruction;
    − Utilize COW or copy at checkpoint time;
    − Mark the checkpoint event in the input and output streams;
    − Immediately continue execution after the checkpoint is done;
• After checkpoint X is done, compare the incremental checkpoints
    − If equal, delete one of the checkpoints;
    − If equal, delete checkpoint X-1 and any logs for X-1;
    − If not equal, then do a replay from checkpoint X-1;
• At checkpoint event, tell input to start a new log;
• At checkpoint event, tell output to record the output count and start a new count for the next checkpoint
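
The compare-and-prune rule can be sketched as follows. This is one possible reading, with assumptions stated plainly: an incremental checkpoint is modelled as a dictionary of pages dirtied since the previous checkpoint, "deleting checkpoint X-1" is modelled by folding the verified delta into a single last-good image, and `CheckpointStore` with its fields is a hypothetical name.

```python
from typing import Dict, List, Optional

Pages = Dict[int, bytes]  # page number -> contents dirtied since last checkpoint


class CheckpointStore:
    """Keeps only the last verified image and the logs still needed for replay."""

    def __init__(self, initial_image: Pages) -> None:
        self.good_id = 0                        # last verified checkpoint
        self.good_image: Pages = dict(initial_image)
        self.input_logs: Dict[int, List[bytes]] = {0: []}

    def commit(self, x: int, delta_a: Pages, delta_b: Pages) -> Optional[int]:
        """Compare the two incremental checkpoints taken at checkpoint x.

        Returns None when they match, otherwise the checkpoint to replay from.
        """
        if delta_a != delta_b:
            return self.good_id                 # replay from checkpoint X-1
        # Equal: keep one copy of the delta, which supersedes checkpoint X-1.
        self.good_image.update(delta_a)
        self.input_logs.pop(self.good_id, None)  # logs for X-1 no longer needed
        self.good_id = x
        self.input_logs[x] = []                 # new log for the next interval
        return None


store = CheckpointStore({0: b"aa", 1: b"bb"})
first = store.commit(1, {1: b"cc"}, {1: b"cc"})   # replicas agree
second = store.commit(2, {0: b"dd"}, {0: b"dX"})  # simulated corruption
```

After the mismatch at checkpoint 2, the store still holds the image verified at checkpoint 1, which is exactly the state a replay needs.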


Replay from Checkpoint X
• Restore the registers and memory image
• Tell input I/O to replay input from the log for checkpoint X
• Tell output that we are replaying from checkpoint X; it then throws away as many I/Os as were done since checkpoint X.
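
The three replay steps can be sketched as one function. As before this is a toy under stated assumptions: `replay` and `accumulate` are hypothetical names, the guest state is a single integer standing in for registers and memory, and each logged input produces exactly one output.

```python
from typing import Callable, List, Tuple

State = int  # stand-in for the registers plus memory image


def replay(checkpoint_state: State,
           input_log: List[int],
           outputs_already_sent: int,
           step: Callable[[State, int], Tuple[State, int]]) -> Tuple[State, List[int]]:
    """Re-execute from a checkpoint, suppressing outputs that were already sent.

    step(state, data) -> (new_state, output) models the guest consuming one
    input. Returns the final state and only the outputs still to be released.
    """
    state = checkpoint_state           # restore registers and memory image
    to_discard = outputs_already_sent  # count recorded since checkpoint X
    fresh: List[int] = []
    for data in input_log:             # inputs come from the log, not devices
        state, out = step(state, data)
        if to_discard > 0:
            to_discard -= 1            # re-created output: sent once already
        else:
            fresh.append(out)
    return state, fresh


def accumulate(state: State, data: int) -> Tuple[State, int]:
    """Toy guest: add the input to the state and output the running total."""
    new_state = state + data
    return new_state, new_state


final_state, fresh = replay(10, [1, 2, 3], 2, accumulate)
```

Two outputs had already left the node before the fault, so only the third re-created output is released and nothing is sent twice.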




Initial Limitations
• Uniprocessor VMs
• HVM guests
• Single-node implementation




Performance Data to Date
• Implementation to date includes lockstep VMs and disk I/O input funneling and output checking (network I/O is implemented but not yet tested)
• Bonnie I/O benchmark didn’t show any degradation;
• SPEC CPU benchmark suite showed 2-5% degradation.




Plans
• Implement the checkpoint and replay code
• Work on performance of lockstep and checkpoint
• Investigate UP (uniprocessor), HVM and single-site restrictions.




XS Oracle 2009 Error Detection

 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 

Kürzlich hochgeladen (20)

Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 

XS Oracle 2009 Error Detection

  • 1. Detecting and Correcting Transient Hardware Errors
      John Byrne (john.l.byrne@hp.com), Norman P. Jouppi, Laura Ramirez, Parthasarathy Ranganathan, Bruce J. Walker (HP Labs)
      Nidhi Aggarwal, Kewal K. Saluja, James E. Smith (University of Wisconsin – Madison)
      © 2009 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
  • 2. Availability/Reliability Spectrum (24 February 2009)
      • High Availability: restart/failover; lose ongoing work
      • Fault Tolerance: no lost work; keep running, even with bad results
      • Fault Correction: keep running with correct results
  • 3. Availability/Reliability Spectrum
      • High Availability: restart VM; lose ongoing work
      • Fault Tolerance: no lost work; keep running, even with bad results
      • Fault Correction: keep running with correct results
      • Ongoing FT work:
          − Remus, Kemari: checkpoint and restart on complete node failure
          − Marathon (lockstep execution): output comparison; pick one if they don't match
          − VMware (lockstep execution): backup takes over on complete node failure
  • 4. Availability/Reliability Spectrum
      • High Availability: restart VM; lose ongoing work
      • Fault Tolerance: no lost work; keep running, even with bad results
      • Fault Correction: keep running with correct results
      • Ongoing FT work:
          − Remus, Kemari: checkpoint and restart on complete node failure
          − Marathon (lockstep execution): output comparison; pick one if they don't match
          − VMware (lockstep execution): backup takes over on complete node failure
      • Theme: deal with complete node failure. No one is detecting or correcting transient processor failures.
  • 5. Transient Hardware Errors
      • The International Technology Roadmap for Semiconductors has predicted significant reliability problems
      • An Intel study in 2005 indicated a 100-fold increase in transient faults in scaling from 180nm to 16nm
      • Errors can:
          a. Crash the OS;
          b. Corrupt data;
          c. Cause execution to take a different path
      • Goal: Detect and Correct Transient Errors
  • 6. Detect and Correct Transient Errors
      1. Lockstep VMs with ongoing checkpoints
      2. Tee input to both VMs
      3. Compare output from VMs and re-execute on miscompare
      4. Compare checkpoints and re-execute on miscompare
      5. Log input, interrupts and non-deterministic instructions to allow completely accurate re-execution
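The compare-and-re-execute idea in steps 2 and 3 can be sketched with a toy model. This is only an illustration under simplifying assumptions, not the Xen implementation from the slides: the two "VMs" are plain Python callables, re-execution restarts a single step rather than a checkpoint, and `lockstep_run` and `FlakyReplica` are hypothetical names invented here.

```python
from typing import Callable, List


class FlakyReplica:
    """Hypothetical replica that squares its input and suffers one
    simulated transient fault (a single bit flip) on a chosen call."""

    def __init__(self, fail_on_call: int) -> None:
        self.calls = 0
        self.fail_on_call = fail_on_call

    def __call__(self, x: int) -> int:
        self.calls += 1
        result = x * x
        if self.calls == self.fail_on_call:
            result ^= 1  # flip the lowest bit once, then behave correctly
        return result


def lockstep_run(inputs: List[int],
                 replica_a: Callable[[int], int],
                 replica_b: Callable[[int], int],
                 max_retries: int = 3) -> List[int]:
    """Tee each input to both replicas; release output only if they agree."""
    released = []
    for value in inputs:
        for _attempt in range(max_retries + 1):
            out_a = replica_a(value)
            out_b = replica_b(value)
            if out_a == out_b:
                released.append(out_a)
                break
            # Miscompare: assume a transient fault and re-execute the step.
        else:
            raise RuntimeError("persistent miscompare; fault is not transient")
    return released


good = lambda x: x * x
flaky = FlakyReplica(fail_on_call=2)
results = lockstep_run([1, 2, 3], good, flaky)
```

With two replicas a miscompare shows that one of them is wrong but not which one, which is why the sketch re-executes both instead of picking a winner.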
  • 7. Lockstep VMs
      • Create 2 identical images at VM start time;
      • Ensure response to each VMexit is identical;
          − Force them to be identical if necessary (rdtsc)
      • Deliver interrupt at the identical instruction
          − At each VMexit, deliver interrupts if they are pending;
          − Use the count-down PMU counter to force a synchronization point if no VMexits happen and deliver interrupts at that point.
      • Log VMexit return values as necessary (for replay);
      • Log when interrupts are delivered (for replay)
  • 8. I/O
      • Input is sent to both VMs (network and disk);
          − Input is logged as being part of a specific checkpoint so on replay inputs can come from the log;
      • Output is compared and if equal, is sent out;
          − If not equal, re-execute from the last good checkpoint
          − Blktap driver modified to allow disk output to be compared
          − Network backend driver modified to allow network output to be compared;
          − Output is counted so on replay the correct number of re-created outputs can be discarded and not output twice.
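The output side of this slide (compare, send if equal, count so that replayed outputs are not sent twice) can be sketched as a small gate object. This is a minimal sketch, not the modified blktap or network backend code; `OutputGate` and its method names are hypothetical.

```python
class OutputGate:
    """Releases an output only when both replicas produced identical bytes,
    and counts released outputs per checkpoint interval."""

    def __init__(self) -> None:
        self.sent = []             # outputs actually released to the outside
        self.count_since_ckpt = 0  # outputs released since the last checkpoint
        self.to_discard = 0        # re-created outputs still to drop in replay

    def checkpoint(self) -> None:
        # A new checkpoint interval starts, so start a new count.
        self.count_since_ckpt = 0

    def begin_replay(self) -> None:
        # Everything already released since the checkpoint will be
        # re-created by the replay and must not go out a second time.
        self.to_discard = self.count_since_ckpt

    def submit(self, out_a: bytes, out_b: bytes) -> str:
        if out_a != out_b:
            return "miscompare"    # caller re-executes from the checkpoint
        if self.to_discard > 0:
            self.to_discard -= 1
            return "discarded"
        self.sent.append(out_a)
        self.count_since_ckpt += 1
        return "sent"


gate = OutputGate()
trace = [gate.submit(b"a", b"a"), gate.submit(b"b", b"b"),
         gate.submit(b"c", b"X")]      # third output miscompares
gate.begin_replay()
trace += [gate.submit(b"a", b"a"), gate.submit(b"b", b"b"),
          gate.submit(b"c", b"c")]     # replay re-creates a and b, then sends c
```

Counting is enough here only because replay is assumed to be deterministic, so the first N re-created outputs are the same N that were already sent.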
  • 9. Checkpointing and Comparing
      • Incremental, periodic checkpoints of each VM
          − At exactly the same instruction;
          − Utilize COW or copy at checkpoint time;
          − Mark the checkpoint event in the input and output streams;
          − Immediately continue execution after the checkpoint is done;
      • After checkpoint X is done, compare the incremental checkpoints
          − If equal, delete one of the checkpoints;
          − If equal, delete checkpoint X-1 and any logs for X-1;
          − If not equal, then do a replay from checkpoint X-1;
      • At checkpoint event, tell input to start a new log;
      • At checkpoint event, tell output to record the output count and start a new count for the next checkpoint
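The comparison step after checkpoint X can be sketched as follows. The slides only say the incremental checkpoints are compared; hashing the dirty pages is an assumption made here for brevity, and `incremental_digest`, `after_checkpoint` and the dirty-page dictionaries are hypothetical names.

```python
import hashlib
from typing import Dict, Set, Tuple


def incremental_digest(dirty_pages: Dict[int, bytes]) -> str:
    """Digest of an incremental checkpoint: the pages dirtied since the
    previous checkpoint, keyed by page number (fixed-size pages assumed)."""
    h = hashlib.sha256()
    for page_no in sorted(dirty_pages):
        h.update(page_no.to_bytes(8, "little"))
        h.update(dirty_pages[page_no])
    return h.hexdigest()


def after_checkpoint(x: int,
                     dirty_a: Dict[int, bytes],
                     dirty_b: Dict[int, bytes],
                     retained: Set[int]) -> Tuple[str, int]:
    """Compare checkpoint X of both VMs and decide what happens next."""
    if incremental_digest(dirty_a) == incremental_digest(dirty_b):
        retained.discard(x - 1)  # X-1 and its logs are no longer needed
        retained.add(x)          # keep a single copy of checkpoint X
        return ("continue", x)
    return ("replay", x - 1)     # fall back to the last good checkpoint


retained = {0}
first = after_checkpoint(1, {3: b"aa"}, {3: b"aa"}, retained)
second = after_checkpoint(2, {3: b"aa"}, {3: b"ab"}, retained)
```

Because execution continues while the comparison runs, checkpoint X-1 has to be retained until X has been verified; that is why it is deleted only on the equal branch.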
  • 10. Replay from Checkpoint X
      • Restore the registers and memory image
      • Tell input I/O to replay input from the log for checkpoint X
      • Tell output that we are replaying from checkpoint X; it then throws away as many I/Os as were done since checkpoint X.
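The replay sequence can be sketched with a toy VM whose whole state is one accumulator. This is a sketch under strong assumptions: exactly one output per input, state that fits in a dictionary, and hypothetical names (`ToyVM`, `replay_from_checkpoint`); a real replay would also reproduce the logged interrupts and VMexit return values.

```python
import copy
from typing import Dict, List


class ToyVM:
    """Hypothetical deterministic VM: adds each input to an accumulator
    and emits the running total as its output."""

    def __init__(self) -> None:
        self.state: Dict[str, int] = {"acc": 0}

    def consume(self, value: int) -> int:
        self.state["acc"] += value
        return self.state["acc"]


def replay_from_checkpoint(vm: ToyVM,
                           checkpoint_state: Dict[str, int],
                           input_log: List[int],
                           outputs_already_sent: int) -> List[int]:
    """Restore the checkpoint, feed the logged inputs, and return only the
    outputs that were not already sent before the replay began."""
    vm.state = copy.deepcopy(checkpoint_state)  # restore registers and memory
    fresh = []
    for i, value in enumerate(input_log):       # inputs come from the log
        out = vm.consume(value)
        if i >= outputs_already_sent:           # earlier outputs are dropped
            fresh.append(out)
    return fresh


vm = ToyVM()
fresh_outputs = replay_from_checkpoint(vm, {"acc": 10}, [1, 2, 3],
                                       outputs_already_sent=2)
```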
  • 11. Initial Limitations
      • Uniprocessor VMs
      • HVM guests
      • Single-node implementation
  • 12. Performance Data to Date
      • Implementation to date includes lockstep VMs and disk I/O input funneling and output checking (network I/O is implemented but not yet tested)
      • Bonnie I/O benchmark didn't show any degradation;
      • SPEC CPU benchmark suite showed 2-5% degradation.
  • 13. Plans
      • Implement the checkpoint and replay code
      • Work on performance of lockstep and checkpoint
      • Investigate UP, HVM and single-site restrictions.