SlideShare ist ein Scribd-Unternehmen logo
1 von 15
CprE 458/558: Real-Time Systems


                              Lecture 17
                  Fault-tolerant design techniques




CprE 458/558               G. Manimaran (ISU)
Fault Tolerant Strategies
    Fault tolerance in computer system is achieved through
     redundancy in hardware, software, information, and/or
     computations. Such redundancy can be implemented in
     static, dynamic, or hybrid configurations.
    Fault tolerance can be achieved by many techniques:
        Fault masking is any process that prevents faults in a system
         from introducing errors. Example: Error correcting memories and
         majority voting.
        Reconfiguration is the process of eliminating faulty component
         from a system and restoring the system to some operational state.




CprE 458/558                 G. Manimaran (ISU)                          2
Reconfiguration Approach
    Fault detection is the process of recognizing that a fault
     has occurred. Fault detection is often required before any
     recovery procedure can be initiated.
    Fault location is the process of determining where a fault
     has occurred so that an appropriate recovery can be
     initiated.
    Fault containment is the process of isolating a fault and
     preventing the effects of that fault from propagating
     throughout the system.
    Fault recovery is the process of remaining operational or
     regaining operational status via reconfiguration even in the
     presence of faults.

CprE 458/558             G. Manimaran (ISU)                     3
The Concept of Redundancy
    Redundancy is simply the addition of information,
     resources, or time beyond what is needed for normal
     system operation.
    Hardware redundancy is the addition of extra
     hardware, usually for the purpose either detecting or
     tolerating faults.
    Software redundancy is the addition of extra software,
     beyond what is needed to perform a given function, to
     detect and possibly tolerate faults.
    Information redundancy is the addition of extra
     information beyond that required to implement a given
     function; for example, error detection codes.


CprE 458/558            G. Manimaran (ISU)                    4
The Concept of Redundancy (Cont’d)
    Time redundancy uses additional time to perform the
     functions of a system such that fault detection and often
     fault tolerance can be achieved. Transient faults are
     tolerated by this.

    The use of redundancy can provide additional capabilities
     within a system. But, redundancy can have very important
     impact on a system's performance, size, weight, power
     consumption, and reliability.




CprE 458/558             G. Manimaran (ISU)                      5
Hardware Redundancy
    Passive techniques use the concept of fault masking.
     These techniques are designed to achieve fault tolerance
     without requiring any action on the part of the system.
     Relies on voting mechanisms.
    Active techniques achieve fault tolerance by detecting
     the existence of faults and performing some action to
     remove the faulty hardware from the system. That is,
     active techniques use fault detection, fault location, and
     fault recovery in an attempt to achieve fault tolerance.




CprE 458/558             G. Manimaran (ISU)                       6
Hardware Redundancy (Cont’d)


    Hybrid techniques combine the attractive features of
     both the passive
     and active approaches.
        Fault masking is used in hybrid systems to prevent erroneous
         results from being generated.
        Fault detection, location, and recovery are also used to improve
         fault tolerance by removing faulty hardware and replacing it with
         spares.




CprE 458/558                 G. Manimaran (ISU)                              7
Hardware Redundancy - A Taxonomy

     Title:
     hard-fault.fig
     Creator:
     fig2dev Version 3.1 Patchlevel 2
     Preview:
     This EPS picture was not saved
     with a preview included in it.
     Comment:
     This EPS picture will print to a
     PostScript printer, but not to
     other types of printers.




CprE 458/558                            G. Manimaran (ISU)   8
Triple Modular Redundancy (TMR)


          Title:
          tmr1.fig
          Creator:
          fig2dev Version 3.1 Patchlevel 2
          Preview:
          This EPS picture was not saved
          with a preview included in it.
          Comment:
          This EPS picture will print to a
          PostScript printer, but not to
          other types of printers.




CprE 458/558                                 G. Manimaran (ISU)   9
Software Redundancy - to Detect
        Software Faults
    There are two popular approaches: N-Version
     Programming (NVP) and Recovery Blocks (RB).

    NVP is a forward recovery scheme - it masks faults.
    NVP: multiple versions of the same task is executed
     concurrently.
    NVP relies on voting.

    RB is a backward error recovery scheme.
    RB: the versions of a task are executed serially.
    RB relies on acceptance test.

CprE 458/558              G. Manimaran (ISU)               10
N-Version Programming (NVP)
    NVP is based on the principle of design diversity, that is coding a
     software module by different teams of programmers, to have multiple
     versions.

    Diversity can also be introduced by employing different algorithms for
     obtaining the same solution or by choosing different programming
     languages.

    NVP can tolerate both hardware and software faults.

    Correlated faults are not tolerated by the NVP.

    In NVP, deciding the number of versions required to ensure acceptable
     levels of software reliability is an important design consideration.


CprE 458/558                  G. Manimaran (ISU)                          11
N-Version Programming (Cont’d)

          Title:
          nvp.fig
          Creator:
          fig2dev Version 3.1 Patchlevel 2
          Preview:
          This EPS picture was not saved
          with a preview included in it.
          Comment:
          This EPS picture will print to a
          PostScript printer, but not to
          other types of printers.




CprE 458/558                                 G. Manimaran (ISU)   12
Recovery Blocks (RB)
    RB uses multiple alternates (backups) to perform the same
     function; one module (task) is primary and the others are
     secondary.

    The primary task executes first. When the primary task
     completes execution, its outcome is checked by an
     acceptance test.

    If the output is not acceptable, a secondary task executes
     after undoing the effects of primary (i.e., rolling back to
     the state at which primary was invoked) until either an
     acceptable output is obtained or the alternates are
     exhausted.

CprE 458/558             G. Manimaran (ISU)                    13
Recovery Blocks (Cont’d)
         Title:
         rblocks.fig
         Creator:
         fig2dev Version 3.1 Patchlevel 2
         Preview:
         This EPS picture was not saved
         with a preview included in it.
         Comment:
         This EPS picture will print to a
         PostScript printer, but not to
         other types of printers.




CprE 458/558                                G. Manimaran (ISU)   14
Recovery Blocks (Cont’d)
    The acceptance tests are usually sanity checks; these
     consist of making sure that the output is within a certain
     acceptable range or that the output does not change at
     more than the allowed maximum rate.

    Selecting the range for acceptance test is crucial. If the
     allowed ranges are too small, the acceptance tests may
     label correct outputs as bad. If they are too large, the
     probability that incorrect outputs will be accepted is more.

    RB can tolerate software faults because the alternates are
     usually implemented with different approaches; RB is also
     known as Primary-Backup approach.

CprE 458/558              G. Manimaran (ISU)                      15

Weitere ähnliche Inhalte

Was ist angesagt?

Fault Tolerance System
Fault Tolerance SystemFault Tolerance System
Fault Tolerance SystemEhsan Ilahi
 
Physical and Logical Clocks
Physical and Logical ClocksPhysical and Logical Clocks
Physical and Logical ClocksDilum Bandara
 
Mobile Network Layer
Mobile Network LayerMobile Network Layer
Mobile Network LayerRahul Hada
 
Fault tolerance in distributed systems
Fault tolerance in distributed systemsFault tolerance in distributed systems
Fault tolerance in distributed systemssumitjain2013
 
Synchronization in distributed systems
Synchronization in distributed systems Synchronization in distributed systems
Synchronization in distributed systems SHATHAN
 
Foult Tolerence In Distributed System
Foult Tolerence In Distributed SystemFoult Tolerence In Distributed System
Foult Tolerence In Distributed SystemRajan Kumar
 
Deadlock in Distributed Systems
Deadlock in Distributed SystemsDeadlock in Distributed Systems
Deadlock in Distributed SystemsPritom Saha Akash
 
Communication costs in parallel machines
Communication costs in parallel machinesCommunication costs in parallel machines
Communication costs in parallel machinesSyed Zaid Irshad
 
8. mutual exclusion in Distributed Operating Systems
8. mutual exclusion in Distributed Operating Systems8. mutual exclusion in Distributed Operating Systems
8. mutual exclusion in Distributed Operating SystemsDr Sandeep Kumar Poonia
 
Inter Process Communication Presentation[1]
Inter Process Communication Presentation[1]Inter Process Communication Presentation[1]
Inter Process Communication Presentation[1]Ravindra Raju Kolahalam
 
distributed Computing system model
distributed Computing system modeldistributed Computing system model
distributed Computing system modelHarshad Umredkar
 

Was ist angesagt? (20)

Message passing in Distributed Computing Systems
Message passing in Distributed Computing SystemsMessage passing in Distributed Computing Systems
Message passing in Distributed Computing Systems
 
Multiple access protocol
Multiple access protocolMultiple access protocol
Multiple access protocol
 
Fault Tolerance System
Fault Tolerance SystemFault Tolerance System
Fault Tolerance System
 
TinyOS
TinyOSTinyOS
TinyOS
 
Physical and Logical Clocks
Physical and Logical ClocksPhysical and Logical Clocks
Physical and Logical Clocks
 
Framing in data link layer
Framing in data link layerFraming in data link layer
Framing in data link layer
 
The medium access sublayer
 The medium  access sublayer The medium  access sublayer
The medium access sublayer
 
Concurrency
ConcurrencyConcurrency
Concurrency
 
Mobile Network Layer
Mobile Network LayerMobile Network Layer
Mobile Network Layer
 
Stream oriented communication
Stream oriented communicationStream oriented communication
Stream oriented communication
 
Fault tolerance in distributed systems
Fault tolerance in distributed systemsFault tolerance in distributed systems
Fault tolerance in distributed systems
 
Synchronization in distributed systems
Synchronization in distributed systems Synchronization in distributed systems
Synchronization in distributed systems
 
Foult Tolerence In Distributed System
Foult Tolerence In Distributed SystemFoult Tolerence In Distributed System
Foult Tolerence In Distributed System
 
Mainframe systems
Mainframe systemsMainframe systems
Mainframe systems
 
Deadlock in Distributed Systems
Deadlock in Distributed SystemsDeadlock in Distributed Systems
Deadlock in Distributed Systems
 
Communication costs in parallel machines
Communication costs in parallel machinesCommunication costs in parallel machines
Communication costs in parallel machines
 
Distributed Mutual exclusion algorithms
Distributed Mutual exclusion algorithmsDistributed Mutual exclusion algorithms
Distributed Mutual exclusion algorithms
 
8. mutual exclusion in Distributed Operating Systems
8. mutual exclusion in Distributed Operating Systems8. mutual exclusion in Distributed Operating Systems
8. mutual exclusion in Distributed Operating Systems
 
Inter Process Communication Presentation[1]
Inter Process Communication Presentation[1]Inter Process Communication Presentation[1]
Inter Process Communication Presentation[1]
 
distributed Computing system model
distributed Computing system modeldistributed Computing system model
distributed Computing system model
 

Andere mochten auch

N-version programming
N-version programmingN-version programming
N-version programmingshabnam0102
 
Fault tolerance techniques for real time operating system
Fault tolerance techniques for real time operating systemFault tolerance techniques for real time operating system
Fault tolerance techniques for real time operating systemanujos25
 
Fault Tolerance System
Fault Tolerance SystemFault Tolerance System
Fault Tolerance Systemprakashjjaya
 
Fault tolerant presentation
Fault tolerant presentationFault tolerant presentation
Fault tolerant presentationskadyan1
 
Software Fault Tolerance
Software Fault ToleranceSoftware Fault Tolerance
Software Fault ToleranceAnkit Singh
 
Fault tolearant system
Fault tolearant systemFault tolearant system
Fault tolearant systemarvinthsaran
 
Main MeMory Data Base
Main MeMory Data BaseMain MeMory Data Base
Main MeMory Data BaseSiva Rushi
 
Fault Tolerance (Distributed computing)
Fault Tolerance (Distributed computing)Fault Tolerance (Distributed computing)
Fault Tolerance (Distributed computing)Sri Prasanna
 
Software engineering quality assurance and testing
Software engineering quality assurance and testingSoftware engineering quality assurance and testing
Software engineering quality assurance and testingBipul Roy Bpl
 
Fault tolerance and computing
Fault tolerance  and computingFault tolerance  and computing
Fault tolerance and computingPalani murugan
 
Real time database (MDARTS)
Real time database (MDARTS)Real time database (MDARTS)
Real time database (MDARTS)Pradeep Kumar TS
 
Fault management presentation
Fault management presentationFault management presentation
Fault management presentationardhita banu adji
 
Fault Management System (OSS)
Fault Management System (OSS)Fault Management System (OSS)
Fault Management System (OSS)Riswan
 
Be information technology2008course
Be information technology2008courseBe information technology2008course
Be information technology2008courseAnuj Sharma
 
Chapter 19 - Real Time Systems
Chapter 19 - Real Time SystemsChapter 19 - Real Time Systems
Chapter 19 - Real Time SystemsWayne Jones Jnr
 
Introduction to Real-Time Operating Systems
Introduction to Real-Time Operating SystemsIntroduction to Real-Time Operating Systems
Introduction to Real-Time Operating Systemscoolmirza143
 

Andere mochten auch (20)

N-version programming
N-version programmingN-version programming
N-version programming
 
Fault tolerance techniques for real time operating system
Fault tolerance techniques for real time operating systemFault tolerance techniques for real time operating system
Fault tolerance techniques for real time operating system
 
Fault tolerance
Fault toleranceFault tolerance
Fault tolerance
 
Fault Tolerance System
Fault Tolerance SystemFault Tolerance System
Fault Tolerance System
 
Fault tolerant presentation
Fault tolerant presentationFault tolerant presentation
Fault tolerant presentation
 
Software Fault Tolerance
Software Fault ToleranceSoftware Fault Tolerance
Software Fault Tolerance
 
Fault tolearant system
Fault tolearant systemFault tolearant system
Fault tolearant system
 
Main MeMory Data Base
Main MeMory Data BaseMain MeMory Data Base
Main MeMory Data Base
 
Fault Tolerance (Distributed computing)
Fault Tolerance (Distributed computing)Fault Tolerance (Distributed computing)
Fault Tolerance (Distributed computing)
 
Software engineering quality assurance and testing
Software engineering quality assurance and testingSoftware engineering quality assurance and testing
Software engineering quality assurance and testing
 
Fault tolerance and computing
Fault tolerance  and computingFault tolerance  and computing
Fault tolerance and computing
 
Vxworks
VxworksVxworks
Vxworks
 
Real time database (MDARTS)
Real time database (MDARTS)Real time database (MDARTS)
Real time database (MDARTS)
 
Fault management presentation
Fault management presentationFault management presentation
Fault management presentation
 
Fault Management System (OSS)
Fault Management System (OSS)Fault Management System (OSS)
Fault Management System (OSS)
 
Be information technology2008course
Be information technology2008courseBe information technology2008course
Be information technology2008course
 
Chapter 19 - Real Time Systems
Chapter 19 - Real Time SystemsChapter 19 - Real Time Systems
Chapter 19 - Real Time Systems
 
Ch21 real time software engineering
Ch21 real time software engineeringCh21 real time software engineering
Ch21 real time software engineering
 
DFD level-0 to 1
DFD level-0 to 1DFD level-0 to 1
DFD level-0 to 1
 
Introduction to Real-Time Operating Systems
Introduction to Real-Time Operating SystemsIntroduction to Real-Time Operating Systems
Introduction to Real-Time Operating Systems
 

Ähnlich wie Fault tolerance

1 introduction
1 introduction1 introduction
1 introductionhanmya
 
Implementing Vulnerability Management
Implementing Vulnerability Management Implementing Vulnerability Management
Implementing Vulnerability Management Argyle Executive Forum
 
Software rejuvenation
Software rejuvenationSoftware rejuvenation
Software rejuvenationRVCE
 
Software rejuvenation
Software rejuvenationSoftware rejuvenation
Software rejuvenationRVCE2
 
Software rejuvenation
Software rejuvenationSoftware rejuvenation
Software rejuvenationRVCE
 
Review Paper on Recovery of Data during Software Fault
Review Paper on Recovery of Data during Software FaultReview Paper on Recovery of Data during Software Fault
Review Paper on Recovery of Data during Software FaultAM Publications
 
SOLUTION MANUAL OF OPERATING SYSTEM CONCEPTS BY ABRAHAM SILBERSCHATZ, PETER B...
SOLUTION MANUAL OF OPERATING SYSTEM CONCEPTS BY ABRAHAM SILBERSCHATZ, PETER B...SOLUTION MANUAL OF OPERATING SYSTEM CONCEPTS BY ABRAHAM SILBERSCHATZ, PETER B...
SOLUTION MANUAL OF OPERATING SYSTEM CONCEPTS BY ABRAHAM SILBERSCHATZ, PETER B...vtunotesbysree
 
Systematic Model based Testing with Coverage Analysis
Systematic Model based Testing with Coverage AnalysisSystematic Model based Testing with Coverage Analysis
Systematic Model based Testing with Coverage AnalysisIDES Editor
 
Presentation
PresentationPresentation
Presentations1150056
 
Presentation
PresentationPresentation
Presentations1150056
 
DEF CON 27 - ALI ISLAM and DAN REGALADO WEAPONIZING HYPERVISORS
DEF CON 27 - ALI ISLAM and DAN REGALADO WEAPONIZING HYPERVISORSDEF CON 27 - ALI ISLAM and DAN REGALADO WEAPONIZING HYPERVISORS
DEF CON 27 - ALI ISLAM and DAN REGALADO WEAPONIZING HYPERVISORSFelipe Prado
 
Libckpt transparent checkpointing under unix
Libckpt transparent checkpointing under unixLibckpt transparent checkpointing under unix
Libckpt transparent checkpointing under unixZongYing Lyu
 
02. Fault Tolerance Pattern 위한 mindset
02. Fault Tolerance Pattern 위한 mindset02. Fault Tolerance Pattern 위한 mindset
02. Fault Tolerance Pattern 위한 mindseteva
 
Memory Management in OS
Memory Management in OSMemory Management in OS
Memory Management in OSvampugani
 
Business Continuity Knowledge Share
Business Continuity Knowledge ShareBusiness Continuity Knowledge Share
Business Continuity Knowledge Share.Gastón. .Bx.
 
A Brief Review Of Approaches For Fault Tolerance In Distributed Systems
A Brief Review Of Approaches For Fault Tolerance In Distributed SystemsA Brief Review Of Approaches For Fault Tolerance In Distributed Systems
A Brief Review Of Approaches For Fault Tolerance In Distributed SystemsIRJET Journal
 

Ähnlich wie Fault tolerance (20)

1 introduction
1 introduction1 introduction
1 introduction
 
Implementing Vulnerability Management
Implementing Vulnerability Management Implementing Vulnerability Management
Implementing Vulnerability Management
 
Software rejuvenation
Software rejuvenationSoftware rejuvenation
Software rejuvenation
 
Software rejuvenation
Software rejuvenationSoftware rejuvenation
Software rejuvenation
 
Software rejuvenation
Software rejuvenationSoftware rejuvenation
Software rejuvenation
 
A Practical Fault Tolerance Approach in Cloud Computing Using Support Vector ...
A Practical Fault Tolerance Approach in Cloud Computing Using Support Vector ...A Practical Fault Tolerance Approach in Cloud Computing Using Support Vector ...
A Practical Fault Tolerance Approach in Cloud Computing Using Support Vector ...
 
Review Paper on Recovery of Data during Software Fault
Review Paper on Recovery of Data during Software FaultReview Paper on Recovery of Data during Software Fault
Review Paper on Recovery of Data during Software Fault
 
SOLUTION MANUAL OF OPERATING SYSTEM CONCEPTS BY ABRAHAM SILBERSCHATZ, PETER B...
SOLUTION MANUAL OF OPERATING SYSTEM CONCEPTS BY ABRAHAM SILBERSCHATZ, PETER B...SOLUTION MANUAL OF OPERATING SYSTEM CONCEPTS BY ABRAHAM SILBERSCHATZ, PETER B...
SOLUTION MANUAL OF OPERATING SYSTEM CONCEPTS BY ABRAHAM SILBERSCHATZ, PETER B...
 
Systematic Model based Testing with Coverage Analysis
Systematic Model based Testing with Coverage AnalysisSystematic Model based Testing with Coverage Analysis
Systematic Model based Testing with Coverage Analysis
 
580 584
580 584580 584
580 584
 
Presentation
PresentationPresentation
Presentation
 
Presentation
PresentationPresentation
Presentation
 
DEF CON 27 - ALI ISLAM and DAN REGALADO WEAPONIZING HYPERVISORS
DEF CON 27 - ALI ISLAM and DAN REGALADO WEAPONIZING HYPERVISORSDEF CON 27 - ALI ISLAM and DAN REGALADO WEAPONIZING HYPERVISORS
DEF CON 27 - ALI ISLAM and DAN REGALADO WEAPONIZING HYPERVISORS
 
Techno-Fest-15nov16
Techno-Fest-15nov16Techno-Fest-15nov16
Techno-Fest-15nov16
 
Libckpt transparent checkpointing under unix
Libckpt transparent checkpointing under unixLibckpt transparent checkpointing under unix
Libckpt transparent checkpointing under unix
 
02. Fault Tolerance Pattern 위한 mindset
02. Fault Tolerance Pattern 위한 mindset02. Fault Tolerance Pattern 위한 mindset
02. Fault Tolerance Pattern 위한 mindset
 
Memory Management in OS
Memory Management in OSMemory Management in OS
Memory Management in OS
 
Ch20
Ch20Ch20
Ch20
 
Business Continuity Knowledge Share
Business Continuity Knowledge ShareBusiness Continuity Knowledge Share
Business Continuity Knowledge Share
 
A Brief Review Of Approaches For Fault Tolerance In Distributed Systems
A Brief Review Of Approaches For Fault Tolerance In Distributed SystemsA Brief Review Of Approaches For Fault Tolerance In Distributed Systems
A Brief Review Of Approaches For Fault Tolerance In Distributed Systems
 

Kürzlich hochgeladen

ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.MaryamAhmad92
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docxPoojaSen20
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Association for Project Management
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17Celine George
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfSherif Taha
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSCeline George
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptxMaritesTamaniVerdade
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17Celine George
 
Magic bus Group work1and 2 (Team 3).pptx
Magic bus Group work1and 2 (Team 3).pptxMagic bus Group work1and 2 (Team 3).pptx
Magic bus Group work1and 2 (Team 3).pptxdhanalakshmis0310
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentationcamerronhm
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxnegromaestrong
 

Kürzlich hochgeladen (20)

ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
Spatium Project Simulation student brief
Spatium Project Simulation student briefSpatium Project Simulation student brief
Spatium Project Simulation student brief
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Asian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptxAsian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptx
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Magic bus Group work1and 2 (Team 3).pptx
Magic bus Group work1and 2 (Team 3).pptxMagic bus Group work1and 2 (Team 3).pptx
Magic bus Group work1and 2 (Team 3).pptx
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 

Fault tolerance

  • 1. CprE 458/558: Real-Time Systems Lecture 17 Fault-tolerant design techniques CprE 458/558 G. Manimaran (ISU)
  • 2. Fault Tolerant Strategies  Fault tolerance in computer system is achieved through redundancy in hardware, software, information, and/or computations. Such redundancy can be implemented in static, dynamic, or hybrid configurations.  Fault tolerance can be achieved by many techniques:  Fault masking is any process that prevents faults in a system from introducing errors. Example: Error correcting memories and majority voting.  Reconfiguration is the process of eliminating faulty component from a system and restoring the system to some operational state. CprE 458/558 G. Manimaran (ISU) 2
  • 3. Reconfiguration Approach  Fault detection is the process of recognizing that a fault has occurred. Fault detection is often required before any recovery procedure can be initiated.  Fault location is the process of determining where a fault has occurred so that an appropriate recovery can be initiated.  Fault containment is the process of isolating a fault and preventing the effects of that fault from propagating throughout the system.  Fault recovery is the process of remaining operational or regaining operational status via reconfiguration even in the presence of faults. CprE 458/558 G. Manimaran (ISU) 3
  • 4. The Concept of Redundancy  Redundancy is simply the addition of information, resources, or time beyond what is needed for normal system operation.  Hardware redundancy is the addition of extra hardware, usually for the purpose either detecting or tolerating faults.  Software redundancy is the addition of extra software, beyond what is needed to perform a given function, to detect and possibly tolerate faults.  Information redundancy is the addition of extra information beyond that required to implement a given function; for example, error detection codes. CprE 458/558 G. Manimaran (ISU) 4
  • 5. The Concept of Redundancy (Cont’d)  Time redundancy uses additional time to perform the functions of a system such that fault detection and often fault tolerance can be achieved. Transient faults are tolerated by this.  The use of redundancy can provide additional capabilities within a system. But, redundancy can have very important impact on a system's performance, size, weight, power consumption, and reliability. CprE 458/558 G. Manimaran (ISU) 5
  • 6. Hardware Redundancy  Passive techniques use the concept of fault masking. These techniques are designed to achieve fault tolerance without requiring any action on the part of the system. Relies on voting mechanisms.  Active techniques achieve fault tolerance by detecting the existence of faults and performing some action to remove the faulty hardware from the system. That is, active techniques use fault detection, fault location, and fault recovery in an attempt to achieve fault tolerance. CprE 458/558 G. Manimaran (ISU) 6
  • 7. Hardware Redundancy (Cont’d)  Hybrid techniques combine the attractive features of both the passive and active approaches.  Fault masking is used in hybrid systems to prevent erroneous results from being generated.  Fault detection, location, and recovery are also used to improve fault tolerance by removing faulty hardware and replacing it with spares. CprE 458/558 G. Manimaran (ISU) 7
  • 8. Hardware Redundancy - A Taxonomy Title: hard-fault.fig Creator: fig2dev Version 3.1 Patchlevel 2 Preview: This EPS picture was not saved with a preview included in it. Comment: This EPS picture will print to a PostScript printer, but not to other types of printers. CprE 458/558 G. Manimaran (ISU) 8
  • 9. Triple Modular Redundancy (TMR) Title: tmr1.fig Creator: fig2dev Version 3.1 Patchlevel 2 Preview: This EPS picture was not saved with a preview included in it. Comment: This EPS picture will print to a PostScript printer, but not to other types of printers. CprE 458/558 G. Manimaran (ISU) 9
  • 10. Software Redundancy - to Detect Software Faults  There are two popular approaches: N-Version Programming (NVP) and Recovery Blocks (RB).  NVP is a forward recovery scheme - it masks faults.  NVP: multiple versions of the same task is executed concurrently.  NVP relies on voting.  RB is a backward error recovery scheme.  RB: the versions of a task are executed serially.  RB relies on acceptance test. CprE 458/558 G. Manimaran (ISU) 10
  • 11. N-Version Programming (NVP)  NVP is based on the principle of design diversity, that is coding a software module by different teams of programmers, to have multiple versions.  Diversity can also be introduced by employing different algorithms for obtaining the same solution or by choosing different programming languages.  NVP can tolerate both hardware and software faults.  Correlated faults are not tolerated by the NVP.  In NVP, deciding the number of versions required to ensure acceptable levels of software reliability is an important design consideration. CprE 458/558 G. Manimaran (ISU) 11
  • 12. N-Version Programming (Cont’d) Title: nvp.fig Creator: fig2dev Version 3.1 Patchlevel 2 Preview: This EPS picture was not saved with a preview included in it. Comment: This EPS picture will print to a PostScript printer, but not to other types of printers. CprE 458/558 G. Manimaran (ISU) 12
  • 13. Recovery Blocks (RB)  RB uses multiple alternates (backups) to perform the same function; one module (task) is primary and the others are secondary.  The primary task executes first. When the primary task completes execution, its outcome is checked by an acceptance test.  If the output is not acceptable, a secondary task executes after undoing the effects of primary (i.e., rolling back to the state at which primary was invoked) until either an acceptable output is obtained or the alternates are exhausted. CprE 458/558 G. Manimaran (ISU) 13
  • 14. Recovery Blocks (Cont’d) Title: rblocks.fig Creator: fig2dev Version 3.1 Patchlevel 2 Preview: This EPS picture was not saved with a preview included in it. Comment: This EPS picture will print to a PostScript printer, but not to other types of printers. CprE 458/558 G. Manimaran (ISU) 14
  • 15. Recovery Blocks (Cont’d)  The acceptance tests are usually sanity checks; these consist of making sure that the output is within a certain acceptable range or that the output does not change at more than the allowed maximum rate.  Selecting the range for acceptance test is crucial. If the allowed ranges are too small, the acceptance tests may label correct outputs as bad. If they are too large, the probability that incorrect outputs will be accepted is more.  RB can tolerate software faults because the alternates are usually implemented with different approaches; RB is also known as Primary-Backup approach. CprE 458/558 G. Manimaran (ISU) 15