Software Architecture


Quality Attributes & Tactics (4)
Availability




  Vakgroep Informatietechnologie – IBCN
Availability

Availability is about system failure and its consequences.

  Faults & failures:
        Faults become failures if not corrected or masked.
        A failure is observable by the system user; a fault is not.
  Areas of concern:
        Fault detection and frequency
        Reduced operations
        Recovery and prevention

  Availability = MTBF / (MTBF + MTTR)
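
As a worked example of the formula above, here is a minimal sketch, where MTBF is the mean time between failures and MTTR is the mean time to repair. The function names, the choice of hours as the unit, and the sample figures are assumptions made for illustration only.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability = MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_hours_per_year(avail: float, hours_per_year: float = 8760.0) -> float:
    """Expected downtime implied by an availability figure."""
    return (1.0 - avail) * hours_per_year


# Hypothetical figures: one failure every 999 hours, one hour to repair.
a = availability(mtbf_hours=999.0, mttr_hours=1.0)
print(f"availability = {a:.3f}")
print(f"expected downtime = {downtime_hours_per_year(a):.2f} h/year")
```

The sketch makes the two levers visible: availability rises either by failing less often (larger MTBF) or by repairing faster (smaller MTTR).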
   Vakgroep Informatietechnologie – Onderzoeksgroep IBCN            p. 2
Availability Generic Scenario




Availability generic scenario (1/4)

Source of stimulus: who or what?
      We differentiate between internal and external indications of faults or
      failure, since the desired system response may be different.

Stimulus: does something?
A fault of one of the following classes occurs.
      Omission. A component fails to respond to an input.
      Crash. The component repeatedly suffers omission faults.
      Timing. A component responds, but the response is early or late.
      Bad response. A component responds with an incorrect value.

Artifact: to the system or part of it?
This specifies the resource that is required to be highly available:
      Processor
      Communication channel
      Process
      Storage
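
The four fault classes listed under the stimulus can be told apart from what an observer sees of a single request. The sketch below is one possible classifier, not a definitive one: the `classify` function, its parameters, and the threshold of three consecutive omissions for declaring a crash are all assumptions.

```python
from enum import Enum
from typing import Any, Optional


class FaultClass(Enum):
    OMISSION = "omission"
    CRASH = "crash"
    TIMING = "timing"
    BAD_RESPONSE = "bad response"


def classify(
    response: Any,
    expected: Any,
    elapsed_s: float,
    min_s: float,
    max_s: float,
    prior_omissions: int = 0,
    crash_threshold: int = 3,  # assumed: this many omissions in a row count as a crash
) -> Optional[FaultClass]:
    """Return the observed fault class, or None if the response looks healthy."""
    if response is None:
        # No answer at all: an omission, or a crash if it keeps happening.
        if prior_omissions + 1 >= crash_threshold:
            return FaultClass.CRASH
        return FaultClass.OMISSION
    if elapsed_s < min_s or elapsed_s > max_s:
        return FaultClass.TIMING  # answered, but too early or too late
    if response != expected:
        return FaultClass.BAD_RESPONSE  # answered on time with a wrong value
    return None


print(classify(None, 42, 0.0, 0.1, 1.0))
print(classify(42, 42, 2.0, 0.1, 1.0))
print(classify(41, 42, 0.5, 0.1, 1.0))
```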

Availability generic scenario (2/4)




Environment: under certain conditions
The state of the system affects the desired system response.
   Normal mode: if this is the first fault observed, some degradation of
    response time or function may be preferred.
   Degraded mode: if the system has already seen some faults, it may
    be desirable to shut it down totally.
   Overload mode:




Availability generic scenario (3/4)


Response: how the system reacts?
The system should detect the event and do one or more of the following:
      Record it
      Notify appropriate parties, including the user and other systems
      Disable sources of events that cause fault or failure,
       according to defined rules
      Be unavailable for a specified interval, where the interval
       depends on the criticality of the system
      Continue to operate in normal or degraded mode



Availability generic scenario (4/4)

Response measure: how can you measure this?
   Time interval when the system must be available
   Availability time
   Time interval in which system can be in degraded mode
   Repair time




Availability Specific Scenario

“An unanticipated external message (DoS attack) is
received by a process during normal operation. The
process logs the receipt of the message, notifies the
operator, and continues with no downtime.”
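
To show how the specific scenario above maps onto the six parts of the generic scenario, the following sketch records it as a small data structure. The `QualityAttributeScenario` class and its field names are assumptions introduced here; the field values paraphrase the scenario text.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class QualityAttributeScenario:
    source: str            # who or what
    stimulus: str          # does something
    artifact: str          # to the system or part of it
    environment: str       # under certain conditions
    response: str          # how the system reacts
    response_measure: str  # how you can measure this


dos_scenario = QualityAttributeScenario(
    source="external to the system",
    stimulus="unanticipated message (DoS attack)",
    artifact="process",
    environment="normal operation",
    response="log the message, notify the operator, continue to operate",
    response_measure="no downtime",
)

for part, value in asdict(dos_scenario).items():
    print(f"{part:17s} {value}")
```

Writing a scenario this way makes a missing part obvious: a requirement that cannot fill all six fields is not yet testable.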




Case: Digital Signage – Public Transport

Availability QAS:

SOURCE        who or what                   A random event
STIMULUS      does something                ... causes a failure
ARTIFACT      to the system or part of it   ... to the communication system
ENVIRONMENT   under certain conditions      ... during normal operations
RESPONSE      how the system reacts         All displays must start showing scheduled arrival times for all buses
MEASURE       how you can measure this      ... within 30 seconds of failure detection

Q: What is the architectural impact of this requirement?

Availability Tactics
   Fault  ->  Tactics to control availability  ->  Fault masked or repaired

   Fault detection
    Ping/echo
    Heartbeat
    Exceptions
   Fault recovery
    Preparing for recovery
    Accomplishing the recovery
   Fault prevention
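
Of the fault detection tactics listed above, the heartbeat is easy to sketch: each component reports periodically, and a monitor suspects any component whose last report is older than a timeout. The `HeartbeatMonitor` class, the component names, and the two-second timeout are assumptions; the clock is injectable so the example runs deterministically.

```python
import time
from typing import Callable, Dict, List


class HeartbeatMonitor:
    """Tracks the last heartbeat per component and flags the silent ones."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self._timeout_s = timeout_s
        self._clock = clock
        self._last_seen: Dict[str, float] = {}

    def beat(self, component: str) -> None:
        """Called by (or on behalf of) a component to say it is alive."""
        self._last_seen[component] = self._clock()

    def suspected_faulty(self) -> List[str]:
        """Components whose last heartbeat is older than the timeout."""
        now = self._clock()
        return sorted(
            name for name, seen in self._last_seen.items()
            if now - seen > self._timeout_s
        )


# Simulated time instead of real sleeping.
fake_now = [0.0]
monitor = HeartbeatMonitor(timeout_s=2.0, clock=lambda: fake_now[0])
monitor.beat("display-1")
monitor.beat("display-2")
fake_now[0] = 1.5
monitor.beat("display-1")          # display-2 stays silent
fake_now[0] = 3.0
print(monitor.suspected_faulty())  # ['display-2']
```

With ping/echo the direction is reversed: the monitor sends the request and waits for the echo, rather than waiting passively for a report.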
Fault Recovery Tactics (1/4)
    Voting tactic:
          Processes running on redundant processors each take the
           input, compute, and report the results to the “vote-counter.”
               Majority rules
               Preferred component

    Preferred component:
               This corrects faulty operation of components, algorithms, or
                processors.
               The more severe the consequences of failure, the more stringent
                the effort to ensure that the redundancy is independent.
                 –    Separate processors, separate implementation teams, … dissimilar
                      platforms
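
The vote-counter described above can be sketched in a few lines. The `majority_vote` function implements "majority rules"; `preferred_component_vote` is only one plausible reading of "preferred component" (fall back to a designated replica when no majority exists) and should be treated as an assumption, as should all names used here.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence


def majority_vote(results: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value reported by a strict majority, or None if there is none."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(results) / 2 else None


def preferred_component_vote(results: Sequence[Hashable], preferred_index: int = 0) -> Hashable:
    """Assumed policy: use the majority if one exists, else trust the preferred replica."""
    if not results:
        raise ValueError("no results to vote on")
    winner = majority_vote(results)
    return winner if winner is not None else results[preferred_index]


def deviants(results: Sequence[Hashable]) -> List[int]:
    """Indices of replicas that disagree with the majority (candidates to fail)."""
    winner = majority_vote(results)
    if winner is None:
        return []
    return [i for i, r in enumerate(results) if r != winner]


print(majority_vote([7, 7, 9]))             # 7
print(deviants([7, 7, 9]))                  # [2]
print(preferred_component_vote([1, 2, 3]))  # 1
```

Voting masks a fault only when the replicas fail independently, which is why the slide stresses separate processors, teams, and platforms.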




Fault Recovery Tactics (2/4)

   Active redundancy (hot restart):
        All redundant components respond to events in parallel.
        Redundant components are synchronized at start; the first
         to return provides the answer.
        This covers some faults: a faulty processor will be
         slower to respond.
        When a failure occurs, the downtime is usually only
         milliseconds (switching to another component).
        Often used in client-server applications involving
         back-end databases.
        In high availability for LANs, the redundancy may be
         separate paths, so that failure of a bridge or router is not
         fatal. Note the synchronization demands here.
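
The "first to return is the answer" rule of active redundancy can be sketched with threads standing in for redundant components. This is a simplified illustration under stated assumptions: the `first_response` function and the three toy replicas are hypothetical, and real replicas would also need their state kept in sync.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Sequence


def first_response(replicas: Sequence[Callable[[str], str]], event: str) -> str:
    """Send the event to all replicas in parallel and return the first good answer."""
    pool = ThreadPoolExecutor(max_workers=len(replicas))
    try:
        futures = [pool.submit(replica, event) for replica in replicas]
        for future in as_completed(futures):
            if future.exception() is None:
                return future.result()
        raise RuntimeError("all replicas failed")
    finally:
        # Do not wait for slow replicas; their late answers are simply ignored.
        pool.shutdown(wait=False)


def healthy(event: str) -> str:
    time.sleep(0.01)
    return event.upper()


def slow(event: str) -> str:
    time.sleep(0.2)  # a faulty processor is slower to respond
    return event.upper()


def crashed(event: str) -> str:
    raise ConnectionError("replica down")


print(first_response([crashed, slow, healthy], "ping"))  # PING
```

Because every replica processes every event, a failure costs only the time needed to pick the next answer, at the price of running all replicas all the time.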



Fault Recovery Tactics (3/4)
   Passive redundancy:
        One component responds to events and informs the standbys
         of state updates.
   Upon failure the system must:
        Ensure that the backup is sufficiently fresh.
        Restart points, checkpoints, log points ???
        Remap the system to switch which component is the active
         one.
   Often used in control systems
        Example: air traffic control
             Chapter 6: Air Traffic Control: A Case Study in
              Designing for High Availability
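
A minimal sketch of passive redundancy follows: only the primary handles events, it pushes each state update to the standbys, and on failure a live standby is promoted. The class names and the flight-status data are assumptions. Propagating every update synchronously is a simplification that keeps the standbys perfectly fresh; a real system would trade freshness against cost using checkpoints or logs.

```python
from typing import Dict, Iterable, List


class Replica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.state: Dict[str, str] = {}
        self.alive = True


class PassiveRedundancyGroup:
    """One active primary plus standbys that receive its state updates."""

    def __init__(self, primary: Replica, standbys: Iterable[Replica]) -> None:
        self.primary = primary
        self.standbys: List[Replica] = list(standbys)

    def handle(self, key: str, value: str) -> None:
        if not self.primary.alive:
            self._fail_over()
        self.primary.state[key] = value
        for standby in self.standbys:
            if standby.alive:
                standby.state[key] = value  # inform the standbys of the update

    def _fail_over(self) -> None:
        """Remap the system: promote the first live standby to primary."""
        for i, standby in enumerate(self.standbys):
            if standby.alive:
                self.primary = self.standbys.pop(i)
                return
        raise RuntimeError("no live standby available")


group = PassiveRedundancyGroup(Replica("primary"), [Replica("standby-1"), Replica("standby-2")])
group.handle("flight-1", "climbing")
group.primary.alive = False          # simulate a failure of the active component
group.handle("flight-2", "landing")
print(group.primary.name, group.primary.state)
```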




Fault Recovery Tactics (4/4)
   Switchovers:
        Upon failure or periodic
   Synchronization:
        Is the responsibility of the primary component, which broadcasts
         synchronization signals to the redundant components.




Fault Prevention Tactics

   Removal from service
     To perform some preventive actions, e.g.,
      rebooting to prevent slow memory leaks from
      causing problems
   Transactions
     The bundling of a sequence of steps so that
      they can be done (or undone) all at once
   Process monitor
     Once a fault in a process is detected:
      remove, reinstantiate, and reinitialize its state
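
The transaction tactic can be illustrated with Python's built-in `sqlite3` module, whose connection object commits on success and rolls back on an exception when used as a context manager. The `accounts` table, the `transfer` function, and the balances are hypothetical; the point is only that a half-finished sequence of steps never becomes visible.

```python
import sqlite3


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move money between accounts as one unit; undo everything on failure."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
        return True
    except ValueError:
        return False


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()

print(transfer(conn, "a", "b", 30), balances(conn))
print(transfer(conn, "a", "b", 500), balances(conn))  # second transfer is undone
```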


Availability Tactics Hierarchy
   Fault arrives  ->  Availability tactics  ->  Fault masked or repaired

   Fault detection
        Ping/echo
        Heartbeat
        Exception
   Recovery: preparation and repair
        Voting
        Active redundancy
        Passive redundancy
        Spare
   Recovery: reintroduction
        Shadow
        State resynchronization
        Rollback
   Prevention
        Removal from service
        Transactions
        Process monitor




Editor's notes

  1. Issues with ping/echo and heartbeat: they measure only “are you alive”. The functionality is simple, but consider: 1) response time under high load, 2) capacity of the ping server, 3) availability of the communication channel. Complexity: there is a trade-off with performance, which depends on the period and on the data content of the messages.