
Dependable Systems - Dependability Attributes (5/16)


This unit of the Dependable Systems course covers the typical ways to quantify and assess dependability as attributes, such as reliability and safety.

Published in: Education

  1. Dependable Systems: Dependability Attributes
     Dr. Peter Tröger
     Sources:
     • J.C. Laprie: Dependability: Basic Concepts and Terminology.
     • Eusgeld, Irene et al.: Dependability Metrics. 4909. Springer Publishing, 2008.
     • Echtle, Klaus: Fehlertoleranzverfahren. Heidelberg, Germany: Springer Verlag, 1990.
     • Pfister, Gregory F.: High Availability. In: In Search of Clusters, pp. 379-452.
  2. Dependability (Dependable Systems Course PT 2014)
     • Umbrella term for operational requirements on a system
     • IFIP WG 10.4: "[..] the trustworthiness of a computing system which allows reliance to be justifiably placed on the service it delivers [..]"
     • IEC IEV: "dependability (is) the collective term used to describe the availability performance and its influencing factors: reliability performance, maintainability performance and maintenance support performance"
     • Laprie: "Trustworthiness of a computer system such that reliance can be placed on the service it delivers to the user"
     • Adds a third dimension to system quality
     • General question: How to deal with unexpected events?
     • In German: 'Verlässlichkeit' (dependability) vs. 'Zuverlässigkeit' (reliability)
  3. Dependability Tree (Laprie) [figure]
  4. Attributes of Dependability
     • Non-functional attributes such as reliability and maintainability
     • Complementary nature of viewpoints in the area of dependability
     • In comparison to functional properties:
       • ... hard to define
       • ... hard to abstract
       • ... 'divide and conquer' does not work as well
       • ... difficult interrelationships
       • ... often probabilistic dependencies
  5. Attributes of Dependability
     • Reliability ('Zuverlässigkeit') - Continuity of service
       • Initial goal for computer system trustworthiness
       • Other disciplines have a different understanding
       • "Reliability is not doing the wrong thing." [Gray85]
       • "Reliability: Ability of a system or component to perform its required functions under stated conditions for a specified period of time" [IEEE]
       • "Reliability is the probability that an item will not fail." [Misra]
     • Availability ('Verfügbarkeit') - Readiness for usage
       • "Probability that a system is able to deliver correctly its service at any given time." [Goloubeva]
       • "Maintainability is the probability that the item can be successfully restored to operation after failure; and availability ... is a function of reliability and maintainability." [Misra]
  6. Observations on Dependability Attributes
     • Availability is always required
     • Reliability, safety, and security may be optional
     • Reliability might be analyzed for hardware / software components
     • Availability is always seen from the system viewpoint
  7. Attributes of Dependability
     • Safety - Avoidance of catastrophic consequences on the environment
       • Critical applications
       • Specification needs to describe things that should not happen
     • Security - Prevention of unauthorized access and / or information handling
       • Became especially relevant with distributed systems
     • Confidentiality - Absence of unauthorized disclosure of information
     • Integrity - Absence of improper system alteration
       • With respect to either accidental or intentional faults
     • Maintainability - Ability to undergo modifications and repairs
  8. In Detail
     • Reliability - Function R(t)
       • Probability that a system is functioning properly and constantly over time period t
       • Assumes that the system was fully operational at t = 0
       • Denotes failure-free interval of operation
     • Availability - Statement if a system is operational at a point in time / fraction of time
       • Describes system behavior in presence of error treatment mechanisms
       • Instantaneous availability (at t) - Probability that a system is performing correctly at time t; equal to reliability for non-repairable systems: $A_i(t) = R(t)$
       • Steady-state availability - Probability that a system will be operational at any random point of time, expressed as the fraction of time a system is operational during its expected lifetime: $A_s = \text{Uptime} / \text{Lifetime}$
  9. Probability of Events [figure, (C) mathforum.org]
  10. PDF & CDF
     • Probability density function pdf for random variable X
       • Discrete random variable: Probability that X will be x
       • Continuous variable: Probability that X is in [a, b]
       • Computed as area under the density function in this range
     • Cumulative distribution function cdf(x): Probability that the value of the random variable is at most x
       • Limits of integration depend on the nature of the distribution function
       • For non-negative variables such as a time to failure, the value of the cdf at x is the area under the pdf from 0 to x
     [Figure: (C) weibull.com]
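The pdf/cdf relationship above can be sketched numerically. The snippet below is a minimal illustration, not part of the slides: it integrates an exponential density with the trapezoidal rule and compares the result with the closed-form cdf. The rate value `RATE` and all function names are assumptions chosen for the example.

```python
import math

RATE = 0.5  # assumed rate parameter, for illustration only


def exp_pdf(x: float, rate: float = RATE) -> float:
    """Density of the exponential distribution (zero for negative x)."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0


def cdf_by_integration(pdf, a: float, b: float, steps: int = 10_000) -> float:
    """Area under pdf between a and b, using the trapezoidal rule."""
    width = (b - a) / steps
    total = 0.5 * (pdf(a) + pdf(b))
    for i in range(1, steps):
        total += pdf(a + i * width)
    return total * width


def exp_cdf(x: float, rate: float = RATE) -> float:
    """Closed-form cdf of the exponential distribution."""
    return 1.0 - math.exp(-rate * x)


if __name__ == "__main__":
    x = 3.0
    # Both values should agree to several decimal places.
    print(cdf_by_integration(exp_pdf, 0.0, x), exp_cdf(x))
```

Integrating over [a, b] instead of [0, x] gives the probability that X falls into that range, which equals cdf(b) - cdf(a).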
  11. PDF Examples
     • Well-known statistical distributions, each describing the behavior of a random variable
     • Continuous version described by PDF (discrete counterpart would be a histogram)
     [Figure: probability density function and cumulative distribution function of the normal distribution (mean, variance) and the exponential distribution (rate parameter)]
  12. The Reliability Function R(t)
     • Reliability: Probability R(t) that a component works for time period [0, t]
     • Idea: Express time period of correct operation as continuous random variable X -> time to failure
     • cdf(t) of this variable: Describes probability of failure before t -> Unreliability Function F(t)
     • 1 - cdf(t): Describes probability of a failure after t -> time to failure -> Reliability Function R(t)
     • This works since (A) working / non-working is a binary decision, (B) the area under the complete pdf is 1 and (C) the 'red' area is the result of the cdf function
     • Typically, failures are modeled as Poisson process
       • Poisson properties lead to exponential distribution for the time between events
       • This time therefore only depends on the failure rate parameter
     [Figure: (C) weibull.com]
  13. Failure Rate
     • Time to failure is not always measured in calendar time, might be discrete
       • Number of kilometers driven with a car
       • Number of rotations / cycles for a mechanical component
     • Treat pdf for time-to-failure random variable X as failure density function
       • Can be computed as derivative of the unreliability function: $f(t) = dF(t)/dt$
     • Failure rate / hazard rate function - mean frequency of failures at time t
       • Conditional probability of a failure between a and b, given the survival until t
       • $\lambda(t) = f(t) / R(t) = \lambda$ for constant failure rate
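As a rough sketch of the two relations on this slide, the following Python code approximates the failure density as the derivative of the unreliability function and divides it by the reliability. It assumes an exponential time to failure; the value of `LAMBDA` and the function names are illustrative assumptions, not taken from the course material.

```python
import math

LAMBDA = 0.002  # assumed constant failure rate per hour (illustrative)


def unreliability(t: float) -> float:
    """F(t): probability of a failure before t (exponential model)."""
    return 1.0 - math.exp(-LAMBDA * t)


def reliability(t: float) -> float:
    """R(t) = 1 - F(t)."""
    return 1.0 - unreliability(t)


def failure_density(t: float, h: float = 1e-3) -> float:
    """f(t) = dF(t)/dt, approximated by a central difference."""
    return (unreliability(t + h) - unreliability(t - h)) / (2.0 * h)


def hazard_rate(t: float) -> float:
    """Failure rate: f(t) / R(t)."""
    return failure_density(t) / reliability(t)


if __name__ == "__main__":
    # For the exponential model the result stays at LAMBDA for every t.
    for t in (10.0, 100.0, 1000.0):
        print(t, hazard_rate(t))
```

Swapping `unreliability` for another distribution's cdf would make the hazard rate vary with t, which is the case the Weibull distribution is used for.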
  14. Why Exponential?
     • Distribution function that models the memoryless property of the Poisson process
       • $P(T > t + s \mid T > t) = P(T > s)$, e.g. $P_{Failure}(5 \text{ years} \mid T > 2 \text{ years}) = P_{Failure}(3 \text{ years})$
       • Failure is not the result of wear-out
     • Models 'intrinsic failure' behavior, assumed for the majority of hardware lifetime
     • Weibull distribution as alternative, can also model burn-in and wear-out
     • Some natural phenomena have a constant failure rate (e.g. cosmic ray particles)
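The memoryless property can be checked with a few lines of Python. This is a minimal sketch under the assumption of an exponentially distributed time to failure; `LAMBDA` is an arbitrary illustrative value and the function names are hypothetical.

```python
import math

LAMBDA = 0.1  # assumed failure rate per year (illustrative)


def survival(t: float) -> float:
    """P(T > t) for an exponentially distributed time to failure."""
    return math.exp(-LAMBDA * t)


def conditional_survival(t: float, s: float) -> float:
    """P(T > t + s | T > t) = P(T > t + s) / P(T > t)."""
    return survival(t + s) / survival(t)


if __name__ == "__main__":
    # Surviving 3 more years after 2 years of operation is as likely
    # as surviving the first 3 years of a new unit.
    print(conditional_survival(2.0, 3.0), survival(3.0))
```

The two printed values agree up to floating-point rounding, regardless of how long the unit has already been running.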
  15. The Reliability Function R(t)
     • Failures occur continuously and independently at a constant average rate (Poisson process)
     • Increasing probability of failure with increasing t - cdf function
     • Failure rate from experience or complexity measures
     • Cumulative distribution function: $F(t) = 1 - e^{-\lambda t}$
     • Reliability function (survival probability) for exponential failure distribution: $R(t) = P(X > t) = 1 - F(t) = e^{-\lambda t}$
     [Figure: CDF, probability of a failure before t]
  16. Variable Failure Rate in Real World
     [Figure: failure rate over time. Hardware phases: burn-in, use, wear-out. Software phases: integration & test, use, obsolete]
     • Failure rate is treated as constant parameter of the exponential distribution
     • (Maybe invalid) simplification, mostly combined solution in practice:
       • Exponential distribution when failure rate is constant
       • Weibull distribution when failure rate is monotonically decreasing / increasing
  17. Weibull Distribution
     • Most widely used life distribution beside exponential
     • Named after Swedish professor Waloddi Weibull (1887 - 1979)
     • Originally invented for modeling material strength
     • Very flexible through parametrization, can model many other probability distribution types
     • k: Shape parameter
       • k = 1: Constant hazard rate, like exponential distribution
       • k < 1: Hazard rate decreases over time ('infant mortality')
       • k > 1: Hazard rate increases with time ('aging'), like with (log)normal distribution
     • $\lambda$: Scale parameter
     [Figure: Weibull pdf and cdf]
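The effect of the shape parameter can be illustrated with the standard two-parameter Weibull hazard rate, (k / scale) * (t / scale)^(k - 1). The sketch below uses this textbook formula; the function name, the scale of 1000 hours and the sample times are assumptions made for the example.

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Hazard rate of a two-parameter Weibull distribution at time t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)


if __name__ == "__main__":
    scale = 1000.0  # assumed scale parameter in hours (illustrative)
    for shape in (0.5, 1.0, 2.0):
        # shape < 1: decreasing, shape = 1: constant, shape > 1: increasing
        rates = [weibull_hazard(t, shape, scale) for t in (100.0, 500.0, 2000.0)]
        print(shape, rates)
```

With shape 1 the hazard rate collapses to 1 / scale, which is the constant failure rate of the exponential distribution.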
  18. Hardware Failure Rate [figure]
  19. Software Failure Rate
     • Industrial practice
     • When do you stop testing? - No more time, or no more money ...
     [Figure: SW Failure Rates - Industrial Practice, (C) Malek]
  20. Failure Rate Examples
     • Standards from experience provide base data for component reliability
     • Society of Automotive Engineers (SAE) reliability model: $\lambda_p = \lambda_b \prod_{i=1}^{n} \Pi_i$
       • $\lambda_p$: Predicted failure rate
       • $\lambda_b$: Base failure rate for the component
       • $\Pi_i$: Various modification factors
         • Component composition
         • Ambient temperature
         • Location in the vehicle
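The multiplicative structure of such a model is easy to sketch in code. The example below only shows the shape of the calculation (base rate times the product of all modification factors); every number in it is made up for illustration, and real factor values would have to come from the standard's tables.

```python
import math


def predicted_failure_rate(base_rate: float, factors: list[float]) -> float:
    """Base failure rate scaled by the product of all modification factors."""
    return base_rate * math.prod(factors)


if __name__ == "__main__":
    # Hypothetical values, not taken from any SAE document.
    base = 2.0e-6              # failures per hour
    factors = [1.5, 2.0, 1.2]  # e.g. composition, temperature, location
    print(predicted_failure_rate(base, factors))
```

Because the factors multiply, a single harsh condition (for example a hot mounting location) scales the whole prediction rather than adding a fixed offset.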
  21. Life-Stress Relationship
     • Formulate a model that includes the life distribution and how outside factors (such as stress) change this distribution
     • Example - load sharing redundancy: Component reliability depends indirectly on the number of previously failed components
       • Single component failure distribution is no longer sufficient for describing reliability
       • Model must describe the effect of load and the failure probability
       • Typical approach: Define load-dependent parameter of the distribution function
     [Figure: (C) weibull.com]
  22. Other Reliability Distributions
     • (Mixed) Weibull distribution
     • Normal distribution
     • Lognormal distribution
     • (Generalized) Gamma distribution
     • Logistic distribution
     • Loglogistic distribution
  23. Steady-State Availability
     • Mean time to failure (MTTF) - Average time it takes to fail -> average uptime
     • Mean time to recover / repair (MTTR) - Average time it takes to recover
     • Mean time between failures (MTBF) - Average time between two successive failures
     • Availability = Uptime / Lifetime = MTTF / MTBF
     [Figure: up / down timelines of components C1, C2 and C3 with MTTF and MTBF intervals marked]
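To make the three averages concrete, the following sketch derives them from a hypothetical log of alternating up and down phases. It assumes that one failure cycle consists of one up phase followed by one down phase, so that MTBF = MTTF + MTTR; the function name and the sample durations are invented for the example.

```python
def availability_from_log(uptimes: list[float], downtimes: list[float]) -> dict[str, float]:
    """Derive MTTF, MTTR, MTBF and availability from observed up/down phases."""
    mttf = sum(uptimes) / len(uptimes)
    mttr = sum(downtimes) / len(downtimes)
    mtbf = mttf + mttr  # one failure cycle = one up phase + one down phase
    return {"MTTF": mttf, "MTTR": mttr, "MTBF": mtbf, "A": mttf / mtbf}


if __name__ == "__main__":
    # Hypothetical observation in hours: three up phases, three repairs.
    result = availability_from_log([990.0, 1010.0, 1000.0], [8.0, 12.0, 10.0])
    print(result)
```

With only a handful of observed cycles the averages are noisy; the steady-state reading only becomes meaningful over long observation periods.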
  24. Steady-State Availability and MTBF
     • Expressing availability with MTTF/MTBF describes a repairable system's behavior under infinite time assumption
       • MTTF and MTTR get stable over longer time periods
       • Fulfils also the steady-state condition
     • Expressing dependability with MTTF 'should' imply a non-repairable system, expressing dependability with MTBF 'should' imply a repairable system
     • Sometimes MTBF means mean time BEFORE failure = MTTF -> typical source of confusion
     • Exponential distribution: Reciprocal of the rate parameter is equivalent to the distribution mean, $\lambda = 1 / \text{MTTF}$ (Example: With 4 events per hour, you can expect one roughly every 15 minutes).
  25. Example
     • Test population with 50 HDDs and 100 hours of testing, 2 drives fail during the test
     • As usual, we assume exponential distribution of the time to failure: $R(t) = P(X > t) = 1 - F(t) = e^{-\lambda t}$ with $F(t) = 1 - e^{-\lambda t}$
     • Reliability at t = 100 is known to be 96% (48/50): $R(100 \text{ hours}) = e^{-100\lambda} = 0.96$
     • Solving for the failure rate: $\lambda = -\ln(0.96) / 100 = 0.000408$ per hour
     • Reciprocal of the according failure rate is the MTTF: $\text{MTTF} = 1 / \lambda = 2449.66$ hours
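The calculation in this example can be reproduced in a few lines. The sketch below assumes, as the slide does, an exponential time to failure; the function name and its parameters are hypothetical.

```python
import math


def failure_rate_from_test(units: int, failed: int, hours: float) -> float:
    """Estimate lambda from R(hours) = survivors / units = exp(-lambda * hours)."""
    reliability = (units - failed) / units
    return -math.log(reliability) / hours


if __name__ == "__main__":
    lam = failure_rate_from_test(units=50, failed=2, hours=100.0)
    print(f"lambda = {lam:.6f} per hour, MTTF = {1.0 / lam:.2f} hours")
```

The estimate rests on a single observation point (48 of 50 drives alive after 100 hours), so it says little about behavior far beyond the tested period.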
  26. MTBF / MTTF in Practice
     • Often express average failure behavior (statistics) for a component population
     • Good for relative comparison, not for the life expectancy of one unit
     • Example: Hard disk with MTTF of 500,000 hours and 5 years of expected operation ('service life')
       • Drive of this type is expected to run 5 years without problems
       • Large group of such drives will (on average) have one failed drive after 500,000 hours of accumulated lifetime
     • MTBF in practice is a weak approximation - sum of up-phases / number of failures
       • Assumes renewable system with homogeneous failure distribution over lifetime
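The population reading of the MTTF can be sketched as follows. The code assumes a constant failure rate during the service life and that failed units are replaced; the fleet size of 1000 drives and the function names are assumptions for illustration only.

```python
import math


def expected_failures(population: int, hours: float, mttf_hours: float) -> float:
    """Expected number of failures over the accumulated operating hours."""
    return population * hours / mttf_hours


def annual_failure_probability(mttf_hours: float, hours_per_year: float = 8760.0) -> float:
    """Probability that a single unit fails within one year: 1 - exp(-t / MTTF)."""
    return 1.0 - math.exp(-hours_per_year / mttf_hours)


if __name__ == "__main__":
    # Hypothetical fleet of 1000 drives running for one year.
    print(expected_failures(1000, 8760.0, 500_000.0))
    print(annual_failure_probability(500_000.0))
```

An MTTF of 500,000 hours therefore translates into a per-drive failure probability of under two percent per year, not into a single drive lasting 57 years.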
  27. Steady-State Availability
     $A = \frac{\text{Uptime}}{\text{Uptime} + \text{Downtime}} = \frac{\text{MTTF}}{\text{MTTF} + \text{MTTR}}$

     Availability             Downtime per year   Downtime per week
     90.0 % (1 nine)          36.5 days           16.8 hours
     99.0 % (2 nines)         3.65 days           1.68 hours
     99.9 % (3 nines)         8.76 hours          10.1 min
     99.99 % (4 nines)        52.6 min            1.01 min
     99.999 % (5 nines)       5.26 min            6.05 s
     99.9999 % (6 nines)      31.5 s              0.605 s
     99.99999 % (7 nines)     3.15 s              60.5 ms
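The downtime figures for a given availability follow from simple arithmetic. The sketch below recomputes them, assuming a 365-day year; the constant and function names are illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600
SECONDS_PER_WEEK = 7 * 24 * 3600


def downtime_seconds(availability_percent: float, period_seconds: float) -> float:
    """Allowed downtime in seconds for a given availability over a period."""
    return (1.0 - availability_percent / 100.0) * period_seconds


if __name__ == "__main__":
    for nines in range(1, 8):
        percent = 100.0 - 10.0 ** (2 - nines)  # 90, 99, 99.9, ...
        per_year = downtime_seconds(percent, SECONDS_PER_YEAR)
        per_week = downtime_seconds(percent, SECONDS_PER_WEEK)
        print(f"{percent:.5f} %  {per_year:12.2f} s/year  {per_week:10.3f} s/week")
```

Each additional nine divides the allowed downtime by ten, which is why the step from three to five nines moves the budget from hours to minutes per year.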
  28. Amazon EC2 SLA (2012) [figure]
  29. Operational Availability Calculation [Misra]
     • Uptime elements: Standby time, operating time
     • Downtime elements
       • Logistic: Spares availability, spares location, transportation of spares
       • Preventive maintenance: Inspection, servicing
       • Administrative delay: Finding personnel, reviewing manuals, complying with supply procedures, locating tools, setting up test equipment
       • Corrective maintenance: Preparation time, fault location diagnosis, getting parts, correcting faults, testing
  30. Example: Item-Level Sparing Analysis [Misra]
     • Sparing analysis challenges
       • How many spares do you need to keep the system available at the desired rate?
       • When are you going to need the spares (manufacturing time)?
       • Where should the spares be kept?
       • At what system level do you want to spare?
  31. MTTR Examples
     • Hardware MTTR with spares onsite
       • Operator available - 30 min
       • Operator on call - 2 hours
       • Operator available during working hours - 14 h
     • Without spares - at least 24 h
     • SW MTTR with watchdog
       • Reboot from ROM - 30 s
       • Reboot from disk - 3 min
       • Reboot from network - 10 min
  32. MTTR << MTTF [Fox]
     • Armando Fox on 'Recovery-Oriented Computing'
     • A = MTTF / (MTTF + MTTR)
       • 10x decrease of MTTR as good as 10x increase of MTTF?
     • MTTFs are not claimable, but MTTR claims are verifiable
       • Proving MTTF numbers demands system-years of observation and experience
     • Lowering MTTR directly improves user experience of one specific outage, since MTTF is typically longer than one user session
     • HCI factor of failed system
       • Miller, 1968: >1 sec "sluggish", >10 sec "distracted" (user moves away)
       • 2001 Web user study: ~5 sec "acceptable", ~10 sec "excessively slow"
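The question whether a tenfold MTTR reduction is as good as a tenfold MTTF increase can be checked directly against the availability formula. The following sketch uses made-up values for MTTF and MTTR; only the formula itself comes from the slide.

```python
def availability(mttf: float, mttr: float) -> float:
    """Steady-state availability A = MTTF / (MTTF + MTTR)."""
    return mttf / (mttf + mttr)


if __name__ == "__main__":
    mttf, mttr = 1000.0, 2.0  # assumed hours, for illustration only
    print(availability(mttf, mttr))       # baseline
    print(availability(10 * mttf, mttr))  # 10x longer time to failure
    print(availability(mttf, mttr / 10))  # 10x faster recovery
```

Both improvements yield the same availability, because only the ratio of MTTF to MTTR enters the formula. The argument for working on MTTR therefore rests on verifiability and user experience, not on the availability figure itself.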
  33. MTTR << MTTF [Fox]
     • Proposal: Utility curve for recovery time
       • Factors: Length of recovery time, level of service availability during error state
       • Key distinction between interactive (session-based) and non-interactive systems
     • If error state leads to some steady-state latency
       • For how long will users tolerate temporary degradation?
       • How much degradation is acceptable?
       • Do they show a preference for increased latency vs. worse QOS vs. being turned away and incentivized to return?
     • Long recovery times are often caused by stateful components
       • Utilize alternative architecture concepts
  34. Availability [figure]
