An extended notation of FTA for risk assessment of software-intensive medical devices
Yoshio Sakai, Seiko Shirasaka and Yasuharu Nishi
It is difficult to assess the risk of software-intensive medical devices. An extended notation of FTA makes it possible to recognize the risk class before and after the risk control measure, and whether the software in the system affects the top event of the FTA.
This content is also available as a 6-page paper on the IEEE website.
1. An Extended Notation of FTA for Risk Assessment of
Software-intensive Medical Devices.
- Recognition of The Risk Class Before and After The Risk Control Measure -
Yoshio SAKAI
Engineering Promotion Center, NIHON KOHDEN CORPORATION
Seiko SHIRASAKA
The Graduate School of System Design and Management, KEIO University
Yasuharu NISHI
Department of Systems Engineering, The University of Electro-Communications
2. Flow of the Presentation
[Figure: the ISO 14971 sequence of events — Hazard → Exposure (P1) → Hazardous Situation → (P2) → Harm. Risk = Severity of the Harm × Probability of Occurrence of Harm (P1 × P2).]
Common theme: the lack of consideration of the software failure in software-intensive systems.
1. Explanation of the traditional FTA, which lacks consideration of the software.
2. Explanation of the risk assessment method in ISO 14971, which lacks consideration of the software.
3. Explanation of solutions using an extended notation of FTA.
Yoshio_Sakai@mb2.nkc.co.jp
2
24th ISSRE / 1st MedSRDR 2013 Nov 7, 2013
3. The History of FTA (Fault Tree Analysis)
Fault Tree Analysis (FTA) was originally developed for the Minuteman Missile in 1962 at Bell Laboratories by H.A. Watson. At that time, FTA was designed because the electronic system could not endure vibration and broke down: the cause of the trouble was hardware failure, not software. In 1965, the completeness of FTA was improved by Boeing. Now, FTA is used widely.
4. The traditional FTA, which lacks consideration of the software
• When FTA was developed, failures caused by the software were not among the failures that FTA considered.
• With the traditional FTA, it is hard to recognize:
– the effectiveness of a risk control measure, i.e., the risk before and after the measure;
– whether the software in the system or the risk control measure affects the top event.
• The failure-rate calculation of FTA cannot be used for failures caused by the software.
[Figure: the failure-rate calculation applies to HARDWARE (○) but not to SOFTWARE (×).]
5. The Traditional Risk Assessment Method
The example is boiling water in an electric kettle.
1. The hot water as the thermal energy (the hazard).
2. A cover opens and spills the hot water (the hazardous situation).
3. Getting burned (the harm).
Fig. 3. ISO 14971
P1 is the probability of a hazardous situation occurring.
P2 is the probability of a hazardous situation leading to harm.
Where does the software fit in this scheme?
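The P1/P2 relation above can be sketched as a tiny calculation. This is only an illustrative sketch; the function name and the example numbers are ours, not from ISO 14971:

```python
# Illustrative sketch of the ISO 14971 probability chain.
# p1: probability of a hazardous situation occurring.
# p2: probability of the hazardous situation leading to harm.
def probability_of_harm(p1: float, p2: float) -> float:
    assert 0.0 <= p1 <= 1.0 and 0.0 <= p2 <= 1.0
    return p1 * p2  # probability of occurrence of harm = P1 x P2

# Electric-kettle example with made-up numbers: the cover opens and spills
# hot water (P1), and the spilled water burns someone (P2).
print(probability_of_harm(0.001, 0.5))  # -> 0.0005
```

The point of the slides that follow is exactly that, for software, no defensible value of P1 or P2 exists to feed into such a formula.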
6. The Estimation of the Probability of a Hazardous Situation
• HARDWARE: the failure rate of random hardware failure.
• USABILITY: the likelihood of the usability failure. Likelihood scale, from HIGH to LOW: Frequent, Probable, Occasional, Remote, Improbable. (Source: IEC 80001-2-1, Step by Step.)
• SOFTWARE: software is invisible, and the failure caused by the software occurs systematically, not statistically. We cannot estimate the probability or the likelihood of the failure caused by software.
7. Feature of Systematic Failure
Systematic failure is unwanted behaviour which is
• repeatable
– If the conditions can be exactly replicated
• predictable (but not accurately)
– all systems have flaws
• indefensible
– it should not occur...
… but it is extremely hard to prevent
8. The definition and explanation of Systematic Failure
Systematic Failure
failure, related in a deterministic way to a certain cause, that can only be eliminated by a
change of the design or of the manufacturing process, operational procedures,
documentation or other relevant factors
SOURCE: ISO 26262-1:2011
NOTE 4: This International Standard sets requirements for the avoidance and control of systematic faults, which are based on experience and judgment gained from practical experience in industry. Even though the probability of occurrence of systematic failures cannot in general be quantified, the standard does, however, allow a claim to be made, for a specified safety function, that the target failure measure associated with the safety function can be considered to be achieved if all the requirements in the standard have been met.
SOURCE: IEC 61508-3:2010
9. Two Types of Evaluation of the Hazard Caused by Systematic Software Failure
"The probability of such failure shall be assumed to be 100 percent." (IEC 62304:2006)
• The probability is 100%.
• This 100-percent principle was chosen for a conservative purpose, but it is not practical in real applications.
If the hazard could arise from a failure of the software, the risk should be evaluated with the following two concerns. (IEC 62304:2006 Amd. 1; this study)
• The 1st concern is the risk level as the severity of the harm before the risk control measures.
• The 2nd concern is the risk level as the severity of the harm after the risk control measures.
• The evaluation of the residual risk is important; when the cause is the software, the probability of occurrence of harm before the risk control measures is not.
10. The Procedure of Evaluation of the Hazard Caused by Systematic Software Failure
• RISK: the hazardous situation occurs because of a systematic software failure.
• RISK CONTROL MEASURES: the safety is affected by the hardware as the risk control measure and by the reliability of the critical software component.
• RESIDUAL RISK: after the risk control measures, we have to evaluate the residual risk for safety.
The probability of occurrence of harm caused by the software before the risk control measures is not necessary for the risk assessment.
11. Method of Evaluating Systematic Failure
Medical device manufacturers can evaluate the residual risk class by the following combination after the countermeasure:
a. the severity of the residual risk;
b. the reliability of the software items that could contribute to a hazardous situation;
c. the safe architecture of the software system.
These are not elements of the probability.
12. Relation between the Risk Control Measures and Architecture
• Complicated software items (low cohesion and high coupling) — the result of continuous additions in a real software system: the relation to the risk control measures is not clear.
• Segregated software items (high cohesion and low coupling) — e.g., a layered architecture (3 layers: Presentation, Domain and Data Source): the relation to the risk control measures is clear.
13. The Principles of the Electrosurgical Knife
The mode of cut or coagulation is switched by the software.
• Cut: for cutting, a continuous single-frequency sine wave is often employed.
• Coagulation: for coagulation, the average power is typically reduced below the threshold of cutting. Generally, the sine wave is turned on and off in rapid succession.
There are serious hazardous situations in the software system.
14. Electrosurgical Knife Block Diagram
[Block diagram: the wave is controlled and switched by the software; the wave-control and mode-switch parts are high-risk software components.]
The most serious hazard is unintended hemorrhage caused by an abnormal output of the electrosurgical knife.
Let's see the fault tree analysis in the following slides.
15. Extended Notation of FTA (1)
[FTA example — top event: Abnormal Output of Electrosurgical Knife, Class A(C)s = OR(A(C)s, A(C)s). Left branch: Abnormal Output caused by Hardware, Class A(C)s = AND(C, --Bs). Right branch: Unintended Output caused by Software, Class A(C)s = AND(Cs, --Bs). Intermediate events: Output Hardware Failure, Class C = OR(C, C, B); Abnormal Monitoring Failure, Class Bs; Failure of the Abnormal Detection, Class Cs; Abnormal Monitoring Failure, Class Bs = AND(Bs, B). Basic events: High-frequency Wave Failure (Class C), Wave Circuit Failure (Class C), Timer Failure (Class B), Cut/Coag Mode Mismatch, A/D Convertor Failure (Class B).]
1st column from the bottom, on the left side of the FTA example:
a. There are three hardware failures.
b. Each failure is classified by its risk level.
c. The three basic events are connected with an OR gate.
d. The highest risk class is adopted by the OR function.
Risk class definitions (Source: IEC 62304:2006):
• Class A: no injury or damage to health is possible.
• Class B: non-serious injury is possible.
• Class C: death or serious injury is possible.
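The OR rule in item d above can be sketched in Python. This is a minimal sketch, assuming the classes are totally ordered A < B < C in severity; the numeric encoding and the function name are ours, not part of the paper's notation:

```python
# IEC 62304 software safety classes, ordered by severity (encoding is ours).
ORDER = {"A": 0, "B": 1, "C": 2}

def or_gate(*classes: str) -> str:
    """OR gate of the extended notation: adopt the highest risk class."""
    return max(classes, key=lambda c: ORDER[c])

# The three hardware basic events of the example:
# High-frequency Wave Failure (C), Wave Circuit Failure (C), Timer Failure (B).
print(or_gate("C", "C", "B"))  # -> C
```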
16. Extended Notation of FTA (2)
[Same FTA example; focus: the 2nd column from the bottom, on the left side — Abnormal Output caused by Hardware, Class A(C)s = AND(C, --Bs).]
a. The right basic event is an abnormal monitoring failure.
b. This event is caused by the software.
c. It is described as Class Bs: the impact level is risk Class B, and the subscript "s" marks the effect of the software.
d. The abnormal monitoring inhibits and controls the output hardware failure. This is indicated by the AND function as AND(C, --Bs). The stage of inhibition is shown by the number of minus signs: in this case, the risk control measure (Class Bs) lowers the risk level by two stages, from C to A.
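The downgrade rule in item d can be sketched as follows. This is a minimal sketch, assuming each minus sign lowers the class by one stage; the flooring at Class A and the function name are our reading, not stated in the paper:

```python
# IEC 62304 classes in increasing severity.
ORDER = ["A", "B", "C"]

def and_with_inhibit(event_class: str, minus_stages: int) -> str:
    """AND(C, --Bs)-style gate: the inhibiting risk control measure lowers
    the inhibited event's risk class by one stage per minus sign."""
    idx = ORDER.index(event_class) - minus_stages
    return ORDER[max(idx, 0)]  # assumed: the class does not go below A

# AND(C, --Bs): two minus signs lower the risk level by two stages, C -> A.
print(and_with_inhibit("C", 2))  # -> A
```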
17. Extended Notation of FTA (3)
[Same FTA example; focus: the 1st column from the bottom, on the right side — Abnormal Monitoring Failure, Class Bs = AND(Bs, B).]
a. The abnormal monitoring failure is caused by the software.
b. The A/D convertor failure is caused by the hardware.
c. If one basic event does not inhibit the other basic event, the highest risk class is adopted by the AND function. (This method is inspired by the notation of ASIL decomposition in ISO 26262-9.)
d. The subscript "s" is inherited from the left side to the right side through the function, as the effect of the software on the system.
18. Extended Notation of FTA (4)
[Same FTA example; focus: the 1st column from the top — the top event: Abnormal Output of Electrosurgical Knife, Class A(C)s = OR(A(C)s, A(C)s).]
a. The highest risk class is adopted by the OR function. In this case, the risk classes are the same.
b. The risk class of the top event is finally expressed as Class A(C)s.
The following are recognized from this notation:
• The risk class of the residual risk is A.
• The highest risk class before the risk control measure is C.
• The software affects the top event or the risk control measure in the system.
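Putting the rules of slides 15 to 18 together, the example tree can be evaluated mechanically. The sketch below is our reading of the notation as a data structure (residual class, pre-control class in parentheses, inherited "s" flag); all function and variable names are illustrative, not from the paper:

```python
# Each node result is a triple: (residual class, class before risk control,
# software-involvement flag "s"). The encoding is our illustrative reading.
ORDER = ["A", "B", "C"]  # increasing severity

def worst(a: str, b: str) -> str:
    return max(a, b, key=ORDER.index)

def or_gate(left, right):
    """OR: adopt the highest class; "s" propagates if either input has it."""
    (lr, lp, ls), (rr, rp, rs) = left, right
    return (worst(lr, rr), worst(lp, rp), ls or rs)

def and_inhibit(event, minus_stages, control_is_software):
    """AND(C, --Bs): the control lowers the residual class by one stage per
    minus sign; the class before the control is remembered in parentheses."""
    residual, pre, s = event
    idx = max(ORDER.index(residual) - minus_stages, 0)
    return (ORDER[idx], pre, s or control_is_software)

def show(node) -> str:
    residual, pre, s = node
    return f"{residual}({pre}){'s' if s else ''}"

# Left branch: Abnormal Output caused by Hardware = AND(C, --Bs).
hardware = and_inhibit(("C", "C", False), 2, control_is_software=True)
# Right branch: Unintended Output caused by Software = AND(Cs, --Bs).
software = and_inhibit(("C", "C", True), 2, control_is_software=True)
# Top event: Abnormal Output of Electrosurgical Knife = OR(A(C)s, A(C)s).
print(show(or_gate(hardware, software)))  # -> A(C)s
```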
19. Effectiveness of this Notation
This notation has the following effects.
• The safety analysts can recognize:
– the risk class before and after the risk control measure;
– whether the software in the system or the risk control measure affects the top event;
– the effect of the risk control, by the minus marks in the AND function.
• When an event in the fault tree carries the mark "s", the safety analysts find the start point of the effect of the software on the system safety.
• When an event carries both the mark "s" and the minus mark, the safety analysts can recognize the risk introduced by changing the software of the risk control measure.
20. Effectiveness of this Notation
[Annotated FTA example: the mark "s" shows the start point of the effect of the software on the system safety; the mark "s" together with the minus mark shows the risk introduced by changing the software of the risk control measure.]
21. Attention!
• FTA is an excellent way to show the structure of the mechanism by which the top event, an "undesired state of the system," is generated.
• On the other hand, the calculation of the failure rate on FTA has a dangerous feature too.
When systematic software failure had not yet been recognized, the analysis of a radiation therapy machine named Therac-25 included the software in the fault trees but used a "generic failure rate" of 10^-4 for software events. This number was justified based on the historical performance of the Therac-25 software. (Source: SAFEWARE by Prof. Nancy Leveson.)
Now we understand the features of the software well, and we recognize that this is not realistic.
1. The evaluation of the residual risk is of importance.
2. We can evaluate the severity of the harm before and after the risk control measures.
Therefore, we should focus on the architecture of the software system and the structure of the risk control measures.
22. Thank you.
I hope this notation will be used in the real development of medical devices.
23. REFERENCES
[1] Dolores R. Wallace, D. Richard Kuhn, "Failure Modes in Medical Device Software: An Analysis of 15 Years of Recall Data", 2001.
[2] S. Shirasaka, Y. Sakai, Y. Nishi, "Feature Analysis of Estimated Causes of Failures in Medical Device Software and Proposal of Effective Measures", ISSRE 2011.
[3] ISO 14971:2007, Medical devices - Application of risk management to medical devices.
[4] ISO 26262-1:2011, Road vehicles - Functional safety - Part 1: Vocabulary.
[5] IEC/TR 80001-2-1, Application of risk management for IT-networks incorporating medical devices - Part 2-1: Step-by-step risk management of medical IT-networks - Practical applications and examples.
[6] IEC 62304:2006, Medical device software - Software life cycle processes.
[7] Katerina Goseva-Popstojanova, Ahmed Hassan, Ajith Guedem, Walid Abdelmoez, Diaa Eldin M. Nassar, Hany Ammar, Ali Mili, "Architectural-Level Risk Analysis Using UML", IEEE Transactions on Software Engineering, Vol. 29, No. 10, October 2003.
[8] Sherif M. Yacoub, Hany H. Ammar, "A Methodology for Architecture-Level Reliability Risk Analysis", IEEE Transactions on Software Engineering, Vol. 28, No. 6, June 2002.
24. Extra Information for this study
25. Therac-25 FTA
[Fault tree excerpt: "System outputs the wrong energy"; basic events: "Computer chooses the wrong energy" (probability 10^-11?) and "Computer chooses the wrong mode" (probability 4×10^-9?); hardware: PDP-11, VT100.]
• The probability for the computer to choose the wrong energy was given as 10^-11.
• The probability for the computer to choose the wrong mode was given as 4×10^-9.
• A hardware safety device was removed for an economic reason.
• Systematic software failure had not been recognized.
• These numbers were justified based on the historical performance of the Therac-25 software.
26. IEC 80001-2-1 Figure 8
[Worksheet example of hazard analysis.]
27. New Hazard Analysis of Real Medical Devices
• "Probability" should be replaced with Probability, Likelihood, or NA (Not Applicable, for software).
• "Probability" should be replaced with the effect of the risk control measure (e.g., Major/Moderate/Minor).
• Add a "Risk Control Measure Type of Concern": SOFTWARE, USABILITY, HARDWARE, or a COMBINATION of these.
If there is a combination of hardware faults and software errors, we should separate the concerns into Hardware, Usability, or Software.
28. Separation of the Concerns for the Risk Assessment
• 1st concern - SOFTWARE: NA → the risk level, i.e., the risk level before the risk control measures and the risk level after the risk control measures.
• 2nd concern - USABILITY: the likelihood.
• 3rd concern - HARDWARE: the probability (statistically).
29. IEC 80001-2-1 Table D.3
Usability ↔ likelihood: ○. Software ↔ likelihood: ×.
If the hazardous situation is caused by the software, we can estimate the risk level only as the severity of the harm after the risk control measures.
30. Sequence of Events
Change the method of the risk assessment!
[Figure: the ISO 14971 sequence of events — Hazard → Exposure (P1) → Hazardous Situation → (P2) → Harm; Risk = Severity of the Harm × Probability of Occurrence of Harm (P1 × P2) — mapped onto the medical device system development flow: User Needs / Intended Use → Requirements Analysis → Risk Assessment (hazard, hazardous situation and harm) → Risk Reduction (risk control measure, hardware and software, software architecture) → Residual Risk.]
The important aspects: we should focus on the architecture of the software system and the structure of the risk control measures.
31. IEC 62304:2006 Amd. 1 CD 4.3 Software Safety Classification
This chart and our study use the same classification method.
32. The Types of Safety Design
[Figure: contrasting methods — specific optimization: Fault Avoidance; total optimization: Architecture, Fail Safe, Fault Tolerance, Error Proof (Fool Proof), and Usability for the USER.]
Specific optimization as a fault-avoidance approach is not realistic for a large-scale and complicated software system. The total-optimization approach is reasonable for today's medical device software.
33. Safety Design
Methods and realization techniques:
• Fault Avoidance: high-coverage testing; formal method.
• Fail Safe: interlock; lockout; safeguard.
• Fault Tolerance: space tolerance (main/sub); time tolerance (1st/2nd); information tolerance (main information with error correction).
• Error Proof (Fool Proof): easy operation; home button; safety label.
34. ISO 26262-9 Figure 2 - ASIL Decomposition Schemes
• If one basic event does not inhibit the other basic event, the highest risk class is adopted by the AND function. (This method is inspired by the notation of ASIL decomposition in ISO 26262-9.)
An AND function without an inhibiting risk-control element should select the maximum level of the failures, because the notation focuses on the risk class before and after the risk control measures.