This document summarizes Rex Black's book on risk-based testing strategies. It discusses:
- The two main types of risks in testing: product risks related to quality, and project risks related to management and schedules.
- How risk-based testing guides testing activities based on identified risks, prioritizing higher-risk items and allocating more testing effort to them.
- The benefits of risk-based testing over requirements-based testing, like having a more predictable reduction in risk over time and the ability to intelligently reduce testing if needed.
- The history of risk-based testing strategies dating back to the 1980s, and how modern approaches aim to systematically analyze and address risks.
EuroSTAR Software Testing Conference
EuroSTAR Software Testing Community
Advanced Software Testing - Vol. 2:
Guide to the ISTQB Advanced
Certification as an Advanced
Test Manager
Rex Black
President of RBCS
The following is an excerpt from Rex Black’s
book, Advanced Software Testing: Volume
2. It consists of the section concerning risk-
based testing.
Risk-Based Testing
and Failure Mode
and Effects Analysis
Learning objectives
(K2) Explain the different ways that risk-based
testing responds to risks.
(K4) Identify risks within a project and product,
and determine an adequate test strategy and
test plan based on these risks.
(K3) Execute a risk analysis for a product from a
tester’s perspective, following the failure mode
and effects analysis approach.
(K4) Summarize the results from the various
perspectives on risk typically held by key project
stakeholders, and use their collective judgment
in order to outline test activities to mitigate
risks.
(K2) Describe characteristics of risk management
that require it to be an iterative process.
(K3) Translate a given risk-based test strategy to
test activities and monitor its effects during the
testing.
(K4) Analyze and report test results, including
determining and reporting residual risks to
enable project managers to make intelligent
release decisions.
(K2) Describe the concept of FMEA, and explain
its application in projects and benefits to
projects by example.
Risk is the possibility of a negative or undesirable
outcome or event. A specific risk is any problem
that might occur that would decrease customer,
user, participant, or stakeholder perceptions of
product quality or project success.
In testing, we’re concerned with two main
types of risks. The first type is product or quality
risks. When the primary effect of a potential
problem is on the quality of the product itself,
the potential problem is called a product risk.
A synonym for product risk, which I use most
frequently myself, is quality risk. An example of
a quality risk is a possible reliability defect that
could cause a system to crash during normal
operation.
ISTQB Glossary
product risk: A risk directly related to the test
object.
project risk: A risk related to management and
control of the (test) project, e.g., lack of staffing,
strict deadlines, changing requirements, etc.
risk: A factor that could result in future negative
consequences; usually expressed as impact and
likelihood.
The second type of risk is project or planning
risks. When the primary effect of a potential
problem is on the overall success of a project,
those potential problems are called project
risks. Some people also refer to project risks as
planning risks. An example of a project risk is
a possible staffing shortage that could delay
completion of a project.
Not all risks are equal in importance. There are a
number of ways to classify the level of risk. The
simplest is to look at two factors:
The likelihood of the problem occurring
The impact of the problem should it
occur
The likelihood of a problem arises primarily
from technical considerations, such as the
programming languages used, the bandwidth
of connections, and so forth. The impact of a
problem arises from business considerations,
such as the financial loss the business will suffer,
the number of users or customers affected, and
so forth.
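As a rough illustration (this sketch is not from the book), the two-factor scheme can be expressed in a few lines of Python. The 1-to-5 ordinal scales and the product formula are assumptions commonly used in informal risk analysis, not something the book prescribes:

```python
# Illustrative sketch (not from the book): combining the two factors,
# likelihood and impact, into a single risk level. The 1-5 ordinal
# scales and the multiplication are assumed conventions.

def risk_priority(likelihood: int, impact: int) -> int:
    """Combine likelihood and impact ratings (1 = very low, 5 = very high)."""
    for name, value in (("likelihood", likelihood), ("impact", impact)):
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {value}")
    return likelihood * impact  # 1 (negligible) .. 25 (critical)

# A possible reliability defect: technically likely, business-critical.
print(risk_priority(likelihood=4, impact=5))  # → 20
```

Multiplying the two ratings is only one possible combination rule; some teams sum them or use a lookup matrix instead.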
In risk-based testing, we use the risk items
identified during risk analysis, together with the
level of risk associated with each risk item, to
guide our testing. In fact, under a true analytical
risk-based testing strategy, risk is the primary
basis of testing.
Risk can guide testing in various ways, but there
are three very common ones:
First, during all test activities, test analysts
and test managers allocate effort for each
quality risk item proportional to the level
of risk. Test analysts select test techniques
in a way that matches the rigor and
extensiveness of the technique with the
level of risk. Test managers and test analysts
carry out test activities in reverse risk order,
addressing the most important quality
risks first and only at the very end spending
any time at all on less important ones.
Finally, test managers and test analysts
work with the project team to ensure that
the prioritization and resolution of defects
is appropriate to the level of risk.
Second, during test planning and test
control, test managers carry out risk control
for all significant, identified project risks.
The higher the level of risk, the more
thoroughly that project risk is controlled.
We’ll cover risk control options in a
moment.
Third, test managers and test analysts
report test results and project status in
terms of residual risks. For example, which
tests have we not yet run or have we
skipped? Which tests have we run? Which
have passed? Which have failed? Which
defects have we not yet fixed or retested?
How do the tests and defects relate back to
the risks?
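The first of these responses, allocating effort in proportion to risk and executing in risk order, can be sketched as follows. This is an illustration only, not the book's method; the data structure and the proportional split are assumptions:

```python
# Illustrative sketch (not from the book): allocate test effort in
# proportion to risk level and order execution by descending risk.
# The risk items, levels, and hour budget are invented for the example.

def plan_tests(risk_items: dict[str, int], total_hours: float) -> list[tuple[str, float]]:
    """Return (risk item, allocated hours) pairs, highest risk first."""
    total_risk = sum(risk_items.values())
    return [
        (item, total_hours * level / total_risk)
        for item, level in sorted(risk_items.items(), key=lambda kv: -kv[1])
    ]

items = {"crash under load": 20, "wrong totals in report": 12, "cosmetic layout": 3}
for item, hours in plan_tests(items, total_hours=70.0):
    print(f"{item}: {hours:.1f} h")
```

The highest-risk item both comes first and receives the largest share of the budget, which mirrors the effort and ordering rules described above.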
When following a true analytical risk-based
testing strategy, it’s important that risk
management not be something that happens
only at the start of a project. The three
responses to risk I just covered—along with
any others that might be needed—should
occur throughout the lifecycle. Specifically, we
should try to reduce quality risk by running
tests and finding defects and reduce project
risks through mitigation and, if necessary,
contingency actions. Periodically in the project,
we should reevaluate risk and risk levels based
on new information. This might result in our
reprioritizing tests and defects, reallocating test
effort, and other test control actions. This will be
discussed further later in this section.
ISTQB Glossary
risk level: The importance of a risk as defined
by its characteristics impact and likelihood.
The level of risk can be used to determine the
intensity of testing to be performed. A risk level
can be expressed either qualitatively (e.g., high,
medium, low) or quantitatively.
risk management: Systematic application
of procedures and practices to the tasks
of identifying, analyzing, prioritizing, and
controlling risk.
One metaphor sometimes used to help people
understand risk-based testing is that testing
is a form of insurance. In your daily life, you
buy insurance when you are worried about
some potential risk. You don’t buy insurance
for risks that you are not worried about. So, we
should test the areas and test for bugs that are
worrisome and ignore the ones that aren’t.
One potentially misleading aspect of this
metaphor is that insurance professionals and
actuaries can use statistically valid data for
quantitative risk analysis. Typically, risk-based
testing relies on qualitative analyses because we
don’t have the same kind of data that insurance
companies have.
During risk-based testing, you have to remain
aware of many possible sources of risks. There
are safety risks for some systems. There are
business and economic risks for most systems.
There are privacy and data security risks for many
systems. There are technical, organizational,
and political risks too.
Characteristics and
Benefits of Risk-
Based Testing
What does an analytical risk-based testing
strategy involve? What characteristics and
benefits does it have?
For one thing, an analytical risk-based testing
strategy matches the level of testing effort to
the level of risk. The higher the risk, the more
test effort we expend. This means not only the
effort expended in test execution, but also the
effort expended in designing and implementing
the tests. We’ll look at the ways to accomplish
this later in this section.
For another thing, an analytical risk-based
testing strategy matches the order of testing
to the level of risk. Higher-risk tests tend to find
more bugs, or tend to test more important areas
of the system, or both. So, the higher the risk,
the earlier the test coverage. This is consistent
with a rule of thumb for testing that I often tell
testers, which is to try to find the scary stuff first.
Again, we’ll see how we can accomplish this
later in this section.
Because of this effort allocation and ordering of
testing, the total remaining level of quality risk
is systematically and predictably reduced as
testing continues. By maintaining traceability
from the tests to the risks and from the located
defects to the risks, we can report test results
in terms of residual risk. This allows project
stakeholders to decide to declare testing
complete whenever the risk of continuing
testing exceeds the risk of declaring the testing
complete.
Since the remaining risk is going down in a
predictable way, this means that we can triage
tests in risk order. Should schedule compression
require that we reduce test coverage, we can
do this in risk order, providing a way that is
both acceptable and explainable to project
stakeholders.
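Triage under schedule compression might be sketched like this (illustrative only; the test names, risk levels, and one-hour-per-test assumption are invented):

```python
# Illustrative sketch (not from the book): triaging a risk-ordered test
# set when the schedule is compressed. We keep the highest-risk tests
# that fit the remaining budget and report what was cut.

def triage(tests: list[tuple[str, int]], hours_available: float,
           hours_per_test: float = 1.0) -> tuple[list[str], list[str]]:
    """Return (tests to run, tests to drop), cutting in ascending risk order."""
    ordered = sorted(tests, key=lambda t: -t[1])  # highest risk first
    budget = int(hours_available // hours_per_test)
    kept = [name for name, _ in ordered[:budget]]
    dropped = [name for name, _ in ordered[budget:]]
    return kept, dropped

tests = [("login lockout", 18), ("report totals", 15), ("tooltip text", 2)]
kept, dropped = triage(tests, hours_available=2.0)
print("run:", kept)      # → run: ['login lockout', 'report totals']
print("skip:", dropped)  # → skip: ['tooltip text']
```

The dropped list is exactly what makes the cut explainable to stakeholders: it names the risks that will remain untested.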
For all of these reasons, an analytical risk-based
testing strategy is more robust than an analytical
requirements-based test strategy. Pure analytical
requirements-based test strategies require at
least one test per requirement, but they don’t
tell us how many tests we need in a way that
responds intelligently to project constraints.
They don’t tell us the order in which to run tests.
In pure analytical requirements-based test
strategies, the risk reduction throughout test
execution is neither predictable nor measurable.
Therefore, with analytical requirements-based
test strategies, we cannot easily express the
remaining level of risk if project stakeholders
ask us whether we can safely curtail or compress
testing.
That is not to say that we ignore requirements
specifications when we use an analytical
risk-based testing strategy. On the contrary,
we use requirements specifications, design
specifications, marketing claims, technical
support or help desk data, and myriad other
inputs to inform our risk identification and
analysis process if they are available. However,
if we don’t have this information available or we
find such information of limited usefulness, we
can still plan, design, implement, and execute
our tests by using stakeholder input to the risk
identification and assessment process. This also
makes an analytical risk-based testing strategy
more robust than an analytical requirements-
based strategy, because we reduce our
dependency on upstream processes (which we
may not control) like requirements gathering
and design.
ISTQB Glossary
risk identification: The process of identifying
risks using techniques such as brainstorming,
checklists, and failure history.
All that said, an analytical risk-based testing
strategy is not perfect. Like any analytical testing
strategy, we will not have all of the information
we need for a perfect risk assessment at the
beginning of the project. Even with periodic
reassessment of risk—which I will also discuss
later in this section—we will miss some
important risks. Therefore, an analytical risk-based
testing strategy, like any analytical testing strategy,
should be blended with reactive strategies during
test implementation and execution so that we
can detect risks that we missed during our risk
assessment.
Let me be more specific and concise about
the testing problems we often face and how
analytical risk-based testing can help solve
them.
First, as testers, we often face significant time
pressures. There is seldom sufficient time to run
the tests we’d want to run, particularly when
doing requirements-based testing. Ultimately,
all testing is time-boxed. Risk-based testing
provides a way to prioritize and triage tests at
any point in the lifecycle.
When I say that all testing is time-boxed, I
mean that we face a challenge in determining
appropriate test coverage. If we measure test
coverage as a percentage of what could be
tested, any amount of testing yields a coverage
metric of 0 percent because the set of tests that
could be run is infinite for any real-sized system.
So, risk-based testing provides a means to
choose a smart subset from the infinite number
of comparatively small subsets of tests we could
run.
Further, we often have to deal with poor or
missing specifications. By involving stakeholders
in the decision about what not to test, what to
test, and how much to test it, risk-based testing
allows us to identify and fill gaps in documents
like requirements specifications that might
result in big holes in our testing. It also helps to
sensitize the other stakeholders to the difficult
problem of determining what to test (and how
much) and what not to test.
To return to the issue of time pressures: not only
are they significant, they tend to escalate
during the test execution period. We are often
asked to compress the test schedule at the
start of or even midway through test execution.
Risk-based testing provides a means to drop
tests intelligently while also providing a way
to discuss with project stakeholders the risks
inherent in doing so.
Finally, as we reach the end of our test execution
period, we need to be able to help project
stakeholders make smart release decisions.
Risk-based testing allows us to work with
stakeholders to determine an acceptable level
of residual risk rather than forcing them—and
us—to rely on inadequate, tactical metrics like
bug and test counts.
The History of Risk-
Based Testing
How did analytical risk-based testing strategies
come to be? Understanding this history can
help you understand where we are and where
these strategies might evolve.
In the early 1980s, Barry Boehm and Boris
Beizer each separately examined the idea of
risk as it relates to software development.
Boehm advanced the idea of a risk-driven spiral
development lifecycle, which we covered in the
Foundation syllabus. The idea of this approach
is to develop the architecture and design in
risk order to reduce the risk of development
catastrophes and blind alleys later in the
project.
Beizer advanced the idea of risk-driven
integration and integration testing. In other
words, it’s not enough to develop in risk order,
we need to assemble and test in risk order,
too.1
If you reflect on the implications of Boehm and
Beizer’s ideas, you can see that these are the
precursors of iterative and agile lifecycles.
Now, in the mid 1980s, Beizer and Bill Hetzel
each separately declared that risk should be a
primary driver of testing. By this, they meant
both in terms of effort and in terms of order.
However, while giving some general ideas
on this, they did not elaborate any specific
mechanisms or methodologies for making
this happen. I don’t say this to criticize them. At
that point, it perhaps seemed that just ensuring
awareness of risk among the testers was enough
to ensure risk-based testing.2
However, it was not. Some testers have followed
this concept of using the tester’s idea of risk
to determine test coverage and priority. For
reasons we’ll cover later, this results in testing
devolving into an ill-informed, reactive bug
hunt. There’s nothing wrong with finding many
bugs, but finding as many bugs as possible is
not a well-balanced test objective.
So, more structure was needed to ensure a
systematic exploration of the risks. This brings
us to the 1990s. Separately, Rick Craig, Paul
Gerrard, Felix Redmill, and I were all looking for
ways to systematize this concept of risk-based
testing. I can’t speak for Craig, Gerrard, and
Redmill, but I know that I had become frustrated
with requirements-based strategies for the
reasons mentioned earlier. So in parallel and
with very little apparent cross-pollination, the
four of us—and perhaps others—developed
similar approaches for quality risk analysis and
risk-based testing. In this section, you’ll learn
these approaches.3
So, where are we now? In the mid- to late 2000s,
test practitioners widely use analytical risk-
based testing strategies in various forms. Some
still practice misguided, reactive, tester-focused
bug hunts. However, many practitioners are
trying to use analytical approaches to prevent
bugs from entering later phases of testing, to
focus testing on what is likely to fail and what
is important, to report test status in terms of
residual risk, and to respond better as their
1: See Beizer’s book Software System Testing and Quality Assurance.
2: See Beizer’s book Software Testing Techniques and Hetzel’s book The Complete Guide to Software Testing.
3: For more details on my approach, see my discussion of formal techniques in Critical Testing Processes and my
discussion of informal techniques in Pragmatic Software Testing. For Paul Gerrard’s approach, see Risk-based e-
Business Testing. Van Veenendaal discusses informal techniques in The Testing Practitioner.
understanding of risk changes. By putting the
ideas in this section into practice, you can join
us in this endeavor. As you learn more about
how analytical risk-based testing strategies
work—and where they need improvements—I
encourage you to share what you’ve learned
with others by writing articles, books, and
presentations on the topic.
However, while we still have much to learn, that
does not mean that analytical risk-based testing
strategies are at all experimental. They are well-
proven practice. I am unaware of any other
test strategies that adapt as well to the myriad
realities and constraints of software projects.
They are the best thing going, especially when
blended with reactive strategies.
Another form of blending that requires attention
and work is blending of analytical risk-based
testing strategies with all the existing lifecycle
models. My associates have used analytical
risk-based testing strategies with sequential
lifecycles, iterative lifecycles, and spiral lifecycles.
These strategies work regardless of lifecycle.
However, the strategies must be adapted to the
lifecycle.
Beyond learning more through practice, another
important next step is for test management
tools to catch up and start to advance the use
of analytical risk-based testing strategies. Some
test management tools now incorporate the
state of the practice in risk-based testing. Some
still do not support risk-based testing directly at
all. I encourage those of you who are working
on test management tools to build support for
this strategy into your tools and look for ways
to improve it.
How to Do Risk-
Based Testing
Let’s move on to the tactical questions about
how we can perform risk-based testing. Let’s
start with a general discussion about risk
management, and then we’ll focus on specific
elements of risk-based testing for the rest of
this section.
Risk management includes three primary
activities:
Risk identification, figuring out what the
different project and quality risks are for
the project
Risk analysis, assessing the level of risk—
typically based on likelihood and impact—
for each identified risk item
Risk mitigation (which is really more
properly called “risk control” because
it consists of mitigation, contingency,
transference, and acceptance actions for
various risks)
In some sense, these activities are sequential, at
least in when they start. They are staged such
that risk identification starts first. Risk analysis
comes next. Risk control starts once we have
determined the level of risk through risk analysis.
However, since we should continuously manage
risk in a project, risk identification, risk analysis,
and risk control are all recurring activities.
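These three recurring activities can be sketched as one cycle over a risk register. This is an illustration only, not the book's process; the Risk class, the default ratings, and the control threshold are all invented for the example:

```python
# Illustrative sketch (not from the book): identification populates a
# risk register, analysis assigns likelihood and impact, and control
# picks an action. Rerun the cycle as new information arrives.

from dataclasses import dataclass

@dataclass
class Risk:
    name: str
    likelihood: int = 0   # assigned during analysis, 1-5
    impact: int = 0       # assigned during analysis, 1-5
    control: str = ""     # assigned during risk control

    @property
    def level(self) -> int:
        return self.likelihood * self.impact

def manage(register: list[Risk]) -> None:
    # Analysis: rate each identified risk (defaulting unrated ones to medium).
    for risk in register:
        risk.likelihood = risk.likelihood or 3
        risk.impact = risk.impact or 3
    # Control: choose an action for each risk, highest level first.
    for risk in sorted(register, key=lambda r: -r.level):
        risk.control = "mitigate" if risk.level >= 9 else "accept"

register = [Risk("staffing shortage", likelihood=2, impact=4),
            Risk("crash during normal operation", likelihood=4, impact=5)]
manage(register)  # rerun periodically; ratings and controls may change
for r in register:
    print(r.name, r.level, r.control)
```

The point of the sketch is the loop, not the numbers: calling `manage` again after a reassessment updates the levels and control decisions, which is what makes risk management iterative.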
ISTQB Glossary
risk control: The process through which
decisions are reached and protective measures
are implemented for reducing risks to, or
maintaining risks within, specified levels
risk mitigation: See risk control.
Everyone has their own perspective on
how to manage risks on a project, including
what the risks are, the level of risk, and the
appropriate controls to put in place for risks.
So risk management should include all project
stakeholders.
Test analysts bring particular expertise to risk
management due to their defect-focused
outlook. They should participate whenever
possible. In fact, in many cases, the test manager
will lead the quality risk analysis effort with test
analysts providing key support in the process.
Let’s look at these activities more closely. For
proper risk-based testing, we need to identify
both product and project risks. We can identify
both kinds of risks using techniques like these:
Expert interviews
Independent assessments
Use of risk templates
Project retrospectives
Risk workshops and brainstorming
Checklists
Calling on past experience
Conceivably, you can use a single integrated
process to identify both project and product
risks. I usually separate them into two separate
processes since they have two separate
deliverables and often separate stakeholders. I
include the project risk identification process in
the test planning process. In parallel, the quality
risk identification process occurs early in the
project.
That said, project risks—and not just for
testing but also for the project as a whole—are
often identified as by-products of quality risk
analysis. In addition, if you use a requirements
specification, design specification, use cases,
and the like as inputs into your quality risk
analysis process, you should expect to find
defects in those documents as another set of
by-products. These are valuable by-products,
which you should plan to capture and escalate
to the proper person.
Previously, I encouraged you to include
representatives of all possible stakeholder
groups in the risk management process. For the
risk identification activities, the broadest range
of stakeholders will yield the most complete,
accurate, and precise risk identification. The
more stakeholder group representatives you
omit from the process, the more risk items and
even whole risk categories will be missing.
How far should you take this process? Well,
it depends on the technique. In informal
techniques, which I frequently use, risk
identification stops at the risk items. The risk
items must be specific enough to allow for
analysis and assessment of each one to yield
an unambiguous likelihood rating and an
unambiguous impact rating.
Techniques that are more formal often look
“downstream” to identify potential effects of
the risk item if it becomes an actual negative
outcome. These effects include effects on
the system—or the system of systems if
applicable—as well as on potential users,
customers, stakeholders, and even society in
general. Failure Mode and Effect Analysis is an
example of such a formal risk management
technique, and it is commonly used on safety-
critical and embedded systems.4
Other formal techniques look “upstream” to
identify the source of the risk. Hazard Analysis is
an example of such a formal risk management
technique. I’ve never used it myself, but I have
talked to clients who have used it for safety-
critical medical systems.
4: For a discussion of Failure Mode and Effect Analysis, see Stamatis’s book Failure Mode and Effect Analysis.
We’ll look at some examples of various levels
of formality in risk analysis a little later in this
section.
The Advanced syllabus refers to the next
step in the risk management process as risk
analysis. I prefer to call it risk assessment, just
because analysis would seem to include both
identification and assessment of risk to me.
Regardless of what we call it, risk analysis or risk
assessment involves the study of the identified
risks. We typically want to categorize each risk
item appropriately and assign each risk item an
appropriate level of risk.
We can use ISO 9126 or other quality categories
to organize the risk items. In my opinion, it
doesn’t matter so much what category a risk
item goes into, usually, so long as we don’t
forget it. However, in complex projects and
for large organizations, the category of risk
can determine who has to deal with the risk. A
practical implication like this makes the
categorization important.
The Level of Risk
The other part of risk assessment or risk analysis is
determining the level of risk. This often involves
likelihood and impact as the two key factors.
Likelihood arises from technical considerations,
typically, while impact arises from business
considerations. However, in some formalized
approaches you use three factors, such as
severity, priority, and likelihood of detection,
or even subfactors underlying likelihood and
impact. Again, we’ll discuss this further later in
the book.
So, what technical factors should we consider?
Here’s a list to get you started:
Complexity of technology and teams
Personnel and training issues
Intrateam and interteam conflict/
communication
Supplier and vendor contractual problems
Geographical distribution of the
development organization, as with
outsourcing
Legacy or established designs and
technologies versus new technologies and
designs
The quality—or lack of quality—in the
tools and technology used
Bad managerial or technical leadership
Time, resource, and management pressure,
especially when financial penalties apply
Lack of earlier testing and quality assurance
tasks in the lifecycle
High rates of requirements, design, and
code changes in the project
High defect rates
Complex interfacing and integration issues
Lack of sufficiently documented
requirements
And what business factors should we consider?
Here’s a list to get you started:
The frequency of use and importance of
the affected feature
Potential damage to image
Loss of customers and business
Potential financial, ecological, or social
losses or liability
Civil or criminal legal sanctions
Loss of licenses, permits, and the like
The lack of reasonable workarounds
The visibility of failure and the associated
negative publicity
Both of these lists are just starting points.
When determining the level of risk, we can
try to work quantitatively or qualitatively. In
quantitative risk analysis, we have numerical
ratings for both likelihood and impact.
Likelihood is a percentage, and impact is often
a monetary quantity. If we multiply the two
values together, we can calculate the cost of
exposure, which is called—in the insurance
business—the expected payout or expected
loss.
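The quantitative calculation can be shown directly (the probability and dollar figures below are invented for the example):

```python
# Illustrative sketch (not from the book): the cost of exposure is
# likelihood (a probability) times impact (a monetary loss), giving
# the expected loss in the insurance sense.

def cost_of_exposure(likelihood: float, impact: float) -> float:
    """Expected loss: probability of the problem times its monetary impact."""
    if not 0.0 <= likelihood <= 1.0:
        raise ValueError("likelihood must be a probability between 0 and 1")
    return likelihood * impact

# A 25% chance of a failure that would cost $100,000:
print(cost_of_exposure(0.25, 100_000))  # → 25000.0
```

As the next paragraph explains, this calculation is only as good as its inputs, and statistically valid inputs are rarely available in testing.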
While it will be nice some day in the future
of software engineering to be able to do
this routinely, typically the level of risk is
determined qualitatively. Why? Because we
don’t have statistically valid data on which to
perform quantitative quality risk analysis. So we
can speak of likelihood being very high, high,
medium, low, or very low, but we can’t say—at
least, not in any meaningful way—whether the
likelihood is 90 percent, 75 percent, 50 percent,
25 percent, or 10 percent.
This is not to say—by any means—that a
qualitative approach should be seen as inferior
or useless. In fact, given the data most of us have
to work with, use of a quantitative approach is
almost certainly inappropriate on most projects.
The illusory precision thus produced misleads
the stakeholders about the extent to which you
actually understand and can manage risk. What
I’ve found is that if I accept the limits of my data
and apply appropriate informal quality risk
management approaches, the results are not
only perfectly useful, but also indeed essential
to a well-managed test process.
Unless your risk analysis is based on extensive
and statistically valid risk data, your risk analysis
will reflect perceived likelihood and impact. In
other words, personal perceptions and opinions
held by the stakeholders will determine the level
of risk. Again, there’s absolutely nothing wrong
with this, and I don’t bring this up to condemn
the technique at all.The key point is that project
managers, programmers, users, business
analysts, architects, and testers typically have
different perceptions and thus possibly different
opinions on the level of risk for each risk item.
By including all these perceptions, we distill the
collective wisdom of the team.
However, we do have a strong possibility of
disagreements between stakeholders. So the
risk analysis process should include some way
of reaching consensus. In the worst case, if we
cannot obtain consensus, we should be able
to escalate the disagreement to some level of
management to resolve. Otherwise, risk levels
will be ambiguous and conflicted and thus not
useful as a guide for risk mitigation activities—
including testing.
Controlling the
Risks
Part of any management role, including test
management, is controlling risks that affect
your area of interest. How can we control risks?
We have four main options for risk control:
Mitigation, where we take preventive
measures to reduce the likelihood and/or
the impact of a risk.
Contingency, where we have a plan or
perhaps multiple plans to reduce the
impact if the risk becomes an actuality.
Transference, where we get another party
to accept the consequences of a risk.
Finally, we can ignore or accept the risk
and its consequences.
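The four options can be captured as an enumeration (a sketch only; the selection policy and its thresholds are invented, not from the book):

```python
# Illustrative sketch (not from the book): the four risk control options,
# with an invented policy for choosing among them based on risk level.

from enum import Enum

class Control(Enum):
    MITIGATE = "reduce likelihood and/or impact in advance"
    CONTINGENCY = "plan to reduce impact if the risk becomes an actuality"
    TRANSFER = "have another party accept the consequences"
    ACCEPT = "live with the risk and its consequences"

def choose(level: int, transferable: bool = False) -> Control:
    """Invented policy: mitigate high risks, plan contingencies for medium
    ones, transfer what another party will take, accept the rest."""
    if level >= 15:
        return Control.MITIGATE
    if level >= 9:
        return Control.CONTINGENCY
    if transferable:
        return Control.TRANSFER
    return Control.ACCEPT

print(choose(20).name)  # → MITIGATE
```

Real policies are rarely this mechanical; as the text notes next, each option carries its own costs and can introduce new risks.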
For any given risk item, selecting one or more of
these options creates its own set of benefits and
opportunities as well as costs and, potentially,
additional risks associated with each option.
Done wrong, risk control can make things
worse, not better.
Analytical risk-based testing is focused on
creating risk mitigation opportunities for the
test team, especially for quality risks. Risk-based
testing mitigates quality risks through testing
throughout the entire lifecycle.
In some cases, there are standards that can
apply. We’ll look at a couple of risk-related
standards shortly in this section.
Project Risks
While much of this section deals with product
risks, test managers often identify project risks,
and sometimes they have to manage them. Let’s
discuss this topic now so we can subsequently
focus on product risks. A specific list of all
possible test-related project risks would be
huge, but includes issues like these:
Test environment and tool readiness
Test staff availability and qualification
Low quality of test deliverables
Too much change in scope or product
definition
Sloppy, ad-hoc testing effort
Test-related project risks can often be mitigated,
or at least one or more contingency plans can be
put in place to respond to the unhappy event if it
occurs. A test manager can manage risk to the
test effort in a number of ways.
We can accelerate the moment of test
involvement and ensure early preparation of
testware. By doing this, we can make sure we are
ready to start testing when the product is ready.
In addition, as mentioned in the Foundation
syllabus and elsewhere in this course, early
involvement of the test team allows our test
analysis, design, and implementation activities
to serve as a form of static testing for the project,
which can serve to prevent bugs from showing
up later during dynamic testing, such as during
system test. Detecting an unexpectedly large
number of bugs during high-level testing
like system test, system integration test, and
acceptance test creates a significant risk of
project delay, so this bug-preventing activity is
a key project risk-reducing benefit of testing.
We can make sure that we check out the test
environment before test execution starts. This
can be paired with another risk-mitigation
activity, that of testing early versions of the
product before formal test execution begins.
If we do this in the test environment, we can
test the testware, the test environment, the test
release and test object installation process, and
many other test execution processes in advance
before the first day of testing.
We can also define tougher entry criteria to
testing. That can be an effective approach if
the project manager will slip the end date of
testing if the start date slips. Often, project
managers won’t do that, so making it harder to
start testing while not changing the end date
of testing simply creates more stress and puts
pressure on the test team.
We can try to institute requirements for
testability. For example, getting the user
interface design team to change editable fields
into non-editable pull-down fields wherever
possible—such as on date and time fields—
can reduce the size of the potential user
input validation test set dramatically and help
automation efforts.
To reduce the likelihood of being caught unaware
by really bad test objects, and to help reduce
bugs in those test objects, test team members
can participate in reviews of earlier project work
products, such as requirements specifications.
We can also have the test team participate in
problem and change management.
Finally, during the test execution effort—
hopefully starting with unit testing and perhaps
even before, but if not at least from day one of
formal testing—we can monitor the project
progress and quality. If we see alarming trends
developing, we can try to manage them before
they turn into end-game disasters.
In Figure 1, you see the test-related project risks
for an Internet appliance project that serves as
a recurring case study in this book. These risks
were identified in the test plan and steps were
taken throughout the project to manage them
through mitigation or respond to them through
contingency.
Let’s review the main project risks identified for
testing on this project and the mitigation and
contingency plans put in place for them.
We were worried, given the initial aggressive
schedules, that we might not be able to staff
the test team on time. Our contingency plan
was to reduce scope of test effort in reverse-
priority order.
On some projects, test release management
is not well defined, which can result in a test
cycle’s results being invalidated. Our mitigation
plan was to ensure a well-defined, crisp release
management process.
We have sometimes had to deal with test
environment system administration support
that was either unavailable at key times or
simply unable to carry out the tasks required.
Our mitigation plan was to identify system
administration resources with pager and cell
phone availability and appropriate Unix, QNX,
and network skills.
As consultants, my associates and I often encounter situations in which test environments are shared with development, which can introduce tremendous delays and unpredictable interruptions into the test execution schedule.
Figure 1: Test-related project risks example
In this case, we had not yet determined the best
mitigation or contingency plan for this, so it was
marked "[TBD]."
Of course, buggy deliverables can impede
testing progress. In fact, more often than not,
the determining factor in test cycle duration for
new applications (as opposed to maintenance
releases) is the number of bugs in the product
and how long it takes to grind them out. We
asked for complete unit testing and adherence
to test entry and exit criteria as mitigation plans
for the software. For the hardware component,
we wanted to mitigate this risk through early
auditing of vendor test and reliability plans and
results.
It’s also the case that frequent or sizeable test
and product scope and definition changes can
impede testing progress. As a contingency plan
to manage this should it occur, we wanted a
change management or change control board
to be established.
Two Industry
Standards and Their
Relation to Risk
You can find an interesting example of how
risk management, including quality risk
management, plays into the engineering of
complex and/or safety-critical systems in the
IEC standard 61508, which is mentioned in
the Advanced syllabus. It is designed especially
for embedded software that controls systems
with safety-related implications, as you can tell
from its title: “Functional safety of electrical/
electronic/programmable electronic safety-
related systems.”
The standard focuses on risks. It requires risk
analysis. It considers two primary factors to
determine the level of risk: likelihood and
impact. During a project, the standard directs us
to reduce the residual level of risk to a tolerable
level, specifically through the application of
electrical, electronic, or software improvements
to the system.
The standard has an inherent philosophy about
risk. It acknowledges that we can’t attain a level
of zero risk—whether for an entire system or
even for a single risk item. It says that we have
to build quality, especially safety, in from the
beginning, not try to add it at the end, and
thus must take defect-preventing actions like
requirements, design, and code reviews.
The standard also insists that we know what
constitutes tolerable and intolerable risks and
that we take steps to reduce intolerable risks.
When those steps are testing steps, we must
document them, including a software safety
validation plan, software test specification,
software test results, software safety validation,
verification report, and software functional
safety report. The standard is concerned
with the author-bias problem, which, as you
should recall from the Foundation syllabus,
is the problem with self-testing, so it calls for
tester independence, indeed insisting on it
for those performing any safety-related tests.
And, since testing is most effective when the
system is written to be testable, that’s also a
requirement.
The standard has a concept of a safety integrity
level (SIL), which is based on the likelihood of
failure for a particular component or subsystem.
The safety integrity level influences a number
of risk-related decisions, including the choice of
testing and QA techniques.
Some of the techniques are ones I discuss in
the companion volume on Advanced Test
Analyst, such as the various functional and
black-box testing design techniques. Many
of the techniques are ones I discuss in the
companion volume on Advanced Technical
Test Analyst, including probabilistic testing,
dynamic analysis, data recording and analysis,
performance testing, interface testing, static
analysis, and complexity metrics. Additionally,
since thorough coverage, including during
regression testing, is important to reduce
the likelihood of missed bugs, the standard
mandates the use of applicable automated test
tools.
Again, depending on the safety integrity
level, the standard might require various
levels of testing. These levels include module
testing, integration testing, hardware-software
integration testing, safety requirements testing,
and system testing. If a level is required, the
standard states that it should be documented
and independently verified. In other words,
the standard can require auditing or outside
reviews of testing activities. Continuing in that
vein of "guarding the guards," the standard also
requires reviews for test cases, test procedures,
and test results, along with verification of data
integrity under test conditions.
The 61508 standard requires structural testing
as a test design technique. So structural
coverage is implied, again based on the safety
integrity level. Because the desire is to have
high confidence in the safety-critical aspects
of the system, the standard requires complete
requirements coverage not once but multiple
times, at multiple levels of testing. Again, the
level of test coverage required depends on the
safety integrity level.
Now, this might seem a bit excessive, especially if you come from a very informal world. However, the next time you step between two pieces of metal that can move—e.g., elevator doors—ask yourself how much risk you want to remain in the software that controls that movement.
Let’s look at another risk-related testing
standard. The United States Federal Aviation
Administration provides a standard called DO-
178B for avionics systems. In Europe, it’s called
ED-12B.
The standard assigns a criticality level based on
the potential impact of a failure, as shown in
Table 1. Based on the criticality level, the DO-178B standard requires a certain level of white-box test coverage.
Table 1: FAA DO-178B mandated coverage
Criticality level A, or Catastrophic, applies
when a software failure can result in a
catastrophic failure of the system. For software
with such criticality, the standard requires
Modified Condition/Decision, Decision, and
Statement coverage.
Criticality level B, or Hazardous and Severe, applies when a software failure can result in a hazardous, severe, or major failure of the system. For software with such criticality, the standard requires Decision and Statement coverage.
Criticality level C, or Major, applies when a software failure can result in a major failure of the system. For software with such criticality, the standard requires only Statement coverage.
Criticality level D, or Minor, applies when a
software failure can result in only a minor failure
of the system. For software with such criticality,
the standard does not require any level of
coverage.
Finally, criticality level E, or No effect, applies
when a software failure cannot have an effect
on the system. For software with such criticality,
the standard does not require any level of
coverage.
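The criticality-to-coverage mapping above can be sketched as a simple lookup. This is an illustrative sketch only; the dictionary and function names are my own, not part of the standard:

```python
# DO-178B criticality levels mapped to the structural coverage the
# standard requires, as summarized in Table 1. Level names and the
# lookup structure are illustrative assumptions.
REQUIRED_COVERAGE = {
    "A": ["Modified Condition/Decision", "Decision", "Statement"],  # Catastrophic
    "B": ["Decision", "Statement"],                                 # Hazardous and Severe
    "C": ["Statement"],                                             # Major
    "D": [],                                                        # Minor: no coverage mandated
    "E": [],                                                        # No effect: no coverage mandated
}

def required_coverage(level: str) -> list[str]:
    """Return the white-box coverage criteria mandated for a criticality level."""
    return REQUIRED_COVERAGE[level.upper()]
```

A test planner could use such a table to cross-check that the planned coverage measures for each module match its assigned criticality level.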
This makes a certain amount of sense. You
should be more concerned about software that
affects flight safety, such as rudder and aileron
control modules, than you are about software
that doesn’t, such as video entertainment
systems. Of course, lately there has been a trend
toward putting all of the software, both critical
and noncritical, on a common network in the
plane, which introduces enormous potential
risks for inadvertent interference and malicious
hacking.
However, I consider it dangerous to use a one-
dimensional white-box measuring stick to
determine how much confidence we should
have in a system. Coverage metrics are a
measure of confidence, it’s true, but we should
use multiple coverage metrics, both white-box
and black-box.5
By the way, if you found this material a bit
confusing, note that the white-box coverage
metrics used in this standard were discussed
in the Foundation syllabus in Chapter 4. If you
don’t remember these coverage metrics, you
should go back and review that material in that
chapter of the Foundation syllabus.
Risk Identification
and Assessment
Techniques
Various techniques exist for performing quality
risk identification and assessment. These range
from informal to semiformal to formal.
You can think of risk identification and
assessment as a structured form of project and
product review. In a requirements review, we
focus on what the system should do. In quality
risk identification and assessment sessions,
we focus on what the system might do that
it should not. Thus, we can see quality risk
identification and assessment as the mirror
image of the requirements, the design, and the
implementation.
As with any review, as the level of formality
increases, so does the cost, the defect removal
5: You might be tempted to say, "Well, why worry about this? It seems to work for aviation software." Spend a few
moments on the Risks Digest at www.risks.org and peruse some of the software-related aviation near misses. You
might feel less sanguine. There is also a discussion of the Boeing 787 design issue that relates to the use of a single
network for all onboard systems, both safety critical and non–safety critical.
effectiveness, and the extent of documentation
associated with it. You’ll want to choose the
technique you use based on constraints and
needs for your project. For example, if you are
working on a short project with a very tight
budget, adopting a formal technique with
extensive documentation doesn’t make much
sense.
Let’s review the various techniques for quality
risk identification and assessment, from informal
to formal, and then some ways in which you can
organize the sessions themselves.
In many successful projects,
we use informal methods for risk-based testing.
These can work just fine. In particular, it’s a good
way to start learning about and practicing risk-
based testing because excessive formality and
paperwork can create barriers to successful
adoption of risk-based testing.
In informal techniques, we rely primarily on
history, stakeholder domain and technical
experience, and checklists of risk categories to
guide us through the process. These informal
approaches are easy to put in place and to
carry out. They are lightweight in terms of both
documentation and time commitment. They
are flexible from one project to the next since
the amount of documented process is minimal.
However, since we rely so much on stakeholder
experience, these techniques are participant
dependent. The wrong set of participants means
a relatively poor set of risk items and assessed
risk levels. Because we follow a checklist, if the
checklist has gaps, so does our risk analysis.
Because of the relatively high level at which
risk items are specified, they can be imprecise
both in terms of the items and the level of risk
associated with them.
That said, these informal techniques are a great
way to get started doing risk-based testing. If it
turns out that a more precise or formal technique
is needed, the informal quality risk analysis can
be expanded and formalized for subsequent
projects. Even experienced users of risk-based
testing should consider informal techniques
for low-risk or agile projects. You should avoid
using informal techniques on safety-critical or
regulated projects due to the lack of precision
and tendency toward gaps.
Categories of
Quality Risks
I mentioned that informal risk-based testing
tends to rely on a checklist to identify risk items.
What are the categories of risks that we would
look for? In part, that depends on the level of
testing we are considering. Let’s start with
the early levels of testing, unit or component
testing. In the following lists, I’m going to pose
these checklist risk categories in the form of
questions, to help stimulate your thinking
about what might go wrong.
Does the unit handle state-related
behaviors properly? Do transitions
from one state to another occur when the
appropriate events occur? Are the correct
actions triggered? Are the correct events
associated with each input?
Can the unit handle the
transactions it should handle, correctly,
without any undesirable side effects?
What statements, branches,
conditions, complex condition, loops, and
other paths through the code might result
in failures?
What flows of data into or out of the unit—whether through parameters, objects, global variables, or persistent data structures like files or database tables—might result in immediate or delayed failures, including corrupted persistent data (the worst kind of corrupted data)?
Is the functionality provided
to the rest of the system by this component
incorrect, or might it have invalid side
effects?
If this component interacts
with the user, might users have problems
understanding prompts and messages,
deciding what to do next, or feeling
comfortable with color schemes and
graphics?
For hardware components,
might this component wear out or fail after
repeated motion or use?
For hardware components,
are the signals correct and in the correct
form?
As we move into integration testing, additional risks arise, many in the following areas:
Are
the interfaces between components well
defined? What problems might arise
in direct and indirect interaction between
components?
Again, what problems
might exist in terms of actions and side
effects, particularly as a result of
component interaction?
Are the static data
spaces such as memory and disk space
sufficient to hold the information needed?
Are the dynamic volume conduits such
as networks going to provide sufficient
bandwidth?
Will the integrated components respond correctly under typical and extreme adverse conditions? Can they recover to normal functionality after such a condition?
Can the system store, load,
modify, archive, and manipulate data
reliably, without corruption or loss of data?
What problems might exist
in terms of response time, efficient resource
utilization, and the like?
Again, for this integration
collection of components, if a user
interface is involved, might users have
problems understanding prompts and
messages, deciding what to do next, or
feeling comfortable with color schemes and
graphics?
Similar issues apply for system integration
testing, but we would be concerned with
integration of systems, not components.
Finally, what kinds of risk might we consider
for system and user acceptance testing?
Again, we need to consider
functionality problems. At these levels, the
issues we are concerned with are systemic.
Do end-to-end functions work properly?
Are deep levels of functionality and
combinations of related functions working?
In terms of
the whole system interface to the user,
are we consistent? Can the user understand
the interface? Do we mislead or distract the
user at any point? Trap the user in dead-end
interfaces?
Overall, does
the system handle various states correctly?
Considering states the user or objects
acted on by the system might be in, are
there potential problems here?
Considering the entire set
of data that the system uses—including
data it might share with other systems—
can the system store, load, modify, archive,
and manipulate that data reliably, without
corrupting or losing it?
Complex systems often
require administration. Databases,
networks, and servers are examples.
Operations these administrators perform
can include essential maintenance tasks.
For example, might there be problems
with backing up and restoring files or
tables? Can you migrate the system from
one version or type of database server or
middleware to another? Can storage,
memory, or processor capacity be added?
Are there potential issues with response
time? With behavior under combined
conditions of heavy load and low
resources? Insufficient static space?
Insufficient dynamic capacity and
bandwidth?
Will
the system fail under normal, exceptional,
or heavy load conditions? Might the system
be unavailable when needed? Might it
prove unstable with certain functions?
Configuration: What installation,
data migration, application migration,
configuration, or initial conditions might
cause problems?
Will the system respond correctly under
typical and extreme adverse conditions?
Can it recover to normal functionality after
such a condition? Might its response to
such conditions create consequent
conditions that negatively affect
interoperating or cohabiting applications?
Might certain
date- or time-triggered events fail? Do
related functions that use dates or times
work properly together? Could situations
like leap years or daylight saving time
transitions cause problems? What about
time zones?
In terms of the various
languages we need to support, will some
of those character sets or translated
messages cause problems? Might currency
differences cause problems?
Do latency,
bandwidth, or other factors related to the
networking or distribution of processing
and storage cause potential problems?
Might the system be
incompatible with various environments
it has to work in? Might the system
be incompatible with interoperating
or cohabiting applications in some of the
supported environments?
What standards apply to our
system, and might it violate some of those
standards?
Is it possible for users without
proper permission to access functions or
data they should not? Are users with proper
permission potentially denied access?
Is data encrypted when it should be? Can
security attacks bypass various access
controls?
For hardware systems,
might normal or exceptional operating
environments cause failures? Will humidity,
dust, or heat cause failures, either
permanent or intermittent?
Are there problems with power
consumption for hardware systems? Do
normal variations in the quality of the
power supplied cause problems? Is battery
life sufficient?
For hardware systems, might foreseeable physical shocks, background vibrations, or routine bumps and drops cause failure?
Is the
documentation incorrect, insufficient, or
unhelpful? Is the packaging sufficient?
Can we upgrade the
system? Apply patches? Remove or add
features from the installation media?
There are certainly other potential risk
categories, but this list forms a good starting
point. You’ll want to customize this list to your
particular systems if you use it.
Documenting
Quality Risks
In Figure 2, you see a template that can be
used to capture the information you identify
in quality risk analysis. In this template, you
start by identifying the risk items, using the
categories just discussed as a framework. Next,
for each risk item, you assess its level of risk in
terms of the factors of likelihood and impact.
You then use these two ratings to determine
the overall priority of testing and the extent
of testing. Finally, if the risks arise from specific
requirements or design specification elements, you establish traceability back to these items. Let's look at these activities and how they generate information to populate this template.
Figure 2: A template for capturing quality risk information
First, remember that quality risks are potential
system problems that could reduce user
satisfaction. We can use the risk categories to
organize the list and to jog people’s memory
about risk items to include. Working with the
stakeholders, we identify one or more quality
risk items for each category and populate the
template.
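One way to represent a row of such a template in code is with a small record type. The field names and scales below are assumptions based on the description above, not the book's exact column headings:

```python
from dataclasses import dataclass, field

# Sketch of one row of a quality risk analysis template like Figure 2.
# Field names and the 1-5 ordinal scales are illustrative assumptions.
@dataclass
class QualityRiskItem:
    risk_item: str       # what the system might do that it should not
    category: str        # e.g., "Functionality", "Performance"
    likelihood: int      # 1 = very high ... 5 = very low (technical risk)
    impact: int          # 1 = very high ... 5 = very low (business risk)
    traceability: list[str] = field(default_factory=list)  # requirement/design IDs

    @property
    def risk_priority(self) -> int:
        # One common aggregation: the product of likelihood and impact.
        return self.likelihood * self.impact
```

Populating a list of such records during the analysis session gives the stakeholders a shared artifact they can sort by risk priority afterward.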
Having identified the risks, we can now go
through the list of risk items and assess the
level of risk because we can see the risk items
in relation to each other. An informal technique
typically uses two main factors for assessing risk.
The first is the likelihood of the problem, which is
determined mostly by technical considerations.
I sometimes call this “technical risk” to remind
me of that fact. The second is the impact of
the problem, which is determined mostly
by business or operational considerations. I
sometimes call this "business risk" to remind me
of that fact.
Both likelihood and impact can be rated on an
ordinal scale. A three-point ordinal scale is high,
medium, and low. I prefer to use a five-point
scale, from very high to very low.
Given the likelihood and impact, we can
calculate a single, aggregate measure of risk
for the quality risk item. A generic term for this
measure of risk is risk priority number. One way
to do this is to use a formula to calculate the risk
priority number from the likelihood and impact.
First, translate the ordinal scale into a numerical
scale, as in this example:
1 = Very high
2 = High
3 = Medium
4 = Low
5 = Very low
You can then calculate the risk priority number
as the product of the two numbers. We’ll revisit
this issue in a later section of this chapter because
this is just one of many ways to determine the
risk priority.
The risk priority number can be used to sequence
the tests. To allocate test effort, I determine the
extent of testing. Figure 2 shows one way to do
this, by dividing the risk priority number into
five groups and using those to determine test
effort:
1–5 = Extensive
6–10 = Broad
11–15 = Cursory
16–20 = Opportunity
21–25 = Report bugs only
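A minimal sketch of this calculation, assuming the five-point ordinal scales above (1 = very high likelihood or impact, 5 = very low) and the five bands just listed:

```python
# Sketch: derive the extent of testing from a risk priority number,
# computed as likelihood times impact on 1-5 ordinal scales
# (1 = very high ... 5 = very low), per the bands described above.
def extent_of_testing(likelihood: int, impact: int) -> str:
    rpn = likelihood * impact  # 1 (riskiest) .. 25 (least risky)
    if rpn <= 5:
        return "Extensive"
    if rpn <= 10:
        return "Broad"
    if rpn <= 15:
        return "Cursory"
    if rpn <= 20:
        return "Opportunity"
    return "Report bugs only"
```

For example, a risk rated very high likelihood (1) and medium impact (3) yields an RPN of 3 and hence extensive testing, while one rated very low on both factors yields 25 and gets no proactive testing at all.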
We’ll return to the matter of variations in the
way to accomplish this later in this section.
As noted before, while you go through the
quality risk analysis process, you are likely to
generate various useful by-products. These
include implementation assumptions that you
and the stakeholders made about the system
in assessing likelihood. You’ll want to validate
these, and they might prove to be useful
suggestions. The by-products also include
project risks that you discovered, which the
project manager can address. Perhaps most
importantly, the by-products include problems
with the requirements, design, or other input
documents. We can now avoid having these
problems turn into actual system defects. Notice
that all three enable the bug-preventive role of
testing discussed earlier in this book.
In Figure 3, you see an example of an informal
quality risk analysis. We have used six quality categories for our framework, shown in Figure 3.
These are the standard quality categories used
by some groups at Hewlett Packard.
I’ve provided one or two example quality
risk items for each category. Of course, for a
typical product there would be more like 100 to
500 total quality risks—perhaps even more for
particularly complex products.
Quality Risk Analysis
Using ISO 9126
We can increase the structure of an informal
quality risk analysis—formalize it slightly, if you
will—by using the ISO 9126 standard as the
quality characteristic framework instead of the
rather lengthy and unstructured list of quality
risk categories given on the previous pages.
This has some strengths. The ISO 9126 standard
provides a predefined and thorough framework.
The standard itself—that is, the entire set of
documents that the standard comprises—
provides a predefined way to tailor it to your
organization. If you use this across all projects,
you will have a common basis for your quality
risk analyses and thus your test coverage.
Consistency in testing across projects provides
comparability of results.
Figure 3: Informal quality risk analysis example
The use of ISO 9126 in risk analysis has its
weaknesses too. For one thing, if you are not careful in tailoring the quality characteristics, you could end up over-broad in your analysis. That makes you less efficient.
For another thing, applying the standard to all
projects, big and small, complex and simple,
could prove over-regimented and heavyweight
from a process point of view.
I would suggest that you consider the use of
ISO 9126 structure for risk analysis whenever a
bit more formality and structure is needed, or if
you are working on a project where standards
compliance matters. I would avoid its use on
atypical projects or projects where too much
structure, process overhead, or paperwork is
likely to cause a problem, relying instead on
the lightest-weight informal process possible in
such cases.
To refresh your memory on the ISO 9126 standard,
here are the six quality characteristics:
Functionality, which has the
subcharacteristics of suitability, accuracy,
interoperability, security, and compliance
Reliability, which has the subcharacteristics
of maturity (robustness), fault tolerance,
recoverability, and compliance
Usability, which has the subcharacteristics
of understandability, learnability,
operability, attractiveness, and compliance
Efficiency, which has the subcharacteristics
of time behavior, resource utilization, and
compliance
Maintainability, which has the
subcharacteristics of analyzability,
changeability, stability, testability, and
compliance
Portability, which has the subcharacteristics
of adaptability, installability, coexistence,
replaceability, and compliance
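As a sketch, these six characteristics and their subcharacteristics can serve directly as a checklist data structure during risk identification. The plain-dictionary representation is my own illustrative choice, not something the standard prescribes:

```python
# The six ISO 9126 quality characteristics and their subcharacteristics,
# arranged as a checklist for quality risk identification sessions.
ISO_9126 = {
    "Functionality": ["suitability", "accuracy", "interoperability",
                      "security", "compliance"],
    "Reliability": ["maturity (robustness)", "fault tolerance",
                    "recoverability", "compliance"],
    "Usability": ["understandability", "learnability", "operability",
                  "attractiveness", "compliance"],
    "Efficiency": ["time behavior", "resource utilization", "compliance"],
    "Maintainability": ["analyzability", "changeability", "stability",
                        "testability", "compliance"],
    "Portability": ["adaptability", "installability", "coexistence",
                    "replaceability", "compliance"],
}

def checklist(characteristic: str) -> list[str]:
    """Return the subcharacteristics to walk through for one characteristic."""
    return ISO_9126[characteristic]
```

Walking the stakeholders through each subcharacteristic in turn ("what might go wrong with security? with recoverability?") gives the session the consistent, predefined structure that is the main benefit of using the standard.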
You should remember, too, that in the ISTQB
taxonomy of black-box or behavioral tests, those related to functionality and its subcharacteristics
are functional tests, while those related to
reliability, usability, efficiency, maintainability,
and portability and their subcharacteristics are
non-functional tests.
Quality Risk Analysis
Using Cost of
Exposure
Another form of quality risk analysis is referred
to as cost of exposure, a name derived from
the financial and insurance world. The cost
of exposure—or the expected payout in
insurance parlance—is the likelihood of a loss
times the average cost of such a loss. Across a
large enough sample of risks for a long enough
period, we would expect the total amount lost
to tend toward the total of the costs of exposure
for all the risks.
So, for each risk, we should estimate, evaluate,
and balance the costs of testing versus not
testing. If the cost of testing were below the cost
of exposure for a risk, we would expect testing
to save us money on that particular risk. If the
cost of testing were above the cost of exposure
for a risk, we would expect testing not to be a
smart way to reduce costs of that risk.
This is obviously a very judicious and balanced approach to testing. Where there's a business case, we test; where there's not, we don't.
What could be more practical? Further, in the
insurance and financial worlds, you’re likely to
find that stakeholders relate easily and well to
this approach.
That said, it has some problems. In order to do
this with any degree of confidence, we need
enough data to make reasonable estimates of
likelihood and cost. Furthermore, this approach
uses monetary considerations exclusively to
decide on the extent and sequence of testing.
For some risks, the primary downsides are
nonmonetary, or at least difficult to quantify,
such as lost business and damage to company
image.
If I were working on a project in a financial
or actuarial world, and had access to data,
I’d probably lean toward this approach. The
accessibility of the technique to the other
participants in the risk analysis process is quite
valuable. However, I’d avoid this technique on
safety- or mission-critical projects. There’s no
way to account properly for the risk of injuring
people or the risk of catastrophic impact to the
business.
Quality Risk Analysis
Using Hazard
Analysis
Another risk analysis technique that you can use
is called hazard analysis. Like cost of exposure, it
fits with certain fields quite well and doesn’t fit
many others.
A hazard is the thing that creates a risk. For
example, a wet bathroom floor creates the
risk of a broken limb due to a slip and fall. In
hazard analysis, we try to understand the
hazards that create risks for our systems. This
has implications not only for testing but also for
upstream activities that can reduce the hazards
and thus reduce the likelihood of the risks.
As you might imagine, this is a very exact,
cautious, and systematic technique. Having
identified a risk, we then must ask ourselves
how that risk comes to be and what we might
do about the hazards that create the risk. In
situations in which we can’t afford to miss
anything, this makes sense.
However, in complex systems there could be
dozens or hundreds or thousands of hazards
that interact to create risks. Many of the hazards
might be beyond our ability to predict. So,
hazard analysis is overwhelmed by excessive
complexity and in fact might lead us to think
the risks are fewer than they really are. That’s
bad.
I would consider using this technique on
medical or embedded systems projects.
However, on unpredictable, rapidly evolving, or
highly complex projects, I’d avoid it.
Determining the
Aggregate Risk
Priority
We are going to cover one more approach for
risk analysis in a moment, but I want to return
to this issue of using risk factors to derive an
aggregate risk priority using a formula. You’ll
recall this was the technique shown earlier
when we multiplied the likelihood and impact
to determine the risk priority number. It is also
implicit in the cost of exposure technique,
where the cost of exposure for any given risk is
the product of the likelihood and the average
cost of a loss associated with that risk.
Some people prefer to use addition rather than
multiplication. For example, Rick Craig uses
addition of the likelihood and impact.6 This
results in a more compressed and less sparse
scale of risks. To see that, take a moment to
construct two tables. Use likelihood and impact
ranging from 1–5 for each, and then populate
the tables showing all possible risk priority
number calculations for all combinations of
likelihood and impact. The tables should each
have 25 cells. In the case of addition, the risk
priority numbers range from 2–10, while in the
case of multiplication, the risk priority numbers
range from 1–25.
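The two 25-cell tables the exercise asks for can be generated in a few lines; the sketch below simply enumerates every combination of 1 to 5 likelihood and impact scores.

```python
# The two 25-cell tables described above, generated by enumerating every
# combination of likelihood and impact scores from 1 to 5.

add_rpns = {l + i for l in range(1, 6) for i in range(1, 6)}
mul_rpns = {l * i for l in range(1, 6) for i in range(1, 6)}

print(sorted(add_rpns))  # 2 through 10: only 9 distinct values (compressed)
print(sorted(mul_rpns))  # 1 through 25, with gaps (a sparser scale)
```

Addition yields only 9 distinct priority numbers, while multiplication yields 14 spread across 1 to 25, which is the compression effect the text describes.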
It’s also possible to construct sophisticated
formulas for the risk priority number, some
of which might use subfactors for each major
factor. For example, certain test management
tools such as the newer versions of Quality
Center support this. In these formulas, we can
weight some of the factors so that they account
for more points in the total risk priority score
than others.
In addition to calculating a risk priority number
for sequencing of tests, we also need to use risk
factors to allocate test effort. We can derive the
extent of testing using these factors in a couple
ways. We could try to use another formula. For
example, we could take the risk priority number
and multiply it times some given number of
hours for design and implementation and
some other number of hours for test execution.
Alternatively, we could use a qualitative
method where we try to match the extent of
testing with the risk priority number, allowing
some variation according to tester judgment.
If you do choose to use formulas, make sure
you tune them based on historical data. Or, if
you are time-boxed in your testing, you can
use formulas based on risk priority numbers to
distribute the test effort proportionally based
on risks.
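As a sketch of that proportional, time-boxed approach, here the risk items, priority numbers, and hour budget are invented, and a larger RPN is taken to mean a riskier item (an ascending scale).

```python
# A sketch of time-boxed, proportional effort allocation; the risk items,
# RPNs, and hour budget are invented. Here a larger RPN means a riskier
# item (an ascending scale).

def allocate_effort(rpns, total_hours):
    """Split a fixed test budget across risk items in proportion to RPN."""
    total_rpn = sum(rpns.values())
    return {item: total_hours * rpn / total_rpn for item, rpn in rpns.items()}

rpns = {"checkout": 20, "search": 10, "help pages": 2}
hours = allocate_effort(rpns, total_hours=160)
# checkout gets 100 hours, search 50, help pages 10
```

As the text advises, any such formula should be tuned against historical data rather than trusted on first use.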
Some people prefer to use a table rather than
a formula to derive the aggregate risk priority
from the factors. Table 2 shows an example of
such a table.
First you assess the likelihood and impact as
before. You then use the table to select the
aggregate risk priority for the risk item based
on likelihood and impact scores. Notice that
the table looks quite different than the two
you constructed earlier. Now, experiment with
different mappings of risk priority numbers to
risk priority ratings—ranging from very high
to very low—to see whether the addition or
multiplication method more closely corresponds
to this table.
6: See his book, Systematic Software Testing.
Table 2: Using a table for risk priority
As with the formulas discussed a moment ago,
you should tune the table based on historical
data. Also, you should incorporate flexibility into
this approach by allowing deviation from the
aggregate risk priority value in the table based
on stakeholder judgment for each individual
risk.
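Because Table 2 itself is not reproduced here, the sketch below uses an invented mapping from risk priority numbers to the five ratings; in practice you would tune the band boundaries from your own table and historical data.

```python
# An invented mapping from risk priority numbers to the five qualitative
# ratings; the band boundaries are assumptions to be tuned from your own
# table and historical data.

RATINGS = ["very low", "low", "medium", "high", "very high"]

def rating(likelihood, impact):
    """Map two 1-5 scores (higher = riskier) to a qualitative rating."""
    rpn = likelihood * impact                      # 1..25
    for band, upper in enumerate([3, 6, 12, 20]):  # invented boundaries
        if rpn <= upper:
            return RATINGS[band]
    return RATINGS[-1]

print(rating(5, 5))  # very high
print(rating(1, 2))  # very low
```

Stakeholder judgment can still override the looked-up value for an individual risk, as the text recommends.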
In Table 3, you see that not only can we derive
the aggregate risk rating from a table, we can
do something similar for the extent of testing.
Based on the risk priority rating, we can now
use a table like Table 3 to allocate testing effort.
You might want to take a moment to study this
table.
Stakeholder
Involvement
On a few occasions in this section so far, I’ve
mentioned the importance of stakeholder
involvement. In the last sections, we’ve looked
at various techniques for risk identification
and analysis. However, the involvement of
the right participants is just as important as,
and probably more important than, the choice
of technique. The ideal technique without
adequate stakeholder involvement will usually
provide little or no valuable input, while a less-
than-ideal technique, actively supported and
participated in by all stakeholder groups, will
almost always produce useful information and
guidance for testing.
What is most critical is that we have a cross-
functional team representing all of the
stakeholders who have an interest in testing
and quality. This means that we involve at least
two general stakeholder groups.

Table 3: Using a table for extent of testing

One is made up of those who understand the needs and
interests of the customers and/or users. The
other includes those who have insight into the
technical details of the system. We can involve
business stakeholders, project funders, and
others as well. Through the proper participant
mix, a good risk-based testing process gathers
information and builds consensus around what
not to test, what to test, the order in which to
test, and the way to allocate test effort.
I cannot overstate the value of this stakeholder
involvement. Lack of stakeholder involvement
leads to at least two major dysfunctions in the
risk identification and analysis. First, there is
no consensus on priority or effort allocation.
This means that people will second-guess
your testing after the fact. Second, you will
find—either during test execution or worse yet
after delivery—that there are many gaps in the
identified risks, or errors in the assessment of
the level of risk, due to the limited perspectives
involved in the process.
While we should always try to include a complete
set of stakeholders, often not all stakeholders can
participate or would be willing to do so. In such
cases, some stakeholders may act as surrogates
for other stakeholders. For example, in mass-
market software development, the marketing
team might ask a small sample of potential
customers to help identify potential defects
that would affect their use of the software most
heavily. In this case, the sample of potential
customers serves as a surrogate for the entire
eventual customer base. As another example,
business analysts on IT projects can sometimes
represent the users rather than involving users
in potentially distressing risk analysis sessions
where we have conversations about what could
go wrong and how bad it would be.
Failure Mode and
Effect Analysis
The last, and most formal, technique we’ll
consider for risk-based testing is Failure
Mode and Effect Analysis. This technique was
developed originally as a design-for-quality
technique. However, you can extend it for risk-
based software and systems testing. As with
an informal technique, we identify quality risk
items, in this case called failure modes. We tend
to be more fine grained about this than we
would in an informal approach. This is in part
because, after identifying the failure modes, we
then identify the effects those failure modes
would have on users, customers, society, the
business, and other project stakeholders.
This technique has as its strength the properties
of precision and meticulousness. When it’s used
properly, you’re less likely to miss an important
quality risk with this technique than with the
other techniques. Hazard analysis is similarly
precise, but it tends to be overwhelmed by
complexity due to the need to analyze the
upstream hazards that cause risks. For Failure
Mode and Effect Analysis (often called FMEA,
or “fuh-me-uh”), the downstream analysis of
effects is easier, making the technique more
general.
ISTQB Glossary
Failure Mode and Effect Analysis (FMEA): A
systematic approach to risk identification
and analysis in which you identify possible
modes of failure and attempt to prevent their
occurrence.
Failure Mode, Effect and Criticality Analysis
(FMECA): An extension of FMEA, as in addition
to the basic FMEA, it includes a criticality
analysis, which is used to chart the probability
of failure modes against the severity of their
consequences. The result highlights failure
modes with relatively high probability and
severity of consequences, allowing remedial
effort to be directed where it will produce the
greatest value.
However, this precision and meticulousness
has its weaknesses. It tends to produce lengthy
outputs. It is document heavy. The large volume
of documentation produced requires a lot of
work not only during the initial analysis, but
also during maintenance of the analysis during
the project and on subsequent projects. It is
also hard to learn, requiring much practice to
master. If you want to learn to use FMEA, it’s best
to start with an informal technique for quality
risk analysis on another project first, or to do
an informal quality risk analysis and then
upgrade it to FMEA after it is complete.
I have used FMEA on a number of projects,
and would definitely consider it for high-risk
or conservative projects. However, for chaotic,
fast-changing, or prototyping projects, I would
avoid it.
Failure mode and effect analysis was originally
developed to help prevent defects during
design and implementation work. I came across
the idea initially in D.H. Stamatis’s book Failure
Mode and Effect Analysis and decided to apply
it to software and hardware/software systems
based on some work I was doing with clients in
the mid-1990s. I later included a discussion of it
in my first book, Managing the Testing Process,
published in 1999, which as far as I know makes
it the first software-testing-focused discussion
of the technique. I discussed it further in Critical
Testing Processes as well. So, I can’t claim to
have invented the technique by any means, but
I can claim to have been a leading popularizer
of the technique amongst software testers.
Failure Mode and Effect Analysis exists in
several variants. One is Failure Mode, Effects and
Criticality Analysis (FMECA, or “fuh-me-kuh”),
where the criticality of each effect is assessed
along with other factors affecting the level of
risk for the effect in question.
Two other variants—at least in naming—exist
when the technique is applied to software. These
are software failure mode and effect analysis
and software failure mode, effects and criticality
analysis. In practice, I usually hear people use
the terms FMEA and FMECA in the context of
both software and hardware/software systems.
In this book, we’ll focus on FMEA. The changes
involved in the criticality analysis are minor and
we can ignore them here.
Quality Risk Analysis
Using Failure Mode
and Effect Analysis
The FMEA approach is iterative. In other words,
reevaluation of residual risk—on an effect-
by-effect basis—is repeated throughout the
process. Since this technique began as a design
and implementation technique, ideally the
technique is used early in the project.
As with other forms of risk analysis, we would
expect test analysts and test managers to
contribute to the process and the creation of
the FMEA document. Because the documents
can be intricate, it’s important that testers
who want to contribute understand their
purpose and application. As with any other
risk analysis, test analysts and test managers,
like all participants, should be able to apply
their knowledge, skills, experience, and unique
outlook to help perform the risk analysis itself,
following a FMEA approach.
As I mentioned before, FMEA and its variants
are not ideal for all projects. However, it should
be applied when appropriate, as it is precise
and thorough. Specifically, FMEA makes sense
under the following circumstances:
The software, system, or system of systems
is potentially critical and the risk of failure
must be brought to a minimum. For
example, avionics software, industrial
control software, and nuclear control
software would deserve this type of
scrutiny.
The system is subject to mandatory risk-
reduction or regulatory requirements—for
example, medical systems or those subject
to IEC 61508.
The risk of project delay is unacceptable,
so management has decided to invest extra
effort to remove defects during early stages
of the project. This involves using the
design and implementation aspects of
FMEA more so than the testing aspects.
The system is both complex and safety
critical, so close analysis is needed to
define special test considerations,
operational constraints, and design
decisions. For example, a battlefield
command, communication, and control
system that tied together disparate systems
participating in the ever-changing scenario
of a modern battle would benefit from the
technique.
As I mentioned earlier, if necessary, you can use
an informal quality risk analysis technique first,
then augment that to include the additional
precision and factors considered with FMEA.
Since FMEA arose from the world of design
and implementation—not testing—and
since it is inherently iterative, you should
plan to schedule FMEA activities very early
in the process, even if only preliminary, high-
level information is available. For example, a
marketing requirements document or even
a project charter can suffice to start. As more
information becomes available, and as decisions
firm up, you can refine the FMEA based on the
additional details.
Additionally, you can perform a FMEA at any
level of system or software decomposition. In
other words, you can—and I have—perform a
FMEA on a system, but you can—and I have—
also perform it on a subset of system modules
during integration testing or even on a single
module or component.
Whether you start at the system level, the
integration level, or the component level, the
process is the same. First, working function
by function, quality characteristic by quality
characteristic, or quality risk category by quality
risk category, identify the failure modes. A failure
mode is exactly what it sounds like: a way in
which something can fail. For example, if we are
considering an e-commerce system’s security,
a failure mode could be “Allows inappropriate
access to customer credit card data.” So far, this
probably sounds much like informal quality risk
analysis to you, but the next step is the point at
which it gets different.
In the next step, we try to identify the possible
causes for each failure mode. This is not
something included in the informal techniques
we discussed before. Why do we do this? Well,
remember that FMEA is originally a design and
implementation tool. We try to identify causes
for failures so we can define those causes out
of the design and avoid introducing them
into the implementation. To continue with
our e-commerce example, one cause of the
inappropriate access failure mode could be
“Credit card data not encrypted.”
The next step, also unique to FMEA, is that, for
each failure mode, we identify the possible
effects. Those effects can be on the system
itself, on users, on customers, on other project
and product stakeholders, even on society as a
whole. (Remember, this technique is often used
for safety-critical systems like nuclear control
where society is indeed affected by failures.)
Again, using our e-commerce example, one
effect of the access failure mode could be
“Fraudulent charges to customer credit cards.”
Based on these three elements—the failure
mode, the cause, and the effect—we can then
assess the level of risk. We’ll look at how this
works in just a moment. We can also assess
criticality. In our e-commerce example, we’d say
that leakage of credit card data is critical.
Now, we can decide what types of mitigation or
risk reduction steps we can take for each failure
mode. In our informal approaches to quality
risk analysis, we limited ourselves to defining
an extent of testing to be performed here.
However, in FMEA—assuming we involved the
right people—we can specify other design and
implementation steps too. For the e-commerce
example, a mitigation step might be “Encrypt
all credit card data.” A testing step might be
“Penetration-test the encryption.”
Notice that this example highlights the iterative
elements of this technique. The mitigation step
of encryption reduces the likelihood of the
failure mode, but it introduces new causes for
the failure mode, such as “Weak keys used for
encryption.”
We not only iterate during the process, we
iterate at regular intervals in the lifecycle, as
we gain new information and carry out risk
mitigation steps, to refine the failure modes,
causes, effects, and mitigation actions.
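The worksheet structure this implies can be sketched as a simple record; the record type and field names below are invented, but the content is the e-commerce example from the text, including the new cause that the encryption mitigation itself introduces.

```python
# Sketch of one row of an FMEA worksheet; the record structure and field
# names are invented, but the example content comes from the e-commerce
# discussion in the text.
from dataclasses import dataclass, field

@dataclass
class FmeaRow:
    failure_mode: str
    causes: list = field(default_factory=list)
    effects: list = field(default_factory=list)
    mitigations: list = field(default_factory=list)

row = FmeaRow(
    failure_mode="Allows inappropriate access to customer credit card data",
    causes=["Credit card data not encrypted"],
    effects=["Fraudulent charges to customer credit cards"],
    mitigations=["Encrypt all credit card data",       # design step
                 "Penetration-test the encryption"],   # testing step
)

# The iterative element: the encryption mitigation itself introduces a
# new cause for the same failure mode, which must then be analyzed.
row.causes.append("Weak keys used for encryption")
```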
Determining the
Risk Priority Number
Let’s return to the topic of risk factors and the
overall level of risk. In FMEA, people commonly
refer to the overall level of risk as the risk priority
number, or RPN.
When doing FMEA, there are typically three
risk factors used to determine the risk priority
number:
Severity. This is an assessment of the impact
of the failure mode on the system, based on
the failure mode itself and the effects.
Priority. This is an assessment of the impact
of the failure mode on users, customers,
the business, stakeholders, the project, the
product, and society, based on the effects.
Detection. This is an assessment of the
likelihood of the problem existing in the
system and escaping detection without
any additional mitigation. This takes into
consideration the causes of the failure
mode and the failure mode itself.
People performing a FMEA often rate these
risk factors on a numerical scale. You can use
a 1 to 10 scale, though a 1 to 5 scale is also
common. You can use either a descending or an
ascending scale, so long as each of the factors uses
the same type of scale, either all descending or
all ascending. In other words, 1 can be the most
risky assessment or the least risky, respectively.
If you use a 1 to 10 scale, then a descending
scale means 10 is the least risky. If you use a 1
to 5 scale, then a descending scale means 5 is
the least risky. For ascending scales, the most
risky would be 10 or 5, depending on the scale.
Personally, I always worry about using anything
finer grained than a five-point scale. Unless I
can actually tell the difference between a 9 and
a 10 or a 2 and a 3, for example, it seems like
I’m just lying to others and myself about the
level of detail at which I understand the risks.
Trying to achieve this degree of precision can
also lengthen debates between stakeholders
in the risk analysis process, often to little if any
benefit.
As I mentioned before, you determine the overall
or aggregate measure of risk, the risk priority
number (or RPN), using the three factors. The
simplest way to do this—and one in common
use—is to multiply the three factors. However,
you can also add the factors. You can also use
more complex calculations, including the use of
weighting to emphasize one or two factors.
As with risk priority numbers for the informal
techniques discussed earlier, the FMEA RPN will
help determine the level of effort we invest in
risk mitigation. However, note that FMEA risk
mitigation isn’t always just through testing.
In fact, multiple levels of risk mitigation could
occur, particularly if the RPN is serious enough.
Where failure modes are addressed through
testing, we can use the FMEA RPN to sequence
the test cases. Each test case inherits the RPN
for the highest-priority risk related to it. We can
then sequence the test cases in risk priority
order wherever possible.
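A minimal sketch of the calculation and the sequencing follows, using invented risk items and test cases, with scores on a descending 1-to-5 scale so that smaller RPNs are more risky.

```python
# Sketch: the FMEA RPN as the product of severity, priority, and detection,
# then test cases sequenced by the RPN they inherit. All items and scores
# are invented; the 1-5 scales are descending, so smaller RPNs are riskier.

def rpn(severity, priority, detection):
    return severity * priority * detection   # 1..125 on 1-5 scales

risks = {
    "credit card data leak": rpn(1, 1, 2),   # 2: most risky
    "slow report rendering": rpn(3, 4, 3),   # 36
    "typo on help page":     rpn(5, 5, 4),   # 100: least risky
}

# Each test case inherits the RPN of the riskiest risk item it covers;
# on this descending scale, that is the smallest number.
test_cases = {
    "TC-01 penetration test": ["credit card data leak"],
    "TC-02 report load test": ["slow report rendering", "typo on help page"],
}
inherited = {tc: min(risks[r] for r in covered)
             for tc, covered in test_cases.items()}

# Run the riskiest (lowest RPN) test cases first.
run_order = sorted(inherited, key=inherited.get)
print(run_order)   # ['TC-01 penetration test', 'TC-02 report load test']
```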
Benefits, Costs, and
Challenges of FMEA
So, what are the benefits of FMEA? In addition
to being precise and thorough—and thus less
likely to misassess or omit risks—FMEA provides
other advantages. It requires detailed analysis of
expected system failures that could be caused
by software failures or usage errors, resulting in
a complete view—if perhaps an overwhelming
view—of the potential problems.
If FMEA is used at the system level—rather
than only at a component level—we can have a
detailed view of potential problems across the
system. In other words, if we consider systemic
risks, including emergent reliability, security,
and performance risks, we have a deeply
informed understanding of system risks. Again,
those performing and especially managing
the analysis can find this overwhelming, and it
certainly requires a significant time investment
to understand the entire view and its import.
As I’ve mentioned, another advantage of
FMEA—as opposed to other quality risk
analysis techniques discussed—is that we
can use our analysis to help guide design and
implementation decisions. The analysis can
also provide justification for not doing certain
things, for avoiding certain design decisions, for
not implementing in a particular way or with a
particular technology.
As with any quality risk analysis technique, our
FMEA analysis can focus our testing on specific,
critical areas of the system. However, because
it’s more precise than other techniques, the
focusing effect is correspondingly more precise.
This can have test design implications, too,
since you might choose to implement more
fine-grained tests to take the finer-grained
understanding of risk into account.
There are costs and challenges associated with
FMEA, of course. For one thing, you have to force
yourself to think about use cases, scenarios,
and other realities that can lead to sequences
of failures. Because of the fine-grained nature
of the analysis, it’s easy to focus on each
failure mode in isolation, without considering
everything else that’s going on. You can—and
should—overcome this challenge, of course.
As mentioned a few times, the FMEA tables
and other documentation can be huge. This
means that participants and those managing
the analysis can find the development and
maintenance of these documents a large, time-
consuming, expensive investment.
As originally conceived, FMEA works function
by function. When looking at a component or
a complex system, it might be difficult to define
independent functions. I’ve managed to get
around this myself by doing the analysis not
just by function, but by quality characteristic or
by quality risk category.
Finally, when trying to anticipate causes,
it might be challenging to distinguish true
causes from intermediate effects. For example,
suppose we are considering a failure mode
for an e-commerce system such as “Foreign
currency transactions rejected.” We could list a
cause as “Credit card validation cannot handle
foreign currency.” However, the true cause
might be that we simply haven’t enabled
foreign currency processing with our credit
card processing vendor, which is a simple
implementation detail—provided someone
remembers to do it. These challenges are in
addition to those discussed earlier for quality
risk analysis in general.
Case Study of FMEA
In Figure 4, you see an example of a quality risk
analysis document. It is a case study of an actual
project. This document—and the approach we
used—followed the Failure Mode and Effect
Analysis approach.
Figure 4: Case study of Failure Mode and Effect Analysis

As you can see, we started—at the left side of
the figure—with a specific function and then
identified failure modes and their possible
effects. We determined criticality based on the
effects, along with the severity and priority. We
listed possible causes to enable bug prevention
work during requirements, design, and
implementation.
Next, we looked at detection methods—those
methods we expected to apply anyway for this
project. The more likely the failure mode was
to escape detection, the worse the detection
number. We calculated a risk priority number
based on the severity, priority, and detection
numbers. Smaller numbers were worse. Severity,
priority, and detection each ranged from 1 to
5. So the risk priority number ranged from 1 to
125.
This particular figure shows the highest-level risk
items only because it was sorted by risk priority
number. For these risk items, we’d expect a lot of
additional detection and other recommended
risk control actions. You can see that we have
assigned some additional actions at this point
but have not yet assigned the owners.
During testing actions associated with a risk
item, we’d expect that the number of test
cases, the amount of test data, and the degree
of test coverage would all increase as the risk
increased. Notice that we can allow any test
procedures that cover a risk item to inherit the
level of risk from the risk item. That documents
the priority of the test procedure, based on the
level of risk.
Risk Based Testing
and the Testing
Process
We’ve talked so far about quality risk analysis
techniques. As with any technique, we have
to align and integrate the selected quality
risk analysis technique with the larger testing
process and indeed the larger software or
system development process. Table 8 shows a
general process that you can use to organize
the quality risk identification, assessment, and
management process for quality risk-based
testing.7
Let’s go through the process step-by-step and
in detail. Identify the stakeholders who will
participate. This is essential to obtain the most
benefit from quality risk-based testing. You
want a cross-functional team that represents
the interests of all stakeholders in testing and
quality. The better the representation of all
interests, the less likely it is that you will miss
key risk items or improperly estimate the levels
of risk associated with each risk item. These
stakeholders are typically in two groups. The
first group consists of those who understand
the needs and interests of the customers and
users—or are the customers and users. They see
potential business-related problems and can
assess the impact. The second group consists
of those who understand the technical details.
They see what is likely to go wrong and how
likely it is.
Select a technique. The previous sections should
have given you some ideas on how to do that.
Identify the quality risk items using the
technique chosen. Assess the level of risk
associated with each item. The identification
and assessment can occur as a single meeting,
using brainstorming or similar techniques, or as
a series of interviews, either with small groups
or one-on-one. Try to achieve consensus on the
rating for each risk item. If you can’t, escalate
to the appropriate level of management. Now
select appropriate mitigation techniques.
Remember that this doesn’t just have to be
testing at one or more levels. It can also include
reviews of requirements, design, and code;
defensive programming techniques; static
analysis to ensure secure and high-quality code;
and so forth.
7: This process was first published in my book Critical Testing Processes.
Deliver the by-products. Risk identification and
analysis often locates problems in requirements,
design, code, or other project documents,
models, and deliverables. These can be actual
defects in these documents, project risks, and
implementation assumptions and suggestions.
You should send these by-products to the right
person for handling.
Review, revise, and finalize the quality risk
document that was produced.
This document is now a valuable project work
product. You should save it to the project
repository, placing it under some form of change
control. The document should change only with
the knowledge of—ideally, the consent of—the
other stakeholders who participated.
That said, it will change. You should plan to
revise the risk assessment at regular intervals.
For example, review and update the document
at major project milestones such as the
completion of the requirements, design, and
implementation phases and at test level entry
and exit reviews.

Table 8: Quality risk analysis process

Also, review and update when
significant chunks of new information become
available, such as at the completion of the first
test cycle in a test level. You should plan to add
new risk items and reassess the level of risk for
the existing items.
Throughout this process, be careful to preserve
the collaborative nature of the endeavor.
In addition to the information-gathering
nature of the process, the consensus-building
aspects are critical. Both business-focused and
technically focused participants can and should
help prioritize the risks and select mitigation
strategies. This way, everyone has some
responsibility for and ownership of the testing
effort that will be undertaken.
Risk-Based Testing
throughout the
Lifecycle
A basic principle of testing discussed in the
Foundation syllabus is the principle of early
testing and quality assurance. This principle
stresses the preventive potential of testing.
Preventive testing is part of analytical risk-based
testing. It’s implicit in the informal quality risk
analysis techniques and explicit in FMEA.
Preventive testing means that we mitigate risk
before test execution starts. This can entail
early preparation of testware, pretesting test
environments, pretesting early versions of the
product well before a test level starts, insisting
on tougher entry criteria to testing, ensuring
requirements for and designing for testability,
participating in reviews including retrospectives
for earlier project activities, participating
in problem and change management, and
monitoring the project progress and quality.
In preventive testing, we integrate quality risk
control actions into the entire lifecycle. Test
managers should look for opportunities to
control risk using various techniques, such as
those listed here:
An appropriate test design technique
Reviews and inspection
Reviews of test design
An appropriate level of independence for
the various levels of testing
The use of the most experienced person
on test tasks
The strategies chosen for confirmation
testing (retesting) and regression testing
Preventive test strategies acknowledge that
we can and should mitigate quality risks using
a broad range of activities, many of them not
what we traditionally think of as “testing.”
For example, if the requirements are not well
written, perhaps we should institute reviews
to improve their quality rather than relying on
tests that will be run once the badly written
requirements become a bad design and
ultimately bad, buggy code.
Dynamic testing is not effective against all
kinds of quality risks. For example, while we
can easily find maintainability issues related to
poor coding practices in a code review—which
is a static test—dynamic testing will only reveal
the consequences of unmaintainable code over
time, as excessive regression starts to occur.
In some cases, it’s possible to estimate the risk
reduction effectiveness of testing in general and
of specific test techniques for given risk items.
For example, use-case-based functional tests
are unlikely to do much to reduce performance
or reliability risks.
So, there’s not much point in using dynamic
testing to reduce risk where there is a low level
of test effectiveness. Quality risk analysis, done
earlier in the project, makes project stakeholders
aware of quality risk mitigation opportunities