2. SWEBOK: the 10 Knowledge Areas
Software Requirements
Software Design
Software Construction
Software Testing
Software Maintenance
Software Configuration Management
Software Engineering Management
Software Engineering Process
Software Engineering Tools and Methods
Software Quality
27-Sep-11 Software Engineering / Fernando Brito e Abreu 2
3. Motivation - The Bad News ...
Software bugs cost the U.S. economy an
estimated $59.5 billion annually, or about 0.6%
of the gross domestic product.
Software users shoulder more than half of the costs
Software developers and vendors bear the remainder
of the costs.
Source:The Economic Impacts of Inadequate Infrastructure for
Software Testing, Technical Report, National Institute of
Standards and Technology, USA, May 2002
http://www.nist.gov/director/prog-ofc/report02-3.pdf
4. Motivation - The GOOD News!
According to the same report:
More than 1/3 of the costs (an estimated $22.2
billion) can be eliminated with earlier and more
effective identification and removal of software
defects.
Savings can mainly occur in the development
stage, when errors are introduced.
More than half of these errors aren't detected until
later in the development process or during post-sale
software use.
5. Motivation
Reliability is one of the most important software
quality characteristics
Reliability has a strong financial impact:
better image of the producer
reduction of maintenance costs
signing or renewal of maintenance contracts,
new developments, etc.
The quest for Reliability is the aim of V&V !
6. Verification and Validation (V&V)
Verification - checking product correctness and
consistency in a given development phase, against
the products and standards used as input to that
phase - "Do the Job Right"
Validation - checking product conformity with the
specified requirements - "Do the Right Job"
Basically, there are two complementary V&V techniques:
Reviews (Walkthroughs, Inspections, ...)
Tests
7. Summary
Software Testing Fundamentals
Test Levels
Test Techniques
Test-related Measures
Test Process
9. Testing is …
… an activity performed for evaluating product
quality, and for improving it, by identifying
defects and problems.
… the dynamic verification of the behavior of a
program on a finite set of test cases, suitably
selected from the usually infinite executions
domain, against the expected behavior.
10. Dynamic versus static verification
Testing always implies executing the program on
(valued) inputs; therefore it is a dynamic technique
The input value alone is not always sufficient to determine a
test, since a complex, nondeterministic system might react to
the same input with different behaviors, depending on its state
Different from testing and complementary to it are static
techniques (described in the Software Quality KA)
11. Terminology issues
Error
the human cause for defect existence (although bugs walk …)
Fault or defect (aka bug)
incorrectness, omission or undesirable characteristic in a deliverable
the cause of a failure
Failure
Undesired effect (malfunction) observed in the system’s delivered service
Incorrectness in the functioning of a system
See: IEEE Standard for SE Terminology (IEEE610-90)
12. Testing views
Testing for defect identification
A successful test is one which causes a system to fail
Testing can reveal failures, but it is the faults (defects) that
must be removed
Testing to demonstrate (that the software meets its
specifications or other desired properties)
A successful test is one where no failures are observed
Detecting the fault (e.g. in code) from an exposed failure
is often hard
Identifying all failure-causing input sets (i.e. those sets of inputs that
cause a failure to appear) may not be feasible
13. Summary
Software Testing Fundamentals
Test Levels
Test Techniques
Test-related Measures
Test Process
14. Test Levels
Objectives of testing
Testing can be aimed at verifying different properties:
Checking if functional specifications are implemented right
aka conformance testing, correctness testing, or functional testing
Checking nonfunctional properties
E.g. performance, reliability evaluation, reliability measurement,
usability evaluation, etc
Stating the objective in precise, quantitative terms
allows control to be established over the test process
Often objectives are qualitative or not even stated explicitly
16. Test Levels – Objectives of testing
Acceptance / Qualification testing
Checks the system behavior against the
customer’s requirements
The customer may not exist yet, so someone has to
anticipate the intended requirements
This testing activity may or may not involve the
developers of the system
17. Test Levels – Objectives of testing
Installation testing
Installation testing can be viewed as system
testing conducted once again according to
hardware configuration requirements
Usually performed in the target environment at the
customer’s premises
Installation procedures may also be verified
e.g. is the customer's local expert able to add a new
user to the delivered system?
18. Test Levels – Objectives of testing
Alpha and beta testing
Before the software is released, it is sometimes
given to a small, representative set of potential
users for trial use. Those users may be:
in-house (alpha testing)
external (beta testing)
These users report problems with the product
Alpha and beta use is often uncontrolled, and is not
always referred to in a test plan
19. Test Levels – Objectives of testing
Conformance / Functional / Correctness testing
Conformance testing is aimed at validating
whether or not the observed behavior of the
tested software conforms to its specifications
20. Test Levels – Objectives of testing
Reliability achievement and evaluation
Testing is a means to improve reliability
By randomly generating test cases according to
the operational profile, statistical measures of
reliability can be derived
Reliability growth models can be used to express
this effect
21. Reliability growth models
Provide a prediction of reliability based on the
failures observed under reliability achievement
and evaluation
They assume, in general, that:
a growing number of successful tests increases
our confidence in the system's reliability
the faults that caused the observed failures are fixed
after being found (thus, on average, the product's
reliability has an increasing trend)
22. Reliability growth models
Many models have been published; they fall into
two classes:
failure-count models
time-between-failures models
23. Test Levels – Objectives of testing
Regression testing (1/2)
Regression testing is:
The “selective retesting of a system or component to verify
that modifications have not caused unintended effects.”
(IEEE610.12-90)
Any repetition of tests intended to show that the software’s
behavior is unchanged, except insofar as required
A technique to combat side-effects!
In practice, the idea is to show that software which
previously passed the tests still does
24. Test Levels – Objectives of testing
Regression testing (2/2)
A trade-off must be made between:
the assurance given by regression testing every time a change is made
… and the resources required to do it
To allow regression tests we must build, incrementally, a
test battery
Regression testing is more feasible if we have tools to
record and playback test cases
Several commercial user interface event-caption tools (black-
box testing) exist
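The idea of an incrementally built test battery can be sketched as follows. This is a minimal illustration, not a tool recommendation; the `discount` function and the recorded cases are hypothetical.

```python
# Hypothetical function under test: a discount calculator.
def discount(amount):
    """Return the discounted price; 10% off orders of 100 or more."""
    return amount * 0.9 if amount >= 100 else amount

# The incrementally built regression battery: (input, expected output)
# pairs recorded from previously passing runs.
REGRESSION_BATTERY = [
    (50, 50),
    (100, 90.0),
    (200, 180.0),
]

def run_regression(fn, battery):
    """Re-run every recorded case; return the cases that now fail."""
    return [(x, expected, fn(x))
            for x, expected in battery
            if fn(x) != expected]

failures = run_regression(discount, REGRESSION_BATTERY)
# An empty list means the change introduced no observable regression.
```

After each change, the whole battery is replayed; any non-empty result flags an unintended side-effect.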
25. Test Levels – Objectives of testing
Performance testing / Stress testing
Aimed at verifying that the software meets the
specified performance requirements:
e.g. volume testing and response-time measurement
The performance degradation under increasingly
exigent scenarios should be plotted
If we exercise software at the maximum design
load (or beyond it), we call it stress testing
26. Test Levels – Objectives of testing
Back-to-back testing
A single test set is performed on two
implemented versions of a software product
The results are compared
Whenever a mismatch occurs, at least one of the two
versions is probably failing
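Back-to-back testing can be sketched in a few lines. The two integer-square-root implementations below are illustrative examples (not from the slides); the point is running one test set against both versions and flagging disagreements.

```python
import math
import random

# Two independent implementations of the same spec: integer square root.
def isqrt_v1(n):
    return int(math.sqrt(n))        # floating-point shortcut

def isqrt_v2(n):
    r = 0
    while (r + 1) * (r + 1) <= n:   # simple iterative version
        r += 1
    return r

# Back-to-back testing: run a single test set against both versions and
# report every input on which the two implementations disagree.
random.seed(0)
test_set = [random.randrange(10 ** 6) for _ in range(1000)]
mismatches = [n for n in test_set if isqrt_v1(n) != isqrt_v2(n)]
# Each mismatch means at least one of the two versions is failing.
```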
27. Test Levels – Objectives of testing
Recovery testing
Aimed at verifying software restart capabilities
after a “disaster”
Recovery testing is a fundamental step in
building a contingency plan
28. Test Levels – Objectives of testing
Configuration testing
When software is built to serve different users,
configuration testing analyzes the software under
the various specified configurations
The problem is similar when the hardware or software
platform varies (e.g. different mobile phone
models, different browsers)
This is one of the main issues in software
product lines development
See: http://www.sei.cmu.edu/plp/framework.html
29. Test Levels – Objectives of testing
Usability testing
This process evaluates how easy it is for end-
users to use and learn the software, including:
user documentation
initial installation and extension through add-ons
effective support of user tasks
…
30. Test Levels
The target of the test
Unit testing
the target is a single module
Integration testing
the target is a group of modules (related by purpose, use,
behavior, or structure)
System testing
the target is a whole system
31. Test Levels – The target of the test
Unit testing
Verifies the functioning in isolation of software pieces
which are separately testable
Depending on the context, they can be individual subprograms
or a larger component made of tightly related units
Typically, unit testing occurs with:
access to the code being tested
support of debugging tools
the programmers who wrote the code
32. Test Levels – The target of the test
Integration testing
Is the process of verifying the interaction between
software components
Classical integration testing strategies
top-down or bottom-up, are used with hierarchically structured sw
Modern systematic integration strategies
architecture-driven, which implies integrating the software
components or subsystems based on identified functional threads
Except for small, simple software, systematic,
incremental integration testing strategies are usually
preferred to putting all the components together at once
The latter is called “big bang” testing
33. Test Levels – The target of the test
System testing
The majority of functional failures should already have
been identified during unit and integration testing
Main concerns:
Assessing whether the system complies with the non-functional
requirements, such as security, speed, accuracy, and reliability
Assessing whether the external interfaces to other applications,
utilities, hardware devices, or the operating environment
work correctly
34. Test Levels
Identifying the test set
Test adequacy criteria
Is the test set consistent?
How much testing is enough?
How many test cases should be selected?
Test selection criteria
How is the test set composed?
Which test cases should be selected?
35. Test case selection
Proposed test techniques differ essentially in
how they select the test set, which may yield
vastly different degrees of effectiveness
In practice, risk analysis techniques and test
engineering expertise are applied to identify the
most suitable selection criterion under given
conditions
36. How large should a test battery be?
Even in simple programs, so many test cases are
theoretically possible that exhaustive testing
could require months or years to execute
In practice the whole test set can generally be
considered infinite
Testing always implies a trade-off:
limited resources and schedules on the one hand
inherently unlimited test requirements on the other
37. After testing …
Even after successful completion of extensive
testing, the software could still contain faults
The remedy for sw failures found after delivery is
provided by corrective maintenance actions
This will be covered in the Software Maintenance KA
38. Summary
Software Testing Fundamentals
Test Levels
Test Techniques
Test-related Measures
Test Process
39. Test Techniques
Based on tester's intuition and experience
Specification-based (functional / black-box)
Code-based (white-box)
Usage-based
Fault-based
Based on nature of application
Selecting and combining techniques
40. Functional Tests (Black-Box) actors
A relevant aspect of black-box testing is that it is not
compulsory to use programming experts to produce a
test battery
Extensive characterization of invalid inputs relies
heavily on tester experience
42-47. Functional Test Tools
[Screenshots of functional test tools (Visual Test and a Rational test tool):
test cases are grouped into a test battery (test suite) with reusable test
code; the tool integrates with other Rational tools, lets the tester choose
the test cases to execute in a suite, and reports the observed failures.]
48. Assessing Functional Test Coverage
The ReModeler tool from the
QUASAR team takes an
innovative model-based
approach to represent this
kind of testing coverage
The color represents the
percentage of the scenarios of
each use case that were
executed by a given test suite
49. Test Techniques
Based on tester's intuition and experience
Ad hoc testing
Perhaps the most widely practiced technique remains
ad hoc testing
Tests are derived relying on the software engineer’s
skill, intuition, and experience with similar programs
Ad hoc testing might be useful for identifying special
tests, those not easily captured by formalized
techniques
50. Test Techniques
Based on tester's intuition and experience
Exploratory testing
Simultaneous learning, test design and execution
The tests are not defined in advance in an established test
plan, but are dynamically designed, executed, and modified
The effectiveness of this approach relies on the tester's
knowledge, which can be derived from many sources:
observed product behavior during the testing of previous versions
familiarity with the application, the platform, the failure process
the types of possible faults and failures
the risk associated with a particular product
…
51. Test Techniques
Specification-based
Equivalence partitioning
Boundary-value analysis
Decision table
Finite-state machine-based
Testing from formal specifications
Random testing
52. Test Techniques – Specification-based
Equivalence partitioning
The input domain is subdivided into a collection
of subsets, or equivalence classes, which are
deemed equivalent according to a specified
relation, and a representative set of tests
(sometimes only one) is taken from each class.
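A minimal sketch of this idea in Python; the exam-mark classifier and its partitions are hypothetical, chosen only to illustrate one representative per class.

```python
# Hypothetical spec: classify an exam mark (0-100) as "fail" (< 50),
# "pass" (50-84) or "distinction" (85-100); anything else is invalid.
def classify(mark):
    if not 0 <= mark <= 100:
        return "invalid"
    if mark < 50:
        return "fail"
    if mark < 85:
        return "pass"
    return "distinction"

# Equivalence partitioning: one representative test per class is assumed
# to stand for every input in that class.
partitions = {
    "invalid-low":  (-5,  "invalid"),
    "fail":         (30,  "fail"),
    "pass":         (70,  "pass"),
    "distinction":  (90,  "distinction"),
    "invalid-high": (120, "invalid"),
}
results = {name: classify(x) == expected
           for name, (x, expected) in partitions.items()}
```

Five test cases cover the whole (infinite, if we count invalid inputs) input domain, one per class.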
53. Test Techniques – Specification-based
Boundary-value analysis
Test cases are chosen on and near the boundaries of
the input domain of variables, with the underlying
rationale that many faults tend to concentrate near the
extreme values of inputs
An extension of this technique is robustness testing,
wherein test cases are also chosen outside the input
domain of variables, to test program robustness to
unexpected or erroneous inputs
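For a single variable with a known valid range, the boundary (and robustness) test points can be generated mechanically. A small sketch; the [0, 100] range and the step of 1 are illustrative assumptions.

```python
# For an input variable with valid range [lo, hi], boundary-value analysis
# picks test points on and immediately around each boundary.
def boundary_values(lo, hi, step=1):
    """Candidate test inputs at and around the domain boundaries."""
    return [lo - step, lo, lo + step, hi - step, hi, hi + step]

# Robustness testing keeps the out-of-range points (lo - step, hi + step)
# to check the program's reaction to erroneous inputs.
cases = boundary_values(0, 100)
```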
55. Triangle Classifier
Classic problem proposed in [Myers79] and
[Hetzel84]:
Distinct classification criteria:
lengths of the sides - equilateral, isosceles, or scalene
largest angle - acute, right, or obtuse
56. Triangle Classifier: specification
Input:
dimensions of the three sides: three numbers,
separated by commas (or two angles instead).
Algorithm:
If the length of one side is greater than the sum
of the other two, then write "Not a triangle!"
If it is a valid triangle, then write its classification:
according to the largest angle - obtuse, right, or
acute
according to the side lengths - scalene, isosceles, or
equilateral
57. Triangle Classifier: exercise
Write a test case battery for the triangle
classifier
For each test case, specify:
input values (including invalid or unexpected
conditions)
corresponding expected output values
Example: 3,4,5 -> scalene, right
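As a reference for the exercise, here is one possible implementation of the classifier itself. This is a sketch, not an official solution; note that it treats the degenerate case (longest side equal to the sum of the other two) as "not a triangle", whereas the slides list it as an extreme case.

```python
def classify_triangle(a, b, c):
    """Classify a triangle by side lengths and by its largest angle."""
    sides = sorted((a, b, c))
    if sides[0] <= 0:
        return "invalid"
    if sides[2] >= sides[0] + sides[1]:   # degenerate included here
        return "not a triangle"
    # Side lengths: equilateral / isosceles / scalene
    if a == b == c:
        kind = "equilateral"
    elif a == b or b == c or a == c:
        kind = "isosceles"
    else:
        kind = "scalene"
    # Largest angle, via the converse of Pythagoras on the longest side
    x, y, z = sides
    if z * z == x * x + y * y:
        angle = "right"
    elif z * z < x * x + y * y:
        angle = "acute"
    else:
        angle = "obtuse"
    return f"{kind}, {angle}"
```

For instance, `classify_triangle(3, 4, 5)` yields `"scalene, right"`, matching the example above.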
58. Triangle Classifier
equivalence partitioning
For a complete test battery, we need to:
divide the solution space in partitions
identify typical cases for each partition
identify frontier cases
identify extreme cases
identify invalid cases.
Now it is your turn to work ...
Don't turn the page until you have finished!
60. Triangle Classifier: boundary values
4.001, 4, 3.999 -> almost equilateral (scalene, acute)
4.0001, 4, 4 -> almost equilateral (isosceles, acute)
3, 4.9999, 5 -> almost isosceles (scalene, acute)
9, 4.9999, 5 -> almost isosceles (scalene, obtuse)
4.9999, 4, 3 -> almost right (scalene, acute)
5.0001, 4, 3 -> almost right (scalene, obtuse)
1, 1, 1.4141 -> almost right (isosceles, acute)
1, 1, 1.4143 -> almost right (isosceles, obtuse)
61. Triangle Classifier: extreme cases
1, 2, 3 -> line segment!
0, 0, 0 -> point!
Note: extreme cases are not invalid!
62. Triangle Classifier: invalid cases
6, 4, 0 -> null side!
12, 4, 3 -> not a triangle!
5, 3, 2, 5 -> four sides!
2, 5 -> one side missing!
3.45 -> only one side!
(empty input) -> no value!
3, , 4, 6 -> incorrect format
4A, 3, 7 -> invalid value
6, -1, 4 -> negative value
63. Triangle Classifier
As we saw, apparently simple problems often have
subtleties that make testing more complex than
expected!
Boundary values and invalid inputs are the situations
most likely to produce failures
64. Test Techniques – Specification-based
Decision table
Decision tables represent logical relationships between
conditions (roughly, inputs) and actions (roughly,
outputs)
Test cases are systematically derived by considering
every possible combination of conditions and actions
A related technique is cause-effect graphing
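The systematic derivation can be sketched by enumerating every combination of condition values. The loan-approval rules below are a hypothetical example, not from the slides.

```python
from itertools import product

# A decision table as a mapping from condition combinations to the
# expected action (hypothetical loan-approval rules, for illustration):
#   conditions: has_income, good_history
def expected_action(has_income, good_history):
    if has_income and good_history:
        return "approve"
    if has_income:
        return "manual review"
    return "reject"

# Systematic derivation: one test case per combination of conditions.
test_cases = [((inc, hist), expected_action(inc, hist))
              for inc, hist in product([True, False], repeat=2)]
# 2 binary conditions -> 2**2 = 4 test cases
```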
65. Test Techniques – Specification-based
Finite-state machine-based
By modeling a program as a finite state machine,
tests can be selected in order to cover states and
transitions on it
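A tiny sketch with an assumed example (the classic turnstile): the model is a transition table, and a test sequence is judged by the transitions it covers.

```python
# Finite-state model of a turnstile: (state, event) -> next state.
FSM = {
    ("locked",   "coin"): "unlocked",
    ("locked",   "push"): "locked",
    ("unlocked", "push"): "locked",
    ("unlocked", "coin"): "unlocked",
}

def run(events, state="locked"):
    """Drive the model with an event sequence; track covered transitions."""
    covered = set()
    for e in events:
        covered.add((state, e))
        state = FSM[(state, e)]
    return state, covered

# One event sequence chosen to exercise all four transitions:
final_state, covered = run(["push", "coin", "coin", "push"])
all_transitions_covered = covered == set(FSM)
```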
66. Test Techniques – Specification-based
Testing from formal specifications
Giving the specifications in a formal language
allows for the automatic derivation of functional
test cases
At the same time, it provides a reference output, an
oracle, for checking test results
This is an active research topic
67. Test Techniques – Specification-based
Random testing
Tests are generated in a stochastic (non-deterministic)
way
This form of testing falls under the heading of the
specification-based entry, since at least the input
domain must be known, to be able to pick random
points within it
68. Test Techniques – Specification-based
Random testing
We simulate the data input by generating sequences
of values that may occur in practice
This process must be repeated over and over, since only
in the long run can a large share of the possible input
combinations be covered
This approach is only feasible with a tool, a test case
generator - its input is some description of the
possible input values, their sequencing, and their
probability of occurrence
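A minimal test case generator of this kind can be sketched as below. The event names and probabilities are illustrative assumptions.

```python
import random

# Description of the possible input values and their probability of
# occurrence (an operational-style profile, illustrative numbers).
profile = {
    "login":  0.6,
    "query":  0.3,
    "logout": 0.1,
}

def generate_sequence(n, seed=42):
    """Stochastically generate an input sequence following the profile."""
    random.seed(seed)
    events = list(profile)
    weights = list(profile.values())
    return random.choices(events, weights=weights, k=n)

seq = generate_sequence(1000)
# In the long run the generated frequencies approach the profile.
login_share = seq.count("login") / len(seq)
```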
69. Test Techniques – Specification-based
Random testing
Random tests are often used to test compilers,
through the generation of random programs
The description of possible input sequences can be made
with BNF (Backus Naur Form)
Random testing can also be used in testing
communications protocol software
The description of possible input sequences can be made
out of the state machines that describe each of the involved
parties
70. Test Techniques
Code-based (aka white box)
Control-flow-based criteria
Data flow-based criteria
71. Test Techniques – Code-based
Control-flow-based criteria
Several testing tools allow the generation of
Control Flow Graphs from source code.
By instrumenting the source code, these tools allow us
to verify graphically the execution of each edge and
node in the graph
72. Test Techniques – Code-based
Control-flow-based criteria
The strongest control-flow-based criterion is path
testing, which aims at executing all entry-to-exit
control-flow paths in the flowgraph
Full path testing is generally not feasible because of
loops
73. Test Techniques – Code-based
Control-flow-based criteria
Control-flow-based coverage criteria are aimed at
covering all the statements or blocks of statements in a
program
Several coverage criteria have been proposed, like
condition/decision coverage
A test battery's coverage is the percentage of the code (e.g.
statements or branches/decisions) exercised by that
battery
Code coverage is a much less stringent criterion than path
coverage
75. Control flow graphs
Are a graphical representation of programs that
captures the ways they can be traversed
during execution
nodes represent decisions
oriented edges represent sets of sequential
instructions
In more complex code segments, the graph looks like
spaghetti, and more tests are needed
77. Example: tax calculation
Consider an IRS tax system that reads annual
income revenues and determines the
corresponding tax due:
If the total income is less than 25K EUROS no tax is
deducted
If it is above that, but less than 100K EUROS, the tax
is 7%
otherwise is 15%
78. Example: tax calculation

Function Calculates_Tax (Int n)
   Array of Int income;
   Int total, tax;
1. total, tax = 0;
2. for i = 1 to n
3. { read(income[i]);
4.   total = total + income[i] };
5. if total >= 25000 then
6.   tax = total * 0.07
7. else if total >= 100000 then
8.   tax = total * 0.15;
9. return(tax)

[The slide also shows the corresponding control flow graph, with nodes
numbered after the statements above.]

Note: the problem solution is wrong, because the condition for the 100K
EURO limit should be tested first. This defect would be caught by
structural testing.
79. Example: how many test cases?
Based on graph theory, Tom McCabe proposed
the cyclomatic complexity metric, which counts the
linearly independent paths and hence the number of
test cases needed for full branch coverage:
v(G) = # edges - # nodes + 2 (one entry plus one exit)
In the current case we obtain:
11 - 9 + 2 = 4 (complete graph)
6 - 4 + 2 = 4 (reduced graph)
Therefore we should be able to produce 4 test cases
that, when applied, would lead to 100% coverage.
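The computation is straightforward once the flowgraph is an edge list. The edge list below is my reconstruction of the complete graph from the slide's numbered statements (1 to 9), so treat it as an assumption.

```python
def cyclomatic_complexity(edges):
    """v(G) = #edges - #nodes + 2, for a connected single-entry,
    single-exit control flow graph given as a list of (src, dst) edges."""
    nodes = {n for edge in edges for n in edge}
    return len(edges) - len(nodes) + 2

# Edge list assumed from the slide's numbered statements (1-9):
flow = [(1, 2), (2, 3), (3, 4), (4, 2),   # for-loop body and back edge
        (2, 5),                           # loop exit
        (5, 6), (5, 7),                   # if total >= 25000
        (6, 9),
        (7, 8), (7, 9),                   # else-if total >= 100000
        (8, 9)]
cyclomatic_complexity(flow)   # 11 - 9 + 2 = 4
```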
80. Call graphs
Are a graphical representation of the
dependencies of functions, procedures, or
methods on each other
nodes (boxes) represent functions, methods, etc.
oriented edges represent the invocations made
This kind of white-box testing is often used for
profiling execution snapshots
81-83. Call graph based testing
[Screenshots of call-graph-based test tools; colors are often used to
represent coverage percentages]
84. Assessing structural test coverage
The ReModeler tool from
the QUASAR team uses
a model-based approach
to represent this kind of
testing coverage
Each class or package is
colored according to the
percentage of executed
methods
85. Test Techniques – Code-based
Data-flow-based criteria
In data-flow-based testing, the control flowgraph is
annotated with information about how the program
variables are defined, used, and killed (undefined)
The strongest criterion, all definition-use paths, requires
that, for each variable, every control flow path segment
from a definition of that variable to a use of that
definition is executed
In order to reduce the number of paths required, weaker
strategies such as all-definitions and all-uses are
employed
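Definition-use pairs can be illustrated on a toy straight-line listing. Real data-flow tools work on the annotated control flow graph; the list-of-events representation here is a deliberate simplification of mine.

```python
# Toy representation of def/use events for one variable in the tax
# example: (line number, event kind, variable name).
listing = [
    (1, "def", "total"),   # total = 0
    (2, "use", "total"),   # total = total + income[i]: reads total ...
    (2, "def", "total"),   # ... then redefines it
    (3, "use", "total"),   # if total >= 25000
]

def def_use_pairs(events, var):
    """Collect (definition line, use line) pairs for one variable."""
    pairs, last_def = [], None
    for line, kind, v in events:
        if v != var:
            continue
        if kind == "def":
            last_def = line
        elif last_def is not None:
            pairs.append((last_def, line))
    return pairs

pairs = def_use_pairs(listing, "total")
```

The all-definition-use-paths criterion would require a test executing each such pair along every path segment between definition and use.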
86. Test Techniques
Fault-based
With different degrees of formalization, fault-
based testing techniques devise test cases
specifically aimed at revealing categories of likely
or predefined faults
Two main techniques exist:
Error guessing
Mutation testing
87. Test Techniques – Fault-based
Error guessing
In error guessing, test cases are specifically
designed by software engineers trying to figure
out the most plausible faults in a given program
A good source of information is the history of
faults discovered in earlier projects, as well as
the software engineer’s expertise
88. Test Techniques – Fault-based
Mutation testing
A mutant is a slightly modified version of the program under test, differing from it by a
small, syntactic change
Every test case exercises both the original and all generated mutants: if a test case is
successful in identifying the difference between the program and a mutant, the latter is
said to be “killed”
Originally conceived as a technique to evaluate a test set, mutation testing is also a
testing criterion in itself: either tests are randomly generated until enough mutants have
been killed, or tests are specifically designed to kill surviving mutants
In the latter case, mutation testing can also be categorized as a code-based technique
The underlying assumption of mutation testing, the coupling effect, is that by looking for
simple syntactic faults, more complex but real faults will be found
For the technique to be effective, a large number of mutants must be automatically
derived in a systematic way.
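A minimal illustration of the mechanics: the mutant differs from the original by one syntactic change (`>=` becomes `>`), and a test case "kills" it when the two versions produce different results. The age-check function is a hypothetical example.

```python
# Original program fragment under test (hypothetical example).
def original(x):
    return "adult" if x >= 18 else "minor"

# A mutant: one small syntactic change (>= mutated to >).
def mutant(x):
    return "adult" if x > 18 else "minor"

def kills(test_input):
    """A test kills the mutant if the two versions disagree on it."""
    return original(test_input) != mutant(test_input)

kills(30)   # False: this test cannot tell the versions apart
kills(18)   # True: the boundary test kills the mutant
```

A test set that leaves many mutants alive is, by this measure, too weak.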
89. Test Techniques
Usage-based
Operational profile
Software Reliability Engineered Testing
90. Test Techniques – Usage-based
Operational profile
In testing for reliability evaluation, the test
environment must reproduce the operational
environment of the software as closely as
possible
The idea is to infer, from the observed test
results, the future reliability of the software when
in actual use
To do this, inputs are assigned a probability
distribution, or profile, according to their
occurrence in actual operation
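The inference step can be sketched as follows; the operations, their probabilities, and the injected fault are all illustrative assumptions.

```python
import random

random.seed(1)

# Hypothetical system: the rarely used "export" operation always fails.
def system_under_test(op):
    return op != "export"   # True = correct service delivered

# Assumed operational profile: probability of each operation in actual use.
profile = [("view", 0.7), ("edit", 0.28), ("export", 0.02)]
ops, weights = zip(*profile)

# Draw test cases according to the profile and observe pass/fail.
runs = [system_under_test(op)
        for op in random.choices(ops, weights=weights, k=10_000)]
estimated_reliability = sum(runs) / len(runs)
# Close to 0.98 here: failures occur only on "export" (2% of actual use).
```

Note how the estimate reflects usage, not code: the same fault in a heavily used operation would yield a far lower reliability figure.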
91. Test Techniques – Usage-based
Software Reliability Engineered Testing
Software Reliability Engineered Testing (SRET)
is a testing method encompassing the whole
development process, whereby testing is
“designed and guided by reliability objectives and
expected relative usage and criticality of different
functions in the field.”
92. Test Techniques
Based on nature of application
Object-oriented testing
Component-based testing
Web-based testing
GUI testing
Testing of concurrent programs
Protocol conformance testing
Testing of real-time systems
Testing of safety-critical systems
93. Test Techniques
Selecting and combining techniques
Specification-based and code-based test
techniques are often contrasted as functional vs.
structural testing
These two approaches to test selection are not to
be seen as alternatives but rather as complementary
in fact, they use different sources of information and
have proved to highlight different kinds of problems
they should be used in combination, depending on
budgetary considerations
94. Automatic Construction of Test Cases
Test generation is possible from:
model-based specifications
algebraic (formal) specifications
Slicing and branch analysis techniques are
used to identify partitions
95. Automatic Construction of Test Cases
TTCN (Tree and Tabular Combined Notation)
1983: ISO TC 97/SC 16 and later in ISO/IEC JTC 1/SC
21 and in CCITT SG VII as part of the work on OSI
conformance testing methodology and framework
Has been widely used since then for describing protocol
conformance test suites in standardization organizations such
as ITU-T, ISO/IEC, ATM Forum, ETSI and industry
1998: TTCN-2, in ISO/IEC and in ITU-T
New features: concurrency mechanism, concepts of module and package,
manipulation of ASN.1 encoding
TTCN-3: the current version (renamed Testing and Test Control
Notation), later standardized by ETSI
96. Automatic Construction of Test Cases
TTCN (Tree and Tabular Combined Notation)
TTCN is a standardized test case format
The main characteristics of TTCN are that:
its tabular notation allows the user to describe, easily and
naturally, in tree form all possible scenarios of stimuli and
reactions exchanged between the tester and the target
its verdict system is designed to facilitate a conformance
judgment on whether the test result agrees with the test
purpose, and
it provides a mechanism to describe appropriate constraints on
received messages, so that conformance of the received
messages can be automatically evaluated against the test
purpose
97. TTCN-3 example
The following is an example of an Abstract Test Suite (ATS)
where we are trying to test a weather service
The tester sends a request consisting of a location, a date
and a kind of report to some on-line weather service, and
receives a response with confirmation of the location and
date, along with the temperature, the wind velocity and the
weather conditions at this location
98. TTCN-3 example
A TTCN-3 ATS is always composed of four sections:
1. type definitions: data structures as in C, but also an easy-to-use
concept of lists and sets
2. template definitions: a TTCN-3 template merges two separate
concepts into one:
test data definition
test data matching rules
3. test case definitions: specify the sequences, and alternatives of
sequences, of messages sent to and received from the System Under
Test (SUT)
4. test control definitions: define the order of execution of the various
test cases
99. Sample TTCN-3 Abstract Test Suite
module SimpleWeather {

    type record weatherRequest {
        charstring location,
        charstring date,
        charstring kind
    }

    type record weatherResponse {
        charstring location,
        charstring date,
        charstring kind,
        integer temperature,
        integer windVelocity,
        charstring conditions
    }

    template weatherRequest ParisWeekendWeatherRequest := {
        location := "Paris",
        date := "15/06/2006",
        kind := "actual"
    }

    template weatherResponse ParisResponse := {
        location := "Paris",
        date := "15/06/2006",
        kind := "actual",
        temperature := (15..30),
        windVelocity := (0..20),
        conditions := "sunny"
    }
100. Sample TTCN-3 Abstract Test Suite
    type port weatherPort message {
        in weatherResponse;
        out weatherRequest;
    }

    type component MTCType {
        port weatherPort weatherOffice;
    }

    testcase testWeather() runs on MTCType {
        weatherOffice.send(ParisWeekendWeatherRequest);
        alt {
            [] weatherOffice.receive(ParisResponse) {
                setverdict(pass)
            }
            [] weatherOffice.receive {
                setverdict(fail)
            }
        }
    }

    control {
        execute(testWeather())
    }
}
101. Automatic Construction of Test Cases
Implies the resolution of several problems:
program decomposition (slicing)
classification of partitions found
selection of test paths
test case generation to exercise those paths
validation of generated cases
The last problem is solved by the construction of an oracle
(software) whose function is to determine whether, for a given
test, the program responds according to its specification.
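As a minimal illustration (a Python sketch, not part of the slides), an oracle can be an executable specification that assigns a pass/fail verdict to each generated test; the function names here are hypothetical:

```python
def spec_abs(x):
    # Executable specification: the required behaviour, stated
    # independently of the implementation under test.
    return x if x >= 0 else -x

def program_under_test(x):
    # The implementation whose responses we want to check.
    return abs(x)

def oracle(test_input):
    # The oracle decides whether, for a given test, the program
    # responds according to its specification.
    expected = spec_abs(test_input)
    actual = program_under_test(test_input)
    return "pass" if actual == expected else "fail"

verdicts = [oracle(x) for x in (-3, 0, 7)]
```

In practice the hard part is obtaining `spec_abs` at all; automatic test generation from model-based or algebraic specifications (previous slides) is one way to derive it.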
102. Automatic Construction of Test Cases
An example
Who? » Siemens + Swiss PTT
What? » SAMSTAG (Sdl And Msc baSed Test cAse Generation)
How to model system & tests?
Target system (SDL)
Test scenarios (MSC)
SDL - Specification and Description Language [ITU Z.100]
MSC - Message Sequence Chart [ITU Z.120]
TTCN (Tree and Tabular Combined Notation) [ISO/IEC JTC1/SC21]
103. Automatic Construction of Test Cases
Some tools
Validator (Aonix)
SoftTest (?)
ObjectGEODE TestComposer
(Verilog)
TestFactory (Rational)
104. Summary
Software Testing Fundamentals
Test Levels
Test Techniques
Test-related Measures
Test Process
105. Test-related Measures
Evaluation of the program under test
Program measurements to aid in planning and
designing testing
To guide testing we may use measures based
on:
program size
E.g. SLOC or function points
program structure
E.g. McCabe’s metrics or frequency with which modules
call each other
106. Test-related Measures
Evaluation of the program under test
Fault types, classification, and statistics
Testing literature is rich in classifications and
taxonomies of faults
To make testing more effective, it is important to know:
which types of faults could be found in the software under test
the relative frequency with which these faults have occurred in
the past
This information can be very useful in making quality
predictions, as well as for process improvement
107. Test-related Measures
Evaluation of the program under test
Fault density
A program under test can be assessed by counting and
classifying the discovered faults by their types
For each fault class, fault density is measured as the
ratio between the number of faults found and the size of
the program
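The fault-density computation can be sketched in a few lines of Python (the fault classes and program size below are invented for illustration):

```python
def fault_density(faults_found, size_ksloc):
    # Fault density for one fault class: faults found / program size
    # (size in KSLOC here; function points would serve equally well).
    if size_ksloc <= 0:
        raise ValueError("size must be positive")
    return faults_found / size_ksloc

# Discovered faults, counted and classified by type
faults_by_class = {"logic": 12, "interface": 5, "data": 3}
program_size_ksloc = 8.0

densities = {cls: fault_density(n, program_size_ksloc)
             for cls, n in faults_by_class.items()}
# e.g. 12 logic faults in 8 KSLOC -> 1.5 faults/KSLOC
```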
108. Test-related Measures
Evaluation of the tests performed
Coverage/thoroughness measures
Several test adequacy criteria require that the test cases
systematically exercise a set of elements identified in the program
or in the specifications
To evaluate the thoroughness of the executed tests, testers can
monitor the elements covered, so that they can dynamically
measure the ratio between covered elements and their total
number
For example, it is possible to measure the percentage of covered branches
in the program flowgraph, or that of the functional requirements exercised
among those listed in the specifications document
Code-based adequacy criteria require appropriate
instrumentation of the program under test
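The instrumentation idea can be sketched in Python: probes record which branches execute, and the coverage measure is the ratio between covered elements and their total number (the probe and branch identifiers are hypothetical):

```python
covered = set()

def hit(branch_id):
    # Instrumentation probe: record that a branch was taken.
    covered.add(branch_id)

def classify(n):
    # Program under test, instrumented on both branches of its decision.
    if n >= 0:
        hit("b1-true")
        return "non-negative"
    else:
        hit("b1-false")
        return "negative"

ALL_BRANCHES = {"b1-true", "b1-false"}

def branch_coverage():
    # Ratio between covered branches and their total number.
    return len(covered & ALL_BRANCHES) / len(ALL_BRANCHES)

classify(5)
half = branch_coverage()   # only the true branch exercised so far
classify(-2)
full = branch_coverage()   # both branches now exercised
```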
109. Example:
Static and dynamic metrics
used to guide white-box testing
110. Static Metrics Collection
Some examples collected by White-box tools:
– Number of private, protected and public attributes
– Overloading, overriding and visibility of operations
– Comments density (e.g. JavaDoc comments per class)
– Inheritance metrics (e.g. depth, width, inherited features)
– MOOSE metrics (Chidamber and Kemerer)
– MOOD metrics (Brito e Abreu)
– QMOOD metrics (Jagdish Bansiya)
111. Static Metrics - ex: Cantata++
112. Dynamic Metrics Collection
Class, Operation, Branch, Exception clause coverage
Example: Multiple Condition Coverage
Measures whether each combination of condition
outcomes for a decision has been exercised; are f() and
g() called in the following code extract?
if ((a == b || f()) && (c == d || g()))
x();
else
y();
Note that the expression can be evaluated to true without
calling f() or g().
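The same point can be checked in Python, whose `or` and `and` short-circuit exactly like C's `||` and `&&`; the instrumented `f()` and `g()` below record whether they were ever called:

```python
calls = []

def f():
    calls.append("f")
    return True

def g():
    calls.append("g")
    return True

def decide(a, b, c, d):
    # Same decision as the slide's code extract.
    if (a == b or f()) and (c == d or g()):
        return "x"
    return "y"

result1 = decide(1, 1, 2, 2)     # a == b and c == d: decision is true
calls_after_first = list(calls)  # ... yet f() and g() never ran

result2 = decide(1, 2, 3, 3)     # a != b forces f(); c == d still skips g()
```

Branch coverage of the decision is therefore possible with `f()` and `g()` never exercised, which is exactly what multiple condition coverage is designed to detect.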
113. Test-related Measures
Evaluation of the tests performed
Fault seeding
Some faults are artificially introduced into the program before test
When the tests are executed, some of these seeded faults will be
revealed, and possibly some faults which were already there will
be as well
depending on which of the artificial faults are discovered, and how many,
testing effectiveness can be evaluated, and the remaining number of
genuine faults can be estimated
Problems:
distribution and representativeness of seeded faults relative to original ones
small sample size on which any extrapolations are based
inserting faults into software involves the obvious risk of leaving them there
114. Test-related Measures
Evaluation of the tests performed
Mutation score
In mutation testing, the ratio of killed mutants to
the total number of generated mutants can be a
measure of the effectiveness of the executed test
set
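A toy mutation-testing sketch in Python (the hand-written mutants stand in for what a real mutation tool would generate automatically):

```python
def original_max(a, b):
    # Program under test
    return a if a > b else b

# Hand-written mutants: single mutations of the `>` operator
mutants = [
    lambda a, b: a if a < b else b,    # > mutated to <
    lambda a, b: a if a >= b else b,   # > mutated to >= (equivalent mutant)
    lambda a, b: a if a == b else b,   # > mutated to ==
]

test_inputs = [(1, 2), (2, 1), (3, 3)]

def killed(mutant):
    # A mutant is killed when some test case distinguishes its
    # output from the original program's output.
    return any(mutant(a, b) != original_max(a, b) for a, b in test_inputs)

score = sum(killed(m) for m in mutants) / len(mutants)
# The >= mutant returns the same value as the original on every input
# (they only branch differently when a == b), so no test can kill it:
# equivalent mutants are a known limitation of the mutation score.
```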
115. Summary
Software Testing Fundamentals
Test Levels
Test Techniques
Test-related Measures
Test Process
116. Test Process – Practical Considerations
Attitudes / Egoless programming
A very important component of successful testing is a
collaborative attitude towards testing and quality
assurance activities
Managers have a key role in fostering a generally
favorable reception towards failure discovery during
development and maintenance
for instance, by preventing a mindset of code ownership
among programmers, so that they will not feel responsible for
failures revealed by their code
117. Test Process – Practical Considerations
Test guides
The testing phases could be guided by various
aims, for example:
risk-based testing, which uses the product risks
to prioritize and focus the test strategy
scenario-based testing, in which test cases are
defined based on specified software scenarios
118. Test Process – Practical Considerations
Test documentation and work products
Documentation is an integral part of the formalization of
the test process
Test documents may include:
Test Plan
Test Design Specification
Test Procedure Specification
Test Case Specification
Test Log
Test Incident or Problem Report
119. Test Process – Practical Considerations
Internal vs. independent test team
External members may bring an unbiased, independent
perspective
The decision on an internal, external or blended team
should be based upon considerations of:
cost
schedule
maturity levels of the involved organizations
criticality of the application
121. Test Process – Practical Considerations
Cost/effort estimation and other process measures
Several measures related to the resources spent
on testing, as well as to the relative fault-finding
effectiveness of the various test phases, are
used by managers to control and improve the
test process, such as:
number of test cases specified
number of test cases executed
number of test cases passed
number of test cases failed
122. Test Process – Practical Considerations
Cost/effort estimation and other process measures
Evaluation of test phase reports can be combined with
root cause analysis to evaluate test process
effectiveness in finding faults as early as possible
Such an evaluation could be associated with the analysis of
risks
Moreover, the resources that are worth spending on
testing should be commensurate with the use/criticality
of the application:
different techniques have different costs and yield different
levels of confidence in product reliability
123. Test Process – Practical Considerations
Termination
A decision must be made as to how much testing is
enough and when a test stage can be terminated
Thoroughness measures, such as …
achieved code coverage
functional completeness
estimates of fault density or of operational reliability
… provide useful support, but are not sufficient in
themselves
124. Test Process – Practical Considerations
Termination
The decision also involves considerations about the
costs and risks incurred by the potential for remaining
failures, as opposed to the costs implied by continuing
to test
There are two possible approaches to this problem
Termination based on test efficiency
Termination based on test effectiveness
125. Test efficiency-based termination
To decide on test termination or to compare
distinct V&V procedures and tools we need to
know their Efficiency
walkthroughs, inspections, black-box, white-box ?
Efficiency = work produced / resources spent
» Test efficiency = defects found / effort spent
= benefit / cost
126. Test efficiency-based termination
As testing proceeds …
defect density decreases
test efficiency decreases - more and more
effort is spent (cost) to find new defects
(benefit)
reliability grows - probability that users
experience defect effects (failures) reduces
128. Case Study
[Chart: Defects found per week, weeks 1-16]
[Chart: Cumulative Defects (Benefit), weeks 1-16]
129. Case Study
[Chart: Cost / Benefit Ratio (Test Efficiency), weeks 1-16]
[Chart: Benefit / Cost Ratio (Test Efficiency), weeks 1-16]
These ratios can be used to set test stopping thresholds
130. Test effectiveness-based termination
"Testing can only show the presence of bugs but never their absence"
Dijkstra
Is this statement correct ?
131. Test effectiveness-based termination
Test effectiveness makes it possible to decide
when tests should be stopped
the test plan should indicate that level (e.g. 90%)
Effectiveness = achieved effect / desired effect
» Test effectiveness = percentage of total defects found
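A sketch of this stopping rule in Python (the defect counts are invented; estimating the total number of defects is the subject of the defect injection slides that follow):

```python
def stop_testing(defects_found, estimated_total_defects, target=0.90):
    # Test effectiveness = percentage of total defects found;
    # testing stops once it reaches the level set in the test plan.
    effectiveness = defects_found / estimated_total_defects
    return effectiveness >= target

reached = stop_testing(450, 500)   # 90% of the defects found -> stop
not_yet = stop_testing(400, 500)   # only 80% found -> keep testing
```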
132. Test Effectiveness - Case Study
[Chart: Weekly % of Defects Found (Weekly Test Effectiveness), weeks 1-16]
[Chart: Cumulative % of Defects Found (Cumulative Test Effectiveness), weeks 1-16]
Conclusion: it is not worth testing beyond a certain point; that
point can be based on a given effectiveness threshold
133. Test Effectiveness
To calculate it we need to know:
the total number of defects
or the number of remaining defects
total = found + remaining
Remaining defects can be known a posteriori
Simply wait for users to find them (not a good choice ...)
Even then, we have to set an observation period
Obs. period = f (system complexity, transaction rate)
some defects may only cause failures after intensive use
134. Defect Injection Technique
This technique allows estimating remaining defects and
therefore obtaining test effectiveness
1. A member of the development team (not necessarily the
producer) deliberately inserts some defects into the target
system, neither clustered together nor hidden in a tricky way.
2. He documents and describes the location of the injected
defects and delivers that information to the project leader.
3. The target system is passed on to the testing team.
4. Test process effectiveness is verified through the percentage of
injected defects that were found.
5. Remaining defects (not injected) are then estimated
135. Defect Injection (continued)
Before the beginning of the test we have:
DOi Original Defects (unknown !)
DIi Injected Defects (known)
At all moments after the beginning of the test we have:
DOe Original defects found
DIe Injected defects found
DOr = DOi - DOe Original defects remaining (not found)
DIr = DIi - DIe Injected defects remaining (not found)
136. Defect Injection (continued)
Let:
ERO = DOe / DOi Effectiveness in Original Defects Removal (unknown !)
ERI = DIe / DIi Effectiveness in Injected Defects Removal (known !)
Considering ERO ≈ ERI, which will be close to the truth if the
number of injected defects is sufficiently large:
DOi = DOe / ERO ≈ DOe / ERI
DOr = DOi ( 1 - ERO ) = DOe ( 1 / ERO - 1 ) ≈ DOe ( 1 / ERI - 1 )
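These estimates are easy to compute; a Python sketch with invented counts:

```python
def estimate_original_defects(doe, die, dii):
    # DOe: original defects found, DIe: injected defects found,
    # DIi: injected defects (known in advance).
    # Assuming ERO is close to ERI = DIe / DIi:
    #   DOi is about DOe / ERI           (estimated original defects)
    #   DOr is about DOe * (1/ERI - 1)   (estimated remaining defects)
    eri = die / dii
    doi = doe / eri
    dor = doe * (1 / eri - 1)
    return doi, dor

doi, dor = estimate_original_defects(doe=80, die=40, dii=50)
# ERI = 0.8, so an estimated 100 original defects, 20 of them remaining
```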
137. Test Process – Test activities
Defect tracking
Detected defects can be analyzed to determine:
when they were introduced into the software
what kind of error caused them to be created
E.g. poorly defined requirements, incorrect variable
declaration, memory leak, programming syntax error, …
when they could have been first observed in the
software
138. Test Process – Test activities
Defect tracking
Defect-tracking information is used to determine
what aspects of software engineering need
improvement and how effective previous
analyses and testing have been
This causal analysis allows introducing prevention
actions
Prevention is better than the cure and is a typical
characteristic of higher levels of maturity in the
software development process
139. Defect prevention in CMMI
140. Bibliography
[Bec02] K. Beck, Test-Driven Development by Example, Addison-Wesley, 2002.
[Bei90] B. Beizer, Software Testing Techniques, International Thomson Press, 1990, Chap. 1-3, 5, 7s4, 10s3, 11, 13.
[Jor02] P. C. Jorgensen, Software Testing: A Craftsman's Approach, second edition, CRC Press, 2004, Chap. 2, 5-10, 12-15, 17, 20.
[Kan99] C. Kaner, J. Falk, and H.Q. Nguyen, Testing Computer Software, 2nd ed., John Wiley & Sons, 1999, Chaps. 1, 2, 5-8, 11-13, 15.
[Kan01] C. Kaner, J. Bach, and B. Pettichord, Lessons Learned in Software Testing, Wiley Computer Publishing, 2001.
[Lyu96] M.R. Lyu, Handbook of Software Reliability Engineering, McGraw-Hill/IEEE, 1996, Chap. 2s2.2, 5-7.
[Per95] W. Perry, Effective Methods for Software Testing, John Wiley & Sons, 1995, Chap. 1-4, 9, 10-12, 17, 19-21.
[Pfl01] S. L. Pfleeger, Software Engineering: Theory and Practice, 2nd ed., Prentice Hall, 2001, Chap. 8, 9.
[Zhu97] H. Zhu, P.A.V. Hall and J.H.R. May, “Software Unit Test Coverage and Adequacy,” ACM Computing Surveys, vol. 29, iss. 4 (Sections 1, 2.2, 3.2, 3.3), Dec. 1997, pp. 366-427.
141. Applicable standards
(IEEE610.12-90) IEEE Std 610.12-1990 (R2002), IEEE Standard Glossary of Software Engineering Terminology, IEEE, 1990.
(IEEE829-98) IEEE Std 829-1998, Standard for Software Test Documentation, IEEE, 1998.
(IEEE982.1-88) IEEE Std 982.1-1988, IEEE Standard Dictionary of Measures to Produce Reliable Software, IEEE, 1988.
(IEEE1008-87) IEEE Std 1008-1987 (R2003), IEEE Standard for Software Unit Testing, IEEE, 1987.
(IEEE1044-93) IEEE Std 1044-1993 (R2002), IEEE Standard for the Classification of Software Anomalies, IEEE, 1993.
(IEEE1228-94) IEEE Std 1228-1994, Standard for Software Safety Plans, IEEE, 1994.
(IEEE12207.0-96) IEEE/EIA 12207.0-1996 // ISO/IEC 12207:1995, Industry Implementation of Int. Std. ISO/IEC 12207:95, Standard for Information Technology - Software Life Cycle Processes, IEEE, 1996.
142. Black-Box Tools - Web Links
JavaStar (http://www.sun.com/workshop/testingtools/javastar.html)
JavaLoad (http://www.sun.com/workshop/testingtools/javaload.html)
VisualTest, Scenario Recorder, Test Suite Manager
(http://www.rational.com/)
SoftTest (http://www.softtest.com/pages/prod_st.htm)
AutoTester (http://www.autotester.com/)
WinRunner (http://www.merc-int.com/products/winrunguide.html)
LoadRunner (http://www.merc-int.com/products/loadrunguide.html)
QuickTest (http://www.mercury.com)
TestComplete (http://www.automatedqa.com)
S-Unit test framework (http://sunit.sourceforge.net)
eValid™ Automated Web Testing Suite (http://www.soft.com/eValid/)