Assessing Test Case Prioritization on Real Faults and Mutants

Qi Luo,
Kevin Moran,
Massimiliano Di Penta,
Denys Poshyvanyk
34th International Conference on Software
Maintenance and Evolution (ICSME’18)
Thursday, September 27th, 2018

REGRESSION TESTING
Versions: v1.0, v1.2, v2.0, …, vN
Test suite: t1, t2, t3, t4

TEST CASE PRIORITIZATION (TCP)
v1.2
Tests: t1, t2, t3, t4
[Chart: # Faults Found (0 to 4) vs. # Tests Executed (0 to 4), one curve per test ordering]
Test Ordering
1) t1 2) t2 3) t3 4) t4
1) t3 2) t1 3) t2 4) t4
APFD: Average Percentage of Faults Detected
APFD = 54%
APFD = 96%
The Red ordering of test cases outperforms the Blue ordering in terms of APFD
The main goal of TCP is to prioritize test cases so as to maximize APFD
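
APFD is commonly computed as APFD = 1 - (TF_1 + ... + TF_m) / (n * m) + 1 / (2n), where TF_i is the position of the first test that exposes fault i, n is the number of tests, and m is the number of faults. The sketch below is a minimal illustration of that formula. The fault-to-test mapping is hypothetical (the slides do not list which test finds which fault), so its results are not expected to match the 54% and 96% shown on the chart.

```python
def apfd(order, detects, faults):
    """APFD = 1 - sum(TF_i) / (n * m) + 1 / (2 * n).

    Assumes every fault is detected by at least one test in `order`.
    """
    n, m = len(order), len(faults)
    # 1-based position of each test in the prioritized order.
    position = {test: i for i, test in enumerate(order, start=1)}
    # TF_i: position of the first test that exposes fault i.
    first_hit = [
        min(position[t] for t in order if fault in detects[t])
        for fault in faults
    ]
    return 1 - sum(first_hit) / (n * m) + 1 / (2 * n)


# Hypothetical data: which faults each test exposes.
DETECTS = {
    "t1": set(),
    "t2": {"f1"},
    "t3": {"f1", "f2", "f3"},
    "t4": {"f4"},
}
FAULTS = ["f1", "f2", "f3", "f4"]

original = apfd(["t1", "t2", "t3", "t4"], DETECTS, FAULTS)
reordered = apfd(["t3", "t1", "t2", "t4"], DETECTS, FAULTS)
print(original, reordered)
```

With this made-up mapping, moving t3 to the front raises APFD because three of the four faults are exposed by the first test executed.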

ASSESSING TCP EFFECTIVENESS (IDEAL)
Versions: v1.0, v1.2, v2.0, …, vN
Tests (t1, t2, t3, t4) -> TCP Technique -> Prioritized Tests -> Measured APFD Values

ASSESSING TCP EFFECTIVENESS (ACTUAL)
Mutation Framework applied to v N
Tests (t1, t2, t3, t4) -> TCP Technique -> Prioritized Tests -> Measured APFD Values (v N)

How well do TCP Techniques perform on real faults?

Is the performance of TCP techniques on mutants
representative of their performance on real faults?

What properties of mutants impact the
representativeness of this performance?

RESEARCH QUESTIONS
• RQ1: TCP performance on real faults?
• RQ2: Representativeness of mutants for TCP?
• RQ3: How do fault properties impact TCP performance?

EMPIRICAL STUDY CONTEXT
Subject programs (Defects4J)
Project Name          Number of Real Faults
JFreeChart            26
Closure compiler      133
Apache commons-lang   65
Apache commons-math   106
Joda-Time             27
Total                 357

Versions: v1.0, v2.0, …, vN
PIT Mutation Framework applied to vN
Repeat this process 100 times
Every mutant can be detected by at least one test case

Created a second set of
mutants with subsumed
mutants removed
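
One common way to define the reduced set is dynamic subsumption: mutant A subsumes mutant B if A is killed by at least one test and every test that kills A also kills B. The sketch below filters a kill matrix under that assumption. The mutant names and kill sets are hypothetical, and the slides do not state which subsumption definition or tool the study used.

```python
def subsuming_mutants(kills):
    """Return one representative mutant per minimal kill set."""
    # Ignore mutants that no test kills.
    killed = {m: frozenset(ts) for m, ts in kills.items() if ts}
    # Mutants with identical kill sets are indistinguishable; keep the first name.
    by_kill_set = {}
    for mutant in sorted(killed):
        by_kill_set.setdefault(killed[mutant], mutant)
    # A kill set is minimal if no other kill set is a strict subset of it.
    return sorted(
        rep
        for kill_set, rep in by_kill_set.items()
        if not any(other < kill_set for other in by_kill_set)
    )


# Hypothetical kill matrix: mutant -> tests that kill it.
KILLS = {
    "m1": {"t1"},
    "m2": {"t1", "t2"},  # subsumed by m1
    "m3": {"t3"},
    "m4": {"t3"},        # same kill set as m3
    "m5": set(),         # not killed by any test
}

print(subsuming_mutants(KILLS))
```

Removing subsumed mutants matters for measurement because a test that kills a subsuming mutant kills everything it subsumes, so the subsumed mutants add counted faults without adding difficulty.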

Subject programs (Defects4J)
Project Name    Real Faults   Mutants   Subsuming Mutants
JFreeChart      26            2,600     1,796
Closure         133           13,300    9,731
Commons-lang    65            6,500     2,129
Commons-math    106           10,600    5,016
Joda-Time       27            2,700     2,700
Total           357           35,700    21,372

Studied TCP Techniques
Type      Tag        Description
Static    CG-Total   Call graph-based (total strategy)
Static    CG-Add     Call graph-based (additional strategy)
Static    Str        String distance-based
Static    Topic      Topic model-based
Dynamic   Total      Greedy Total (statement level)
Dynamic   Add        Greedy Additional (statement level)
Dynamic   Art        Adaptive Random (statement level)
Dynamic   Search     Search-based (statement level)
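
The "total" and "additional" strategies in the table differ only in what they count at each step. A minimal sketch of both greedy strategies at statement level follows. The coverage data is hypothetical, and the tie-breaking (alphabetical) and reset behavior (start counting again once no remaining test adds new coverage) are common conventions assumed here, not details taken from the slides.

```python
def total_order(coverage):
    """Greedy Total: sort tests by how many statements each covers."""
    return sorted(coverage, key=lambda t: (-len(coverage[t]), t))


def additional_order(coverage):
    """Greedy Additional: repeatedly pick the test covering the most
    statements not yet covered by the tests already selected."""
    remaining = dict(coverage)
    covered = set()
    order = []
    while remaining:
        # max() keeps the first maximum, so ties break alphabetically.
        best = max(sorted(remaining), key=lambda t: len(remaining[t] - covered))
        if not (remaining[best] - covered) and covered:
            # Nothing new can be covered: reset and keep prioritizing.
            covered = set()
            continue
        order.append(best)
        covered |= remaining.pop(best)
    return order


# Hypothetical statement coverage per test.
COVERAGE = {
    "t1": {1, 2, 3, 4},
    "t2": {1, 2, 3},
    "t3": {5, 6},
    "t4": {4, 7},
}

print(total_order(COVERAGE))
print(additional_order(COVERAGE))
```

In this example the total strategy schedules t2 second even though it covers nothing new after t1, while the additional strategy defers it to the end.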

METHODOLOGY RQ1
Run Tests at Test Method Level
APFD = Rate of Fault Detection
APFDc = Fault Detection Rate & Efficiency
ANOVA & Tukey HSD Tests
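
APFDc is commonly defined as the cost-cognizant variant of APFD: each fault is credited with the execution cost remaining from the test that first exposes it (counting half of that test's own cost), weighted by fault severity and normalized by total cost times total severity. The sketch below assumes that definition with unit severities by default. The test costs and fault mapping are hypothetical and are not taken from the study.

```python
def apfd_c(order, detects, faults, cost, severity=None):
    """Cost-cognizant APFD for a prioritized test order.

    Assumes every fault is detected by at least one test in `order`.
    """
    severity = severity or {f: 1.0 for f in faults}
    total_cost = sum(cost[t] for t in order)
    total_severity = sum(severity[f] for f in faults)
    score = 0.0
    for fault in faults:
        # Index of the first test in the order that exposes this fault.
        first = next(i for i, t in enumerate(order) if fault in detects[t])
        remaining_cost = sum(cost[t] for t in order[first:])
        score += severity[fault] * (remaining_cost - 0.5 * cost[order[first]])
    return score / (total_cost * total_severity)


# Hypothetical data: faults exposed by each test, and test run times.
DETECTS = {"t1": set(), "t2": {"f1"}, "t3": {"f1", "f2", "f3"}, "t4": {"f4"}}
FAULTS = ["f1", "f2", "f3", "f4"]
ORDER = ["t3", "t1", "t2", "t4"]

unit_cost = {t: 1.0 for t in ORDER}
slow_t3 = {"t1": 1.0, "t2": 1.0, "t3": 10.0, "t4": 1.0}

print(apfd_c(ORDER, DETECTS, FAULTS, unit_cost))
print(apfd_c(ORDER, DETECTS, FAULTS, slow_t3))
```

With equal costs and severities the value coincides with plain APFD; making t3 expensive lowers the score for the same ordering, which is how the two metrics can rank techniques differently.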

METHODOLOGY RQ2
Examine Absolute Performance
Examine Relative Performance
Kendall Rank Correlation Analysis

METHODOLOGY RQ3
Examine Mutants based on Real Fault Coupling
Examine Mutants based on Operator
Kendall Rank Correlation Analysis

RQ1 RESULTS: REAL FAULT PERFORMANCE
Type      TCP Technique   APFD    APFDc
Static    Topic           0.700   0.635
Static    Str             0.696   0.594
Static    CG-Add          0.597   0.591
Static    CG-Total        0.594   0.480
Dynamic   Art             0.657   0.677
Dynamic   Total           0.610   0.419
Dynamic   Search          0.600   0.556
Dynamic   Add             0.583   0.454
Summary
• Most techniques score higher under APFD than under APFDc (Art is the exception)
• Static TCP Techniques tend to perform better overall
• Total outperforms Add for APFD

RQ2 RESULTS: COMPARING MUTANTS & REAL FAULTS
                 Real Faults       All Mutants       Subsuming Mutants
Type      Tech   APFD    APFDc     APFD    APFDc     APFD    APFDc
Static    Topic  0.700   0.635     0.832   0.802     0.612   0.570
Static    Str    0.696   0.594     0.834   0.788     0.620   0.572
Static    CG-A   0.597   0.591     0.818   0.835     0.612   0.639
Static    CG-T   0.594   0.480     0.743   0.598     0.561   0.407
Dynamic   Art    0.657   0.677     0.800   0.841     0.622   0.671
Dynamic   Total  0.610   0.419     0.757   0.549     0.534   0.305
Dynamic   Search 0.600   0.556     0.784   0.725     0.578   0.508
Dynamic   Add    0.583   0.454     0.897   0.829     0.664   0.565
Summary
• Metrics according to the full mutant set tend to overestimate performance
• APFDc values on mutants correlate more strongly with APFDc values on real faults than APFD values do
• The relative ordering of techniques between mutants and real faults differs
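
The overestimation point can be checked directly against the mean APFD values in the results tables. The sketch below computes the per-technique gap (mutant score minus real-fault score) for both mutant sets. The numbers are transcribed from the slides, and the helper name `gaps` is introduced here for illustration only.

```python
# Mean APFD per technique, transcribed from the results slides.
REAL = {"Topic": 0.700, "Str": 0.696, "CG-Add": 0.597, "CG-Total": 0.594,
        "Art": 0.657, "Total": 0.610, "Search": 0.600, "Add": 0.583}
ALL_MUTANTS = {"Topic": 0.832, "Str": 0.834, "CG-Add": 0.818, "CG-Total": 0.743,
               "Art": 0.800, "Total": 0.757, "Search": 0.784, "Add": 0.897}
SUBSUMING = {"Topic": 0.612, "Str": 0.620, "CG-Add": 0.612, "CG-Total": 0.561,
             "Art": 0.622, "Total": 0.534, "Search": 0.578, "Add": 0.664}


def gaps(mutant_scores, real_scores):
    """Mutant score minus real-fault score for each technique."""
    return {t: round(mutant_scores[t] - real_scores[t], 3) for t in real_scores}


all_gaps = gaps(ALL_MUTANTS, REAL)
sub_gaps = gaps(SUBSUMING, REAL)
mean_all_gap = sum(all_gaps.values()) / len(all_gaps)

for tech in REAL:
    print(f"{tech:9s} all mutants: {all_gaps[tech]:+.3f}  subsuming: {sub_gaps[tech]:+.3f}")
```

Every technique scores higher on the full mutant set than on real faults, while the gaps for the subsuming set are smaller and mixed in sign.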

RQ3 RESULTS: EXAMINING FAULT PROPERTIES
• High performance correlation for real faults that were highly coupled to mutation operators
• TCP techniques perform differently across mutants seeded by different operators
• TCP performance for mutants seeded by different operators varies widely across subject programs

LEARNED LESSONS
Lesson 1: Relative TCP performance can differ between mutants and real faults
Lesson 2: The metrics utilized in TCP evaluations impact mutant representativeness
Lesson 3: Mutation Operators must be carefully selected in order for mutation-based TCP performance to represent performance on real faults

Any Questions?
Thank you!
Kevin Moran
Post-Doctoral Fellow
College of William & Mary
@kevpmo
kpmoran@cs.wm.edu
https://www.kpmoran.com

SEEDING MUTANTS INTO THE LATEST VERSION
