Test Case Prioritization (TCP) is an important component of regression testing, allowing for earlier detection of faults or helping to reduce testing time and cost. While several TCP approaches exist in the research literature, a growing number of studies have evaluated them against synthetic software defects, called mutants. Hence, it is currently unclear to what extent TCP performance on mutants would be representative of the performance achieved on real faults. To answer this fundamental question, we conduct the first empirical study comparing the performance of TCP techniques applied to both real-world and mutation faults. The context of our study includes eight well-studied TCP approaches, 35k+ mutation faults, and 357 real-world faults from five Java systems in the Defects4J dataset. Our results indicate that the relative performance of the studied TCP techniques on mutants may not strongly correlate with performance on real faults, depending upon attributes of the subject programs. This suggests that, in certain contexts, the best performing technique on a set of mutants may not be the best technique in practice when applied to real faults. We also illustrate that these correlations vary for mutants generated by different operators depending on whether chosen operators reflect typical faults of a subject program. This highlights the importance, particularly for TCP, of developing mutation operators tailored for specific program domains.
Assessing Test Case Prioritization on Real Faults and Mutants
1. Assessing Test Case Prioritization on Real Faults and Mutants
Qi Luo, Kevin Moran, Massimiliano Di Penta, Denys Poshyvanyk
34th International Conference on Software Maintenance and Evolution (ICSME’18)
Thursday, September 27th, 2018
25. TEST CASE PRIORITIZATION (TCP)
[Figure: # Faults Found (y-axis, 0 to 4) vs. # Tests Executed (x-axis, 0 to 4) for two orderings of tests t1 to t4; program labeled v1.2]
Test Ordering
1) t1 2) t2 3) t3 4) t4
1) t3 2) t1 3) t2 4) t4
APFD: Average Percentage of Faults Detected
APFD = 54%
APFD = 96%
The Red ordering of test cases outperforms the Blue ordering in terms of APFD.
The main goal of TCP is to prioritize test cases so as to maximize APFD.
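As a minimal sketch of how APFD could be computed, the snippet below uses the commonly cited formulation APFD = 1 - (TF_1 + ... + TF_m) / (n * m) + 1 / (2n), where n is the number of tests, m the number of faults, and TF_i the position of the first test in the ordering that reveals fault i. The fault matrix `DETECTS` and the function name `apfd` are hypothetical and chosen only for illustration; they are not taken from the slides and do not reproduce the 54% and 96% values shown there.

```python
def apfd(ordering, detects):
    """Average Percentage of Faults Detected for one test ordering.

    ordering: list of test names, assumed to contain every test in `detects`.
    detects:  dict mapping a test name to the set of fault ids it reveals.
    """
    n = len(ordering)
    faults = set().union(*detects.values())
    m = len(faults)
    if n == 0 or m == 0:
        raise ValueError("need at least one test and one detectable fault")

    # TF_i: 1-based position of the first test that reveals fault i.
    first_position = {}
    for position, test in enumerate(ordering, start=1):
        for fault in detects.get(test, set()):
            first_position.setdefault(fault, position)

    if len(first_position) != m:
        raise ValueError("ordering does not reveal every fault")

    return 1 - sum(first_position.values()) / (n * m) + 1 / (2 * n)


# Hypothetical fault matrix (an assumption, not data from the study).
DETECTS = {
    "t1": set(),
    "t2": {"f1"},
    "t3": {"f1", "f2", "f3"},
    "t4": {"f4"},
}

original = ["t1", "t2", "t3", "t4"]
prioritized = ["t3", "t1", "t2", "t4"]

print(f"original:    {apfd(original, DETECTS):.4f}")
print(f"prioritized: {apfd(prioritized, DETECTS):.4f}")
```

With this made-up matrix, running t3 first reveals three of the four faults immediately, so the prioritized ordering scores higher than the original one.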
41. ASSESSING TCP EFFECTIVENESS (ACTUAL)
[Diagram: tests t1, t2, t3, t4 -> TCP Technique -> Prioritized Tests (t1, t2, t3, t4) -> Measured APFD Values (v N)]
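The pipeline in this diagram (tests in, a TCP technique, prioritized tests out) can be illustrated with a minimal sketch of one widely known coverage-based strategy, "total" prioritization, which orders tests by how many code elements each one covers. This is an illustrative assumption: the slide does not say which technique is applied, and the coverage data `COVERAGE` and the function name `prioritize_total` are hypothetical.

```python
def prioritize_total(coverage):
    """Order tests by descending number of covered elements.

    coverage: dict mapping a test name to the set of covered element ids
              (for example statement numbers). Ties are broken by test name
              so that the result is deterministic.
    """
    return sorted(coverage, key=lambda test: (-len(coverage[test]), test))


# Hypothetical coverage data (an assumption, not data from the study).
COVERAGE = {
    "t1": {1, 2, 3},
    "t2": {2, 3},
    "t3": {1, 2, 3, 4, 5},
    "t4": {6},
}

prioritized_tests = prioritize_total(COVERAGE)
print(prioritized_tests)
```

The prioritized list would then be scored with a metric such as APFD against a set of known faults, which is the last box in the diagram.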
45. How well do TCP Techniques perform on real faults?
What properties of mutants impact the
representativeness of this performance?
Is the performance of TCP techniques on mutants
representative of their performance on real faults?
49. RESEARCH QUESTIONS
• RQ1: TCP performance on real faults?
• RQ2: Representativeness of mutants for TCP?
• RQ3: How do fault properties impact TCP performance?
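One way to approach RQ2 is to check whether the ranking of techniques by APFD on mutants agrees with their ranking on real faults. The sketch below assumes Kendall's tau (the tau-a variant, without tie correction) as the agreement measure; the slides do not name the statistic used in the study, and the four techniques and their APFD values are invented for illustration.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a between two equally long lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")

    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1      # both lists order the pair the same way
        elif direction < 0:
            discordant += 1      # the lists disagree on this pair

    total_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / total_pairs


# Hypothetical mean APFD per technique (assumed values, not study results).
apfd_on_mutants = [0.80, 0.70, 0.60, 0.50]
apfd_on_real_faults = [0.55, 0.65, 0.60, 0.40]

tau = kendall_tau(apfd_on_mutants, apfd_on_real_faults)
print(f"Kendall tau-a: {tau:.3f}")
```

A value near 1 would mean that the best technique on mutants also tends to be the best on real faults; values near 0 would mean the mutant-based ranking says little about real-fault performance.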
57. EMPIRICAL STUDY CONTEXT
Dataset: Defects4J
Project Name          Number of Real Faults   Mutants
JFreeChart            26                      2,600
Closure compiler      133                     13,300
Apache commons-lang   65                      6,500
Apache commons-math   106                     10,600
Joda-Time             27                      2,700
Total                 357                     35,700
95. RQ3 RESULTS: EXAMINING FAULT PROPERTIES
• High performance correlation for real faults that were highly coupled to mutation operators
• TCP techniques perform differently across mutants seeded by different operators
• TCP performance for mutants seeded by different operators varies widely across subject programs
99. LEARNED LESSONS
Lesson 1: Relative TCP performance can differ between mutants and real faults
Lesson 2: The metrics utilized in TCP evaluations impact mutant representativeness
Lesson 3: Mutation operators must be carefully selected in order for mutation-based TCP performance to represent performance on real faults
105. Any Questions?
Thank you!
Kevin Moran
Post-Doctoral Fellow
College of William & Mary
@kevpmo
kpmoran@cs.wm.edu
https://www.kpmoran.com