SBST 2015 - 3rd Tool Competition for Java JUnit Test Tools
1. 3rd Java Unit Testing Tool Competition
Tanja E.J. Vos
Urko Rueda
Universidad Politecnica de Valencia
http://sbstcontest.dsic.upv.es/
8th International Workshop on Search-Based Software Testing (SBST) at the 37th IEEE International Conference on Software Engineering (ICSE 2015)
2. A (3rd) tool competition… WHY?
§ Competition between different types of automated unit testing tools (evolutionary, guided/random, dynamic)
§ Task: generate regression JUnit tests for a given, unknown set of classes (see the example below)
§ Score takes into account:
– Effectiveness: instruction coverage, branch coverage, mutation coverage
– Efficiency: time to prepare, generate and execute the tests
§ Allows comparison between different approaches
§ Helps developers:
– Improve their tools
– Guide future developments
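To make the task concrete, here is a hand-written illustration (hypothetical, not the output of any competing tool) of the kind of regression test the tools generate: the current behaviour of the class under test is pinned down as assertions, so that a future change which alters it makes the test fail.

```java
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.fail;

import java.util.ArrayDeque;
import java.util.NoSuchElementException;

import org.junit.Test;

// Hand-written illustration of a typical regression test
// (hypothetical; not produced by any competing tool).
// It records the current behaviour of the class under test
// so that future changes that alter it make the test fail.
public class ArrayDequeRegressionTest {

    @Test
    public void pushThenPeekReturnsLastPushedElement() {
        ArrayDeque<String> deque = new ArrayDeque<>();
        deque.push("a");
        deque.push("b");
        // Regression oracle: observed behaviour recorded as assertions.
        assertEquals("b", deque.peek());
        assertEquals(2, deque.size());
    }

    @Test
    public void popOnEmptyDequeThrows() {
        ArrayDeque<String> deque = new ArrayDeque<>();
        try {
            deque.pop();
            fail("expected NoSuchElementException");
        } catch (NoSuchElementException expected) {
            // Exceptional behaviour is part of the regression oracle too.
        }
    }
}
```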
3. § Commercial Tool (CT)
– anonymous; dynamic approach; deployment and configuration for the competition done by UPV
§ EvoSuite
– G. Fraser, A. Arcuri; evolutionary/search-based, static analysis
§ EvoSuite-MOSA
– A. Panichella, P. Tonella, F.M. Kifetew, A. Panico; evolutionary
§ GRT
– L. Ma, C. Artho, C. Zhang; guided random, static analysis
§ jTexPert
– A. Sakti; guided random, static analysis
§ T3
– W. Prasetya; random testing, pair-wise testing, …
WHO were the participants (alphabetical order)
4. § Baseline: Randoop (random testing)
§ Baseline: Manual
– 3 testers (professional tester + researcher + PhD student)
– “Write unit tests for the given classes! Take as much time as you think is necessary”
– Measured: the time to get familiar with each class and the time to write the tests
WHAT were the baselines
5. § Instruction coverage
§ Branch coverage
§ Mutation coverage
§ Time for generation of tests
§ Execution time
§ Preparation time
As in the Round Two report, we defined a benchmark function which assigns to each run of a test tool T a score as the weighted sum over the measured variables (a worked example follows below):

\[
\mathit{score}_T := \sum_{\mathit{class}} \Big( \omega_i \cdot \mathit{cov}_i(\mathit{class}) + \omega_b \cdot \mathit{cov}_b(\mathit{class}) + \omega_m \cdot \mathit{cov}_m(\mathit{class}) \Big) - \omega_t \cdot \Big( t_{\mathit{prep}} + \sum_{\mathit{class}} \big[\, t_{\mathit{gen}}(\mathit{class}) + t_{\mathit{exec}}(\mathit{class}) \,\big] \Big)
\]

where covi, covb, covm refer to the achieved instruction, branch and mutation coverage, and the weights are:
ωi = 1
ωb = 2
ωm = 4
ωt = 1
HOW do we compare them
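As an illustration of the formula, a minimal sketch in Java with hypothetical coverage and timing numbers (not competition data); coverage values are fractions in [0, 1] and times are in arbitrary common units:

```java
// Minimal sketch of the scoring formula with hypothetical numbers
// (not competition data). Weights as defined on this slide.
public class ScoreExample {

    static final double W_I = 1, W_B = 2, W_M = 4, W_T = 1;

    // Coverage reward contributed by one class under test.
    static double coverageTerm(double covI, double covB, double covM) {
        return W_I * covI + W_B * covB + W_M * covM;
    }

    public static void main(String[] args) {
        // Two hypothetical classes under test.
        double reward = coverageTerm(0.90, 0.80, 0.60)   // = 4.9
                      + coverageTerm(0.70, 0.50, 0.40);  // = 3.3

        // Time penalty: one-off preparation plus per-class
        // generation and execution times.
        double tPrep = 1.0;
        double penalty = W_T * (tPrep + (2.0 + 0.5) + (3.0 + 0.5)); // = 7.0

        System.out.println("score = " + (reward - penalty)); // 8.2 - 7.0 ≈ 1.2
    }
}
```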
6. [Figure: Competition execution framework — the BENCHMARKTOOL feeds the CUTs to each contestant (RUN TOOL, for tools T1 … TN); the GENERATED TEST CASES are compiled and executed; performance is measured (M1: instruction/branch coverage with JaCoCo, M2: mutation coverage with PiTest); an AGGREGATOR combines the measurements into the SCORE. A sketch of this loop follows below.]
HOW do we execute them
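A minimal sketch of that loop, assuming hypothetical helper stubs (runTool, compileAndExecute, measure, aggregate) in place of the real framework internals:

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hedged sketch of the execution framework's main loop; every helper
// below is a hypothetical stub. The real framework launches each
// contestant's tool, compiles and runs the generated JUnit tests, and
// measures coverage with JaCoCo and mutation scores with PiTest.
public class ExecutionFrameworkSketch {

    public static void main(String[] args) {
        List<String> cuts = Arrays.asList("org.example.Foo", "org.example.Bar");
        double score = 0;
        for (String cut : cuts) {
            Path tests = runTool(cut);     // RUN TOOL: generate test cases
            compileAndExecute(tests);      // COMPILE + EXECUTE them
            double[] cov = measure(cut);   // MEASURE: JaCoCo + PiTest
            score += aggregate(cov);       // AGGREGATOR: fold into the score
        }
        System.out.println("SCORE = " + score);
    }

    // Hypothetical stubs standing in for the framework internals.
    static Path runTool(String cut) { return Paths.get("temp", "testcases"); }
    static void compileAndExecute(Path tests) { /* javac + JUnit runner */ }
    static double[] measure(String cut) { return new double[] {0.8, 0.6, 0.5}; }
    static double aggregate(double[] cov) { return 1 * cov[0] + 2 * cov[1] + 4 * cov[2]; }
}
```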
7. [Fig. 2: Benchmark Automation Protocol — the benchmark framework talks to a tool's "run tool" wrapper over standard input/output. Preparation: the framework sends "BENCHMARK", the src path / bin path / classpath, and the classpath for JUnit compilation; the wrapper answers "READY". Then, in a loop: the framework sends the name of a CUT, the wrapper generates a test-case file in ./temp/testcases and answers "READY", and the framework compiles, executes and measures the test case. A sketch of such a wrapper follows below.]
HOW to implement RUNTOOL
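A minimal sketch of such a wrapper following the protocol above; generateTests(...) is a hypothetical stand-in for a real tool's test-generation entry point:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hedged sketch of a RUNTOOL wrapper speaking the benchmark protocol
// over stdin/stdout. generateTests(...) is a hypothetical stand-in
// for the actual tool's test-generation entry point.
public class RunToolSketch {

    public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));

        // Preparation: the framework announces the benchmark and the paths.
        String command = in.readLine();        // expected: "BENCHMARK"
        String srcPath = in.readLine();
        String binPath = in.readLine();
        String classPath = in.readLine();
        String junitClassPath = in.readLine(); // classpath for JUnit compilation

        System.out.println("READY");           // preparation done
        System.out.flush();

        // Loop: one CUT name per line until the stream closes.
        String cut;
        while ((cut = in.readLine()) != null) {
            Path testFile = Paths.get("temp", "testcases",
                                      testClassName(cut) + ".java");
            Files.createDirectories(testFile.getParent());
            Files.write(testFile, generateTests(cut, classPath));
            System.out.println("READY");        // test-case file is in place
            System.out.flush();
        }
    }

    static String testClassName(String cut) {
        return cut.replace('.', '_') + "_Test";
    }

    // Hypothetical stand-in: a real tool would analyse the CUT and
    // synthesise JUnit test cases here.
    static byte[] generateTests(String cut, String classPath) {
        return ("// generated tests for " + cut + "\n").getBytes();
    }
}
```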
8. § Same as in the 2nd competition (but nobody knew ;-))
§ Java open source libraries
§ 9 projects (async http client, eclipse checkstyle, gdata client, guava, hibernate, java machine learning library, Java wikipedia library, scribe, twitter4j)
§ Sources: Google Code, GitHub, SourceForge.net
§ 7 classes per project → total of 63 classes
§ Packages with the highest value for the Afferent Coupling metric
– AFC determines the number of classes from other packages that depend on classes in the current package.
– Selects “popular” classes within a project.
§ Classes with the highest Nested Block Depth
– NBD determines the maximal depth of nested statements such as if-else constructs, loops and exception handlers (see the example below).
– Selects complex classes for which it is difficult to achieve high branch coverage.
§ No exclusions: abstract, small, large, file constructors, …
WHAT were the Benchmark Classes
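For illustration, a hypothetical snippet (not one of the benchmark classes) whose deepest nesting is 3 (loop → try → if):

```java
// Hypothetical snippet (not from the benchmark) illustrating
// Nested Block Depth (NBD): the deepest nesting below is 3
// (for -> try -> if), mixing the loops, exception handlers and
// conditionals mentioned on the slide — exactly the structure
// that makes high branch coverage hard to reach.
public class NestedBlockDepthExample {

    static int firstPositiveParse(String[] inputs) {
        for (String s : inputs) {              // depth 1
            try {                              // depth 2
                int v = Integer.parseInt(s);
                if (v > 0) {                   // depth 3
                    return v;
                }
            } catch (NumberFormatException e) {
                // unparsable input: skip it
            }
        }
        return -1;
    }
}
```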
9. 6 runs (to account for the indeterminism caused by the tools and classes)
Results
(do not try to read this… just wanna show that we have done the work)
10. Results per class
(do not try to read this… just wanna show that we have done the work)
11. Results per class
(do not try to read this… just wanna show that we have done the work)
12. And the winner is…
210.45  Manual
203.73  GRT (1)
190.64  EvoSuite (2)
189.22  EvoSuite-MOSA (3)
186.15  T3 (4)
159.16  jTexPert (5)
 93.45  Randoop
 65.50  CT (6)
13. Combined strength

Table IV: Combined strength of the contesting tools
                           tools          tools + humans
Average covi               78.0 %         84.9 %
Average covb               64.7 %         70.1 %
Average covm               60.3 %         69.4 %
# CUTs with covb = 100%    6              7
# CUTs with covb ≥ 80%     31             34
CUTs with covi ≤ 10%       {43,45,49,61}  {45}
CUTs with covi ≤ 5%        {45,61}        {45}
SCORE                      266.7          277.8
14. § More classes
§ Need testers for the manual baseline
– Any volunteers? ;-)
§ More participants!!
§ Score:
– The participants will have a lot to say ;-)
§ Tool library dependencies appearing as CUTs (the known Guava library problems)
Future Editions
15. Contact
§ Tanja E. J. Vos
§ e-mail: tvos@pros.upv.es
§ twitter/skype: tanja_vos
§ web: http://staq.dsic.upv.es/
§ phone: +34 690 917 971