Cross-project defect prediction is very appealing because (i) it allows predicting defects in projects for which little historical data is available, and (ii) it allows producing generalizable prediction models. However, existing research suggests that cross-project prediction is particularly challenging and, due to the heterogeneity of projects, prediction accuracy is often unsatisfactory. This paper proposes a novel, multi-objective approach for cross-project defect prediction, based on a multi-objective logistic regression model built using a genetic algorithm. Instead of providing the software engineer with a single predictive model, the multi-objective approach allows software engineers to choose predictors achieving a compromise between the number of likely defect-prone artifacts (effectiveness) and the LOC to be analyzed/tested (a proxy for the cost of code inspection). Results of an empirical evaluation on 10 datasets from the PROMISE repository indicate that the multi-objective approach is superior to, and more useful than, single-objective predictors. Also, the proposed approach outperforms an alternative cross-project prediction approach based on local prediction over clusters of similar classes.
6. Indicators of Defects
- Cached history information (Kim et al., ICSE 2007)
- Change metrics (Moser et al., ICSE 2008)
- A metrics suite for object-oriented design (Chidamber et al., TSE 1994)
9. Defect Prediction Methodology
[Diagram: within-project prediction; a predicting model trained on a project's training set classifies each test-set class (Class1 ... ClassN) as defect-prone YES/NO.]
Issue: the size of the training set.
10. Defect Prediction Methodology
[Diagram: the within-project setting repeated, extended with models trained on past projects and applied to a new project.]
Issue: the size of the training set.
11. Defect Prediction Methodology
[Diagram: within-project vs. cross-project prediction; in the cross-project setting, a model trained on Project A predicts defect-prone classes in Project B.]
Within-project issue: the size of the training set.
12. Defect Prediction Methodology
[Diagram: within-project vs. cross-project prediction, as in the previous slide.]
Within-project issue: the size of the training set.
Cross-project issue: the prediction accuracy can be lower.
13. Cost Effectiveness
1) Cross-project prediction does not necessarily work worse than within-project prediction
2) Better precision (accuracy) does not imply lower inspection cost
3) Traditional predicting model: logistic regression
"Recalling the 'Imprecision' of Cross-Project Defect Prediction", Rahman et al., FSE 2012
15.-19. Cost Effectiveness: an example
[Diagram: a system with Class A (100 LOC), Class B (10,000 LOC), Class C (100 LOC), and Class D (100 LOC); bug icons mark the classes that actually contain defects.]
- Predicting model 1 flags Class A and Class B: precision = 50%, inspection cost = 10,100 LOC.
- Predicting model 2 flags Class A, Class C, and Class D: precision = 33%, inspection cost = 300 LOC.
- Precision does not mirror the inspection cost.
- All the existing predicting models work on precision, not on cost.
- We need COST-oriented models.
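The arithmetic of the example above can be reproduced in a few lines. The class sizes come from the slides; the exact location of the bugs is an assumption chosen to match the reported precision values (one true positive per model):

```python
# Class sizes in LOC (from the example) and which classes actually
# contain a bug (assumed here for illustration).
loc = {"A": 100, "B": 10_000, "C": 100, "D": 100}
buggy = {"B", "C"}  # assumption: each model flags exactly one buggy class

def precision_and_cost(predicted, loc, buggy):
    """Precision = truly buggy flagged classes / all flagged classes;
    inspection cost = total LOC of the flagged classes."""
    hits = len(set(predicted) & buggy)
    precision = hits / len(predicted)
    cost = sum(loc[c] for c in predicted)
    return precision, cost

p1, c1 = precision_and_cost(["A", "B"], loc, buggy)       # 0.5, 10,100 LOC
p2, c2 = precision_and_cost(["A", "C", "D"], loc, buggy)  # ~0.33, 300 LOC
```

The less precise model is the far cheaper one to act on: inspecting model 2's predictions costs 300 LOC instead of 10,100.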
30. Multi-objective Genetic Algorithm
The model is a logistic regression whose coefficients form the chromosome \((a, b, c, \dots)\):
\[
\mathit{Pred}_i = \frac{e^{a + b\,m_{i1} + c\,m_{i2} + \dots}}{1 + e^{a + b\,m_{i1} + c\,m_{i2} + \dots}}
\]
Fitness function:
\[
\begin{cases}
\max\ \mathit{Effectiveness} = \sum_i \mathit{Pred}_i \cdot \mathit{Actual}_i \\
\min\ \mathit{InspectionCost} = \sum_i \mathit{Pred}_i \cdot \mathit{Cost}_i
\end{cases}
\]
Multiple objectives are optimized using Pareto-efficient approaches.
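A sketch of this bi-objective fitness, assuming the chromosome directly encodes the logistic coefficients (function and variable names are illustrative, not from the paper):

```python
import math

def predict(chromosome, metrics):
    """Logistic model: Pred = e^z / (1 + e^z), z = a + b*m1 + c*m2 + ..."""
    z = chromosome[0] + sum(w * m for w, m in zip(chromosome[1:], metrics))
    return 1.0 / (1.0 + math.exp(-z))  # algebraically equal to e^z/(1+e^z)

def fitness(chromosome, classes):
    """classes: list of (metrics, actual_buggy 0/1, loc).
    Returns (effectiveness to maximize, inspection cost to minimize)."""
    preds = [predict(chromosome, m) for m, _, _ in classes]
    effectiveness = sum(p * actual for p, (_, actual, _) in zip(preds, classes))
    cost = sum(p * loc for p, (_, _, loc) in zip(preds, classes))
    return effectiveness, cost

# Usage: with all coefficients zero, every prediction is 0.5.
classes = [([1.0, 2.0], 1, 100), ([0.0, 0.0], 0, 200)]
eff, cost = fitness((0.0, 0.0, 0.0), classes)
```

A genetic algorithm such as NSGA-II would evolve a population of such chromosomes against these two objectives.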
31. Multi-objective Genetic Algorithm
Pareto Optimality: all solutionsthat are not dominated by anyother solutions form the Paretooptimal set.
Multiple otpimal solutions (models)
can be found
Cost
Effectiveness
The frontier allows to make a
well-informed decision that
balances the trade-offs
between the two objectives
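The Pareto-optimal set described above can be computed with a straightforward non-dominated filter. This is a minimal sketch for the two objectives (minimize cost, maximize effectiveness), with made-up model scores:

```python
def dominates(a, b):
    """a dominates b if a is no worse on both objectives and strictly
    better on at least one (cost is minimized, effectiveness maximized)."""
    cost_a, eff_a = a
    cost_b, eff_b = b
    return (cost_a <= cost_b and eff_a >= eff_b) and \
           (cost_a < cost_b or eff_a > eff_b)

def pareto_front(solutions):
    """All solutions not dominated by any other form the Pareto-optimal set."""
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions if o is not s)]

# Illustrative (cost, effectiveness) scores for candidate models.
models = [(100, 0.9), (300, 0.9), (50, 0.4), (60, 0.4), (200, 0.95)]
front = pareto_front(models)  # (300, 0.9) and (60, 0.4) are dominated
```

The engineer then picks a point on the front matching the inspection budget at hand.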
33.-36. Research Questions
RQ1: How does the multi-objective (MO) prediction perform, compared to single-objective (SO) prediction?
- Cross-project MO vs. cross-project SO vs. within-project SO
RQ2: How does the proposed approach perform, compared to the local prediction approach by Menzies et al.?
- Cross-project MO vs. local prediction
38.-40. Experiment Outline
- 10 Java projects from the PROMISE dataset: different sizes, different application contexts
- Cross-project defect prediction (RQ1): train the model on nine projects and test on the remaining one (10 times)
- Within-project defect prediction (RQ1): 10-fold cross-validation
- Local prediction (RQ2): k-means clustering algorithm, Silhouette coefficient
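The cross-project protocol above amounts to a leave-one-project-out loop. A minimal sketch with placeholder names (the actual PROMISE projects are not listed here):

```python
# Leave-one-project-out: train on nine projects, test on the remaining
# one, repeated once per project (10 runs in total).
projects = [f"project_{i}" for i in range(10)]  # placeholder names

def leave_one_project_out(projects):
    """Yield (training_projects, test_project) pairs, one per project."""
    for test in projects:
        training = [p for p in projects if p != test]
        yield training, test

runs = list(leave_one_project_out(projects))
# 10 runs, each training on 9 projects and testing on the held-out one
```

Within-project prediction instead splits a single project's classes into 10 folds, so the two settings differ only in where the training data comes from.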
43.-44. Cross-project MO vs. Cross-project SO
[Chart: inspection cost (KLOC, 0-300) for cross-project SO vs. cross-project MO across the projects.]
The proposed multi-objective model outperforms the single-objective one.
45.-48. Cross-project MO vs. Within-project SO
[Charts: inspection cost (KLOC, 0-350) and precision (0-100%) for within-project SO vs. cross-project MO.]
Cross-project prediction is worse than within-project prediction in terms of PRECISION, but it is better than within-project predictors in terms of COST-EFFECTIVENESS.
49.-50. Cross-project MO vs. Local Prediction
[Chart: inspection cost (KLOC, 0-300) for local prediction vs. cross-project MO.]
The multi-objective predictor outperforms the local predictor.