chakkrit.tantithamthavorn@monash.edu
@klainfo
http://chakkrit.com
Communicated by Sunghun Kim

The Impact of Class Rebalancing Techniques on the Performance and Interpretation of Defect Models
Chakkrit (Kla) Tantithamthavorn, Ahmed Hassan, Kenichi Matsumoto
DEFECT MODELS IN A NUTSHELL
An analytical model trained on historical data to predict and explain future software defects.
[Figure: a defect dataset of files (A.java, B.java, C.java, D.java, ...) described by class metrics and labelled BUG or CLEAN is used to train analytical models that (1) predict future software defects and (2) explain which factors are associated with defect-proneness.]
Defect models in prior studies: Lewis et al., ICSE’13; Mockus et al., BLTJ’00; Ostrand et al., TSE’05; Kim et al., FSE’15; Zimmermann et al., FSE’09; Nagappan et al., ICSE’06; Caglayan et al., ICSE’15; Tan et al., ICSE’15; Shimagaki et al., ICSE’16.
DEFECT DATASETS ARE IMBALANCED!
Defective and clean modules are not equally represented.
[Figure: the same defect dataset of files (A.java, B.java, C.java, D.java, ...), where most files are labelled CLEAN and few BUG, used to train models that predict future software defects and explain which factors are associated with defect-proneness.]
Traditional classification techniques often fail to accurately identify the minority class (i.e., defective modules).
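To see why imbalance is a problem, the following minimal Python sketch (hypothetical data, not from the study; function names are illustrative) computes a dataset's defective ratio and scores a trivial model that always predicts the majority class. It looks accurate while finding no defective modules at all.

```python
def defective_ratio(labels):
    """Share of modules labelled defective (1) among 0/1 labels."""
    return sum(labels) / len(labels)


def majority_baseline(labels):
    """Score a trivial model that predicts "clean" (0) for every module.

    Assumes clean modules are the majority and at least one module is defective.
    """
    predictions = [0] * len(labels)
    accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
    true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
    recall = true_positives / sum(labels)
    return accuracy, recall


# Hypothetical dataset: 10 defective modules out of 100.
labels = [1] * 10 + [0] * 90
print(defective_ratio(labels))    # 0.1
print(majority_baseline(labels))  # (0.9, 0.0)
```

A 90% accuracy with 0% recall is exactly the failure mode described above: the minority class is never identified.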
HOW IMBALANCED ARE DEFECT DATASETS?
We assess 101 publicly available defect datasets:
• 76 from PROMISE
• 12 from NASA
• 5 from Kim et al.
• 5 from D’Ambros et al.
• 3 from Zimmermann et al.
[Figure: histogram of the defective ratios of the 101 defect datasets; x-axis: defective ratio (0-100%), y-axis: percentage of datasets (0-35%).]
- 64% of the defect datasets have a defective ratio below 30%.
- As few as 8% of the defect datasets have a defective ratio between 45% and 55%.
Class imbalance is prominent in defect datasets and is likely to affect the performance and interpretation of defect models.
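The histogram and the two summary figures can be reproduced from a list of per-dataset defective ratios. A minimal sketch, assuming ratios are given in percent; the five ratios below are hypothetical and are not the 101 studied datasets.

```python
def histogram(ratios, bin_width=10):
    """Count datasets per defective-ratio bin: [0, 10), [10, 20), ..., [90, 100]."""
    counts = [0] * (100 // bin_width)
    for r in ratios:
        # A ratio of exactly 100% falls into the last bin.
        index = min(int(r // bin_width), len(counts) - 1)
        counts[index] += 1
    return counts


def summarize(ratios):
    """Return the shares of datasets below 30% and within 45%-55% defective."""
    n = len(ratios)
    below_30 = sum(r < 30 for r in ratios) / n
    near_balanced = sum(45 <= r <= 55 for r in ratios) / n
    return below_30, near_balanced


# Hypothetical defective ratios (%) for five datasets.
ratios = [5.0, 12.5, 22.0, 48.0, 70.0]
print(histogram(ratios))  # [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
print(summarize(ratios))  # (0.6, 0.2)
```

Multiplying the counts by 100 / len(ratios) gives the "Percentage" scale used on the slide's y-axis.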
TO MITIGATE THE RISK OF CLASS IMBALANCE
Class rebalancing techniques (i.e., techniques that rebalance the proportion of defective and clean modules in the training corpus) are often applied.
[Figure: four class rebalancing techniques, each turning an original dataset (majority class vs. minority class) into a re-sampled dataset:
- Over-Sampling Technique: duplicates minority-class instances
- Under-Sampling Technique: removes majority-class instances
- SMOTE Technique: adds synthetic minority-class instances
- ROSE Technique: generates synthetic instances via a smoothed bootstrap]
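Random over- and under-sampling are simple enough to sketch with the standard library. This is an illustrative sketch, not the implementation used in the study; the function names and the tiny datasets are hypothetical, and it assumes the majority class is at least as large as the minority class.

```python
import random


def random_under_sample(majority, minority, rng):
    """Keep all minority rows and a random subset of majority rows of equal size."""
    return rng.sample(majority, len(minority)), list(minority)


def random_over_sample(majority, minority, rng):
    """Keep all rows and add random copies of minority rows until both classes match."""
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(majority), list(minority) + extra


rng = random.Random(42)
clean = [("clean", i) for i in range(8)]    # hypothetical majority class
defective = [("bug", i) for i in range(2)]  # hypothetical minority class

under_clean, under_defective = random_under_sample(clean, defective, rng)
over_clean, over_defective = random_over_sample(clean, defective, rng)
print(len(under_clean), len(under_defective))  # 2 2
print(len(over_clean), len(over_defective))    # 8 8
```

Under-sampling discards information from the majority class, while over-sampling only repeats existing minority rows; SMOTE and ROSE instead create new synthetic instances.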
SHOULD WE REBALANCE OR NOT?
Prior studies arrive at contradictory conclusions, which makes it hard to derive practical guidelines. Class rebalancing techniques:
- Improve the F-measure by 7.8%-22.4% [Kamei et al.; 4 classification techniques, 2 datasets, 3 measures]
- Do not improve the percentage of correctly classified modules (i.e., accuracy) [Riquelme et al.; 2 classification techniques, 4 datasets, 2 measures]
- Are not harmful when the defective ratio is above 20% [Mahmood et al.; a meta-analysis of 42 primary defect prediction studies]
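Part of the disagreement may simply come from the measures each study uses. A minimal sketch with a hypothetical confusion matrix (numbers invented for illustration, not taken from any of the cited studies) shows how one rebalanced model can gain F-measure while losing accuracy:

```python
def measures(tp, fp, fn, tn):
    """Precision, recall, F-measure and accuracy from a confusion matrix,
    treating defective modules as the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f_measure, accuracy


# Hypothetical results on 100 modules (10 defective), before and after rebalancing.
before = measures(tp=2, fp=1, fn=8, tn=89)
after = measures(tp=7, fp=12, fn=3, tn=78)
print([round(m, 3) for m in before])  # [0.667, 0.2, 0.308, 0.91]
print([round(m, 3) for m in after])   # [0.368, 0.7, 0.483, 0.85]
```

Here F-measure rises (0.308 to 0.483) while accuracy falls (0.91 to 0.85), so the same model could be judged an improvement or not depending on the measure.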
SHOULD WE REBALANCE OR NOT?
Class rebalancing techniques may lead to bias in the learned concepts (i.e., concept drift).
B. Turhan, “On the dataset shift problem in software engineering prediction models,” EMSE’11.
[Figure: World → Data → Model → Knowledge → Decision/Policy Making. After rebalancing, the data is not representative of the world, so the learned model may be biased, yielding different knowledge and incorrect action plans.]
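One concrete mechanism behind this bias is a textbook prior-shift result (not derived on the slides): if random under-sampling keeps only a fraction β of the clean modules, a calibrated model's estimated defect probability becomes p_s = p / (p + β(1 − p)). The sketch below assumes sampling is independent of the metrics; the function names are hypothetical.

```python
def undersampled_probability(p, beta):
    """Probability of "defective" a calibrated model estimates after random
    under-sampling keeps only a fraction `beta` of the clean modules.

    `p` is the probability on the original (imbalanced) data.
    """
    return p / (p + beta * (1 - p))


def corrected_probability(p_s, beta):
    """Undo the shift to recover the probability on the original data."""
    return beta * p_s / (beta * p_s - p_s + 1)


p_s = undersampled_probability(0.1, beta=0.2)
print(round(p_s, 3))                              # 0.357
print(round(corrected_probability(p_s, 0.2), 3))  # 0.1
```

A module with a 10% defect risk on the original data appears to have about a 36% risk after rebalancing, which illustrates how a model trained on non-representative data can convey different knowledge.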
TYPES OF ANALYSIS: PERFORMANCE and INTERPRETATION
WHAT IS THE IMPACT OF CLASS REBALANCING TECHNIQUES?
CASE STUDY SETUP
Study              #Classification techniques   #Datasets   Measures
Kamei et al.       4                            2           P, R, and F1
Riquelme et al.    2                            4           AUC
Wang et al.        2                            5           PD, PF, Balance, G-mean, AUC
Tan et al.         7                            7           P, R, and F1
Agrawal et al.     6                            9           P, R, PF, AUC
Bennin et al.      5                            40          P, R, AUC, Balance, G-mean
Our study          7                            101         10 performance measures
(P = precision, R = recall, F1 = F-measure, PD = probability of detection, PF = probability of false alarm)
Findings (PERFORMANCE): class rebalancing techniques
- have little impact on AUC,
- improve recall, and
- decrease precision.
Findings (INTERPRETATION): unfortunately, class rebalancing techniques have a large impact on the model interpretation.
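A simplified illustration of the performance findings, assuming rebalancing only shifts every score monotonically (e.g., adding a constant to the log-odds). Real rebalancing also changes the fitted model, so this is a sketch of the intuition, not the study's analysis. AUC depends only on the ranking of modules, so it is unchanged; precision and recall at a fixed 0.5 threshold move in opposite directions. Scores and names are hypothetical.

```python
import math


def auc(scores, labels):
    """Rank-based AUC: share of (defective, clean) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_recall(scores, labels, threshold=0.5):
    """Precision and recall when modules scoring >= threshold are flagged defective."""
    flagged = [s >= threshold for s in scores]
    tp = sum(f and y == 1 for f, y in zip(flagged, labels))
    fp = sum(f and y == 0 for f, y in zip(flagged, labels))
    fn = sum((not f) and y == 1 for f, y in zip(flagged, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def shift_log_odds(p, delta):
    """Add a constant to the log-odds of a score (a monotone transformation)."""
    z = math.log(p / (1 - p)) + delta
    return 1 / (1 + math.exp(-z))


# Hypothetical scores for 3 defective and 7 clean modules.
labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.45, 0.30, 0.60, 0.18, 0.35, 0.10, 0.05, 0.40, 0.15, 0.25]
shifted = [shift_log_odds(s, math.log(4)) for s in scores]

print(round(auc(scores, labels), 3), round(auc(shifted, labels), 3))  # 0.905 0.905
print(precision_recall(scores, labels))   # (1.0, 0.3333333333333333)
print(precision_recall(shifted, labels))  # (0.5, 1.0)
```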
WHICH EXPERIMENTAL SETTINGS YIELD THE BEST BENEFITS?
Factors analysed for their effect on performance: defective ratio, classification techniques, class rebalancing techniques, metrics family, and the risk of overfitting (events per variable, EPV).
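EPV is commonly computed as the number of events (modules in the less frequent outcome class, usually the defective ones) divided by the number of predictor variables. A minimal sketch with hypothetical numbers; the function name is illustrative.

```python
def events_per_variable(labels, n_variables):
    """EPV: size of the less frequent outcome class divided by the number of
    predictor variables (metrics) used to fit the model."""
    n_defective = sum(labels)
    events = min(n_defective, len(labels) - n_defective)
    return events / n_variables


# Hypothetical dataset: 1,000 modules, 200 defective, 5 metrics.
labels = [1] * 200 + [0] * 800
print(events_per_variable(labels, n_variables=5))  # 40.0

# Randomly under-sampling the clean modules to 200 leaves the events unchanged.
balanced = [1] * 200 + [0] * 200
print(events_per_variable(balanced, n_variables=5))  # 40.0
```

A low EPV signals a higher risk of overfitting, because each estimated coefficient is supported by few events.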
Findings:
- Best setting: logistic regression models with under-sampling, applied to defect datasets with an EPV ratio higher than 40.
- Neural networks are the most sensitive, and Naive Bayes the least sensitive, classification technique to class rebalancing techniques.
- SMOTETUNED [Agrawal and Menzies, ICSE'18]: the SMOTE parameters must be optimized to improve AUC; this works best with NNet, GBM, RF, and C5.0.
- However, SMOTETUNED still has a large impact on the model interpretation.
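To make "SMOTE parameters" concrete, here is a minimal SMOTE-style sketch with the neighbourhood size k exposed. It is not the SMOTETUNED procedure nor any library's implementation; it only shows the interpolation step whose parameters (such as k and the number of synthetic samples) a tuner would search over. The data and names are hypothetical.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Minimal SMOTE-style over-sampling of numeric minority-class vectors.

    Each synthetic vector lies on the line segment between a randomly chosen
    minority vector and one of its k nearest minority neighbours.
    Requires at least two minority vectors.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = minority[:i] + minority[i + 1:]
        # k nearest neighbours by Euclidean distance (math.dist, Python 3.8+).
        neighbours = sorted(others, key=lambda x: math.dist(base, x))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical defective modules described by two normalized metrics.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority, n_synthetic=6, k=2, seed=1)
print(len(new_points))  # 6
```

With k=2 every new point lies on an edge of the unit square; with k=3 points along the diagonals also become possible. Because k changes where synthetic data appears, it is a natural candidate for tuning.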
TAKEAWAY
For predictions:
- Use optimized SMOTE (e.g., SMOTETUNED) to improve AUC.
- Use under-sampling to improve recall.
For interpretations:
- Do not apply class rebalancing techniques.
Dr. Chakkrit (Kla) Tantithamthavorn

Weitere ähnliche Inhalte

Was ist angesagt?

Leveraging HPC Resources to Improve the Experimental Design of Software Analy...
Leveraging HPC Resources to Improve the Experimental Design of Software Analy...Leveraging HPC Resources to Improve the Experimental Design of Software Analy...
Leveraging HPC Resources to Improve the Experimental Design of Software Analy...Chakkrit (Kla) Tantithamthavorn
 
Feature Selection Techniques for Software Fault Prediction (Summary)
Feature Selection Techniques for Software Fault Prediction (Summary)Feature Selection Techniques for Software Fault Prediction (Summary)
Feature Selection Techniques for Software Fault Prediction (Summary)SungdoGu
 
The adoption of machine learning techniques for software defect prediction: A...
The adoption of machine learning techniques for software defect prediction: A...The adoption of machine learning techniques for software defect prediction: A...
The adoption of machine learning techniques for software defect prediction: A...RAKESH RANA
 
A software fault localization technique based on program mutations
A software fault localization technique based on program mutationsA software fault localization technique based on program mutations
A software fault localization technique based on program mutationsTao He
 
A Review on Parameter Estimation Techniques of Software Reliability Growth Mo...
A Review on Parameter Estimation Techniques of Software Reliability Growth Mo...A Review on Parameter Estimation Techniques of Software Reliability Growth Mo...
A Review on Parameter Estimation Techniques of Software Reliability Growth Mo...Editor IJCATR
 
Experiments on Design Pattern Discovery
Experiments on Design Pattern DiscoveryExperiments on Design Pattern Discovery
Experiments on Design Pattern DiscoveryTim Menzies
 
Software testing effort estimation with cobb douglas function a practical app...
Software testing effort estimation with cobb douglas function a practical app...Software testing effort estimation with cobb douglas function a practical app...
Software testing effort estimation with cobb douglas function a practical app...eSAT Publishing House
 
Complexity Measures for Secure Service-Orieted Software Architectures
Complexity Measures for Secure Service-Orieted Software ArchitecturesComplexity Measures for Secure Service-Orieted Software Architectures
Complexity Measures for Secure Service-Orieted Software ArchitecturesTim Menzies
 
Speeding-up Software Testing With Computational Intelligence
Speeding-up Software Testing With Computational IntelligenceSpeeding-up Software Testing With Computational Intelligence
Speeding-up Software Testing With Computational IntelligenceAnnibale Panichella
 
AI in SE: A 25-year Journey
AI in SE: A 25-year JourneyAI in SE: A 25-year Journey
AI in SE: A 25-year JourneyLionel Briand
 
Software testing defect prediction model a practical approach
Software testing defect prediction model   a practical approachSoftware testing defect prediction model   a practical approach
Software testing defect prediction model a practical approacheSAT Journals
 
SSBSE 2020 keynote
SSBSE 2020 keynoteSSBSE 2020 keynote
SSBSE 2020 keynoteShiva Nejati
 
Bug Triage: An Automated Process
Bug Triage: An Automated ProcessBug Triage: An Automated Process
Bug Triage: An Automated ProcessIRJET Journal
 
ESTIMATING HANDLING TIME OF SOFTWARE DEFECTS
ESTIMATING HANDLING TIME OF SOFTWARE DEFECTSESTIMATING HANDLING TIME OF SOFTWARE DEFECTS
ESTIMATING HANDLING TIME OF SOFTWARE DEFECTScsandit
 
SBST 2019 Keynote
SBST 2019 Keynote SBST 2019 Keynote
SBST 2019 Keynote Shiva Nejati
 
Practical Guidelines to Improve Defect Prediction Model – A Review
Practical Guidelines to Improve Defect Prediction Model – A ReviewPractical Guidelines to Improve Defect Prediction Model – A Review
Practical Guidelines to Improve Defect Prediction Model – A Reviewinventionjournals
 
Using Developer Information as a Prediction Factor
Using Developer Information as a Prediction FactorUsing Developer Information as a Prediction Factor
Using Developer Information as a Prediction FactorTim Menzies
 
Final Exam Questions Fall03
Final Exam Questions Fall03Final Exam Questions Fall03
Final Exam Questions Fall03Radu_Negulescu
 
Model-Driven Run-Time Enforcement of Complex Role-Based Access Control Policies
Model-Driven Run-Time Enforcement of Complex Role-Based Access Control PoliciesModel-Driven Run-Time Enforcement of Complex Role-Based Access Control Policies
Model-Driven Run-Time Enforcement of Complex Role-Based Access Control PoliciesLionel Briand
 

Was ist angesagt? (20)

Leveraging HPC Resources to Improve the Experimental Design of Software Analy...
Leveraging HPC Resources to Improve the Experimental Design of Software Analy...Leveraging HPC Resources to Improve the Experimental Design of Software Analy...
Leveraging HPC Resources to Improve the Experimental Design of Software Analy...
 
Feature Selection Techniques for Software Fault Prediction (Summary)
Feature Selection Techniques for Software Fault Prediction (Summary)Feature Selection Techniques for Software Fault Prediction (Summary)
Feature Selection Techniques for Software Fault Prediction (Summary)
 
The adoption of machine learning techniques for software defect prediction: A...
The adoption of machine learning techniques for software defect prediction: A...The adoption of machine learning techniques for software defect prediction: A...
The adoption of machine learning techniques for software defect prediction: A...
 
A software fault localization technique based on program mutations
A software fault localization technique based on program mutationsA software fault localization technique based on program mutations
A software fault localization technique based on program mutations
 
A Review on Parameter Estimation Techniques of Software Reliability Growth Mo...
A Review on Parameter Estimation Techniques of Software Reliability Growth Mo...A Review on Parameter Estimation Techniques of Software Reliability Growth Mo...
A Review on Parameter Estimation Techniques of Software Reliability Growth Mo...
 
Experiments on Design Pattern Discovery
Experiments on Design Pattern DiscoveryExperiments on Design Pattern Discovery
Experiments on Design Pattern Discovery
 
Software testing effort estimation with cobb douglas function a practical app...
Software testing effort estimation with cobb douglas function a practical app...Software testing effort estimation with cobb douglas function a practical app...
Software testing effort estimation with cobb douglas function a practical app...
 
[Tho Quan] Fault Localization - Where is the root cause of a bug?
[Tho Quan] Fault Localization - Where is the root cause of a bug?[Tho Quan] Fault Localization - Where is the root cause of a bug?
[Tho Quan] Fault Localization - Where is the root cause of a bug?
 
Complexity Measures for Secure Service-Orieted Software Architectures
Complexity Measures for Secure Service-Orieted Software ArchitecturesComplexity Measures for Secure Service-Orieted Software Architectures
Complexity Measures for Secure Service-Orieted Software Architectures
 
Speeding-up Software Testing With Computational Intelligence
Speeding-up Software Testing With Computational IntelligenceSpeeding-up Software Testing With Computational Intelligence
Speeding-up Software Testing With Computational Intelligence
 
AI in SE: A 25-year Journey
AI in SE: A 25-year JourneyAI in SE: A 25-year Journey
AI in SE: A 25-year Journey
 
Software testing defect prediction model a practical approach
Software testing defect prediction model   a practical approachSoftware testing defect prediction model   a practical approach
Software testing defect prediction model a practical approach
 
SSBSE 2020 keynote
SSBSE 2020 keynoteSSBSE 2020 keynote
SSBSE 2020 keynote
 
Bug Triage: An Automated Process
Bug Triage: An Automated ProcessBug Triage: An Automated Process
Bug Triage: An Automated Process
 
ESTIMATING HANDLING TIME OF SOFTWARE DEFECTS
ESTIMATING HANDLING TIME OF SOFTWARE DEFECTSESTIMATING HANDLING TIME OF SOFTWARE DEFECTS
ESTIMATING HANDLING TIME OF SOFTWARE DEFECTS
 
SBST 2019 Keynote
SBST 2019 Keynote SBST 2019 Keynote
SBST 2019 Keynote
 
Practical Guidelines to Improve Defect Prediction Model – A Review
Practical Guidelines to Improve Defect Prediction Model – A ReviewPractical Guidelines to Improve Defect Prediction Model – A Review
Practical Guidelines to Improve Defect Prediction Model – A Review
 
Using Developer Information as a Prediction Factor
Using Developer Information as a Prediction FactorUsing Developer Information as a Prediction Factor
Using Developer Information as a Prediction Factor
 
Final Exam Questions Fall03
Final Exam Questions Fall03Final Exam Questions Fall03
Final Exam Questions Fall03
 
Model-Driven Run-Time Enforcement of Complex Role-Based Access Control Policies
Model-Driven Run-Time Enforcement of Complex Role-Based Access Control PoliciesModel-Driven Run-Time Enforcement of Complex Role-Based Access Control Policies
Model-Driven Run-Time Enforcement of Complex Role-Based Access Control Policies
 

Ähnlich wie The Impact of Class Rebalancing Techniques on the Performance and Interpretation of Defect Prediction Models

AMS_Aviation_2014_Ali
AMS_Aviation_2014_AliAMS_Aviation_2014_Ali
AMS_Aviation_2014_AliMDO_Lab
 
Sarcia idoese08
Sarcia idoese08Sarcia idoese08
Sarcia idoese08asarcia
 
Six sigma statistics
Six sigma statisticsSix sigma statistics
Six sigma statisticsShankaran Rd
 
Bayesian Approaches To Improve Sample Size Webinar
Bayesian Approaches To Improve Sample Size WebinarBayesian Approaches To Improve Sample Size Webinar
Bayesian Approaches To Improve Sample Size WebinarnQuery
 
Robust Fault-Tolerant Training Strategy Using Neural Network to Perform Funct...
Robust Fault-Tolerant Training Strategy Using Neural Network to Perform Funct...Robust Fault-Tolerant Training Strategy Using Neural Network to Perform Funct...
Robust Fault-Tolerant Training Strategy Using Neural Network to Perform Funct...Eswar Publications
 
Ghotra icse
Ghotra icseGhotra icse
Ghotra icseSAIL_QU
 
Elane - Promise08
Elane - Promise08Elane - Promise08
Elane - Promise08gregoryg
 
Top 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner PitfallsTop 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner PitfallsSri Ambati
 
AIAA-Aviation-VariableFidelity-2014-Mehmani
AIAA-Aviation-VariableFidelity-2014-MehmaniAIAA-Aviation-VariableFidelity-2014-Mehmani
AIAA-Aviation-VariableFidelity-2014-MehmaniOptiModel
 
Racing for unbalanced methods selection
Racing for unbalanced methods selectionRacing for unbalanced methods selection
Racing for unbalanced methods selectionAndrea Dal Pozzolo
 
Specification based or black box techniques
Specification based or black box techniquesSpecification based or black box techniques
Specification based or black box techniquesM HiDayat
 
Probability density estimation using Product of Conditional Experts
Probability density estimation using Product of Conditional ExpertsProbability density estimation using Product of Conditional Experts
Probability density estimation using Product of Conditional ExpertsChirag Gupta
 
Flavours of Physics Challenge: Transfer Learning approach
Flavours of Physics Challenge: Transfer Learning approachFlavours of Physics Challenge: Transfer Learning approach
Flavours of Physics Challenge: Transfer Learning approachAlexander Rakhlin
 
PEMF2_SDM_2012_Ali
PEMF2_SDM_2012_AliPEMF2_SDM_2012_Ali
PEMF2_SDM_2012_AliMDO_Lab
 
Configuration Navigation Analysis Model for Regression Test Case Prioritization
Configuration Navigation Analysis Model for Regression Test Case PrioritizationConfiguration Navigation Analysis Model for Regression Test Case Prioritization
Configuration Navigation Analysis Model for Regression Test Case Prioritizationijsrd.com
 

Ähnlich wie The Impact of Class Rebalancing Techniques on the Performance and Interpretation of Defect Prediction Models (20)

Ijetr021251
Ijetr021251Ijetr021251
Ijetr021251
 
AMS_Aviation_2014_Ali
AMS_Aviation_2014_AliAMS_Aviation_2014_Ali
AMS_Aviation_2014_Ali
 
Sarcia idoese08
Sarcia idoese08Sarcia idoese08
Sarcia idoese08
 
Six sigma statistics
Six sigma statisticsSix sigma statistics
Six sigma statistics
 
Bayesian Approaches To Improve Sample Size Webinar
Bayesian Approaches To Improve Sample Size WebinarBayesian Approaches To Improve Sample Size Webinar
Bayesian Approaches To Improve Sample Size Webinar
 
Robust Fault-Tolerant Training Strategy Using Neural Network to Perform Funct...
Robust Fault-Tolerant Training Strategy Using Neural Network to Perform Funct...Robust Fault-Tolerant Training Strategy Using Neural Network to Perform Funct...
Robust Fault-Tolerant Training Strategy Using Neural Network to Perform Funct...
 
Ghotra icse
Ghotra icseGhotra icse
Ghotra icse
 
Elane - Promise08
Elane - Promise08Elane - Promise08
Elane - Promise08
 
Top 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner PitfallsTop 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner Pitfalls
 
AIAA-Aviation-VariableFidelity-2014-Mehmani
AIAA-Aviation-VariableFidelity-2014-MehmaniAIAA-Aviation-VariableFidelity-2014-Mehmani
AIAA-Aviation-VariableFidelity-2014-Mehmani
 
Racing for unbalanced methods selection
Racing for unbalanced methods selectionRacing for unbalanced methods selection
Racing for unbalanced methods selection
 
Specification based or black box techniques
Specification based or black box techniquesSpecification based or black box techniques
Specification based or black box techniques
 
Probability density estimation using Product of Conditional Experts
Probability density estimation using Product of Conditional ExpertsProbability density estimation using Product of Conditional Experts
Probability density estimation using Product of Conditional Experts
 
Flavours of Physics Challenge: Transfer Learning approach
Flavours of Physics Challenge: Transfer Learning approachFlavours of Physics Challenge: Transfer Learning approach
Flavours of Physics Challenge: Transfer Learning approach
 
Test for AI model
Test for AI modelTest for AI model
Test for AI model
 
Analyzing Performance Test Data
Analyzing Performance Test DataAnalyzing Performance Test Data
Analyzing Performance Test Data
 
Supervised algorithms
Supervised algorithmsSupervised algorithms
Supervised algorithms
 
Simulation
SimulationSimulation
Simulation
 
PEMF2_SDM_2012_Ali
PEMF2_SDM_2012_AliPEMF2_SDM_2012_Ali
PEMF2_SDM_2012_Ali
 
Configuration Navigation Analysis Model for Regression Test Case Prioritization
Configuration Navigation Analysis Model for Regression Test Case PrioritizationConfiguration Navigation Analysis Model for Regression Test Case Prioritization
Configuration Navigation Analysis Model for Regression Test Case Prioritization
 

Kürzlich hochgeladen

Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelBoston Institute of Analytics
 
Statistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfStatistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfnikeshsingh56
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...Dr Arash Najmaei ( Phd., MBA, BSc)
 
Non Text Magic Studio Magic Design for Presentations L&P.pdf
Non Text Magic Studio Magic Design for Presentations L&P.pdfNon Text Magic Studio Magic Design for Presentations L&P.pdf
Non Text Magic Studio Magic Design for Presentations L&P.pdfPratikPatil591646
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...Jack Cole
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksdeepakthakur548787
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfblazblazml
 
Digital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfDigital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfNicoChristianSunaryo
 
Presentation of project of business person who are success
Presentation of project of business person who are successPresentation of project of business person who are success
Presentation of project of business person who are successPratikSingh115843
 
Role of Consumer Insights in business transformation
Role of Consumer Insights in business transformationRole of Consumer Insights in business transformation
Role of Consumer Insights in business transformationAnnie Melnic
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaManalVerma4
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBoston Institute of Analytics
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Boston Institute of Analytics
 
DATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etcDATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etclalithasri22
 

Kürzlich hochgeladen (17)

Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
 
Statistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfStatistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdf
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 

The Impact of Class Rebalancing Techniques on the Performance and Interpretation of Defect Prediction Models

  • 1. The Impact of Class Rebalancing Techniques on the Performance and Interpretation of Defect Models
   Chakkrit (Kla) Tantithamthavorn, Ahmed Hassan, Kenichi Matsumoto
   Communicated by Sunghun Kim
   Contact: chakkrit.tantithamthavorn@monash.edu, @klainfo, http://chakkrit.com
  • 2. DEFECT MODELS IN A NUTSHELL
   An analytical model trained on historical data to predict and explain future software defects.
   [Diagram: a defect dataset, in which each file (A.java, B.java, C.java, D.java) has its metrics and a class label (BUG or CLEAN), is used to train an analytical model.]
   Defect models are used to predict future software defects and to explain which factors are associated with defect-proneness. Example studies: Lewis et al., ICSE’13; Mockus et al., BLTJ’00; Ostrand et al., TSE’05; Kim et al., FSE’15; Zimmermann et al., FSE’09; Nagappan et al., ICSE’06; Caglayan et al., ICSE’15; Tan et al., ICSE’15; Shimagaki et al., ICSE’16.
  • 3. DEFECT DATASETS ARE IMBALANCED!
   Defective and clean modules are not equally represented, and traditional classification techniques often fail to accurately identify the minority class (i.e., defective modules).
   [Diagram: the same defect dataset, now with more files labelled CLEAN than BUG, feeding the analytical model.]
  • 4-8. HOW IMBALANCED ARE DEFECT DATASETS?
   We assess 101 publicly-available defect datasets:
   - 76 from PROMISE
   - 12 from NASA
   - 5 from Kim et al.
   - 5 from D’Ambros et al.
   - 3 from Zimmermann et al.
   [Figure: histogram of the defective ratios of the 101 defect datasets; x-axis: defective ratio (%), y-axis: percentage of datasets.]
   64% of the defect datasets have a defective ratio below 30%, and only 8% have a defective ratio between 45% and 55%.
   Class imbalance is prominent in defect datasets and is likely to affect both the performance and the interpretation of defect models.
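The two headline numbers on these slides are simple proportions over per-dataset defective ratios. The sketch below is a minimal illustration, assuming each dataset is given as a list of boolean labels (True = defective); the function names and the four toy datasets are hypothetical and are not taken from the study. The range check is half-open (low <= r < high), which is one reasonable reading of "between 45% and 55%".

```python
from typing import Sequence


def defective_ratio(labels: Sequence[bool]) -> float:
    """Percentage of modules in one dataset that are labelled defective (True)."""
    if not labels:
        raise ValueError("a dataset needs at least one module")
    return 100.0 * sum(labels) / len(labels)


def share_in_range(ratios: Sequence[float], low: float, high: float) -> float:
    """Percentage of datasets whose defective ratio r satisfies low <= r < high."""
    return 100.0 * sum(low <= r < high for r in ratios) / len(ratios)


# Hypothetical toy corpus: four datasets of ten modules each.
datasets = [
    [True] * 1 + [False] * 9,  # 10% defective
    [True] * 2 + [False] * 8,  # 20% defective
    [True] * 3 + [False] * 7,  # 30% defective
    [True] * 5 + [False] * 5,  # 50% defective
]
ratios = [defective_ratio(d) for d in datasets]
print(ratios)                          # [10.0, 20.0, 30.0, 50.0]
print(share_in_range(ratios, 0, 30))   # 50.0 (datasets below 30%)
print(share_in_range(ratios, 45, 55))  # 25.0 (roughly balanced datasets)
```

Applied to the 101 real datasets, the same two calls would yield the 64% and 8% figures reported on the slides.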
  • 9. TO MITIGATE THE RISK OF CLASS IMBALANCE
   Class rebalancing techniques (i.e., techniques that rebalance the proportion of defective and clean modules in the training corpus) are often applied:
   - Over-sampling technique: randomly duplicates minority-class (defective) modules.
   - Under-sampling technique: randomly removes majority-class (clean) modules.
   - SMOTE technique: adds synthetic minority-class modules.
   - ROSE technique: generates a re-sampled dataset of synthetic modules using a smoothed bootstrap.
   [Diagrams: for each technique, the original dataset (majority class A, minority class B) and the re-sampled dataset; for SMOTE, the added synthetic minority-class modules.]
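The three simplest techniques on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used in the study: it assumes numeric metric rows, labels 1 (defective, assumed to be the minority) and 0 (clean), and it re-samples until both classes are the same size, which is one common setting. The helper names and the toy data are hypothetical. The SMOTE part follows the usual recipe: pick a defective row, pick one of its k nearest defective neighbours, and create a new row at a random point on the segment between them. ROSE is not sketched here.

```python
import math
import random
from typing import List, Tuple

Row = List[float]
Data = Tuple[List[Row], List[int]]


def split_by_class(X: List[Row], y: List[int]) -> Tuple[List[Row], List[Row]]:
    """Return (minority, majority) rows; assumes label 1 (defective) is the minority."""
    minority = [x for x, label in zip(X, y) if label == 1]
    majority = [x for x, label in zip(X, y) if label == 0]
    if len(minority) > len(majority):
        raise ValueError("this sketch assumes the defective class (1) is the minority")
    return minority, majority


def random_over_sample(X: List[Row], y: List[int], rng: random.Random) -> Data:
    """Duplicate randomly chosen defective rows until both classes are the same size."""
    minority, majority = split_by_class(X, y)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra, [0] * len(majority) + [1] * (len(minority) + len(extra))


def random_under_sample(X: List[Row], y: List[int], rng: random.Random) -> Data:
    """Keep a random subset of clean rows that is as large as the defective class."""
    minority, majority = split_by_class(X, y)
    kept = rng.sample(majority, len(minority))
    return kept + minority, [0] * len(kept) + [1] * len(minority)


def smote(X: List[Row], y: List[int], k: int, rng: random.Random) -> Data:
    """Add synthetic defective rows until balanced; each lies on the segment between
    a random defective row and one of its k nearest defective neighbours (Euclidean)."""
    minority, majority = split_by_class(X, y)
    if len(minority) < 2 or k < 1:
        raise ValueError("SMOTE needs at least two minority rows and k >= 1")
    synthetic: List[Row] = []
    for _ in range(len(majority) - len(minority)):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(base, m))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return majority + minority + synthetic, [0] * len(majority) + [1] * (len(minority) + len(synthetic))


# Hypothetical toy data: 3 defective and 7 clean modules, 2 metrics each.
X = [[float(i), float(i)] for i in range(10)]
y = [1, 1, 1] + [0] * 7
rng = random.Random(0)
for name, (X_new, y_new) in [
    ("over-sampling", random_over_sample(X, y, rng)),
    ("under-sampling", random_under_sample(X, y, rng)),
    ("SMOTE", smote(X, y, k=2, rng=rng)),
]:
    print(name, len(y_new), sum(y_new))
```

Over-sampling and SMOTE grow the training set to 14 modules (7 defective), while under-sampling shrinks it to 6 (3 defective). In every case only the training data should be re-sampled; the test data keeps its original class distribution.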
  • 10. SHOULD WE REBALANCE OR NOT?
   Prior studies arrive at contradictory conclusions, which makes it hard to derive practical guidelines. Class rebalancing techniques:
   - improve the F-measure by 7.8%-22.4% [Kamei et al.; 4 classification techniques, 2 datasets, 3 measures]
   - do not improve the percentage of correctly classified modules (i.e., accuracy) [Riquelme et al.; 2 classification techniques, 4 datasets, 2 measures]
   - are not harmful when the defective ratio is above 20% [Mahmood et al.; a meta-analysis of 42 primary defect prediction studies]
  • 11-12. SHOULD WE REBALANCE OR NOT?
   Class rebalancing techniques may bias the learned concepts (i.e., concept drift) [B. Turhan, “On the dataset shift problem in software engineering prediction models,” EMSE’11].
   [Diagram: World → Data → Model → Knowledge → Decision/Policy Making.]
   If the data is not representative of the world, the learned model may be biased, yielding different knowledge and, in turn, incorrect action plans.
  • 13. WHAT IS THE IMPACT OF CLASS REBALANCING TECHNIQUES? Types of analysis: performance and interpretation.
  • 14. CASE STUDY SETUP
   Study            #Classification techniques   #Datasets   Measures
   Kamei et al.     4                            2           P, R, and F1
   Riquelme et al.  2                            4           AUC
   Wang et al.      2                            5           PD, PF, Balance, G-mean, AUC
   Tan et al.       7                            7           P, R, and F1
   Agrawal et al.   6                            9           P, R, PF, AUC
   Bennin et al.    5                            40          P, R, AUC, Balance, G-mean
   Our study        7                            101         10 performance measures
   (P = precision, R = recall, PD = probability of detection, PF = probability of false alarm)
  • 15-18. WHAT IS THE IMPACT OF CLASS REBALANCING TECHNIQUES?
   Performance: class rebalancing techniques
   - have little impact on AUC
   - improve recall
   - decrease precision
   Interpretation: unfortunately, class rebalancing techniques have a large impact on the model interpretation.
   Next question: WHICH EXPERIMENTAL SETTINGS YIELD THE BEST BENEFITS?
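Why recall and precision move while AUC barely does can be seen from how the measures are computed. Precision and recall depend on a classification threshold, whereas AUC depends only on how the predicted probabilities rank defective modules above clean ones. One common intuition (an assumption here, not a result stated on the slides) is that training on rebalanced data pushes predicted probabilities of the defective class upward, which at a fixed cut-off behaves much like lowering the threshold. The sketch below uses hypothetical scores and helper names.

```python
from typing import Sequence, Tuple


def precision_recall_f1(actual: Sequence[int], scores: Sequence[float],
                        threshold: float = 0.5) -> Tuple[float, float, float]:
    """Threshold-dependent measures; a module is predicted defective if score >= threshold."""
    tp = sum(1 for a, s in zip(actual, scores) if a == 1 and s >= threshold)
    fp = sum(1 for a, s in zip(actual, scores) if a == 0 and s >= threshold)
    fn = sum(1 for a, s in zip(actual, scores) if a == 1 and s < threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def auc(actual: Sequence[int], scores: Sequence[float]) -> float:
    """Threshold-free: probability that a random defective module outscores
    a random clean one (ties count 1/2)."""
    pos = [s for a, s in zip(actual, scores) if a == 1]
    neg = [s for a, s in zip(actual, scores) if a == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities for 2 defective and 4 clean modules.
actual = [1, 1, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.45, 0.42, 0.41, 0.1]
print(precision_recall_f1(actual, scores, threshold=0.5))
print(precision_recall_f1(actual, scores, threshold=0.4))
print(auc(actual, scores))
```

Moving the threshold from 0.5 to 0.4 raises recall from 0.5 to 1.0 but drops precision from 1.0 to 0.4, while AUC stays at 0.625 because the ranking of the scores has not changed.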
  • 19. WHICH EXPERIMENTAL SETTINGS YIELD THE BEST BENEFITS?
   [Diagram: performance as a function of five factors: defective ratio, classification techniques, class rebalancing techniques, metrics family, and the risk of overfitting (events per variable, EPV).]
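EPV measures how much data a model has per predictor, and a low EPV signals a higher risk of overfitting. The sketch below assumes the definition commonly used in the defect-prediction literature: the number of occurrences of the less frequent class (the "events") divided by the number of independent variables. We assume, but the slides do not state, that this matches the definition used in the study. The dataset sizes are hypothetical.

```python
from typing import Sequence


def events_per_variable(labels: Sequence[bool], num_variables: int) -> float:
    """EPV: size of the less frequent class (the 'events') divided by
    the number of independent variables (e.g., software metrics)."""
    if num_variables <= 0:
        raise ValueError("num_variables must be positive")
    defective = sum(1 for label in labels if label)
    events = min(defective, len(labels) - defective)
    return events / num_variables


# Hypothetical dataset: 1,000 modules, 120 defective, 20 software metrics.
labels = [True] * 120 + [False] * 880
epv = events_per_variable(labels, num_variables=20)
print(epv)       # 6.0
print(epv > 40)  # False: far below the EPV > 40 setting discussed on these slides
```

With 120 events and 20 metrics, this hypothetical dataset would need fewer than 3 metrics to exceed an EPV of 40, which shows how demanding that setting is.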
  • 20-25. WHICH EXPERIMENTAL SETTINGS YIELD THE BEST BENEFITS?
   Recap: class rebalancing techniques have little impact on AUC, improve recall, and decrease precision, but they have a large impact on the model interpretation.
   - Best setting: logistic regression models with under-sampling, applied to defect datasets with an EPV higher than 40.
   - Neural networks are the most sensitive, and Naive Bayes the least sensitive, classification technique to class rebalancing.
   - In line with SMOTETUNED [Agrawal and Menzies, ICSE'18], the SMOTE parameters must be optimized to improve AUC; optimized SMOTE works best with NNet, GBM, RF, and C5.0.
   - However, SMOTETUNED still has a large impact on the model interpretation.
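Optimizing SMOTE means searching over its parameters and keeping the setting with the best validation score. To our understanding, Agrawal and Menzies' SMOTETUNED uses differential evolution for this search; the sketch below swaps in a plain grid search over two hypothetical parameters (the neighbour count k and the over-sampling amount) to keep the idea visible. The `evaluate` callback is an assumption: in practice it would apply SMOTE with the given settings to the training folds only, fit the classifier, and return a validation AUC. The stand-in scoring function is fake and exists only to make the example runnable.

```python
from itertools import product
from typing import Callable, Iterable, Optional, Tuple

Params = Tuple[int, float]


def tune_smote(evaluate: Callable[[int, float], float],
               ks: Iterable[int],
               amounts: Iterable[float]) -> Tuple[Optional[Params], float]:
    """Exhaustive grid search over SMOTE's k and over-sampling amount.

    `evaluate(k, amount)` is a caller-supplied (hypothetical) function returning
    a validation score such as AUC; higher is better.
    """
    best_params: Optional[Params] = None
    best_score = float("-inf")
    for k, amount in product(ks, amounts):
        score = evaluate(k, amount)
        if score > best_score:
            best_params, best_score = (k, amount), score
    return best_params, best_score


# Fake stand-in for a cross-validated AUC that peaks at k=5, amount=2.0 (hypothetical).
def fake_validation_auc(k: int, amount: float) -> float:
    return 0.80 - 0.01 * (k - 5) ** 2 - 0.02 * (amount - 2.0) ** 2


best, score = tune_smote(fake_validation_auc, ks=range(1, 11), amounts=[1.0, 2.0, 3.0, 4.0])
print(best, round(score, 3))  # (5, 2.0) 0.8
```

Because the tuned parameters are chosen on validation data, the final model should still be assessed on a separate test set that was never re-sampled.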
  • 26. TAKEAWAY
   For predictions:
   - Use optimised SMOTE for AUC.
   - Use under-sampling for recall.
   For interpretations:
   - Don’t apply any class rebalancing technique!
   Dr. Chakkrit (Kla) Tantithamthavorn, chakkrit.tantithamthavorn@monash.edu, @klainfo, http://chakkrit.com