A Tale of Bug Prediction in 
Software Development 
Martin Pinzger 
Professor of Software Engineering 
University of Klagenfurt, Austria 
Follow me: @pinzger
Software repositories 
2
Hmm, wait a minute 
3 
Can’t we learn “something” from that data?
Goal of software repository mining 
Software Analytics 
To obtain insightful and actionable information for completing various tasks 
around developing and maintaining software systems 
Examples 
Quality analysis and defect prediction 
Detecting “hot-spots” 
Recommender (advisory) systems 
Code completion 
Suggesting good code examples 
Helping in using an API 
... 
4
Examples from my mining research 
Predicting failure-prone source files using changes (MSR 2011) 
Predicting failure-prone methods (ESEM 2012) 
The relationship between developer contributions and failure-prone 
Microsoft Vista binaries (FSE 2008) 
Surveys on software repository mining 
A survey and taxonomy of approaches for mining software repositories in the 
context of software evolution, Kagdi et al. 2007 
Evaluating defect prediction approaches: a benchmark and an extensive 
comparison, D’Ambros et al. 2012 
Conference: MSR 2015 http://2015.msrconf.org/ 
5
Using Fine-Grained Source Code 
Changes for Bug Prediction 
with Emanuel Giger, Harald Gall 
University of Zurich
Bug prediction 
Goal 
Train models to predict the bug-prone source files of the next release 
How 
Using product measures, process measures, organizational measures with 
machine learning techniques 
7
Many existing studies 
A comparative analysis of the efficiency of change metrics and static 
code attributes for defect prediction, Moser et al. 2008 
Use of relative code churn measures to predict system defect 
density, Nagappan et al. 2005 
Cross-project defect prediction: a large scale experiment on data vs. 
domain vs. process, Zimmermann et al. 2009 
Predicting faults using the complexity of code changes, Hassan et al. 
2009 
8
Classical change measures 
Number of file revisions 
Code Churn aka lines added/deleted/changed 
Can we improve existing prediction models? 
9
Revisions are coarse grained 
What did change in a revision? 
10
Code Churn can be imprecise 
11 
Extra changes not relevant for locating bugs
Fine-Grained Source Code Changes (SCC)

Account.java 1.5:

    if (balance > 0) {
        withDraw(amount);
    }

Account.java 1.6:

    if (balance > 0 && amount <= balance) {
        withDraw(amount);
    } else {
        notify();
    }

3 SCC: 1x condition change, 1x else-part insert, 1x invocation statement insert
12
Categories of SCC 
cDecl = changes to class declarations 
oState = insertion and deletion of class attributes 
func = insertion and deletion of methods 
mDecl = changes to method declarations 
stmt = insertion or deletion of executable statements 
cond = changes to conditional expressions 
else = insertion and deletion of else-parts 
13
Research hypotheses 
14 
H1 SCC is correlated with the number of bugs in 
source files 
H2 SCC is a predictor for bug-prone source files 
(and outperforms Code Churn) 
H3 SCC is a predictor for the number of bugs in 
source files (and outperforms Code Churn)
15 Eclipse plug-ins 
Data 
>850’000 fine-grained source code changes (SCC) 
>10’000 files 
>9’700’000 lines modified (LM = Code Churn) 
>9 years of development history 
..... and a lot of bugs referenced in commit messages (e.g., bug #345) 
15
Typical experimental set-up 
1. Analyze quality and distribution of the data 
Use descriptive statistics, histograms, Q-Q plots 
-> Determines the statistical methods that you can use 
2. Perform correlation analysis 
Spearman (non-parametric) 
3. Machine learners/classifiers 
Simple ones first (binary logistic regression, linear regression, decision trees) 
10-fold cross validation, precision, recall, AUC ROC 
4. Interpretation and discussion of results (incl. threats to validity) 
16
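A minimal sketch of steps 2 and 3 in Python with scikit-learn (the studies below used the Rapid Miner toolkit; the input file and the column names "SCC" and "bugs" are hypothetical):

    import numpy as np
    import pandas as pd
    from scipy.stats import spearmanr
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    # Hypothetical input: one row per file with the number of fine-grained
    # changes ("SCC") and the number of bugs ("bugs") in the observation period.
    df = pd.read_csv("scc_bugs.csv")

    # Step 2: non-parametric Spearman rank correlation
    rho, p = spearmanr(df["SCC"], df["bugs"])

    # Step 3: simple classifier first -- binary logistic regression,
    # 10-fold cross validation, AUC ROC as performance measure
    y = (df["bugs"] > df["bugs"].median()).astype(int)  # bug-prone vs. not
    aucs = cross_val_score(LogisticRegression(), df[["SCC"]], y,
                           cv=10, scoring="roc_auc")
    print(f"Spearman rho = {rho:.2f}, median AUC = {np.median(aucs):.2f}")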
Approach overview

1. Versioning data: Evolizer accesses the versioning repositories (CVS, SVN, GIT) and stores the log entries in the Release History Database (RHDB); lines modified (LM) are computed per file revision
2. Bug data: bug references in commit messages (e.g., #bug123) link bugs to files
3. Source code changes (SCC): ChangeDistiller compares the ASTs of subsequent versions (e.g., 1.1 and 1.2) and extracts the fine-grained changes
4. Experiment: the extracted measures feed the prediction models (e.g., a Support Vector Machine)

Figure 1: Stepwise overview of the data extraction process.
17
Frequency of change type categories

Table 3: Relative frequencies of SCC categories per Eclipse project, plus their mean and variance over all selected projects.
Eclipse Project cDecl oState func mDecl stmt cond else 
Compare 0.01 0.06 0.08 0.05 0.74 0.03 0.03 
jFace 0.02 0.04 0.08 0.11 0.70 0.02 0.03 
JDT Debug 0.02 0.06 0.08 0.10 0.70 0.02 0.02 
Resource 0.01 0.04 0.02 0.11 0.77 0.03 0.02 
Runtime 0.01 0.05 0.07 0.10 0.73 0.03 0.01 
Team Core 0.05 0.04 0.13 0.17 0.57 0.02 0.02 
CVS Core 0.01 0.04 0.10 0.07 0.73 0.02 0.03 
Debug Core 0.04 0.07 0.02 0.13 0.69 0.02 0.03 
jFace Text 0.04 0.03 0.06 0.11 0.70 0.03 0.03 
Update Core 0.02 0.04 0.07 0.09 0.74 0.02 0.02 
Debug UI 0.02 0.06 0.09 0.07 0.70 0.03 0.03 
JDT Debug UI 0.01 0.07 0.07 0.05 0.75 0.02 0.03 
Help 0.02 0.05 0.08 0.07 0.73 0.02 0.03 
JDT Core 0.00 0.03 0.03 0.05 0.80 0.05 0.04 
OSGI 0.03 0.04 0.06 0.11 0.71 0.03 0.02 
Mean 0.02 0.05 0.07 0.09 0.72 0.03 0.03 
Variance 0.000 0.000 0.001 0.001 0.003 0.000 0.000 
18
H1: SCC is correlated with #bugs

Non-parametric Spearman rank correlation of LM and SCC with the number of bugs. * marks significant correlations at 0.01. Larger values are printed in bold.
Eclipse Project LM SCC 
Compare 0.68 0.76  
jFace 0.74 0.71 
JDT Debug 0.62 0.8 
Resource 0.75 0.86 
Runtime 0.66 0.79 
Team Core 0.15 0.66 
CVS Core 0.60 0.79 
Debug Core 0.63 0.78 
jFace Text 0.75 0.74 
Update Core 0.43 0.62 
Debug UI 0.56 0.81 
JDT Debug UI 0.80 0.81 
Help 0.54 0.48 
JDT Core 0.70 0.74 
OSGI 0.70 0.77 
Median 0.66 0.77 
19 
Spearman correlation: 
+/-0.5 substantial 
+/-0.7 strong
Predicting bug-prone files

Bug-prone vs. not bug-prone: for each Eclipse project we binned the files using the median of the number of bugs per file (#bugs):

bugClass = { not bug-prone : #bugs <= median
             bug-prone     : #bugs >  median }

With the median as cut-point, the label of a file is relative to how many bugs the other files in the project have. Several other ways of binning files exist; they mainly differ in the resulting prior probabilities. For instance, Zimmermann et al. [40] and Bernstein et al. [4] labeled files as bug-prone if they had at least one bug, which for heavily skewed distributions may lead to a high prior probability towards one class; Nagappan et al. [28] used a statistical lower confidence bound.

10-fold cross validation (classifiers provided by the Rapid Miner toolkit [24]; each classifier assigns a probability that a file is bug-prone):
Whole data set split into 10 sub-samples of equal size
Train model with 9 sub-samples
Predict bug-prone files of the 10th sub-sample
10 runs; performance measures are averaged
20
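The 10-fold procedure from this slide written out explicitly (a sketch only, using scikit-learn instead of the Rapid Miner toolkit used in the study, and the same hypothetical DataFrame df as in the earlier set-up sketch):

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import StratifiedKFold

    X = df[["SCC"]].to_numpy()
    y = (df["bugs"] > df["bugs"].median()).astype(int).to_numpy()  # median cut-point

    aucs = []
    for train, test in StratifiedKFold(n_splits=10, shuffle=True,
                                       random_state=1).split(X, y):
        model = LogisticRegression().fit(X[train], y[train])  # train on 9 folds
        scores = model.predict_proba(X[test])[:, 1]           # predict the 10th
        aucs.append(roc_auc_score(y[test], scores))
    print(f"Average AUC over the 10 runs: {np.mean(aucs):.2f}")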
H2: SCC can predict bug-prone files

AUC values of E1 using logistic regression with LM and SCC as predictors for bug-prone and not bug-prone files. Larger values are printed in bold.
Eclipse Project AUC LM AUC SCC 
Compare 0.84 0.85 
jFace 0.90 0.90 
JDT Debug 0.83 0.95 
Resource 0.87 0.93 
Runtime 0.83 0.91 
Team Core 0.62 0.87 
CVS Core 0.80 0.90 
Debug Core 0.86 0.94 
jFace Text 0.87 0.87 
Update Core 0.78 0.85 
Debug UI 0.85 0.93 
JDT Debug UI 0.90 0.91 
Help 0.75 0.70 
JDT Core 0.86 0.87 
OSGI 0.88 0.88 
Median 0.85 0.90 
Overall 0.85 0.89 
21 
Models trained with 
Logistic Regression 
SCC outperforms LM
SCC categories to predict bug-prone files

AUC values of E2 using different learners and the number of each SCC category as predictors. The largest AUC value of each row is printed in bold.
Eclipse Project LogReg J48 RndFor B-Net eCHAID LibSVM N-Bayes NN 
Compare 0.82 0.77 0.77 0.83 0.74 0.81 0.82 0.82 
jFace 0.90 0.85 0.88 0.89 0.83 0.91 0.87 0.88 
JDT Debug 0.94 0.92 0.94 0.95 0.89 0.95 0.87 0.89 
Resource 0.89 0.86 0.89 0.91 0.77 0.92 0.90 0.91 
Runtime 0.89 0.82 0.83 0.87 0.80 0.87 0.86 0.87 
Team Core 0.86 0.78 0.79 0.85 0.77 0.86 0.85 0.86 
CVS Core 0.89 0.81 0.87 0.88 0.74 0.87 0.86 0.88 
Debug Core 0.92 0.86 0.89 0.91 0.79 0.93 0.92 0.86 
jFace Text 0.86 0.77 0.81 0.85 0.76 0.79 0.82 0.81 
Update Core 0.82 0.87 0.90 0.86 0.86 0.89 0.89 0.90 
Debug UI 0.92 0.88 0.91 0.92 0.82 0.92 0.89 0.91 
JDT Debug UI 0.89 0.89 0.90 0.89 0.81 0.90 0.85 0.89 
Help 0.69 0.65 0.67 0.69 0.63 0.69 0.69 0.68 
JDT Core 0.85 0.86 0.88 0.90 0.80 0.88 0.85 0.87 
OSGI 0.86 0.81 0.86 0.88 0.77 0.87 0.87 0.87 
Median 0.89 0.85 0.88 0.88 0.79 0.88 0.86 0.87 
Overall 0.88 0.87 0.84 0.89 0.82 0.89 0.85 0.84 
22 
SCC categories do not improve performance
Predicting the number of bugs

Non-linear regression with an asymptotic model:

#Bugs = a1 + b2 * e^(b3 * #SCC)

[Scatter plot: #Bugs (0-60) vs. #SCC (0-4000) for Eclipse Team Core, with the fitted asymptotic curve]

23
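A sketch of fitting this model with SciPy; curve_fit, the starting values, and the data arrays are my illustrative choices, not the tooling or data of the paper:

    import numpy as np
    from scipy.optimize import curve_fit

    def asymptotic(scc, a1, b2, b3):
        # #Bugs = a1 + b2 * e^(b3 * #SCC); with b2, b3 < 0 the curve
        # rises from a1 + b2 and saturates towards a1 as SCC grows
        return a1 + b2 * np.exp(b3 * scc)

    # Hypothetical per-file data, for illustration only
    scc = np.array([10, 50, 200, 800, 1500, 3000, 4000], dtype=float)
    bugs = np.array([1, 3, 8, 20, 30, 45, 50], dtype=float)

    (a1, b2, b3), _ = curve_fit(asymptotic, scc, bugs,
                                p0=(60.0, -60.0, -0.001), maxfev=10000)
    print(f"a1 = {a1:.1f}, b2 = {b2:.1f}, b3 = {b3:.5f}")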
H3: SCC can predict the number of bugs

Table 8: Results of the nonlinear regression in terms of R2 and Spearman correlation using LM and SCC as predictors.

Project  R2 LM  R2 SCC  Spearman LM  Spearman SCC
Compare 0.84 0.88 0.68 0.76 
jFace 0.74 0.79 0.74 0.71 
JDT Debug 0.69 0.68 0.62 0.8 
Resource 0.81 0.85 0.75 0.86 
Runtime 0.69 0.72 0.66 0.79 
Team Core 0.26 0.53 0.15 0.66 
CVS Core 0.76 0.83 0.62 0.79 
Debug Core 0.88 0.92 0.63 0.78 
Jface Text 0.83 0.89 0.75 0.74 
Update Core 0.41 0.48 0.43 0.62 
Debug UI 0.7 0.79 0.56 0.81 
JDT Debug UI 0.82 0.82 0.8 0.81 
Help 0.66 0.67 0.54 0.84 
JDT Core 0.69 0.77 0.7 0.74 
OSGI 0.51 0.8 0.74 0.77 
Median 0.7 0.79 0.66 0.77 
Overall 0.65 0.72 0.62 0.74 
[Plot: normalized residuals (-1.0 to 1.5) of the nonlinear regression, plotted against #SCC up to 6,000]
24 
SCC outperforms LM
Summary of results 
SCC performs significantly better than Code Churn (LM) 
Advanced learners are not always better 
Change types do not yield extra discriminatory power 
Predicting the number of bugs is “possible” 
Non-linear regression model with a median R2 = 0.79
More information in our paper 
“Comparing Fine-Grained Source Code Changes And Code Churn For Bug 
Prediction”, Giger et al. 2011 
25
Method-Level Bug Prediction 
with Emanuel Giger, Marco D’Ambros*, Harald Gall 
University of Zurich 
*University of Lugano
Prediction granularity 
class 1 class 2 class 3 ... class n 
11 methods on average 
4 methods are bug-prone (ca. 36%)
Retrieving bug-prone methods saves manual inspection effort and 
testing effort 
27 
Large files are typically the most bug-prone files
Research questions 
28 
RQ1 What is the performance of bug prediction on
method level using change & code metrics?
RQ2 Which set of predictors provides the best 
performance? 
RQ3 How does the performance vary if the number 
of buggy methods decreases?
21 Java open source projects 
29 
Project #Classes #Methods #M-Histories #Bugs 
JDT Core 1.140 17.703 43.134 4.888 
Jena2 897 8.340 7.764 704 
Lucene 477 3.870 1.754 377 
Xerces 693 8.189 6.866 1.017 
Derby Engine 1.394 18.693 9.507 1.663 
Ant Core 827 8.698 17.993 1.900
Investigated metrics 
30 
Source code metrics (from the last release) 
fanIn, fanOut, localVar, parameters, commentToCodeRatio, countPath, McCabe 
Complexity, statements, maxNesting 
Change metrics 
methodHistories, authors, 
stmtAdded, maxStmtAdded, avgStmtAdded, 
stmtDeleted, maxStmtDeleted, avgStmtDeleted, 
churn, maxChurn, avgChurn, 
decl, cond, elseAdded, elseDeleted 
Bugs 
Count bug references in commit logs for changed methods
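A hypothetical sketch of the bug-counting step; the exact reference patterns and linking rules used in the study are not shown on the slide:

    import re

    # Matches references such as "bug #345", "Bug 345", "bug345"
    BUG_REF = re.compile(r"bug\s*#?\s*(\d+)", re.IGNORECASE)

    def referenced_bugs(commit_message):
        """Return the set of bug ids referenced in a commit message."""
        return set(BUG_REF.findall(commit_message))

    print(referenced_bugs("Fixed bug #345; see also bug 346"))  # {'345', '346'}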
Predicting bug-prone methods

Experimental setup: prior to model building and classification, we labeled each method in our dataset either as bug-prone or not bug-prone as follows:

bugClass = { not bug-prone : #bugs = 0
             bug-prone     : #bugs >= 1 }

These two classes represent the binary target classes for training and validating the prediction models. Using 0 (respectively 1) as cut-point is a common approach applied in many studies covering bug prediction models, e.g., [30, 47, 4, 27, 37]. Other cut-points are applied in the literature, for instance a statistical lower confidence bound [33] or the median [16]. Those varying cut-points as well as the diverse datasets result in different prior probabilities.

31
RQ1 & RQ2: Performance of prediction models

Table 4: Median classification results over all projects, per classifier and per model

         CM             SCM            CM&SCM
         AUC  P    R    AUC  P    R    AUC  P    R
RndFor   .95  .84  .88  .72  .5   .64  .95  .85  .95
SVM      .96  .83  .86  .7   .48  .63  .95  .8   .96
BN       .96  .82  .86  .73  .46  .73  .96  .81  .96
J48      .95  .84  .82  .69  .56  .58  .91  .83  .89

Models computed with change metrics (CM) perform best
authors and methodHistories are the most important measures

The AUC values of the code metrics model are approximately 0.7 for each classifier, which Lessmann et al. define as "promising" [26]; the source code metrics, however, suffer from considerably low precision values.

32
Predicting bug-prone methods with different cut-points

We studied how the classification performance varies (RQ3) as the number of samples in the target class shrinks, and whether we observe similar findings regarding the performance of the change and code metrics (RQ2). For that, we applied three additional cut-point values as follows:

bugClass = { not bug-prone : #bugs <= p
             bug-prone     : #bugs >  p }

p = 75%, 90%, 95% percentiles of #bugs in methods per project
- predict the top 25%, 10%, and 5% bug-prone methods (see the sketch below)

For example, using the 95% percentile as cut-point for the binning means predicting the top 5% of methods in terms of the number of bugs. We applied the same experimental setup as before, except for the differently chosen cut-points.

33
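A sketch of this percentile binning in Python; numpy's percentile is a stand-in, and the bugs array with per-method bug counts is hypothetical:

    import numpy as np

    bugs = np.array([0, 0, 0, 1, 0, 2, 0, 5, 1, 0, 9, 0])  # hypothetical counts

    for pct in (75, 90, 95):
        p = np.percentile(bugs, pct)
        bug_prone = bugs > p          # bug-prone : #bugs > p
        print(f"{pct}% cut-point p={p:.1f}: "
              f"{bug_prone.mean():.0%} of methods labeled bug-prone")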
RQ3: Decreasing the number of bug-prone methods 
Table 5: Median classification results for RndFor 
over all projects per cut-point and per model 
Models trained with Random Forest (RndFor) 
Change metrics (CM) perform best 
Precision decreases (as expected) 
34 
         CM             SCM            CM&SCM
         AUC  P    R    AUC  P    R    AUC  P    R
GT0 .95 .84 .88 .72 .50 .64 .95 .85 .95 
75% .97 .72 .95 .75 .39 .63 .97 .74 .95 
90% .97 .58 .94 .77 .20 .69 .98 .64 .94 
95% .97 .62 .92 .79 .13 .72 .98 .68 .92 
The source code metrics show the lowest precision in the case of the 95% percentile (median precision of 0.13). For the change metrics and the combined model, the median precision is significantly higher for the GT0 and 75% cut-points than for the 90% and 95% ones.
Application: file level vs. method level prediction 
JDT Core 3.0 - LocalDeclaration.class 
Contains 6 methods / 1 affected by post release bugs 
LocalDeclaration.resolve(...) was predicted bug-prone with p=0.97 
File-level: p=0.17 to guess the bug-prone method 
Need to manually rule out 5 methods to reach 0.82 precision: 1 / (6 - 5) = 1.0
JDT Core 3.0 - Main.class 
Contains 26 methods / 11 affected by post release bugs 
Main.configure(...) was predicted bug-prone with p=1.0 
File-level: p=0.42 to guess a bug-prone method 
Need to rule out 13 methods to reach 0.82 precision: 11 / (26 - 13) = 0.85
35
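Reading the arithmetic on this slide (my interpretation of the fractions): at file level, the chance of picking a bug-prone method blindly is (bug-prone methods) / (all methods), e.g., 11/26 = 0.42 for Main.class; after manually ruling out k methods, precision becomes (bug-prone methods) / (methods - k), so 11 / (26 - 13) = 11/13 = 0.85, the first value to reach the 0.82 mark.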
What can we learn from that? 
Large files are more likely to change and have bugs 
Test large files more thoroughly - YES 
Bugs are fixed through changes that again lead to bugs 
Stop changing our systems - NO, of course not! 
Test changing entities more thoroughly - YES 
Are we not already doing that? 
Do we really need (complex) prediction models for that? 
Not sure - might be the reason why these models are not really used, yet 
Microsoft started to add prediction models to their quality assurance tools 
But, use at least metric tools and keep track of your code quality 
- E.g., continuous integration environments, SonarQube 
36
What is next? 
Ease understanding of changes 
Analysis of the effect(s) of changes 
What is the effect on the design? 
What is the effect on the quality? 
Recommender techniques 
Provide advice on the effects of changes 
37
Facilitating understanding of changes 
38 
Changes 
FineMem 
Alex fixed Bug 14: 
Changed if condition in method send() in 
module BandPass. 
Alex 
Peter
Research: understanding changes 
39 
Change 
extractor 
FineMem 
Changes 
Detailed 
changes 
Change summaries 
Change visualization 
Subscription to 
detailed changes 
Alex
Conclusions 
40 
[Scatter plot: #Bugs (0-60) vs. #SCC (0-4000) for Eclipse Team Core, with the fitted asymptotic curve]
Questions? 
Martin Pinzger 
martin.pinzger@aau.at 