7. Hmm, wait a minute
Can’t we learn “something” from that data?
8. Software repository mining for
preventing upgrade failures
Martin Pinzger
Assistant Professor
Delft University of Technology
9. Goal of software repository mining
Making the information stored in software repositories
available to software developers
Quality analysis and defect prediction
Recommender systems
...
11. Examples from my mining research
Predicting failure-prone source files using changes (MSR 2011)
The relationship between developer contributions and failures
(FSE 2008)
There are many more studies
MSR 2012 http://2012.msrconf.org/
A survey and taxonomy of approaches for mining software repositories in
the context of software evolution, Kagdi et al. 2007
12. Using Fine-Grained Source
Code Changes for Bug
Prediction
Joint work with Emanuel Giger, Harald Gall
University of Zurich
13. Bug prediction
Goal
Train models to predict the bug-prone source files of the next release
How
Using product, process, and organizational measures with
machine learning techniques
Many existing studies on building prediction models
Moser et al., Nagappan et al., Zimmermann et al., Hassan et al., etc.
Process measures performed particularly well
14. Classical change measures
Number of file revisions
Code churn, a.k.a. lines modified (LM): lines added, deleted, or changed
Research question of this study: Can we further improve these
models?
16. Code Churn can be imprecise
It counts extra changes, e.g., in formatting or comments, that are not relevant for locating bugs
17. Fine-Grained Source Code Changes (SCC)
[Figure: ASTs of Account.java 1.5 and 1.6, corresponding to the following code]

Account.java 1.5:
    if (balance > 0) {
        withDraw(amount);
    }

Account.java 1.6:
    if (balance > 0 && amount <= balance) {
        withDraw(amount);
    } else {
        notify();
    }

3 SCC: 1x condition change, 1x else-part insert, 1x invocation statement insert
18. Research hypotheses
H1: SCC is correlated with the number of bugs in source files
H2: SCC is a predictor for bug-prone source files (and outperforms LM)
H3: SCC is a predictor for the number of bugs in source files (and outperforms LM)
19. 15 Eclipse plug-ins
Data
>850’000 fine-grained source code changes (SCC)
>10’000 files
>9’700’000 lines modified (LM)
>9 years of development history
..... and a lot of bugs referenced in commit messages
20. H1: SCC is correlated with #bugs
Table 5: Non-parametric Spearman rank correlation of LM and SCC with #bugs; * marks significant correlations at α = 0.01, larger values are printed in bold. As a guide: +/-0.5 is a substantial, +/-0.7 a strong correlation.

Eclipse Project   LM     SCC
Compare           0.68   0.76
jFace             0.74   0.71
JDT Debug         0.62   0.80
Resource          0.75   0.86
Runtime           0.66   0.79
Team Core         0.15   0.66
CVS Core          0.60   0.79
Debug Core        0.63   0.78
jFace Text        0.75   0.74
Update Core       0.43   0.62
Debug UI          0.56   0.81
JDT Debug UI      0.80   0.81
Help              0.54   0.48
JDT Core          0.70   0.74
OSGI              0.70   0.77
Median            0.66   0.77
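To make the analysis concrete, here is a minimal sketch of such a correlation computation in Python with scipy; the per-file LM, SCC, and bug counts below are invented for illustration and are not the study's data.

```python
# Minimal sketch: Spearman rank correlation of change measures with bug
# counts, analogous to Table 5. All numbers are made up for illustration.
from scipy.stats import spearmanr

lm   = [120, 35, 540, 12, 260, 80]   # hypothetical lines modified per file
scc  = [14, 4, 75, 1, 40, 9]         # hypothetical fine-grained changes per file
bugs = [3, 1, 12, 0, 7, 2]           # hypothetical bug counts per file

rho_lm, p_lm = spearmanr(lm, bugs)
rho_scc, p_scc = spearmanr(scc, bugs)
print(f"LM  vs. #bugs: rho = {rho_lm:.2f} (p = {p_lm:.3f})")
print(f"SCC vs. #bugs: rho = {rho_scc:.2f} (p = {p_scc:.3f})")
```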
21. Predicting bug-prone files
For each Eclipse project we binned files into bug-prone and not bug-prone using the median of the number of bugs per file (#bugs); the model then assigns each file a probability of being bug-prone or not bug-prone:

bugClass = not bug-prone : #bugs <= median
           bug-prone     : #bugs >  median

When using the median as cut point, the labeling of a file is relative to how many bugs the other files in a project have. There exist several other ways of binning files; they mainly vary in that they result in different prior probabilities: for instance, Zimmermann et al. [40] and Bernstein et al. [4] labeled files as bug-prone if they had at least one bug. With heavily skewed distributions this approach may lead to a high prior probability towards one class. Nagappan et al. [28] used a ...
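A minimal sketch of this median-based binning, assuming bug counts per file have already been extracted from the commit messages; the file names and counts are hypothetical.

```python
# Minimal sketch: label files bug-prone / not bug-prone relative to the
# per-project median, as described on the slide. Data is hypothetical.
import numpy as np

bugs_per_file = {"A.java": 0, "B.java": 2, "C.java": 5, "D.java": 1, "E.java": 9}

median = np.median(list(bugs_per_file.values()))  # cut point
bug_class = {
    name: "bug-prone" if n_bugs > median else "not bug-prone"
    for name, n_bugs in bugs_per_file.items()
}
print(f"median = {median}")
print(bug_class)
```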
22. H2: SCC can predict bug-prone files
AUC values of E1 using logistic regression with LM and SCC as predictors for bug-prone and not bug-prone files; larger values are printed in bold.

Eclipse Project   AUC LM   AUC SCC
Compare           0.84     0.85
jFace             0.90     0.90
JDT Debug         0.83     0.95
Resource          0.87     0.93
Runtime           0.83     0.91
Team Core         0.62     0.87
CVS Core          0.80     0.90
Debug Core        0.86     0.94
jFace Text        0.87     0.87
Update Core       0.78     0.85
Debug UI          0.85     0.93
JDT Debug UI      0.90     0.91
Help              0.75     0.70
JDT Core          0.86     0.87
OSGI              0.88     0.88
Median            0.85     0.90
Overall           0.85     0.89

SCC outperforms LM
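A hedged sketch of the kind of experiment behind such AUC values: a logistic regression with a single change measure as predictor, evaluated on a held-out split. The synthetic data and the train/test split are illustrative assumptions, not the paper's exact pipeline.

```python
# Sketch: logistic regression with SCC as the predictor of bug-proneness,
# evaluated with AUC. Synthetic data; not the study's actual setup.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
scc = rng.poisson(20, 200).reshape(-1, 1)  # per-file SCC counts
# Noisy ground truth: files with many changes tend to be bug-prone.
bug_prone = (scc.ravel() + rng.normal(0, 5, 200) > 20).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(scc, bug_prone, random_state=0)
model = LogisticRegression().fit(X_tr, y_tr)
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"AUC = {auc:.2f}")
```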
23. Predicting the number of bugs
Nonlinear regression with an asymptotic model:

#Bugs = a1 + b2 * e^(b3 * SCC)

[Figure: model fit for Team Core — #Bugs (0 to 60) against #SCC (0 to 4000)]
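A sketch of fitting this asymptotic model with nonlinear least squares, and of computing the R^2 and Spearman statistics that Table 8 on the next slide reports; the data and starting values are assumptions for illustration.

```python
# Sketch: fit #Bugs = a1 + b2 * e^(b3 * SCC) and evaluate the fit.
# Synthetic data shaped like the Team Core plot; not the study's data.
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import spearmanr

def asymptotic(scc, a1, b2, b3):
    return a1 + b2 * np.exp(b3 * scc)

rng = np.random.default_rng(1)
scc = np.linspace(0, 4000, 80)
bugs = asymptotic(scc, 60, -55, -0.001) + rng.normal(0, 3, 80)

params, _ = curve_fit(asymptotic, scc, bugs, p0=(50, -50, -0.001))
predicted = asymptotic(scc, *params)

ss_res = np.sum((bugs - predicted) ** 2)
ss_tot = np.sum((bugs - bugs.mean()) ** 2)
rho, _ = spearmanr(predicted, bugs)
print(f"R^2 = {1 - ss_res / ss_tot:.2f}, Spearman = {rho:.2f}")
```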
24. H3: SCC can predict the number of bugs
Table 8: Results of the nonlinear regression in terms of R^2 and Spearman correlation using LM and SCC as predictors.

Project        R^2 LM   R^2 SCC   Spearman LM   Spearman SCC
Compare        0.84     0.88      0.68          0.76
jFace          0.74     0.79      0.74          0.71
JDT Debug      0.69     0.68      0.62          0.80
Resource       0.81     0.85      0.75          0.86
Runtime        0.69     0.72      0.66          0.79
Team Core      0.26     0.53      0.15          0.66
CVS Core       0.76     0.83      0.62          0.79
Debug Core     0.88     0.92      0.63          0.78
jFace Text     0.83     0.89      0.75          0.74
Update Core    0.41     0.48      0.43          0.62
Debug UI       0.70     0.79      0.56          0.81
JDT Debug UI   0.82     0.82      0.80          0.81
Help           0.66     0.67      0.54          0.84
JDT Core       0.69     0.77      0.70          0.74
OSGI           0.51     0.80      0.74          0.77
Median         0.70     0.79      0.66          0.77
Overall        0.65     0.72      0.62          0.74

SCC outperforms LM
[Figure: normalized residuals of the fit]
25. Summary of results
SCC performs significantly better than LM
Advanced learners are not always better
Change types do not yield extra discriminatory power
Predicting the number of bugs is “possible”
More information
“Comparing Fine-Grained Source Code Changes And Code Churn For Bug
Prediction”, MSR 2011
26. What is next?
Analysis of the effect(s) of changes
What is the effect on the design?
What is the effect on the quality?
Ease understanding of changes
Recommender techniques
Models that can provide feedback on the effects
29. Research question
Are binaries with fragmented contributions from many
developers more likely to have post-release failures?
Should developers focus on one thing?
30. Study with MS Vista project
Data
Released in January 2007
> 4 years of development
Several thousand developers
Several thousand binaries (*.exe, *.dll)
Several million commits
31. Approach in a nutshell
[Figure: from the change logs, a contribution network is extracted — developers (Alice, Bob, Dan, Eric, Fu, Hin) connected to binaries a, b, and c by weighted contribution edges; bug data per binary feeds the table below]

Binary   #bugs   centrality
a        12      0.9
b        7       0.5
c        3       0.2

Regression analysis
Validation with data splitting
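A sketch of the network-construction step using networkx, with the toy developer-binary edges from the figure; note the study computes several centrality measures (power, closeness, reach, betweenness) on the real Vista data, while this computes plain unweighted closeness only.

```python
# Sketch: build a developer-binary contribution network and compute a
# centrality value per binary. Edges/weights are the toy ones above.
import networkx as nx

G = nx.Graph()
contributions = [  # (developer, binary, number of commits)
    ("Alice", "a", 6), ("Eric", "a", 5), ("Bob", "a", 4),
    ("Bob", "b", 2), ("Dan", "b", 4), ("Fu", "b", 6),
    ("Dan", "c", 5), ("Hin", "c", 4),
]
for dev, binary, n_commits in contributions:
    G.add_edge(dev, binary, weight=n_commits)

for binary in ("a", "b", "c"):
    print(binary, round(nx.closeness_centrality(G, u=binary), 2))
```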
32. Contribution network
[Figure: contribution network — developer nodes linked to Windows binary (*.dll) nodes]
Which binary is failure-prone?
34. Research hypotheses
H1: Binaries with fragmented contributions are failure-prone
H2: Fragmentation correlates positively with the number of post-release failures
H3: Advanced fragmentation measures improve failure estimation
35. Correlation analysis
Spearman rank correlation; all correlations are significant at the 0.01 level (2-tailed).

             nrCommits  nrAuthors  Power  dPower  Closeness  Reach  Betweenness
Failures     0.700      0.699      0.692  0.740   0.747      0.746  0.503
nrCommits               0.704      0.996  0.773   0.748      0.732  0.466
nrAuthors                          0.683  0.981   0.914      0.944  0.830
Power                                     0.756   0.732      0.714  0.439
dPower                                            0.943      0.964  0.772
Closeness                                                    0.990  0.738
Reach                                                               0.773
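For completeness, a sketch of producing such a correlation matrix with pandas; the per-binary measures are randomly generated placeholders and only a subset of the slide's columns is shown.

```python
# Sketch: Spearman correlation matrix over per-binary measures.
# Random placeholder data; column names follow the slide.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 100  # hypothetical number of binaries
commits = rng.poisson(50, n)
df = pd.DataFrame({
    "Failures":  commits + rng.poisson(5, n),
    "nrCommits": commits,
    "nrAuthors": rng.poisson(8, n),
    "Closeness": rng.random(n),
})
print(df.corr(method="spearman"))
```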
39. Summary of results
Centrality measures can predict more than 83% of failure-prone Vista binaries
Closeness, nrAuthors, and nrCommits can predict the number
of post-release failures
Closeness or Reach can improve prediction of the number of
post-release failures by 32%
More information
Can Developer-Module Networks Predict Failures?, FSE 2008
40. What can we learn from that?
[Figure: the contribution network from the approach overview, with weighted edges]
Increase testing effort for central binaries? - yes
Re-factor central binaries? - maybe
Re-organize contributions? - maybe
41. What is next?
Analysis of the contributions of a developer
Who is working on which parts of the system?
What exactly is the contribution of a developer?
Who is introducing bugs/smells and how can we avoid them?
Global distributed software engineering
What are the contributions of teams, which smells arise, and how can we avoid them?
Can we empirically prove Conway’s Law?
Expert recommendation
Whom to ask for advice on a piece of code?
42. Ideas for software upgrade research
1. Mining software repositories to identify the upgrade-critical
components
What are the characteristics of such components?
Product and process measures
What are the characteristics of the target environments?
Hardware, operating system, configuration
Train a model with these characteristics and reported bugs
43. Further ideas for research
Who is upgrading which applications when?
Study upgrade behavior of users?
What is the environment of the users when they upgrade?
Where did it work, where did it fail?
Collect crash reports for software upgrades?
Upgrades in distributed applications?
Finding the optimal time when to upgrade which component?