7. Hmm, wait a minute
Can’t we learn “something” from that data?
8. Software repository mining for
preventing upgrade failures
Martin Pinzger
Assistant Professor
Delft University of Technology
9. Goal of software repository mining
Making the information stored in software repositories
available to software developers
Quality analysis and defect prediction
Recommender systems
...
11. Examples from my mining research
Predicting failure-prone source files using changes (MSR 2011)
The relationship between developer contributions and failures
(FSE 2008)
There are many more studies
MSR 2012 http://2012.msrconf.org/
A survey and taxonomy of approaches for mining software repositories in
the context of software evolution, Kagdi et al. 2007
12. Using Fine-Grained Source
Code Changes for Bug
Prediction
Joint work with Emanuel Giger, Harald Gall
University of Zurich
13. Bug prediction
Goal
Train models to predict the bug-prone source files of the next release
How
Using product, process, and organizational measures with
machine learning techniques
Many existing studies on building prediction models
Moser et al., Nagappan et al., Zimmermann et al., Hassan et al., etc.
Process measures performed particularly well
14. Classical change measures
Number of file revisions
Code churn, a.k.a. lines modified (LM): lines added, deleted, or changed
Research question of this study: Can we further improve these
models?
16. Code Churn can be imprecise
It counts extra changes, e.g., in formatting or comments, that are not relevant for locating bugs
17. Fine-Grained Source Code Changes (SCC)
[Figure: ASTs of Account.java 1.5 and 1.6, corresponding to the following code]

Account.java 1.5:
    if (balance > 0) {
        withDraw(amount);
    }

Account.java 1.6:
    if (balance > 0 && amount <= balance) {
        withDraw(amount);
    } else {
        notify();
    }

3 SCC: 1x condition change, 1x else-part insert, 1x invocation statement insert
18. Research hypotheses
H1: SCC is correlated with the number of bugs in source files
H2: SCC is a predictor for bug-prone source files (and outperforms LM)
H3: SCC is a predictor for the number of bugs in source files (and outperforms LM)
19. 15 Eclipse plug-ins
Data
>850’000 fine-grained source code changes (SCC)
>10’000 files
>9’700’000 lines modified (LM)
>9 years of development history
..... and a lot of bugs referenced in commit messages
20. H1: SCC is correlated with #bugs
Table 5: Non-parametric Spearman rank correlation of LM and SCC with #bugs; * marks significant correlations at α = 0.01, larger values are printed in bold. As a guide: +/-0.5 is a substantial, +/-0.7 a strong correlation.

Eclipse Project   LM     SCC
Compare           0.68   0.76
jFace             0.74   0.71
JDT Debug         0.62   0.80
Resource          0.75   0.86
Runtime           0.66   0.79
Team Core         0.15   0.66
CVS Core          0.60   0.79
Debug Core        0.63   0.78
jFace Text        0.75   0.74
Update Core       0.43   0.62
Debug UI          0.56   0.81
JDT Debug UI      0.80   0.81
Help              0.54   0.48
JDT Core          0.70   0.74
OSGI              0.70   0.77
Median            0.66   0.77
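To make the analysis concrete, here is a minimal sketch of such a correlation computation in Python with scipy; the per-file LM, SCC, and bug counts below are invented for illustration and are not the study's data.

```python
# Minimal sketch: Spearman rank correlation of change measures with bug
# counts, analogous to Table 5. All numbers are made up for illustration.
from scipy.stats import spearmanr

lm   = [120, 35, 540, 12, 260, 80]   # hypothetical lines modified per file
scc  = [14, 4, 75, 1, 40, 9]         # hypothetical fine-grained changes per file
bugs = [3, 1, 12, 0, 7, 2]           # hypothetical bug counts per file

rho_lm, p_lm = spearmanr(lm, bugs)
rho_scc, p_scc = spearmanr(scc, bugs)
print(f"LM  vs. #bugs: rho = {rho_lm:.2f} (p = {p_lm:.3f})")
print(f"SCC vs. #bugs: rho = {rho_scc:.2f} (p = {p_scc:.3f})")
```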
21. Predicting bug-prone files
For each Eclipse project we binned files into bug-prone and not bug-prone using the median of the number of bugs per file (#bugs); the model then assigns each file a probability of being bug-prone or not bug-prone:

bugClass = not bug-prone : #bugs <= median
           bug-prone     : #bugs >  median

When using the median as cut point, the labeling of a file is relative to how many bugs the other files in a project have. There exist several other ways of binning files; they mainly vary in that they result in different prior probabilities: for instance, Zimmermann et al. [40] and Bernstein et al. [4] labeled files as bug-prone if they had at least one bug. With heavily skewed distributions this approach may lead to a high prior probability towards one class. Nagappan et al. [28] used a ...
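A minimal sketch of this median-based binning, assuming bug counts per file have already been extracted from the commit messages; the file names and counts are hypothetical.

```python
# Minimal sketch: label files bug-prone / not bug-prone relative to the
# per-project median, as described on the slide. Data is hypothetical.
import numpy as np

bugs_per_file = {"A.java": 0, "B.java": 2, "C.java": 5, "D.java": 1, "E.java": 9}

median = np.median(list(bugs_per_file.values()))  # cut point
bug_class = {
    name: "bug-prone" if n_bugs > median else "not bug-prone"
    for name, n_bugs in bugs_per_file.items()
}
print(f"median = {median}")
print(bug_class)
```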
22. H2: SCC can predict bug-prone files
AUC values of E1 using logistic regression with LM and SCC as predictors for bug-prone and not bug-prone files; larger values are printed in bold.

Eclipse Project   AUC LM   AUC SCC
Compare           0.84     0.85
jFace             0.90     0.90
JDT Debug         0.83     0.95
Resource          0.87     0.93
Runtime           0.83     0.91
Team Core         0.62     0.87
CVS Core          0.80     0.90
Debug Core        0.86     0.94
jFace Text        0.87     0.87
Update Core       0.78     0.85
Debug UI          0.85     0.93
JDT Debug UI      0.90     0.91
Help              0.75     0.70
JDT Core          0.86     0.87
OSGI              0.88     0.88
Median            0.85     0.90
Overall           0.85     0.89

SCC outperforms LM
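A hedged sketch of the kind of experiment behind such AUC values: a logistic regression with a single change measure as predictor, evaluated on a held-out split. The synthetic data and the train/test split are illustrative assumptions, not the paper's exact pipeline.

```python
# Sketch: logistic regression with SCC as the predictor of bug-proneness,
# evaluated with AUC. Synthetic data; not the study's actual setup.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
scc = rng.poisson(20, 200).reshape(-1, 1)  # per-file SCC counts
# Noisy ground truth: files with many changes tend to be bug-prone.
bug_prone = (scc.ravel() + rng.normal(0, 5, 200) > 20).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(scc, bug_prone, random_state=0)
model = LogisticRegression().fit(X_tr, y_tr)
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"AUC = {auc:.2f}")
```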
23. Predicting the number of bugs
Nonlinear regression with an asymptotic model:

#Bugs = a1 + b2 * e^(b3 * SCC)

[Figure: model fit for Team Core — #Bugs (0 to 60) against #SCC (0 to 4000)]
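A sketch of fitting this asymptotic model with nonlinear least squares, and of computing the R^2 and Spearman statistics that Table 8 on the next slide reports; the data and starting values are assumptions for illustration.

```python
# Sketch: fit #Bugs = a1 + b2 * e^(b3 * SCC) and evaluate the fit.
# Synthetic data shaped like the Team Core plot; not the study's data.
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import spearmanr

def asymptotic(scc, a1, b2, b3):
    return a1 + b2 * np.exp(b3 * scc)

rng = np.random.default_rng(1)
scc = np.linspace(0, 4000, 80)
bugs = asymptotic(scc, 60, -55, -0.001) + rng.normal(0, 3, 80)

params, _ = curve_fit(asymptotic, scc, bugs, p0=(50, -50, -0.001))
predicted = asymptotic(scc, *params)

ss_res = np.sum((bugs - predicted) ** 2)
ss_tot = np.sum((bugs - bugs.mean()) ** 2)
rho, _ = spearmanr(predicted, bugs)
print(f"R^2 = {1 - ss_res / ss_tot:.2f}, Spearman = {rho:.2f}")
```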
24. H3: SCC can predict the number of bugs
Table 8: Results of the nonlinear regression in terms of R^2 and Spearman correlation using LM and SCC as predictors.

Project        R^2 LM   R^2 SCC   Spearman LM   Spearman SCC
Compare        0.84     0.88      0.68          0.76
jFace          0.74     0.79      0.74          0.71
JDT Debug      0.69     0.68      0.62          0.80
Resource       0.81     0.85      0.75          0.86
Runtime        0.69     0.72      0.66          0.79
Team Core      0.26     0.53      0.15          0.66
CVS Core       0.76     0.83      0.62          0.79
Debug Core     0.88     0.92      0.63          0.78
jFace Text     0.83     0.89      0.75          0.74
Update Core    0.41     0.48      0.43          0.62
Debug UI       0.70     0.79      0.56          0.81
JDT Debug UI   0.82     0.82      0.80          0.81
Help           0.66     0.67      0.54          0.84
JDT Core       0.69     0.77      0.70          0.74
OSGI           0.51     0.80      0.74          0.77
Median         0.70     0.79      0.66          0.77
Overall        0.65     0.72      0.62          0.74

SCC outperforms LM
[Figure: normalized residuals of the fit]
25. Summary of results
SCC performs significantly better than LM
Advanced learners are not always better
Change types do not yield extra discriminatory power
Predicting the number of bugs is “possible”
More information
“Comparing Fine-Grained Source Code Changes And Code Churn For Bug
Prediction”, MSR 2011
26. What is next?
Analysis of the effect(s) of changes
What is the effect on the design?
What is the effect on the quality?
Ease understanding of changes
Recommender techniques
Models that can provide feedback on the effects
29. Research question
Are binaries with fragmented contributions from many
developers more likely to have post-release failures?
Should developers focus on one thing?
30. Study with MS Vista project
Data
Released in January 2007
> 4 years of development
Several thousand developers
Several thousand binaries (*.exe, *.dll)
Several million commits
31. Approach in a nutshell
[Figure: from the change logs, a contribution network is extracted — developers (Alice, Bob, Dan, Eric, Fu, Hin) connected to binaries a, b, and c by weighted contribution edges; bug data per binary feeds the table below]

Binary   #bugs   centrality
a        12      0.9
b        7       0.5
c        3       0.2

Regression analysis
Validation with data splitting
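A sketch of the network-construction step using networkx, with the toy developer-binary edges from the figure; note the study computes several centrality measures (power, closeness, reach, betweenness) on the real Vista data, while this computes plain unweighted closeness only.

```python
# Sketch: build a developer-binary contribution network and compute a
# centrality value per binary. Edges/weights are the toy ones above.
import networkx as nx

G = nx.Graph()
contributions = [  # (developer, binary, number of commits)
    ("Alice", "a", 6), ("Eric", "a", 5), ("Bob", "a", 4),
    ("Bob", "b", 2), ("Dan", "b", 4), ("Fu", "b", 6),
    ("Dan", "c", 5), ("Hin", "c", 4),
]
for dev, binary, n_commits in contributions:
    G.add_edge(dev, binary, weight=n_commits)

for binary in ("a", "b", "c"):
    print(binary, round(nx.closeness_centrality(G, u=binary), 2))
```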
32. Contribution network
[Figure: contribution network — developer nodes linked to Windows binary (*.dll) nodes]
Which binary is failure-prone?
34. Research hypotheses
H1: Binaries with fragmented contributions are failure-prone
H2: Fragmentation correlates positively with the number of post-release failures
H3: Advanced fragmentation measures improve failure estimation
35. Correlation analysis
Spearman rank correlation; all correlations are significant at the 0.01 level (2-tailed).

             nrCommits  nrAuthors  Power  dPower  Closeness  Reach  Betweenness
Failures     0.700      0.699      0.692  0.740   0.747      0.746  0.503
nrCommits               0.704      0.996  0.773   0.748      0.732  0.466
nrAuthors                          0.683  0.981   0.914      0.944  0.830
Power                                     0.756   0.732      0.714  0.439
dPower                                            0.943      0.964  0.772
Closeness                                                    0.990  0.738
Reach                                                               0.773
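For completeness, a sketch of producing such a correlation matrix with pandas; the per-binary measures are randomly generated placeholders and only a subset of the slide's columns is shown.

```python
# Sketch: Spearman correlation matrix over per-binary measures.
# Random placeholder data; column names follow the slide.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 100  # hypothetical number of binaries
commits = rng.poisson(50, n)
df = pd.DataFrame({
    "Failures":  commits + rng.poisson(5, n),
    "nrCommits": commits,
    "nrAuthors": rng.poisson(8, n),
    "Closeness": rng.random(n),
})
print(df.corr(method="spearman"))
```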
39. Summary of results
Centrality measures can predict more than 83% of failure-prone Vista binaries
Closeness, nrAuthors, and nrCommits can predict the number
of post-release failures
Closeness or Reach can improve prediction of the number of
post-release failures by 32%
More information
Can Developer-Module Networks Predict Failures?, FSE 2008
40. What can we learn from that?
[Figure: the contribution network from the approach overview, with weighted edges]
Increase testing effort for central binaries? - yes
Re-factor central binaries? - maybe
Re-organize contributions? - maybe
41. What is next?
Analysis of the contributions of a developer
Who is working on which parts of the system?
What exactly is the contribution of a developer?
Who is introducing bugs/smells and how can we avoid them?
Global distributed software engineering
What are the contributions of teams, which smells arise, and how can we avoid them?
Can we empirically prove Conway’s Law?
Expert recommendation
Whom to ask for advice on a piece of code?
42. Ideas for software upgrade research
1. Mining software repositories to identify the upgrade-critical
components
What are the characteristics of such components?
Product and process measures
What are the characteristics of the target environments?
Hardware, operating system, configuration
Train a model with these characteristics and reported bugs
43. Further ideas for research
Who is upgrading which applications when?
Study upgrade behavior of users?
What is the environment of the users when they upgrade?
Where did it work, where did it fail?
Collect crash reports for software upgrades?
Upgrades in distributed applications?
Finding the optimal time when to upgrade which component?