11. Programmer Interaction
and Software Quality
“Errors are from cognitive breakdown
while understanding and implementing
requirements”
- Ko et al. 2005
“Work interruptions or task switching
may affect programmer productivity”
- DeLine et al. 2006
12. Don’t we also need to consider
developers’ interactions
as defect indicators?
13. …, but the existing indicators
can NOT directly capture
developers’ interactions
14. Using Mylyn data, we propose novel
“Micro Interaction Metrics (MIMs)”
capturing developers’ interactions
The Mylyn* data is stored
as an attachment to the
corresponding bug report
in XML format
* an Eclipse plug-in for storing and recovering task contexts
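As a rough sketch, such an attachment can be read with a standard XML parser. The element and attribute names below (InteractionEvent, Kind, StructureHandle, StartDate) follow Mylyn's interaction-event format as far as we know; treat them, and the file name, as assumptions to verify against real data.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Minimal sketch: list the interaction events in a Mylyn context attachment.
// Element/attribute names are assumptions about Mylyn's format, not confirmed here.
public class MylynLogReader {
    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse("mylyn-context.xml"); // hypothetical file name
        NodeList events = doc.getElementsByTagName("InteractionEvent");
        for (int i = 0; i < events.getLength(); i++) {
            Element e = (Element) events.item(i);
            System.out.printf("%s %s %s%n",
                    e.getAttribute("StartDate"),        // when the event happened
                    e.getAttribute("Kind"),             // e.g., selection or edit
                    e.getAttribute("StructureHandle")); // which file/element
        }
    }
}
```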
22. Two levels of MIMs Design
File-level MIMs: specific interactions for a file in a task (e.g., AvgTimeIntervalEditEdit)
Task-level MIMs: property values shared over the whole task (e.g., TimeSpent)
23. Two levels of MIMs Design
File-level MIMs (e.g., AvgTimeIntervalEditEdit), illustrated on an example Mylyn task log:
10:30 Selection file A
11:00 Edit file B
12:30 Edit file B
24. Two levels of MIMs Design
Task-level MIMs (e.g., TimeSpent), illustrated on the same example Mylyn task log:
10:30 Selection file A
11:00 Edit file B
12:30 Edit file B
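As a rough sketch of how the two MIM levels read off such a log: the Event record and hard-coded timestamps below mirror the example above; none of these names come from the study's implementation.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.util.List;

// Toy sketch of a file-level and a task-level MIM on the example log above.
// Event is an illustrative stand-in, not Mylyn's API.
public class MimLevelsSketch {
    record Event(LocalTime time, String kind, String file) {}

    public static void main(String[] args) {
        List<Event> task = List.of(
                new Event(LocalTime.of(10, 30), "selection", "A"),
                new Event(LocalTime.of(11, 0), "edit", "B"),
                new Event(LocalTime.of(12, 30), "edit", "B"));

        // File-level MIM: AvgTimeIntervalEditEdit for file B
        // (gap between consecutive edits of the same file).
        long gap = Duration.between(task.get(1).time(), task.get(2).time()).toMinutes();
        System.out.println("AvgTimeIntervalEditEdit(B) = " + gap + " min"); // 90 min

        // Task-level MIM: TimeSpent (first to last event of the whole task).
        long spent = Duration.between(task.get(0).time(),
                task.get(task.size() - 1).time()).toMinutes();
        System.out.println("TimeSpent = " + spent + " min"); // 120 min
    }
}
```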
29. For example,
NumPatternSXEY captures
this interaction:
“How many times did a programmer
Select a file of group X
and then Edit a file of group Y
in a task activity?”
30. Group X or Y:
X if a file shows defect locality* properties, Y otherwise
Group H or L:
H if a file has a high** DOI value, L otherwise
* hinted by the paper [Kim et al. 2007]
** threshold: median of degree of interest (DOI) values in a task
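A minimal sketch of how a pattern metric like NumPatternSXEY could be counted once events are mapped to groups. The Event record is a hypothetical stand-in, and the sketch assumes "and then" means the immediately following event, which is an interpretation rather than the study's stated rule.

```java
import java.util.List;

// Counts how often a Selection of a group-X file is immediately
// followed by an Edit of a group-Y file (NumPatternSXEY-style).
public class PatternCounter {
    record Event(String kind, char group) {} // group: 'X' or 'Y'

    static int numPatternSXEY(List<Event> events) {
        int count = 0;
        for (int i = 0; i + 1 < events.size(); i++) {
            Event a = events.get(i), b = events.get(i + 1);
            if (a.kind().equals("selection") && a.group() == 'X'
                    && b.kind().equals("edit") && b.group() == 'Y') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<Event> task = List.of(
                new Event("selection", 'X'),
                new Event("edit", 'Y'),  // S-X then E-Y: counted
                new Event("edit", 'X'));
        System.out.println(numPatternSXEY(task)); // 1
    }
}
```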
52. The Understand JAVA tool was used
to extract 32 source Code
Metrics (CMs)*
[Figure: list of selected source code metrics; CMs extracted from the last CVS revision on the Dec 2005 – Sep 2010 timeline]
* Chidamber and Kemerer, and OO metrics
54. Fifteen History Metrics (HMs)* were
collected from the corresponding
CVS repository
[Figure: list of history metrics (HMs); HMs computed from the CVS revisions over the Dec 2005 – Sep 2010 timeline]
* Moser et al.
55. STEP3: Creating a training corpus
Instance (Name, Extracted MIMs, Label) → Training → Classifier
Instance (Name, Extracted MIMs, # of post-defects) → Training → Regression
56. STEP4: Building prediction models
Classification and Regression
modeling with different machine
learning algorithms using the
WEKA* tool
* an open source data mining tool
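A minimal sketch of STEP4 with the WEKA Java API, assuming the STEP3 corpus has been exported to an ARFF file (the name mim_corpus.arff is hypothetical) with the label as the last attribute; J48 merely stands in for whichever algorithms the study compared.

```java
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

// Builds and cross-validates one classification model with WEKA.
public class MimPredictionModel {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("mim_corpus.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1); // buggy/clean label

        Classifier model = new J48(); // decision tree; swap in other WEKA learners
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(model, data, 10, new java.util.Random(1)); // 10-fold CV
        System.out.println(eval.toClassDetailsString()); // precision/recall/F per class
    }
}
```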
59. STEP5: Prediction Evaluation
Classification Measures:
Precision(B): how many instances are really buggy among the buggy-predicted outcomes?
Recall(B): how many instances are correctly predicted as ‘buggy’ among the real buggy ones?
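The two questions above are the standard precision and recall definitions. A small sketch with made-up counts, plus the F-measure reported on the result slides:

```java
// Standard classification measures behind the two questions above.
// The confusion-matrix counts are illustrative.
public class ClassificationMeasures {
    public static void main(String[] args) {
        int tp = 30; // truly buggy, predicted buggy
        int fp = 20; // clean, but predicted buggy
        int fn = 10; // buggy, but predicted clean

        double precision = (double) tp / (tp + fp); // buggy among buggy-predicted
        double recall    = (double) tp / (tp + fn); // caught among really buggy
        double fMeasure  = 2 * precision * recall / (precision + recall);

        System.out.printf("Precision(B)=%.2f Recall(B)=%.2f F-measure(B)=%.2f%n",
                precision, recall, fMeasure);
    }
}
```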
61. STEP5: Prediction Evaluation
Regression Measures: agreement between the # of real buggy instances and the # of instances predicted as buggy
correlation coefficient (-1~1)
mean absolute error (0~1)
root square error (0~1)
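A small sketch of the three measures on made-up per-instance defect counts. The slide's "root square error" is computed here as root mean squared error, which is an assumption about the intended measure.

```java
import java.util.Arrays;

// Illustrative computation of the regression measures above; data is made up.
public class RegressionMeasures {
    public static void main(String[] args) {
        double[] actual    = {0, 2, 1, 0, 3};
        double[] predicted = {0, 1, 1, 1, 2};
        int n = actual.length;

        double meanA = Arrays.stream(actual).average().orElse(0);
        double meanP = Arrays.stream(predicted).average().orElse(0);

        double cov = 0, varA = 0, varP = 0, absErr = 0, sqErr = 0;
        for (int i = 0; i < n; i++) {
            cov  += (actual[i] - meanA) * (predicted[i] - meanP);
            varA += Math.pow(actual[i] - meanA, 2);
            varP += Math.pow(predicted[i] - meanP, 2);
            absErr += Math.abs(predicted[i] - actual[i]);
            sqErr  += Math.pow(predicted[i] - actual[i], 2);
        }

        System.out.println("correlation coefficient = " + cov / Math.sqrt(varA * varP));
        System.out.println("mean absolute error     = " + absErr / n);
        System.out.println("root squared error      = " + Math.sqrt(sqErr / n)); // RMSE
    }
}
```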
62. T-test over 100 runs of
10-fold cross validation
Reject H0* and accept H1*
if p-value < 0.05
(at the 95% confidence level)
* H0: no difference in average performance; H1: a difference (better!)
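A sketch of the significance test using Apache Commons Math (an assumption; the slide does not name a library), comparing per-run scores from repeated cross-validation. The arrays are made up, and the two-sample form is used, though the study may have used a paired variant.

```java
import org.apache.commons.math3.stat.inference.TTest;

// Compares two sets of per-run performance scores (e.g., F-measures).
// The real study collected 100 x 10-fold CV results per model.
public class SignificanceTest {
    public static void main(String[] args) {
        double[] mimScores = {0.62, 0.65, 0.61, 0.66, 0.63, 0.64, 0.62, 0.65, 0.64, 0.63};
        double[] cmScores  = {0.48, 0.50, 0.47, 0.51, 0.49, 0.50, 0.48, 0.49, 0.50, 0.47};

        double pValue = new TTest().tTest(mimScores, cmScores); // two-sample t-test
        System.out.println("p-value = " + pValue);
        if (pValue < 0.05) {
            System.out.println("Reject H0: the average performance differs");
        }
    }
}
```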
63. Result Summary
MIMs improve prediction accuracy for:
1. different Eclipse project subjects
2. different machine learning algorithms
3. different model training periods
65. Prediction for different project subjects
[Figure: prediction results per project; MIM: the proposed metrics, CM: source code metrics, HM: history metrics]
66. Prediction for different project subjects
BASELINE: a dummy classifier that predicts in a purely random manner
e.g., with 12.5% buggy instances:
Precision(B)=12.5%, Recall(B)=50%, F-measure(B)=20%
[Figure: prediction results vs. the baseline; MIM: the proposed metrics, CM: source code metrics, HM: history metrics]
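As a sanity check on those baseline figures (a worked step, not on the original slide): a purely random classifier flags about half of all instances as buggy, so it recovers half of the real buggy ones (Recall(B) = 50%), while the buggy fraction among its positive predictions equals the 12.5% base rate (Precision(B) = 12.5%). The F-measure then follows:

```latex
F(B) = \frac{2 \cdot P \cdot R}{P + R}
     = \frac{2 \times 0.125 \times 0.5}{0.125 + 0.5}
     = \frac{0.125}{0.625} = 0.20
```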
67. Prediction for different project subjects
[Figure: prediction results, continued; MIM: the proposed metrics, CM: source code metrics, HM: history metrics]
68. Prediction for different project subjects
[Table: T-test results; significant figures in bold, p-value < 0.05]
78. Possible Insight
TOP 1: NumLowDOIEdit
TOP 2: NumPatternEXSX
TOP 3: TimeSpentOnEdit
Chances are that more defects are introduced when a programmer repeatedly edits and browses files related to previous defects (TOP 2), spends more of the task time on editing (TOP 3), and, in particular, edits files that were accessed less frequently or less recently (TOP 1) …
84. We believe future defect prediction
models will use more of developers’ direct,
micro-level interaction information
MIMs are a first step towards it
85. Thank you! Any Questions?
• Problem
– Can developers’ interaction information affect
software quality (defects)?
• Approach
– We proposed novel micro interaction metrics
(MIMs) to overcome the limits of the popular static metrics
• Result
– MIMs significantly improve prediction accuracy
compared to source code metrics (CMs) and
history metrics (HMs)
89. There is a chance of error in counting post-defects,
which results in biased labels
(i.e., an incorrect % of buggy instances)
90. We repeated the experiment using the
same instances but with a
different defect-counting heuristic,
a CVS-log-based approach*
* with keywords: “fix”, “bug”, “bug report ID” in change logs
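A sketch of such a keyword heuristic; the regular expression and sample messages are illustrative, not the study's exact matching rules.

```java
import java.util.List;
import java.util.regex.Pattern;

// Flags a change as defect-fixing when its CVS log message mentions
// a fix keyword or something shaped like a bug report ID.
public class FixKeywordMatcher {
    private static final Pattern FIX_PATTERN =
            Pattern.compile("\\b(fix(e[sd])?|bug)\\b|#\\d+", Pattern.CASE_INSENSITIVE);

    public static void main(String[] args) {
        List<String> logs = List.of(
                "Fix NPE in editor startup (bug 123456)",
                "Refactor build scripts",
                "fixed #987");
        for (String log : logs) {
            System.out.println(FIX_PATTERN.matcher(log).find() + " <- " + log);
        }
    }
}
```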
94. The CVS-log-based approach reported
additional post-defects
(a higher % of buggy-labeled instances)
MIMs failed to capture them due to
the lack of corresponding Mylyn data
95. Note that it is difficult to
fully guarantee the quality of
CVS change logs
(e.g., no explicit bug ID, missing logs)