Micro Interaction Metrics for Defect Prediction

Taek Lee, Jaechang Nam, Dongyun Han, Sunghun Kim, Hoh Peter In
FSE 2011, Hungary, Sep. 5-9
Outline

• Research motivation
• The existing metrics
• The proposed metrics
• Experiment results
• Threats to validity
• Conclusion
Defect Prediction?
Why is it necessary?

Software quality assurance is inherently a resource-constrained activity!
Predicting defect-prone software entities* lets us put the best labor effort on those entities

* functions or code files
Indicators of defects

• Complexity of source code (Chidamber and Kemerer 1994)
• Frequent code changes (Moser et al. 2008)
• Previous defect information (Kim et al. 2007)
• Code dependencies (Zimmermann 2007)
Indeed, where do defects come from?

Human Error!
Programmers make mistakes, consequently defects are injected, and software fails

Human Errors → Bugs Injected → Software Fails
Programmer Interaction and Software Quality

“Errors are from cognitive breakdown while understanding and implementing requirements”
- Ko et al. 2005

“Work interruptions or task switching may affect programmer productivity”
- DeLine et al. 2006
Don’t we need to also consider developers’ interactions as defect indicators?

…, but the existing indicators can NOT directly capture developers’ interactions
Using Mylyn data, we propose novel “Micro Interaction Metrics (MIMs)” capturing developers’ interactions

The Mylyn* data is stored as an attachment to the corresponding bug reports in XML format

* Eclipse plug-in storing and recovering task contexts
<InteractionEvent … Kind=“ ” … StartDate=“ ” EndDate=“ ”
        … StructureHandle=“ ” … Interest=“ ” … >
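To make the event format concrete, here is a minimal Python sketch of reading such an attachment (the file name mylyn_context.xml and all attribute values are hypothetical; attribute names follow the schematic above):

import xml.etree.ElementTree as ET

# Parse a (hypothetical) Mylyn task-context attachment.
tree = ET.parse("mylyn_context.xml")
for event in tree.getroot().iter("InteractionEvent"):
    kind = event.get("Kind")                 # e.g., a selection or edit event
    start = event.get("StartDate")
    end = event.get("EndDate")
    handle = event.get("StructureHandle")    # the file/element the event touched
    interest = float(event.get("Interest"))  # degree-of-interest (DOI) value
    print(kind, start, end, handle, interest)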
Two levels of MIMs Design

File-level MIMs: specific interactions for a file in a task
(e.g., AvgTimeIntervalEditEdit)

Task-level MIMs: property values shared over the whole task
(e.g., TimeSpent)

Example Mylyn task log:
10:30   Selection   file A
11:00   Edit        file B
12:30   Edit        file B
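As an illustration of how a file-level MIM such as AvgTimeIntervalEditEdit could be computed from such a log, here is a minimal sketch (my own illustration, not the paper’s implementation; events are assumed to be (time, kind, file) tuples):

from datetime import datetime

def avg_time_interval_edit_edit(events, target_file):
    # Average gap (in minutes) between consecutive Edit events on one file in a task.
    edits = [t for (t, kind, f) in events if kind == "Edit" and f == target_file]
    gaps = [(b - a).total_seconds() / 60 for a, b in zip(edits, edits[1:])]
    return sum(gaps) / len(gaps) if gaps else 0.0

log = [(datetime(2010, 1, 1, 10, 30), "Selection", "file A"),
       (datetime(2010, 1, 1, 11, 0), "Edit", "file B"),
       (datetime(2010, 1, 1, 12, 30), "Edit", "file B")]
print(avg_time_interval_edit_edit(log, "file B"))  # 90.0 minutes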
The Proposed Micro Interaction Metrics
For example, NumPatternSXEY captures this interaction:

“How many times did a programmer Select a file of group X and then Edit a file of group Y in a task activity?”
group X or Y: X if a file shows defect locality* properties, Y otherwise

group H or L: H if a file has a high** DOI value, L otherwise

* hinted by the paper [Kim et al. 2007]
** threshold: median of degree of interest (DOI) values in a task
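A minimal sketch of these grouping rules as stated on the slide (the defect-locality test behind X/Y follows Kim et al. 2007 and is not detailed here, so it is passed in as an assumed predicate result):

from statistics import median

def doi_group(task_doi_values, file_doi):
    # H if the file's DOI is above the task-wide median, else L (threshold per slide).
    return "H" if file_doi > median(task_doi_values) else "L"

def locality_group(has_defect_locality):
    # X if the file shows defect locality properties (per Kim et al. 2007), else Y.
    return "X" if has_defect_locality else "Y"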
Bug Prediction Process
STEP1: Counting & Labeling Instances

All the Mylyn task data collectable from Eclipse subprojects (Dec 2005 ~ Sep 2010) are placed on a timeline (Task 1, Task 2, Task 3, …, Task i, Task i+1, Task i+2, Task i+3, …) and split at time P. The period after P is the post-defect counting period.

The number of post-defects is counted per file, from edited files only within bug-fixing tasks after P, e.g.:
f1.java = 1
f2.java = 1
f3.java = 2

Labeling rule for each file instance:
“buggy” (if # of post-defects > 0)
“clean” (if # of post-defects = 0)
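A minimal sketch of this counting-and-labeling rule (my own illustration; tasks are assumed to be (is_bug_fix, edited_files) pairs drawn from the post-defect counting period only):

from collections import Counter

def label_instances(all_files, post_period_tasks):
    # Count post-defects per file: edited files within bug-fixing tasks only.
    post_defects = Counter()
    for is_bug_fix, edited_files in post_period_tasks:
        if is_bug_fix:
            post_defects.update(edited_files)
    # "buggy" if # of post-defects > 0, "clean" otherwise.
    return {f: "buggy" if post_defects[f] > 0 else "clean" for f in all_files}

tasks = [(True, ["f3.java", "f1.java"]), (True, ["f2.java", "f3.java"])]
print(label_instances(["f1.java", "f2.java", "f3.java", "f4.java"], tasks))
# {'f1.java': 'buggy', 'f2.java': 'buggy', 'f3.java': 'buggy', 'f4.java': 'clean'}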
STEP2: Extraction of MIMs

MIMs are extracted from all tasks before time P (the metrics extraction period, Dec 2005 ~ P). For each task, MIM values are computed for each file edited in that task:

Task 1 (f3.java): MIMf3.java ← valueTask1
Task 2 (f1.java): MIMf1.java ← valueTask2
Task 3 (f2.java): MIMf2.java ← valueTask3
Task 4 (f1.java, f2.java): contributes valueTask4 to both files

For a file edited in multiple tasks, the per-task values are averaged:
MIMf1.java ← (valueTask2 + valueTask4) / 2
MIMf2.java ← (valueTask3 + valueTask4) / 2
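A minimal sketch of this per-file averaging (my own illustration; per_task_values holds one {file: MIM value} dict per task in the extraction period):

from collections import defaultdict

def aggregate_mim(per_task_values):
    # Average each file's per-task MIM values over all tasks that edited it.
    sums, counts = defaultdict(float), defaultdict(int)
    for task_values in per_task_values:
        for f, v in task_values.items():
            sums[f] += v
            counts[f] += 1
    return {f: sums[f] / counts[f] for f in sums}

per_task = [{"f3.java": 4.0},                    # Task 1
            {"f1.java": 2.0},                    # Task 2
            {"f2.java": 6.0},                    # Task 3
            {"f1.java": 4.0, "f2.java": 2.0}]    # Task 4
print(aggregate_mim(per_task))  # f1: 3.0, f2: 4.0, f3: 4.0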
The Understand JAVA tool was used to extract 32 source code metrics (CMs)* from the last CVS revision before time P.

[List of selected source code metrics]

* Chidamber and Kemerer, and OO metrics
Fifteen History Metrics (HMs)* were collected from the corresponding CVS repository (revisions up to time P).

[List of history metrics (HMs)]

* Moser et al.
STEP3: Creating a training corpus

Classification corpus: instance name | extracted MIMs | label (buggy/clean) → used for training a classifier

Regression corpus: instance name | extracted MIMs | # of post-defects → used for training a regression model
STEP4: Building prediction models

Classification and regression modeling with different machine learning algorithms using the WEKA* tool

* an open source data mining tool
STEP5: Prediction Evaluation

Classification measures:
Precision(B): how many instances are really buggy among the buggy-predicted outcomes?
Recall(B): how many instances are correctly predicted as ‘buggy’ among the real buggy ones?
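These are the standard precision and recall definitions; a minimal sketch (illustrative, not the paper’s evaluation code):

def precision_recall_f(actual, predicted, positive="buggy"):
    tp = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    fp = sum(a != positive and p == positive for a, p in zip(actual, predicted))
    fn = sum(a == positive and p != positive for a, p in zip(actual, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0  # really buggy among buggy-predicted
    recall = tp / (tp + fn) if tp + fn else 0.0     # predicted buggy among real buggy
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f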
STEP5: Prediction Evaluation

Regression measures:
correlation coefficient (-1~1): between # of real buggy instances and # of instances predicted as buggy
mean absolute error (0~1)
root square error (0~1)
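A minimal numpy sketch of these measures (illustrative; the slide’s 0~1 ranges suggest normalized error variants, which this unnormalized sketch does not reproduce exactly):

import numpy as np

def regression_measures(actual, predicted):
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    corr = np.corrcoef(actual, predicted)[0, 1]         # correlation coefficient, -1~1
    mae = np.mean(np.abs(actual - predicted))           # mean absolute error
    rmse = np.sqrt(np.mean((actual - predicted) ** 2))  # root (mean) squared error
    return corr, mae, rmse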
T-test with 100 runs of 10-fold cross validation

Reject H0* and accept H1* if p-value < 0.05 (at the 95% confidence level)

* H0: no difference in average performance; H1: different (better!)
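A minimal sketch of such a significance check (the slide does not name the exact t-test variant, so scipy’s independent two-sample t-test stands in here; the score arrays are illustrative):

from scipy import stats

# F-measures from 100 runs of 10-fold cross validation (illustrative values).
scores_mim = [0.61, 0.58, 0.63, 0.60]  # ... 100 values in practice
scores_cm = [0.41, 0.44, 0.39, 0.42]   # ... 100 values in practice

t_stat, p_value = stats.ttest_ind(scores_mim, scores_cm)
if p_value < 0.05:
    print("Reject H0: average performance differs significantly")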
Result Summary

MIMs improve prediction accuracy for:
1. different Eclipse project subjects
2. different machine learning algorithms
3. different model training periods
Prediction for different project subjects

[Table: file instances and % of defects per project subject]

BASELINE: a dummy classifier that predicts in a purely random manner
e.g., for 12.5% of buggy instances: Precision(B)=12.5%, Recall(B)=50%, F-measure(B)=20%

MIM: the proposed metrics   CM: source code metrics   HM: history metrics
T-test results (significant figures are in bold, p-value < 0.05)
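As a quick check on the baseline arithmetic: a purely random guesser that predicts “buggy” half the time attains Recall(B) = 50%, and its Precision(B) equals the base rate of buggy instances (here 12.5%), so

F-measure(B) = 2PR / (P + R) = 2 × 0.125 × 0.5 / (0.125 + 0.5) = 0.125 / 0.625 = 0.20 = 20%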
Prediction with different algorithms

T-test results (significant figures are in bold, p-value < 0.05)
Prediction in different training periods

The timeline (Dec 2005 ~ Sep 2010) is split at time P into a model training period and a model testing period, with three splits:
50% : 50%
70% : 30%
80% : 20%

T-test results (significant figures are in bold, p-value < 0.05)
The top-ranked 42 metrics (37% of all 113 metrics: MIMs+CMs+HMs) come from MIMs
Possible Insight

TOP 1: NumLowDOIEdit
TOP 2: NumPatternEXSX
TOP 3: TimeSpentOnEdit

Chances are that more defects might be generated if a programmer repeatedly edits and browses a file related to previous defects (TOP 2), spends more time on editing (TOP 3), and especially edits files accessed less frequently or less recently, i.e., with low DOI (TOP 1)…
Performance comparison with regression modeling for predicting # of post-defects

Predicting Post-Defect Numbers

T-test results (significant figures are in bold, p-value < 0.05)
Threats to Validity
• Systems examined might not be representative

• Systems are all open source projects

• Defect information might be biased
Conclusion

Our findings exemplify that developers’ interactions can affect software quality

Our proposed micro interaction metrics improve defect prediction accuracy significantly
…

We believe future defect prediction models will use more of developers’ direct, micro-level interaction information

MIMs are a first step towards it
Thank you! Any Questions?

• Problem
  – Can developers’ interaction information affect software quality (defects)?
• Approach
  – We proposed novel micro interaction metrics (MIMs), overcoming limitations of the popular static metrics
• Result
  – MIMs significantly improve prediction accuracy compared to source code metrics (CMs) and history metrics (HMs)
Backup Slides
One possible ARGUMENT: some developers may not have used Mylyn to fix bugs

This leaves a chance of error in counting post-defects and, as a result, biased labels (i.e., an incorrect % of buggy instances)
We repeated the experiment using the same instances but with a different defect-counting heuristic, a CVS-log-based approach*

* with keywords: “fix”, “bug”, “bug report ID” in change logs
Prediction with CVS-log-based approach

T-test results (significant figures are in bold, p-value < 0.05)
The CVS-log-based approach reported more additional post-defects (a higher % of buggy-labeled instances)

MIMs failed to capture these due to the lack of the corresponding Mylyn data
Note that it is difficult to fully guarantee the quality of CVS change logs (e.g., no explicit bug ID, missing logs)
