Think Locally, Act Globally
Improving Defect and Effort Prediction Models

Nicolas Bettenburg • Meiyappan Nagappan • Ahmed E. Hassan
Queen’s University • Kingston, ON, Canada

Software Analysis & Intelligence Lab
Saturday, 2 June, 12

Data Modelling in Empirical SE

Observations: measured from project data
Model: describe observations mathematically
Prediction: guide decision making
Understanding: guide process optimizations and future research

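Model Building Today

1  Split the Whole Dataset into Training Data and Testing Data.
2  Learn a model M from the Training Data.
3  Use M to compute Predictions Y for the Testing Data.
4  Compare the Predictions with the Testing Data.
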
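The workflow on this slide can be sketched in a few lines. This is a minimal illustration, not the setup used in the study: the single-predictor least-squares model, the 10% test fraction, the mean absolute error measure, and the names `fit_line`, `mean_abs_error`, and `holdout` are all assumptions made for the example.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx if sxx else 0.0  # constant model if x never varies
    return mean_y - b1 * mean_x, b1


def mean_abs_error(model, xs, ys):
    """Compare step: average absolute gap between predictions and actuals."""
    b0, b1 = model
    return sum(abs(b0 + b1 * x - y) for x, y in zip(xs, ys)) / len(xs)


def holdout(xs, ys, test_fraction=0.1, seed=0):
    """Split the whole dataset, learn M on the training part, score on the rest."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(idx) * test_fraction))
    test, train = idx[:n_test], idx[n_test:]
    model = fit_line([xs[i] for i in train], [ys[i] for i in train])
    error = mean_abs_error(model, [xs[i] for i in test], [ys[i] for i in test])
    return model, error
```

Calling `holdout(xs, ys)` returns the learned global model and its error on the held-out data, which is the "Compare" step of the diagram.
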
Much Research Effort on New Metrics and New Models!

Maybe we need to look more at the data part.

In the Field

Tom Zimmermann: "We ran 622 cross-project predictions and found that only 3.4% actually worked."

Tim Menzies: "Rather than focus on generalities, empirical SE should focus more on context-specific principles."

Taking local properties of data into consideration leads to better models!

Using Locality in Statistical Models

1  Does this principle work for statistical models?
2  Does it work for prediction?
3  Can we do better?

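Building Local Models

1  Split the Whole Dataset into Training Data and Testing Data.
2  Cluster Data: group the Training Data into clusters.
3  Learn Multiple Models: M1, M2, M3.
4  Predict Individually: each model produces its own Predictions Y for the Testing Data.
5  Compare the Predictions with the Testing Data.
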
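The local-model pipeline on this slide could look roughly like the sketch below. It is an illustration under stated assumptions rather than the procedure used in the study: the slides do not name a clustering algorithm, so a tiny one-dimensional k-means stands in for it, each cluster gets a simple least-squares line, and a test point is routed to the model of its nearest cluster centre. All function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx if sxx else 0.0
    return mean_y - b1 * mean_x, b1


def nearest(centers, x):
    """Index of the cluster centre closest to x."""
    return min(range(len(centers)), key=lambda j: abs(x - centers[j]))


def kmeans_1d(xs, k, iterations=20):
    """Tiny 1-D k-means (assumed stand-in for the clustering step)."""
    srt = sorted(xs)
    # Start the centres at evenly spaced quantiles of the data.
    centers = [srt[(2 * j + 1) * len(srt) // (2 * k)] for j in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[nearest(centers, x)].append(x)
        # Keep the old centre if a cluster ends up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers


def fit_local_models(xs, ys, k):
    """Cluster the training data, then learn one model per cluster."""
    centers = kmeans_1d(xs, k)
    buckets = [([], []) for _ in range(k)]
    for x, y in zip(xs, ys):
        bx, by = buckets[nearest(centers, x)]
        bx.append(x)
        by.append(y)
    models = [fit_line(bx, by) if bx else (0.0, 0.0) for bx, by in buckets]
    return centers, models


def predict_local(centers, models, x):
    """Predict with the model that belongs to the nearest cluster."""
    b0, b1 = models[nearest(centers, x)]
    return b0 + b1 * x
```

Predictions from `predict_local` can then be compared with the testing data in the same way as for a single global model.
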
Global Statistical Model

[Figure: f(X) plotted against X (0 to 6), from Chapter 2, "General Aspects of Fitting Regression Models", p. 34. Caption: "Figure 2.1: A linear spline function with knots at a = 1, b = 3, c = 5."]

Model fit leaves much room for improvement!

Local Statistical Model

[Figure: f(X) plotted against X (0 to 6), with two local fits labelled Model 1 and Model 2. Caption: "Figure 2.1: A linear spline function with knots at a = 1, b = 3, c = 5."]

Improved Fit!

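A small numeric illustration of why two local models can fit better than one global model. The data points below are invented for the example (a trend that changes direction at X = 3), and the split point and the residual-sum-of-squares measure are likewise assumptions, not values from the slides.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx if sxx else 0.0
    return mean_y - b1 * mean_x, b1


def rss(model, xs, ys):
    """Residual sum of squares of a fitted line on the given points."""
    b0, b1 = model
    return sum((b0 + b1 * x - y) ** 2 for x, y in zip(xs, ys))


# Hypothetical observations: rising up to X = 3, falling afterwards.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0.1, 0.9, 2.1, 3.0, 2.0, 1.1, 0.0]

# One global model over all observations.
global_rss = rss(fit_line(xs, ys), xs, ys)

# Two local models ("Model 1" and "Model 2"), split at X = 3.
left_x = [x for x in xs if x <= 3.0]
left_y = [y for x, y in zip(xs, ys) if x <= 3.0]
right_x = [x for x in xs if x > 3.0]
right_y = [y for x, y in zip(xs, ys) if x > 3.0]
local_rss = (rss(fit_line(left_x, left_y), left_x, left_y)
             + rss(fit_line(right_x, right_y), right_x, right_y))

print("global RSS:", global_rss)
print("local RSS: ", local_rss)
```

Because each local line is fitted only to its own region, the combined residual error on the fitted data cannot be larger than that of the single global line.
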
How can we use this approach to get an even better fit?

Be Even More Local!

[Figure: f(X) plotted against X (0 to 6). Caption: "Figure 2.1: A linear spline function with knots at a = 1, b = 3, c = 5."]

Great Fit!
BUT: Risk of Overfitting the Data!!

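The overfitting risk can be made concrete with a small experiment. The sketch below is an assumption-laden illustration, not part of the study: it draws synthetic noisy data from a single straight line, cuts the X range into equal-width bins, fits one line per bin, and reports the error on the training data and on fresh data. The bin counts, noise level, and function names are all hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx if sxx else 0.0
    return mean_y - b1 * mean_x, b1


def fit_piecewise(xs, ys, n_bins, lo, hi):
    """Fit one line per equal-width bin of [lo, hi]; returns a predict function."""
    fallback = fit_line(xs, ys)  # used for bins without training points
    width = (hi - lo) / n_bins

    def bin_of(x):
        return min(n_bins - 1, max(0, int((x - lo) / width)))

    buckets = [([], []) for _ in range(n_bins)]
    for x, y in zip(xs, ys):
        bx, by = buckets[bin_of(x)]
        bx.append(x)
        by.append(y)
    models = [fit_line(bx, by) if bx else fallback for bx, by in buckets]

    def predict(x):
        b0, b1 = models[bin_of(x)]
        return b0 + b1 * x

    return predict


def mse(predict, xs, ys):
    """Mean squared error of a predict function on the given points."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def make_data(n, rng):
    """Synthetic data: one global linear trend plus Gaussian noise."""
    xs = [rng.uniform(0.0, 6.0) for _ in range(n)]
    ys = [0.5 * x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


rng = random.Random(1)
train_x, train_y = make_data(60, rng)
test_x, test_y = make_data(60, rng)
for n_bins in (1, 3, 12):
    model = fit_piecewise(train_x, train_y, n_bins, 0.0, 6.0)
    print(n_bins, mse(model, train_x, train_y), mse(model, test_x, test_y))
```

The training error can only go down as the bins get smaller, since every finer set of local lines can reproduce the coarser fit. The error on fresh data carries no such guarantee, which is the risk the slide warns about.
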
Clustering independent of Fit

Optimize Local Fit wrt. Minimizing Global Overfit
Multivariate Adaptive Regression Splines (MARS)

[Figure 2.1: A linear spline function with knots at a = 1, b = 3, c = 5; f(X) plotted against X from 0 to 6. From "Chapter 2. General Aspects of Fitting Regression Models", p. 34.]

$$C(Y \mid X) = f(X) = X\beta,$$
where
$$X\beta = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4,$$
and
$$X_1 = X, \quad X_2 = (X - a)_+, \quad X_3 = (X - b)_+, \quad X_4 = (X - c)_+.$$

create local knowledge that optimizes process globally
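The linear spline in Figure 2.1 can be sketched in a few lines of Python. This is only an illustration of the hinge terms (X - knot)_+ with the knots a = 1, b = 3, c = 5 fixed by hand; MARS chooses such terms automatically. The function names and the coefficient vector `beta` are hypothetical and not taken from the talk.

```python
def hinge(x, knot):
    """Hinge term (x - knot)_+, i.e. max(0, x - knot)."""
    return max(0.0, x - knot)


def spline_basis(x, knots=(1.0, 3.0, 5.0)):
    """Return [1, X1, X2, X3, X4] with X1 = x and one hinge term per knot."""
    return [1.0, x] + [hinge(x, k) for k in knots]


def linear_spline(x, beta, knots=(1.0, 3.0, 5.0)):
    """Evaluate f(x) = beta0 + beta1*X1 + beta2*X2 + beta3*X3 + beta4*X4."""
    return sum(b * term for b, term in zip(beta, spline_basis(x, knots)))


# Hypothetical coefficients, chosen only so the slope changes at each knot.
beta = [0.0, 1.0, -2.0, 3.0, -1.5]

for x in (0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0):
    print(x, linear_spline(x, beta))
```

Each hinge term is zero to the left of its knot, so a coefficient only changes the slope of f from that knot onward. That is what lets a single global model carry locally different behaviour.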
Case Study

Dataset(s)               Predicted quantity                   Predictors
Xalan 2.6, Lucene 2.4    Post-Release Defects per Class       20 CK Metrics
CHINA                    Total Development Effort in Hours    14 FP Metrics
NasaCoc                  Development Length in Months         24 COCOMO-II Metrics
[Figure 3: Number of clusters generated by MCLUST in each run of the 10-fold cross validation. x-axis: Fold01 to Fold10; y-axis: Number of Clusters (0 to 8); one series per dataset (CHINA, Lucene 2.4, NasaCoc, Xalan 2.6).]

Paper excerpt shown on the slide (two columns, truncated at both ends):

... term for each additional prediction variable entering the regression model [23].
For practical purposes, we use a publicly available implementation of BIC-based model selection, contained in the R package: BMA. The input to the BMA implementation is the dataset itself, as well as a list of all dependent and independent variables that should be considered. In our case study, we always supply a list of all independent variables that were left after VIF analysis. The output of the BMA ...

... is too small to continue or until a maximum number of terms is reached. In our case study, the maximum number of terms is automatically determined by the implementation, and is based on the amount of independent variables we give as input. For MARS models, we use all independent variables in a dataset after VIF analysis.
The first phase often builds a model that suffers from overfitting. As a result, the second phase, called the backward phase, prunes the model, to increase the model's gen...
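The excerpt mentions BIC-based model selection, which adds a penalty term for each additional variable. A minimal sketch of that trade-off follows, using the standard BIC for a least-squares fit up to an additive constant. It is not the computation performed by the BMA package, and the residual sums of squares and model sizes are invented for illustration.

```python
import math


def bic_gaussian(rss, n, k):
    """BIC of a least-squares fit, up to an additive constant.

    rss: residual sum of squares
    n:   number of observations
    k:   number of estimated coefficients (including the intercept)
    """
    return n * math.log(rss / n) + k * math.log(n)


# Hypothetical fits on the same n = 100 observations.
n = 100
small = bic_gaussian(rss=50.0, n=n, k=3)
large = bic_gaussian(rss=48.0, n=n, k=6)  # slightly lower RSS, three more terms

# Lower BIC is preferred; the extra terms must pay for themselves.
best = "small" if small < large else "large"
print(best, round(small, 2), round(large, 2))
```

With these made-up numbers the smaller model is preferred, because the drop in residual error does not offset the penalty of three additional terms.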
Results: Goodness of Fit

Rank-Correlation (0 = worst fit, 1 = optimal fit)

Dataset       Global    Local (Clustered)    MARS
Xalan 2.6     0.33      0.52                 0.69
Lucene 2.4    0.32      0.60                 0.83
CHINA         0.83      0.89                 0.89
NasaCoc       0.93      0.97                 0.99

UP TO 2.5x BETTER FIT WHEN USING DATA LOCALITY!
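The "up to 2.5x" headline can be checked against the rank correlations on this slide. The sketch below divides each MARS value by the corresponding Global value. The dictionary layout and variable names are illustrative; only the numbers come from the slide.

```python
# Rank correlations as shown on the slide.
fit = {
    "Xalan 2.6":  {"global": 0.33, "local": 0.52, "mars": 0.69},
    "Lucene 2.4": {"global": 0.32, "local": 0.60, "mars": 0.83},
    "CHINA":      {"global": 0.83, "local": 0.89, "mars": 0.89},
    "NasaCoc":    {"global": 0.93, "local": 0.97, "mars": 0.99},
}

# Improvement factor of MARS over the global model, per dataset.
ratios = {name: row["mars"] / row["global"] for name, row in fit.items()}

best_dataset = max(ratios, key=ratios.get)
print(best_dataset, round(ratios[best_dataset], 2))  # Lucene 2.4 2.59
```

The largest gains appear on the two defect datasets, where the global fit is weakest. On CHINA and NasaCoc the global model already fits well, so the factor stays close to 1.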
Results: Prediction Error

[Bar charts, one per dataset; bar labels in legend order Global, Local, MARS]

Dataset       Global    Local     MARS
Xalan 2.6     0.64      0.52      0.4
Lucene 2.4    1.15      1.15      0.94
CHINA         765       552.85    234.43
NasaCoc       3.26      2.14      1.63

Up to 4x lower prediction error with Local Models!
Model Interpretation?
Model Interpretation

[Figure 6 (caption cut off on the slide): "Global models report general trends, while global models with local c..."
Sub-caption (a): "Part of a global Model learned on the Xalan 2.6 dataset". Sub-caption (b) is cut off ("P... 2.6 d...").
Panels, one plot per predictor: 1 avg_cc, 2 ca, 3 cam, 4 cbm, 5 ce, 6 dam, 7 dit, 8 ic, 9 lcom, 10 lcom3, 11 loc, 12 max_cc, 13 mfa, 14 moa, 15 noc, 16 npm.
Partly visible second plot: "Fold 9, Cluster 1", with panels ic, npm, mfa.
Visible text fragment: "describes the response (in this case bugs) while keeping all other prediction variab..."]
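The Figure 6 fragment describes plots of the response against one predictor while all other predictors are kept fixed. A minimal sketch of how such a curve could be computed is shown below. Fixing the other predictors at their median is an assumption, since the caption is cut off before it says which value is used. `toy_model`, its coefficients, and the sample rows are hypothetical stand-ins, not the fitted model from the talk.

```python
from statistics import median


def partial_effect(model, rows, name, grid):
    """Predicted response over `grid` for predictor `name`,
    with every other predictor fixed at its median in `rows` (assumption)."""
    reference = {key: median(r[key] for r in rows) for key in rows[0]}
    curve = []
    for value in grid:
        point = dict(reference, **{name: value})  # vary only `name`
        curve.append((value, model(point)))
    return curve


def toy_model(p):
    """Hypothetical stand-in for a fitted global model (invented coefficients)."""
    return 0.1 + 0.002 * p["loc"] + 0.05 * p["max_cc"]


# Hypothetical class-level metric rows.
rows = [
    {"loc": 100, "max_cc": 2},
    {"loc": 400, "max_cc": 5},
    {"loc": 1000, "max_cc": 9},
]

curve = partial_effect(toy_model, rows, "loc", [0, 1000, 2000])
for value, predicted in curve:
    print(value, round(predicted, 2))
```

Plotting one such curve per predictor gives a grid of panels like the one on the slide. For a purely linear global model every curve is a straight line, which is the "general trend" reading.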
Traditional Global Model: General Trends
                                                                                                                                                                                                                      19
        0.49




                                                                                                         0.46




                                                                                                                                                                                                                            w




                                                                                                                                                                                                                           0.0
                                                      0.54




                                                                                                                                                                0.60
        .47




Saturday, 2 June, 12
Model Interpretation
Traditional Global Model: General Trends
One Curve per metric, run corp on that curve

Figure 6: Global models report general trends, while global models with local considerations give insight ...
... describes the response (in this case bugs) while keeping all other prediction variables at their median value
(a) Part of a global Model learned on the Xalan 2.6 dataset
[Plots omitted. Panels: 1 avg_cc, 2 ca, 3 cam, 4 cbm, 5 ce, 6 dam, 7 dit, 8 ic, 9 lcom, 10 lcom3, 11 loc, 12 max_cc, 13 mfa, 14 moa, 15 noc, 16 npm. Fold 9, Cluster 1: ic, npm, mfa.]

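
The "one curve per metric" reading of a global model, where each curve describes the response while all other prediction variables stay at their median value, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' tooling: `effect_curve`, `toy_model`, the coefficients, and the sample rows are all hypothetical and not taken from the Xalan 2.6 data.

```python
from statistics import median


def effect_curve(predict, rows, metric, grid):
    """Predicted response as `metric` varies over `grid`,
    with every other predictor fixed at its median."""
    # Baseline point: the median of each predictor across all rows.
    baseline = {name: median(r[name] for r in rows) for name in rows[0]}
    curve = []
    for value in grid:
        point = dict(baseline)   # copy, so the baseline stays untouched
        point[metric] = value    # vary only the metric of interest
        curve.append((value, predict(point)))
    return curve


# Hypothetical toy data (three classes, three metrics).
rows = [
    {"loc": 100, "ic": 0, "npm": 5},
    {"loc": 300, "ic": 1, "npm": 12},
    {"loc": 900, "ic": 3, "npm": 40},
]


def toy_model(x):
    # Hypothetical fitted global model; the coefficients are made up.
    return 0.001 * x["loc"] + 0.2 * x["ic"] + 0.01 * x["npm"]


curve = effect_curve(toy_model, rows, "ic", [0, 1, 2, 3])
for value, bugs in curve:
    print(f"ic={value}: predicted bugs={bugs:.2f}")
```

Any fitted model can be passed as `predict`, as long as it accepts a mapping from metric name to value.
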
Model Interpretation

Figure 6: Global models report general trends, while global models with local considerations give insight ...
... describes the response (in this case bugs) while keeping all other prediction variables at their median value
[Plots omitted. Panels: 13 mfa, 14 moa, 15 noc, 16 npm; 13 npm.]

Cluster 1 ... Cluster 6
Figure 7: Example of contradicting trends in local models (Xalan 2.6, Cluster 1 and Cluster 6 in Fold 9).
[Plots omitted. Fold 9, Cluster 1: ic, npm, mfa. Fold 9, Cluster 6: ic, npm, mfa.]

[Paper excerpt, left column]
... model already partition the data into regions with individual properties. For example, we observe that an increase of ic (measuring the inheritance coupling through parent classes) is predicted to only have a negative effect on bug-proneness ...

[Paper excerpt, right column; lines are cut off in the source]
prediction models lead ... Our findings thus co... who observed a simil... WHICH machine-lear... have practical implic... using regression mod... are more insightful th... general trends across ... demonstrated that such ... particular parts of the ... in the Xalan 2.6 def... sets of classes are infl... as inheritance, cohes... reinforce the recomm... the use of a “one-size... model, when trying to ...
B. Act Globally
When the goal is carry... understanding, local m...

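
How local models can show contradicting trends that a single global fit hides can be illustrated with a small sketch. Everything here is an assumption made for illustration: the two clusters, their values, and the helper `slope` are hypothetical, and plain least-squares slopes stand in for whatever regression models the study actually used.

```python
def slope(xs, ys):
    """Least-squares slope of y on x for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical (ic, bugs) observations for two clusters of classes.
clusters = {
    "cluster_1": ([0, 1, 2, 3], [3.0, 2.0, 1.0, 0.0]),  # bugs fall as ic grows
    "cluster_6": ([0, 1, 2, 3], [0.0, 1.0, 2.0, 3.0]),  # bugs rise as ic grows
}

# Local view: one fit per cluster.
local_slopes = {name: slope(xs, ys) for name, (xs, ys) in clusters.items()}

# Global view: one fit over the pooled data.
all_x = [x for xs, _ in clusters.values() for x in xs]
all_y = [y for _, ys in clusters.values() for y in ys]
global_slope = slope(all_x, all_y)

print("local slopes:", local_slopes)
print("global slope:", global_slope)
```

In this toy data the two local trends point in opposite directions, and pooling the clusters averages them into a flat general trend.
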
Think Locally, Act Globally - Improving Defect and Effort Prediction Models

Think Locally, Act Globally - Improving Defect and Effort Prediction Models

  • 1. Think Locally, Act Globally: Improving Defect and Effort Prediction Models. Nicolas Bettenburg • Meiyappan Nagappan • Ahmed E. Hassan. Queen’s University • Kingston, ON, Canada. Software Analysis & Intelligence Lab. Saturday, 2 June, 12
  • 2. Data Modelling in Empirical SE: Observations (measured from project data).
  • 3. Data Modelling in Empirical SE: Observations (measured from project data); Model (describe observations mathematically).
  • 4. Data Modelling in Empirical SE: Observations (measured from project data); Model (describe observations mathematically); Prediction (guide decision making); Understanding (guide process optimizations and future research).
  • 5. Model Building Today: Whole Dataset.
  • 6. Model Building Today: Whole Dataset; Training Data; Testing Data.
  • 7. Model Building Today: Whole Dataset; Training Data; Learned Model M; Testing Data.
  • 8. Model Building Today: Whole Dataset; Training Data; Learned Model M; Testing Data; Predictions Y.
  • 9. Model Building Today: Whole Dataset; Training Data; Learned Model M; Testing Data; Predictions Y; Compare.
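
The workflow on slides 5 to 9 (split the whole dataset, learn one model M on the training data, predict Y for the testing data, compare) can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic, the model is a one-variable least-squares line, and the comparison uses mean absolute error; none of these choices are taken from the slides.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict(model, xs):
    intercept, slope = model
    return [intercept + slope * x for x in xs]


def mean_abs_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle the whole dataset and cut it into training and testing parts."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Synthetic (made-up) whole dataset: one metric x and one response y.
noise = random.Random(42)
rows = [(float(x), 2.0 + 0.5 * x + noise.gauss(0, 0.1)) for x in range(40)]

train, test = train_test_split(rows)
model = fit_line([x for x, _ in train], [y for _, y in train])  # learned model M
predictions = predict(model, [x for x, _ in test])              # predictions Y
error = mean_abs_error([y for _, y in test], predictions)       # compare
print(f"slope={model[1]:.3f} test MAE={error:.3f}")
```

A single model is fitted to all training rows here, which is the "global" setup the later slides contrast with local models.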
  • 10. Much Research Effort on new metrics and new models!
  • 11. Maybe we need to look more at the data part
  • 13. In the Field: Tom Zimmermann
  • 14. In the Field: "We ran 622 cross-project predictions and found that only 3.4% actually worked." (Tom Zimmermann)
  • 15. In the Field: "We ran 622 cross-project predictions and found that only 3.4% actually worked." (Tom Zimmermann); Tim Menzies
  • 16. In the Field: "We ran 622 cross-project predictions and found that only 3.4% actually worked." (Tom Zimmermann); "Rather than focus on generalities, empirical SE should focus more on context-specific principles." (Tim Menzies)
  • 17. In the Field: "We ran 622 cross-project predictions and found that only 3.4% actually worked." (Tom Zimmermann); "Rather than focus on generalities, empirical SE should focus more on context-specific principles." (Tim Menzies). Taking local properties of data into consideration leads to better models!
  • 18. Using Locality in Statistical Models
  • 19. Using Locality in Statistical Models: (1) Does this principle work for statistical models?
  • 20. Using Locality in Statistical Models: (1) Does this principle work for statistical models? (2) Does it work for Prediction?
  • 21. Using Locality in Statistical Models: (1) Does this principle work for statistical models? (2) Does it work for Prediction? (3) Can we do better?
  • 22. Building Local Models: Whole Dataset; Training Data; Learned Model M; Testing Data; Predictions Y.
  • 23. Building Local Models: Cluster Data. Whole Dataset; Training Data; Learned Model M; Testing Data; Predictions Y.
  • 24. Building Local Models: Cluster Data; Learn Multiple Models. Whole Dataset; Training Data; Learned Models M1, M2, M3; Testing Data; Predictions Y.
  • 25. Building Local Models: Cluster Data; Learn Multiple Models; Predict Individually. Whole Dataset; Training Data; Learned Models M1, M2, M3; Testing Data; Predictions Y, Y, Y.
  • 26. Building Local Models: Cluster Data; Learn Multiple Models; Predict Individually; Compare. Whole Dataset; Training Data; Learned Models M1, M2, M3; Testing Data; Predictions Y, Y, Y.
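
The three steps named on slides 23 to 26 (cluster the data, learn multiple models, predict individually) can be sketched as below. The clustering step here is a tiny one-dimensional k-means, swapped in purely for illustration; the deck itself refers to MCLUST for clustering. The data, the choice of two clusters, and all function names are assumptions made for this example.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict(model, x):
    return model[0] + model[1] * x


def nearest(centroids, x):
    """Index of the centroid closest to x."""
    return min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))


def kmeans_1d(xs, k=2, iterations=20):
    """Minimal 1-D k-means (k >= 2), a stand-in for a real clustering method."""
    lo, hi = min(xs), max(xs)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[nearest(centroids, x)].append(x)
        centroids = [sum(g) / len(g) if g else c
                     for g, c in zip(groups, centroids)]
    return centroids


def mean_abs_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Synthetic data with two regimes, so that one global line fits poorly.
noise = random.Random(7)
xs = [i * 0.25 for i in range(40)]
ys = [(1.0 + 2.0 * x if x < 5.0 else 30.0 - 3.0 * x) + noise.gauss(0, 0.2)
      for x in xs]
train = list(zip(xs[0::2], ys[0::2]))
test = list(zip(xs[1::2], ys[1::2]))
test_x = [x for x, _ in test]
test_y = [y for _, y in test]

# Global baseline: a single model for all training data.
global_model = fit_line([x for x, _ in train], [y for _, y in train])
global_error = mean_abs_error(test_y, [predict(global_model, x) for x in test_x])

# Local models: cluster the data, learn one model per cluster, and predict
# each test point with the model of its nearest cluster.
centroids = kmeans_1d([x for x, _ in train], k=2)
local_models = []
for c in range(len(centroids)):
    members = [(x, y) for x, y in train if nearest(centroids, x) == c]
    local_models.append(fit_line([x for x, _ in members],
                                 [y for _, y in members]))
local_error = mean_abs_error(
    test_y, [predict(local_models[nearest(centroids, x)], x) for x in test_x])
print(f"global MAE={global_error:.2f} local MAE={local_error:.2f}")
```

On data built with two regimes the local variant wins by construction; the sketch shows the mechanics only and says nothing about how large the gain is on real project data.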
  • 27. Global Statistical Model [Figure 2.1: A linear spline function with knots at a = 1, b = 3, c = 5; axes X and f(X)]
  • 30. Global Statistical Model [Figure 2.1: A linear spline function with knots at a = 1, b = 3, c = 5; axes X and f(X)]. Model fit leaves much room for improvement!
  • 31. Local Statistical Model [Figure 2.1: A linear spline function with knots at a = 1, b = 3, c = 5; axes X and f(X)]
  • 33. Local Statistical Model [Figure 2.1: A linear spline function with knots at a = 1, b = 3, c = 5; axes X and f(X)]. Model 1, Model 2.
  • 34. Local Statistical Model [Figure 2.1: A linear spline function with knots at a = 1, b = 3, c = 5; axes X and f(X)]. Model 1, Model 2. Improved Fit!
  • 35. How can we use this approach to get an even better fit?
  • 36. Be Even More Local! [Figure 2.1: A linear spline function with knots at a = 1, b = 3, c = 5; axes X and f(X)]
  • 39. Be Even More Local! [Figure 2.1: A linear spline function with knots at a = 1, b = 3, c = 5; axes X and f(X)]. Great Fit!
  • 40. Be Even More Local! [Figure 2.1: A linear spline function with knots at a = 1, b = 3, c = 5; axes X and f(X)]. Great Fit! BUT: Risk of Overfitting the Data!!
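
The overfitting risk flagged on slide 40 can be made concrete with a small experiment. The sketch below is an assumption-laden illustration, not the authors' setup: it splits a made-up predictor range into more and more regions, models each region by its mean response, and records training and testing error for each level of locality.

```python
import random

X_MAX = 20.0  # upper bound of the (made-up) predictor range


def fit_local_means(xs, ys, n_regions):
    """Split [0, X_MAX) into equal-width regions; model each by its mean."""
    width = X_MAX / n_regions
    sums = [0.0] * n_regions
    counts = [0] * n_regions
    for x, y in zip(xs, ys):
        region = min(int(x / width), n_regions - 1)
        sums[region] += y
        counts[region] += 1
    overall = sum(ys) / len(ys)  # fallback for regions without training data
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return width, means


def mean_abs_error(model, xs, ys):
    width, means = model
    total = 0.0
    for x, y in zip(xs, ys):
        region = min(int(x / width), len(means) - 1)
        total += abs(y - means[region])
    return total / len(xs)


noise = random.Random(1)
train_x = [float(i) for i in range(20)]
test_x = [i + 0.5 for i in range(20)]
train_y = [0.5 * x + noise.gauss(0, 1.0) for x in train_x]
test_y = [0.5 * x + noise.gauss(0, 1.0) for x in test_x]

# errors[n_regions] = (training error, testing error)
errors = {}
for n_regions in (1, 4, 20):
    model = fit_local_means(train_x, train_y, n_regions)
    errors[n_regions] = (mean_abs_error(model, train_x, train_y),
                         mean_abs_error(model, test_x, test_y))
    print(n_regions, errors[n_regions])
```

With 20 regions every training point sits alone in its region, so the training error is exactly zero while the error on unseen points is not: a "great fit" that has only memorised the training data.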
  • 42. Clustering independent of Fit
  • 43. [Figure 2.1, shown twice: A linear spline function with knots at a = 1, b = 3, c = 5; axes X and f(X)] $C(Y \mid X) = f(X) = X\beta$, where $X\beta = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4$, and $X_1 = X$, $X_2 = (X - a)_+$, $X_3 = (X - b)_+$, $X_4 = (X - c)_+$.
  • 44. Optimize Local Fit wrt. Minimizing Global Overfit [Figure 2.1: A linear spline function with knots at a = 1, b = 3, c = 5; axes X and f(X)] $C(Y \mid X) = f(X) = X\beta$, where $X\beta = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4$, and $X_1 = X$, $X_2 = (X - a)_+$, $X_3 = (X - b)_+$, $X_4 = (X - c)_+$.
  • 47. Optimize Local Fit wrt. Minimizing Global Overfit [same figure and spline equation]. Multivariate Adaptive Regression Splines (MARS)
  • 48. Optimize Local Fit wrt. Minimizing Global Overfit [same figure and spline equation]. Multivariate Adaptive Regression Splines (MARS): create local knowledge that optimizes process globally
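
The linear spline shown on these slides, with knots at a = 1, b = 3, c = 5, is built from truncated terms of the form (X - knot)+, which are zero left of the knot and linear to the right. The sketch below evaluates such a spline; the coefficient values are made up for illustration and are not taken from the deck. MARS uses hinge terms of this kind as building blocks, but this example is not a MARS implementation and does no knot selection or pruning.

```python
def hinge(x, knot):
    """Truncated linear term (x - knot)_+ = max(x - knot, 0)."""
    return max(x - knot, 0.0)


def linear_spline(x, betas, knots=(1.0, 3.0, 5.0)):
    """f(X) = b0 + b1*X + b2*(X - a)_+ + b3*(X - b)_+ + b4*(X - c)_+."""
    b0, b1, *rest = betas
    return b0 + b1 * x + sum(b * hinge(x, k) for b, k in zip(rest, knots))


# Hypothetical coefficients: the slope is 1 before the first knot and changes
# by -2, +3 and -1 at the knots a = 1, b = 3 and c = 5 respectively.
betas = (0.0, 1.0, -2.0, 3.0, -1.0)

for x in (0.0, 1.0, 3.0, 5.0, 6.0):
    print(x, linear_spline(x, betas))
```

Each coefficient after the first two is the change in slope at its knot, so the function stays continuous while its trend differs from region to region.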
  • 49. Case Study
  • 50. Case Study. Xalan 2.6 and Lucene 2.4: Post-Release Defects per Class, 20 CK Metrics.
  • 51. Case Study. Xalan 2.6 and Lucene 2.4: Post-Release Defects per Class, 20 CK Metrics. CHINA: Total Development Effort in Hours, 14 FP Metrics.
  • 52. Case Study. Xalan 2.6 and Lucene 2.4: Post-Release Defects per Class, 20 CK Metrics. CHINA: Total Development Effort in Hours, 14 FP Metrics. NasaCoc: Development Length in Months, 24 COCOMO-II Metrics.
  • 53. Results: Goodness of Fit. Rank-Correlation (0 = worst fit, 1 = optimal fit)
  • 54. Results: Goodness of Fit. Rank-Correlation (0 = worst fit, 1 = optimal fit). Xalan 2.6: Global 0.33, Local (Clustered) 0.52, MARS 0.69. Lucene 2.4: Global 0.32, Local (Clustered) 0.60, MARS 0.83. CHINA: Global 0.83, Local (Clustered) 0.89, MARS 0.89. NasaCOC: Global 0.93, Local (Clustered) 0.97, MARS 0.99.
  • 58. Results: Goodness of Fit [same table as slide 54]. Figure 3: Number of clusters generated by MCLUST in each run of the 10-fold cross validation (x-axis: Fold01 to Fold10; y-axis: Number of Clusters; datasets: CHINA, Lucene 2.4, NasaCoc, Xalan 2.6). Paper excerpt, left column: "... term for each additional prediction variable entering the regression model [23]. For practical purposes, we use a publicly available implementation of BIC-based model selection, contained in the R package: BMA. The input to the BMA implementation is the dataset itself, as well as a list of all dependent and independent variables that should be considered. In our case study, we always supply a list of all independent variables that were left after VIF analysis. The output of the BMA ..." Paper excerpt, right column: "... is too small to continue or until a maximum number of terms is reached. In our case study, the maximum number of terms is automatically determined by the implementation, and is based on the amount of independent variables we give as input. For MARS models, we use all independent variables in a dataset after VIF analysis. The first phase often builds a model that suffers from overfitting. As a result, the second phase, called the backward phase, prunes the model, to increase the model’s gen..."
  • 59. Results: Goodness of Fit [same table as slide 54]. UP TO 2.5x BETTER FIT WHEN USING DATA LOCALITY!
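
Goodness of fit on these slides is reported as a rank correlation between actual and predicted values. The slides do not say which coefficient is used; the sketch below assumes Spearman's rank correlation (Pearson correlation computed on ranks, with tied values sharing their average rank), and the sample numbers are made up.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(actual, predicted):
    """Spearman rank correlation between actual and predicted values."""
    return pearson(average_ranks(actual), average_ranks(predicted))


# Hypothetical defect counts per class and two sets of model predictions.
actual = [0, 1, 2, 5]
same_order = [0.1, 0.9, 2.5, 3.0]      # ranks every class correctly
reversed_order = [3.0, 2.5, 0.9, 0.1]  # ranks every class the wrong way round
print(spearman(actual, same_order), spearman(actual, reversed_order))
```

A rank-based measure rewards a model for ordering classes correctly even when the predicted magnitudes are off, which suits skewed defect and effort data.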
  • 60. Results: Prediction Error [bar charts for Xalan 2.6, Lucene 2.4, CHINA and NasaCoC; legend: Global, Local, MARS; bar labels: Xalan 2.6: 0.64, 0.52, 0.4; Lucene 2.4: 1.15, 1.15, 0.94; CHINA: 765, 552.85, 234.43; NasaCoC: 3.26, 2.14, 1.63]
  • 61. Results: Prediction Error [same bar charts as slide 60]. Up to 4x lower prediction error with Local Models!
  • 62. ? Model Interpretation
  • 63. Model Interpretation 0.5 1 avg_cc 2 ca 3 cam 4 cbm 0.80 1.1 0.52 1.6 −0.5 0.70 0.9 0.48 1.2 −1.5 0.60 0.7 0.44 0.50 0.5 −2.5 0.8 0 5 10 15 20 0 50 100 150 0.0 0.2 0.4 0.6 0.8 1.0 0 5 10 15 20 25 30 0.0 5 ce 6 dam 7 dit 8 ic 0.62 0.6 0.8 0.65 0.58 0.5 0.45 0.6 0.60 0.4 0.54 0.55 0.4 0.3 0.35 0.50 0.50 0.2 0 10 20 30 40 50 0.0 0.2 0.4 0.6 0.8 1.0 1 2 3 4 5 6 7 8 0 1 2 3 4 5 1 (a)lcom of a global 10 lcom3 learned on the Xalan 2.6 dataset 9 Part Model 11 loc 12 max_cc (b) P 1.8 0.7 6 2.6 d 2.0 4 0.6 5 1.4 4 3 0.5 1.5 Figure 6: Global models report general trends, while global models with local c 1.0 3 2 0.4 1.0 2 1 0.3 0.6 describes the response (in this case bugs) while keeping all other prediction variab 0.5 1 0 1000 3000 5000 0.0 0.5 1.0 1.5 2.0 0 1000 2000 3000 4000 0 20 40 60 80 120 0 Fold 9, Cluster 1 13 mfa 14 moa 15 noc 16 npm pr 0.50 0.58 1.0 0.51 ic npm mfa O 0.70 0.5 19 0.49 0.46 w 0.0 0.54 0.60 .47 Saturday, 2 June, 12
  • 64. Model Interpretation
    [Figure 6(a): Part of a global model learned on the Xalan 2.6 dataset. One panel per metric: avg_cc, ca, cam, cbm, ce, dam, dit, ic, lcom, lcom3, loc, max_cc, mfa, moa, noc, npm.]
    Figure 6: Global models report general trends, while global models with local considerations give insight ... describes the response (in this case bugs) while keeping all other prediction variables at their median value.
    Traditional Global Model: General Trends
  • 65. Model Interpretation
    [Figure 6(a): Part of a global model learned on the Xalan 2.6 dataset. One panel per metric: avg_cc, ca, cam, cbm, ce, dam, dit, ic, lcom, lcom3, loc, max_cc, mfa, moa, noc, npm.]
    Figure 6: Global models report general trends, while global models with local considerations give insight ... describes the response (in this case bugs) while keeping all other prediction variables at their median value.
    Traditional Global Model: General Trends
    One Curve per metric, run corp on that curve
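The Figure 6 caption describes each panel as the predicted response for one metric while all other prediction variables are kept at their median value. A minimal sketch of how such a curve could be computed follows; `effect_curve`, `toy_model`, its coefficients and the three sample rows are all hypothetical and are not taken from the study.

```python
from statistics import median


def effect_curve(model, rows, metric, grid):
    """Predict over `grid` values of `metric`, other metrics held at their median."""
    baseline = {name: median(row[name] for row in rows) for name in rows[0]}
    curve = []
    for value in grid:
        point = dict(baseline)   # copy so the baseline stays untouched
        point[metric] = value    # vary only the metric of interest
        curve.append((value, model(point)))
    return curve


def toy_model(metrics):
    """Hypothetical fitted model; the coefficients are made up for illustration."""
    return 0.1 + 0.002 * metrics["loc"] + 0.05 * metrics["ic"]


# Hypothetical per-class metric values
rows = [
    {"loc": 100, "ic": 0},
    {"loc": 300, "ic": 1},
    {"loc": 500, "ic": 4},
]
curve = effect_curve(toy_model, rows, "ic", [0, 1, 2])
```

Each tuple in `curve` is one point of the panel for `ic`. Repeating the call for every metric gives one curve per metric, as on the slide.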
  • 66. Model Interpretation
    Figure 6: Global models report general trends, while global models with local considerations give insight ... describes the response (in this case bugs) while keeping all other prediction variables at their median value.
    [Figure 7 panels: Fold 9, Cluster 1 and Fold 9, Cluster 6, each showing the metrics ic, npm and mfa.]
    Figure 7: Example of contradicting trends in local models (Xalan 2.6, Cluster 1 and Cluster 6 in Fold 9).
    [Paper excerpt shown in the background] ... model already partition the data into regions with individual properties. For example, we observe that an increase of ic (measuring the inheritance coupling through parent classes) is predicted to only have a negative effect on bug-proneness ...
    B. Act Globally ...