Physical and Conceptual Identifier Dispersion:
Measures and Relation to Fault Proneness

Venera Arnaoudova, Laleh Eshkevari, Rocco Oliveto,
Yann-Gaël Guéhéneuc, Giuliano Antoniol

SOCCER Lab. – DGIGL, École Polytechnique de Montréal, Qc, Canada
SE@SA Lab – DMI, University of Salerno - Salerno - Italy
Ptidej Team – DGIGL, École Polytechnique de Montréal, Qc, Canada

September 15, 2010

SOCCER: SOftware Cost-effective Change and Evolution Research Lab
SE@SA: Software Engineering @ SAlerno
Ptidej: Pattern Trace Identification, Detection, and Enhancement in Java

Outline

    Introduction
    Our study
    Dispersion measures
    Our study - refined
    Case study
        RQ1 – Metric Relevance
        RQ2 – Relation to Faults
    Conclusions and future work

Introduction

Fault identification
    size (e.g., [Gyimóthy et al., 2005])
    cohesion (e.g., [Liu et al., 2009])
    coupling (e.g., [Marcus et al., 2008])
    number of changes (e.g., [Zimmermann et al., 2007])

Importance of linguistic information
    program comprehension (e.g., [Takang et al., 1996, Deissenboeck and Pizka, 2006, Haiduc and Marcus, 2008, Binkley et al., 2009])
    code quality (e.g., [Marcus et al., 2008, Poshyvanyk and Marcus, 2006, Butler et al., 2009])

Our study

Term dispersion
    We are interested in studying the relation between term dispersion and the quality of the source code.

          term: basic component of identifiers
    dispersion: the way terms are scattered among different entities (attributes and methods)
       quality: absence of faults

    Example: What is the impact of using getRelativePath, returnAbsolutePath, and setPath as method names on the fault proneness of those methods?

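To make "term" concrete, here is a minimal sketch of how the method names in the example could be split into terms. The slides do not say how identifiers are tokenized, so the camelCase and under_score splitting rule below, and the helper name split_identifier, are assumptions made for illustration only.

```python
import re

def split_identifier(identifier: str) -> list[str]:
    """Split a camelCase or under_score identifier into lower-case terms.

    Assumed rule (not from the slides): break on underscores, digits and
    other non-letters, then on lower-to-upper case transitions.
    """
    terms = []
    for chunk in re.split(r"[_\W\d]+", identifier):
        # An acronym run (e.g. "XML") or a capitalized/lower-case word.
        terms += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", chunk)
    return [t.lower() for t in terms]

# The three method names used as the example on the slide.
methods = ["getRelativePath", "returnAbsolutePath", "setPath"]
terms_by_entity = {m: split_identifier(m) for m in methods}
```

Under this rule the term path occurs in all three entities, which is the kind of scattering the dispersion measures are meant to quantify.
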
Dispersion measures (1/3)

Physical dispersion - Entropy

    [Figure: occurrences of the terms fee, foo, and bar across the entities E1 to E5, with the resulting entropy of each term.]
    The circle indicates the occurrences of a term in an entity.
    The larger the circle, the higher the number of occurrences.

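The slide shows entropy only graphically. As a rough sketch, assuming physical dispersion is the Shannon entropy of the distribution of a term's occurrences over entities, it could be computed as follows. The formula, the function name term_entropy, and the occurrence counts are assumptions for illustration and are not read off the figure.

```python
import math

def term_entropy(occurrences: dict[str, int]) -> float:
    """Shannon entropy (in bits) of a term's occurrences across entities.

    `occurrences` maps an entity name to how often the term appears in it.
    """
    total = sum(occurrences.values())
    entropy = 0.0
    for count in occurrences.values():
        if count > 0:
            p = count / total  # share of the term's occurrences in this entity
            entropy -= p * math.log2(p)
    return entropy

# Hypothetical counts for the three terms named on the slide.
counts = {
    "fee": {"E1": 4},                                      # concentrated
    "foo": {"E1": 1, "E2": 1, "E3": 1, "E4": 1, "E5": 1},  # evenly scattered
    "bar": {"E2": 3, "E4": 1},                             # in between
}
entropies = {term: term_entropy(c) for term, c in counts.items()}
```

With these counts, a term confined to one entity gets entropy 0, and a term spread evenly over all five entities gets the maximum, log2(5).
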
Dispersion measures (2/3)

Conceptual dispersion - Context Coverage

    [Figure: entity contexts C1 to C4 grouping the entities E1 to E5.]
    Entity contexts are identified taking into account the terms contained in the entities.

    [Figure: the terms fee, foo, and bar against the contexts C1 to C4, with the resulting context coverage of each term.]
    The star indicates that the term appears in the particular context.

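A deliberately simplified sketch of context coverage follows. It assumes the entity contexts are already given and takes coverage to be the fraction of contexts in which the term appears (the stars in the figure). The slides do not give the exact formula or how contexts are built, so the definition, the function name context_coverage, and the toy data are all assumptions.

```python
def context_coverage(term: str,
                     contexts: dict[str, set[str]],
                     entity_terms: dict[str, set[str]]) -> float:
    """Fraction of contexts containing at least one entity that uses `term`."""
    covered = sum(
        1
        for entities in contexts.values()
        if any(term in entity_terms[e] for e in entities)
    )
    return covered / len(contexts)

# Hypothetical terms per entity and a hypothetical grouping into contexts.
entity_terms = {
    "E1": {"fee", "foo"},
    "E2": {"foo"},
    "E3": {"foo", "bar"},
    "E4": {"foo"},
    "E5": {"bar"},
}
contexts = {"C1": {"E1"}, "C2": {"E2", "E5"}, "C3": {"E3"}, "C4": {"E4"}}

coverage = {t: context_coverage(t, contexts, entity_terms)
            for t in ("fee", "foo", "bar")}
```

In this toy setup fee appears in one context out of four, bar in two, and foo in all of them.
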
Dispersion measures (3/3)

Aggregated metric - numHEHCC

    [Figure: terms plotted by entropy (H, with threshold th_H) against context coverage (CC, with threshold th_CC). The two thresholds split the plane into four regions:]

    low H,  low CC     H: used in few identifiers     CC: used in similar contexts
    high H, low CC     H: used in many identifiers    CC: used in similar contexts
    low H,  high CC    H: used in few identifiers     CC: used in different contexts
    high H, high CC    H: used in many identifiers    CC: used in different contexts  (!)

    For each entity, numHEHCC counts the number of such terms (high entropy and high context coverage).

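The counting step can be sketched as below. The threshold values, the strict comparison, the function name num_hehcc, and the toy entropy and coverage values are assumptions. The slides only state that numHEHCC counts, per entity, the terms with high entropy and high context coverage.

```python
def num_hehcc(entity_terms: dict[str, set[str]],
              entropy: dict[str, float],
              coverage: dict[str, float],
              th_h: float,
              th_cc: float) -> dict[str, int]:
    """For each entity, count its terms above both thresholds."""
    return {
        entity: sum(
            1 for t in terms
            if entropy[t] > th_h and coverage[t] > th_cc
        )
        for entity, terms in entity_terms.items()
    }

# Hypothetical per-term measures and thresholds.
entropy = {"fee": 0.0, "foo": 2.3, "bar": 0.8}
coverage = {"fee": 0.25, "foo": 1.0, "bar": 0.5}
entity_terms = {"E1": {"fee", "foo"}, "E3": {"foo", "bar"}, "E5": {"bar"}}

result = num_hehcc(entity_terms, entropy, coverage, th_h=1.0, th_cc=0.6)
```

Here only foo lies in the high-H, high-CC region, so E1 and E3 each score 1 and E5 scores 0.
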
Our study - refined (1/2)

Research question 1

    RQ1 – Metric Relevance: Does numHEHCC capture characteristics different from size?

    Our belief: Yes, it does, although we expect some overlap.

    To this end, we verify the following:
     1. To what extent do numHEHCC and size vary together?
     2. Can size explain numHEHCC?
     3. Does numHEHCC bring additional information to size for fault explanation?

Our study - refined (2/2)

Research question 2

    RQ2 – Relation to Faults: Do term entropy and context coverage help to explain the presence of faults in an entity?

    Our belief: Yes, they do!

    How?
     1. Estimate the risk of being faulty when entities contain terms with high entropy and high context coverage.

Objects

    ArgoUML v0.16 – a UML modeling CASE tool.
    Rhino v1.4R3 – a JavaScript/ECMAScript interpreter and compiler.

    Program    LOC       # Entities    # Terms
    ArgoUML    97,946    12,423        2,517
    Rhino      18,163     1,624          949

    We consider as entities both methods and attributes.

Case study
RQ1 – Metric Relevance (1/3)

Results for RQ1 – Metric Relevance

    To what extent do numHEHCC and size vary together?

    Correlation between numHEHCC and LOC
        ArgoUML: 40%
        Rhino: 43%

    [Figure: numHEHCC and LOC.]

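The slide does not name the correlation coefficient used. As an illustration only, the sketch below computes a Pearson correlation between per-entity LOC and numHEHCC values. The choice of Pearson and the helper name pearson are assumptions, and the data are made up.

```python
import math

def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-entity measurements (not taken from ArgoUML or Rhino).
loc = [5, 12, 30, 8, 45, 20]
num_hehcc_values = [1, 0, 4, 3, 2, 5]
r = pearson(loc, num_hehcc_values)
```

A moderate value such as the 40% and 43% reported above would indicate that the two measures overlap without being interchangeable.
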
Case study
RQ1 – Metric Relevance (2/3)

Results for RQ1 – Metric Relevance

    Can size explain numHEHCC?
        ArgoUML: 17%
        Rhino: 19%

    [Figure: Composition of numHEHCC.]

Case study
RQ1 – Metric Relevance (3/3)

Results for RQ1 – Metric Relevance (cont'd)

    Does numHEHCC bring additional information to size for fault explanation?

    Model       Variables       Coefficients    p-values
    M_ArgoUML   Intercept       -1.688e+00      2e-16
                LOC              7.703e-03      8.34e-10
                numHEHCC         7.490e-02      1.42e-05
                LOC:numHEHCC    -2.819e-04      0.000211
    M_Rhino     Intercept       -4.9625130      2e-16
                LOC              0.0041486      0.17100
                numHEHCC         0.2446853      0.00310
                LOC:numHEHCC    -0.0004976      0.29788

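To show how the coefficients in the table combine, here is a sketch that plugs the M_ArgoUML values into a logistic model with an interaction term. The slides do not state the model family. Logistic regression is assumed here because the outcome (faulty or not) is binary, and the function name fault_probability and the example inputs are hypothetical.

```python
import math

# Coefficients copied from the M_ArgoUML rows of the table.
M_ARGOUML = {
    "intercept": -1.688,
    "loc": 7.703e-03,
    "num_hehcc": 7.490e-02,
    "interaction": -2.819e-04,  # LOC:numHEHCC
}

def fault_probability(coef: dict[str, float], loc: float, num_hehcc: float) -> float:
    """Predicted probability of a fault under the assumed logistic model."""
    z = (coef["intercept"]
         + coef["loc"] * loc
         + coef["num_hehcc"] * num_hehcc
         + coef["interaction"] * loc * num_hehcc)
    return 1.0 / (1.0 + math.exp(-z))

# Two hypothetical entities of the same size, differing only in numHEHCC.
p_without = fault_probability(M_ARGOUML, loc=20, num_hehcc=0)
p_with = fault_probability(M_ARGOUML, loc=20, num_hehcc=5)
```

For entities of this size the positive numHEHCC coefficient outweighs the negative interaction term, so the predicted probability rises with numHEHCC.
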
Case study
Results for RQ2 – Relation to Faults (1/1)

    The risk of being faulty when entities contain terms with high entropy and high context coverage.

    [Figure: all entities, with a subset labelled "10% of the entities" singled out by numHEHCC.]

    Risk of being faulty?
        ArgoUML: 2 x higher
        Rhino: 6 x higher

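The slide reports how many times higher the risk is but not how it was measured. Two common ways to compare a selected group of entities against the rest are the relative risk and the odds ratio, sketched below. Which one underlies the reported figures is not stated, and the counts and function names are hypothetical.

```python
def relative_risk(faulty_sel: int, total_sel: int,
                  faulty_rest: int, total_rest: int) -> float:
    """Ratio of the fault rate in the selected group to that in the rest."""
    return (faulty_sel / total_sel) / (faulty_rest / total_rest)

def odds_ratio(faulty_sel: int, total_sel: int,
               faulty_rest: int, total_rest: int) -> float:
    """Ratio of the odds of being faulty in the selected group to the rest."""
    odds_sel = faulty_sel / (total_sel - faulty_sel)
    odds_rest = faulty_rest / (total_rest - faulty_rest)
    return odds_sel / odds_rest

# Hypothetical system: 1,000 entities, 100 of them (10%) selected by numHEHCC.
rr = relative_risk(faulty_sel=20, total_sel=100, faulty_rest=90, total_rest=900)
orr = odds_ratio(faulty_sel=20, total_sel=100, faulty_rest=90, total_rest=900)
```

With these made-up counts the selected entities are twice as likely to be faulty as the rest, which is the kind of statement "2 x higher" expresses.
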
Conclusions and future work

Conclusions
    Entropy and context coverage, together, capture characteristics different from size!
    Entropy and context coverage, together, help to explain the presence of faults in entities!

Future directions
    Replicate the study to other systems.
    Use entropy and context coverage to suggest refactoring.
    Study the impact of lexicon evolution on entropy and context coverage.

Thank you!

    Questions?

References

    Binkley, D., Davis, M., Lawrie, D., and Morrell, C. (2009).
    To CamelCase or Under_score.
    In Proceedings of 17th IEEE International Conference on Program Comprehension. IEEE CS Press.

    Butler, S., Wermelinger, M., Yu, Y., and Sharp, H. (2009).
    Relating identifier naming flaws and code quality: An empirical study.
    In Proceedings of the 16th Working Conference on Reverse Engineering, pages 31–35. IEEE CS Press.

    Deissenboeck, F. and Pizka, M. (2006).
    Concise and consistent naming.
    Software Quality Journal, 14(3):261–282.

    Gyimóthy, T., Ferenc, R., and Siket, I. (2005).
    Empirical validation of object-oriented metrics on open source software for fault prediction.
    IEEE Transactions on Software Engineering, 31(10):897–910.

    Haiduc, S. and Marcus, A. (2008).
    On the use of domain terms in source code.
    In Proceedings of 16th IEEE International Conference on Program Comprehension, pages 113–122. IEEE CS Press.

    Liu, Y., Poshyvanyk, D., Ferenc, R., Gyimóthy, T., and Chrisochoides, N. (2009).
    Modelling class cohesion as mixtures of latent topics.
    In Proceedings of 25th IEEE International Conference on Software Maintenance, pages 233–242, Edmonton, Canada. IEEE CS Press.

    Marcus, A., Poshyvanyk, D., and Ferenc, R. (2008).
    Using the conceptual cohesion of classes for fault prediction in object-oriented systems.
    IEEE Transactions on Software Engineering, 34(2):287–300.

    Poshyvanyk, D. and Marcus, A. (2006).
    The conceptual coupling metrics for object-oriented systems.
    In Proceedings of 22nd IEEE International Conference on Software Maintenance, pages 469–478. IEEE CS Press.

    Takang, A., Grubb, P., and Macredie, R. (1996).
    The effects of comments and identifier names on program comprehensibility: an experiential study.
    Journal of Programming Languages, 4(3):143–167.

    Zimmermann, T., Premraj, R., and Zeller, A. (2007).
    Predicting defects for Eclipse.
    In Proceedings of the Third International Workshop on Predictor Models in Software Engineering.

ICSM10a.ppt

  • 1. Physical and Conceptual Identifier Dispersion: Measures and Relation to Fault Proneness
       Venera Arnaoudova, Laleh Eshkevari, Rocco Oliveto, Yann-Gaël Guéhéneuc, Giuliano Antoniol
       SOCCER Lab. – DGIGL, École Polytechnique de Montréal, Qc, Canada
       SE@SA Lab – DMI, University of Salerno, Salerno, Italy
       Ptidej Team – DGIGL, École Polytechnique de Montréal, Qc, Canada
       September 15, 2010
       SOCCER: SOftware Cost-effective Change and Evolution Research Lab
       SE@SA: Software Engineering @ SAlerno
       Ptidej: Pattern Trace Identification, Detection, and Enhancement in Java
  • 2. Outline
       Introduction
       Our study
       Dispersion measures
       Our study - refined
       Case study
         RQ1 – Metric Relevance
         RQ2 – Relation to Faults
       Conclusions and future work
  • 3. Introduction
       Fault identification
         size (e.g., [Gyimóthy et al., 2005])
         cohesion (e.g., [Liu et al., 2009])
         coupling (e.g., [Marcus et al., 2008])
         number of changes (e.g., [Zimmermann et al., 2007])
       Importance of linguistic information
         program comprehension (e.g., [Takang et al., 1996, Deissenboeck and Pizka, 2006, Haiduc and Marcus, 2008, Binkley et al., 2009])
         code quality (e.g., [Marcus et al., 2008, Poshyvanyk and Marcus, 2006, Butler et al., 2009])
  • 4. Our study
       Term dispersion
       We are interested in studying the relation between term dispersion and the quality of the source code.
         term: basic component of identifiers
         dispersion: the way terms are scattered among different entities (attributes and methods)
         quality: absence of faults
       Example: What is the impact of using getRelativePath, returnAbsolutePath, and setPath as method names on the fault proneness of those methods?
  • 5. Dispersion measures (1/3)
       Physical dispersion - Entropy
       [Figure: terms (fee, foo, bar) plotted against entities (E1 to E5), with the entropy of each term.]
       The circle indicates the occurrences of a term in an entity. The larger the circle, the higher the number of occurrences.
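The slide shows entropy only as a picture, so the following is a minimal sketch of one plausible reading: the Shannon entropy of how a term's occurrences are distributed over entities. The formula, the function name `term_entropy`, and the occurrence counts are assumptions made for illustration, not values taken from the slides.

```python
import math


def term_entropy(occurrences):
    """Shannon entropy (in bits) of one term's occurrence counts across entities.

    `occurrences` maps an entity name to how often the term occurs in it.
    Assumption: the probability of an entity is its share of the term's
    total occurrences; the slides do not spell out the formula.
    """
    total = sum(occurrences.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in occurrences.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical counts, loosely mirroring the figure's fee/foo/bar layout.
fee = {"E1": 4}                                # concentrated in one entity
foo = {"E1": 1, "E2": 1, "E3": 1, "E4": 1}     # spread evenly over four entities

print(term_entropy(fee), term_entropy(foo))
```

Under this reading, a term that occurs in a single entity has zero entropy, and the value grows as the occurrences spread more evenly over more entities.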
  • 6. Dispersion measures (2/3)
       Conceptual dispersion - Context Coverage
       [Figure: entities E1 to E5 grouped into entity contexts C1 to C4; terms (fee, foo, bar) plotted against contexts C1 to C4, with the context coverage of each term.]
       Entity contexts are identified taking into account the terms contained in the entities.
       The star indicates that the term appears in the particular context.
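The slide does not give a formula for context coverage, so this is only a rough sketch under a stated simplification: each entity is represented by the bag of terms it contains, and a term's coverage is one minus the average pairwise cosine similarity of the entities that contain it. The grouping of entities into contexts shown in the figure is skipped here, and `cosine`, `context_coverage`, and the sample entities are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors (dicts)."""
    dot = sum(count * v.get(term, 0) for term, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def context_coverage(term, entities):
    """One minus the mean pairwise similarity of entities containing `term`.

    `entities` maps an entity name to its term-count dict. Assumption:
    dissimilar entities stand in for "different contexts".
    """
    holders = [vec for vec in entities.values() if vec.get(term, 0) > 0]
    if len(holders) < 2:
        return 0.0
    pairs = list(combinations(holders, 2))
    mean_similarity = sum(cosine(u, v) for u, v in pairs) / len(pairs)
    return 1.0 - mean_similarity


# Hypothetical entities built from the method names used as an example earlier.
entities = {
    "getRelativePath": {"get": 1, "relative": 1, "path": 1},
    "returnAbsolutePath": {"return": 1, "absolute": 1, "path": 1},
    "setPath": {"set": 1, "path": 1},
}
print(context_coverage("path", entities))
```

With this simplification, a term used only in near-identical entities scores close to 0, and a term shared by otherwise unrelated entities scores closer to 1.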
  • 7–17. Dispersion measures (3/3)
       Aggregated metric - numHEHCC
       [Figure, built up step by step over these slides: terms placed on a plane with Entropy (H) on one axis and Context Coverage (CC) on the other, divided into four regions by the thresholds th_H and th_CC.]
         H below th_H, CC below th_CC: used in few identifiers, in similar contexts
         H above th_H, CC below th_CC: used in many identifiers, in similar contexts
         H below th_H, CC above th_CC: used in few identifiers, in different contexts
         H above th_H, CC above th_CC: used in many identifiers, in different contexts (the region flagged with "!")
       For each entity, numHEHCC counts the number of such terms, that is, terms with high entropy and high context coverage.
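The counting step described above can be sketched as follows. The function name `num_hehcc`, the strict `>` comparison against the thresholds, and all numeric values are assumptions made for illustration; the slides only name the thresholds th_H and th_CC.

```python
def num_hehcc(entity_terms, entropy, coverage, th_h, th_cc):
    """Count the distinct terms of one entity that exceed both thresholds.

    `entropy` and `coverage` map each term to its precomputed measure.
    Assumption: "high" means strictly above the threshold.
    """
    return sum(
        1
        for term in set(entity_terms)
        if entropy.get(term, 0.0) > th_h and coverage.get(term, 0.0) > th_cc
    )


# Hypothetical per-term measures and thresholds.
entropy = {"get": 3.1, "relative": 0.4, "path": 2.8}
coverage = {"get": 0.9, "relative": 0.2, "path": 0.3}

print(num_hehcc(["get", "relative", "path"], entropy, coverage, th_h=2.0, th_cc=0.5))
```

In this toy data only `get` is high on both measures, so the entity's count is 1; `path` has high entropy but stays in similar contexts and is not counted.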
  • 18. Our study - refined (1/2)
       Research question 1
       RQ1 – Metric Relevance: Does numHEHCC capture characteristics different from size?
       Our belief: Yes, it does, although we expect some overlap.
       To this end, we verify the following:
         1. To what extent do numHEHCC and size vary together?
         2. Can size explain numHEHCC?
         3. Does numHEHCC bring additional information to size for fault explanation?
  • 19. Our study - refined (2/2)
       Research question 2
       RQ2 – Relation to Faults: Do term entropy and context coverage help to explain the presence of faults in an entity?
       Our belief: Yes, they do!
       How?
         1. Estimate the risk of being faulty when entities contain terms with high entropy and high context coverage.
  • 20. Objects
       ArgoUML v0.16 – a UML modeling CASE tool.
       Rhino v1.4R3 – a JavaScript/ECMAScript interpreter and compiler.

       Program    LOC      # Entities   # Terms
       ArgoUML    97,946   12,423       2,517
       Rhino      18,163   1,624        949

       We consider as entities both methods and attributes.
  • 21. Case study: RQ1 – Metric Relevance (1/3)
       Results for RQ1 – Metric Relevance
       To what extent do numHEHCC and size vary together?
       Correlation between numHEHCC and LOC:
         ArgoUML: 40%
         Rhino: 43%
       [Figure labels: numHEHCC, LOC.]
  • 22. Case study: RQ1 – Metric Relevance (2/3)
       Results for RQ1 – Metric Relevance
       Can size explain numHEHCC?
         ArgoUML: 17%
         Rhino: 19%
       [Figure: Composition of numHEHCC.]
  • 23. Case study: RQ1 – Metric Relevance (3/3)
       Results for RQ1 – Metric Relevance (cont'd)
       Does numHEHCC bring additional information to size for fault explanation?

       Model       Variables       Coefficients   p-values
       M_ArgoUML   Intercept       -1.688e+00     2e-16
                   LOC              7.703e-03     8.34e-10
                   numHEHCC         7.490e-02     1.42e-05
                   LOC:numHEHCC    -2.819e-04     0.000211
       M_Rhino     Intercept       -4.9625130     2e-16
                   LOC              0.0041486     0.17100
                   numHEHCC         0.2446853     0.00310
                   LOC:numHEHCC    -0.0004976     0.29788
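To make the coefficients above easier to read, here is a small sketch that plugs them into a logistic model with an interaction term. That the table reports logistic regression is an assumption (the slide does not name the model family), and the function name `fault_probability` and the example inputs are hypothetical.

```python
import math

# Coefficients copied from the M_ArgoUML rows of the table.
ARGOUML = {
    "intercept": -1.688,
    "loc": 7.703e-03,
    "num_hehcc": 7.490e-02,
    "interaction": -2.819e-04,  # the LOC:numHEHCC term
}


def fault_probability(coef, loc, num_hehcc):
    """Predicted fault probability, assuming a logistic link."""
    z = (
        coef["intercept"]
        + coef["loc"] * loc
        + coef["num_hehcc"] * num_hehcc
        + coef["interaction"] * loc * num_hehcc
    )
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical entity: 10 lines of code, with and without flagged terms.
print(fault_probability(ARGOUML, loc=10, num_hehcc=0))
print(fault_probability(ARGOUML, loc=10, num_hehcc=5))
```

Under this assumed model, the positive numHEHCC coefficient raises the predicted probability for small entities, while the negative interaction term dampens that effect as LOC grows.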
  • 24–28. Case study: Results for RQ2 – Relation to Faults (1/1)
       The risk of being faulty when entities contain terms with high entropy and high context coverage.
       [Figure, built up step by step over these slides, with labels: All entities; 10% of the entities; numHEHCC.]
       Risk of being faulty?
         ArgoUML: 2 x higher
         Rhino: 6 x higher
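The slides report only the resulting factors, not how they were computed. One common way to express "x times higher" for a flagged group versus the rest is an odds ratio from a 2x2 table; the sketch below assumes that reading, and the function name `odds_ratio` and every count in it are hypothetical, not the study's data.

```python
def odds_ratio(faulty_flagged, clean_flagged, faulty_rest, clean_rest):
    """Odds of being faulty in the flagged group divided by the odds in the rest."""
    odds_flagged = faulty_flagged / clean_flagged
    odds_rest = faulty_rest / clean_rest
    return odds_flagged / odds_rest


# Hypothetical counts: 100 flagged entities (20 faulty), 900 others (100 faulty).
print(odds_ratio(faulty_flagged=20, clean_flagged=80, faulty_rest=100, clean_rest=800))
```

A value above 1 means the flagged entities are more likely to be faulty than the remaining ones; a value of 1 means no difference.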
  • 29. Conclusions and future work
       Conclusions
         Entropy and context coverage, together, capture characteristics different from size!
         Entropy and context coverage, together, help to explain the presence of faults in entities!
       Future directions
         Replicate the study on other systems.
         Use entropy and context coverage to suggest refactoring.
         Study the impact of lexicon evolution on entropy and context coverage.
  • 30. Thank you!
       Questions?
  • 31–33. Binkley, D., Davis, M., Lawrie, D., and Morrell, C. (2009). To CamelCase or Under_score. In Proceedings of 17th IEEE International Conference on Program Comprehension. IEEE CS Press.
       Butler, S., Wermelinger, M., Yu, Y., and Sharp, H. (2009). Relating identifier naming flaws and code quality: An empirical study. In Proceedings of the 16th Working Conference on Reverse Engineering, pages 31–35. IEEE CS Press.
       Deissenboeck, F. and Pizka, M. (2006). Concise and consistent naming. Software Quality Journal, 14(3):261–282.
       Gyimóthy, T., Ferenc, R., and Siket, I. (2005). Empirical validation of object-oriented metrics on open source software for fault prediction. IEEE Transactions on Software Engineering, 31(10):897–910.
       Haiduc, S. and Marcus, A. (2008). On the use of domain terms in source code. In Proceedings of 16th IEEE International Conference on Program Comprehension, pages 113–122. IEEE CS Press.
       Liu, Y., Poshyvanyk, D., Ferenc, R., Gyimóthy, T., and Chrisochoides, N. (2009). Modelling class cohesion as mixtures of latent topics. In Proceedings of 25th IEEE International Conference on Software Maintenance, pages 233–242, Edmonton, Canada. IEEE CS Press.
       Marcus, A., Poshyvanyk, D., and Ferenc, R. (2008). Using the conceptual cohesion of classes for fault prediction in object-oriented systems. IEEE Transactions on Software Engineering, 34(2):287–300.
       Poshyvanyk, D. and Marcus, A. (2006). The conceptual coupling metrics for object-oriented systems. In Proceedings of 22nd IEEE International Conference on Software Maintenance, pages 469–478. IEEE CS Press.
       Takang, A., Grubb, P., and Macredie, R. (1996). The effects of comments and identifier names on program comprehensibility: an experiential study. Journal of Programming Languages, 4(3):143–167.
       Zimmermann, T., Premraj, R., and Zeller, A. (2007). Predicting defects for Eclipse. In Proceedings of the Third International Workshop on Predictor Models in Software Engineering.