ICSM10a.ppt
1. Physical and Conceptual Identifier Dispersion: Measures and Relation to Fault Proneness
Venera Arnaoudova, Laleh Eshkevari, Rocco Oliveto, Yann-Gaël Guéhéneuc, Giuliano Antoniol
SOCCER Lab. – DGIGL, École Polytechnique de Montréal, Qc, Canada
SE@SA Lab – DMI, University of Salerno - Salerno - Italy
Ptidej Team – DGIGL, École Polytechnique de Montréal, Qc, Canada
September 15, 2010
SOCCER: SOftware Cost-effective Change and Evolution Research Lab
SE@SA: Software Engineering @ SAlerno
Ptidej: Pattern Trace Identification, Detection, and Enhancement in Java
2. Outline
Introduction
Our study
Dispersion measures
Our study - refined
Case study
  RQ1 – Metric Relevance
  RQ2 – Relation to Faults
Conclusions and future work
3. Introduction
Fault identification
  size (e.g., [Gyimóthy et al., 2005])
  cohesion (e.g., [Liu et al., 2009])
  coupling (e.g., [Marcus et al., 2008])
  number of changes (e.g., [Zimmermann et al., 2007])
Importance of linguistic information
  program comprehension (e.g., [Takang et al., 1996, Deissenboeck and Pizka, 2006, Haiduc and Marcus, 2008, Binkley et al., 2009])
  code quality (e.g., [Marcus et al., 2008, Poshyvanyk and Marcus, 2006, Butler et al., 2009])
4. Our study
Term dispersion
We are interested in studying the relation between term dispersion and the quality of the source code.
  term: basic component of identifiers
  dispersion: the way terms are scattered among different entities (attributes and methods)
  quality: absence of faults
Example: What is the impact of using getRelativePath, returnAbsolutePath, and setPath as method names on the fault proneness of those methods?
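To make the notion of a term concrete, the sketch below splits the three method names from the example into terms and counts how often each term occurs. The splitting rule (camelCase and underscore boundaries) and the helper name split_identifier are assumptions made for illustration; the slides do not say how identifiers were tokenized.

```python
import re
from collections import Counter

def split_identifier(identifier):
    """Split a camelCase or under_score identifier into lowercase terms.

    Hypothetical helper: the splitting rule is an assumption, not the
    tokenizer used in the study.
    """
    terms = []
    for part in identifier.split("_"):
        # Acronyms (XML), capitalized or lowercase words (Path, get), digits.
        terms += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part)
    return [t.lower() for t in terms]

method_names = ["getRelativePath", "returnAbsolutePath", "setPath"]

# Count how often each term occurs across the three method names.
term_counts = Counter(t for name in method_names for t in split_identifier(name))

for term, count in term_counts.most_common():
    print(term, count)
```

Under this splitting rule, the term path is the only one shared by all three methods, which is the kind of scattering the dispersion measures are meant to quantify.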
5. Dispersion measures (1/3)
Physical dispersion - Entropy
[Figure: occurrences of the terms fee, foo, and bar in the entities E1 to E5, with the entropy of each term.]
The circle indicates the occurrences of a term in an entity.
The larger the circle, the higher the number of occurrences.
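The slide does not give the entropy formula. A minimal sketch, assuming the usual Shannon entropy over a term's occurrence counts normalized across entities, could look as follows; the function name term_entropy and the occurrence counts are hypothetical and are not read off the figure.

```python
import math

def term_entropy(occurrences):
    """Shannon entropy (in bits) of one term's occurrences across entities.

    `occurrences` holds the number of times the term appears in each entity.
    Assumption: counts are normalized into a probability distribution.
    """
    total = sum(occurrences)
    if total == 0:
        return 0.0
    entropy = 0.0
    for n in occurrences:
        if n > 0:
            p = n / total
            entropy -= p * math.log2(p)
    return entropy

# Hypothetical counts for the entities E1..E5.
spread_evenly = [1, 1, 1, 1, 1]   # term scattered over all five entities
concentrated = [5, 0, 0, 0, 0]    # term used in a single entity

print(term_entropy(spread_evenly))
print(term_entropy(concentrated))
```

A term spread evenly over many entities gets the highest entropy, while a term confined to one entity gets zero, which matches the reading of entropy as physical dispersion.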
6. Dispersion measures (2/3)
Conceptual dispersion - Context Coverage
[Figure: the entities E1 to E5 grouped into the entity contexts C1 to C4, and the context coverage of the terms fee, foo, and bar over those contexts.]
Entity contexts are identified taking into account the terms contained in the entities.
The star indicates that the term appears in the particular context.
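The slide does not define how context coverage is computed, nor how contexts are derived. The sketch below is only one possible reading: it assumes the contexts are already given and takes coverage to be the fraction of contexts in which a term appears. The function name, the entity-to-context assignment, and the term sets are all hypothetical.

```python
def context_coverage(term, entity_terms, entity_context):
    """Fraction of contexts containing at least one entity that uses `term`.

    Assumption: contexts are precomputed; this ratio is an illustrative
    stand-in, not the definition used in the study.
    """
    all_contexts = set(entity_context.values())
    if not all_contexts:
        return 0.0
    covered = {
        entity_context[entity]
        for entity, terms in entity_terms.items()
        if term in terms
    }
    return len(covered) / len(all_contexts)

# Hypothetical data: which terms each entity contains, and its context.
entity_terms = {
    "E1": {"fee", "foo"},
    "E2": {"fee"},
    "E3": {"fee", "bar"},
    "E4": {"fee"},
    "E5": {"foo"},
}
entity_context = {"E1": "C1", "E2": "C2", "E3": "C3", "E4": "C4", "E5": "C1"}

for term in ("fee", "foo", "bar"):
    print(term, context_coverage(term, entity_terms, entity_context))
```

With this data, fee appears in every context while foo and bar each stay within a single one.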
7. Dispersion measures (3/3)
Aggregated metric - numHEHCC
[Figure: terms placed by Entropy and Context Coverage; the thresholds th_H (on entropy) and th_CC (on context coverage) split the plane into four regions.]
9. Dispersion measures (3/3)
Aggregated metric - numHEHCC
Region below th_H and below th_CC:
  H: used in few identifiers
  CC: used in similar contexts
11. Dispersion measures (3/3)
Aggregated metric - numHEHCC
Region above th_H and below th_CC:
  H: used in many identifiers
  CC: used in similar contexts
13. Dispersion measures (3/3)
Aggregated metric - numHEHCC
Region below th_H and above th_CC:
  H: used in few identifiers
  CC: used in different contexts
17. Dispersion measures (3/3)
Aggregated metric - numHEHCC
Region above th_H and above th_CC:
  H: used in many identifiers
  CC: used in different contexts
For each entity, numHEHCC counts the number of such terms.
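The counting rule stated on this slide can be sketched as below: for each entity, count the terms whose entropy exceeds th_H and whose context coverage exceeds th_CC. The function name num_hehcc, the strict comparison, the threshold values, and the sample data are assumptions for illustration; the slides do not state how the thresholds were chosen.

```python
def num_hehcc(entity_terms, entropy, coverage, th_h, th_cc):
    """Per entity, count terms with high entropy and high context coverage.

    Assumption: "high" means strictly greater than the threshold.
    """
    return {
        entity: sum(
            1 for term in terms
            if entropy[term] > th_h and coverage[term] > th_cc
        )
        for entity, terms in entity_terms.items()
    }

# Hypothetical per-term measures and thresholds.
entropy = {"fee": 2.3, "foo": 1.0, "bar": 0.0}
coverage = {"fee": 1.0, "foo": 0.25, "bar": 0.25}
entity_terms = {
    "E1": {"fee", "foo"},
    "E3": {"fee", "bar"},
    "E5": {"foo"},
}

result = num_hehcc(entity_terms, entropy, coverage, th_h=1.5, th_cc=0.5)
print(result)
```

Here only fee lies above both thresholds, so every entity containing fee gets a count of one and E5 gets zero.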
18. Our study - refined (1/2)
Research question 1
RQ1 – Metric Relevance: Does numHEHCC capture characteristics different from size?
Our belief: Yes, it does, although we expect some overlap.
To this end, we verify the following:
1. To what extent do numHEHCC and size vary together?
2. Can size explain numHEHCC?
3. Does numHEHCC bring additional information to size for fault explanation?
19. Our study - refined (2/2)
Research question 2
RQ2 – Relation to Faults: Do term entropy and context coverage help to explain the presence of faults in an entity?
Our belief: Yes, they do!
How?
1. Estimate the risk of being faulty when entities contain terms with high entropy and high context coverage.
20. Objects
ArgoUML v0.16 – a UML modeling CASE tool.
Rhino v1.4R3 – a JavaScript/ECMAScript interpreter and compiler.

Program   LOC      # Entities   # Terms
ArgoUML   97,946   12,423       2,517
Rhino     18,163   1,624        949

We consider as entities both methods and attributes.
21. Case study
RQ1 – Metric Relevance (1/3)
Results for RQ1 – Metric Relevance
To what extent do numHEHCC and size vary together?
Correlation between numHEHCC and LOC
  ArgoUML: 40%
  Rhino: 43%
[Figure: numHEHCC and LOC.]
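The slide reports a correlation between numHEHCC and LOC without naming the coefficient. As an illustration only, the sketch below assumes Pearson's correlation and applies it to made-up per-entity values; the function name and the data are hypothetical and do not reproduce the reported 40% or 43%.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-entity measurements.
loc_values = [10, 20, 30, 40]
hehcc_values = [1, 3, 2, 4]

r = pearson(loc_values, hehcc_values)
print(f"correlation: {r:.0%}")
```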
22. Case study
RQ1 – Metric Relevance (2/3)
Results for RQ1 – Metric Relevance
Can size explain numHEHCC?
  ArgoUML: 17%
  Rhino: 19%
[Figure: Composition of numHEHCC.]
23. Case study
RQ1 – Metric Relevance (3/3)
Results for RQ1 – Metric Relevance (cont'd)
Does numHEHCC bring additional information to size for fault explanation?

Model       Variables      Coefficients   p-values
M_ArgoUML   Intercept      -1.688e+00     2e-16
            LOC            7.703e-03      8.34e-10
            numHEHCC       7.490e-02      1.42e-05
            LOC:numHEHCC   -2.819e-04     0.000211
M_Rhino     Intercept      -4.9625130     2e-16
            LOC            0.0041486      0.17100
            numHEHCC       0.2446853      0.00310
            LOC:numHEHCC   -0.0004976     0.29788
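The slide lists coefficients but does not name the model family. Assuming these are logistic regression models with an interaction term between LOC and numHEHCC, the fitted coefficients can be turned into a predicted fault probability as sketched below. The logistic form, the function name, and the sample entity values are assumptions; only the coefficients come from the table.

```python
import math

# Coefficients copied from the table above.
MODELS = {
    "ArgoUML": {"intercept": -1.688, "loc": 7.703e-03,
                "hehcc": 7.490e-02, "interaction": -2.819e-04},
    "Rhino": {"intercept": -4.9625130, "loc": 0.0041486,
              "hehcc": 0.2446853, "interaction": -0.0004976},
}

def fault_probability(system, loc, num_hehcc):
    """Predicted probability that an entity is faulty.

    Assumption: the model is logistic, so the linear predictor is passed
    through the sigmoid function.
    """
    c = MODELS[system]
    z = (c["intercept"] + c["loc"] * loc + c["hehcc"] * num_hehcc
         + c["interaction"] * loc * num_hehcc)
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical entity: 50 lines of code, with and without dispersed terms.
for system in MODELS:
    without = fault_probability(system, loc=50, num_hehcc=0)
    with_terms = fault_probability(system, loc=50, num_hehcc=5)
    print(system, round(without, 3), round(with_terms, 3))
```

Under this reading, the positive numHEHCC coefficient raises the predicted probability for a fixed size, while the negative interaction term dampens that effect as entities grow larger.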
28. Case study
Results for RQ2 – Relation to Faults (1/1)
The risk of being faulty when entities contain terms with high entropy and high context coverage.
[Figure: all entities, with 10% of the entities singled out by numHEHCC.]
Risk of being faulty?
  ArgoUML: 2 x higher
  Rhino: 6 x higher
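The slide does not say how the risk was estimated. One simple way to read "2 x higher" is as a ratio of fault rates between the selected entities and the remaining ones; the sketch below assumes that reading. The function name and all counts are hypothetical and are not the study's data.

```python
def relative_risk(faulty_selected, n_selected, faulty_rest, n_rest):
    """Ratio of the fault rate in the selected group to that in the rest.

    Assumption: risk is a plain proportion of faulty entities per group.
    """
    rate_selected = faulty_selected / n_selected
    rate_rest = faulty_rest / n_rest
    return rate_selected / rate_rest

# Hypothetical counts: 100 selected entities versus 900 remaining ones.
risk = relative_risk(faulty_selected=20, n_selected=100,
                     faulty_rest=90, n_rest=900)
print(f"{risk:.1f} x higher")
```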
29. Conclusions and future work
Conclusions
  Entropy and context coverage, together, capture characteristics different from size!
  Entropy and context coverage, together, help to explain the presence of faults in entities!
Future directions
  Replicate the study on other systems.
  Use entropy and context coverage to suggest refactoring.
  Study the impact of lexicon evolution on entropy and context coverage.
30. Thank you!
Questions?
31. References
Binkley, D., Davis, M., Lawrie, D., and Morrell, C. (2009). To CamelCase or Under_score. In Proceedings of 17th IEEE International Conference on Program Comprehension. IEEE CS Press.
Butler, S., Wermelinger, M., Yu, Y., and Sharp, H. (2009). Relating identifier naming flaws and code quality: An empirical study. In Proceedings of the 16th Working Conference on Reverse Engineering, pages 31–35. IEEE CS Press.
Deissenboeck, F. and Pizka, M. (2006). Concise and consistent naming. Software Quality Journal, 14(3):261–282.
Gyimóthy, T., Ferenc, R., and Siket, I. (2005). Empirical validation of object-oriented metrics on open source software for fault prediction. IEEE Transactions on Software Engineering, 31(10):897–910.
Haiduc, S. and Marcus, A. (2008). On the use of domain terms in source code. In Proceedings of 16th IEEE International Conference on Program Comprehension, pages 113–122. IEEE CS Press.
Liu, Y., Poshyvanyk, D., Ferenc, R., Gyimóthy, T., and Chrisochoides, N. (2009). Modelling class cohesion as mixtures of latent topics. In Proceedings of 25th IEEE International Conference on Software Maintenance, pages 233–242, Edmonton, Canada. IEEE CS Press.
Marcus, A., Poshyvanyk, D., and Ferenc, R. (2008). Using the conceptual cohesion of classes for fault prediction in object-oriented systems. IEEE Transactions on Software Engineering, 34(2):287–300.
Poshyvanyk, D. and Marcus, A. (2006). The conceptual coupling metrics for object-oriented systems. In Proceedings of 22nd IEEE International Conference on Software Maintenance, pages 469–478. IEEE CS Press.
Takang, A., Grubb, P., and Macredie, R. (1996). The effects of comments and identifier names on program comprehensibility: an experiential study. Journal of Program Languages, 4(3):143–167.
Zimmermann, T., Premraj, R., and Zeller, A. (2007). Predicting defects for Eclipse. In Proceedings of the Third International Workshop on Predictor Models in Software Engineering.