This document discusses using software metrics like lines of code (SLOC) to predict defects and compares the correlations between SLOC and defects for three Java projects. It also explores using economic inequality indices like the Gini index, Theil index, and Atkinson index applied to SLOC distributions to measure inequality in software quality within a codebase. Finally, it proposes that decomposing projects into packages or classes and applying these indices could help identify specific classes that contribute disproportionately to overall quality inequality.
1. SLOC and defect prediction
/ department of mathematics and computer science
2. By no means: A study on aggregating software metrics
Bogdan Vasilescu
Alexander Serebrenik
Mark van den Brand
May 20, 2011
3. Methodology 3/5
[Diagram: issue tracker, software system, version control system]
Version control log entries that reference issues:
r3780 | kataka | 2003-04-12 00:43:24 +0200 (za, 12 apr 2003) | 2 lines
Changed paths:
M /argouml/model/uml/modelmanagement/ModelManagementHelper.java
M /argouml/uml/ui/foundation/core/ActionSetParameterType.java
Fixed issue 1544
------------------------------------------------------------------------
r3769 | alexb | 2003-04-11 11:27:55 +0200 (vr, 11 apr 2003) | 4 lines
Changed paths:
M /argouml/uml/ui/foundation/core/PropPanelClass.java
M /argouml/uml/ui/foundation/core/PropPanelInterface.java
fix for
Issue number: 1736
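The log entries above suggest how commits can be linked to issues. The following is a minimal sketch of that idea, not the authors' actual tooling: the regular expressions and the function name `fixes_per_file` are assumptions made for illustration, and real projects phrase issue references in many other ways.

```python
import re
from collections import Counter

# Hypothetical patterns; they only cover the two phrasings seen in the log above.
ISSUE_RE = re.compile(r"issue(?:\s+number)?\s*:?\s*#?(\d+)", re.IGNORECASE)
HEADER_RE = re.compile(r"^r\d+ \| ")
PATH_RE = re.compile(r"^[ \t]*[MADR] (/\S+)", re.MULTILINE)

SAMPLE_LOG = """\
r3780 | kataka | 2003-04-12 00:43:24 +0200 (za, 12 apr 2003) | 2 lines
Changed paths:
M /argouml/model/uml/modelmanagement/ModelManagementHelper.java
M /argouml/uml/ui/foundation/core/ActionSetParameterType.java
Fixed issue 1544
------------------------------------------------------------------------
r3769 | alexb | 2003-04-11 11:27:55 +0200 (vr, 11 apr 2003) | 4 lines
Changed paths:
M /argouml/uml/ui/foundation/core/PropPanelClass.java
M /argouml/uml/ui/foundation/core/PropPanelInterface.java
fix for
Issue number: 1736
"""

def fixes_per_file(log_text):
    """Count, for each changed file, the issues referenced by its commits."""
    counts = Counter()
    # Log entries are separated by a line of dashes.
    for entry in re.split(r"^-{5,}\s*$", log_text, flags=re.MULTILINE):
        paths = PATH_RE.findall(entry)
        # Everything that is not a header or a path line is the commit message.
        message = " ".join(
            line for line in entry.splitlines()
            if not (HEADER_RE.match(line)
                    or PATH_RE.match(line)
                    or line.startswith("Changed paths:"))
        )
        issues = set(ISSUE_RE.findall(message))
        for path in paths:
            counts[path] += len(issues)
    return counts

counts = fixes_per_file(SAMPLE_LOG)
for path, n in sorted(counts.items()):
    print(n, path)
```

A count obtained this way is only a proxy for defects: it assumes every referenced issue is a bug and that every file touched by the commit was involved in the fix.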
5. 5/5
The aggregation technique influences the correlation.
Mean and median are inconsistent.
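To illustrate how the choice of aggregation can change the observed correlation, here is a small sketch on invented data. The package names, SLOC values and defect counts are hypothetical, and Pearson correlation is used only for brevity; the slides do not state which coefficient was used.

```python
from statistics import mean, median

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical data: per-package class SLOC values and package defect counts.
packages = {
    "pkg.a": ([10, 12, 11, 300], 5),
    "pkg.b": ([40, 45, 50, 55], 2),
    "pkg.c": ([5, 6, 200, 250], 4),
    "pkg.d": ([20, 22, 25, 30], 1),
}
defects = [d for _, d in packages.values()]

# Aggregate class-level SLOC to package level, then correlate with defects.
results = {}
for name, aggregate in [("mean", mean), ("median", median)]:
    package_sloc = [aggregate(sloc) for sloc, _ in packages.values()]
    results[name] = pearson(package_sloc, defects)
    print(name, round(results[name], 2))
```

On skewed data like this, the mean is pulled up by a few large classes while the median ignores them, so the two aggregates can rank the same packages differently.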
6. Emerging trend 6/5
7. Inequality indices 7/5
Econometrics: measure/explain the inequality of income or wealth.
Software metrics and econometric variables have distributions with similar shapes.
[Figure: two histograms. Household income in Ilocos, the Philippines (1998), with axes Income and Frequency; hibernate-3.6.0-beta4: org.hibernate.criterion, with axes SLOC and Frequency.]
Inequality in quality = low quality!
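As a concrete instance of an inequality index applied to a software metric, the sketch below computes the Gini index of a list of SLOC values. It uses the standard sorted-values formula; the function name `gini` and the sample values are illustrative assumptions, not taken from the slides.

```python
def gini(values):
    """Gini index of non-negative values: 0 means perfect equality."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the sorted values (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

# Hypothetical SLOC values for the classes of one package.
equal_package = [7, 7, 7, 7]
skewed_package = [0, 0, 0, 10]
print(gini(equal_package), gini(skewed_package))
```

For a package where all classes have the same size the index is 0, and it approaches 1 as a single class holds more of the code.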
11. Inequality indices and software metrics 9/5
Decomposable indices (partition the population into mutually exclusive and collectively exhaustive (MECE) groups): which partition provides the best explanation for the inequality?
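The Theil index is one decomposable index: for any MECE partition it splits exactly into a within-group and a between-group part. The sketch below shows this on invented data, assuming all SLOC values are strictly positive. The function names and the reading of "best explanation" as the largest between-group share are assumptions for illustration.

```python
from math import log

def theil(values):
    """Theil index of strictly positive values."""
    n = len(values)
    mu = sum(values) / n
    return sum((x / mu) * log(x / mu) for x in values) / n

def decompose(groups):
    """Return (within, between) Theil components for a dict of groups."""
    all_values = [x for group in groups.values() for x in group]
    total = sum(all_values)
    mu = total / len(all_values)
    within = between = 0.0
    for group in groups.values():
        share = sum(group) / total       # the group's share of total SLOC
        mu_g = sum(group) / len(group)   # mean SLOC inside the group
        within += share * theil(group)
        between += share * log(mu_g / mu)
    return within, between

# Hypothetical partition of classes (SLOC values) into packages.
groups = {"pkg.a": [10, 12, 11, 300], "pkg.b": [40, 45, 50, 55]}
within, between = decompose(groups)
overall = theil([x for group in groups.values() for x in group])
print("explained by partition:", round(between / overall, 2))
```

Under this reading, the partition with the larger between-group share accounts for more of the overall inequality.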
12. Traceability via decomposability 10/5
Which individuals (classes in a package) contribute to 80% of the inequality (of SLOC)?
Which class contributes the most to the inequality?
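One way to trace inequality back to individual classes is to split the Theil index into its per-class terms, which sum to the index itself. The sketch below does this for invented class names and SLOC values, assuming strictly positive SLOC. The slides do not say how the 80% threshold is computed, so only the ranking is shown.

```python
from math import log

def theil_contributions(sloc_by_class):
    """Per-class terms of the Theil index; their sum is the index itself."""
    n = len(sloc_by_class)
    mu = sum(sloc_by_class.values()) / n
    return {
        name: (x / mu) * log(x / mu) / n
        for name, x in sloc_by_class.items()
    }

# Hypothetical classes of one package with their SLOC.
sloc_by_class = {"A": 10, "B": 12, "C": 11, "D": 300}
contributions = theil_contributions(sloc_by_class)
ranked = sorted(contributions, key=contributions.get, reverse=True)
top_class = ranked[0]
print("largest contributor:", top_class)
```

Classes below the mean size contribute negative terms, so a cumulative share such as 80% needs an explicit convention for handling them.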