Build systems specify how sources are transformed into deliverables, and hence must be carefully maintained to ensure that deliverables are assembled correctly. Similar to source code, build systems tend to grow in complexity unless specifications are refactored. This paper describes how clone detection can aid in quality assessments that determine if and where build refactoring effort should be applied. We gauge cloning rates in build systems by collecting and analyzing a benchmark comprising 3,872 build systems. Analysis of the benchmark reveals that: (1) build systems tend to have higher cloning rates than other software artifacts, (2) recent build technologies tend to be more prone to cloning, especially of configuration details like API dependencies, than older technologies, and (3) build systems that have fewer clones achieve higher levels of reuse via mechanisms not offered by build technologies. Our findings aided in refactoring a large industrial build system containing 1.1 million lines.
Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments
Shane McIntosh (@shane_mcintosh, shanemcintosh@acm.org), Bram Adams, Ahmed E. Hassan, Martin Poehlmann, Elmar Juergens, Audris Mockus, Brigitte Haupt, Christian Wagner
“...nothing can be said to be certain, except death and taxes”
- Benjamin Franklin
The Build “Tax”
An Empirical Study of Build Maintenance Effort
S. McIntosh, B. Adams, T. H. D. Nguyen, Y. Kamei, A. E. Hassan [ICSE 2011]
Up to 27% of source changes require build changes, too!
How do practitioners cope with build maintenance?
Excessive cloning makes build maintenance painful
30 custom business applications
One monolithic build system
1.1 million lines of build logic
Clone coverage of 94%-99%
Inflation due to cloning of 10x-23x
Build changes manually duplicated 30 times
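To make the two metrics on this slide concrete, here is a minimal Python sketch under simplifying assumptions. It approximates clones as identical runs of `window` normalized lines that occur more than once, which is only a stand-in for a real clone detector. Clone coverage is taken to be the fraction of lines inside at least one clone, and inflation the ratio of total lines to the lines that would remain if every copy after the first were removed. The helper names `normalize` and `clone_metrics` and the two-line default window are hypothetical choices for illustration; a practical detector would use a longer minimum clone length.

```python
from collections import defaultdict


def normalize(text):
    """Strip surrounding whitespace and drop blank lines (a simplistic normalization)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def clone_metrics(files, window=2):
    """Return (clone coverage, inflation) for a dict mapping file name -> file text.

    A clone is approximated as a run of `window` identical normalized lines
    that occurs more than once, across files or within one file.
    """
    lines = {name: normalize(text) for name, text in files.items()}

    # Map each window of lines to every place it occurs, in a deterministic
    # order (sorted file name, then line offset) so "first copy" is well defined.
    occurrences = defaultdict(list)
    for name in sorted(lines):
        body = lines[name]
        for start in range(len(body) - window + 1):
            occurrences[tuple(body[start:start + window])].append((name, start))

    covered, redundant = set(), set()
    for places in occurrences.values():
        if len(places) < 2:
            continue  # a window seen only once is not a clone
        for i, (name, start) in enumerate(places):
            span = {(name, start + k) for k in range(window)}
            covered |= span
            if i > 0:
                redundant |= span  # every copy after the first is redundant

    total = sum(len(body) for body in lines.values())
    if total == 0:
        return 0.0, 1.0
    coverage = len(covered) / total
    inflation = total / (total - len(redundant))
    return coverage, inflation
```

Under this approximation, 30 identical copies of a build file with no internal repetition report a coverage of 1.0 and an inflation of exactly 30x.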
How much cloning is typical?
What type of logic is being cloned?
Configuration details
Packaging specifications
What can be done to mitigate cloning?
Measured Typical
A large collection of open source projects: 3,872 in total
2,597 C/C++ projects (Autotools, CMake)
1,275 Java projects (Ant, Maven)
How much cloning is typical?
What type of logic is being cloned?
Configuration details
Packaging specifications
What can be done to mitigate cloning?
Measured Typical
>50% clone coverage is common
<30% clone coverage is common
Manual analysis of a statistically representative sample of clones
                   Autotools     CMake       Ant     Maven     Total
All clones            56,521    71,543    23,723     3,746   155,533
Sample (95% ± 5%)        382       382       378       349     1,491
Configuration            32%       79%       22%       40%
Construction             64%       17%       56%       66%
Certification            12%        4%       13%       11%
Packaging                25%       21%       21%        2%
Deployment               11%        1%        9%        7%
Cloning shifts from construction to configuration
Construction is the most heavily cloned C/C++ build phase
Rarely cloned due to CPack
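The sample sizes are consistent with the common formula for estimating a proportion at 95% confidence with a ±5% margin of error, n0 = z²·p(1-p)/e², followed by a finite population correction n = n0 / (1 + (n0 - 1)/N). The slide does not say how the sizes were computed, so the sketch below assumes that formula with z = 1.96, p = 0.5, and rounding to the nearest integer; the function name `sample_size` is hypothetical. Under these assumptions it reproduces the per-technology sizes 382, 382, 378, and 349.

```python
def sample_size(population, z=1.96, margin=0.05, p=0.5):
    """Sample size for estimating a proportion, with finite population correction.

    Assumed method: Cochran's formula with worst-case p = 0.5, then rounding
    to the nearest integer.
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2   # about 384.16 for the defaults
    n = n0 / (1 + (n0 - 1) / population)         # finite population correction
    return round(n)


# Clone counts per technology, in the column order of the table above.
CLONES = {"Autotools": 56_521, "CMake": 71_543, "Ant": 23_723, "Maven": 3_746}
```

For example, `sample_size(3_746)` evaluates to 349, and the four sizes sum to 1,491, matching the Total column.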
How much cloning is typical?
What type of logic is being cloned?
Configuration details
Packaging specifications
What can be done to mitigate cloning?
Measured Typical
Cloning shifts from construction to configuration
Mostly construction clones
>50% clone coverage is common
<30% clone coverage is common
Teams often migrate from one technology to another (e.g., Autotools to CMake, Ant to Maven)
Could technology migration help to reduce cloning?
Migration is not a silver bullet
[Figure: clone coverage thresholds for seven abnormality levels (very low, low, moderately low, normal, moderately high, high, very high) per technology (Ant, Autotools, CMake, Maven); axes: Proportion of Systems, Clone Coverage]
More cloning in Maven than Ant
High thresholds are very similar
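The figure assigns each system one of seven abnormality levels by comparing its clone coverage against six technology-specific thresholds. A minimal sketch of such a scheme follows. It assumes, hypothetically, that the thresholds are the 5th, 10th, 25th, 75th, 90th, and 95th percentiles of a technology's clone-coverage distribution; the slides do not state how the thresholds were derived, and the names `thresholds`, `abnormality`, and `PERCENTILES` are illustrative.

```python
import bisect
import statistics

# Seven abnormality levels, from lowest to highest clone coverage.
LEVELS = ["Very low", "Low", "Moderately low", "Normal",
          "Moderately high", "High", "Very high"]

# Hypothetical percentile cut points: six cuts yield seven bands.
PERCENTILES = (5, 10, 25, 75, 90, 95)


def thresholds(coverages):
    """Derive six thresholds from the clone coverage of one technology's systems."""
    cuts = statistics.quantiles(coverages, n=100, method="inclusive")
    return [cuts[p - 1] for p in PERCENTILES]  # cuts[k - 1] is the k-th percentile


def abnormality(coverage, cut_points):
    """Map one system's clone coverage to an abnormality level.

    bisect_left places a value equal to a threshold in the lower band, so a
    system with zero coverage lands in "Very low" even when the lowest
    thresholds are 0.00.
    """
    return LEVELS[bisect.bisect_left(cut_points, coverage)]
```

Because each technology gets its own thresholds, the same coverage value can be "Normal" for one technology and "High" for another, which is why the figure compares the threshold curves rather than raw coverage.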
Q: How are they avoiding build cloning?
Using abstraction mechanisms not provided by the build technologies
<!-- Define references to files containing common targets -->
<!DOCTYPE project [
  <!ENTITY modules-common SYSTEM "../modules-common.ent">
]>
...
<project name="bea" default="all">
  <!-- Include the file containing common targets. -->
  &modules-common;
</project>
Listing 1: Using XML entity expansion to include common build code in the Keel system.
Store an external block of XML in a macro (the ENTITY declaration)
Expand the macro (&modules-common;)
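Listing 1 relies on the XML parser, not on Ant, to splice shared targets into each build file. The sketch below shows the same store-then-expand mechanism with Python's standard `xml.etree.ElementTree`. Because ElementTree does not load external entities, it assumes an internal entity whose replacement text stands in for the contents of `modules-common.ent`; the two targets, `BUILD_XML`, and the helper `target_names` are hypothetical.

```python
import xml.etree.ElementTree as ET

# The entity's replacement text plays the role of the shared file: it is
# declared once and expanded wherever &modules-common; appears.
BUILD_XML = """<!DOCTYPE project [
  <!ENTITY modules-common '<target name="clean"/><target name="compile"/>'>
]>
<project name="bea" default="all">
  &modules-common;
</project>
"""


def target_names(xml_text):
    """Parse a build file and list the names of the targets the parser sees."""
    root = ET.fromstring(xml_text)  # the XML parser expands the entity here
    return [target.get("name") for target in root.iter("target")]
```

Parsing `BUILD_XML` with `target_names` yields `['clean', 'compile']`, showing that the parser, not the build tool, performs the reuse.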
How much cloning is typical?
What type of logic is being cloned?
Configuration details
Packaging specifications
What can be done to mitigate cloning?
Measured Typical
Cloning shifts from construction to configuration
Mostly construction clones
Ant Maven may reduce cloning
Use of “creative” abstraction
>50% clone coverage is common
<30% clone coverage is common