An Empirical Analysis of Build Failures in the Continuous Integration Workflows of Java-Based Open-Source Software
1. An Empirical Analysis of Build Failures in the
Continuous Integration Workflows
of Java-Based Open-Source Software
Thomas Rausch, Waldemar Hummer, Philipp Leitner*, Stefan Schulte
Distributed Systems Group
Vienna University of Technology, Austria
http://dsg.tuwien.ac.at
* Software Evolution and Architecture Lab
University of Zurich, Switzerland
http://www.ifi.uzh.ch/en/seal.html
2. Continuous Integration
Diagram: VCS, CI server build, feedback, logs
Vasilescu et al. (2015). Quality and Productivity Outcomes Relating to Continuous Integration in GitHub
“Our main finding is that continuous integration improves the productivity of project teams”
Kerzazi et al. (2014). Why do Automated Builds Break? An Empirical Study
“We [...] quantified the cost of such build breakage as more than 336.18 man-hours”
8. Understanding Build Failures
What types of errors cause CI build failures?
Which development practices can be associated with CI build failures?
9. Error Categorization and Quantification
Goal
○ Categorization of errors
○ Frequency of occurrence of error types
Approach
○ Systematic exploration of ~54 000 logfiles
○ Categorization scheme based on log message patterns
[INFO] Compiling 67 source files to /home/travis/.../target/classes
[INFO] -------------------------------------------------------------
[ERROR] COMPILATION ERROR :
[INFO] -------------------------------------------------------------
[ERROR] /home/travis/.../redis/RedisAutoConfiguration.java:[143,10] cannot find symbol
[INFO] 1 error
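The pattern-based categorization scheme can be illustrated with a minimal sketch. The labels follow the categories used in this deck, but the pattern table, the regular expressions, and the `categorize` helper are hypothetical and are not the rules used in the study.

```python
import re

# Hypothetical pattern table: the labels mirror the deck's categories, but the
# regular expressions are illustrative assumptions, not the study's rules.
PATTERNS = [
    ("compile", re.compile(r"\[ERROR\] COMPILATION ERROR")),
    ("testfailure", re.compile(r"Tests run: \d+, Failures: [1-9]")),
    ("dependency", re.compile(r"Could not resolve dependencies")),
]


def categorize(log_text):
    """Return the label of the first pattern found in the log, else 'unknown'."""
    for label, pattern in PATTERNS:
        if pattern.search(log_text):
            return label
    return "unknown"


# The log excerpt shown on the slide, with its elided paths kept as-is.
sample_log = """\
[INFO] Compiling 67 source files to /home/travis/.../target/classes
[INFO] -------------------------------------------------------------
[ERROR] COMPILATION ERROR :
[INFO] -------------------------------------------------------------
[ERROR] /home/travis/.../redis/RedisAutoConfiguration.java:[143,10] cannot find symbol
[INFO] 1 error
"""

print(categorize(sample_log))
```

The table is ordered, so the first matching pattern wins. A real scheme would likely need many more patterns and a rule for logs that match several categories.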
10. Error Categories
Label          Description                                    Occurrences
testfailure    An automated test failed                       12
compile        Compilation error                              12
git            VCS interaction error                          12
buildconfig    Faulty build config                            11
crash          Build environment crash or timeout             11
dependency     Dependency error                               11
quality        Coding-rule violation (e.g., Checkstyle)       10
unknown        Errors without a clearly identifiable cause    9
itestfailure   An automated integration test failed           4
doc            Documentation (e.g., JavaDoc) problem          3
license        License criteria not met (missing header)      3
compatibility  API incompatibility                            2
androidsdk     Android SDK-related error                      1
buildout       Error specific to Crate.IO python module       1
11. Distribution of Common Error Types
Chart: share of each error type (axis from 0% to 40%) for faulty VCS interaction, faulty build configuration, dependency error, compilation error, coding-rule violation, failing test, crash
12. Distribution of Common Error Types
Chart: percentage of errors by type (testfailure, compile, git, dependency, crash, buildconfig, quality, others) per project, axis from 0% to 100%
Projects: Apache Storm, Butterknife, Crate.IO, Hystrix, JabRef, jcabi-github, Presto, RxAndroid, SpongeAPI, Spring Boot, Square OkHttp, Square Retrofit
13. Understanding Build Failures
What types of errors cause CI build failures?
Which development practices can be associated with CI build failures?
14. Change Metrics
Complexity
○ Churn, number of files, ...
File types
○ README.txt vs. IntegrationTest.java
Date and time
Author
○ Experience, commit frequency, ...
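As a rough illustration of the complexity and file-type metrics, the sketch below derives churn, file count, and the set of file extensions from one change set. The `FileChange` record, the `change_metrics` helper, and the line counts are hypothetical. Churn is assumed here to mean added plus deleted lines.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class FileChange:
    """Hypothetical record for one file touched by a commit."""
    path: str
    added: int
    deleted: int


def change_metrics(changes):
    """Compute simple change metrics for a list of FileChange records."""
    # Churn is assumed to be the sum of added and deleted lines.
    churn = sum(c.added + c.deleted for c in changes)
    # File types are approximated by file extension.
    file_types = sorted({PurePosixPath(c.path).suffix or "(none)" for c in changes})
    return {"churn": churn, "num_files": len(changes), "file_types": file_types}


# Made-up example data, using the two file names mentioned on the slide.
commit = [
    FileChange("README.txt", added=3, deleted=1),
    FileChange("src/test/java/IntegrationTest.java", added=40, deleted=12),
]
metrics = change_metrics(commit)
print(metrics)
```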
16. Statistical Correlation Analysis
For each project individually
Non-parametric correlation tests
○ Pearson’s chi-square test
○ Mann-Whitney U test
Calculate effect sizes
○ Cramér’s V
○ Rank-biserial correlation
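The two effect sizes can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the analysis code of the study: the contingency table and the sample values are made up, the chi-square statistic is computed without continuity correction, and every row and column of the table is assumed to have a non-zero sum.

```python
import math


def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows.

    Uses Pearson's chi-square statistic without continuity correction and
    assumes every row and column sum is greater than zero.
    """
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))


def rank_biserial(xs, ys):
    """Rank-biserial correlation via the simple difference formula.

    Share of (x, y) pairs with x > y minus the share with x < y; ties count
    for neither side. This equals 1 - 2U / (len(xs) * len(ys)), where U is the
    Mann-Whitney statistic counting pairs with x < y (ties as one half).
    """
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


# Made-up counts: rows are the previous build result (failed, passed),
# columns are the current build outcome (failed, passed).
table = [[30, 10], [10, 50]]
print(cramers_v(table))

# Made-up churn values for failed and passed builds.
print(rank_biserial([5, 8, 9], [1, 2, 7]))
```

Cramér's V suits a categorical factor such as the previous build result, while the rank-biserial correlation suits a numeric factor such as churn compared between failed and passed builds.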
17. Findings
Build history
Build failures mostly occur consecutively. Phases of build instability perpetuate failures.
Chart: percentage of builds by build outcome (passed, failed), grouped by previous build result (failed, passed); diagram labels: b, b’
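The relation between a build's outcome and the previous build's result can be examined with a short sketch. The `outcome_by_previous` helper and the build history are hypothetical; the sketch only assumes that a project's build outcomes are available in chronological order.

```python
from collections import Counter


def outcome_by_previous(outcomes):
    """Share of failed builds, grouped by the result of the preceding build."""
    # Count (previous, current) pairs of consecutive builds.
    counts = Counter(zip(outcomes, outcomes[1:]))
    shares = {}
    for prev in ("passed", "failed"):
        total = counts[(prev, "passed")] + counts[(prev, "failed")]
        if total:
            shares[prev] = counts[(prev, "failed")] / total
    return shares


# Made-up chronological build history of one project.
history = ["passed", "passed", "failed", "failed", "failed",
           "passed", "passed", "failed", "failed", "passed"]
print(outcome_by_previous(history))
```

A clearly higher failure share after a failed build than after a passed build would be consistent with failures occurring consecutively.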
19. Findings
Pull request scenarios
No evidence that either history manipulation operations or parallel development to a PR affects the PR’s build outcome.
20. Findings
File types
Even objectively harmless changes can break builds. This indicates unwanted flakiness of tests or the build environment.
577 builds from Spring Boot
Changelog file change only
14% original failures
○ 52% test failures
○ 45% environment crash
○ 3% dependency error
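Selecting builds whose change set touches only harmless files can be sketched as follows. The build records, the file name `CHANGELOG.md`, and both helper functions are hypothetical; the deck does not state how such builds were identified.

```python
def harmless_only(changed_files, harmless_names):
    """True if the change set is non-empty and touches only harmless files."""
    return bool(changed_files) and all(
        path.rsplit("/", 1)[-1] in harmless_names for path in changed_files
    )


def failure_rate(builds, harmless_names):
    """Share of failed builds among those that changed only harmless files."""
    selected = [b for b in builds if harmless_only(b["files"], harmless_names)]
    if not selected:
        return None
    failed = sum(1 for b in selected if b["outcome"] == "failed")
    return failed / len(selected)


# Made-up build records; the file names are assumptions for illustration.
builds = [
    {"files": ["CHANGELOG.md"], "outcome": "passed"},
    {"files": ["CHANGELOG.md"], "outcome": "failed"},
    {"files": ["docs/CHANGELOG.md"], "outcome": "passed"},
    {"files": ["CHANGELOG.md", "src/Main.java"], "outcome": "failed"},
    {"files": ["CHANGELOG.md"], "outcome": "passed"},
]
rate = failure_rate(builds, {"CHANGELOG.md"})
print(rate)
```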
21. Summary
Categorization of error types (beyond failed/errored)
Quantification of error type occurrence
Statistical analysis of impact factors
Uncovered challenges that arise when mining CI data
22. Dipl.-Ing. Thomas Rausch
Research Assistant
TU Wien
Distributed Systems Group
Argentinierstraße 8/184-1, 1040, Vienna, Austria
T: +43 1 58801 184 838
E: rausch@dsg.tuwien.ac.at
dsg.tuwien.ac.at/staff/trausch