The concept of continuous integration (CI) and continuous delivery (CD) based on daily builds, test automation and automated deployment, is becoming popular and widely used in the industry. Automation efforts support both quality improvements and agility acceleration. Recently the scope of test automation has expanded, system-level testing is now automated and included in continuous daily builds.
Before the automation era, system tests were only executed on build artefacts once all implementation was completed. However, system testing is now conducted in parallel with implementation of product code as regression tests. This continuous system test practice allows the development team to lower the cost of bug fixes.
On the other hand, the concept of system testing is not well understood from the perspective of traditional QA. For example the bug curve is smoothly convergent in traditional QA, but it is rapidly convergent in a continuous system test environment.
In this report, source code, bugs and implementation metrics are analyzed for better understanding of continuous system test concepts.
Profile: DevOps engineer in Rakuten inc. He started his career as a search engine developer, then developed a system test automation framework for non-functional tests including resilience testing and search quality testing. Currently he leads a test automation team and DevOps team for expanding the concept of Continuous System Test in the organization.
Metrics Analysis on Continuous System Test (ASQN 2016)
1. Metrics Analysis on Continuous System Test
Sep/28/2016
Rakuten Inc.
Ecosystem Service Department
Delivery and Quality Solution Group
Manager
荻野恒太郎
http://corp.rakuten.co.jp/careers/engineering/
4. 4
(ST)Design Impl
System
Test(ST)
Analysis
Background①:Change in Development Process and System Test
Design
Implementation
Requirement
Before
Automation After
Automation
Time
Requirement
Time
Analysis
•
From
Waterfall
to
Agile
Hiranabe Kenji,
”Role
of
Agile
at
the
turning
point
of
Software
Engineering”
SS2010.
•
System
Test
Execution
from
early
phase
of
the
project
Nagata Atsushi, “Approach of System Test in QA organization for Agile Development”, JASPIC2013.
•
Continuous
System
Test
Kotaro
Ogino,
”Development
process
improvement
by
System
Test
Automation”
JaSST’
Tokyo
2014.
Daily
execution
of
system
test
by
automation
5. 5
Background②:Advantage of Continuous System Test
Myth
in
Test
Automation
•
Tradeoff
relationship
among
quality,
cost
and
delivery
→
Assumption:
QA
is
independent
from
development
process
Continuous
System
Test
•
Include
system
test
in
development
process
by
automation
6. 6
Background②:Advantage of Continuous System Test
System
test
is
inside
of
development
process
JaSST’14
Tokyo
8. 8
Background②:Advantage of Continuous System Test
Myth
in
Test
Automation
•
Tradeoff
relationship
among
quality,
cost
and
delivery
→
Assumption:
QA
is
independent
from
development
process
Continuous
System
Test
•
Include
system
test
in
development
process
by
automation
Improvement
in
Bug
fix
=
Improvement
in
Cost
and
Delivery
9. 9
Objective and Approach of this report
Objective: To
understand
the
characteristics
of
Dev
and
ST
under
continuous
system
test environment.
Approach:
Metrics
analysis
Q1: Is
quality
of
automated
system
test
low?
Q2: How
system
test
automation
changes
Dev
and
QA
relationship?
Q3: Any
technique
for
better
development
activity?
Question
for
ContinuousSystem Test
11. 11
Metrics
Dev
Metrics
Daily
•
commit
frequency
•
commit
size
Dev
Commit
Source
code
repository
Build
Test
Bug
Report
Product
Metrics
Daily
•
LOC •
Updated
file
•
Updated
LOC
•
Added file
•
Added
LOC
•
Deleted
file
•
Deleted
LOC
•
Not
updated
file
•
Not
updated
LOC
Bug
Metrics
Daily
•
Detected
bugs
12. 12
Metrics collection methods
Group Metrics Collection
method Unit
Dev
metrics Daily
commit
frequency git log
(*1) Number of
times
Daily
commit
size git log
(*2) Number of
lines
Product
metrics Daily
LOC cloc (*3) Number of
lines
Daily
updated
LOC
Daily
added
LOC
Daily
deleted
LOC
Daily
not-‐updatedLOC
cloc –diff
(*4) Number of
lines
Daily
updated
file
Daily
added
file
Daily
deleted
file
Daily
not-‐update
file
cloc –diff
(*4) Number
of
Files
Bug
metrics Daily
detected bugs -‐ Bugs
detected
in
ST
-‐Measured
in
created
date
of
bug
ticket
-‐ Duplicated
bugs
are
deleted
Number
of
times
(*1)
https://www.atlassian.com/ja/git/tutorial/git-‐basics#!log
(*2)
Include
the
comments
etc
(*3)
http://cloc.sourceforg e.net/
(*3)(*4)
Java
language,
not
including
comments
etc.
13. 13
Measured metrics (2013 1/28~10/23)
0
10000
20000
30000
0
10
20
30
40
Commit
Commit
size
0
100
200
0〜4
5〜9
10…
15…
20…
25…
30…
Daily
commit
frequency
Commit
frequency
and
commit
size
Daily
commit
frequency
distribution
Daily
detected
bugs
0
100
200
300
0 1 2 3 4 5
Daily
detected
bug
distribution
Update
LOC
Added
LOC
Deleted
LOC
0
2000
0
2000
0
1000
2000
LOC
Time
Time
Commit
frequency Commit
size
Number
of
lines
Frequency
Frequency
14. 14
Analysis data and development phase
Big
requirement
System
refactoring
Continuous
Small
requirement
Acceptance
test
•
Continuous
development
and
testing
•
Bug
curve
rapidly
converged
0
500
1000
1500
0
20
40
60
80
100
累積バグ
累積コミット数
A B C
Acceptance
test
Accumulated
bugs
Accumulated
commit
15. 15
Change in System Test Characteristics
Traditional
System
Test
Continuous
System
Test
Timing
of
ST After
implementation Parallel to
implementation
Responsibility Gate
keeper
of quality Regression
test
Adding
test
cases Reliability curve
at
QA
phase Coverage
at implementation
phase
Change
source code
at
test
phase
No Yes
17. 17
Analysis1: Evaluation on Automated System Test
March May July September
Bug
Curve
A B C
•
Considers
the
project
as
an
incremental
mini-‐waterfall
•
Calculate
the
metrics
at
A.B.
and
C
where
QA
activities
are
stable
Objective
of
Analysis
1
to
compare
the
test
density
and
bug
density
with
the
industry
statistics
for
investigating
the
test
quality
of
target
project.
Q1: Is
quality
of
automated
system
test
low?
Non-‐functional
test
Functional
test
Smoke
test
18. 18
Analysis1: Evaluation on Automated System Test
(*1)
”ソフトウェア開発データ白書 2012-‐2013
定量データ分析で分かる開発の最新動向”
0
20
40
新規開発 改良開発
0
0.5
1
1.5
2
新規開発 改良開発
Metrics
•
Bug
density
and
Test
Density
•
Compare
with
the
industry
statistics
provided
by
IPA(*1)
-‐ Minimum、P25,
Median、P75,
Maximum
→
Inside/outside
the
range
between
P25
~
P75
-‐ Programming
language
Java
-‐ New
development/
Improvement
development
Test
Density Bug
density
Test
density
=
Number
of
test
cases
÷ KLOC
Bug
density
=
Detected
bugs
÷ KLOC
Industry
statistics
provided
by
IPA
New
Improvement New
Improvement
19. 19
Analysis1:Test Density
Discussion:
-‐ The
number
of
test
cases
is
normal
to
industrial
statistics
-‐ The
test
density
has
been
improved
→
Adding
the
test
cases
becomes
easier
after
framework
implementation
-‐ The
test
density
of
C
is
bit
higher
than
the
industry
statistics
→
Needs
a
guideline
for
the
number
and
coverage
of
test
cases
in
system
test
0
10
20
30
40
50
A B C
0
10
20
30
40
50
新規開発 改良開発
Test
Density
Industrial
StatisticsTarget
Project
(18.64)
(28.71)
(38.76)
New
Improvement
20. 20
Analysis1:Bug density
Discussion:
-‐ Small
bug
density
in
phase
B
where
small
continuous
requirements.
-‐ Bugs
are
detected
in
phase
3
where
no
additional
features
is
implemented
→
Detected
the
degrade
of
the
system
in
refactoring
phase.
-‐ Bug
density
is
inside
the
industry
statistics
→
Considers
the
automated
system
test
has
a
enough
coverage
0
0.5
1
1.5
2
新規開発 改良開発
0
0.5
1
1.5
2
A B C
Bug
density
(0.74)
(0.08)
(0.31)
Target
Project
New
Improvement
Industrial
Statistics
22. 22
Analysis2:Relationship between Dev and Bug
Dev
metrics
Commit Source
Code
Repository
Bug
report
Product
metrics Bug
metrics
Build
Test
Previous
Research: Evaluation
between
bug
and
product
metrics
-‐ S
Syed
et
al,
“Open
Source,
Agile
and
reliability
Measures”,
ISQI,
2009
-‐ Shimomura
et
al,
“Evaluation
of
unit
test
quality
risk
using
software
metrics”,
SQiP2013.
Objective
of
Analysis
2
To
investigate
the
relationship
between
bug
and
Dev
metrics
Q2: How
system
test
automation
changes
Dev
and
QA
relationship?
23. 23
Analysis2:Relationship between Dev and Bug
Dev
metrics Product
metrics Bug
metrics
Method
• Calculated
the
correlation
factors
between
Dev
and
Bug
metrics
-‐ Daily
Dev
and
Product
metrics
-‐ Weekly
accumulated
Dev
and
Product
metrics
Time
Commit
frequency
TimeUpdated
LOC Time
Detected
Bugs
Time
Daily
Data
Weekly
Accumulated
Data
Time
Detected
Bugs
Time
Updated
LOC
Commit
frequency
24. 24
Analysis2:Correlation in Daily Data
Group Explanatory
Variable
Correlation
Coefficient
Dev
metrics Commit
frequency
Commit
size
0.19
0.06
Product
metrics Updated
LOC
Added
LOC
Deleted
LOC
Not-‐updated
LOC
Updated
file
Added
file
Deleted
file
Not-‐updated
file
0.36
0.17
0.19
-‐0.17
0.20
-‐0.09
0.06
-‐0.19
0
5
10
0 20 40
Commit
Frequency
Detected
Bug
Scatter
Plot:
Commit
Frequency
VS
Detected
Bug
600
800
1000
0
50
100
累積バグ数
変更無しファイル数
Time
series
of
accumulated
detected
bugs
and
not-‐updated
files
Discussion
-‐ Low
correlation
for
all
metrics
→
Detection
of
integration
bugs
have
delay
-‐ Any
relationship
between
bug
detection
and
number
of
not-‐updated
files?
Accumulated
detected
bug
Not-‐updated
files
25. 25
Analysis2:Correlation in Accumulated Weekly Data
0
20
0 100 200
Scatter
Plot:
Accumulated
Weekly
Updated
file
VS
Weekly
Detected
Bugs
Accumulated
Weekly
Updated
files
Detected
bugs
0
10
20
100以上100以下
Stratified
Analysis
Discussion:
-‐ Dev
metrics
has
middle-‐level
correlation,
but
lower
than
product
metrics
-‐ Accumulated
updated
files
has
the
highest
correlation
100>=
Group Explanatory
Variable
Correlation
Coefficient
Dev
metrics Weekly
Commit
frequency
Weekly
Commit
size
0.47
0.33
Product
metrics Weekly
Updated
LOC
Weekly
Added
LOC
Weekly
Deleted
LOC
Weekly
Not-‐updated
LOC
Weekly
Updated
file
Weekly
Added
file
Weekly
Deleted
file
Weekly
Not-‐updated
file
0.56
0.42
0.61
-‐0.29
0.66
0.20
0.33
-‐0.31
Accumulated
Weekly
Updated
files
Detected
bugs
100<
27. 27
Analysis3:Analysis on Bug Curve
Reliability
Growth
Curve
•
Consider
the
relationship
between
amount
of
test
and
probability
of
bug
detection
【Software
Reliability
Model,
Yamada
Shigeru,
1994】
継続的システムテスト
従来のシステムテスト
Analysis Design Implementation System
TestTraditional
Development
Process
Development
process
with
Continuous
System
Test
Time
Accumulated
Detected
Bugs
Objective
of
Analysis
3
How
to
detect
bugs
earlier
in
continuous
system
test?
Investigate
the
reason
why
bug
curve
converged
rapidly.
Q3: Any
technique
for
better
development
activity?
Continuous
System
Test
Traditional
System
Test
28. 28
Analysis3:Bug curve under Continuous System Test
0
20
40
60
80
100
Time
A
B
C
•
Bug
increases
with
the
stable
gradients
•
The
curve
converged
rapidly
at
phase
end
Accumulated
Detected
Bugs
29. 29
0
50
100
Analysis3:Analysis on Bug Curve
Accumulated
detected
bugs
Time
0
50
100
0 200 400 600 800 1000
A
B
C
Method
1
•
Bug
curve
based
on
accumulated
commit
frequency
A
B
C
Accumulated
Commit
frequency
Accumulated
detected
bugs
30. 30
0
50
100
Analysis3:Analysis on Bug Curve
Accumulated
detected
bugs
Time
0
50
100
0 200 400 600 800 1000
A
B
C
Method
1
•
Bug
curve
based
on
accumulated
commit
frequency
Discussion:
-‐ A,B,C
-‐>
Not
so
much
difference
in
duration
Big
difference
in
commit
frequency
-‐ When horizontal axis is time,
the curve rapidly converged at phase end
-‐ When
horizontal
axis
is
commit
frequency,
the
curve
converged
smoothly
-‐ Small
converge
makes
big
converge
A
B
C
Accumulated
Commit
frequency
Accumulated
detected
bugs
31. 31
0
50
100
Analysis3:Analysis on Bug Curve
Accumulated
detected
bugs
Time
0
50
100
0 200 400 600 800 1000
A
B
C
Method
1
•
Bug
curve
based
on
accumulated
commit
frequency
Discussion:
-‐ A,B,C
-‐>
Not
so
much
difference
in
duration
Big
difference
in
commit
frequency
-‐ When horizontal axis is time,
the curve rapidly converged at phase end
-‐ When
horizontal
axis
is
commit
frequency,
the
curve
converged
smoothly
-‐ Small
converge
makes
big
converge
A
B
C
Accumulated
Commit
frequency
Accumulated
detected
bugs
32. 32
0
50
100
Analysis3:Analysis on Bug Curve
Accumulated
detected
bugs
Time
0
50
100
0 200 400 600 800 1000
A
B
C
Method
1
•
Bug
curve
based
on
accumulated
commit
frequency
Discussion:
-‐ A,B,C
-‐>
Not
so
much
difference
in
duration
Big
difference
in
commit
frequency
-‐ When horizontal axis is time,
the curve rapidly converged at phase end
-‐ When
horizontal
axis
is
commit
frequency,
the
curve
converged
smoothly
-‐ Small
converge
makes
big
converge
A
B
C
Accumulated
Commit
frequency
Accumulated
detected
bugs
33. 33
0
50
100
Analysis3:Analysis on Bug Curve
Accumulated
detected
bugs
Time
0
50
100
0 200 400 600 800 1000
A
B
C
Method
1
•
Bug
curve
based
on
accumulated
commit
frequency
Discussion:
-‐ A,B,C
-‐>
Not
so
much
difference
in
duration
Big
difference
in
commit
frequency
-‐ When horizontal axis is time,
the curve rapidly converged at phase end
-‐ When
horizontal
axis
is
commit
frequency,
the
curve
converged
smoothly
-‐ Small
converge
makes
big
converge
A
B
C
Accumulated
Commit
frequency
Accumulated
detected
bugs
(検出バグ数)
34. 34
0
50
100
Analysis3:Analysis on Bug Curve
Accumulated
detected
bugs
Time
0
50
100
0 200 400 600 800 1000
A
B
C
Method
1
•
Bug
curve
based
on
accumulated
commit
frequency
Discussion:
-‐ A,B,C
-‐>
Not
so
much
difference
in
duration
Big
difference
in
commit
frequency
-‐ When horizontal axis is time,
the curve rapidly converged at phase end
-‐ When
horizontal
axis
is
commit
frequency,
the
curve
converged
smoothly
-‐>
It
shows
the
reduction
of
bugs
in
the
source
code
change.
-‐ Small
converge
makes
big
converge
A
B
C
Accumulated
Commit
frequency
Accumulated
detected
bugs
(検出バグ数)
35. 35
0
50
100
Analysis3:Analysis on Bug Curve
Accumulated
detected
bugs
Time
0
50
100
0 200 400 600 800 1000
A
B
C
Method
1
•
Bug
curve
based
on
accumulated
commit
frequency
Discussion:
-‐ A,B,C
-‐>
Not
so
much
difference
in
duration
Big
difference
in
commit
frequency
-‐ When horizontal axis is time,
the curve rapidly converged at phase end
-‐ When
horizontal
axis
is
commit
frequency,
the
curve
converged
smoothly
-‐>
It
shows
the
reduction
of
bugs
in
the
source
code
change.
-‐ Small
converges
combine
to
make
a
large
convergence
A
B
C
Accumulated
Commit
frequency
Accumulated
detected
bugs
(検出バグ数)
36. 36
0
50
100
Analysis3:Analysis on Bug Curve
Accumulated
detected
bugs
Time
0
50
100
0 200 400 600 800 1000
A
B
C
Method
1
•
Bug
curve
based
on
accumulated
commit
frequency
Discussion:
-‐ A,B,C
-‐>
Not
so
much
difference
in
duration
Big
difference
in
commit
frequency
-‐ When horizontal axis is time,
the curve rapidly converged at phase end
-‐ When
horizontal
axis
is
commit
frequency,
the
curve
converged
smoothly
-‐>
It
shows
the
reduction
of
bugs
in
the
source
code
change.
-‐ Small
converges
combine
to
make
a
large
convergence
-‐>
Developers
know
the
bugs
immediately
right
after
commit,
then
fix
them.
A
B
C
Accumulated
Commit
frequency
Accumulated
detected
bugs
(検出バグ数)
37. 37
Analysis3:Analysis on Bug Curve
Commit
Frequency
0
50
100
0 500 1000
累積バグ
累積バグ in
スモークテスト
累積バグ in
その他テスト
A
B
C
Method
2
•
Bug
curve
analysis
for
each
test
type
Version
Smoke Test
Other Tests
System
Test
Accumulated
bug
in
total
Accumulated
bug
in
Smoke
Test
Accumulated
bugs
in
other
tests
Accumulated
detected
bugs
38. 38
Analysis3:Analysis on Bug Curve
Commit
Frequency
0
50
100
0 500 1000
累積バグ
累積バグ in
スモークテスト
累積バグ in
その他テスト
A
B
C
Method
2
•
Bug
curve
analysis
for
each
test
type
Version
Smoke Test
Other Tests
System
Test
Accumulated
bug
in
total
Accumulated
bug
in
Smoke
Test
Accumulated
bugs
in
other
tests
Accumulated
detected
bugs
Discussion:
-‐ The
commit
breaking
the
smoke
test
is
happened
only
once
in
A
-‐ 2
times
in
C
(Detected
bugs
are
both
10)
-‐ In
C,
Total
is
converged
right
after
converge
of
smoke
test
C
39. 39
Analysis3:Analysis on Bug Curve
Commit
Frequency
0
50
100
0 500 1000
累積バグ
累積バグ in
スモークテスト
累積バグ in
その他テスト
A
B
C
Method
2
•
Bug
curve
analysis
for
each
test
type
Version
Smoke Test
Other Tests
System
Test
Accumulated
bug
in
total
Accumulated
bug
in
Smoke
Test
Accumulated
bugs
in
other
tests
Accumulated
detected
bugs
Discussion:
-‐ The
commit
breaking
the
smoke
test
is
happened
only
once
in
A
-‐ 2
times
in
C
(Detected
bugs
are
both
10)
-‐ In
C,
Total
is
converged
right
after
converge
of
smoke
test
C
40. 40
Analysis3:Analysis on Bug Curve
Commit
Frequency
0
50
100
0 500 1000
累積バグ
累積バグ in
スモークテスト
累積バグ in
その他テスト
A
B
C
Method
2
•
Bug
curve
analysis
for
each
test
type
Version
Smoke Test
Other Tests
System
Test
Accumulated
bug
in
total
Accumulated
bug
in
Smoke
Test
Accumulated
bugs
in
other
tests
Accumulated
detected
bugs
Discussion:
-‐ The
commit
breaking
the
smoke
test
is
happened
only
once
in
A
-‐ 2
times
in
C
(Detected
bugs
are
both
10)
-‐ In
C,
Total
is
converged
right
after
converge
of
smoke
test
C
41. 41
Analysis3:Analysis on Bug Curve
Commit
Frequency
0
50
100
0 500 1000
累積バグ
累積バグ in
スモークテスト
累積バグ in
その他テスト
A
B
C
Method
2
•
Bug
curve
analysis
for
each
test
type
Version
Smoke Test
Other Tests
System
Test
Accumulated
bug
in
total
Accumulated
bug
in
Smoke
Test
Accumulated
bugs
in
other
tests
Accumulated
detected
bugs
Discussion:
-‐ The
commit
breaking
the
smoke
test
is
happened
only
once
in
A
-‐ 2
times
in
C
(Detected
bugs
are
both
10)
-‐ In
C,
Total
is
converged
right
after
converge
of
smoke
test
C
42. 42
Analysis3:Analysis on Bug Curve
Commit
Frequency
0
50
100
0 500 1000
累積バグ
累積バグ in
スモークテスト
累積バグ in
その他テスト
A
B
C
Method
2
•
Bug
curve
analysis
for
each
test
type
Version
Smoke Test
Other Tests
System
Test
Accumulated
bug
in
total
Accumulated
bug
in
Smoke
Test
Accumulated
bugs
in
other
tests
Accumulated
detected
bugs
Discussion:
-‐ The
commit
breaking
the
smoke
test
is
happened
only
once
in
A
-‐ 2
times
in
C
(Detected
bugs
are
both
10)
-‐ In
C,
Total
is
converged
right
after
converge
of
smoke
test
C
-‐>
Implementation
causing
the
smoke
test
break
is
divided
for
iteration.
Smaller
commit
can
make
bug
detection
faster
and
fix
earlier.
44. 44
Conclusion:Question on Continuous System Test
Conclusion:
To
analyzed
development,
product
and
bug
metrics
for
better
understanding
on
continuous
system
test
Q1: Is
quality
of
automated
system
test
low?
Q2: How
system
test
automation
changes
Dev
and
QA
relationship?
Q3: Any
technique
for
better
development
activity?
45. 45
Conclusion:Answer for Q1
Answer(From
Analysis
1):
Automated
system
test
is
not
low
quality
•
However,
the
test
density
can
be
easily
increased
in
automated
test
environment.
-‐>
Need
the
guideline
for
the
coverage
of
system
test.
0
50
A B C
Test
Density
Q1: Is
quality
of
automated
system
test
low?
46. 46
Conclusion:Answer for Q2
Answer(From
Analysis
2):
Not
only
product
metrics,
but
also
Dev
metrics
has
a
relationship
with
bugs
under
the
environment
where
system
test
is
a
part
of
process
of
development
•
Bug
relates
to
not-‐updated
files
and
updated
files.
0
20
Detected
bugs
Stratified
Analysis
100>= 100<
Q2: How
system
test
automation
changes
Dev
and
QA
relationship?
47. 47
Answer(From
Analysis
3):
Quick
feedback
to
developers
is
important
in
continuous
system
test
environment
•
Commit
type
changed
from
feature
to
bug
fix
•
The
bug
can
be
detected
earlier
by
dividing
the
commits
which
cause
smoke
test
break
into
different
iteration.
Conclusion:Answer for Q3
Commit
frequency
0
100
0 1000
Detected
bugs
Q3: Any
technique
for
better
development
activity?
48. 48
Further work
•
Guideline
for
system
test
→
Improvement
of
test
under
continuous
system
test
-‐ Prioritization
of
test
cases
-‐ Prevent
too
much
test
cases
•
Use
Dev
metrics
for
quality
control
•
More
collaboration
between
Dev
and
QA