
Transfer Learning for Software Performance Analysis: An Exploratory Analysis

ASE 2017 Slides


  1. Transfer Learning for Software Performance Analysis: An Exploratory Analysis — Pooyan Jamshidi, Norbert Siegmund, Miguel Velez, Christian Kaestner, Akshay Patel, Yuvraj Agarwal
  2. Many systems are now built to be configurable.
  3. (image-only slide)
  4. Empirical observations confirm that systems are becoming increasingly configurable. Modern systems are:
      • increasingly configurable as the software evolves
      • deployed in dynamic and uncertain environments
      The slide shows the first page of: Tianyin Xu, Long Jin, Xuepeng Fan, Yuanyuan Zhou, Shankar Pasupathy, and Rukma Talwadker, "Understanding and Dealing with Over-Designed Configuration in System Software" ("Too Many Knobs…"), FSE'15. From its abstract: configuration problems are not only prevalent, but also severely impair the reliability of today's system software. One fundamental reason is the ever-increasing complexity of configuration, reflected by the large number of configuration parameters ("knobs"). With hundreds of knobs, configuring system software to ensure high reliability and performance becomes a daunting, error-prone task. The paper studies the configuration settings of real-world users, including thousands of customers of a commercial storage system (Storage-A) and hundreds of users of two widely used open-source system software projects, and asks: "do users really need so many knobs?" Guided by the findings, its guidelines can remove 51.9% of Storage-A's parameters and simplify 19.7% of the remaining ones with little impact on existing users; it also studies existing configuration navigation methods in the context of "too many knobs".
      Figure 1 of that paper: the increasing number of configuration parameters over release time for Storage-A, MySQL, Apache, and Hadoop (MapReduce/HDFS). Storage-A is a commercial storage system from a major storage company in the U.S. [Tianyin Xu, et al., "Too Many Knobs…", FSE'15]
  5. Configurations determine the performance behavior of configurable software systems. Configuration options enable different code paths depending on environmental conditions. Performance = non-functional property (e.g., response time, energy).
      Example C code shown on the slide (from Parrot, truncated as on the slide):
        void Parrot_setenv(. . . name, . . . value){
        #ifdef PARROT_HAS_SETENV
            my_setenv(name, value, 1);
        #else
            int name_len = strlen(name);
            int val_len = strlen(value);
            char* envs = glob_env;
            if(envs == NULL){ return; }
            strcpy(envs, name);
            strcpy(envs + name_len, "=");
            strcpy(envs + name_len + 1, value);
            putenv(envs);
        #endif
        }
        #ifdef LINUX
        extern int Parrot_signbit(double x){
            union{ double d; …
  6. The influence of options is typically significant: by tweaking only 2 options out of 200 in Apache Storm, we observed a ~100% change in latency. (The slide shows a 3D surface, cubic interpolation over a finer grid, of latency (ms) against the number of counters and the number of splitters.)
  7. Developers, users, and operators need to understand the influence of configurations on:
      • execution time
      • energy consumption
      • safety
      • …
  8. Understanding Performance Behavior of Software Matters
  9. What do we mean by "performance model"? A performance model is a function f: C → ℝ over the configuration space, for example:
          f(o1, o2) = 5 + 3·o1 + 15·o2 − 7·o1·o2
      where a configuration is a vector of option values:
          c = <o1, o2>
          c = <o1, o2, …, o10>
          c = <o1, o2, …, o100>
          ⋮
      [Norbert Siegmund, et al., "Performance-influence models for highly configurable systems", FSE'15]
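      To make the notion concrete, here is a minimal sketch (not from the slides) of such a performance-influence model as executable code; the option names and coefficients simply mirror the example formula above.

        # Minimal sketch of a performance-influence model f: C -> R.
        # The coefficients mirror the slide's example; o1 and o2 are
        # binary configuration options (0 = disabled, 1 = enabled).

        def performance(o1: int, o2: int) -> float:
            """Predicted performance (e.g., response time) for a configuration."""
            return 5 + 3 * o1 + 15 * o2 - 7 * o1 * o2

        # Enumerate the full configuration space of this toy model.
        for o1 in (0, 1):
            for o2 in (0, 1):
                print(f"c = <{o1}, {o2}>  ->  f = {performance(o1, o2)}")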
  10. Measure, then learn: measure the performance of sampled configurations (e.g., on a TurtleBot) and learn a predictive performance model such as f(o1, o2) = 5 + 3·o1 + 15·o2 − 7·o1·o2 via sensitivity analysis. The learned model supports optimization, reasoning, and debugging. With 25 binary options, the configuration space already contains 2^25 configurations.
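      The slides do not prescribe a particular learner; as a hedged illustration, the sketch below fits a linear model with pairwise interaction terms (one common way to obtain performance-influence models) to measured configurations using scikit-learn. The measurements here are synthetic stand-ins for real benchmark runs.

        # Sketch: learning a performance-influence model from measurements.
        # Assumes scikit-learn and numpy are available; data is synthetic.
        import numpy as np
        from sklearn.linear_model import LinearRegression
        from sklearn.preprocessing import PolynomialFeatures

        rng = np.random.default_rng(0)

        # Sampled configurations: rows of binary option values (here 5 options).
        X = rng.integers(0, 2, size=(40, 5))

        # Pretend these came from benchmarking each sampled configuration.
        true_effects = np.array([3.0, 15.0, 0.5, -2.0, 1.0])
        y = 5 + X @ true_effects - 7 * X[:, 0] * X[:, 1] + rng.normal(0, 0.1, size=40)

        # Expand features with pairwise interactions (o_i * o_j) and fit.
        expand = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
        model = LinearRegression().fit(expand.fit_transform(X), y)

        # Predict the performance of an unmeasured configuration.
        c = np.array([[1, 0, 1, 0, 1]])
        print("predicted performance:", model.predict(expand.transform(c))[0])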
  11. Building performance models is expensive: every training point requires measuring the system.
  12. 25 options × 2 values each = 2^25 ≈ 33.5 million configurations. At 1 minute per measurement, that is over 60 years to finish the measurements (2^25 minutes ≈ 63.8 years)!
  13. Performance models are built assuming fixed environments:
      • a specific hardware
      • a specific workload
      • a specific version
      • …
      An environment change therefore means a new performance model. (The slide shows two latency (µs) surfaces over concurrent_reads and concurrent_writes, for (a) cass-20 v1 and (b) cass-20 v2.)
  14. Transfer Learning
  15. Here is where transfer learning comes into the scene:
      • an ML approach
      • that uses the knowledge learned on the source (given) environment — its model and its data —
      • to learn a cheaper model for the target environment: extract transferable knowledge from the source and reuse it when learning in the target.
      The slide also shows an excerpt of Section II (Intuition) of the paper, summarized here. Understanding the performance behavior of configurable software systems enables (i) performance debugging, (ii) performance tuning, (iii) design-time evolution, and (iv) runtime adaptation, but we lack an empirical understanding of how that behavior varies when the environment changes. For instance, we could learn the performance behavior of a system on cheap hardware in a controlled lab environment and use it to understand the behavior of the system on a production server before shipping to the end user. The excerpt defines the preliminary concepts used throughout the study:
      1) Configuration and environment space: Let Fi denote the i-th feature of a configurable system A, which is either enabled or disabled (one of the two holds by default). The configuration space is the Cartesian product of all features, C = Dom(F1) × … × Dom(Fd), with Dom(Fi) = {0, 1}. A configuration is a member of this space (feature space), i.e., a complete assignment of values to all of the system's parameters. An environment instance is described by three variables e = [w, h, v] drawn from an environment space E = W × H × V, whose components represent the sets of possible workloads, hardware platforms, and system versions.
      2) Performance model: Given a system A with configuration space F and environmental instances E, a performance model is a black-box function f: F × E → ℝ learned from observations of the system's performance for combinations of features x ∈ F in an environment e ∈ E. To construct it, we run A in environment e ∈ E on various configurations xi ∈ F and record the resulting performance values yi = f(xi) + εi, with εi ~ N(0, σi). The training data for the regression models is then simply Dtr = {(xi, yi)}, i = 1..n; a response function is a mapping from the input space to a measurable, interval-scaled performance metric (here, real numbers).
      3) Performance distribution: Instead of associating a performance response with each configuration, we can vary the environment and measure the performance. An empirical performance distribution is a stochastic process pd: E → Δ(ℝ) that defines a probability distribution over performance measures for each environmental condition. It is constructed by running A on various configurations xi ∈ F in a fixed environment e ∈ E, recording the performance values yi, and fitting a probability distribution to the set of measured values De = {yi} using kernel density estimation.
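      As a small illustration of the "performance distribution" concept above, the following hedged sketch fits a kernel density estimate to measured performance values of one environment; scipy is assumed and the values are synthetic.

        # Sketch: empirical performance distribution pd(e) for one environment,
        # fit by kernel density estimation as described in the excerpt above.
        import numpy as np
        from scipy.stats import gaussian_kde

        # Performance values y_i measured for many configurations in environment e
        # (synthetic placeholders for benchmark results).
        y_e = np.concatenate([np.random.normal(120, 10, 300),   # fast configurations
                              np.random.normal(300, 40, 100)])  # slow configurations

        pd_e = gaussian_kde(y_e)              # density over performance measures
        grid = np.linspace(y_e.min(), y_e.max(), 5)
        print("density at", grid, "->", pd_e(grid))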
  16. A simple transfer learning via model shift: when the target machines are twice as fast, the target response is approximately a shifted/scaled version of the source response, so a transfer function maps the source model onto the target. (The slide plots throughput [higher is better] for source and target and the fitted transfer function.) [Pavel Valov, et al., "Transferring performance prediction models…", ICPE'17]
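      Below is a hedged sketch of the model-shift idea, not necessarily the ICPE'17 procedure: fit a linear transfer function f_t ≈ α·f_s + β from a handful of configurations measured in both environments, then reuse the source model everywhere else. All numbers are made up for illustration.

        # Sketch: linear model shift between a source and a target environment.
        import numpy as np

        def f_source(c):
            """Source-environment performance model (already learned)."""
            o1, o2 = c
            return 5 + 3 * o1 + 15 * o2 - 7 * o1 * o2

        # A few configurations measured in BOTH environments.
        paired_configs = [(0, 0), (1, 0), (0, 1), (1, 1)]
        source_vals = np.array([f_source(c) for c in paired_configs])
        target_vals = np.array([11.0, 17.1, 41.2, 33.0])   # hypothetical target measurements

        # Least-squares fit of the transfer function f_t = alpha * f_s + beta.
        alpha, beta = np.polyfit(source_vals, target_vals, deg=1)

        def f_target(c):
            """Target-environment prediction obtained by shifting the source model."""
            return alpha * f_source(c) + beta

        print(f"alpha={alpha:.2f}, beta={beta:.2f}, f_target(1,1)={f_target((1, 1)):.2f}")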
  17. However, when the environment change is not homogeneous, things can go wrong: a simple (linear) shift no longer maps the source response onto the target. (The slide again plots throughput [higher is better] for source and target.)
  18. Other forms of transfer learning exist, e.g., data reuse: measure on a simulator (Gazebo) as the source, reuse that data together with a few measurements on the real TurtleBot (target), and learn the target model f(o1, o2) = 5 + 3·o1 + 15·o2 − 7·o1·o2 from both. [P. Jamshidi, et al., "Transfer learning for improving model predictions ….", SEAMS'17]
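      A hedged sketch of the data-reuse flavor follows — one plausible realization, not necessarily the SEAMS'17 method: many cheap source samples train a base model, and a Gaussian process learns the source-to-target correction from a few expensive target samples. scikit-learn is assumed and the data is synthetic.

        # Sketch: data reuse across environments. Many cheap source measurements
        # plus a few expensive target measurements; a GP learns the correction.
        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF, WhiteKernel

        rng = np.random.default_rng(1)

        def measure(X, env):
            """Stand-in for benchmarking configurations X in a given environment."""
            base = 5 + 3 * X[:, 0] + 15 * X[:, 1] - 7 * X[:, 0] * X[:, 1]
            shift = 1.0 if env == "source" else 2.3          # environments differ
            return shift * base + rng.normal(0, 0.2, len(X))

        X_source = rng.integers(0, 2, size=(200, 2)).astype(float)   # cheap: simulator
        X_target = rng.integers(0, 2, size=(8, 2)).astype(float)     # expensive: robot
        y_source = measure(X_source, "source")
        y_target = measure(X_target, "target")

        # Source-only model from the abundant simulator data.
        gp_source = GaussianProcessRegressor(kernel=RBF() + WhiteKernel()).fit(X_source, y_source)

        # GP on the target residuals w.r.t. the source model = learned correction.
        residuals = y_target - gp_source.predict(X_target)
        gp_delta = GaussianProcessRegressor(kernel=RBF() + WhiteKernel()).fit(X_target, residuals)

        def predict_target(X):
            return gp_source.predict(X) + gp_delta.predict(X)

        X_new = np.array([[1.0, 1.0]])
        print("target prediction with reused source data:", predict_target(X_new)[0])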
  19. Even learning from a source with a small correlation to the target is better than no transfer: models become more accurate when the source is more related to the target. Prediction accuracy (absolute percentage error) of the model learned with samples from sources of different relatedness to the target (GP is the model without transfer learning):
      Source        s      s1     s2     s3     s4     s5     s6
      noise-level   0      5      10     15     20     25     30
      corr. coeff.  0.98   0.95   0.89   0.75   0.54   0.34   0.19
      µ(pe) [%]     15.34  14.14  17.09  18.71  33.06  40.93  46.75
      [P. Jamshidi, et al., "Transfer learning for improving model predictions ….", SEAMS'17]
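      One simple way to quantify such relatedness, in line with the correlation coefficients above, is to correlate the responses of the same configurations measured in both environments. A hedged sketch with scipy (synthetic data):

        # Sketch: quantifying source/target relatedness via correlation of the
        # responses of configurations measured in both environments.
        import numpy as np
        from scipy.stats import pearsonr, spearmanr

        rng = np.random.default_rng(2)
        y_source = rng.normal(100, 20, size=50)                    # measurements in the source
        y_target = 2.0 * y_source + rng.normal(0, 15, size=50)     # same configs in the target

        r_pearson, _ = pearsonr(y_source, y_target)
        r_spearman, _ = spearmanr(y_source, y_target)
        print("Pearson  r =", round(r_pearson, 2))
        print("Spearman r =", round(r_spearman, 2))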
  20. We need to know when and why transfer learning works:
      • When does simple transfer work, and when does it not?
      • How are source and target "related"?
      • What knowledge can we transfer across environments?
      (The slide repeats the source → transferable knowledge → target diagram and the paper excerpt shown on slide 15.)
  21. Theoretical Principles of Transfer Learning
  22. Goal: establishing theoretical principles of transfer learning for performance analysis. (The slide again shows the source → transferable knowledge → target diagram and the Section II excerpt from slide 15. The excerpt additionally notes that the aim is a performance model learned in a changed environment from a sampling set informed by knowledge gained in other environments; the main research question is therefore whether there exists common information — transferable/reusable knowledge — that applies to both the source and the target environments and can be carried over from one to the other, which is a case for transfer learning. The environmental changes considered are (i) configuration, the primary variable for understanding performance behavior, (ii) workload, i.e., the input the system operates on, and (iii) hardware and software version.)
  23. We conducted an exploratory/empirical study:
      • 10 hypotheses (assumptions about "relatedness")
      • 4 real-world configurable systems
      • 36 comparisons of environmental changes (hardware, workloads, versions)
      • statistical analyses to verify the hypotheses
  24. Our research questions are about similarities across environments:
      • RQ1: Does the performance behavior stay consistent?
      • RQ2: Is the influence of options on performance consistent?
      • RQ3: Are the interactions among options preserved?
      • RQ4: Are the invalid configurations similar across environments?
  25. Our research questions are about similarities across environments — similarity across environments matters! (The slide repeats RQ1–RQ4 from the previous slide.)
  26. Subject systems we investigated:
      • SPEAR (SAT solver): analysis time; 14 options, 16,384 configurations; SAT problems; 3 hardware platforms; 2 versions
      • x264 (video encoder): encoding time; 16 options, 4,000 configurations; video quality/size; 2 hardware platforms; 3 versions
      • SQLite (DB engine): query time; 14 options, 1,000 configurations; DB queries; 2 hardware platforms; 2 versions
      • SaC (compiler): execution time; 50 options, 71,267 configurations; 10 demo programs
  27. TABLE II: Results indicate that there exist several forms of knowledge that can be transferred across environments and can be used in transfer learning.
      Header: RQ1 (H1.1–H1.4), RQ2 (H2.1–H2.2), RQ3 (H3.1–H3.2), RQ4 (H4.1–H4.2); columns: environment change, ES, and metrics M1–M18 (grouped below as M1–M5 | M6–M10 | M11–M14 | M15–M18).

      SPEAR — Workload (#variables/#clauses): w1: 774/5934, w2: 1008/7728, w3: 1554/11914, w4: 978/7498; Version: v1: 1.2, v2: 2.7
      ec1: [h2 → h1, w1, v2]             S   | 1.00  0.22   0.97  0.92 0.92 |  9  7  7 0 1    | 25 25 25 1.00  | 0.47 0.45 1 1.00
      ec2: [h4 → h1, w1, v2]             L   | 0.59  24.88  0.91  0.76 0.86 | 12  7  4 2 0.51 | 41 27 21 0.98  | 0.48 0.45 1 0.98
      ec3: [h1, w1 → w2, v2]             L   | 0.96  1.97   0.17  0.44 0.32 |  9  7  4 3 1    | 23 23 22 0.99  | 0.45 0.45 1 1.00
      ec4: [h1, w1 → w3, v2]             M   | 0.90  3.36  -0.08  0.30 0.11 |  7  7  4 3 0.99 | 22 23 22 0.99  | 0.45 0.49 1 0.94
      ec5: [h1, w1, v2 → v1]             S   | 0.23  0.30   0.35  0.28 0.32 |  6  5  3 1 0.32 | 21  7  7 0.33  | 0.45 0.50 1 0.96
      ec6: [h1, w1 → w2, v1 → v2]        L   | -0.10 0.72  -0.05  0.35 0.04 |  5  6  1 3 0.68 |  7 21  7 0.31  | 0.50 0.45 1 0.96
      ec7: [h1 → h2, w1 → w4, v2 → v1]   VL  | -0.10 6.95   0.14  0.41 0.15 |  6  4  2 2 0.88 | 21  7  7 -0.44 | 0.47 0.50 1 0.97

      x264 — Workload (#pictures/size): w1: 8/2, w2: 32/11, w3: 128/44; Version: v1: r2389, v2: r2744, v3: r2744
      ec1: [h2 → h1, w3, v3]             SM  | 0.97 1.00  0.99 0.97 0.92 |  9 10  8 0 0.86 | 21 33 18 1.00 | 0.49 0.49 1 1
      ec2: [h2 → h1, w1, v3]             S   | 0.96 0.02  0.96 0.76 0.79 |  9  9  8 0 0.94 | 36 27 24 1.00 | 0.49 0.49 1 1
      ec3: [h1, w1 → w2, v3]             M   | 0.65 0.06  0.63 0.53 0.58 |  9 11  8 1 0.89 | 27 33 22 0.96 | 0.49 0.49 1 1
      ec4: [h1, w1 → w3, v3]             M   | 0.67 0.06  0.64 0.53 0.56 |  9 10  7 1 0.88 | 27 33 20 0.96 | 0.49 0.49 1 1
      ec5: [h1, w3, v2 → v3]             L   | 0.05 1.64  0.44 0.43 0.42 | 12 10 10 0 0.83 | 47 33 29 1.00 | 0.49 0.49 1 1
      ec6: [h1, w3, v1 → v3]             L   | 0.06 1.54  0.43 0.43 0.37 | 11 10  9 0 0.80 | 46 33 27 0.99 | 0.49 0.49 1 1
      ec7: [h1, w1 → w3, v2 → v3]        L   | 0.08 1.03  0.26 0.25 0.22 |  8 10  5 1 0.78 | 33 33 20 0.94 | 0.49 0.49 1 1
      ec8: [h2 → h1, w1 → w3, v2 → v3]   VL  | 0.09 14.51 0.26 0.23 0.25 |  8  9  5 2 0.58 | 33 21 18 0.94 | 0.49 0.49 1 1

      SQLite — Workload: w1: write seq, w2: write batch, w3: read rand, w4: read seq; Version: v1: 3.7.6.3, v2: 3.19.0
      ec1: [h3 → h2, w1, v1]             S   | 0.99 0.37 0.82 0.35 0.31 | 5 2 2 0 1    | 13  9 8 1.00 | N/A N/A N/A N/A
      ec2: [h3 → h2, w2, v1]             M   | 0.97 1.08 0.88 0.40 0.49 | 5 5 4 0 1    | 10 11 9 1.00 | N/A N/A N/A N/A
      ec3: [h2, w1 → w2, v1]             S   | 0.96 1.27 0.83 0.40 0.35 | 2 3 1 0 1    |  9  9 7 0.99 | N/A N/A N/A N/A
      ec4: [h2, w3 → w4, v1]             M   | 0.50 1.24 0.43 0.17 0.43 | 1 1 0 0 1    |  4  2 2 1.00 | N/A N/A N/A N/A
      ec5: [h1, w1, v1 → v2]             M   | 0.95 1.00 0.79 0.24 0.29 | 2 4 1 0 1    | 12 11 7 0.99 | N/A N/A N/A N/A
      ec6: [h1, w2 → w1, v1 → v2]        L   | 0.51 2.80 0.44 0.25 0.30 | 3 4 1 1 0.31 |  7 11 6 0.96 | N/A N/A N/A N/A
      ec7: [h2 → h1, w2 → w1, v1 → v2]   VL  | 0.53 4.91 0.53 0.42 0.47 | 3 5 2 1 0.31 |  7 13 6 0.97 | N/A N/A N/A N/A

      SaC — Workload: w1: srad, w2: pfilter, w3: kmeans, w4: hotspot, w5: nw, w6: nbody100, w7: nbody150, w8: nbody750, w9: gc, w10: cg
      ec1:  [h1, w1 → w2, v1]            L   | 0.66 25.02 0.65 0.10 0.79 | 13 14  8 0 0.88 |  82  73  52 0.27  | 0.18 0.17 0.88 0.73
      ec2:  [h1, w1 → w3, v1]            L   | 0.44 15.77 0.42 0.10 0.65 | 13 10  8 0 0.91 |  82  63  50 0.56  | 0.18 0.12 0.90 0.84
      ec3:  [h1, w1 → w4, v1]            S   | 0.93 7.88  0.93 0.36 0.90 | 12 10  9 0 0.96 |  37  64  34 0.94  | 0.16 0.15 0.26 0.88
      ec4:  [h1, w1 → w5, v1]            L   | 0.96 2.82  0.78 0.06 0.81 | 16 12 10 0 0.94 |  34  58  25 0.04  | 0.15 0.22 0.19 -0.29
      ec5:  [h1, w2 → w3, v1]            M   | 0.76 1.82  0.84 0.67 0.86 | 17 11  9 1 0.95 |  79  61  47 0.55  | 0.27 0.13 0.83 0.88
      ec6:  [h1, w2 → w4, v1]            S   | 0.91 5.54  0.80 0.00 0.91 | 14 11  8 0 0.85 |  64  65  31 -0.40 | 0.13 0.15 0.12 0.64
      ec7:  [h1, w2 → w5, v1]            L   | 0.68 25.31 0.57 0.11 0.71 | 14 14  8 0 0.88 |  67  59  29 0.05  | 0.21 0.22 0.09 -0.13
      ec8:  [h1, w3 → w4, v1]            L   | 0.68 1.70  0.56 0.00 0.91 | 14 13  9 1 0.88 |  57  67  36 0.34  | 0.11 0.14 0.05 0.67
      ec9:  [h1, w3 → w5, v1]            VL  | 0.06 3.68  0.20 0.00 0.64 | 16 10  9 0 0.90 |  51  58  35 -0.52 | 0.11 0.21 0.06 -0.41
      ec10: [h1, w4 → w5, v1]            L   | 0.70 4.85  0.76 0.00 0.75 | 12 12 11 0 0.95 |  58  57  43 0.29  | 0.14 0.20 0.64 -0.14
      ec11: [h1, w6 → w7, v1]            S   | 0.82 5.79  0.77 0.25 0.88 | 36 30 28 2 0.89 | 109 164 102 0.96  | N/A N/A N/A N/A
      ec12: [h1, w6 → w8, v1]            S   | 1.00 0.52  0.92 0.80 0.97 | 38 30 22 6 0.94 |  51  53  43 0.99  | N/A N/A N/A N/A
      ec13: [h1, w8 → w7, v1]            S   | 1.00 0.32  0.92 0.53 0.99 | 30 33 26 1 0.98 |  53  89  51 1.00  | N/A N/A N/A N/A
      ec14: [h1, w9 → w10, v1]           L   | 0.24 4.85  0.56 0.44 0.77 | 22 21 18 3 0.69 | 237 226  94 0.86  | N/A N/A N/A N/A

      ES: Expected severity of change (Sec. III-B): S: small change; SM: small-medium change; M: medium change; L: large change; VL: very large change.
      SaC workload descriptions: srad: random matrix generator; pfilter: particle filtering; hotspot: heat transfer differential equations; k-means: clustering; nw: optimal matching; nbody: simulation of dynamic systems; cg: conjugate gradient; gc: garbage collector.
      Hardware descriptions (ID: Type/CPUs/Clock (GHz)/RAM (GiB)/Disk): h1: NUC/4/1.30/15/SSD; h2: NUC/2/2.13/7/SCSI; h3: Station/2/2.8/3/SCSI; h4: Amazon/1/2.4/1/SSD; h5: Amazon/1/2.4/0.5/SSD; h6: Azure/1/2.4/3/SCSI.
      Metrics: M1: Pearson correlation; M2: Kullback-Leibler (KL) divergence; M3: Spearman correlation; M4/M5: percentage of top/bottom configurations; M6/M7: number of influential options; M8/M9: number of options that agree/disagree; M10: correlation between importance of options; M11/M12: number of interactions; M13: number of interactions that agree on effects; M14: correlation between the coefficients; M15/M16: percentage of invalid configurations in source/target; M17: percentage of invalid configurations common between environments; M18: correlation between coefficients.
  28. RQ1: Does the performance behavior stay consistent across environments?
      Environmental change                      Severity   Lin. corr.
      SPEAR   NUC/2 → NUC/4                     S          1.00
      SPEAR   Amazon_nano → NUC                 L          0.59
      SPEAR   Hardware/workload/version         VL         -0.10
      x264    Version                           L          0.06
      x264    Workload                          M          0.65
      SQLite  write-seq → write-batch           S          0.96
      SQLite  read-rand → read-seq              M          0.50
      In the strongly correlated cases, a linear shift relates the two throughput responses: f_t = α·f_s + β.
      Insight: We observed a linear shift only for non-severe hardware changes.
  29. We observed similar performance distributions across environments.
      Environmental change                      Severity   Lin. corr.   Divergence
      x264    Version                           L          0.05         1.64
      x264    Workload/version                  L          0.08         1.03
      SQLite  write-seq → write-batch           VL         0.51         2.80
      SQLite  read-rand → read-seq              M          0.50         1.24
      SaC     Workload                          VL         0.06         3.68
      (The slide shows four runtime histograms, (a)–(d): frequency over runtime [s].)
      Insight: For severe changes, the performance distributions are similar, showing the potential for learning a non-linear transfer function.
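      The divergence column refers to the Kullback-Leibler divergence between source and target performance distributions (metric M2 in Table II). A hedged sketch of how such a number could be estimated from two sets of runtime measurements, using histograms over shared bins and scipy (synthetic data):

        # Sketch: KL divergence between source and target performance distributions,
        # estimated from histograms over a common set of bins.
        import numpy as np
        from scipy.stats import entropy

        rng = np.random.default_rng(3)
        runtimes_source = rng.gamma(shape=4.0, scale=25.0, size=2000)   # runtimes in env. A
        runtimes_target = rng.gamma(shape=4.5, scale=30.0, size=2000)   # runtimes in env. B

        bins = np.histogram_bin_edges(np.concatenate([runtimes_source, runtimes_target]), bins=50)
        p, _ = np.histogram(runtimes_source, bins=bins, density=True)
        q, _ = np.histogram(runtimes_target, bins=bins, density=True)

        eps = 1e-12                         # avoid division by zero in empty bins
        kl = entropy(p + eps, q + eps)      # KL(P || Q)
        print("estimated KL divergence:", round(kl, 2))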
  30. RQ2: Is the influence of configuration options on performance consistent across environments?
      Environmental change                      Severity   Dim   Influential options (S / T, by paired t-test)
      x264    Version                           L          16    12 / 10
      x264    Hardware/workload/version         VL               8 / 9
      SQLite  write-seq → write-batch           VL         14    3 / 4
      SQLite  read-rand → read-seq              M                1 / 1
      SaC     Workload                          VL         50    16 / 10
      (A configuration is a vector of options, C = <o1, o2, o3, o4, o5, o6, o7>.)
      Insight: Only a subset of options is influential, and this subset is largely preserved across all environment changes.
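      A hedged sketch of how an option's influence can be assessed with a paired t-test, pairing configurations that differ only in that one option (scipy assumed, synthetic measurements):

        # Sketch: testing whether option o_k is influential via a paired t-test over
        # configuration pairs that differ only in o_k.
        import numpy as np
        from scipy.stats import ttest_rel

        rng = np.random.default_rng(4)
        n_pairs = 30
        perf_option_off = rng.normal(100, 5, size=n_pairs)                   # o_k = 0
        perf_option_on = perf_option_off + 12 + rng.normal(0, 3, n_pairs)    # o_k = 1

        t_stat, p_value = ttest_rel(perf_option_on, perf_option_off)
        influential = p_value < 0.05
        print(f"t={t_stat:.2f}, p={p_value:.4f}, influential={influential}")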
  31. RQ3: Are the interactions among configuration options preserved across environments?
      Environmental change                      Severity   Dim   Interactions (S / T)   Corr.
      SPEAR   Amazon_nano → NUC                 L          14    41 / 27                0.98
      x264    Version                           L          16    47 / 33                1
      SQLite  Workload                          VL         50    109 / 164              0.96
      Example of preserved interaction terms:
          fs = … − 7·o1·o2 + 2·o1·o3 − 0.2·o2·o3
          ft = … − 6·o1·o2 + 2·o1·o3 − 0.1·o2·o3
      Insight: Only a subset of option interactions is influential, and this subset is largely preserved across all environment changes.
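      One hedged way to check this kind of preservation is to fit interaction-term regressions independently in both environments and correlate the corresponding coefficients (in the spirit of metric M14); a sketch with scikit-learn and synthetic data:

        # Sketch: compare interaction coefficients learned in a source and a
        # target environment by correlating them.
        import numpy as np
        from scipy.stats import pearsonr
        from sklearn.linear_model import LinearRegression
        from sklearn.preprocessing import PolynomialFeatures

        rng = np.random.default_rng(5)
        X = rng.integers(0, 2, size=(200, 4)).astype(float)
        expand = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
        Z = expand.fit_transform(X)

        # Synthetic responses: the target roughly rescales the source's effects.
        w_source = rng.normal(0, 5, size=Z.shape[1])
        y_source = Z @ w_source + rng.normal(0, 0.5, len(Z))
        y_target = Z @ (0.9 * w_source) + rng.normal(0, 0.5, len(Z))

        coef_s = LinearRegression().fit(Z, y_source).coef_
        coef_t = LinearRegression().fit(Z, y_target).coef_

        # Pairwise-interaction columns come after the 4 main-effect columns.
        r, _ = pearsonr(coef_s[4:], coef_t[4:])
        print("correlation between interaction coefficients:", round(r, 2))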
  32. RQ4: Are the invalid configurations in the source also invalid in the target environment?
      Environmental change                      Severity   Invalid (S / T)   Corr.
      SPEAR   Workload/version                  L          50% / 45%         0.96
      SPEAR   Hardware/workload/version         VL         47% / 50%         0.97
      (The slide shows CPU usage [%] maps: (a) prediction without transfer learning, (b) prediction with transfer learning.)
      Insight: A moderate percentage of configurations is invalid across environments.
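      A hedged sketch of the underlying comparison: given which sampled configurations turned out invalid (e.g., crashed or timed out) in each environment, compute the fraction invalid in each and how much the invalid sets overlap (synthetic flags):

        # Sketch: comparing invalid configurations across environments using
        # boolean "invalid" flags for the same sampled configurations.
        import numpy as np

        rng = np.random.default_rng(6)
        invalid_source = rng.random(1000) < 0.50                 # ~50% invalid in the source
        # The target mostly agrees with the source, with some disagreement.
        flip = rng.random(1000) < 0.05
        invalid_target = np.where(flip, ~invalid_source, invalid_source)

        pct_source = invalid_source.mean()
        pct_target = invalid_target.mean()
        overlap = (invalid_source & invalid_target).sum() / max(invalid_source.sum(), 1)

        print(f"invalid in source: {pct_source:.0%}, in target: {pct_target:.0%}")
        print(f"fraction of source-invalid configs also invalid in target: {overlap:.0%}")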
  33. Data is publicly available: https://github.com/pooyanjamshidi/ase17 — Preprint: https://arxiv.org/abs/1709.02280
  34. Findings and implications:
      • Simple change — Findings: strong correlations. Implications: simple transfer learning.
      • Severe change — Findings: similar performance distributions; similar options or interactions; high invalid configurations. Implications: establish a non-linear relation; focus on interesting regions.
  35. Implications: this study opens up several future research opportunities — sampling, learning, performance testing, and performance tuning. Similarity across environments matters! (The slide reuses the 3D latency surface over the number of counters and splitters.)
  36. Bello… Hire me!
  37. Summary (thumbnails of earlier slides): Many systems are now configurable. Building performance models is expensive (e.g., 25 options × 10 values each = 10^25 configurations to measure). Here is where transfer learning comes into the scene: an ML approach that uses the knowledge learned on the source to learn a cheaper model for the target. The hypotheses were categorized into 4 research questions — RQ1: consistency across environments; RQ2: influence of configuration options; RQ3: option interactions (e.g., fs = … − 7·o1·o2 + … vs. ft = … − 3·o1·o2 + …); RQ4: invalid configurations (prediction without vs. with transfer learning).
