Transfer Learning for Improving Model Predictions in Highly Configurable Software
Pooyan Jamshidi, Miguel Velez, Christian Kästner, Norbert Siegmund, Prasad Kawthekar
My background and objectives
• PhD (2010-13): Fuzzy control for auto-scaling in cloud
• Postdoc 1 (2014): Fuzzy Q-Learning for online rule update, MAPE-KE
• Postdoc 2 (2015-16): Configuration optimization and performance modeling for big data systems (stream processing, MapReduce, NoSQL)
• Now: Transfer learning for making performance model predictions more accurate and more reliable for highly configurable software
Objectives:
• Opportunities for applying model learning at runtime
• Collaborations
Motivation
1- Many different parameters =>
- large state space
- interactions
2- Defaults are typically used =>
- poor performance
Motivation
(Figure: histograms of observations over average read latency (µs) for (a) cass-20 and (b) cass-10, marking the best and worst configurations.)
Experiments on Apache Cassandra:
- 6 parameters, 1024 configurations
- Average read latency
- 10 million records (cass-10)
- 20 million records (cass-20)
Look at these outliers!
Large statistical dispersion
Problem-solution overview
• Robotic software is a highly configurable system.
• The configuration of the robot influences its performance (as well as its energy usage).
• Problem: there are many different parameters, making the configuration space high-dimensional and the influence of configuration on system performance difficult to understand.
• Solution: learn a black-box performance model using measurements from the robot: performance_value = f(configuration)
• Challenge: measurements on the real robot are expensive (time consuming, require human resources, risky on failure).
• Our contribution: perform most measurements on a simulator (Gazebo) and take only a few samples from the real robot to learn a reliable and accurate performance model within a given experimental budget.
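The black-box learning step above, performance_value = f(configuration), can be sketched as follows. The three-option configuration encoding, the synthetic measure function standing in for expensive robot runs, and the linear-plus-interaction model family are all illustrative assumptions, not the approach's actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for an expensive measurement on the real robot.
def measure(x):
    return 5.0 + 2.0 * x[0] - 1.5 * x[1] + 0.5 * x[0] * x[2]

X = rng.uniform(0, 1, size=(30, 3))        # sampled configurations
y = np.array([measure(x) for x in X])      # performance_value = f(configuration)

# A minimal black-box model: linear regression with one interaction term.
def features(X):
    return np.column_stack([np.ones(len(X)), X, X[:, [0]] * X[:, [2]]])

w, *_ = np.linalg.lstsq(features(X), y, rcond=None)

x_new = np.array([[0.5, 0.2, 0.8]])
pred = (features(x_new) @ w)[0]
print(pred)  # 5 + 1 - 0.3 + 0.2 = 5.9 (model family contains the true f, so the fit is exact)
```

In the actual approach a Gaussian process plays the role of this regression, since it also quantifies prediction uncertainty.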
Overview of our approach
(Diagram: Measure the source, the simulator (Gazebo), to collect Data; Learn an ML Model; Measure the target, the robot (TurtleBot); Predict performance; use the Performance Predictions to Reason about Adaptation.)
Overview of our approach
We get many cheap samples from the simulator and only a few expensive ones from the real robot, and learn an accurate model to predict the performance of the robot.
Here, performance may denote "battery usage", "localization error" or "mission time".
Goal
(Annotated excerpt of the paper's problem formulation, with four highlighted concepts:)

Configuration space: Let Xi indicate the i-th configuration parameter, which ranges in a finite domain Dom(Xi). In general, Xi may either indicate (i) an integer variable such as the level of parallelism or (ii) a categorical variable such as messaging frameworks, or a Boolean variable such as enabling timeout. We use the terms parameter and factor interchangeably; with the term option we refer to possible values that can be assigned to a parameter. We assume that each configuration x ∈ X in the configuration space X = Dom(X1) × · · · × Dom(Xd) is valid, i.e., the system accepts this configuration and the corresponding test results in a stable performance behavior.

Taking measurements: A black-box response function f : X → R is used to build a performance model given some observations of the system performance under different settings. In practice, though, the observation data may contain noise, i.e., yi = f(xi) + εi, xi ∈ X, where εi ∼ N(0, σi).

Partially known: We only partially know the response function through the observations D = {(xi, yi)}, i = 1..d, |D| ≪ |X|. In other words, a response function is simply a mapping from the configuration space to a measurable performance metric that produces interval-scaled data (here we assume it produces real numbers). Note that we can learn a model for all measurable attributes (including accuracy, and safety if suitably operationalized), but here we mostly train predictive models on performance attributes (e.g., response time, throughput, CPU utilization).

Response function to learn: Our main goal is to learn a reliable regression model, ˆf(·), that can predict the performance of the system, f(·), given a limited number of observations D. More specifically, we aim to minimize the prediction error over the configuration space:

arg min_{x ∈ X} pe = |ˆf(x) − f(x)|   (1)

Learn model: In order to solve the problem above, we assume f(·) is reasonably smooth, but otherwise little is known about the response function.
Cost model
(Annotated excerpt, with five highlighted concepts:)

Source observations: Ds = {(xs, ys) | ys = g(xs) + εs, xs ∈ X}, collected using cheap methods such as simulators or previous system versions; the cost of each source observation is cs.

Target observations: Dt = {(xt, yt) | yt = f(xt) + εt, xt ∈ X}, only a few observations taken from the real system; the cost of obtaining each observation on the target system is ct.

Assumption: cs ≪ ct, and the source response function, g(·), is related (correlated) to the target function, f(·); this relatedness contributes to learning the target function using only sparse samples from the target system, i.e., |Dt| ≪ |Ds|. Intuitively, rather than starting from scratch, we transfer the learned knowledge from cheaply collected data to learn a more accurate model of the target configurable system and use this model to reason about its performance at runtime.

For the choice of the number of samples from the source and target, we use a simple cost model that is based on the number of source and target samples:

Total cost: C(Ds, Dt) = cs · |Ds| + ct · |Dt| + CTr(|Ds|, |Dt|),   (2)
Budget: C(Ds, Dt) ≤ Cmax,   (3)

where Cmax is the experimental budget and CTr(|Ds|, |Dt|) is the cost of model training, which is a function of the number of source and target samples.
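A minimal sketch of the cost model in Eqs. (2)-(3). The cost constants and the training-cost term are illustrative assumptions; only the structure of the formula comes from the slide:

```python
# C(Ds, Dt) = c_s*|Ds| + c_t*|Dt| + C_Tr(|Ds|, |Dt|)  -- Eq. (2)
def total_cost(n_source, n_target, c_s=0.1, c_t=10.0,
               train_cost=lambda s, t: 0.01 * (s + t)):
    return c_s * n_source + c_t * n_target + train_cost(n_source, n_target)

# C(Ds, Dt) <= C_max  -- Eq. (3)
def within_budget(n_source, n_target, c_max):
    return total_cost(n_source, n_target) <= c_max

# With c_s << c_t, many simulator samples plus a few robot samples fit
# a budget that a robot-only design would exceed.
print(round(total_cost(100, 3), 2))        # 10 + 30 + 1.03 = 41.03
print(within_budget(100, 3, c_max=50.0))   # True
print(within_budget(0, 10, c_max=50.0))    # robot-only: 100.1 > 50 -> False
```

Choosing |Ds| and |Dt| then becomes a search over sample counts subject to the budget constraint.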
Synthetic example
(Figure: a synthetic response function with source samples and target samples; only three observations are taken from the target.)
(a) Source and target samples
(b) Learning without transfer: a regression model is constructed from the 3 target samples alone.
(c) Learning with transfer: the model is constructed using the target and the source samples; based on only 3 target measurements, it provides accurate predictions.
(d) Negative transfer: a regression model is constructed with transfer from a misleading response function, and therefore the model prediction is inaccurate.
Assumption:	Source	and	target	are	correlated
(Figure: four panels (a)-(d) showing source and target response functions over the configuration space, one per relationship listed below. The background excerpts discuss related sampling and learning work: Recursive Random Sampling (RRS), Smart Hill Climbing (SHC) with Latin Hypercube Design, direct search over a simplex, Quick Optimization via Guessing (QOG), Fourier sparse functions, and active learning of regression models.)
a) Source may be a shift of the target
b) Source may be a noisy version of the target
c) Source may differ from the target in some extreme regions of the configuration space
d) Source may be irrelevant to the target -> negative transfer will happen!
Correlations
(Figure: latency (ms) response surfaces over number of splitters and number of counters for (a) WordCount v1, (b) WordCount v2, (c) WordCount v3, (d) WordCount v4; versions differ by hardware and software changes.)

(e) Pearson correlation coefficients (upper triangle), with p-values (lower triangle):
      v1        v2     v3        v4
v1    1         0.41   -0.46     -0.50
v2    7.36E-06  1      -0.20     -0.18
v3    6.92E-07  0.04   1         0.94
v4    2.54E-08  0.07   1.16E-52  1

(f) Spearman correlation coefficients (upper triangle), with p-values (lower triangle):
      v1        v2     v3        v4
v1    1         0.49   -0.51     -0.51
v2    5.50E-08  1      -0.2793   -0.24
v3    1.30E-08  0.003  1         0.88
v4    1.40E-08  0.01   8.30E-36  1

(g) Measurement noise across WordCount versions (first column after the version is the mean latency µ in ms; the remaining column headers were garbled in extraction):
ver.  µ
v1    516.59   7.96    64.88
v2    584.94   2.58    226.32
v3    654.89   13.56   48.30
v4    1125.81  16.92   66.56

- Different correlations
- Different optimum configurations
- Different noise levels
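Relatedness across versions, as in tables (e) and (f), can be checked by correlating latencies measured for the same configurations on two versions. A sketch with synthetic latencies and hand-rolled Pearson and Spearman coefficients (a real analysis could use scipy.stats; all data here is made up):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical latencies of the same 20 configurations on three versions.
lat_v1 = rng.uniform(500, 1000, size=20)
lat_v2 = 1.2 * lat_v1 + rng.normal(0, 20, size=20)  # correlated version
lat_v4 = rng.uniform(500, 1000, size=20)            # unrelated version

def pearson(a, b):
    a, b = a - a.mean(), b - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

def spearman(a, b):
    # Spearman = Pearson on the ranks (no ties here by construction).
    rank = lambda v: np.argsort(np.argsort(v)).astype(float)
    return pearson(rank(a), rank(b))

print(round(pearson(lat_v1, lat_v2), 2))   # strong: good transfer candidate
print(round(pearson(lat_v1, lat_v4), 2))   # weak: risk of negative transfer
print(round(spearman(lat_v1, lat_v2), 2))
```

A weak or negative correlation between source and target is exactly the situation in which transfer can hurt, as panel (d) of the synthetic example showed.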
Traditional machine learning vs. transfer learning
(Diagram: in traditional machine learning, each domain has its own data and its own learning algorithm. In transfer learning, data from the source domain (cheap) is turned into knowledge that is passed to the learning algorithm of the target domain (expensive).)
The goal of transfer learning is to improve learning in the target domain by leveraging knowledge from the source domain.
GP for modeling black box response function
(Figure legend: true function, GP mean, GP variance, observation, selected point, true minimum.)

A GP model is composed of its prior mean (µ(·) : X → R) and a covariance function (k(·, ·) : X × X → R) [41]:

y = f(x) ∼ GP(µ(x), k(x, x')),   (2)

where the covariance k(x, x') defines the distance between x and x'. Let us assume S1:t = {(x1:t, y1:t) | yi := f(xi)} to be the collection of t experimental data (observations). In this framework, we treat f(x) as a random variable, conditioned on the observations S1:t, which is normally distributed with the following posterior mean and variance functions [41]:

µt(x) = µ(x) + k(x)ᵀ(K + σ²I)⁻¹(y − µ)   (3)
σt²(x) = k(x, x) + σ²I − k(x)ᵀ(K + σ²I)⁻¹k(x)   (4)

where y := y1:t, k(x)ᵀ = [k(x, x1) k(x, x2) . . . k(x, xt)], µ := µ(x1:t), K := [k(xi, xj)], and I is the identity matrix. The shortcoming of BO4CO is that it cannot exploit the observations regarding other versions of the system and therefore cannot be applied in DevOps.

TL4CO: an extension to multi-tasks. TL4CO uses MTGPs (multi-task GPs) that exploit observations from other, previous versions of the system under test. TL4CO is an iterative algorithm that uses the learning from other system versions: (i) it selects the most informative past observations; (ii) fits a model to existing data based on kernel learning; and (iii) selects the next configuration to measure.

Motivations:
1- mean estimates + variance
2- all computations are linear algebra
3- good estimations when few data
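The posterior equations (3)-(4) are plain linear algebra, which is the second motivation above. A minimal sketch with a zero prior mean and an RBF kernel, both illustrative choices rather than the paper's exact setup:

```python
import numpy as np

def rbf(a, b, ell=1.0):
    # Squared-exponential kernel k(x, x') on 1-D inputs.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

def gp_posterior(x_star, x_obs, y_obs, sigma=0.1):
    K = rbf(x_obs, x_obs) + sigma**2 * np.eye(len(x_obs))  # K + sigma^2 I
    k_star = rbf(x_obs, x_star)                            # k(x)
    alpha = np.linalg.solve(K, y_obs)            # (K + s^2 I)^-1 y, with mu = 0
    mean = k_star.T @ alpha                      # Eq. (3)
    var = (rbf(x_star, x_star).diagonal() + sigma**2
           - np.einsum('ij,ij->j', k_star, np.linalg.solve(K, k_star)))  # Eq. (4)
    return mean, var

x_obs = np.array([0.0, 1.0, 2.0, 3.0])
y_obs = np.sin(x_obs)                 # stand-in for measured responses
mean, var = gp_posterior(np.array([1.5]), x_obs, y_obs)
print(mean, var)  # posterior mean close to sin(1.5); small variance near the data
```

The variance in Eq. (4) is what makes GPs attractive here: it tells the sampler where the model is still uncertain.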
       ⎡ k(x1, x1) · · · k(x1, xt) ⎤
K :=   ⎢     ⋮       ⋱       ⋮    ⎥   (7)
       ⎣ k(xt, x1) · · · k(xt, xt) ⎦

GP models have been shown to be effective for performance predictions in data-scarce domains [20]. However, as we demonstrated in Figure 2, they may become inaccurate when the samples do not cover the space uniformly. For highly configurable systems, we require a large number of observations to cover the space uniformly, making GP models ineffective in such situations.

Model prediction using transfer learning. In transfer learning, the key question is how to make accurate predictions for the target environment using observations from other sources, Ds. We need a measure of relatedness not only between input configurations but between the sources as well. The relationships between input configurations were captured in the GP models using the covariance matrix, defined based on the kernel function in Eq. (7). More specifically, a kernel is a function that computes a dot product (a measure of "similarity") between two input configurations. The kernel helps to get accurate predictions for similar configurations. We now need to exploit the relationship between the source and target functions, g, f, using the current observations Ds, Dt to build the predictive model ˆf. To capture this relationship, we define the following kernel function:

k(f, g, x, x') = kt(f, g) × kxx(x, x'),   (8)
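Under Eq. (8), the joint covariance of the stacked source and target observations factors into a task-similarity term kt(f, g) and the input kernel kxx(x, x'). A sketch with a hand-picked 2×2 task matrix; the paper learns this relationship from the data, so the value 0.9 is purely illustrative:

```python
import numpy as np

def rbf(a, b, ell=1.0):
    # Input kernel kxx(x, x') on 1-D inputs.
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)

def multitask_K(x_src, x_tgt, task_corr=0.9):
    # kt as a 2x2 PSD matrix: kt(g,g) = kt(f,f) = 1, kt(f,g) = task_corr.
    Kt = np.array([[1.0, task_corr], [task_corr, 1.0]])
    x_all = np.concatenate([x_src, x_tgt])
    task = np.array([0] * len(x_src) + [1] * len(x_tgt))
    Kxx = rbf(x_all, x_all)
    return Kt[task[:, None], task[None, :]] * Kxx   # k(f,g,x,x') = kt * kxx

K = multitask_K(np.linspace(0, 3, 5), np.array([0.5, 2.5]))
print(K.shape)  # (7, 7): 5 source + 2 target observations
```

This joint K then replaces K in the posterior equations, so target predictions borrow strength from source observations in proportion to kt(f, g).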
other components can query the performance under specific
configurations or update the model given a new observation.
We use transfer learning to make the knowledge more accurate
using observations that are taken from a simulator or any
other cheap sources. For deciding how many observations
and from what source to transfer, we use the cost model
that we have introduced earlier. At runtime, the managed
system is Monitored by pulling the end-to-end performance
metrics (e.g., latency, throughput) from the corresponding
sensors. Then, the retrieved performance data needs to be
Analysed and the mean performance associated to a specific
setting of the system will be stored in a data repository.
Next, the GP model needs to be updated taking into account
the new performance observation. Having updated the GP
model, a new configuration may be Planned to replace the
current configuration. Finally, the new configuration will be
enacted by Executing appropriate platform specific operations.
This enables model-based knowledge evolution using machine
learning [2], [21]. The underlying GP model can now be
updated not only when a new observation is available but
also by transferring the learning from other related sources.
So at each adaptation cycle, we can update our belief about
the correct response given data from the managed system and
other related sources, accelerating the learning process.
IV. EXPERIMENTAL RESULTS
We evaluate the effectiveness and applicability of our
transfer learning approach for learning models for highly-
… incorporate domain knowledge as prior, if available, which can enhance the model accuracy [20].

In order to describe the technical details of our transfer learning methodology, let us briefly give an overview of GP model regression; a more detailed description can be found elsewhere [35]. GP models assume that the function f̂(x) can be interpreted as a probability distribution over functions:

    y = f̂(x) ∼ GP(µ(x), k(x, x′)),    (4)

where µ : X → ℝ is the mean function and k : X × X → ℝ is the covariance function (kernel function), which describes the relationship between response values, y, according to the distance of the input values x, x′. The mean and variance of the GP model predictions can be derived analytically [35]:

    µₜ(x) = µ(x) + k(x)⊺ (K + σ² I)⁻¹ (y − µ),    (5)
    σₜ²(x) = k(x, x) + σ² − k(x)⊺ (K + σ² I)⁻¹ k(x),    (6)

where k(x)⊺ = [k(x, x₁)  k(x, x₂)  …  k(x, xₜ)], I is the identity matrix, and

    K := ⎡ k(x₁, x₁)  …  k(x₁, xₜ) ⎤
         ⎢     ⋮       ⋱      ⋮    ⎥    (7)
         ⎣ k(xₜ, x₁)  …  k(xₜ, xₜ) ⎦

GP models have been shown to be effective for performance predictions in data-scarce domains [20]. However, as we […] predictions can then be made with Eq. (5), (6) and the new K. The main essence of transfer learning is, therefore, the kernel that captures the source-target relationship and provides more accurate predictions using the additional knowledge we can gain via the relationship between source and target.
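A minimal NumPy sketch of this idea, assuming a squared-exponential input kernel, a zero mean function, and a product-form multi-task kernel k((x,t),(x′,t′)) = K_tasks[t,t′] · k_x(x,x′); all names and the inter-task coefficients are illustrative, not the paper's implementation or learned values:

```python
import numpy as np

def rbf(A, B, length=1.0):
    """Squared-exponential input covariance k_x(x, x') between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length**2)

def multitask_K(A, ta, B, tb, K_tasks, length=1.0):
    """Product kernel: the source-target relationship enters only through K_tasks."""
    return K_tasks[np.ix_(ta, tb)] * rbf(A, B, length)

def gp_posterior(X, tasks, y, X_star, t_star, K_tasks, sigma2=1e-6):
    """Posterior mean and variance following Eq. (5) and (6), with mu = 0."""
    K = multitask_K(X, tasks, X, tasks, K_tasks)          # K of Eq. (7)
    k_s = multitask_K(X, tasks, X_star, t_star, K_tasks)  # columns k(x) for test points
    A = np.linalg.inv(K + sigma2 * np.eye(len(X)))
    mean = k_s.T @ A @ y                                  # Eq. (5)
    var = (K_tasks[t_star, t_star] + sigma2
           - np.einsum('ij,jk,ik->i', k_s.T, A, k_s.T))   # Eq. (6); here k(x,x)=K_tasks[t,t]
    return mean, var
```

Setting the off-diagonal of K_tasks to zero recovers independent GPs per task, so source observations no longer inform target predictions; a nonzero inter-task covariance lets cheap source samples reduce the target's predictive variance.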
D. Transfer learning in a self-adaptation loop
Now that we have described the idea of transfer learning for
providing more accurate predictions, the question is whether
such an idea can be applied at runtime and how the self-
adaptive systems can benefit from it. More specifically, we
now describe the idea of model learning and transfer learning
in the context of self-optimization, where the system adapts
its configuration to meet performance requirements at runtime.
The difference to traditional configurable systems is that we
learn the performance model online in a feedback loop under
time and resource constraints. Such performance reasoning is
done more frequently for self-adaptation purposes.
An overview of a self-optimization solution is depicted
in Figure 3 following the well-known MAPE-K framework
[9], [23]. We consider the GP model as the K (knowledge)
component of this framework that acts as an interface to which
other components can query the performance under specific
configurations or update the model given a new observation.
We use transfer learning to make the knowledge more accurate
using observations that are taken from a simulator or any
other related source.
The case where we learn from correlated responses

[Figure: (a) three sample response functions (1), (2), (3) over the configuration domain; (b) GP fit for (1) ignoring the observations for (2), (3): the LCB is not informative; (c) multi-task GP fit for (1) by transfer learning from (2), (3): highly informative. Legend: GP prediction mean, GP prediction variance, probability distribution of the minimizers.]
Results

Evaluation
• Case study (robotics)
• Controlled experiments (highly configurable software)
Performance prediction for CoBot

[Figure: CPU usage [%] heatmaps over the configuration space: (a) source response function (cheap); (b) target response function (expensive); (c) prediction without transfer learning; (d) prediction with transfer learning.]
Prediction accuracy and model reliability

[Figure: contour plots of (a) prediction error, (b) model reliability, (c) training overhead, and (d) evaluation overhead over different numbers of source and target samples.]

(a) Prediction error: there is a trade-off between using more source samples and more target samples
(b) Model reliability: transfer learning contributes to lowering the prediction uncertainty
(c) Training overhead and (d) evaluation overhead: appropriate for runtime usage
Level of correlation between source and target is important

    Sources       s      s1     s2     s3     s4     s5     s6
    noise-level   0      5      10     15     20     25     30
    corr. coeff.  0.98   0.95   0.89   0.75   0.54   0.34   0.19
    µ(pe)         15.34  14.14  17.09  18.71  33.06  40.93  46.75

[Figure: absolute percentage error [%] distributions for each source.]

• The model becomes more accurate when the source is more related to the target
• Even learning from a source with a small correlation is better than no transfer
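The correlation coefficients in the table can be estimated directly from paired source/target measurements; the synthetic responses below are illustrative, not the measured data:

```python
import numpy as np

# Synthetic paired responses over 200 configurations (illustrative only).
rng = np.random.default_rng(0)
target = rng.normal(size=200)                      # target response values
source = target + rng.normal(scale=0.3, size=200)  # a closely related source: target plus noise
corr = np.corrcoef(source, target)[0, 1]           # Pearson correlation, as in the table
```

Increasing the noise scale plays the role of the table's noise-level row: it lowers the correlation coefficient and, per the result above, weakens the benefit of transfer.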
Prediction accuracy

[Figure: CPU usage [%] heatmaps (a)–(f) over number of particles × number of refinements.]

• The model provides more accurate predictions as we exploit more data
• Transfer learning may (i) boost the initial performance, (ii) increase the learning speed, and (iii) lead to a better final performance
[Figure (Lisa Torrey and Jude Shavlik): three ways in which transfer might improve learning — a higher start, a higher slope, and a higher asymptote of performance over training, compared with learning without transfer.]
Experimental evaluations

• RQ1: How much does transfer learning improve the prediction accuracy?
• RQ2: What are the trade-offs between learning with different numbers of samples from source and target in terms of prediction accuracy?
• RQ3: Is our transfer learning approach applicable in the context of self-adaptive systems in terms of model training and evaluation time?
Subject systems

• Robotic software (CoBot)
• Stream processing systems (WordCount, RollingSort, SOL) on Apache Storm
• NoSQL benchmark on Apache Cassandra
Stream Processing Systems

[Diagram: Storm architecture — logical view (Spout A → Bolt A → Bolt B) and physical view (Workers A–C connected via pipes, sockets, and in/out queues).]

Word Count architecture (CPU intensive): File → Stream to Kafka → Kafka Topic → Kafka Spout (sentence) → Splitter Bolt (sentence) → Counter Bolt (word), producing counts such as [paintings, 3], [poems, 60], [letter, 75].

Rolling Sort architecture (memory intensive): Twitter Stream → Twitter to Kafka → Kafka Topic → Kafka Spout (tweet) → RollingCount Bolt (hashtags) → Intermediate Ranking Bolt (hashtag, count) → Ranking Bolt (ranking) → trending topics.

Applications: fraud detection, trending topics.
[Figure: prediction-error contour plots (a)–(f) over different numbers of source and target samples for CoBot, WC, SOL, RS, Cass (hdw), and Cass (db size).]

… number of source and target samples as follows:

    C(Ds, Dt) = cs · |Ds| + ct · |Dt| + CTr(|Ds|, |Dt|),    (2)
    C(Ds, Dt) ≤ Cmax,    (3)

where Cmax is the experimental budget and CTr(|Ds|, |Dt|) is the cost of model training, which is a function of the source and target sample sizes. Note that this cost model turns the problem we have formulated in Eq. (1) into a multi-objective problem, where not only accuracy but also cost is considered in the transfer learning process.

C. Model learning: Technical details

We use Gaussian Process (GP) models to learn a reliable performance model f̂(·) that can predict unobserved response values. The main motivation to choose GP here is that it offers …
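Eq. (2) and (3) can be sketched in a few lines; the per-sample costs and the linear training-cost function are illustrative placeholders, not measured values:

```python
def experiment_cost(n_source, n_target, c_s, c_t, training_cost):
    """Eq. (2): cost of the source and target measurements plus model training."""
    return c_s * n_source + c_t * n_target + training_cost(n_source, n_target)

def within_budget(n_source, n_target, c_s, c_t, training_cost, c_max):
    """Eq. (3): the budget constraint C(Ds, Dt) <= Cmax."""
    return experiment_cost(n_source, n_target, c_s, c_t, training_cost) <= c_max

# Illustrative costs: simulator (source) samples are cheap, real-robot (target) samples expensive.
linear_training = lambda s, t: 0.01 * (s + t)
cost = experiment_cost(100, 10, c_s=0.1, c_t=5.0, training_cost=linear_training)
```

Because target samples dominate the cost, shifting measurements to the source is what makes the multi-objective trade-off between accuracy and cost possible.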
Trade-off between measurement cost and accuracy

[Figure: accuracy versus measurement cost ($): the sweet spot lies where the accuracy threshold is met within the budget threshold.]
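The "sweet spot" in the figure corresponds to picking, among sampling designs that meet the accuracy threshold within budget, the one with the lowest measurement cost; the data points below are made up for illustration:

```python
# Hypothetical (cost in $, prediction error in %) pairs for candidate sampling designs.
designs = [(50, 45.0), (200, 22.0), (400, 9.5), (800, 8.0), (1200, 7.8)]
accuracy_threshold = 10.0   # highest acceptable prediction error
budget_threshold = 1000.0   # highest acceptable measurement cost

feasible = [(c, e) for c, e in designs
            if e <= accuracy_threshold and c <= budget_threshold]
sweet_spot = min(feasible)  # the cheapest design meeting both thresholds
```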
Model training and evaluation overhead

[Figure: (a) model training time and (b) model evaluation time (log scale) versus the percentage of training data from the source, for WC, RS, SOL, and cass.]
Future work, insights and ideas

What if multiple related sources?

[Diagram: (a) a target correlated with sources 1–3 via corr_1, corr_2, corr_3; (b) selecting among sources 1–3 for the target.]
Example of multiple sources

[Diagram: transfer paths (1)–(5) among source simulator, target simulator, source robot, and target robot.]
Current priority

We have looked into transfer learning from a simulator to a real robot; now we consider the following scenarios:
• Workload change (new tasks or missions, new environmental conditions)
• Infrastructure change (new Intel NUC, new camera, new sensors)
• Code change (new versions of ROS, new localization algorithm)
Active learning + transfer learning

Goal: find the best sample points iteratively by gaining knowledge from the source and target domains.

[Figure: a function learned from bad samples overfits; a function learned from good samples does not.]
Future work: Integrating transfer learning with MAPE-K

[Diagram: at design time, a simulator feeds a simulation data repository (D.1 change configuration, D.2 perform simulation, D.3 extract data). At runtime, an autonomic manager around the highly configurable system runs MAPE-K with a GP model as the knowledge component: R.1 performance monitoring into a performance repository, R.2 data analysis, R.3 model update (R.3.1 transfer learning), R.4 select the most promising configuration to test, R.5 update system configuration.]
Conclusion: Our cost-aware transfer learning
• Improves the model accuracy, by up to several orders of magnitude
• Trades off between different numbers of samples from source and target, enabling a cost-aware model
• Imposes an acceptable model building and evaluation cost, making it appropriate for application in the robotics domain

Weitere ähnliche Inhalte

Was ist angesagt?

모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로
모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로 모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로
모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로
r-kor
 

Was ist angesagt? (20)

모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로
모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로 모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로
모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로
 
safe and efficient off policy reinforcement learning
safe and efficient off policy reinforcement learningsafe and efficient off policy reinforcement learning
safe and efficient off policy reinforcement learning
 
AI optimizing HPC simulations (presentation from 6th EULAG Workshop)
AI optimizing HPC simulations (presentation from  6th EULAG Workshop)AI optimizing HPC simulations (presentation from  6th EULAG Workshop)
AI optimizing HPC simulations (presentation from 6th EULAG Workshop)
 
Detection focal loss 딥러닝 논문읽기 모임 발표자료
Detection focal loss 딥러닝 논문읽기 모임 발표자료Detection focal loss 딥러닝 논문읽기 모임 발표자료
Detection focal loss 딥러닝 논문읽기 모임 발표자료
 
Dear - 딥러닝 논문읽기 모임 김창연님
Dear - 딥러닝 논문읽기 모임 김창연님Dear - 딥러닝 논문읽기 모임 김창연님
Dear - 딥러닝 논문읽기 모임 김창연님
 
Incremental collaborative filtering via evolutionary co clustering
Incremental collaborative filtering via evolutionary co clusteringIncremental collaborative filtering via evolutionary co clustering
Incremental collaborative filtering via evolutionary co clustering
 
Towards a Unified Data Analytics Optimizer with Yanlei Diao
Towards a Unified Data Analytics Optimizer with Yanlei DiaoTowards a Unified Data Analytics Optimizer with Yanlei Diao
Towards a Unified Data Analytics Optimizer with Yanlei Diao
 
A scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringA scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clustering
 
ゆるふわ強化学習入門
ゆるふわ強化学習入門ゆるふわ強化学習入門
ゆるふわ強化学習入門
 
Lecture 6: Convolutional Neural Networks
Lecture 6: Convolutional Neural NetworksLecture 6: Convolutional Neural Networks
Lecture 6: Convolutional Neural Networks
 
Genetic Algorithms
Genetic AlgorithmsGenetic Algorithms
Genetic Algorithms
 
VAE-type Deep Generative Models
VAE-type Deep Generative ModelsVAE-type Deep Generative Models
VAE-type Deep Generative Models
 
Sergei Vassilvitskii, Research Scientist, Google at MLconf NYC - 4/15/16
Sergei Vassilvitskii, Research Scientist, Google at MLconf NYC - 4/15/16Sergei Vassilvitskii, Research Scientist, Google at MLconf NYC - 4/15/16
Sergei Vassilvitskii, Research Scientist, Google at MLconf NYC - 4/15/16
 
自然方策勾配法の基礎と応用
自然方策勾配法の基礎と応用自然方策勾配法の基礎と応用
自然方策勾配法の基礎と応用
 
Build your own Convolutional Neural Network CNN
Build your own Convolutional Neural Network CNNBuild your own Convolutional Neural Network CNN
Build your own Convolutional Neural Network CNN
 
HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardin...
HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardin...HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardin...
HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardin...
 
Convolutional Neural Network (CNN) presentation from theory to code in Theano
Convolutional Neural Network (CNN) presentation from theory to code in TheanoConvolutional Neural Network (CNN) presentation from theory to code in Theano
Convolutional Neural Network (CNN) presentation from theory to code in Theano
 
A Tale of Data Pattern Discovery in Parallel
A Tale of Data Pattern Discovery in ParallelA Tale of Data Pattern Discovery in Parallel
A Tale of Data Pattern Discovery in Parallel
 
NETWORK-AWARE DATA PREFETCHING OPTIMIZATION OF COMPUTATIONS IN A HETEROGENEOU...
NETWORK-AWARE DATA PREFETCHING OPTIMIZATION OF COMPUTATIONS IN A HETEROGENEOU...NETWORK-AWARE DATA PREFETCHING OPTIMIZATION OF COMPUTATIONS IN A HETEROGENEOU...
NETWORK-AWARE DATA PREFETCHING OPTIMIZATION OF COMPUTATIONS IN A HETEROGENEOU...
 
Why Batch Normalization Works so Well
Why Batch Normalization Works so WellWhy Batch Normalization Works so Well
Why Batch Normalization Works so Well
 

Andere mochten auch

Transfer Learning and Fine-tuning Deep Neural Networks
 Transfer Learning and Fine-tuning Deep Neural Networks Transfer Learning and Fine-tuning Deep Neural Networks
Transfer Learning and Fine-tuning Deep Neural Networks
PyData
 
Transfer of Learning
Transfer of LearningTransfer of Learning
Transfer of Learning
Abby Rondilla
 
Self learning cloud controllers
Self learning cloud controllersSelf learning cloud controllers
Self learning cloud controllers
Pooyan Jamshidi
 
Windows azure in_the_enterprise_net_com
Windows azure in_the_enterprise_net_comWindows azure in_the_enterprise_net_com
Windows azure in_the_enterprise_net_com
Vakul More
 
클라우드 컴퓨팅 보안 이슈 극복을 위한 제언
클라우드 컴퓨팅 보안 이슈 극복을 위한 제언클라우드 컴퓨팅 보안 이슈 극복을 위한 제언
클라우드 컴퓨팅 보안 이슈 극복을 위한 제언
00heights
 

Andere mochten auch (20)

Cloud Migration Patterns: A Multi-Cloud Architectural Perspective
Cloud Migration Patterns: A Multi-Cloud Architectural PerspectiveCloud Migration Patterns: A Multi-Cloud Architectural Perspective
Cloud Migration Patterns: A Multi-Cloud Architectural Perspective
 
Sensitivity Analysis for Building Adaptive Robotic Software
Sensitivity Analysis for Building Adaptive Robotic SoftwareSensitivity Analysis for Building Adaptive Robotic Software
Sensitivity Analysis for Building Adaptive Robotic Software
 
Microservices Architecture Enables DevOps: Migration to a Cloud-Native Archit...
Microservices Architecture Enables DevOps: Migration to a Cloud-Native Archit...Microservices Architecture Enables DevOps: Migration to a Cloud-Native Archit...
Microservices Architecture Enables DevOps: Migration to a Cloud-Native Archit...
 
Transfer Learning and Fine-tuning Deep Neural Networks
 Transfer Learning and Fine-tuning Deep Neural Networks Transfer Learning and Fine-tuning Deep Neural Networks
Transfer Learning and Fine-tuning Deep Neural Networks
 
Cloud Migration Strategy - IT Transformation with Cloud
Cloud Migration Strategy - IT Transformation with CloudCloud Migration Strategy - IT Transformation with Cloud
Cloud Migration Strategy - IT Transformation with Cloud
 
Transfer of Learning
Transfer of LearningTransfer of Learning
Transfer of Learning
 
Scalable machine learning
Scalable machine learningScalable machine learning
Scalable machine learning
 
Autonomic Resource Provisioning for Cloud-Based Software
Autonomic Resource Provisioning for Cloud-Based SoftwareAutonomic Resource Provisioning for Cloud-Based Software
Autonomic Resource Provisioning for Cloud-Based Software
 
Configuration Optimization Tool
Configuration Optimization ToolConfiguration Optimization Tool
Configuration Optimization Tool
 
Towards Quality-Aware Development of Big Data Applications with DICE
Towards Quality-Aware Development of Big Data Applications with DICETowards Quality-Aware Development of Big Data Applications with DICE
Towards Quality-Aware Development of Big Data Applications with DICE
 
Self learning cloud controllers
Self learning cloud controllersSelf learning cloud controllers
Self learning cloud controllers
 
Transfer of Learning
Transfer of LearningTransfer of Learning
Transfer of Learning
 
Lecture 2 Basic Concepts in Machine Learning for Language Technology
Lecture 2 Basic Concepts in Machine Learning for Language TechnologyLecture 2 Basic Concepts in Machine Learning for Language Technology
Lecture 2 Basic Concepts in Machine Learning for Language Technology
 
Cloud sec 2015 megazone slideshare 20150910
Cloud sec 2015 megazone slideshare 20150910Cloud sec 2015 megazone slideshare 20150910
Cloud sec 2015 megazone slideshare 20150910
 
Create and manage a web application on Azure (step to step tutorial)
Create and manage a web application on Azure (step to step tutorial)Create and manage a web application on Azure (step to step tutorial)
Create and manage a web application on Azure (step to step tutorial)
 
6th SDN Interest Group Seminar - Session2 (131210)
6th SDN Interest Group Seminar - Session2 (131210)6th SDN Interest Group Seminar - Session2 (131210)
6th SDN Interest Group Seminar - Session2 (131210)
 
Windows azure in_the_enterprise_net_com
Windows azure in_the_enterprise_net_comWindows azure in_the_enterprise_net_com
Windows azure in_the_enterprise_net_com
 
클라우드 컴퓨팅 보안 이슈 극복을 위한 제언
클라우드 컴퓨팅 보안 이슈 극복을 위한 제언클라우드 컴퓨팅 보안 이슈 극복을 위한 제언
클라우드 컴퓨팅 보안 이슈 극복을 위한 제언
 
Introduction to Machine Learning on Apache Spark MLlib by Juliet Hougland, Se...
Introduction to Machine Learning on Apache Spark MLlib by Juliet Hougland, Se...Introduction to Machine Learning on Apache Spark MLlib by Juliet Hougland, Se...
Introduction to Machine Learning on Apache Spark MLlib by Juliet Hougland, Se...
 
How to Get My Paper Accepted at Top Software Engineering Conferences
How to Get My Paper Accepted at Top Software Engineering ConferencesHow to Get My Paper Accepted at Top Software Engineering Conferences
How to Get My Paper Accepted at Top Software Engineering Conferences
 

Ähnlich wie Transfer Learning for Improving Model Predictions in Highly Configurable Software

Transfer Learning for Performance Analysis of Highly-Configurable Software
Transfer Learning for Performance Analysis of Highly-Configurable SoftwareTransfer Learning for Performance Analysis of Highly-Configurable Software
Transfer Learning for Performance Analysis of Highly-Configurable Software
Pooyan Jamshidi
 
Producer consumer-problems
Producer consumer-problemsProducer consumer-problems
Producer consumer-problems
Richard Ashworth
 
GPUFish_technical_report
GPUFish_technical_reportGPUFish_technical_report
GPUFish_technical_report
Charles Hubbard
 
Ai_Project_report
Ai_Project_reportAi_Project_report
Ai_Project_report
Ravi Gupta
 
Machine learning (5)
Machine learning (5)Machine learning (5)
Machine learning (5)
NYversity
 
Rapport_Cemracs2012
Rapport_Cemracs2012Rapport_Cemracs2012
Rapport_Cemracs2012
Jussara F.M.
 

Ähnlich wie Transfer Learning for Improving Model Predictions in Highly Configurable Software (20)

Transfer Learning for Performance Analysis of Highly-Configurable Software
Transfer Learning for Performance Analysis of Highly-Configurable SoftwareTransfer Learning for Performance Analysis of Highly-Configurable Software
Transfer Learning for Performance Analysis of Highly-Configurable Software
 
Producer consumer-problems
Producer consumer-problemsProducer consumer-problems
Producer consumer-problems
 
Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...
Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...
Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...
 
GPUFish_technical_report
GPUFish_technical_reportGPUFish_technical_report
GPUFish_technical_report
 
Transfer Learning for Performance Analysis of Configurable Systems: A Causal ...
Transfer Learning for Performance Analysis of Configurable Systems:A Causal ...Transfer Learning for Performance Analysis of Configurable Systems:A Causal ...
Transfer Learning for Performance Analysis of Configurable Systems: A Causal ...
 
Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...
Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...
Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...
 
Taskonomy of Transfer Learning
Taskonomy  of Transfer LearningTaskonomy  of Transfer Learning
Taskonomy of Transfer Learning
 
Data clustering using map reduce
Data clustering using map reduceData clustering using map reduce
Data clustering using map reduce
 
genetic paper
genetic papergenetic paper
genetic paper
 
Oct.22nd.Presentation.Final
Oct.22nd.Presentation.FinalOct.22nd.Presentation.Final
Oct.22nd.Presentation.Final
 
1605.08695.pdf
1605.08695.pdf1605.08695.pdf
1605.08695.pdf
 
MULTIPROCESSOR SCHEDULING AND PERFORMANCE EVALUATION USING ELITIST NON DOMINA...
MULTIPROCESSOR SCHEDULING AND PERFORMANCE EVALUATION USING ELITIST NON DOMINA...MULTIPROCESSOR SCHEDULING AND PERFORMANCE EVALUATION USING ELITIST NON DOMINA...
MULTIPROCESSOR SCHEDULING AND PERFORMANCE EVALUATION USING ELITIST NON DOMINA...
 
Keynote at IWLS 2017
Keynote at IWLS 2017Keynote at IWLS 2017
Keynote at IWLS 2017
 
Ai_Project_report
Ai_Project_reportAi_Project_report
Ai_Project_report
 
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETSFAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
 
Machine learning (5)
Machine learning (5)Machine learning (5)
Machine learning (5)
 
Surrogate modeling for industrial design
Surrogate modeling for industrial designSurrogate modeling for industrial design
Surrogate modeling for industrial design
 
Rapport_Cemracs2012
Rapport_Cemracs2012Rapport_Cemracs2012
Rapport_Cemracs2012
 
Taking r to its limits. 70+ tips
Taking r to its limits. 70+ tipsTaking r to its limits. 70+ tips
Taking r to its limits. 70+ tips
 
Map-Reduce for Machine Learning on Multicore
Map-Reduce for Machine Learning on MulticoreMap-Reduce for Machine Learning on Multicore
Map-Reduce for Machine Learning on Multicore
 

Mehr von Pooyan Jamshidi

Learning LWF Chain Graphs: A Markov Blanket Discovery Approach
Learning LWF Chain Graphs: A Markov Blanket Discovery ApproachLearning LWF Chain Graphs: A Markov Blanket Discovery Approach
Learning LWF Chain Graphs: A Markov Blanket Discovery Approach
Pooyan Jamshidi
 
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...
Pooyan Jamshidi
 

Mehr von Pooyan Jamshidi (14)

Learning LWF Chain Graphs: A Markov Blanket Discovery Approach
Learning LWF Chain Graphs: A Markov Blanket Discovery ApproachLearning LWF Chain Graphs: A Markov Blanket Discovery Approach
Learning LWF Chain Graphs: A Markov Blanket Discovery Approach
 
A Framework for Robust Control of Uncertainty in Self-Adaptive Software Conn...
 A Framework for Robust Control of Uncertainty in Self-Adaptive Software Conn... A Framework for Robust Control of Uncertainty in Self-Adaptive Software Conn...
A Framework for Robust Control of Uncertainty in Self-Adaptive Software Conn...
 
Machine Learning Meets Quantitative Planning: Enabling Self-Adaptation in Aut...
Machine Learning Meets Quantitative Planning: Enabling Self-Adaptation in Aut...Machine Learning Meets Quantitative Planning: Enabling Self-Adaptation in Aut...
Machine Learning Meets Quantitative Planning: Enabling Self-Adaptation in Aut...
 
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...
 
Transfer Learning for Performance Analysis of Machine Learning Systems
Transfer Learning for Performance Analysis of Machine Learning SystemsTransfer Learning for Performance Analysis of Machine Learning Systems
Transfer Learning for Performance Analysis of Machine Learning Systems
 
Machine Learning meets DevOps
Machine Learning meets DevOpsMachine Learning meets DevOps
Machine Learning meets DevOps
 
Learning to Sample
Learning to SampleLearning to Sample
Learning to Sample
 
Integrated Model Discovery and Self-Adaptation of Robots
Integrated Model Discovery and Self-Adaptation of RobotsIntegrated Model Discovery and Self-Adaptation of Robots
Integrated Model Discovery and Self-Adaptation of Robots
 
Architectural Tradeoff in Learning-Based Software
Architectural Tradeoff in Learning-Based SoftwareArchitectural Tradeoff in Learning-Based Software
Architectural Tradeoff in Learning-Based Software
 
Production-Ready Machine Learning for the Software Architect
Production-Ready Machine Learning for the Software ArchitectProduction-Ready Machine Learning for the Software Architect
Production-Ready Machine Learning for the Software Architect
 
Architecting for Scale
Architecting for ScaleArchitecting for Scale
Architecting for Scale
 
Workload Patterns for Quality-driven Dynamic Cloud Service Configuration and...
Workload Patterns for Quality-driven Dynamic Cloud Service Configuration and...Workload Patterns for Quality-driven Dynamic Cloud Service Configuration and...
Workload Patterns for Quality-driven Dynamic Cloud Service Configuration and...
 
Fuzzy Control meets Software Engineering
Fuzzy Control meets Software EngineeringFuzzy Control meets Software Engineering
Fuzzy Control meets Software Engineering
 
Autonomic Resource Provisioning for Cloud-Based Software
Autonomic Resource Provisioning for Cloud-Based SoftwareAutonomic Resource Provisioning for Cloud-Based Software
Autonomic Resource Provisioning for Cloud-Based Software
 

Kürzlich hochgeladen

Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Medical / Health Care (+971588192166) Mifepristone and Misoprostol tablets 200mg
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
masabamasaba
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
masabamasaba
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
masabamasaba
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...

Transfer Learning for Improving Model Predictions in Highly Configurable Software

  • 2. My background and objectives
    • PhD (2010-13): Fuzzy control for auto-scaling in cloud
    • Postdoc 1 (2014): Fuzzy Q-Learning for online rule updates, MAPE-KE
    • Postdoc 2 (2015-16): Configuration optimization and performance modeling for big data systems (stream processing, MapReduce, NoSQL)
    • Now: Transfer learning for making performance-model predictions more accurate and more reliable for highly configurable software
    Objectives:
    • Opportunities for applying model learning at runtime
    • Collaborations
  • 3. Motivation
    1. Many different parameters => large state space, interactions
    2. Defaults are typically used => poor performance
  • 4. Motivation
    [Figure: histograms of observations over average read latency (µs) for (a) cass-20 and (b) cass-10, marking the best and worst configurations]
    Experiments on Apache Cassandra:
    - 6 parameters, 1,024 configurations
    - Average read latency
    - 10 million records (cass-10), 20 million records (cass-20)
    Look at these outliers! Large statistical dispersion.
  • 5. Problem-solution overview
    • The robotic software is a highly configurable system.
    • The robot's configuration influences its performance (as well as its energy usage).
    • Problem: many different parameters make the configuration space high-dimensional, so the influence of configuration on system performance is difficult to understand.
    • Solution: learn a black-box performance model from measurements of the robot: performance_value = f(configuration).
    • Challenge: measurements on the real robot are expensive (time-consuming, requiring human resources, and risky on failure).
    • Our contribution: perform most measurements on a simulator (Gazebo) and take only a few samples from the real robot, to learn a reliable and accurate performance model within a given experimental budget.
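The black-box view performance_value = f(configuration) can be sketched in a few lines. Everything below is a hypothetical stand-in: the two configuration parameters, the measure() response (playing the role of the expensive measurement process), and the 1-nearest-neighbour surrogate are illustrative only, not the paper's actual setup.

```python
import random

# Hypothetical response: performance of a robot component as a function of
# two made-up configuration parameters (particle count, update rate).
# This plays the role of the (expensive) measurement process.
def measure(config):
    particles, rate = config
    return 100.0 / particles + 0.5 * rate + random.gauss(0, 0.1)

random.seed(0)

# Collect observations over a small configuration grid (the "cheap source").
configs = [(p, r) for p in (10, 20, 40, 80) for r in (1, 2, 4)]
data = [(c, measure(c)) for c in configs]

# A minimal black-box surrogate: 1-nearest-neighbour over configurations.
def predict(config, data):
    nearest = min(data, key=lambda cy: (cy[0][0] - config[0]) ** 2 +
                                       (cy[0][1] - config[1]) ** 2)
    return nearest[1]

# Use the surrogate to pick the best configuration without new measurements.
best = min(configs, key=lambda c: predict(c, data))
```

In the paper's setting, measuring on the real robot is the expensive step; the transfer-learning idea is to take most such samples from a cheap source (the simulator) instead.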
  • 6. Overview of our approach
    [Diagram: measurements from the source (simulator, Gazebo) and the target (robot, TurtleBot) feed a data set from which an ML model is learned; the model predicts performance, and the predictions drive adaptation reasoning]
  • 7. Overview of our approach
    [Same diagram, highlighting the shared Data component]
  • 9. Goal
    • Configuration space: let Xi denote the i-th configuration parameter, ranging over a finite domain Dom(Xi). Xi may be an integer variable (e.g., level of parallelism), a categorical variable (e.g., messaging framework), or a Boolean variable (e.g., enabling a timeout). The configuration space is the Cartesian product X = Dom(X1) × ... × Dom(Xd).
    • Taking measurements: a black-box response function f : X -> R is used to build a performance model given some observations of the system performance under different settings. In practice, the observations are noisy, i.e., yi = f(xi) + εi where εi ~ N(0, σi).
    • Partially known: we only know the response function through the observations D = {(xi, yi)}, with |D| ≪ |X|.
    • Response function to learn: a mapping from the configuration space to a measurable, interval-scaled performance metric (e.g., response time, throughput, CPU utilization).
    • Learn model: the goal is a reliable regression model ˆf(·) that predicts the performance of the system, f(·), from a limited number of observations D, minimizing the prediction error over the configuration space:
        arg min_{x ∈ X}  pe = |ˆf(x) - f(x)|
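The learning goal, minimising the prediction error pe = |ˆf(x) - f(x)| over the configuration space, can be illustrated numerically. Both the "true" response f and the surrogate ˆf below are invented toy functions, not models from the paper.

```python
# The goal: a surrogate fhat minimising pe = |fhat(x) - f(x)| over X.
def f(x):
    # The true (normally unknown) response function.
    return (x - 3) ** 2 + 5

def fhat(x):
    # A surrogate learned from a few samples (here: a deliberately crude fit).
    return (x - 3.2) ** 2 + 5.1

X = range(0, 7)                             # a tiny discretised configuration space
pe = {x: abs(fhat(x) - f(x)) for x in X}    # prediction error per configuration
worst = max(pe, key=pe.get)                 # where the surrogate is least reliable
```

The per-configuration error map is exactly what a cost-aware sampling strategy would try to drive down everywhere, not just near already-measured points.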
  • 10. Cost model
    • Source observations: Ds = {(xs, ys) | ys = g(xs) + εs, xs ∈ X}, collected cheaply (e.g., on a simulator or other system versions); target observations: Dt = {(xt, yt) | yt = f(xt) + εt, xt ∈ X}, collected on the real system.
    • Assumption: the cost of each target observation is ct, with cs ≪ ct, and the source response function g(·) is related (correlated) to the target function f(·); this relatedness contributes to learning the target function from only sparse target samples, |Dt| ≪ |Ds|.
    • Total cost and budget: for choosing the number of source and target samples we use a simple cost model:
        C(Ds, Dt) = cs · |Ds| + ct · |Dt| + CTr(|Ds|, |Dt|),
        C(Ds, Dt) ≤ Cmax,
      where Cmax is the experimental budget and CTr(|Ds|, |Dt|) is the cost of model training, a function of the source and target sample sizes.
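The cost model is straightforward to state as code. The constants (cs = 1, ct = 50) and the linear training-cost term are illustrative assumptions chosen only to reflect cs ≪ ct; they are not values from the paper.

```python
def total_cost(n_source, n_target, c_s=1.0, c_t=50.0):
    """C(Ds, Dt) = c_s * |Ds| + c_t * |Dt| + C_Tr(|Ds|, |Dt|)."""
    training = 0.01 * (n_source + n_target)   # assumed linear training cost
    return c_s * n_source + c_t * n_target + training

def within_budget(n_source, n_target, c_max):
    """Check the budget constraint C(Ds, Dt) <= Cmax."""
    return total_cost(n_source, n_target) <= c_max

# Many cheap simulator samples plus a handful of robot samples:
cost = total_cost(100, 3)   # 100 + 150 + 1.03
```

Since cs ≪ ct, the budget is dominated by target measurements: here three robot samples already cost more than one hundred simulator samples.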
  • 11. Synthetic example
    [Figure: (a) source and target samples, (b) learning without transfer, (c) learning with transfer, (d) negative transfer. With transfer, a model based on only 3 target measurements already provides accurate predictions; with transfer from a misleading source, the model prediction becomes inaccurate.]
  • 12. Assumption: Source and target are correlated
    [Figure: four synthetic source/target response-function pairs illustrating the cases below]
    a) The source may be a shift of the target.
    b) The source may be a noisy version of the target.
    c) The source may differ from the target at some extreme positions in the configuration space.
    d) The source may be irrelevant to the target -> negative transfer will happen!
  • 13. Correlations
    [Figure: latency (ms) response surfaces over number of splitters and number of counters for (a) WordCount v1, (b) v2, (c) v3, (d) v4; latency distributions shift under hardware and software changes]
    Pearson correlation coefficients (upper triangle; p-values in the lower triangle):
           v1        v2     v3        v4
    v1     1         0.41   -0.46     -0.50
    v2     7.36E-06  1      -0.20     -0.18
    v3     6.92E-07  0.04   1         0.94
    v4     2.54E-08  0.07   1.16E-52  1
    Spearman correlation coefficients (same layout):
           v1        v2     v3        v4
    v1     1         0.49   -0.51     -0.51
    v2     5.50E-08  1      -0.2793   -0.24
    v3     1.30E-08  0.003  1         0.88
    v4     1.40E-08  0.01   8.30E-36  1
    Measurement noise across WordCount versions:
    ver.   µ         σ      µ/σ
    v1     516.59    7.96   64.88
    v2     584.94    2.58   226.32
    v3     654.89    13.56  48.30
    v4     1125.81   16.92  66.56
    - Different correlations
    - Different optimum configurations
    - Different noise levels
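The relatedness assumption can be checked empirically, e.g., via the Pearson correlation between source and target measurements of the same configurations (as done across the WordCount versions). The latency vectors below are invented for illustration; only the pearson() formula itself is standard.

```python
import math

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length vectors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented latencies of the same five configurations on two system versions:
source = [516.0, 530.0, 548.0, 560.0, 575.0]    # e.g., measured on v1
shifted = [s + 70.0 for s in source]            # a version that is a pure shift
noisy = [s + d for s, d in zip(source, (3.0, -4.0, 2.0, -1.0, 5.0))]  # a noisy version
```

A shifted source is perfectly correlated with the target (case a on the previous slide), a noisy source still correlates strongly (case b); a near-zero or negative correlation signals the risk of negative transfer (case d).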
  • 16. GP for modeling black box response function
[Figure: true function, GP mean, GP variance, observations, selected point, true minimum]
A GP model is composed of its prior mean (µ(·) : X → R) and a covariance function (k(·, ·) : X × X → R) [41]:
    y = f(x) ~ GP(µ(x), k(x, x′)),    (2)
where the covariance k(x, x′) defines the distance between x and x′. Let S1:t = {(x1:t, y1:t) | yi := f(xi)} be the collection of t experimental data (observations). We treat f(x) as a random variable, conditioned on the observations S1:t, which is normally distributed with the following posterior mean and variance functions [41]:
    µt(x) = µ(x) + k(x)⊺ (K + σ² I)⁻¹ (y − µ),    (3)
    σt²(x) = k(x, x) + σ² I − k(x)⊺ (K + σ² I)⁻¹ k(x),    (4)
where y := y1:t, k(x)⊺ = [k(x, x1) k(x, x2) … k(x, xt)], µ := µ(x1:t), K := [k(xi, xj)], and I is the identity matrix.
Motivations: 1- mean estimates + variance; 2- all computations are linear algebra; 3- good estimations when few data.
GP models have shown to be effective for performance predictions in data-scarce domains [20], but they become inaccurate when the samples do not cover the configuration space uniformly. For transfer learning, the key question is how to make accurate predictions for the target using observations Ds from other, cheaper sources (e.g., a simulator). A kernel computes a dot product (a measure of "similarity") between two input configurations; to also capture the relationship between the source and target functions g, f, we define the kernel:
    k(f, g, x, x′) = kt(f, g) × kxx(x, x′),    (8)
This kernel is the essence of transfer learning: the additional knowledge from related sources yields more accurate predictions, and it lets the GP model (the K component of a MAPE-K loop) be updated not only when a new runtime observation arrives but also by transferring learning from other related sources, accelerating the learning process at each adaptation cycle.
  • 17. The case where we learn from correlated responses
(a) 3 sample response functions (1), (2), (3) over the configuration domain, with observations
(b) GP fit for (1) ignoring the observations for (2), (3): LCB not informative
(c) multi-task GP fit for (1) by transfer learning from (2), (3): highly informative
[Legend: GP prediction mean, GP prediction variance, probability distribution of the minimizers]
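The multi-task fit in panel (c) relies on the combined kernel k(f, g, x, x′) = kt(f, g) × kxx(x, x′). A minimal sketch of assembling that joint covariance for one source task and one target task (the 2×2 task-similarity matrix and its correlation value are made up for illustration):

```python
import numpy as np

def input_kernel(a, b, length=1.0):
    """Configuration-similarity kernel kxx(x, x')."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def multitask_covariance(x_src, x_tgt, task_corr):
    """Joint covariance: entry [(t, x), (t', x')] = kt(t, t') * kxx(x, x')."""
    # kt(f, g): 1.0 within a task, task_corr across tasks.
    K_task = np.array([[1.0, task_corr],
                       [task_corr, 1.0]])
    x_all = np.concatenate([x_src, x_tgt])
    tasks = np.array([0] * len(x_src) + [1] * len(x_tgt))
    K_x = input_kernel(x_all, x_all)
    return K_task[tasks[:, None], tasks[None, :]] * K_x

K = multitask_covariance(np.array([0.0, 1.0]), np.array([0.5]), task_corr=0.9)
# cross-task entries are damped by the task correlation, so source
# observations inform the target in proportion to their relatedness
```

Plugging this joint covariance into the standard GP posterior is what turns the uninformative fit in (b) into the informative one in (c).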
  • 20. Performance prediction for CoBot
[Heatmaps of CPU usage [%]]
(a) Source response function (cheap)
(b) Target response function (expensive)
  • 21. Performance prediction for CoBot
[Heatmaps of CPU usage [%]]
(a) Source response function (cheap)
(b) Target response function (expensive)
(c) Prediction without transfer learning
(d) Prediction with transfer learning
  • 22. Prediction accuracy and model reliability
[Contour plots (a)-(d) over the numbers of source and target samples]
(a) Prediction error: trade-off between using more source or more target samples
(b) Model reliability: transfer learning contributes to lower prediction uncertainty
  • 23. Prediction accuracy and model reliability
[Contour plots (a)-(d) over the numbers of source and target samples]
(a) Prediction error: trade-off between using more source or more target samples
(b) Model reliability: transfer learning contributes to lower prediction uncertainty
(c) Training overhead and (d) evaluation overhead: appropriate for runtime usage
  • 24. Level of correlation between source and target is important
    source        s      s1     s2     s3     s4     s5     s6
    noise level   0      5      10     15     20     25     30
    corr. coeff.  0.98   0.95   0.89   0.75   0.54   0.34   0.19
    µ(pe)         15.34  14.14  17.09  18.71  33.06  40.93  46.75
• The model becomes more accurate when the source is more related to the target
• Even learning from a source with a small correlation is better than no transfer
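The trend in the table can be reproduced with a toy experiment: corrupt a copy of the target response with increasing noise and measure how much correlation survives (the response function and noise scales here are made up for illustration, not the paper's data):

```python
import numpy as np

def corr_with_noisy_source(noise_sd, seed=0):
    """Correlation between a target response and a noise-corrupted source copy."""
    rng = np.random.default_rng(seed)
    target = np.sin(np.linspace(0.0, 3.0, 500))          # toy target response
    source = target + rng.normal(0.0, noise_sd, size=target.shape)
    return np.corrcoef(source, target)[0, 1]

clean = corr_with_noisy_source(0.0)   # noise-free source: correlation 1.0
noisy = corr_with_noisy_source(2.0)   # heavily corrupted source
# correlation decays with the noise level, mirroring the s .. s6 columns
```

As in the table, higher noise means lower correlation, and the paper's measured prediction error µ(pe) grows accordingly.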
  • 25. [Heatmaps (a)-(f): CPU usage [%] over number of particles × number of refinements]
  • 26. Prediction accuracy
• The model provides more accurate predictions as we exploit more data
• Transfer learning may (i) boost initial performance, (ii) increase the learning speed, and (iii) lead to more accurate performance in the end
[Figure: Three ways in which transfer might improve learning — higher start, higher slope, higher asymptote (Lisa Torrey and Jude Shavlik). In much of the work on transfer learning, a human provides the mapping between tasks, but some methods perform the mapping automatically.]
  • 27. Experimental evaluations • RQ1: How much does transfer learning improve the prediction accuracy? • RQ2: What are the trade-offs between learning with different numbers of samples from source and target in terms of prediction accuracy? • RQ3: Is our transfer learning approach applicable in the context of self-adaptive systems in terms of model training and evaluation time? 27
  • 29. Stream Processing Systems
Storm Architecture — Logical view: spouts and bolts connected by pipes; Physical view: workers with in/out queues connected by sockets
Word Count Architecture (CPU intensive): File → Stream to Kafka → Kafka Topic → Kafka Spout (sentence) → Splitter Bolt (word) → Counter Bolt, e.g. [paintings, 3] [poems, 60] [letter, 75]
Rolling Sort Architecture (memory intensive): Twitter Stream (tweet) → Twitter to Kafka → Kafka Topic → Kafka Spout (hashtags) → RollingCount Bolt (hashtag, count) → Intermediate Ranking Bolt (ranking) → Ranking Bolt (trending topics)
Applications: fraud detection, trending topics
  • 30. [Contour plots (a)-(f) of prediction error for CoBot, WC, SOL, RS, Cass (hdw), Cass (db size)]
Cost model over the numbers of source and target samples:
    Ω(Ds, Dt) = cs · |Ds| + ct · |Dt| + CTr(|Ds|, |Dt|),    (2)
    Ω(Ds, Dt) ≤ Cmax,    (3)
where Cmax is the experimental budget and CTr(|Ds|, |Dt|) is the cost of model training, a function of the source and target sample sizes. This cost model turns the problem formulated in Eq. (1) into a multi-objective one, where not only accuracy but also cost is considered in the transfer learning process.
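The cost model in Eqs. (2)-(3) is a simple budgeted sum. A minimal sketch, where the per-sample costs and the linear form of the training cost are illustrative assumptions (the paper leaves CTr abstract):

```python
def measurement_cost(n_source, n_target, c_s=0.1, c_t=10.0, c_train=0.01):
    """Eq. (2): cs*|Ds| + ct*|Dt| + C_Tr(|Ds|, |Dt|).
    Training cost is modeled here as linear in the total sample count
    (an assumption; the paper only requires it to depend on both sizes)."""
    return c_s * n_source + c_t * n_target + c_train * (n_source + n_target)

def within_budget(n_source, n_target, budget):
    """Eq. (3): the experimental budget constraint."""
    return measurement_cost(n_source, n_target) <= budget

# With target (real robot) samples ~100x costlier than source (simulator)
# samples, a budget admits many simulator runs but only a few robot runs.
cheap_heavy = measurement_cost(100, 2)   # mostly simulator
robot_heavy = measurement_cost(0, 10)    # robot only
```

The contour plots above trace exactly this trade-off: how prediction error moves as the budget is split between source and target samples.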
  • 31. Trade-off between measurement cost and accuracy
[Plot: prediction error vs. measurement cost ($), marking the accuracy threshold, the budget threshold, and the sweet spot between them]
  • 32. Model training and evaluation overhead
(a) Model training time and (b) model evaluation time (log scale) vs. percentage of training data from the source, for WC, RS, SOL, and cass
  • 34. What if multiple related sources?
(a) Several sources (source 1, source 2, source 3) with different correlations (corr_1, corr_2, corr_3) to the target
(b) Select the most related source to transfer from
  • 35. Example of multiple sources: transfers (1)-(5) among Source Simulator, Target Simulator, Source Robot, and Target Robot
  • 38. Future work: Integrating transfer learning with MAPE-K
Runtime (Autonomic Manager over the Highly Configurable System, real system AS):
  R.1 Performance Monitoring → R.2 Data Analysis → R.3 Model Update (K: GP model + performance repository) → R.4 Select the most promising configuration to test → R.5 Update System Configuration
Design-time (Simulator): D.1 Change config. → D.2 Perform sim. → D.3 Extract data → simulation data repository → R.3.1 Transfer Learning into the runtime GP model
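The R.1-R.5 cycle above maps directly onto MAPE-K. A minimal sketch of that loop in code; the component names follow the figure, but the class, the fake system, and all method bodies are illustrative stubs, not the paper's implementation:

```python
class MapeKLoop:
    """Monitor -> Analyze -> Plan -> Execute around a Knowledge component."""
    def __init__(self, knowledge):
        self.knowledge = knowledge              # K: performance model (here a dict)

    def monitor(self, system):                  # R.1: pull performance metrics
        return system.read_metrics()

    def analyze(self, metrics):                 # R.2: aggregate mean performance
        return sum(metrics) / len(metrics)

    def update_model(self, config, perf):       # R.3: model update; transferred
        self.knowledge[config] = perf           # samples would enter here (R.3.1)

    def plan(self):                             # R.4: most promising configuration
        return max(self.knowledge, key=self.knowledge.get)

    def execute(self, system, config):          # R.5: enact the new configuration
        system.apply(config)

class FakeSystem:
    """Stand-in for the managed system, for illustration only."""
    def __init__(self):
        self.config = "a"
    def read_metrics(self):
        return [1.0, 3.0]
    def apply(self, config):
        self.config = config

loop = MapeKLoop(knowledge={"a": 2.0, "b": 5.0})
sys_ = FakeSystem()
perf = loop.analyze(loop.monitor(sys_))
loop.update_model(sys_.config, perf)
loop.execute(sys_, loop.plan())
```

In the proposed integration, the simulator-side D.1-D.3 pipeline would feed extra observations into `update_model`, so the knowledge improves between adaptation cycles without costing real-system measurements.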