The document provides an introduction to Gaussian processes. It explains that Gaussian processes can model almost any function directly and provide uncertainty estimates for their predictions. It demonstrates how two random variables can be jointly distributed as a multivariate Gaussian, and how the conditional distribution of one variable given the other can be derived from the joint distribution. Gaussian processes use these properties to perform nonparametric machine learning, modelling relationships between variables without assuming a specific functional form.
14. Parametric ML vs. nonparametric ML
A learning model that summarizes data with a set of parameters of fixed size (independent of the number of training examples) is called a parametric model, e.g. linear regression y = θ0 + θ1 x.
Algorithms that do not make strong assumptions about the form of the mapping function are called nonparametric machine learning algorithms.
15. Parametric ML vs. nonparametric ML (same definitions as above)
Question: is K-nearest neighbours a parametric or a nonparametric algorithm according to these definitions?
22. The normal distribution: p(x) = (1 / (σ√(2π))) e^(-(x-µ)^2 / (2σ^2)), with mean (average coordinate) µ and standard deviation σ from the centre. Many important processes follow the normal distribution.
23. In shorthand, the same distribution is written N(µ, σ^2).
24. For example, X1 ~ N(µ1, σ1^2): a normal distribution with mean µ1 and standard deviation σ1.
25. X1 ~ N(µ1, σ1^2). What if I draw another distribution?
48. Two jointly Gaussian variables:
[x1; x2] ~ N([0; 0], [[1, 0], [0, 1]]): no similarity (no correlation). A positive value of X1 does not tell much about X2.
[x1; x2] ~ N([0; 0], [[1, 0.5], [0.5, 1]]): some similarity (correlation). A positive value of X1 with good probability means a positive X2.
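To make the correlation picture concrete, here is a small NumPy sketch (not from the slides; the variable names are mine) that draws many samples from both joint distributions and compares their empirical correlations:

```python
import numpy as np

rng = np.random.default_rng(0)

# Uncorrelated pair: knowing x1 tells us nothing about x2.
no_corr = rng.multivariate_normal([0, 0], [[1, 0.0], [0.0, 1]], size=5000)
# Correlated pair: a positive x1 makes a positive x2 more likely.
corr = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=5000)

# Empirical correlation coefficients of the two sample clouds.
r_no = np.corrcoef(no_corr.T)[0, 1]   # close to 0
r_yes = np.corrcoef(corr.T)[0, 1]     # close to 0.5
```

Scatter-plotting the two sample clouds reproduces the circular vs. elongated shapes described on the slide.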
79. 20D Gaussian. Let's add more dependency between points:
(x1, ..., x20) ~ N([0, 0, ..., 0]^T, [[1, 0.5, 0.5, ..., 0.5], [0.5, 1, 0.5, ..., 0.5], ..., [0.5, 0.5, 0.5, ..., 1]])
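A quick NumPy sketch of this constant-0.5 covariance (an illustration of mine, not code from the slides): every pair of coordinates is equally dependent, so a sample plotted against its coordinate index still looks jagged.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20

# Covariance: 1 on the diagonal, 0.5 everywhere else.
cov = np.full((n, n), 0.5) + 0.5 * np.eye(n)

# One draw from the 20D Gaussian; plot `sample` vs. index 1..20
# to see that equal pairwise dependency gives no smoothness.
sample = rng.multivariate_normal(np.zeros(n), cov)
```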
80. 20D Gaussian, same distribution. One sample: (0.73, 0.18, 0.68, -0.2, ..., 16 more), plotted against the coordinate index (1st, 2nd, 3rd, ...).
81. (same plot) We want some notion of smoothness between points…
82. (same plot) We want some notion of smoothness between points… so that the dependency between the 1st and 2nd points is larger than between the 1st and the 3rd.
83. We might have just increased the corresponding values in the covariance matrix, right?
84. We need a way to generate a "smooth" covariance matrix automatically, depending on the distance between points.
85. 20D Gaussian. We will use a similarity measure:
K_ij = e^(-||z_i - z_j||^2), which tends to 0 as ||z_i - z_j|| → ∞ and equals 1 when z_i = z_j.
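The similarity measure turns into a covariance matrix in a few lines of NumPy. This is an illustrative sketch (the `kernel` helper and the input grid are my own choices): nearby inputs get higher similarity, and samples drawn with this covariance vary smoothly.

```python
import numpy as np

def kernel(z):
    """Squared-exponential similarity: K_ij = exp(-||z_i - z_j||^2)."""
    d = z[:, None] - z[None, :]      # pairwise differences
    return np.exp(-d ** 2)

z = np.linspace(0, 4, 20)            # 20 input locations
K = kernel(z)                        # 20x20 covariance matrix

# Diagonal entries are 1 (z_i = z_i); similarity decays with distance.
rng = np.random.default_rng(2)
smooth_sample = rng.multivariate_normal(np.zeros(20), K + 1e-9 * np.eye(20))
```

The tiny `1e-9` jitter on the diagonal is a standard numerical-stability trick; plotting `smooth_sample` against `z` gives a smooth curve, unlike the constant-0.5 covariance.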
86. 20D Gaussian with the similarity measure as covariance:
(x1, ..., x20) ~ N([0, ..., 0]^T, [[K11, K12, K13, ..., K1,20], [K21, K22, K23, ..., K2,20], ..., [K20,1, K20,2, K20,3, ..., K20,20]])
The sampled points (1st, 2nd, 3rd, ...) are again plotted against their index.
87. The same construction in 200D: a 200-dimensional Gaussian with mean 0 and covariance [[K11, K12, K13, ..., K1,200], [K21, K22, K23, ..., K2,200], ..., [K200,1, K200,2, K200,3, ..., K200,200]], with K_ij = e^(-||z_i - z_j||^2).
88. (and 89-91) Further sample draws from the same 200D Gaussian, each tracing out a different smooth curve.
92. (and 93) The same 200D Gaussian; the plots now also mark the pointwise mean µ* and standard deviation σ* of the draws.
94. We are interested in modelling F(z) for a given Z: values f1, f2, f3 at inputs z1, z2, z3, such that f2 is more correlated with f1 than with f3.
95. Previously we were using
[f1; f2] ~ N([0; 0], [[1, 0.5], [0.5, 1]])
to generate correlated points. Can we do it again here?
96. Wait! But now we have three points; we cannot use the same formula!
97. Ok… What about now?
[f1; f2; f3] ~ N([0; 0; 0], [[1, 0.5, 0.5], [0.5, 1, 0.5], [0.5, 0.5, 1]])
98. Wait, did he just say that f2 should be more correlated with f1 than with f3?
99. Arrrr….
100. Better now?
[f1; f2; f3] ~ N([0; 0; 0], [[1, 0.7, 0.2], [0.7, 1, 0.5], [0.2, 0.5, 1]])
101. Yes, but what if we want to obtain this matrix automatically, based on how close the points are in Z?
102. We will use the similarity measure again:
K_ij = e^(-||z_i - z_j||^2), which tends to 0 as ||z_i - z_j|| → ∞ and equals 1 when z_i = z_j.
103. So now, it will become:
[f1; f2; f3] ~ N([0; 0; 0], [[K11, K12, K13], [K21, K22, K23], [K31, K32, K33]])
104. What is f*? Given: {(f1, z1); (f2, z2); (f3, z3)}, and the test input z* is also given.
105. (and 106) The same question, with f* highlighted on the plot of F(z).
107. Ok, so we have just modelled f:
[f1; f2; f3] ~ N([0; 0; 0], [[K11, K12, K13], [K21, K22, K23], [K31, K32, K33]])
108. Which is the same as saying: f ~ N(0, K).
109. But how do we model f*?
110. Well, probably again some kinda normal…
111. Maybe something like: f* ~ N(0, ?)
112. (recap) f ~ N(0, K); maybe f* ~ N(0, ?).
113. But what is this "?" — the covariance matrix of z* with z*?
114. f* ~ N(0, K**)
115. But isn't K** just 1? K** = e^(-||z* - z*||^2) = 1.
116. Ok, so we have just modelled f: f ~ N(0, K) and f* ~ N(0, K**).
117. What else is left?
118. The joint distribution of f and f*:
[f; f*] ~ N([0; 0], [[K, K*], [K*^T, K**]])
where K is the 3×3 matrix [[K11, K12, K13], [K21, K22, K23], [K31, K32, K33]], K* is the column [K1*; K2*; K3*], K*^T is the row [K*1, K*2, K*3], and K** is the 1×1 block.
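A sketch of assembling this block joint covariance in NumPy (the helper `k` and the example inputs are illustrative assumptions, not values from the slides):

```python
import numpy as np

def k(a, b):
    """Similarity measure K(a, b) = exp(-||a - b||^2)."""
    return np.exp(-(a - b) ** 2)

z = np.array([1.0, 2.0, 3.0])           # training inputs z1, z2, z3
z_star = 1.5                             # test input

K = k(z[:, None], z[None, :])            # 3x3 block K
K_star = k(z, z_star)[:, None]           # 3x1 block [K1*; K2*; K3*]
K_ss = np.array([[k(z_star, z_star)]])   # 1x1 block K** (always 1 here)

# Assemble the full 4x4 joint covariance of [f; f*].
joint = np.block([[K, K_star], [K_star.T, K_ss]])
```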
119. In that joint covariance, the upper-left 3×3 block is just K and the lower-right block is K^(-1)… no: the lower-right block is K**, and the upper-left block is K itself.
120. Only one entity is left: K1* = K(z1, z*).
121. I guess we know how to calculate this one!
K_ij = e^(-||z_i - z_j||^2)
122. So every block K_i* and K*_i is computable. Yeah! We did it!
123. Wait… but what do we do now?
124. Remember….
125. Joint distribution of variables x1 and x2:
[x1; x2] ~ N([µ1; µ2], [[Σ11, Σ12], [Σ21, Σ22]]), with joint density P(x1, x2).
The conditional distribution is also Gaussian:
P(x1|x2) = N(x1 | µ_(1|2), Σ_(1|2))
µ_(1|2) = µ1 + Σ12 Σ22^(-1) (x2 - µ2)
Σ_(1|2) = Σ11 - Σ12 Σ22^(-1) Σ21
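In the scalar (bivariate) case these conditional formulas reduce to a couple of lines. A sketch with illustrative numbers of my own, not values from the slides:

```python
# Joint parameters of [x1; x2] (illustrative values).
mu1, mu2 = 0.0, 0.0
s11, s12, s22 = 1.0, 0.5, 1.0   # Sigma_11, Sigma_12, Sigma_22

def conditional(x2):
    """Mean and variance of x1 | x2 for a bivariate Gaussian."""
    mu_c = mu1 + s12 / s22 * (x2 - mu2)    # mu_(1|2)
    var_c = s11 - s12 / s22 * s12           # Sigma_(1|2)
    return mu_c, var_c

mu_c, var_c = conditional(x2=1.0)
# Observing x2 = 1 shifts x1's mean to 0.5 and shrinks its variance to 0.75.
```

Note that the conditional variance is always smaller than the prior variance Σ11: observing x2 reduces our uncertainty about x1.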
126. What if we substitute x1 with f* and x2 with f?
127. Then we can compute the mean and standard deviation of f*!
128. Exactly!
129. Substituting into the conditional formulas gives the posterior mean:
µ* = µ(z*) + K*^T K^(-1) (f - µ_f)
130. (the plot now marks the posterior mean µ* at z*)
131. …and the posterior covariance:
Σ* = K** - K*^T K^(-1) K*
132. (the plot marks both µ* and the uncertainty Σ* at z*)
133. (and 134-135) Repeating the computation for other test inputs z* traces out the posterior mean µ*(z*) with an uncertainty band across the whole input range.
136. Summary. Given {(f1, z1); (f2, z2); (f3, z3)} and a test input z*, the prediction f* has:
µ* = µ(z*) + K*^T K^(-1) (f - µ_f)
Σ* = K** - K*^T K^(-1) K*
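The whole recipe above fits in a short NumPy sketch. The training values, the test grid, and the `kern` helper are illustrative assumptions of mine (zero prior mean, so µ(z*) = 0 and µ_f = 0):

```python
import numpy as np

def kern(a, b):
    """K_ij = exp(-||a_i - b_j||^2), as on the slides."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2)

# Training data {(f_i, z_i)} and a grid of test inputs z*.
z = np.array([1.0, 2.0, 3.0])
f = np.array([0.5, -0.2, 0.8])
z_star = np.linspace(0.0, 4.0, 9)

K = kern(z, z) + 1e-9 * np.eye(len(z))   # jitter for numerical stability
K_star = kern(z, z_star)                  # blocks K_i*
K_ss = kern(z_star, z_star)               # block K**

# Posterior: mu* = K*^T K^-1 f,  Sigma* = K** - K*^T K^-1 K*.
K_inv = np.linalg.inv(K)
mu_star = K_star.T @ K_inv @ f
cov_star = K_ss - K_star.T @ K_inv @ K_star
std_star = np.sqrt(np.clip(np.diag(cov_star), 0.0, None))
```

At a training input the posterior mean reproduces the observed value and the standard deviation collapses towards zero; far from the data the mean falls back to the prior and the uncertainty grows. (In practice one would solve with a Cholesky factorization rather than forming `K_inv` explicitly.)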
137. Pros:
1. Can model almost any function directly
2. Can be made more flexible with different kernels
3. Provides uncertainty estimates
Cons:
1. Hard to interpret
2. Loses efficiency in high-dimensional spaces
3. Prone to overfitting
138. Cat or Dog?
“It’s always seemed obvious to me that it’s better to know that
you don’t know, than to think you know and act on wrong
information.”
Katherine Bailey
140. Resources:
Katherine Bailey's presentation: http://katbailey.github.io/gp_talk/Gaussian_Processes.pdf
Katherine Bailey's blog post: "From both sides now: the math of linear regression" (http://katbailey.github.io/post/from-both-sides-now-the-math-of-linear-regression/)
Katherine Bailey's blog post: "Gaussian processes for dummies" (http://katbailey.github.io/post/gaussian-processes-for-dummies/)
Kevin P. Murphy's book: Machine Learning: A Probabilistic Perspective, Chapter 15 (https://www.amazon.com/Machine-Learning-Probabilistic-Perspective-Computation/dp/0262018020)
Alex Bridgland's blog post: "Introduction to Gaussian Processes - Part I" (http://bridg.land/posts/gaussian-processes-1)
Nando de Freitas: "Machine Learning - Introduction to Gaussian Processes" (https://youtu.be/4vGiHC35j9s)