Presentation material for a reading club on Pattern Recognition and Machine Learning by Bishop.
The sections cover:
- K-means clustering and its application to image compression
- Introduction of latent variables
- Mixtures of Gaussians and their update via the EM algorithm
Today's topics
1. K-means Clustering
   1. Clustering Problem
   2. K-means Clustering
   3. Application for Image Compression
2. Mixtures of Gaussians
   1. Introduction of latent variables
   2. Problem of ML estimates
   3. EM-algorithm for Mixtures of Gaussians

July 16, 2014
PRML 9.1-9.2
Shinichi TAMURA
Clustering Problem
An unsupervised machine learning problem: divide the data into groups (= clusters) such that
- similar data → same group
- dissimilar data → different group

Minimize Σ_{n=1}^N ||x_n − µ_{k(n)}||², where µ_{k(n)} is the center of the cluster to which x_n is assigned.
Clustering Problem
Given a data set X = {x_1, ..., x_N} and the number of clusters K, let µ_k be the cluster representatives and r_nk ∈ {0, 1} the assignment indicators (r_nk = 1 if x_n ∈ C_k).

Minimize J = Σ_{n=1}^N Σ_{k=1}^K r_nk ||x_n − µ_k||².

Here, J is called the "distortion measure".
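The distortion measure J can be sketched in a few lines of numpy (the function name and array shapes are my own illustration, not from the slides):

```python
import numpy as np

def distortion(X, mu, r):
    """J = sum_n sum_k r_nk * ||x_n - mu_k||^2.

    X: (N, D) data, mu: (K, D) cluster centers, r: (N, K) one-hot assignments.
    """
    diffs = X[:, None, :] - mu[None, :, :]   # (N, K, D) pairwise differences
    sq_dists = np.sum(diffs ** 2, axis=-1)   # (N, K) squared distances
    return np.sum(r * sq_dists)
```

For example, two 1-D points 0 and 2 both assigned to a center at 1 give J = 1 + 1 = 2.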
K-means Clustering
How do we solve this? µ_k and r_nk depend on each other, so there is no closed-form solution.
→ Use an iterative algorithm!
Strategy: since µ_k and r_nk cannot be updated simultaneously, update them one at a time.
Update of r_nk (assignment)
Since each r_nk can be determined independently, J is minimized by assigning each x_n to the nearest µ_k. Therefore,

r_nk = 1 if k = arg min_j ||x_n − µ_j||², 0 otherwise.
Update of µ_k (parameter estimation)
The optimal µ_k is obtained by setting the derivative to zero:

∂/∂µ_k Σ_{n=1}^N Σ_{k'=1}^K r_nk' ||x_n − µ_k'||² = 0
⇔ 2 Σ_{n=1}^N r_nk (x_n − µ_k) = 0
∴ µ_k = Σ_{n=1}^N r_nk x_n / Σ_{n=1}^N r_nk = (1/N_k) Σ_{x_n ∈ C_k} x_n.

µ_k is the mean of the cluster, so the cost function J corresponds to the sum of within-class variances!
K-means algorithm
1. Initialize µ_k, r_nk
2. Repeat the following two steps until convergence:
   i) Assign each x_n to the closest µ_k (E step)
   ii) Update each µ_k to the mean of its cluster (M step)
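The two-step loop above can be sketched as follows (a minimal numpy implementation; the function name, the random-point initialization, and the convergence test are my own choices, not from the slides):

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Initialize mu_k with K distinct data points
    mu = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iter):
        # E step: assign each x_n to the closest mu_k
        d = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
        k = d.argmin(axis=1)
        # M step: update each mu_k to the mean of its cluster
        new_mu = np.array([X[k == j].mean(axis=0) if np.any(k == j) else mu[j]
                           for j in range(K)])
        if np.allclose(new_mu, mu):   # means stopped moving: converged
            break
        mu = new_mu
    return mu, k
```

On two well-separated 2-D blobs this recovers the blob means and assignments after a few iterations.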
Convergence property
Neither step ever increases J, so each iteration gives a result at least as good as the last.
Since the set of possible assignments r_nk is finite, the algorithm converges after a finite number of iterations.
Demo of algorithm
(Figures: the E step and M step alternate, reassigning points and moving the means until convergence.)
Computational cost
E step ... compare every data point x_n with every cluster mean µ_k → O(KN). Not good; it can be improved with kd-trees, the triangle inequality, etc.
M step ... compute the mean of every cluster → O(N)
Here, two variations will be introduced:
1. On-line version
2. General dissimilarity
[Variation] 1. On-line version
For the case where data points are observed one at a time, apply the Robbins-Monro algorithm:

µ_k^new = µ_k^old + η_n (x_n − µ_k^old),

where η_n is the learning rate, decreased with each iteration.
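A minimal sketch of one on-line update step, assuming the nearest center is found first; the schedule η_t = 1/t is one common decreasing choice, not specified on the slide:

```python
import numpy as np

def online_kmeans_step(mu, x, t):
    # Assign the new point x to its nearest center mu_k
    k = np.argmin(((mu - x) ** 2).sum(axis=1))
    eta = 1.0 / t                        # learning rate, decreasing in t
    # Robbins-Monro update: mu_k^new = mu_k^old + eta * (x - mu_k^old)
    mu[k] = mu[k] + eta * (x - mu[k])
    return mu
```

With η_t = 1/t this update computes exactly the running mean of the points assigned to cluster k so far.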
[Variation] 2. General dissimilarity
Euclidean distance is neither
- appropriate for categorical data, etc., nor
- robust to outliers.
→ Use a general dissimilarity measure V(x, x').
E step ... no difference.
M step ... J is no longer guaranteed to be easy to minimize.
[Variation] 2. General dissimilarity
To make the M step easy, restrict µ_k to vectors chosen from {x_n}.
→ A solution can be obtained with a finite number of comparisons:

µ_k = arg min_{x_n ∈ C_k} Σ_{x_n' ∈ C_k} V(x_n, x_n').
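The restricted M step (the K-medoids idea) can be sketched as a brute-force search over the cluster's own points; `medoid` and `V` are illustrative names, with V any pairwise dissimilarity function:

```python
import numpy as np

def medoid(cluster, V):
    # mu_k = argmin over candidate points x in the cluster of
    # the total dissimilarity sum_{x' in C_k} V(x, x')
    costs = [sum(V(x, xp) for xp in cluster) for x in cluster]
    return cluster[int(np.argmin(costs))]
```

For the 1-D points {0, 1, 10} with V = absolute difference, the totals are 11, 10, and 19, so the medoid is the point 1.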
Application for Image Compression
The K-means algorithm can be applied to image compression and segmentation.

Basic idea: treat similar pixels as the same one. Each pixel is replaced by its cluster center (the palette / code-book vector); this is so-called "vector quantization".
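The vector-quantization step can be sketched as follows, assuming a palette has already been found (e.g. by running K-means on the pixel values); the function name is my own:

```python
import numpy as np

def quantize(pixels, palette):
    # pixels: (N, 3) array of RGB vectors, palette: (K, 3) code-book vectors.
    # Each pixel is replaced by its nearest palette colour.
    d = ((pixels[:, None, :] - palette[None, :, :]) ** 2).sum(-1)
    codes = d.argmin(axis=1)        # per-pixel indices into the palette
    return palette[codes], codes    # reconstructed pixels + stored codes
```

Only the K palette colours plus the per-pixel indices need to be stored, which is what gives the compression.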
Demo
(Figures: the original image and its compressed versions for several values of K.)
Compression rate
Original image ... 24N bits (N = # of pixels)
Compressed image ... 24K + N⌈log₂K⌉ bits (K = # of palette colours)
≈ 16.7% of the original for N ≈ 1M, K = 10 (4 index bits per pixel).
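The arithmetic above can be checked directly (assuming 24-bit RGB pixels and ⌈log₂K⌉ bits per pixel index):

```python
import math

def compression_ratio(n_pixels, K):
    original = 24 * n_pixels                                  # 24 bits per RGB pixel
    compressed = 24 * K + n_pixels * math.ceil(math.log2(K))  # palette + indices
    return compressed / original

ratio = compression_ratio(1_000_000, 10)   # about 0.167, i.e. ~16.7%
```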
Introduction of Latent Variables
In K-means, all assignments are equal, "all or nothing": a point near a cluster boundary is treated the same as one near the center. Is this "hard" assignment appropriate?
→ We want to introduce "soft" (probabilistic) assignments.
Introduction of Latent Variables
Introduce a random variable z with a 1-of-K representation, controlling the unobserved "state". Once the state is determined, x is drawn from the Gaussian of that state:

p(x | z_k = 1) = N(x | µ_k, Σ_k).

(Graphical representation: z → x.)
Introduction of Latent Variables
Here the distribution over x is

p(x) = Σ_z p(z) p(x|z)
     = Σ_{k=1}^K p(z_k = 1) p(x | z_k = 1)    (since z has a 1-of-K representation)
     = Σ_{k=1}^K π_k N(x | µ_k, Σ_k).

Gaussian mixtures!
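The latent-variable view gives a direct sampling recipe (ancestral sampling): first draw the state from the prior π, then draw x from that state's Gaussian. A 1-D sketch with illustrative parameter values:

```python
import numpy as np

def sample_gmm(pi, mus, sigmas, n, seed=0):
    rng = np.random.default_rng(seed)
    z = rng.choice(len(pi), size=n, p=pi)   # z ~ p(z), 1-of-K state
    x = rng.normal(mus[z], sigmas[z])       # x ~ p(x | z_k = 1) = N(mu_k, sigma_k)
    return x, z

x, z = sample_gmm(np.array([0.3, 0.7]), np.array([-5.0, 5.0]),
                  np.array([1.0, 1.0]), 1000)
```

Marginalizing out z is exactly what produces the mixture density Σ_k π_k N(x | µ_k, Σ_k) on the slide.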
Introduction of Latent Variables
Estimate (or "explain") which state x came from:

γ(z_k) ≡ p(z_k = 1 | x)                                                (posterior)
       = p(z_k = 1) p(x | z_k = 1) / Σ_j p(z_j = 1) p(x | z_j = 1)     (prior × likelihood)
       = π_k N(x | µ_k, Σ_k) / Σ_j π_j N(x | µ_j, Σ_j).

This value is also called the "responsibility".
Example of Gaussian mixtures
(Figures: (a) samples with no state information, (b) coloured by true state, (c) coloured by responsibility.)
Problems of ML estimates
ML estimation for mixtures of Gaussians has two problems:
i. Presence of singularities
ii. Identifiability
i) Presence of singularities
What if a mean collides with a data point, i.e. ∃j, m: µ_j = x_m? The likelihood can be made arbitrarily large by taking σ_j → 0:

L ∝ ( 1/σ_j + Σ_{k≠j} p_{k,m} ) Π_{n≠m} ( (1/σ_j) exp(−(x_n − µ_j)²/2σ_j²) + Σ_{k≠j} p_{k,n} ) → ∞,

since the first factor → ∞ while each remaining factor stays > 0: its Gaussian term → 0, but the other mixture components cushion it.
i) Presence of singularities
This doesn't occur with a single Gaussian:

L ∝ (1/σ_j^N) Π_{n≠m} exp(−(x_n − µ_j)²/2σ_j²) → 0,

because although 1/σ_j^N → ∞, the exponential factors → 0 faster. It doesn't occur in the Bayesian approach either.
ii) Identifiability
Optimal solutions are not unique: given one solution, permuting the K components yields (K! − 1) other equivalent solutions.
This matters when interpreting the components, but not when the model is used only for density estimation.
EM-algorithm for Gaussian Mixtures
The conditions for ML are obtained from

∂L/∂µ_k = 0,   ∂L/∂Σ_k = 0,   ∂/∂π_k [ L + λ(Σ_j π_j − 1) ] = 0,

where L(π, µ, Σ) = Σ_{n=1}^N ln Σ_{k=1}^K π_k N(x_n | µ_k, Σ_k).
These conditions give

µ_k = (1/N_k) Σ_{n=1}^N γ_n(z_k) x_n,
Σ_k = (1/N_k) Σ_{n=1}^N γ_n(z_k) (x_n − µ_k)(x_n − µ_k)^T,
π_k = N_k / N,

where N_k = Σ_{n=1}^N γ_n(z_k). Note that the responsibilities γ_n(z_k) appear.
Recall that

γ_n(z_k) = π_k N(x_n | µ_k, Σ_k) / Σ_j π_j N(x_n | µ_j, Σ_j).

The parameters appear inside the responsibilities, so there is no closed-form solution.
Again, use an iterative algorithm!
EM algorithm for Gaussian Mixtures
1. Initialize the parameters
2. Repeat the following two steps until convergence:
   i) Calculate γ_n(z_k) = π_k N(x_n | µ_k, Σ_k) / Σ_j π_j N(x_n | µ_j, Σ_j) (E step)
   ii) Update the parameters µ_k, Σ_k, π_k using the responsibilities (M step)
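The EM loop above can be sketched compactly for 1-D (scalar) Gaussians; the function name, quantile-based initialization, and fixed iteration count are my own choices, not from the slides:

```python
import numpy as np

def em_gmm_1d(x, K, n_iter=100):
    pi = np.full(K, 1.0 / K)
    mu = np.quantile(x, (np.arange(K) + 0.5) / K)  # spread initial means over the data
    var = np.full(K, x.var())
    for _ in range(n_iter):
        # E step: responsibilities gamma_n(z_k) = pi_k N(x_n|mu_k,var_k) / sum_j ...
        dens = pi * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M step: re-estimate the parameters using the responsibilities
        Nk = gamma.sum(axis=0)
        mu = (gamma * x[:, None]).sum(axis=0) / Nk
        var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
        pi = Nk / len(x)
    return pi, mu, var
```

On data drawn from two well-separated 1-D Gaussians, the fitted means land close to the true component means.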
Demo of algorithm
(Figures: the fitted mixture over successive EM iterations.)
Comparison with K-means
(Figures: EM for Gaussian mixtures vs. K-means clustering on the same data.)