The fourth lecture from the Machine Learning course series. This lecture first introduces the problem of visualising multi-dimensional data in fewer dimensions, and then discusses one of the most popular methods for reducing dimensionality: principal component analysis (PCA). t-SNE is also mentioned briefly as a non-linear alternative to PCA. Practicals that I have designed for this course, in both R and Python, are available on my GitHub (https://github.com/skyfallen/MachineLearningPracticals). I can share the Keynote files; contact me via e-mail: dmytro.fishman@ut.ee.
15.-16. Patients vs. Healthy
[Scatter plot: each dot represents a person, coloured Patients vs. Healthy]
Every person, in turn, can be described by hundreds of protein reactivities. The idea is that people with similar protein reactivity profiles cluster together.
34. 2-Dimensional data
Protein profiles are correlated
Person Protein #1 Protein #2
A 24 29
B 63 59
C 51 32
D 34 56
E 15 4
… … …
[Scatter plot: Protein #1 (x-axis, 0-60) vs. Protein #2 (y-axis, 0-60)]
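As a quick sanity check on that claim, the correlation between two such columns can be computed directly. A minimal sketch in Python, using only the five example rows shown above:

```python
import numpy as np

# The five example rows from the table above (Protein #1, Protein #2)
protein1 = np.array([24, 63, 51, 34, 15])
protein2 = np.array([29, 59, 32, 56, 4])

# Pearson correlation between the two reactivity columns;
# for these five rows it comes out around 0.73, i.e. clearly correlated
print(np.corrcoef(protein1, protein2)[0, 1])
```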
35. 2-Dimensional data
Here, there is no correlation
Person Protein #1 Protein #2
A 24 29
B 63 59
C 51 32
D 34 56
E 15 4
… … …
[Scatter plot: Protein #1 (x-axis, 0-60) vs. Protein #2 (y-axis, 0-60), showing no correlation]
47.-49. Importance of axes
[Scatter plots: Protein #1 vs. Protein #2, and the same data projected onto Protein #1 alone]
The projected data does not look too different. The important variation runs from left to right, so we can safely remove the 2nd dimension.
53. Principal components
[Scatter plot: Protein #1 vs. Protein #2 with two perpendicular lines drawn through the point cloud]
The data is mostly spread along one line, and a little bit along the other. What if we make new axes from these lines?
56.-59. Principal components
[Plot: the same point cloud with new X and Y axes, PC1 and PC2]
These new axes are called principal components, or PCs. With them it is easier to see the right/left and above/below variation. PC1 spans the direction of the most variance; PC2 spans the direction of the second most variance. Principal components are always orthogonal to one another.
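A minimal sketch of these claims using scikit-learn's PCA, with synthetic correlated data standing in for the protein table (the full table is not given on the slides):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic 2-D data, correlated like the protein example
x = rng.normal(size=200)
data = np.column_stack([x, 0.8 * x + 0.3 * rng.normal(size=200)])

pca = PCA(n_components=2).fit(data)
pc1, pc2 = pca.components_     # each row is one PC direction
print(np.dot(pc1, pc2))        # ~0.0: the PCs are orthogonal
```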
62.-63. Principal components
[Diagram: PC1, PC2 and PC3 as three orthogonal axes]
If the original data had 3 dimensions, we would have a 3rd PC: there are always as many PCs as there are dimensions, and each new PC is guaranteed to explain less variance than the previous one. Usually it is enough to project the data points onto the first two PCs to see the important patterns.
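The same idea in code: a minimal sketch (again with made-up 3-D data) showing that the PCs come sorted by explained variance, and that the first two are what you would keep for plotting:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Made-up 3-D data with most of the spread along one direction
data = rng.normal(size=(100, 3)) * np.array([5.0, 2.0, 0.5])

pca = PCA().fit(data)
# One ratio per PC, in decreasing order: each new PC
# is guaranteed to explain less variance than the previous one
print(pca.explained_variance_ratio_)

# Projecting onto the first two PCs for a 2-D plot
scores = PCA(n_components=2).fit_transform(data)
print(scores.shape)  # (100, 2)
```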
70.-71. Where PCs come from
[Plot: PC1 and PC2 drawn as vectors; one vector has coordinates (3, 3)]
PCs are vectors, and all vectors have coordinates. For example, this one has coordinates (3, 3). In 200-D space, vectors have 200 coordinates.
75.-76. Where PCs come from
[Scatter plot: Protein #1 vs. Protein #2 with PC1 and PC2 drawn through the point cloud]
How did we choose PC1? By minimising the distance from the points to the vector; the coordinates of this vector are the coordinates of PC1. The coordinates of PC2 are chosen in a similar fashion, except that PC2 must be orthogonal to PC1.
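This distance-minimising vector can be computed directly: for centered data it is the first right-singular vector, which is essentially what libraries such as scikit-learn compute under the hood. A minimal sketch with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic correlated 2-D data standing in for the protein table
x = rng.normal(size=100)
data = np.column_stack([x, 0.8 * x + 0.3 * rng.normal(size=100)])
centered = data - data.mean(axis=0)

# The direction minimising the total squared distance from the
# points is the first right-singular vector of the centered data
_, _, vt = np.linalg.svd(centered, full_matrices=False)
pc1, pc2 = vt[0], vt[1]
print(pc1)                 # coordinates of PC1
print(np.dot(pc1, pc2))    # ~0.0: PC2 is orthogonal to PC1
```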
77.-81. PCA Example
Let's find the coordinates of Person A in terms of the PCs.

Reactivities:
Person Pr1 Pr2 … PrN
A 24 29 … 11
B 63 59 … 1
C 51 32 … 23
D 34 56 … 2
E 15 4 … 8
… … … … …

PC coordinates (one row per protein, so as many rows as there are proteins, N):
PC1 PC2
2 3
4 12
… …
-5 0

Person A (PC1 score) = 24*2 + 29*4 + … + 11*(-5) = 11
Person A (PC2 score) = 24*3 + 29*12 + … + 11*0 = 21
82. Principal components
[Plot: Principal component 1 (x-axis) vs. Principal component 2 (y-axis), with Person A at (11, 21)]
Person A (PC1 score) = 24*2 + 29*4 + … + 11*(-5) = 11
Person A (PC2 score) = 24*3 + 29*12 + … + 11*0 = 21
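The score computation is just a dot product between a person's reactivity profile and a PC's coordinates. A minimal sketch; note that the slides elide most of the N proteins with "…", so the third values below are made up (picked so the totals match the slides' 11 and 21):

```python
import numpy as np

# Hypothetical 4-protein version of the example: the '…' entries
# are not given on the slides, so the third values are invented
# (chosen so the totals match the slides' scores of 11 and 21)
person_a   = np.array([24, 29, 7, 11])    # reactivities Pr1, Pr2, Pr3, PrN
pc1_coords = np.array([2, 4, -14, -5])    # PC1 coordinates
pc2_coords = np.array([3, 12, -57, 0])    # PC2 coordinates

# A person's score on a PC is the dot product of their
# reactivity profile with that PC's coordinates
print(person_a @ pc1_coords)  # 11 -> Person A's PC1 score
print(person_a @ pc2_coords)  # 21 -> Person A's PC2 score
```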
85. Principal components
[Plot: Principal component 1 vs. Principal component 2; people A-G plotted, coloured Healthy vs. Patients]
The idea is that points that are similar in the multi-dimensional space end up located close together in fewer dimensions.
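A minimal end-to-end sketch of this idea, with synthetic "healthy" and "patient" groups in a 200-D protein space (all values made up):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Two synthetic groups in 200-D "protein space":
# patients are shifted away from healthy controls
healthy  = rng.normal(loc=0.0, size=(50, 200))
patients = rng.normal(loc=1.0, size=(50, 200))
data = np.vstack([healthy, patients])

# Project everyone onto the first two PCs
scores = PCA(n_components=2).fit_transform(data)

# Groups separated in 200-D stay separated along PC1
# (the sign of the axis is arbitrary)
print(scores[:50, 0].mean(), scores[50:, 0].mean())
```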
86.-87. PCA is good, but it is a linear algorithm, meaning that it cannot capture complex relationships between features. There are always alternative options to consider…
90. t-SNE vs. PCA
(t-SNE visualisation is from http://distill.pub/2016/misread-tsne/; see also https://www.analyticsvidhya.com/blog/2017/01/t-sne-implementation-r-python/)

t-SNE:
• Has been shown to capture the structure of multi-dimensional data better than PCA
• O(N²) complexity makes it very slow
• The meaning of the features is lost

PCA:
• Works relatively fast even on big datasets
• Transformed features can be traced back to the original ones
• Usually not as good at uncovering hidden structure
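A minimal sketch running both methods side by side with scikit-learn (synthetic data in place of the protein table; the perplexity below is a typical default, not a tuned value):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Synthetic stand-in for the protein data: 200 people x 50 reactivities
data = rng.normal(size=(200, 50))

# PCA: fast and linear; components can be traced back to the proteins
pca_2d = PCA(n_components=2).fit_transform(data)

# t-SNE: non-linear, often better at exposing cluster structure,
# but O(N^2) in its basic form, and its axes carry no direct meaning
tsne_2d = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(data)

print(pca_2d.shape, tsne_2d.shape)  # both (200, 2)
```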