"Principal Component Analysis - the original paper" presentation @ Papers We Love Bucharest
1. Principal Component Analysis
– the original paper
ADRIAN FLOREA
PAPERS WE LOVE – BUCHAREST CHAPTER – MEETUP #12
DECEMBER 16TH, 2016
2. Karl Pearson (1857-1936)
K. Pearson, "On lines and planes of closest fit to systems of points in space", The London, Edinburgh and Dublin Philosophical Magazine and Journal of Science, Sixth Series, 2, pp. 559-572 (1901)
4. Dimensionality reduction (115 years later)
[Screenshot: TensorFlow Embedding Projector (2016) – 71291 points with 200 features, projected onto 2 principal components]
https://projector.tensorflow.org
5. Dimensionality reduction
n
d
n
r
• n observations
• d possibly correlated features
• r << d
• n observations
• r uncorrelated features
• retain most of the data variability
transform
Original data set
New data set
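A minimal sketch of this transform in R, on simulated data (the sizes below are illustrative, not from the slides); prcomp() is R's built-in PCA:

# original n x d data set with possibly correlated features
set.seed(1)
n <- 1000; d <- 20; r <- 2
X <- matrix(rnorm(n * d), nrow = n)
# new n x r data set: the first r principal component scores
A <- prcomp(X, center = TRUE, scale. = TRUE)$x[, 1:r]
dim(A)                                  # 1000 x 2, r << d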
7. Goal: find an “optimal” basis
x = (x_1, x_2, …, x_d)^T ∈ span(e_1, e_2, …, e_d) — the original coordinates
x ∈ span(u_1, u_2, …, u_d) with coordinates a = (a_1, a_2, …, a_d)^T in the new (orthonormal) basis U: x = U a, a = U^T x
Keep only r << d of the new coordinates:
x' = a_1 u_1 + a_2 u_2 + … + a_r u_r = U_r a_r, with a_r = U_r^T x
so x' = U_r U_r^T x = P x, a projection of x
Choose the basis so that the projection preserves most of the “variability”.
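A sketch of this projection in R, again on simulated data (an assumed setup, not from the slides); the orthonormal basis here is taken from the eigenvectors of the sample covariance, so that x' = U_r a_r = U_r U_r^T x:

set.seed(1)
X   <- scale(matrix(rnorm(100 * 5), nrow = 100), scale = FALSE)  # centered data
U   <- eigen(crossprod(X) / nrow(X))$vectors   # orthonormal basis u_1, ..., u_d
U_r <- U[, 1:2]                                # keep r = 2 basis vectors
x   <- X[1, ]                                  # one observation
a_r <- drop(t(U_r) %*% x)                      # a_r = U_r^T x
x_p <- drop(U_r %*% a_r)                       # x' = U_r U_r^T x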
8. r=1
Find a unitary u (||u|| = 1) that maximizes the variance of the projected points on it.
Projection of x_i on u: x'_i = ||x_i|| cos(x_i, u) u = (u^T x_i / ||u||) u = (u^T x_i) u = a_i u
Assume the data are centered, x̄ = 0. The variance of the projections is then
σ_u² = (1/n) Σ_{i=1}^n a_i² = (1/n) Σ_{i=1}^n (u^T x_i)² = (1/n) Σ_{i=1}^n u^T x_i x_i^T u = u^T ((1/n) Σ_{i=1}^n x_i x_i^T) u = u^T cov(X) u = u^T Σ u
Objective: max_u u^T Σ u subject to u^T u = 1
The maximizer (via a Lagrange multiplier) is the eigenvector u_1 of Σ with the largest eigenvalue λ_1: the first principal component.
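A sketch checking this derivation numerically in R (assumed simulated data, using the slide's 1/n convention): for centered data and a unit vector u, the variance of the projections a_i = u^T x_i equals the quadratic form u^T Σ u.

set.seed(1)
X     <- scale(matrix(rnorm(100 * 5), nrow = 100), scale = FALSE)
Sigma <- crossprod(X) / nrow(X)        # (1/n) sum_i x_i x_i^T
u     <- eigen(Sigma)$vectors[, 1]     # a unit vector (here: u_1)
a     <- drop(X %*% u)                 # projections a_i = u^T x_i
c(projected_variance = mean(a^2),
  quadratic_form     = drop(t(u) %*% Sigma %*% u))   # the two values agree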
12. An equivalent objective function
MSE(u) = (1/n) Σ_{i=1}^n ||x_i − x'_i||² = (1/n) Σ_{i=1}^n (x_i − x'_i)^T (x_i − x'_i) = (1/n) Σ_{i=1}^n (||x_i||² − 2 x_i^T x'_i + x'_i^T x'_i)
But: x'_i = (u^T x_i) u, so x_i^T x'_i = (u^T x_i)(x_i^T u) = (u^T x_i)² and x'_i^T x'_i = (u^T x_i)² u^T u = (u^T x_i)²
Therefore
MSE(u) = (1/n) Σ_{i=1}^n (||x_i||² − (u^T x_i)²) = (1/n) Σ_{i=1}^n ||x_i||² − u^T ((1/n) Σ_{i=1}^n x_i x_i^T) u = const − u^T Σ u
min_u MSE(u) ⇔ max_u u^T Σ u
Maximizing the projected variance is equivalent to minimizing the mean squared error.
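A sketch checking this equivalence numerically in R (assumed simulated data): MSE(u) = (1/n) Σ ||x_i||² − u^T Σ u, so minimizing the reconstruction error is the same as maximizing the projected variance.

set.seed(1)
X      <- scale(matrix(rnorm(100 * 5), nrow = 100), scale = FALSE)
Sigma  <- crossprod(X) / nrow(X)
u      <- eigen(Sigma)$vectors[, 1]
X_proj <- drop(X %*% u) %o% u          # rows are x'_i = (u^T x_i) u
mse    <- mean(rowSums((X - X_proj)^2))  # (1/n) sum_i ||x_i - x'_i||^2
const  <- mean(rowSums(X^2))             # (1/n) sum_i ||x_i||^2
c(mse = mse,
  const_minus_variance = const - drop(t(u) %*% Sigma %*% u))  # equal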
13. r=2
We have already found u_1, the direction of most variance (the r = 1 case).
Now solve: max_v v^T Σ v subject to v^T v = 1 and v^T u_1 = 0
Lagrangian: L(v, λ, β) = v^T Σ v − λ(v^T v − 1) − β v^T u_1
∂L/∂v = 2 Σ v − 2 λ v − β u_1 = 0
Multiply on the left by u_1^T: 2 u_1^T Σ v − 2 λ u_1^T v − β u_1^T u_1 = 0; since u_1^T Σ = λ_1 u_1^T and u_1^T v = 0, this gives β = 0
So Σ v = λ v: v is an eigenvector of Σ, and max_v v^T Σ v = λ = λ_2
The second principal component is the eigenvector associated with the second largest eigenvalue of Σ, λ_2.
The total variance is invariant to a change of basis: Σ_{i=1}^d σ_i² = Σ_{i=1}^d λ_i
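A sketch checking the invariance in R (assumed simulated data): the total variance, i.e. the trace of Σ, equals the sum of its eigenvalues, since the trace is preserved by an orthonormal change of basis.

set.seed(1)
X     <- scale(matrix(rnorm(100 * 5), nrow = 100), scale = FALSE)
Sigma <- crossprod(X) / nrow(X)
c(total_variance  = sum(diag(Sigma)),
  sum_eigenvalues = sum(eigen(Sigma)$values))   # identical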
14. PCA algorithm
1. Normalize the data to zero mean and unit variance
2. Calculate the covariance matrix Σ of the normalized data set
3. Find all eigenvalues of Σ and arrange them in descending order: λ_1 ≥ λ_2 ≥ … ≥ λ_d ≥ 0
4. Choose r and keep the top r eigenvalues with their eigenvectors (principal components): U_r = (u_1 u_2 … u_r) is the reduced basis, and a_i = U_r^T x_i, 1 ≤ i ≤ n, is the reduced dimensionality data set (see the sketch below)
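A minimal sketch of these four steps in R, assuming a numeric data matrix; the function name pca_reduce is ours, not from the slides. The result can be compared with R's built-in prcomp(data, center = TRUE, scale. = TRUE).

pca_reduce <- function(data, r) {
  X     <- scale(data)            # 1. zero mean, unit variance
  Sigma <- cov(X)                 # 2. covariance matrix of the normalized data
  eig   <- eigen(Sigma)           # 3. eigenvalues, already in descending order
  U_r   <- eig$vectors[, 1:r]     # 4. top-r principal components
  A     <- X %*% U_r              #    a_i = U_r^T x_i, the reduced data set
  list(scores = A, basis = U_r, eigenvalues = eig$values)
}

# Usage: reduce the built-in iris measurements from d = 4 to r = 2 features.
head(pca_reduce(iris[, 1:4], r = 2)$scores)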
15. How to choose r?
• Explained-variance threshold: r = min{ r | (Σ_{i=1}^r λ_i) / (Σ_{i=1}^d λ_i) ≥ α }, α ∈ [0.7, 0.9]
• Kaiser criterion (Henry Felix Kaiser, 1960): r = min{ i | λ_{i+1} < 1 }, i.e. keep the eigenvalues above the average, r = min{ i | λ_{i+1} < (1/d) Σ_{j=1}^d λ_j } (for unit-variance data the average eigenvalue is 1, so the two agree)
• Scree plot (look for the “elbow”), in R: screeplot(modelname)
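A sketch of these rules in R on the built-in iris data (an assumed example; the eigenvalues come from the covariance matrix of the normalized data, in descending order):

model  <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)
lambda <- model$sdev^2                 # eigenvalues of Sigma

alpha      <- 0.9                      # explained-variance threshold
r_variance <- which(cumsum(lambda) / sum(lambda) >= alpha)[1]

r_kaiser <- sum(lambda >= 1)           # Kaiser criterion

screeplot(model)                       # look for the "elbow"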
16. PCA Pros & Cons
PROS
• dimensionality reduction can help learning
• removes noise
• can deal with large data sets
CONS
• hard to interpret
• sample dependent
• linear (nonlinear structure in the data is not captured)
17. Bibliography
• A.A. Farag, S. Elhabian, "A Tutorial on Principal Component Analysis" (2009) -
http://dai.fmph.uniba.sk/courses/ml/sl/PCA.pdf
• N. de Freitas, "CPSC 340 - Machine Learning and Data Mining Lectures", University of British Columbia (2012) -
http://www.cs.ubc.ca/~nando/340-2012/lectures.php
• I. Georgescu, “Inteligență computațională“, Editura ASE (2015)
• I.T. Jolliffe, "Principal Component Analysis", 2nd ed., Springer (2002)
• J.N. Kutz, "Data-Driven Modeling & Scientific Computation. Methods for Complex Systems & Big Data", Oxford
University Press (2013)
• J.N. Kutz, "Singular Value Decomposition" - http://courses.washington.edu/am301/page1/page15/video.html
• K. Pearson, "On lines and planes of closest fit to systems of points in space", Philos. Mag., Sixth Series, 2, pp. 559-572 (1901) - http://stat.smmu.edu.cn/history/pearson1901.pdf
• M.J. Zaki, W. Meira Jr., “Data Mining and Analysis. Fundamental Concepts and Algorithms”, Cambridge University
Press (2014)