Principal Component Analysis
– the original paper
ADRIAN FLOREA
PAPERS WE LOVE – BUCHAREST CHAPTER – MEETUP #12
DECEMBER 16TH, 2016
Karl Pearson, 1901
K. Pearson, "On lines and planes of closest fit to systems of points in space",
The London, Edinburgh and Dublin Philosophical Magazine and Journal of Science,
Sixth Series, 2, pp. 559-572 (1901)
(1857-1936)
Dimensionality reduction (Pearson’s words)
(1901)
Dimensionality reduction (115 years later)
(2016)
71291 points
200 features
2 principal components
https://projector.tensorflow.org
Dimensionality reduction
Original data set (n × d)
• n observations
• d possibly correlated features
transform (r << d)
New data set (n × r)
• n observations
• r uncorrelated features
• retain most of the data variability
Introduction
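The data set:
$$X = (x_1 \; x_2 \; \dots \; x_n)^T, \qquad x_i = (x_{i1}, x_{i2}, \dots, x_{id})^T \in \mathbb{R}^d, \quad i = 1, \dots, n$$
In the standard basis:
$$x \in \mathbb{R}^d \Rightarrow x \in \operatorname{span}(e_1, e_2, \dots, e_d)$$
Take another orthonormal basis:
$$\{u_i \in \mathbb{R}^d \mid i = 1, \dots, d\} \quad \text{s.t.} \quad u_i^T u_j = \delta_{ij}, \qquad x \in \operatorname{span}(u_1, u_2, \dots, u_d)$$
$$x = a_1 u_1 + a_2 u_2 + \dots + a_d u_d = U a, \qquad U = (u_1 \; u_2 \; \dots \; u_d)$$
The columns of $U$ are an orthonormal basis, so $U$ is orthogonal:
$$U^{-1} = U^T, \qquad U^T U = U U^T = I$$
$$x = U a \Rightarrow U^T x = U^T U a \Rightarrow a = U^T x$$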
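The change of basis above can be checked numerically. The following is only a sketch: the orthonormal basis is built with a QR decomposition of a random matrix, which is an illustrative choice and not something the slides prescribe, and the names `U`, `a` and `x_back` are introduced here for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4

# Illustrative assumption: take Q from a QR decomposition as the orthonormal basis U.
U, _ = np.linalg.qr(rng.normal(size=(d, d)))
x = rng.normal(size=d)

a = U.T @ x       # coordinates of x in the basis u_1, ..., u_d
x_back = U @ a    # back to the standard basis: x = U a

print("U^T U = I:", np.allclose(U.T @ U, np.eye(d)))
print("U a recovers x:", np.allclose(x_back, x))
```

Because $U$ is orthogonal, no matrix inversion is needed: the transpose does the job.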
Goal: find an “optimal” basis
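$$x = (x_1, x_2, \dots, x_d)^T \in \operatorname{span}(e_1, e_2, \dots, e_d) \;\xrightarrow{\;U^T\;}\; a = (a_1, a_2, \dots, a_d)^T \in \operatorname{span}(u_1, u_2, \dots, u_d)$$
Keep only $r \ll d$ directions, such that the projection preserves most of the “variability”:
$$x' = a_1 u_1 + a_2 u_2 + \dots + a_r u_r = U_r a_r$$
$$a = U^T x \Rightarrow a_r = U_r^T x$$
$$x' = U_r U_r^T x = P_r x$$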
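A small sketch of the reduced projection $x' = U_r U_r^T x = P_r x$. The basis is again a random orthonormal one (an assumption made for illustration, not the optimal basis the deck goes on to derive), and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
d, r = 5, 2

# Illustrative assumption: any orthonormal basis works to show the mechanics.
U, _ = np.linalg.qr(rng.normal(size=(d, d)))
U_r = U[:, :r]                 # first r basis vectors
x = rng.normal(size=d)

a_r = U_r.T @ x                # r coordinates of x in the reduced basis
P_r = U_r @ U_r.T              # d x d projection matrix
x_proj = P_r @ x               # x' expressed in the original coordinates

print("x' = U_r a_r:", np.allclose(x_proj, U_r @ a_r))
print("P_r is idempotent:", np.allclose(P_r @ P_r, P_r))
print("residual is orthogonal to the subspace:", np.allclose(U_r.T @ (x - x_proj), 0))
```

The residual $x - x'$ is what the mean squared error measures later in the deck.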
r=1
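Find the unit vector $u$ ($u^T u = 1$) that maximizes the variance of the points projected onto it.
Projection of $x_i$ on $u$:
$$\|x_i'\| = \|x_i\| \cos(x_i, u) = \|x_i\| \frac{u^T x_i}{\|u\| \, \|x_i\|} = u^T x_i$$
$$x_i' = (u^T x_i)\, u = a_i u$$
Assume $\mu_u = 0$ (the data is centered). The projected variance is:
$$\sigma_u^2 = \frac{1}{n} \sum_{i=1}^{n} (a_i - \mu_u)^2 = \frac{1}{n} \sum_{i=1}^{n} (u^T x_i - 0)^2 = \frac{1}{n} \sum_{i=1}^{n} (u^T x_i)^2 = \frac{1}{n} \sum_{i=1}^{n} (u^T x_i)(x_i^T u)$$
$$= \frac{1}{n} \sum_{i=1}^{n} u^T (x_i x_i^T) u = u^T \Big( \frac{1}{n} \sum_{i=1}^{n} x_i x_i^T \Big) u = u^T \Big( \frac{1}{n} X^T X \Big) u = u^T \operatorname{cov}(X)\, u = u^T \Sigma u$$
The optimization problem:
$$\max_u \sigma_u^2 = \max_u u^T \Sigma u, \qquad \text{s.t. } u^T u = 1$$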
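The identity $\sigma_u^2 = u^T \Sigma u$ can be verified on synthetic data. This is a sketch under assumptions: the data is random, it is centered explicitly, and the covariance uses the $1/n$ normalization from the derivation.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 500, 3

# Synthetic correlated data (an assumption for illustration), then centered.
X = rng.normal(size=(n, d)) @ rng.normal(size=(d, d))
X = X - X.mean(axis=0)

Sigma = X.T @ X / n                  # covariance with 1/n normalization

u = rng.normal(size=d)
u = u / np.linalg.norm(u)            # unit vector, u^T u = 1

a = X @ u                            # scalar projections a_i = u^T x_i
var_direct = np.mean(a ** 2)         # projected variance (projected mean is 0)
var_quadratic = u @ Sigma @ u        # u^T Sigma u

print("projected mean is ~0:", np.isclose(a.mean(), 0.0))
print("variances agree:", np.isclose(var_direct, var_quadratic))
```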
First principal component
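Use a Lagrange multiplier $\lambda$ for the constraint $u^T u = 1$:
$$L(u, \lambda) = u^T \Sigma u - \lambda (u^T u - 1), \qquad \max_{u, \lambda} L(u, \lambda)$$
$$\frac{\partial L}{\partial u} = 0 \Rightarrow 2 \Sigma u - 2 \lambda u = 0 \Rightarrow \Sigma u = \lambda u$$
$\lambda$ is an eigenvalue of the covariance matrix $\Sigma$ with the associated eigenvector $u$.
$$\max_u \sigma_u^2 = \max_u u^T \Sigma u = \max_u u^T \lambda u = \max_u \lambda\, u^T u = \max \lambda = \lambda_1$$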
The largest eigenvalue is the projected variance on the associated eigenvector that is the direction of most variance,
called the first principal component.
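A numerical sketch of this result, assuming synthetic centered data and using `numpy.linalg.eigh` (which returns eigenvalues of a symmetric matrix in ascending order). It checks that the top eigenvector satisfies $\Sigma u = \lambda u$ and that no random unit direction has a larger projected variance.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 400, 4

# Synthetic centered data (illustrative assumption).
X = rng.normal(size=(n, d)) @ rng.normal(size=(d, d))
X = X - X.mean(axis=0)
Sigma = X.T @ X / n

eigvals, eigvecs = np.linalg.eigh(Sigma)   # ascending eigenvalues
lam1 = eigvals[-1]                         # largest eigenvalue
u1 = eigvecs[:, -1]                        # first principal component

# Projected variance along 1000 random unit directions, for comparison.
vs = rng.normal(size=(1000, d))
vs = vs / np.linalg.norm(vs, axis=1, keepdims=True)
random_vars = np.einsum("ij,jk,ik->i", vs, Sigma, vs)

print("Sigma u1 = lam1 u1:", np.allclose(Sigma @ u1, lam1 * u1))
print("variance along u1 equals lam1:", np.isclose(u1 @ Sigma @ u1, lam1))
print("no random direction beats lam1:", random_vars.max() <= lam1 + 1e-12)
```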
The equivalent optimization problem solved by Pearson
$$\min_u MSE(u), \qquad \text{s.t. } u^T u = 1$$
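An equivalent objective function
$$MSE(u) = \frac{1}{n} \sum_{i=1}^{n} \|x_i - x_i'\|^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - x_i')^T (x_i - x_i') = \frac{1}{n} \sum_{i=1}^{n} \big( \|x_i\|^2 - 2 x_i^T x_i' + x_i'^T x_i' \big)$$
But $x_i' = (u^T x_i)\, u$, so:
$$MSE(u) = \frac{1}{n} \sum_{i=1}^{n} \Big( \|x_i\|^2 - 2 x_i^T (u^T x_i) u + \big((u^T x_i) u\big)^T \big((u^T x_i) u\big) \Big)$$
$$= \frac{1}{n} \sum_{i=1}^{n} \Big( \|x_i\|^2 - 2 (u^T x_i)(x_i^T u) + (u^T x_i)(x_i^T u)\, u^T u \Big)$$
$$= \frac{1}{n} \sum_{i=1}^{n} \Big( \|x_i\|^2 - (u^T x_i)(x_i^T u) \Big) \qquad (\text{using } u^T u = 1)$$
$$= \underbrace{\frac{1}{n} \sum_{i=1}^{n} \|x_i\|^2}_{\text{const}} - u^T \Big( \frac{1}{n} \sum_{i=1}^{n} x_i x_i^T \Big) u = \text{const} - u^T \Sigma u$$
$$\min_u MSE(u) \Leftrightarrow \max_u u^T \Sigma u = \max_u \sigma_u^2$$
Maximizing the projected variance is equivalent to minimizing the mean squared error:
$$\min_u MSE(u) \Leftrightarrow \max_u \sigma_u^2$$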
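The relation $MSE(u) = \text{const} - u^T \Sigma u$ can be confirmed numerically. A minimal sketch, assuming synthetic centered data; the helper `mse` is a hypothetical name introduced for this example.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 300, 3

# Synthetic centered data (illustrative assumption).
X = rng.normal(size=(n, d)) @ rng.normal(size=(d, d))
X = X - X.mean(axis=0)
Sigma = X.T @ X / n


def mse(X, u):
    """Mean squared distance between each x_i and its projection (u^T x_i) u."""
    X_proj = np.outer(X @ u, u)
    return np.mean(np.sum((X - X_proj) ** 2, axis=1))


u = rng.normal(size=d)
u = u / np.linalg.norm(u)                    # the derivation needs u^T u = 1

const = np.mean(np.sum(X ** 2, axis=1))      # (1/n) * sum ||x_i||^2, independent of u
print("MSE(u) = const - u^T Sigma u:", np.isclose(mse(X, u), const - u @ Sigma @ u))

# The top eigenvector therefore minimizes the MSE among unit vectors.
u1 = np.linalg.eigh(Sigma)[1][:, -1]
print("MSE(u1) <= MSE(u):", mse(X, u1) <= mse(X, u) + 1e-12)
```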
r=2
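We have found $u_1$, the direction of most variance (the r = 1 case). Now look for the next direction $v$, orthogonal to $u_1$:
$$\max_v \sigma_v^2 = \max_v v^T \Sigma v, \qquad \text{s.t. } v^T v = 1, \; v^T u_1 = 0$$
$$L(v, \alpha, \beta) = v^T \Sigma v - \alpha (v^T v - 1) - \beta\, v^T u_1$$
$$\frac{\partial L}{\partial v} = 0 \Rightarrow 2 \Sigma v - 2 \alpha v - \beta u_1 = 0$$
Multiplying on the left by $u_1^T$:
$$2 u_1^T \Sigma v - 2 \alpha\, u_1^T v - \beta\, u_1^T u_1 = 0 \Rightarrow \beta = 2 u_1^T \Sigma v = 2 v^T \Sigma u_1 = 2 \lambda_1 v^T u_1 = 0$$
$$\Rightarrow 2 \Sigma v - 2 \alpha v = 0 \Rightarrow \Sigma v = \alpha v$$
So $v$ is an eigenvector of $\Sigma$.
$$\max_v \sigma_v^2 = \max_v v^T \Sigma v = \max \alpha = \lambda_2$$
the second largest eigenvalue of $\Sigma$.
The second principal component is the eigenvector associated with the second largest eigenvalue, $\lambda_2$.
The total variance is invariant to a change of basis!
$$\sum_{i=1}^{d} \sigma_i^2 = \sum_{i=1}^{d} \lambda_i$$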
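A sketch that checks both claims on synthetic data: the second eigenvector is the best direction orthogonal to $u_1$, and the sum of the per-feature variances (the trace of $\Sigma$) equals the sum of the eigenvalues. The data and variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d = 400, 4

# Synthetic centered data (illustrative assumption).
X = rng.normal(size=(n, d)) @ rng.normal(size=(d, d))
X = X - X.mean(axis=0)
Sigma = X.T @ X / n

eigvals, eigvecs = np.linalg.eigh(Sigma)     # ascending order
order = np.argsort(eigvals)[::-1]            # reorder to descending
lam = eigvals[order]
u1, u2 = eigvecs[:, order[0]], eigvecs[:, order[1]]

# Random unit vectors made orthogonal to u1, to compare against u2.
vs = rng.normal(size=(1000, d))
vs = vs - np.outer(vs @ u1, u1)              # remove the u1 component
vs = vs / np.linalg.norm(vs, axis=1, keepdims=True)
constrained_vars = np.einsum("ij,jk,ik->i", vs, Sigma, vs)

print("u2 is orthogonal to u1:", np.isclose(u1 @ u2, 0.0))
print("variance along u2 equals lam_2:", np.isclose(u2 @ Sigma @ u2, lam[1]))
print("no orthogonal direction beats lam_2:", constrained_vars.max() <= lam[1] + 1e-12)
print("total variance preserved:", np.isclose(np.trace(Sigma), lam.sum()))
```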
PCA algorithm
1. Normalize $X$ to zero-mean, unit-variance
2. Calculate the covariance matrix $\Sigma$ of the normalized data set
3. Find all eigenvalues of $\Sigma$ and arrange them in descending order: $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d \ge 0$
4. Choose r for the top r eigenvalues (and corresponding eigenvectors/principal components)
$U_r = (u_1 \; u_2 \; \dots \; u_r)$ the reduced basis (of principal components)
$a_i = U_r^T x_i, \; i = 1, \dots, n$ the reduced dimensionality data set
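The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `pca` is hypothetical, the covariance uses the $1/n$ normalization of the derivation, and the sketch assumes no feature is constant (otherwise the division by the standard deviation fails).

```python
import numpy as np


def pca(X, r):
    """Eigendecomposition-based PCA following the four steps above (a sketch)."""
    # 1. Normalize to zero mean and unit variance (assumes no constant feature).
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # 2. Covariance matrix of the normalized data, with 1/n normalization.
    Sigma = Z.T @ Z / Z.shape[0]
    # 3. Eigenvalues in descending order, eigenvectors reordered to match.
    eigvals, eigvecs = np.linalg.eigh(Sigma)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # 4. Keep the top r eigenvectors and project: a_i = U_r^T x_i for every row.
    U_r = eigvecs[:, :r]
    A = Z @ U_r
    return A, U_r, eigvals


rng = np.random.default_rng(6)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))   # synthetic data
A, U_r, eigvals = pca(X, r=2)

print("reduced data shape:", A.shape)
print("new features are uncorrelated:",
      np.allclose(A.T @ A / A.shape[0], np.diag(eigvals[:2])))
```

In practice a singular value decomposition of the normalized data is often preferred for numerical reasons; the eigendecomposition is used here because it mirrors the derivation.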
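How to choose r?
Fraction of the total variance retained:
$$r = \min \Big\{ r' \;\Big|\; \frac{\sum_{i=1}^{r'} \lambda_i}{\sum_{i=1}^{d} \lambda_i} \ge \alpha \Big\}, \qquad \alpha \in [0.7, 0.9]$$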
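Kaiser criterion (Henry Felix Kaiser, 1960): keep the components whose eigenvalue is at least 1
$$r = \max \{ i \mid \lambda_i \ge 1 \}$$
or, for data that is not standardized, at least the mean eigenvalue
$$r = \max \Big\{ i \;\Big|\; \lambda_i \ge \frac{1}{d} \sum_{j=1}^{d} \lambda_j \Big\}$$
Scree plot: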
screeplot(modelname)
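The selection rules on this slide can be sketched as follows. The function names and the eigenvalues are hypothetical, chosen only to illustrate the rules; the eigenvalues sum to d = 6, as they would for standardized data.

```python
import numpy as np


def choose_r_by_variance(eigvals, alpha=0.9):
    """Smallest r whose top-r eigenvalues retain at least a fraction alpha of the variance."""
    eigvals = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    frac = np.cumsum(eigvals) / eigvals.sum()
    # searchsorted returns the first index whose cumulative fraction is >= alpha.
    return min(int(np.searchsorted(frac, alpha)) + 1, len(eigvals))


def choose_r_kaiser(eigvals, threshold=None):
    """Number of eigenvalues >= threshold (default: the mean eigenvalue)."""
    eigvals = np.asarray(eigvals, dtype=float)
    if threshold is None:
        threshold = eigvals.mean()
    return int(np.sum(eigvals >= threshold))


# Hypothetical eigenvalues of a standardized 6-feature data set.
lam = [2.4, 1.5, 0.9, 0.6, 0.4, 0.2]

print("r for alpha = 0.70:", choose_r_by_variance(lam, alpha=0.70))
print("r for alpha = 0.85:", choose_r_by_variance(lam, alpha=0.85))
print("r by Kaiser (threshold 1):", choose_r_kaiser(lam, threshold=1.0))
print("r by mean eigenvalue:", choose_r_kaiser(lam))
```

The rules can disagree, as they do here, so it is worth reading them together with the scree plot rather than trusting a single threshold.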
PCA Pros & Cons
PROS
• dimensionality reduction can help learning
• removes noise
• can deal with large data sets
CONS
• hard to interpret
• sample dependent
• linear (
Bibliography
• A.A. Farag, S. Elhabian, "A Tutorial on Principal Component Analysis" (2009) - http://dai.fmph.uniba.sk/courses/ml/sl/PCA.pdf
• N. de Freitas, "CPSC 340 - Machine Learning and Data Mining Lectures", University of British Columbia (2012) - http://www.cs.ubc.ca/~nando/340-2012/lectures.php
• I. Georgescu, "Inteligență computațională", Editura ASE (2015)
• I.T. Jolliffe, "Principal Component Analysis", 2nd ed., Springer (2002)
• J.N. Kutz, "Data-Driven Modeling & Scientific Computation. Methods for Complex Systems & Big Data", Oxford University Press (2013)
• J.N. Kutz, "Singular Value Decomposition" - http://courses.washington.edu/am301/page1/page15/video.html
• K. Pearson, "On lines and planes of closest fit to systems of points in space", Philos. Mag., Sixth Series, 2, pp. 559-572 (1901) - http://stat.smmu.edu.cn/history/pearson1901.pdf
• M.J. Zaki, W. Meira Jr., "Data Mining and Analysis. Fundamental Concepts and Algorithms", Cambridge University Press (2014)
