Implement Principal Component Analysis (PCA) in Python
Presented by Eshan Agarwal
Given a classification problem, how do we choose the right features?
Introduction to PCA
• PCA is a method for reducing the dimensionality of data.
• It can be thought of as a projection method: data with m columns (features) is projected into a subspace with k ≤ m columns while retaining as much of the variation in the original data as possible.
[Figure: PCA transforms an n × m data matrix into an n × k matrix]
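In this presentation, we will explore PCA as a method for dimensionality reduction and implement it from scratch in Python.
• Before going deeper into PCA, let us review some key concepts it builds on.
Geometric Rationale of PCA
• Variance
• The variance of a variable is the average squared deviation of its n values around that variable's mean; it can be thought of as the spread of the data points. Writing $x_{it}$ for the value of variable i in object t and $\bar{x}_i$ for its mean, the sample variance (which divides by n − 1 rather than n) is
$$s_i^2 = \frac{1}{n-1}\sum_{t=1}^{n}\left(x_{it} - \bar{x}_i\right)^2$$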
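A minimal sketch of the sample variance formula in NumPy, using a small hypothetical vector `x` (made-up values, not from the Boston data). The manual computation should agree with `np.var(x, ddof=1)`, which uses the same n − 1 denominator.

```python
import numpy as np

# Hypothetical values of one variable across n = 5 objects
x = np.array([2.0, 4.0, 4.0, 5.0, 7.0])
n = x.size

# Sum of squared deviations around the mean, divided by n - 1
var_manual = np.sum((x - x.mean()) ** 2) / (n - 1)

# NumPy's built-in sample variance (ddof=1 selects the n - 1 denominator)
var_numpy = np.var(x, ddof=1)

print(var_manual, var_numpy)
```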
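• Covariance
$$s_{ij} = \frac{1}{n-1}\sum_{t=1}^{n}\left(x_{it} - \bar{x}_i\right)\left(x_{jt} - \bar{x}_j\right)$$
where $s_{ij}$ is the covariance of variables i and j, the sum runs over all n objects, $x_{it}$ and $x_{jt}$ are the values of variables i and j in object t, and $\bar{x}_i$ and $\bar{x}_j$ are the means of variables i and j.
• The covariance measures the degree to which two variables are linearly related: it is positive when they tend to increase together and negative when one tends to decrease as the other increases. For i = j it reduces to the variance of variable i.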
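A minimal sketch of the covariance formula, again on small hypothetical vectors `x_i` and `x_j`. `np.cov` uses the same n − 1 denominator by default, so its off-diagonal entry should match the manual sum.

```python
import numpy as np

# Hypothetical values of variables i and j across n = 5 objects
x_i = np.array([2.0, 4.0, 4.0, 5.0, 7.0])
x_j = np.array([1.0, 3.0, 2.0, 5.0, 6.0])
n = x_i.size

# s_ij = 1/(n-1) * sum over objects of the product of deviations from the means
s_ij = np.sum((x_i - x_i.mean()) * (x_j - x_j.mean())) / (n - 1)

# np.cov treats each argument as one variable; entry [0, 1] is s_ij
s_ij_numpy = np.cov(x_i, x_j)[0, 1]

print(s_ij, s_ij_numpy)  # positive: the two variables tend to rise together
```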
Objective of PCA
• The objective of PCA is to rigidly rotate the axes of the m-dimensional variable space to new, mutually orthogonal positions (the principal axes).
• The principal axes are ordered so that axis 1 has the highest variance, axis 2 the next highest, and so on, down to axis m, which has the lowest variance.
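A minimal sketch of the "rigid rotation" idea on hypothetical correlated 2-D data: the eigenvectors of the covariance matrix form an orthogonal matrix `V`, and multiplying the centered data by `V` moves it onto the principal axes. The variances along the new axes come out in decreasing order and add up to the same total variance (the trace of S). Because an eigenvector's sign is arbitrary, `V` may include a reflection as well as a rotation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical correlated data: n = 500 objects, m = 2 variables
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])
X = X - X.mean(axis=0)  # center each variable

S = np.cov(X, rowvar=False)
eigvals, V = np.linalg.eigh(S)     # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]  # reorder so principal axis 1 comes first
V = V[:, order]

Z = X @ V  # coordinates of every object on the principal axes

var_axes = Z.var(axis=0, ddof=1)
print(var_axes)                     # variance per principal axis, decreasing
print(np.trace(S), var_axes.sum())  # total variance is unchanged by the rotation
```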
Implement PCA in Python (from Scratch)
• Load the dataset:
• We can use the Boston Housing dataset, which has 13 features. Data with 13 dimensions cannot be plotted directly, so how do we visualize it? We can reduce the dimensionality with PCA and then visualize the result. (scikit-learn's `load_boston` loader for this dataset was removed in scikit-learn 1.2.)
• Standardize the data:
• PCA is strongly affected by the scale of the features, and different features may be measured on different scales. It is therefore advisable to standardize the data before computing the principal components. scikit-learn's StandardScaler transforms each feature to zero mean and unit variance.
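A minimal NumPy sketch of the same standardization, for readers who want to avoid the scikit-learn dependency. It divides by the population standard deviation (ddof=0), which is what StandardScaler uses; the helper name `standardize` and the sample matrix are hypothetical.

```python
import numpy as np

def standardize(X):
    """Scale each column of X to zero mean and unit variance.

    Uses the population standard deviation (ddof=0), as StandardScaler does.
    Constant columns (std = 0) are not handled here and would divide by zero.
    """
    mean = X.mean(axis=0)
    std = X.std(axis=0)  # ddof=0 by default
    return (X - mean) / std

# Hypothetical features on very different scales
X = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 900.0], [4.0, 500.0]])
X_std = standardize(X)
print(X_std.mean(axis=0), X_std.std(axis=0))  # ~0 and 1 for every column
```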
The Algebra of PCA
• Calculating PCA involves the following steps:
a. Calculate the covariance matrix.
b. Calculate its eigenvalues and eigenvectors.
c. Form the principal components.
d. Project the data into the new feature space.
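• Calculating the covariance matrix (S):
• The covariance matrix holds the variances (on the diagonal) and the covariances (off the diagonal) of every pair of the m variables. For standardized data it equals the correlation matrix up to a constant factor.
• It is a square (m × m), symmetric matrix.
• For a standardized (and therefore centered) n × m data matrix X, $S = \frac{1}{n-1} X^{\top} X$. In NumPy this is np.matmul(X.T, X) / (n - 1); note that X.T * X would be an elementwise product, not a matrix product. Dropping the 1/(n − 1) factor scales every eigenvalue by the same constant and leaves the eigenvectors unchanged.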
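A minimal sketch of the matrix form on hypothetical random data (n = 100, m = 4). It checks the properties stated above: S is square and symmetric, and it agrees with `np.cov(X, rowvar=False)`, which centers the columns and divides by n − 1.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 4))             # hypothetical data: n = 100, m = 4
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize, so columns are centered

n = X.shape[0]
S = np.matmul(X.T, X) / (n - 1)           # m x m covariance matrix

print(S.shape)                                  # square: m x m
print(np.allclose(S, S.T))                      # symmetric
print(np.allclose(S, np.cov(X, rowvar=False)))  # matches NumPy's covariance
```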
Calculating the eigenvalues and eigenvectors:
• λ is an eigenvalue of the covariance matrix S if it is a solution of the characteristic equation
$$\det(\lambda I - S) = 0$$
where I is the m × m identity matrix (the same dimension as S).
• The sum of all m eigenvalues equals the trace of S (the sum of the variances of the original variables).
• For each eigenvalue λ, a corresponding eigenvector v can be found by solving
$$(\lambda I - S)\,v = 0$$
• The eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_m$ are the variances of the coordinates on each principal component axis.
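A minimal sketch that checks these statements numerically on a hypothetical covariance matrix. It uses `np.linalg.eigh`, NumPy's solver for symmetric matrices, which returns real eigenvalues in ascending order and the eigenvectors as columns.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical correlated data with m = 3 variables
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 3))
X = X - X.mean(axis=0)
S = np.cov(X, rowvar=False)

eigvals, eigvecs = np.linalg.eigh(S)

# Each column v satisfies (lambda*I - S) v = 0, i.e. S v = lambda v
for lam, v in zip(eigvals, eigvecs.T):
    assert np.allclose(S @ v, lam * v)

# The eigenvalues sum to the trace of S (the total variance)
print(eigvals.sum(), np.trace(S))

# det(lambda*I - S) is numerically close to zero for each eigenvalue
print([np.linalg.det(lam * np.eye(3) - S) for lam in eigvals])
```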
• We use scipy.linalg.eigh, which computes the eigenvalues and eigenvectors of a symmetric matrix and returns the eigenvalues in ascending order; we keep the top 2 eigenvalues and their eigenvectors.
[Code for finding the eigenvalues and eigenvectors: image not preserved]
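A minimal sketch of selecting the top-k eigenpairs. It swaps in NumPy's `np.linalg.eigh` for `scipy.linalg.eigh` so that only NumPy is needed; with SciPy 1.5 or later, `scipy.linalg.eigh(S, subset_by_index=[m - 2, m - 1])` returns just the two largest. The helper name `top_k_eigen` and the data are hypothetical.

```python
import numpy as np

def top_k_eigen(S, k=2):
    """Return the k largest eigenvalues of symmetric S and their eigenvectors.

    np.linalg.eigh sorts eigenvalues in ascending order, so the order is
    reversed and the first k are kept. Eigenvectors are returned as columns.
    """
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1][:k]
    return eigvals[order], eigvecs[:, order]

rng = np.random.default_rng(7)
# Hypothetical data: 5 features with very different spreads
X = rng.normal(size=(150, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])
X = X - X.mean(axis=0)
S = np.cov(X, rowvar=False)

vals, vecs = top_k_eigen(S, k=2)
print(vals)        # largest eigenvalue first
print(vecs.shape)  # one eigenvector per column
```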
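Forming Principal Components:
• The principal components are formed from the two leading eigenvectors: the standardized n × m data matrix is multiplied by the m × 2 matrix whose columns are those eigenvectors (a matrix-matrix product rather than a vector-vector one).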
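A minimal sketch of this projection on hypothetical standardized data with m = 4. `V` holds the two leading eigenvectors as columns, and `Z = X @ V` gives one row of principal component scores per object. Writing it as `(V.T @ X.T).T` produces the same numbers in the transposed layout. The scores are uncorrelated and their variances equal the two largest eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(120, 4)) @ rng.normal(size=(4, 4))  # hypothetical data
X = (X - X.mean(axis=0)) / X.std(axis=0)                 # standardize

S = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)  # ascending order
V = eigvecs[:, ::-1][:, :2]           # two leading eigenvectors (m x 2)

Z = X @ V                             # n x 2 matrix of principal component scores
print(Z.shape)
```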
• Projection into the new feature space:
• Create a DataFrame whose columns are the 1st and 2nd principal components.
Visualize Data after PCA
Steps for PCA
• Standardize the data.
• Calculate the covariance matrix.
• Find the eigenvalues and eigenvectors of the covariance matrix.
• Project the scaled data onto the leading eigenvectors and plot the resulting principal components.
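The four summary steps can be combined into one small function. This is a minimal sketch, not the author's notebook code: the function name `pca`, its return values and the random test data are hypothetical, and only NumPy is used.

```python
import numpy as np

def pca(X, k=2):
    """Minimal from-scratch PCA: standardize, covariance, eigendecomposition, projection.

    Returns (scores, eigenvalues, eigenvectors) for the k leading components.
    """
    # 1. Standardize each feature (population std, as StandardScaler does)
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    # 2. Covariance matrix of the standardized data
    S = X_std.T @ X_std / (X_std.shape[0] - 1)
    # 3. Eigendecomposition; eigh sorts ascending, so reverse and keep k
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1][:k]
    vals, vecs = eigvals[order], eigvecs[:, order]
    # 4. Project the standardized data onto the leading eigenvectors
    scores = X_std @ vecs
    return scores, vals, vecs

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6)) @ rng.normal(size=(6, 6))  # hypothetical 6-feature data
scores, vals, vecs = pca(X, k=3)
print(scores.shape)  # (300, 3)
print(vals)          # leading eigenvalues, largest first
```

To visualize, `scores[:, 0]` and `scores[:, 1]` can be passed to matplotlib's `plt.scatter`. The sign of each eigenvector is arbitrary, so a component may come out flipped relative to another implementation.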
Assessment and Evaluation
1) [True or False] PCA can be used for projecting and visualizing data in lower dimensions.
A. TRUE
B. FALSE
2) [True or False] PCA can be applied to an image dataset.
A. TRUE
B. FALSE
3) [True or False] PCA is based on variance maximization and on minimizing the distance between the data points and their projections.
A. TRUE
B. FALSE
• Exercise: Implement PCA with 3 components and visualize the data. Then load the Iris dataset and perform the same task.
Answers: 1-A, 2-A, 3-A
For the full code: https://github.com/Eshan2203/PCA-on-Boston-House-price-Data-Set/blob/master/PCA_BOston.ipynb
