2. How do we choose the right features?
Given a classification problem ….
3. PCA is a method for reducing the dimensionality of data.
It can be thought of as a projection method where data with m-columns
(features) is projected into a subspace with m or fewer columns, while
retaining the essence of the original data.
Introduction to PCA
4. In this presentation, we will discover the PCA
method for dimensionality reduction and how to
implement it from scratch in Python.
Before going deep into PCA, let us understand
some of its key points.
5. Variance
The variance of each variable is the average squared deviation of its
n values around the mean of that variable. It can also be thought of
as the spread of the data points.
Geometric Rationale of PCA
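The variance definition above can be checked numerically. A minimal sketch using NumPy (the sample values here are just illustrative):

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Average squared deviation of the n values around the mean.
mean = x.mean()
variance = ((x - mean) ** 2).mean()

print(variance)   # 4.0
print(np.var(x))  # 4.0 -- NumPy's default (population) variance matches
```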
6. Covariance
The degree to which variables i and j are linearly correlated is
represented by their covariance:

cov(i, j) = (1 / (n - 1)) * Σₘ (xᵢₘ - x̄ᵢ)(xⱼₘ - x̄ⱼ)

where the sum runs over all n objects, xᵢₘ is the value of variable i
in object m, and x̄ᵢ is the mean of variable i (likewise for variable j).
Geometric Rationale of PCA
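The covariance formula can be verified against NumPy's built-in np.cov. A sketch, with two synthetic variables that are linearly related by construction:

```python
import numpy as np

rng = np.random.default_rng(0)
xi = rng.normal(size=100)
xj = 0.8 * xi + rng.normal(scale=0.5, size=100)  # linearly related to xi

# Sum of centered products over all n objects, divided by (n - 1).
n = len(xi)
cov_ij = ((xi - xi.mean()) * (xj - xj.mean())).sum() / (n - 1)

# Agrees with the off-diagonal entry of NumPy's covariance matrix.
print(np.isclose(cov_ij, np.cov(xi, xj)[0, 1]))  # True
```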
7. Objective of PCA
The objective of PCA is to rigidly rotate the axes of this m-dimensional space to new positions
(principal axes).
The principal axes are ordered such that principal axis 1 has the highest variance, axis 2 has the
next highest variance, ..., and axis m has the lowest variance.
8. Implement PCA in Python (from Scratch)
Load the Data-Set:
We can use the Boston Housing dataset for PCA. The Boston dataset has 13
features, so the question here is: how do we visualize the data? We can
reduce the dimensions of the data using PCA and then visualize it.
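A sketch of the loading step. Note that load_boston was removed from scikit-learn 1.2 onward, so to keep this example self-contained we use a random stand-in matrix with the Boston dataset's shape (506 samples, 13 features); with an older scikit-learn you could substitute sklearn.datasets.load_boston().data here:

```python
import numpy as np

# Stand-in for the Boston Housing data: 506 samples x 13 features.
# (load_boston was removed in scikit-learn 1.2; any (n, 13) matrix
# works for demonstrating the PCA steps that follow.)
rng = np.random.default_rng(42)
X = rng.normal(loc=10.0, scale=3.0, size=(506, 13))

print(X.shape)  # (506, 13) -- 13 features, too many to visualize directly
```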
9. Standardize data:
PCA is strongly affected by scale, and different features may have different
scales, so it is better to standardize the data before finding the PCA components.
Sklearn’s StandardScaler scales data to zero mean and unit variance.
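Scikit-learn's StandardScaler does exactly this column-wise rescaling; the sketch below reproduces it in plain NumPy so it is self-contained (the data matrix is the same random stand-in as before):

```python
import numpy as np

# Stand-in data (any un-standardized feature matrix works here).
rng = np.random.default_rng(42)
X = rng.normal(loc=10.0, scale=3.0, size=(506, 13))

# Equivalent of sklearn's StandardScaler: subtract each column's mean,
# divide by each column's standard deviation.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(X_std.mean(axis=0), 0.0))  # True: zero mean
print(np.allclose(X_std.std(axis=0), 1.0))   # True: unit variance
```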
10. The Algebra of PCA
Calculating PCA involves following steps:
a. Calculating the covariance matrix.
b. Calculating the eigenvalues and eigenvector.
c. Forming Principal Components.
d. Projection into the new feature space.
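The four steps a–d can be sketched end to end before we look at each one in detail (a minimal sketch on a small synthetic matrix, standing in for the real data):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardized data

# a. Calculate the covariance matrix.
S = np.cov(X, rowvar=False)

# b. Calculate the eigenvalues and eigenvectors (eigh: S is symmetric).
eigvals, eigvecs = np.linalg.eigh(S)

# c. Form principal components: eigenvectors sorted by decreasing eigenvalue.
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order]

# d. Project into the new feature space (keep the top 2 components).
X_pca = X @ components[:, :2]
print(X_pca.shape)  # (200, 2)
```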
11. Calculating the covariance matrix (S):
The covariance matrix is a matrix of the variances and covariances (or correlations)
among every pair of the m variables.
It is a square, symmetric matrix.
For standardized (zero-mean) data, the covariance matrix is S = X.T * X / (n - 1);
we can compute it with NumPy's matmul() function in Python.
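A sketch of this computation with np.matmul, checked against np.cov (the data is a small synthetic matrix; the key point is that X must be centered first):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
X = X - X.mean(axis=0)                 # centering is required first

n = X.shape[0]
S = np.matmul(X.T, X) / (n - 1)        # covariance matrix via matmul

print(np.allclose(S, S.T))                       # True: square, symmetric
print(np.allclose(S, np.cov(X, rowvar=False)))   # True: matches np.cov
```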
12. Calculating the eigenvalues and eigenvectors:
λ is an eigenvalue of a matrix S if it is a solution of the characteristic
equation:
det( λ*I - S ) = 0
where I is the identity matrix of the same dimension as S.
The sum of all m eigenvalues equals the trace of S (the sum of the variances of
the original variables).
13. For each eigenvalue λ, a corresponding eigenvector v can be found by
solving:
( λ*I - S )v = 0
The eigenvalues λ₁, λ₂, ..., λm are the variances of the coordinates
on each principal component axis.
Calculating the eigenvalues and eigenvector :
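Both properties from these two slides can be checked numerically: each eigenpair satisfies S v = λ v, and the eigenvalues sum to the trace of S. A sketch on a synthetic covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 4))
S = np.cov(X, rowvar=False)

eigvals, eigvecs = np.linalg.eigh(S)   # S is symmetric, so eigh applies

# Each pair satisfies S v = lambda v, i.e. (lambda*I - S) v = 0.
for lam, v in zip(eigvals, eigvecs.T):
    assert np.allclose(S @ v, lam * v)

# The sum of the eigenvalues equals the trace of S (total variance).
print(np.isclose(eigvals.sum(), np.trace(S)))  # True
```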
14. We use scipy.linalg, which has an eigh function for finding the top
eigenvalues and eigenvectors; here we find the top 2 eigenvalues and eigenvectors as follows.
Code for finding eigenvalues and eigenvectors:
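The original code screenshot is not reproduced in this transcript; below is a sketch of how scipy.linalg.eigh can select only the top 2 eigenpairs. The subset_by_index keyword assumes SciPy >= 1.5 (older SciPy used an eigvals=(lo, hi) keyword instead), and the data is the same random stand-in:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(3)
X = rng.normal(size=(506, 13))
X = (X - X.mean(axis=0)) / X.std(axis=0)
S = np.cov(X, rowvar=False)

m = S.shape[0]
# eigh returns eigenvalues in ascending order, so indices [m-2, m-1]
# select the two largest eigenpairs (SciPy >= 1.5).
values, vectors = eigh(S, subset_by_index=[m - 2, m - 1])

print(values.shape, vectors.shape)  # (2,) (13, 2)
```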
15. Forming Principal Components:
Below is the code for forming the principal components, obtained by
multiplying the data with the two principal eigenvectors (matrix multiplication).
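Since the original code image is missing from this transcript, here is a sketch of that multiplication: projecting the standardized data onto the two eigenvectors with the largest eigenvalues gives one 2-D coordinate pair per sample.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(506, 13))
X = (X - X.mean(axis=0)) / X.std(axis=0)
S = np.cov(X, rowvar=False)

eigvals, eigvecs = np.linalg.eigh(S)
top2 = eigvecs[:, -2:][:, ::-1]        # two largest-eigenvalue vectors

# Matrix multiplication projects every sample onto the two principal axes.
new_coordinates = X @ top2
print(new_coordinates.shape)  # (506, 2)
```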
16. Projection into the new feature space:
Creating a DataFrame containing the 1st and 2nd principal components.
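A sketch of this last step with pandas; new_coordinates here stands in for the projected data produced above, and the column names are illustrative:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
new_coordinates = rng.normal(size=(506, 2))  # stand-in for the projected data

# One row per sample, one column per principal component.
df = pd.DataFrame(new_coordinates, columns=["1st_principal", "2nd_principal"])
print(df.shape)  # (506, 2)
```

From here, a scatter plot of the two columns visualizes the 13-feature dataset in two dimensions.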
18. Steps for PCA
Standardize the Data.
Calculate the covariance matrix.
Find the eigenvalues and eigenvectors of the covariance matrix.
Plot the eigenvectors / principal components over the scaled data.
19. 1) [ True or False ] PCA can be used for projecting and visualizing data in lower
dimensions.
A. TRUE
B. FALSE
2) [ True or False ] PCA can be applied to an image dataset.
A. TRUE
B. FALSE
3) PCA is based on variance maximization and distance minimization.
A. TRUE
B. FALSE
Implement PCA for number of components = 3 and then visualize the data; also load
the iris dataset and perform the same task.
Assessment and Evaluation
Ans:1-A,2-A,3-A
20. For full code : https://github.com/Eshan2203/PCA-on-Boston-House-price-Data-
Set/blob/master/PCA_BOston.ipynb