Principal Component Analysis (PCA) applied to images
Outline of the lecture:

- Principal components, informal idea.
- Needed linear algebra.
- Least-squares approximation.
- PCA derivation, PCA for images.
- Drawbacks. Interesting behaviors live in manifolds.
- Subspace methods, LDA, CCA, . . .
2/27
PCA, the instance of the eigen-analysis

PCA seeks to represent observations (or signals, images, and general data) in a form that
enhances the mutual independence of contributory components.

One observation is assumed to be a point in a p-dimensional linear space.

This linear space has some ‘natural’ orthogonal basis vectors. It is advantageous to express an
observation as a linear combination with respect to this ‘natural’ basis (given by the eigen-vectors,
as we will see later).

PCA is mathematically defined as an orthogonal linear transformation that transforms the
data to a new coordinate system such that the greatest variance by some projection of the
data comes to lie on the first coordinate (called the first principal component), the second
greatest variance on the second coordinate, and so on.
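
As a quick illustration (a minimal NumPy sketch, not part of the original slides; the synthetic 2-D data and all variable names are assumptions), rotating the data into the eigen-vector basis of its covariance matrix orders the variance along the new axes and decorrelates the coordinates:

```python
# Minimal sketch: PCA as an orthogonal change of coordinates on synthetic 2-D data.
import numpy as np

rng = np.random.default_rng(0)
# 500 correlated observations, one per row (assumed toy data).
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [1.5, 0.5]])

Xc = X - X.mean(axis=0)                  # center the data
C = Xc.T @ Xc / Xc.shape[0]              # 2 x 2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)     # eigen-values in ascending order
order = np.argsort(eigvals)[::-1]        # largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

Y = Xc @ eigvecs                         # coordinates w.r.t. the principal axes
print(eigvals)                           # variances along axis 1, axis 2
print(np.cov(Y, rowvar=False).round(6))  # ~diagonal: the new coordinates are uncorrelated
```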
3/27
Geometric rationale of PCA
The PCA objective is to rigidly rotate the coordinate
axes of the p-dimensional linear space to new
‘natural’ positions (principal axes) such that:

Coordinate axes are ordered such that
principal axis 1 corresponds to the highest
variance in data, axis 2 has the next highest
variance, . . . , and axis p has the lowest
variance.

The covariance among each pair of principal
axes is zero, i.e. they are uncorrelated.
4/27
Geometric motivation, principal components (1)

Two-dimensional vector space of observations, (x1, x2).

Each observation corresponds to a single point in the
vector space.

The goal:
Find another basis of the vector space that captures
the variation in the data better.

We will see later:
Data points (observations) are represented in a rotated
orthogonal coordinate system. The origin is the mean of the
data points and the axes are provided by the eigenvectors.
5/27
Geometric motivation, principal components (2)

Assume a single straight line that best approximates the
observations in the (total) least-squares sense, i.e. by
minimizing the sum of perpendicular distances between
the data points and the line.

The first principal direction (component) is the direction of
this line. Let it be a new basis vector z1.

The second principal direction (component, basis vector)
z2 is the direction perpendicular to z1 that minimizes the
perpendicular distances of the data points to the corresponding straight line.

For higher dimensional observation spaces, this
construction is repeated.
6/27
Eigen-values, eigen-vectors of matrices

Assume a finite-dimensional vector space and a square n × n regular matrix A.

Eigen-vectors are solutions of the eigen-equation A v = λ v, where v is a (column) eigen-vector
of the matrix A and λ is the corresponding eigen-value (which may be complex). The
matrix A has n eigen-values λi and n eigen-vectors vi, i = 1, . . . , n.

Let us derive: A v = λ v ⇒ A v − λ v = 0 ⇒ (A − λ I) v = 0. Matrix I is the identity
matrix. The equation (A − λ I) v = 0 has the non-zero solution v if and only if
det(A − λ I) = 0.

The polynomial det(A − λ I) is called the characteristic polynomial of the matrix A. The
fundamental theorem of algebra implies that the characteristic polynomial can be factored,
i.e. det(A − λ I) = (λ1 − λ)(λ2 − λ) · · · (λn − λ); setting it to zero yields the eigen-values.

Eigen-values λi are not necessarily distinct. Multiple eigen-values arise from multiple roots of
the characteristic polynomial.
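
A small numerical check (a NumPy sketch with an arbitrary 2 × 2 example matrix, not taken from the lecture) of the eigen-equation and of the characteristic polynomial vanishing at the eigen-values:

```python
# Minimal sketch: eigen-values/eigen-vectors with NumPy and a check of A v = lambda v.
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
eigvals, eigvecs = np.linalg.eig(A)       # columns of eigvecs are eigen-vectors

for lam, v in zip(eigvals, eigvecs.T):
    assert np.allclose(A @ v, lam * v)    # the eigen-equation holds for each pair

# det(A - lambda I) = 0 exactly at the eigen-values.
for lam in eigvals:
    assert np.isclose(np.linalg.det(A - lam * np.eye(2)), 0.0)
print(eigvals)
```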
7/27
Deterministic view first, statistical view later

We start reviewing eigen-analysis from a deterministic, linear algebra standpoint.

Later, we will develop a statistical view based on covariance matrices and principal component
analysis.
8/27
A system of linear equations, a reminder

A system of linear equations can be expressed in a matrix form as Ax = b, where A is the
matrix of the system.
Example:
x + 3y − 2z = 5
3x + 5y + 6z = 7
2x + 4y + 3z = 8

$\Longrightarrow \quad A = \begin{pmatrix} 1 & 3 & -2 \\ 3 & 5 & 6 \\ 2 & 4 & 3 \end{pmatrix}, \qquad b = \begin{pmatrix} 5 \\ 7 \\ 8 \end{pmatrix}.$

The augmented matrix of the system is created by concatenating the column vector b to the
matrix A, i.e., [A|b].

Example: $[A|b] = \left(\begin{array}{ccc|c} 1 & 3 & -2 & 5 \\ 3 & 5 & 6 & 7 \\ 2 & 4 & 3 & 8 \end{array}\right).$

This system has a solution if and only if the rank of the matrix A equals the rank of the
augmented matrix [A|b]. The solution is unique if the rank of A equals the number of
unknowns, or equivalently if the null space of A is trivial, null(A) = {0}.
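
The example above can be checked numerically; the following NumPy sketch (the variable names are mine) tests solvability via the rank condition and then solves the system:

```python
# Minimal sketch: solvability and uniqueness of the example system A x = b.
import numpy as np

A = np.array([[1.0, 3.0, -2.0],
              [3.0, 5.0,  6.0],
              [2.0, 4.0,  3.0]])
b = np.array([5.0, 7.0, 8.0])

aug = np.column_stack([A, b])                            # augmented matrix [A|b]
solvable = np.linalg.matrix_rank(A) == np.linalg.matrix_rank(aug)
unique = np.linalg.matrix_rank(A) == A.shape[1]          # rank(A) = number of unknowns

if solvable and unique:
    x = np.linalg.solve(A, b)
    print(x, np.allclose(A @ x, b))                      # the unique solution
```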
9/27
Similarity transformations of a matrix

Let A be a regular matrix.

Matrices A and B with real or complex entries are called similar if there exists an invertible
square matrix P such that P^{-1} A P = B.

Matrix P is called the change of basis matrix.

The similarity transformation refers to a matrix transformation that results in similar matrices.

Similar matrices have useful properties: they have the same rank, determinant, trace,
characteristic polynomial, minimal polynomial and eigen-values (but not necessarily the same
eigen-vectors).

Similarity transformations allow us to express regular matrices in several useful forms, e.g.,
Jordan canonical form, Frobenius normal form (called also rational canonical form).
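
A numerical sanity check (NumPy sketch, arbitrary example matrix) that a similarity transformation preserves the spectrum, trace, and determinant:

```python
# Minimal sketch: B = P^{-1} A P has the same eigen-values, trace and determinant as A.
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 0.0, 4.0]])
P = rng.normal(size=(3, 3))                 # a generic P is invertible with probability 1
B = np.linalg.inv(P) @ A @ P

print(np.sort(np.linalg.eigvals(A)))        # [2. 3. 4.]
print(np.sort(np.linalg.eigvals(B).real))   # same spectrum (up to rounding)
print(np.isclose(np.trace(A), np.trace(B)),
      np.isclose(np.linalg.det(A), np.linalg.det(B)))
```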
10/27
Jordan canonical form of a matrix

Any complex square matrix is similar to a matrix in the Jordan canonical form

$\begin{pmatrix} J_1 & & 0 \\ & \ddots & \\ 0 & & J_p \end{pmatrix}, \qquad \text{where the Jordan blocks are} \qquad J_i = \begin{pmatrix} \lambda_i & 1 & & 0 \\ 0 & \lambda_i & \ddots & \\ & & \ddots & 1 \\ 0 & & 0 & \lambda_i \end{pmatrix},$

in which λi are the (possibly multiple) eigen-values.

The multiplicity of the eigen-value gives the size of the Jordan block.

If an eigen-value is simple (not multiple), then its Jordan block degenerates to a 1 × 1 block containing the eigen-value itself.
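
If SymPy is available, the Jordan form can be computed symbolically; this sketch assumes SymPy's Matrix.jordan_form(), which returns (P, J) with A = P J P^{-1} (the example matrix is mine):

```python
# Minimal sketch (assumes SymPy; jordan_form() returns P, J with A == P*J*P**-1).
from sympy import Matrix

A = Matrix([[5, 4],
            [-1, 1]])          # characteristic polynomial (lambda - 3)^2, only one eigen-vector
P, J = A.jordan_form()
print(J)                        # Matrix([[3, 1], [0, 3]]) -- a single 2 x 2 Jordan block
print(A == P * J * P.inv())     # True
```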
11/27
Least-square approximation

Assume that abundant data comes from many observations or measurements. This case is
very common in practice.

We intend to approximate the data by a linear model (a system of linear equations), e.g., by a
straight line in particular.

Strictly speaking, the observations are likely to be mutually contradictory with respect to the
system of linear equations (the system is overdetermined).

In the deterministic world, the conclusion would be that the system of linear equations has no
solution.

There is an interest in finding the solution to the system, which is in some sense ‘closest’ to
the observations, perhaps compensating for noise in observations.

We will usually adopt a statistical approach by minimizing the least square error.
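
A minimal least-squares sketch (NumPy, synthetic data; note that it minimizes vertical distances, i.e. ordinary least squares, unlike the total least squares used for the first principal direction on slide 5):

```python
# Minimal sketch: an over-determined system solved in the least-squares sense (line fit).
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 50)
y = 1.7 * x + 0.5 + rng.normal(scale=0.8, size=x.size)   # noisy observations of a line

A = np.column_stack([x, np.ones_like(x)])   # 50 equations, 2 unknowns (slope, offset)
p, *_ = np.linalg.lstsq(A, y, rcond=None)   # minimizes ||A p - y||^2
print(p)                                    # approximately [1.7, 0.5]
```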
12/27
Principal component analysis, introduction

PCA is a powerful and widely used linear technique in statistics, signal processing, image
processing, and elsewhere.

Several names: the (discrete) Karhunen-Loève transform (KLT, after Kari Karhunen,
1915-1992, and Michel Loève, 1907-1979) or the Hotelling transform (after Harold Hotelling,
1895-1973). Invented by Pearson (1901) and H. Hotelling (1933).

In statistics, PCA is a method for simplifying a multidimensional dataset to lower dimensions
for analysis, visualization or data compression.

PCA represents the data in a new coordinate system in which basis vectors follow modes of
greatest variance in the data.

Thus, new basis vectors are calculated for the particular data set.

The price to be paid for PCA’s flexibility is in higher computational requirements as compared
to, e.g., the fast Fourier transform.
13/27
Derivation, M-dimensional case (1)

Suppose a data set comprising N observations, each of M variables (dimensions). Usually
N ≫ M.

The aim: to reduce the dimensionality of the data so that each observation can be usefully
represented with only L variables, 1 ≤ L ≪ M.

Data are arranged as a set of N column data vectors, each representing a single observation
of M variables: the n-th observation is a column vector xn = (x1, . . . , xM)^⊤,
n = 1, . . . , N.

We thus have an M × N data matrix X. Such matrices are often huge because N may be
very large: this is in fact good, since many observations imply better statistics.
14/27
Data normalization is needed first

This procedure (PCA) is not applied to the raw data R but to normalized data X, obtained as follows.

The raw observed data are arranged in a matrix R and the empirical mean is calculated along
each row of R. The result is stored in a vector u, the elements of which are the scalars

$u(m) = \frac{1}{N}\sum_{n=1}^{N} R(m, n)\,, \qquad m = 1, \ldots, M\,.$

The empirical mean is subtracted from each column of R: if e is a 1 × N row vector of ones,
we write

$X = R - u\,e\,.$
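
A sketch of this normalization step in NumPy (random stand-in data; R, u, e, X follow the notation above):

```python
# Minimal sketch of the normalization: one observation per column of R, per-row mean u.
import numpy as np

rng = np.random.default_rng(3)
M, N = 5, 100
R = rng.normal(loc=2.0, size=(M, N))        # raw data matrix (stand-in values)

u = R.mean(axis=1, keepdims=True)           # empirical mean of each row, shape (M, 1)
e = np.ones((1, N))                         # 1 x N row vector of ones
X = R - u @ e                               # X = R - u e  (broadcasting would do the same)

print(np.allclose(X.mean(axis=1), 0.0))     # every row of X now has zero mean
```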
15/27
Derivation, M-dimensional case (2)
If we approximate the higher-dimensional data X (of dimension M) by a lower-dimensional
representation Y (of dimension L), then the mean square error ε² of this approximation is given by

$\varepsilon^2 = \frac{1}{N}\sum_{n=1}^{N} |x_n|^2 \;-\; \sum_{i=1}^{L} b_i^\top \left( \frac{1}{N}\sum_{n=1}^{N} x_n x_n^\top \right) b_i \,,$

where bi, i = 1, . . . , L, are the basis vectors of the linear space of dimension L.

If ε² is to be minimal, then the following term has to be maximal:

$\sum_{i=1}^{L} b_i^\top \,\mathrm{cov}(x)\, b_i \,, \quad \text{where} \quad \mathrm{cov}(x) = \frac{1}{N}\sum_{n=1}^{N} x_n x_n^\top$

is the covariance matrix.
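
The maximization is solved by the eigen-decomposition of cov(x), as the next slide states; here is a NumPy sketch (synthetic data, my variable names) that estimates cov(x) and keeps the L eigen-vectors with the largest eigen-values as the basis b_1, . . . , b_L:

```python
# Minimal sketch: estimate cov(x) and keep the L leading eigen-vectors as the basis.
import numpy as np

rng = np.random.default_rng(4)
M, N, L = 10, 500, 3
X = rng.normal(size=(M, N)) * np.arange(1, M + 1)[:, None]   # rows with unequal variance
X = X - X.mean(axis=1, keepdims=True)                        # centered data, x_n in columns

cov = X @ X.T / N                       # cov(x) = (1/N) sum_n x_n x_n^T
eigvals, eigvecs = np.linalg.eigh(cov)  # real spectrum (cov is symmetric)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

B = eigvecs[:, :L]                      # basis vectors b_1, ..., b_L as columns
Y = B.T @ X                             # L x N low-dimensional representation
print(eigvals[:L])                      # variance captured by each kept component
```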
16/27
Approximation error

The covariance matrix cov(x) has special properties: it is real, symmetric and positive
semi-definite.

So the covariance matrix can be guaranteed to have real eigen-values.

Matrix theory tells us that these eigen-values may be sorted (largest to smallest) and the
associated eigen-vectors taken as the basis vectors that provide the maximum we seek.

In the data approximation, dimensions corresponding to the smallest eigen-values are omitted.
The mean square error ε² is given by

$\varepsilon^2 = \mathrm{trace}\big(\mathrm{cov}(x)\big) \;-\; \sum_{i=1}^{L} \lambda_i \;=\; \sum_{i=L+1}^{M} \lambda_i \,,$

where trace(A) is the trace (the sum of the diagonal elements) of the matrix A. The trace
equals the sum of all eigen-values.
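
This error formula is easy to verify numerically; the following NumPy sketch (synthetic data) projects onto the L leading eigen-vectors and checks that the empirical mean square error equals the sum of the discarded eigen-values:

```python
# Minimal sketch: eps^2 for keeping L components equals the sum of discarded eigen-values.
import numpy as np

rng = np.random.default_rng(5)
M, N, L = 8, 1000, 3
X = rng.normal(size=(M, N)) * np.linspace(3.0, 0.5, M)[:, None]
X = X - X.mean(axis=1, keepdims=True)

cov = X @ X.T / N
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

B = eigvecs[:, :L]                                  # the L leading eigen-vectors
X_hat = B @ (B.T @ X)                               # projection onto their span
eps2 = np.mean(np.sum((X - X_hat) ** 2, axis=0))    # empirical mean square error

print(np.isclose(eps2, eigvals[L:].sum()))          # True
print(np.isclose(np.trace(cov), eigvals.sum()))     # trace(cov) = sum of all eigen-values
```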
17/27
Can we use PCA for images?

It took a while to realize (Turk, Pentland, 1991), but yes.

Let us consider a 321 × 261 image.

The image is treated as a very long 1D vector obtained by concatenating the image pixels
column by column (or, alternatively, row by row); the vector thus has 321 × 261 = 83781 entries.

The huge number 83781 is the dimensionality of our vector space.

Each pixel is regarded as one variable whose intensity varies from image to image.
18/27
What if we have 32 instances of images?
19/27
Fewer observations than unknowns, and what?

We have only 32 observations and 83781 unknowns in our example!

The induced system of linear equations is not over-constrained but under-constrained.

PCA is still applicable.

The number of principal components is less than or equal to the number of observations
available (32 in our particular case), because the covariance matrix estimated from N
observations has rank at most N; equivalently, the computation can be reduced to an N × N matrix.

The eigen-vectors we derive are called eigen-images, after rearranging back from the 1D
vector to a rectangular image.

Let us perform the dimensionality reduction from 32 to 4 in our example.
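
A sketch of the computation (NumPy, with random stand-in 'images' of the stated size, not the lecture's actual data): instead of the 83781 × 83781 covariance matrix, the small 32 × 32 matrix X⊤X is diagonalized, and its eigen-vectors are mapped back to image space to obtain the eigen-images:

```python
# Minimal sketch: eigen-images via the small N x N matrix when N << number of pixels.
import numpy as np

rng = np.random.default_rng(6)
D, N, L = 321 * 261, 32, 4                 # pixels per image, number of images, kept components
X = rng.normal(size=(D, N))                # stand-in for 32 centered, vectorized images

G = X.T @ X / N                            # 32 x 32 instead of 83781 x 83781
mu, W = np.linalg.eigh(G)                  # if G w = mu w, then (X X^T / N)(X w) = mu (X w)
order = np.argsort(mu)[::-1][:L]
mu, W = mu[order], W[:, order]

B = X @ W                                  # back to image space: columns are eigen-images
B /= np.linalg.norm(B, axis=0)             # normalize each eigen-image
print(B.shape)                             # (83781, 4)
```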
20/27
PCA, graphical illustration
[Figure: the M × N data matrix of the N observed images is approximated as the product of L basis vectors (eigen-images) and an L × N matrix of PCA coefficients; each observed image is thus represented as a linear combination of the basis vectors.]
21/27
Approximation by 4 principal components only

Reconstruction of the image from four basis vectors bi, i = 1, . . . , 4 which can be displayed
as images by rearranging the (long) vector back to the matrix form.

The linear combination was computed as q1b1 + q2b2 + q3b3 + q4b4 = 0.078 b1 +
0.062 b2 − 0.182 b3 + 0.179 b4.

The mean image, which was subtracted when the data were normalized earlier, has to be added
back, cf. slide 14.

[Figure: the reconstructed image displayed as the sum of the mean image and the four weighted eigen-images q1 b1 + q2 b2 + q3 b3 + q4 b4.]
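
A shape-level sketch of this reconstruction (NumPy; u, B, and the image are random stand-ins, since the actual eigen-images are not available here):

```python
# Minimal sketch (shapes only): reconstruct one image from 4 eigen-images plus the mean.
import numpy as np

rng = np.random.default_rng(7)
D, L = 321 * 261, 4
u = rng.normal(size=(D, 1))                    # stand-in for the mean image (long vector)
B = np.linalg.qr(rng.normal(size=(D, L)))[0]   # stand-in orthonormal eigen-images b_1..b_4
x = rng.normal(size=(D, 1))                    # one centered image to approximate

q = B.T @ x                                    # coefficients q_1, ..., q_4
x_rec = u + B @ q                              # mean image + q_1 b_1 + ... + q_4 b_4
image = x_rec.reshape(321, 261, order='F')     # undo the column-by-column vectorization
print(q.ravel(), image.shape)
```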

24/27
PCA drawbacks, the images case

By rearranging the pixels column by column into a 1D vector, relations of a given pixel to the pixels in neighboring rows are not taken into account.

Another disadvantage is the global nature of the representation; a small change or error in the input images influences the whole eigen-representation. However, this property is inherent in all linear integral transforms.

25/27
Data (images) representations

Reconstructive (also generative) representation: enables (partial) reconstruction of the input images (hallucinations); it is general, not tuned for a specific task; it enables closing the feedback loop, i.e. bidirectional processing.

Discriminative representation: does not allow partial reconstruction; less general, specific to a particular task; stores only the information needed for the decision task.

26/27
Dimensionality issues, low-dimensional manifolds

Images, as we saw, lead to enormous dimensionality. The data of interest often live in a much lower-dimensional subspace called the manifold.

Example (courtesy Thomas Brox): a 100 × 100 image of the digit 3 is shifted and rotated, i.e. there are only 3 degrees of variation. All data points live in a 3-dimensional manifold of the 10,000-dimensional observation space.

The difficulty of the task is to find out empirically from the data in which manifold the data vary.

27/27
Subspace methods

Subspace methods exploit the fact that data (images) can be represented in a subspace of the original vector space in which the data live. Examples of different methods and their key properties:

Principal Component Analysis (PCA): reconstructive, unsupervised; optimal reconstruction, minimizes the squared reconstruction error, maximizes the variance of the projected input vectors.
Linear Discriminant Analysis (LDA): discriminative, supervised; optimal separation, maximizes the distance between projection vectors.
Canonical Correlation Analysis (CCA): supervised; optimal correlation, motivated by regression tasks, e.g. robot localization.
Independent Component Analysis (ICA): independent factors.
Non-negative Matrix Factorization (NMF): non-negative factors.
Kernel methods: nonlinear extension, local straightening by kernel functions.