1. Following the study of 2DLDA, which is based on LDA, I wanted to investigate whether a similar algorithm, 2DPCA, exists based on PCA. I therefore found the relevant papers to explore it. In this study, I introduce the 2DPCA algorithm and compare it with PCA as well as 2DLDA.
The dataset used is the ORL Database of Faces. It contains images of 40 individuals, each providing 10 different images of size 112×92. In this project, I use the first six image samples per class for training and the remaining images for testing.
I compared the recognition accuracy of these algorithms with a knn classifier in terms of test error rates.
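The split and evaluation protocol above can be sketched as follows. This is a minimal illustration assuming the images are already loaded into a NumPy array ordered class by class; the function names and the layout are mine, not from the project code:

```python
import numpy as np

def split_orl(images, n_classes=40, per_class=10, n_train=6):
    """Split ORL-style data: first n_train images per class for training.

    `images` is assumed to have shape (n_classes*per_class, h, w),
    ordered class by class (a hypothetical layout; the real loader
    depends on how the ORL files are read from disk).
    """
    labels = np.repeat(np.arange(n_classes), per_class)
    pos = np.arange(n_classes * per_class) % per_class
    train_mask = pos < n_train
    return (images[train_mask], labels[train_mask],
            images[~train_mask], labels[~train_mask])

def knn_error_rate(train_feats, train_labels, test_feats, test_labels, k=1):
    """knn error rate with Euclidean distance on flattened features."""
    Xtr = train_feats.reshape(len(train_feats), -1)
    Xte = test_feats.reshape(len(test_feats), -1)
    errors = 0
    for x, y in zip(Xte, test_labels):
        d = np.linalg.norm(Xtr - x, axis=1)
        neighbours = train_labels[np.argsort(d)[:k]]
        # majority vote; with k=1 this is just the nearest neighbour's label
        if np.bincount(neighbours).argmax() != y:
            errors += 1
    return errors / len(Xte)
```

The same helper works for raw pixels (PCA baseline) or for the 2DPCA feature matrices, since both are flattened before the distance computation.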
About this study
What is 2DPCA?
Figure 1: 2DPCA + knn
Experiment I: 2DPCA + knn
Figure 1: error rate versus dimensions
Experiment III: 2DLDA vs (2D)²PCA
Conclusions
Although the k of knn shown in experiments II, III, and IV is 1, I also tried k from 1 to 10. As a result, k = 1 is the best choice.
Two kinds of 2DPCA (computed along different directions)
In this study, an image is a 112×92 matrix; the difference between the two directional lengths is small, so the two methods give similar results. I conjecture that if the difference between the two directional lengths were large, the two results would differ more.
PCA vs (2D)²PCA
(2D)²PCA clearly performs better than PCA at the same dimension: it is faster and achieves lower error rates.
2DPCA vs (2D)²PCA
Although I never directly compared these two methods, the advantages of (2D)²PCA are clear. Compared with 2DPCA, the only extra cost of (2D)²PCA is computing two sets of projection axes, but the result of this step simplifies the subsequent classification.
2DLDA vs (2D)²PCA
Both methods give good results and consume similar time, but (2D)²PCA is always slightly better than 2DLDA.
References
[1] Yang J, Zhang D, Frangi A F, et al. Two-dimensional PCA: a new approach to appearance-based face representation and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004, 26(1): 131-137.
[2] Zhang D, Zhou Z H. (2D)²PCA: two-directional two-dimensional PCA for efficient face representation and recognition. Neurocomputing, 2005, 69(1-3): 224-231.
[3] ORL Database of Faces: http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html
As opposed to PCA, the original 2DPCA (Two-Dimensional PCA, proposed by Yang J et al. [1]) is based on 2D image matrices rather than 1D vectors, so the image matrix does not need to be transformed into a vector prior to feature extraction. Instead, an image covariance matrix is constructed directly from the original image matrices, and its eigenvectors are derived for image feature extraction.
In the dataset used in this study, each image has 10304 pixels, so the PCA covariance matrix is 10304×10304 and computing its eigenvectors is very time-consuming. 2DPCA instead computes the eigenvectors of the so-called image covariance matrix without any matrix-to-vector conversion. Because the size of the image covariance matrix equals the width of the images, which is quite small compared with the size of a covariance matrix in PCA, 2DPCA evaluates the image covariance matrix more accurately and computes the corresponding eigenvectors more efficiently than PCA.
Similarly, in an alternative 2DPCA (proposed by Zhang D et al. [2]), the size of the image covariance matrix equals the height of the images. The only difference between these two kinds of 2DPCA is the direction used to compute the image covariance matrix.
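The difference between the two directions can be made concrete with a small sketch of the image covariance matrix computation. This is a hypothetical NumPy helper, not the authors' code; names are mine:

```python
import numpy as np

def image_scatter(images, direction="width"):
    """Image covariance (scatter) matrix used by 2DPCA.

    `images`: array of shape (M, m, n). For direction="width" this returns
    the n x n matrix Gt = (1/M) * sum_j (A_j - Abar)^T (A_j - Abar), whose
    eigenvectors are the 2DPCA projection axes. direction="height" works on
    the transposed images and yields the m x m matrix of the alternative
    2DPCA; the formula is the same, only the direction changes.
    """
    A = np.asarray(images, dtype=float)
    if direction == "height":
        A = A.transpose(0, 2, 1)       # swap rows and columns
    centered = A - A.mean(axis=0)      # A_j - Abar
    # sum over samples of (A_j - Abar)^T (A_j - Abar), divided by M
    return np.einsum('jmn,jmk->nk', centered, centered) / len(A)
```

For 112×92 images, the width-direction matrix is 92×92 and the height-direction matrix is 112×112, both far smaller than the 10304×10304 PCA covariance matrix.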
San Jose State University
Math285 Final Project
By Yijun Zhou
Face Discrimination by 2DPCA
Algorithm for 2DPCA
Let X denote an n-dimensional unitary column vector and A an m×n random image matrix. Projecting A onto X gives the m-dimensional feature vector

Y = AX.

The total scatter of the projected samples is measured by the trace of the covariance matrix S_X of Y:

J(X) = tr(S_X).

And,

S_X = E[(Y - EY)(Y - EY)^T] = E[((A - EA)X)((A - EA)X)^T].

So,

tr(S_X) = X^T [ E( (A - EA)^T (A - EA) ) ] X.

Let us define the following matrix:

G_t = E[ (A - EA)^T (A - EA) ].

The matrix G_t is called the image covariance (scatter) matrix. G_t is an n×n nonnegative definite matrix.
Suppose that there are M training image samples in total, the jth training image is denoted by an m×n matrix A_j (j = 1, ..., M), and the average image of all training samples is denoted by Ā. Then,

G_t = (1/M) Σ_{j=1}^{M} (A_j - Ā)^T (A_j - Ā).

Alternatively,

J(X) = X^T G_t X

is called the generalized total scatter criterion, where X is a unitary column vector. The unitary vector X that maximizes the criterion is called the optimal projection axis: the total scatter of the projected samples is maximized after the projection of an image matrix onto X.
To select a set of projection axes X_1, ..., X_d, we require

{X_1, ..., X_d} = arg max J(X),  subject to X_i^T X_j = 0 for i ≠ j, i, j = 1, ..., d.

In fact, the optimal projection axes X_1, ..., X_d are the orthonormal eigenvectors of G_t corresponding to the first d largest eigenvalues.
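The eigenvector procedure above might be sketched as follows; this is only an illustration of the algorithm as described, and the helper names are mine:

```python
import numpy as np

def twodpca_axes(images, d):
    """Optimal 2DPCA projection axes: the orthonormal eigenvectors of the
    image covariance matrix Gt for the d largest eigenvalues."""
    A = np.asarray(images, dtype=float)
    centered = A - A.mean(axis=0)                       # A_j - Abar
    Gt = np.einsum('jmn,jmk->nk', centered, centered) / len(A)
    # Gt is symmetric nonnegative definite, so eigh applies; it returns
    # eigenvalues in ascending order, so reverse and keep the first d.
    _, vecs = np.linalg.eigh(Gt)
    return vecs[:, ::-1][:, :d]                         # n x d matrix [X1..Xd]

def twodpca_features(images, X):
    """Project each m x n image onto the axes: Y_j = A_j X, of size m x d."""
    return np.asarray(images, dtype=float) @ X
```

The resulting m×d feature matrices can then be flattened and fed to the knn classifier.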
Experiment II: PCA vs 2DPCA vs (2D)²PCA
Table 1: comparisons of three methods (+knn, k = 1)
Experiment IV: Comparison of time consumed
Table 2: comparisons of five methods (+knn) on the ORL database
An improved algorithm - (2D)²PCA
(2D)²PCA (Two-Directional Two-Dimensional PCA, proposed by Zhang D et al. [2]) is an improvement of 2DPCA.
For an m×n image matrix A, the aim of 2DPCA is to find an n×d matrix X to project A onto, and the aim of the other directional 2DPCA is to find an m×d2 matrix X2 to project A onto. (2D)²PCA is the combination of these two methods.
In (2D)²PCA, the projected matrix Y is computed as

Y = X2^T A X,

so the size of the matrix Y is d2×d.
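Under the definitions above, a (2D)²PCA sketch might look like this. It is illustrative NumPy code; function and variable names are mine, not from the paper:

```python
import numpy as np

def twod2pca(images, d, d2):
    """(2D)^2 PCA sketch: combine the two directional 2DPCAs.

    X (n x d) holds the eigenvectors of the column-direction scatter,
    Z (m x d2) those of the row-direction scatter; each m x n image A
    is reduced to Y = Z^T A X of size d2 x d.
    """
    A = np.asarray(images, dtype=float)
    C = A - A.mean(axis=0)
    Gt_col = np.einsum('jmn,jmk->nk', C, C) / len(A)   # n x n scatter
    Gt_row = np.einsum('jmn,jkn->mk', C, C) / len(A)   # m x m scatter
    X = np.linalg.eigh(Gt_col)[1][:, ::-1][:, :d]      # top-d axes
    Z = np.linalg.eigh(Gt_row)[1][:, ::-1][:, :d2]     # top-d2 axes
    Y = np.einsum('pi,jpq,qk->jik', Z, A, X)           # Z^T A_j X per image
    return Y, X, Z
```

Note that the two axis sets come from the same centered training images, so the extra cost over plain 2DPCA is just one more (small) eigendecomposition.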
[Plot: error rate versus dimension d (1 to 30), one curve per k = 1, ..., 10]
The d in figure 1 means that the dimensions of the feature vectors are d × 112, and the d in figure 2 means that the dimensions of the feature vectors are d × 92.
The ten curves have the same trend; the values for k = 1 are the smallest and clearly better than the others. I think this is because of the small size of the training set. I therefore choose k = 1 for the other experiments.
[Plot: error rate versus dimension d (1 to 29) for the two kinds of 2DPCA: width direction and height direction]
Figure 2: two kinds of 2DPCA + knn(k=1)
Method             Error rate   Dimensions   Time (s)
PCA                0.0375       47           4.2
2DPCA              0.0375       12×112       2.9
Alternative 2DPCA  0.0375       12×92        2.8
(2D)²PCA           0.0313       12×12        2.7
2DLDA              0.0375       12×11        3.2
Error rate (dimensions in parentheses):
PCA             2DPCA             (2D)²PCA
0.0875 (9)      0.0938 (1×112)    0.0688 (3×3)
0.0688 (16)     0.0563 (2×112)    0.0438 (4×4)
0.0563 (25)     0.0438 (3×112)    0.0375 (5×5)
0.0438 (36)     0.0438 (4×112)    0.0438 (6×6)
0.0500 (49)     0.0438 (5×112)    0.0250 (7×7)
0.0500 (64)     0.0313 (6×112)    0.0313 (8×8)
0.0625 (81)     0.0313 (7×112)    0.0375 (9×9)
0.0688 (100)    0.0313 (8×112)    0.0313 (10×10)
0.0750 (121)    0.0438 (9×112)    0.0375 (11×11)
0.0750 (144)    0.0375 (10×112)   0.0313 (12×12)
[Plot: error rate versus dimensions (9 to 144) for PCA and (2D)²PCA]
[Plot: error rate versus dimensions (4×4 to 12×12) for (2D)²PCA and 2DLDA]
Figure 1: error rate versus dimensions
Examples of wrong discrimination
[Image pairs: original person vs. identified person]
First, I want to show some examples of wrong discrimination. To be honest, it is hard for me to distinguish these three pairs of people by eye without any hints, so I think the mistakes made by 2DPCA are acceptable.
As can be seen from fig. 2, the two kinds of 2DPCA have a similar effect.
Both methods reach the same smallest error rate, 0.025. But it can clearly be seen from the trend of the curves in fig. 1 that (2D)²PCA performs better than 2DLDA.
Because the numbers of dimensions in the three methods are not unified, I display the error rates in table 1 and plot only two curves instead of three in figure 1.
In this experiment, I selected different dimensions for each method so that they reach similar error rates. The time consumed by the five methods is displayed in table 2.
Although these timing values may not be accurate due to my code, they still reflect the differences among the five methods to some degree.
CPU: Intel Core i3, 2.13 GHz; RAM: 2 GB