On The Equivalent of Low-Rank Linear Regressions and Linear Discriminant Analysis Based Regressions
Xiao Cai, Chris Ding, Feiping Nie, Heng Huang CSE Department, The University of Texas at Arlington
xiao.cai@mavs.uta.edu, chqding@uta.edu, feipingnie@gmail.com, heng@uta.edu
Problem
Multivariate linear regression models the relationship between predictors and responses by fitting a linear equation to observed data. Such linear regression models suffer from two deficiencies. On one hand, they usually perform poorly on high-dimensional data: accurate regression or classification on such data requires an enormous number of samples, but because data and labels are difficult to collect, we often cannot obtain enough samples and run into the curse-of-dimensionality problem [1]. On the other hand, linear regression models do not exploit the correlations among different responses; standard least squares regression is equivalent to regressing each response on the predictors separately.
Our Key Contributions
(1) We prove that low-rank linear regression is equivalent to doing linear regression in the LDA subspace.
(2) We derive global and concise algorithms for the low-rank regression models.
(3) We show the connection between low-rank regression and regularized LDA. In both theory and experiments, low-rank ridge regression outperforms the low-rank linear regression used in many existing studies.
(4) To solve the related feature selection problem, we propose a sparse low-rank regression method that exploits both class/task correlations and feature structures.
References
[1] D. Donoho. High-dimensional data analysis: The curses and blessings of dimensionality. AMS Math Challenges Lecture, pages 1-32, 2000.
[2] T. Anderson. Estimating linear restrictions on regression coefficients for multivariate normal distributions. AMS, pages 327-351, 1951.
Linear Low-Rank Regression And LDA + LR
The traditional linear regression model for classification solves the following problem:

min_W ||Y − X^T W||_F^2,   (1)

where X = [x_1, x_2, ..., x_n] ∈ ℜ^{d×n} is the centered training data matrix and Y ∈ ℜ^{n×k} is the normalized class indicator matrix, i.e. Y_{ij} = 1/√n_j if the i-th data point belongs to the j-th class and Y_{ij} = 0 otherwise, where n_j is the sample size of the j-th class.
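As a concrete illustration of these input conventions, the sketch below centers the data and builds the normalized class indicator matrix with NumPy. The function name and the row-per-sample input layout are choices made for this illustration, not part of the poster.

```python
import numpy as np

def build_inputs(samples, labels):
    """Build the poster's inputs: centered data X (d x n) and normalized
    class indicator Y (n x k) with Y[i, j] = 1/sqrt(n_j) for sample i in class j."""
    samples = np.asarray(samples, dtype=float)   # shape (n, d), one row per sample
    labels = np.asarray(labels)                  # integer class labels in {0, ..., k-1}
    n = samples.shape[0]
    k = int(labels.max()) + 1

    # Center the data and store it column-wise: X = [x_1, ..., x_n] in R^{d x n}.
    X = (samples - samples.mean(axis=0)).T

    # Normalized indicator: each sample gets 1/sqrt(size of its class) in its class column.
    counts = np.bincount(labels, minlength=k)
    Y = np.zeros((n, k))
    Y[np.arange(n), labels] = 1.0 / np.sqrt(counts[labels])
    return X, Y
```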
When the class or task number is large, there are often underlying correlation structures between classes or tasks. To incorporate the response correlations into the regression method [2], we propose the following discriminant Low-Rank Linear Regression formulation (LRLR):

min_{A,B} ||Y − X^T AB||_F^2,   (2)

where A ∈ ℜ^{d×s}, B ∈ ℜ^{s×k}, and s < min(n, k). Thus W = AB has low rank s.
Theorem 1 The low-rank linear regression method of Eq. (2) is identical to doing standard linear regression in the LDA subspace.
Proof: Denote J_1(A, B) = ||Y − X^T AB||_F^2 and take its derivative w.r.t. B:

∂J_1(A, B)/∂B = −2A^T XY + 2A^T XX^T AB.   (3)

Setting Eq. (3) to zero, we obtain

B = (A^T XX^T A)^{−1} A^T XY.   (4)

Substituting Eq. (4) back into Eq. (2), we have

min_A ||Y − X^T A(A^T XX^T A)^{−1} A^T XY||_F^2,   (5)

which is equivalent to

max_A Tr((A^T (XX^T) A)^{−1} A^T XY Y^T X^T A).   (6)

Note that

S_t = XX^T,   S_b = XY Y^T X^T,   (7)

where S_t and S_b are the total scatter matrix and the between-class scatter matrix defined in LDA, respectively. Therefore, the solution of Eq. (6) can be written as

A* = arg max_A Tr[(A^T S_t A)^{−1} A^T S_b A],   (8)

which is exactly the LDA problem.
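Theorem 1 suggests a direct solver for LRLR: take A from the LDA problem of Eq. (8), realized as the top generalized eigenvectors of (S_b, S_t), and then obtain B in closed form from Eq. (4). The sketch below follows that recipe; the tiny ridge added to S_t to keep the generalized eigenproblem well posed when d > n is an implementation assumption, not part of the derivation above.

```python
import numpy as np
from scipy.linalg import eigh, pinv

def lrlr(X, Y, s):
    """Low-Rank Linear Regression (Eq. (2)) solved via the LDA subspace (Theorem 1).
    X: d x n centered data; Y: n x k normalized indicator; s: target rank."""
    d = X.shape[0]
    St = X @ X.T            # total scatter matrix, Eq. (7)
    Sb = X @ Y @ Y.T @ X.T  # between-class scatter matrix, Eq. (7)

    # Eq. (8): the columns of A are the leading generalized eigenvectors of (Sb, St).
    # The tiny ridge keeps St positive definite when d > n (implementation assumption).
    ridge = 1e-8 * np.trace(St) / d
    _, vecs = eigh(Sb, St + ridge * np.eye(d))
    A = vecs[:, ::-1][:, :s]   # eigh returns ascending eigenvalues; keep the top s

    # Eq. (4): closed-form B given A.
    B = pinv(A.T @ St @ A) @ (A.T @ X @ Y)
    return A, B
```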
Two Extensions: LRRR, SLRR
Theorem 2 The proposed Low-Rank Ridge Regression (LRRR) method min_{A,B} ||Y − X^T AB||_F^2 + λ||AB||_F^2 is equivalent to doing regularized regression in the regularized LDA subspace.

Theorem 3 The optimal solution of the proposed SLRR method min_{A,B} ||Y − X^T AB||_F^2 + λ||AB||_{2,1} has the same column space as a special regularized LDA.
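Read together with the algorithm box that follows, Theorem 2 suggests that LRRR is solved by the same two steps as LRLR with S_t replaced by S_t + λI and a matching ridge term in the closed-form update for B. A minimal sketch under that reading (the generalized eigendecomposition is again an implementation choice):

```python
import numpy as np
from scipy.linalg import eigh, pinv

def lrrr(X, Y, s, lam):
    """Low-Rank Ridge Regression (Theorem 2): regression in the regularized LDA subspace."""
    d = X.shape[0]
    St = X @ X.T
    Sb = X @ Y @ Y.T @ X.T
    # Regularized LDA step: leading generalized eigenvectors of (Sb, St + lam*I).
    _, vecs = eigh(Sb, St + lam * np.eye(d))
    A = vecs[:, ::-1][:, :s]
    # Closed-form B with the ridge term: B = (A^T (X X^T + lam I) A)^{-1} A^T X Y.
    B = pinv(A.T @ (St + lam * np.eye(d)) @ A) @ (A.T @ X @ Y)
    return A, B
```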
Algorithms
The algorithm for LRLR, LRRR, and SLRR
Input:
1. The centered training data X ∈ ℜ^{d×n}.
2. The normalized training indicator matrix Y ∈ ℜ^{n×k}.
3. The low-rank parameter s.
4. If LRRR or SLRR, the regularization parameter λ.
Output:
1. The matrices A ∈ ℜ^{d×s} and B ∈ ℜ^{s×k}.
Process:
IF LRLR:
  Calculate A by Eq. (8).
  Calculate B by Eq. (4).
ELSE IF LRRR:
  Calculate A by A* = arg max_A Tr((A^T (S_t + λI) A)^{−1} A^T S_b A).
  Calculate B by B = (A^T (XX^T + λI) A)^{−1} A^T XY.
ELSE IF SLRR:
  Initialization:
  1. Set t = 0.
  2. Initialize D^{(0)} = I ∈ ℜ^{d×d}.
  Repeat:
  1. Calculate A^{(t+1)} by A* = arg max_A Tr((A^T (S_t + λD^{(t)}) A)^{−1} A^T S_b A).
  2. Calculate B^{(t+1)} by B = (A^T (XX^T + λD^{(t)}) A)^{−1} A^T XY.
  3. Update the diagonal matrix D^{(t+1)} ∈ ℜ^{d×d}, whose i-th diagonal element is 1 / (2||(A^{(t+1)} B^{(t+1)})_i||_2), where (·)_i denotes the i-th row.
  4. Update t = t + 1.
  Until convergence.
END
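A minimal sketch of the SLRR branch above, assuming λ > 0; the small floor on the row norms (so that D stays finite when a row of AB vanishes) and the objective-change stopping rule are implementation assumptions rather than details given on the poster.

```python
import numpy as np
from scipy.linalg import eigh, pinv

def slrr(X, Y, s, lam, max_iter=50, tol=1e-6):
    """Iterative SLRR updates: alternate the A and B steps with the reweighting
    matrix D until the l2,1-regularized objective stabilizes."""
    d = X.shape[0]
    St = X @ X.T
    Sb = X @ Y @ Y.T @ X.T
    D = np.eye(d)                                   # D^(0) = I
    prev_obj = None
    for _ in range(max_iter):
        # Step 1: A maximizes Tr((A^T (St + lam*D) A)^{-1} A^T Sb A).
        _, vecs = eigh(Sb, St + lam * D)
        A = vecs[:, ::-1][:, :s]
        # Step 2: B = (A^T (X X^T + lam*D) A)^{-1} A^T X Y.
        B = pinv(A.T @ (St + lam * D) @ A) @ (A.T @ X @ Y)
        # Step 3: D_ii = 1 / (2 ||(AB)_i||_2), with a small floor to avoid division by zero.
        W = A @ B
        row_norms = np.maximum(np.linalg.norm(W, axis=1), 1e-12)
        D = np.diag(1.0 / (2.0 * row_norms))
        # Stop when the objective ||Y - X^T W||_F^2 + lam * ||W||_{2,1} stops changing.
        obj = np.linalg.norm(Y - X.T @ W, "fro") ** 2 + lam * row_norms.sum()
        if prev_obj is not None and abs(prev_obj - obj) <= tol * max(1.0, abs(prev_obj)):
            break
        prev_obj = obj
    return A, B
```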
Experiment Data Summary
Dataset    k (classes)   d (dimensions)   n (samples)
UMIST      20            10304            575
BIN36      36            320              1404
BIN26      26            320              1014
VOWEL      11            10               990
MNIST      10            784              150
JAFFE      10            1024             213
Experiment Results
The average classification accuracy vs. the rank s, using 5-fold cross-validation on six datasets; low rank is marked in red and full rank in blue. Left column: linear regression; middle column: ridge regression; right column: sparse regression.
[Figure: 18 panels of classification accuracy vs. the number of rank s, each showing a full-rank (blue) and a low-rank (red) curve — (a)-(c) UMIST linear/ridge/sparse regression; (d)-(f) VOWEL; (g)-(i) MNIST; (j)-(l) JAFFE; (m)-(o) BINALPHA36; (p)-(r) BINALPHA26.]
Demonstration of the low-rank structure and sparse
structure found by our proposed SLRR method.
[Figure: for each dataset, the left panel plots the singular values against the singular-value index (the low-rank structure) and the right panel shows the absolute weight coefficients against the class index (the sparse structure) — (a) UMIST, (b) VOWEL, (c) MNIST, (d) JAFFE, (e) BINALPHA36, (f) BINALPHA26.]