SLICED INVERSE REGRESSION
SEQUENTIAL DIMENSION REDUCTION
ALESSANDRO CHIANCONE, STEPHANE GIRARD, JOCELYN CHANUSSOT
Given a continuous response variable Y ∈ R and explanatory variables X ∈ Rᵖ, we look for a functional relationship between the two:

Y = f(X, ε), ε independent of X.

It is reasonable to think that X contains more information than what we need to predict Y. Moreover, when the dimension p is large, regression is a very hard task when we have no information about f. A possible solution is to make a strong assumption:

Y = f(β₁ᵀX, β₂ᵀX, ..., βₖᵀX, ε)

The space S = Span{β₁, β₂, ..., βₖ} is called the effective dimension reduction
(e.d.r.) space. Only a few linear combinations of the predictors, k ≪ p, are sufficient to regress Y.
Suppose k = 1 for simplicity: Y = f(βᵀX, ε). We want to find the direction β that best explains Y. In other words, if Y is fixed then βᵀX should not vary; consequently our goal is to find a direction which minimizes the variations of βᵀX given Y.

To estimate βᵀX | Y we arrange Y in h slices, each with the same number of samples; the goal is then to find a direction which minimizes the within-slice variance of βᵀX under the constraint var(βᵀX) = 1.

Since Σ̂ = B̂ + Ŵ, where Σ̂, B̂ and Ŵ are respectively the sample covariance matrix, the between-slice covariance matrix and the within-slice covariance matrix, an equivalent approach is to maximize the between-slice variance under the same constraint.
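The decomposition Σ̂ = B̂ + Ŵ can be checked numerically; a minimal NumPy sketch (equal-size slices and a 1/n normalization are assumptions of the sketch, not prescribed by the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, h = 600, 5, 10
X = rng.normal(size=(n, p))
beta = np.ones(p) / np.sqrt(p)
y = np.sinh(X @ beta) + 0.1 * rng.normal(size=n)

# Sort the samples by Y and cut them into h slices of equal size.
order = np.argsort(y)
slices = np.array_split(order, h)

xbar = X.mean(axis=0)
Sigma = (X - xbar).T @ (X - xbar) / n   # sample covariance matrix
B = np.zeros((p, p))                    # between-slice covariance
W = np.zeros((p, p))                    # within-slice covariance
for idx in slices:
    Xs = X[idx]
    ms = Xs.mean(axis=0)
    B += len(idx) / n * np.outer(ms - xbar, ms - xbar)
    W += (Xs - ms).T @ (Xs - ms) / n

# With these conventions the decomposition holds exactly.
assert np.allclose(Sigma, B + W)
```

This is the sample version of the law of total variance, with slices playing the role of the conditioning variable.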
[Figure: Y plotted against βᵀX]
[Figure: Y plotted against βᵀX, partitioned into slices S1, S2, ..., with the within-slice variance highlighted for slice S10]
SIR IN PRACTICE
SIR is quite simple to implement:
• Split Y in h slices
• Estimate Γ̂, the between-slice covariance matrix, using the slices
• Compute the eigendecomposition of the matrix Σ̂⁻¹Γ̂, where Σ̂ is the empirical covariance matrix of X
• Select the eigenvectors corresponding to the highest eigenvalues.
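The recipe above can be sketched in a few lines of NumPy; the slice count h = 10 and the toy cubic link are assumptions of the sketch:

```python
import numpy as np

def sir(X, y, h=10, k=1):
    """Sliced Inverse Regression: estimate k e.d.r. directions by
    slicing Y, forming the between-slice covariance Gamma, and taking
    the top eigenvectors of Sigma^{-1} Gamma."""
    n, p = X.shape
    xbar = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False)
    Gamma = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), h):
        ms = X[idx].mean(axis=0)
        Gamma += len(idx) / n * np.outer(ms - xbar, ms - xbar)
    # Eigendecomposition of Sigma^{-1} Gamma (solve rather than invert).
    vals, vecs = np.linalg.eig(np.linalg.solve(Sigma, Gamma))
    top = np.argsort(vals.real)[::-1][:k]
    return vecs.real[:, top]

# Toy check: recover a single known direction through a cubic link.
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 6))
beta = np.array([1.0, -1.0, 0, 0, 0, 0]) / np.sqrt(2)
y = (X @ beta) ** 3 + 0.1 * rng.normal(size=2000)
b = sir(X, y, h=10, k=1)[:, 0]
print(abs(b @ beta))  # close to 1 when the direction is recovered
```

The direction is only identified up to sign, hence the absolute value of the cosine.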
SIR IN THEORY
Let us go back to the general case Y = f(β₁ᵀX, β₂ᵀX, ..., βₖᵀX, ε). Since f is unknown it is not possible to retrieve β₁, β₂, ..., βₖ directly, but only a basis of the e.d.r. space S. It can be shown that the SIR methodology above solves this problem if the so-called Linearity Design Condition is verified:

(LDC) ∀b ∈ Rᵖ, E(bᵀX | β₁ᵀX, β₂ᵀX, ..., βₖᵀX) = c₀ + c₁β₁ᵀX + ... + cₖβₖᵀX for some constants c₀, ..., cₖ.
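For a Gaussian design the conditional expectation in the LDC is exactly linear, which can be checked by Monte Carlo; the covariance, directions and tolerances below are assumptions of the sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 4, 200_000
A = rng.normal(size=(p, p))
Sigma = A @ A.T                      # a generic covariance matrix
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

beta = rng.normal(size=p); beta /= np.linalg.norm(beta)
b = rng.normal(size=p); b /= np.linalg.norm(b)

z = X @ beta                          # the projection we condition on
t = X @ b
# For centered Gaussian X the conditional mean is exactly linear:
# E(b'X | beta'X = z) = (b' Sigma beta / beta' Sigma beta) * z
c1 = (b @ Sigma @ beta) / (beta @ Sigma @ beta)

# Bin z by quantiles and compare conditional means to the line.
edges = np.quantile(z, np.linspace(0, 1, 21))
devs = []
for lo, hi in zip(edges[:-1], edges[1:]):
    mask = (z >= lo) & (z < hi)
    devs.append(abs(t[mask].mean() - c1 * z[mask].mean()))

# Deviations are pure sampling noise when the LDC holds.
assert max(devs) < 0.3
```

For non-elliptical designs the same check would expose a systematic (non-linear) deviation, which is exactly what motivates the cluster-based approaches below.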
Recently, many papers have pointed out limitations of SIR, since the eigendecomposition can be challenging when the covariance matrix Σ is ill-conditioned. Many solutions have been proposed, from preprocessing the data using PCA [1] to a more comprehensive approach regularizing SIR [2].

The weakest point of SIR is the (LDC), which cannot be verified in practice because it depends on the true directions β₁, ..., βₖ. The condition holds in the case of elliptic symmetry and, more generally, it has been shown [3] that if the dimension p tends to infinity the condition is always approximately verified.

[1] LI, Lexin; LI, Hongzhe. Dimension reduction methods for microarrays with application to censored survival data. Bioinformatics, 2004, 20.18: 3406-3412.
[2] BERNARD-MICHEL, Caroline; GARDES, Laurent; GIRARD, Stéphane. Gaussian regularized sliced inverse regression. Statistics and Computing, 2009, 19.1: 85-98.
[3] HALL, Peter; LI, Ker-Chau. On almost linearity of low dimensional projections from high dimensional data. The Annals of Statistics, 1993, 867-889.
CLUSTER SIR
The LDC is verified when X follows an elliptically symmetric distribution (e.g. multivariate normality of X). When X follows a Gaussian mixture model the condition does not hold globally, but it is verified locally (i.e. in each mixture component).

Kuentz & Saracco [1] clustered X to force the condition to hold locally in each cluster, and then combined the per-cluster results to obtain the final e.d.r. directions.

Our work is based on this intuition. We first cluster the predictor space X; then a greedy merging algorithm is proposed to assign each cluster to its e.d.r. space, taking into account the size of the cluster on which SIR is performed.

[1] KUENTZ, Vanessa; SARACCO, Jérôme. Cluster-based sliced inverse regression. Journal of the Korean Statistical Society, 2010, 39.2: 251-267.
SIR VS. CLUSTER SIR (k = 1)
SIR: Y = f(βᵀX).
Cluster SIR: X = X₁ ∪ X₂ ∪ ... ∪ X_c, where c is the number of clusters, and Yᵢ = f(βᵀXᵢ), i = 1, ..., c. The e.d.r. direction and the link function do not change depending on the cluster.
COLLABORATIVE SIR (k = 1)
X = X₁ ∪ X₂ ∪ ... ∪ X_c, with Yᵢ = fᵢ(βᵢᵀXᵢ), i ∈ {1, ..., D}. The number D (D ≤ c) of e.d.r. directions is unknown; the e.d.r. directions and the link function may change depending on the cluster. A merging algorithm is introduced to infer the number D based on the collinearity of the vectors βᵢ.
The directions β₁, ..., β_c are obtained by applying SIR independently in each cluster. A hierarchical structure is then built to infer the number D of e.d.r. directions using a proximity criterion.

Given the proximity criterion m(a, b) = cos²(a, b) = (aᵀb)², the vector most collinear to a set of vectors A = {β₁, ..., β_c} is the solution of the following problem:

Λ(A) = max_{a ∈ Rᵖ, ‖a‖ = 1} Σ_{t ∈ A} wₜ m(βₜ, a) = largest eigenvalue of Σ_{t ∈ A} wₜ βₜβₜᵀ,

where the wₜ are weights summing to one.
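Since the objective is the quadratic form aᵀ(Σₜ wₜ βₜβₜᵀ)a, the maximizer over unit vectors is the leading eigenvector. A minimal sketch (the test vectors and weights are assumptions of the sketch):

```python
import numpy as np

def collinearity(betas, weights):
    """Lambda(A): largest eigenvalue of sum_t w_t beta_t beta_t',
    together with the maximizing unit vector a."""
    M = sum(w * np.outer(b, b) for w, b in zip(weights, betas))
    vals, vecs = np.linalg.eigh(M)
    return vals[-1], vecs[:, -1]

# Perfectly collinear unit vectors give Lambda = 1 (sign-invariant) ...
b = np.array([1.0, 0.0, 0.0])
lam, a = collinearity([b, -b, b], [0.5, 0.3, 0.2])
assert np.isclose(lam, 1.0)
# ... while mutually orthogonal ones give Lambda = the largest weight.
lam, _ = collinearity([np.eye(3)[i] for i in range(3)], [0.5, 0.3, 0.2])
assert np.isclose(lam, 0.5)
```

So Λ(A) ranges between the largest weight (no shared direction) and 1 (a perfectly common direction).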
[Figure: hierarchy of merges of the directions β₁, ..., β₁₂, from the singletons {β₁}, ..., {β₁₂} up to the full set {β₁, ..., β₁₂}, plotted against the number of merges]
To build the hierarchy we consider the following iterative algorithm, initialized with the set A = {{β₁}, ..., {β_c}}:

while card(A) ≠ 1
    let a, b ∈ A be such that Λ(a ∪ b) ≥ Λ(c ∪ d) ∀c, d ∈ A
    A = (A \ {a, b}) ∪ {a ∪ b}
end

At each step the cardinality of the set A decreases, merging the most collinear sets of directions. It is therefore possible to infer the number D of underlying e.d.r. spaces by analyzing the values of Λ along the hierarchy.
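The loop above can be sketched as follows; uniform weights inside each merged set, the noise level, and the two-group test data are assumptions of the sketch (the slides leave the weights generic, e.g. cluster sizes):

```python
import numpy as np

def lam(vectors, weights):
    # Largest eigenvalue of the weighted sum of outer products.
    M = sum(w * np.outer(v, v) for w, v in zip(weights, vectors))
    return np.linalg.eigvalsh(M)[-1]

def merge_hierarchy(betas):
    """Greedy merging: repeatedly fuse the two most collinear sets.
    Returns the Lambda value recorded at each merge; a sharp drop
    suggests the number D of underlying e.d.r. spaces."""
    A = [[b] for b in betas]            # A = {{beta_1}, ..., {beta_c}}
    history = []
    while len(A) != 1:
        best = None
        for i in range(len(A)):
            for j in range(i + 1, len(A)):
                merged = A[i] + A[j]
                score = lam(merged, [1 / len(merged)] * len(merged))
                if best is None or score > best[0]:
                    best = (score, i, j)
        score, i, j = best
        A = [s for t, s in enumerate(A) if t not in (i, j)] + [A[i] + A[j]]
        history.append(score)
    return history

# Two well-separated e.d.r. directions, three noisy copies of each:
rng = np.random.default_rng(3)
e1, e2 = np.eye(5)[0], np.eye(5)[1]
def noisy(v):
    u = v + 0.05 * rng.normal(size=5)
    return u / np.linalg.norm(u)
betas = [noisy(e1) for _ in range(3)] + [noisy(e2) for _ in range(3)]
h = merge_hierarchy(betas)
# Within-group merges keep Lambda near 1; the final merge, which mixes
# the two groups, drops it toward 1/2.
assert min(h[:-1]) > 0.9 and h[-1] < 0.7
```

Reading the sequence of Λ values from the end, the number of merges performed before the first sharp drop reveals D (here D = 2).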
REGULARIZATION
After merging, each cluster is assigned one of the D e.d.r. directions β̂₁, ..., β̂_D. For each Xᵢ, i = 1, ..., c, we consider the D e.d.r. directions and analyze the two-dimensional distributions

(Yᵢ, β̂ⱼᵀXᵢ)  ∀j = 1, ..., D.

We then select the direction β̂_{j*}, with j* = arg min_{j=1,...,D} λ₂,ⱼ, where λ₂,ⱼ is the second eigenvalue of the covariance matrix Cov(Yᵢ, β̂ⱼᵀXᵢ). This step reconsiders the data to select the best direction from the pool of D estimated directions.
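A small second-eigenvalue sketch of this selection rule (the linear toy link and candidate directions are assumptions of the sketch): the smaller the second eigenvalue of the 2×2 covariance, the more the cloud (Yᵢ, β̂ⱼᵀXᵢ) concentrates on a line.

```python
import numpy as np

def pick_direction(y, X, dirs):
    """Assign a cluster the candidate direction whose 2-D scatter
    (Y, b'X) is most degenerate: smallest second eigenvalue of the
    2x2 covariance matrix of (Y, b'X)."""
    second = []
    for b in dirs:
        C = np.cov(np.vstack([y, X @ b]))          # 2x2 covariance
        second.append(np.linalg.eigvalsh(C)[0])    # smaller eigenvalue
    return int(np.argmin(second))

rng = np.random.default_rng(4)
X = rng.normal(size=(1000, 5))
b1, b2 = np.eye(5)[0], np.eye(5)[1]
y = 2.0 * (X @ b1) + 0.05 * rng.normal(size=1000)  # toy link along b1
assert pick_direction(y, X, [b1, b2]) == 0
```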
EXPERIMENTAL RESULTS
We simulated 100 different datasets following a Gaussian mixture model:
• 10 mixture components
• uniform mixing proportions
• 2500 samples
• dimension p = 240

The response variable Y is generated using the hyperbolic sine link function: Yᵢ = sinh(βᵢᵀXᵢ) + ε, with ε independent of X. We show the case where βᵢ ∈ {β₁, β₂}, i.e. the number of e.d.r. directions is D = 2. The proximity criterion m(β̂, β) = cos²(β̂, β) = (β̂ᵀβ)² is evaluated to assess the quality of the estimation [1].

[1] CHAVENT, Marie, et al. A sliced inverse regression approach for data stream. Computational Statistics, 2013, 1-24.
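A scaled-down sketch of this simulation (p = 10 rather than 240, two clusters rather than ten, known cluster labels, and a minimal one-direction SIR are all assumptions of the sketch) shows per-cluster SIR recovering each direction under the sinh link:

```python
import numpy as np

def sir1(X, y, h=10):
    # One-direction SIR: top eigenvector of Sigma^{-1} Gamma.
    n, p = X.shape
    xbar = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False)
    Gamma = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), h):
        m = X[idx].mean(axis=0)
        Gamma += len(idx) / n * np.outer(m - xbar, m - xbar)
    vals, vecs = np.linalg.eig(np.linalg.solve(Sigma, Gamma))
    b = vecs.real[:, np.argmax(vals.real)]
    return b / np.linalg.norm(b)

rng = np.random.default_rng(5)
p = 10
beta1, beta2 = np.eye(p)[0], np.eye(p)[1]

# Two Gaussian clusters, each with its own direction and a sinh link.
scores = []
for mu, beta in [(np.zeros(p), beta1), (6.0 * np.ones(p), beta2)]:
    X = mu + rng.normal(size=(2000, p))
    y = np.sinh(X @ beta) + 0.1 * rng.normal(size=2000)
    bhat = sir1(X, y)
    scores.append((bhat @ beta) ** 2)   # proximity criterion m(bhat, beta)

assert min(scores) > 0.8
```

In the full experiment the cluster labels are not known and must come from K-means, which is where the merging and regularization steps matter.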
[Figure: projection of the simulated data on the first two principal components]
The proximity criterion is computed using the estimate obtained in each cluster independently (PC), and after collaborative SIR (PCM), for each run of K-means:
• the average PC is 0.4499 with a standard deviation of 0.0690;
• the average PCM is 0.7939 with a standard deviation of 0.0960.

[Figure: histograms of the estimated number of e.d.r. directions D̂ and of the proximity criteria PCM vs. PC]
[Figure: Y plotted against the estimated projection]
16
REAL DATASET: HORSE MUSSELS
We consider the horse mussels data example used in cluster SIR. The
observations correspond to n = 192 horse mussels captured in the Marlborough
Sounds, off the northeast of New Zealand's South Island. The response variable
Y is the muscle mass, the edible portion of the mussel, in grams. The
predictor X is of dimension p = 4 and measures numerical characteristics of
the shell: length, width and height, each in mm, and mass in grams.
We repeated the following algorithm 100 times:
(1) Randomly split the data into 80% training and 20% test.
(2) Apply SIR, cluster SIR and collaborative SIR on the training set.
(3) Regress the functions using the training samples.
(4) Compute the Mean Absolute Relative Error (MARE) on the test set.
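The four steps above can be sketched end to end. The following is a minimal NumPy illustration, not the authors' implementation: it uses a basic one-direction SIR estimate, a cubic polynomial as the regression step, and synthetic data standing in for the mussels set (which is not reproduced here); the link function and noise level are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sir_direction(X, y, n_slices=10):
    """First SIR direction: top eigenvector of Cov(E[Z|Y]) in the whitened scale."""
    n = len(y)
    L = np.linalg.cholesky(np.cov(X, rowvar=False))   # Sigma = L L^T
    Z = (X - X.mean(axis=0)) @ np.linalg.inv(L).T     # whitened predictors
    slices = np.array_split(np.argsort(y), n_slices)  # slice the sample on sorted y
    M = sum(len(s) / n * np.outer(Z[s].mean(axis=0), Z[s].mean(axis=0)) for s in slices)
    v = np.linalg.eigh(M)[1][:, -1]                   # leading eigenvector
    beta = np.linalg.solve(L.T, v)                    # back to the original X scale
    return beta / np.linalg.norm(beta)

def mare(y_true, y_pred):
    """Mean Absolute Relative Error."""
    return np.mean(np.abs(y_true - y_pred) / np.abs(y_true))

# Synthetic stand-in for the mussels data (the real set has n = 192, p = 4).
n, p = 200, 4
X = rng.normal(size=(n, p))
beta = np.array([0.8, 0.6, 0.0, 0.0])                 # assumed true e.d.r. direction
y = 1.0 + np.exp(X @ beta) + 0.1 * rng.normal(size=n)

errors = []
for _ in range(100):
    idx = rng.permutation(n)                          # (1) random 80/20 split
    train, test = idx[:160], idx[160:]
    b = sir_direction(X[train], y[train])             # (2) estimate the direction
    coefs = np.polyfit(X[train] @ b, y[train], 3)     # (3) regress y on the projection
    errors.append(mare(y[test], np.polyval(coefs, X[test] @ b)))  # (4) test MARE
print(round(float(np.mean(errors)), 3))
```

For cluster SIR and collaborative SIR the direction-estimation step is replaced by the corresponding per-cluster procedure; the split/regress/evaluate loop stays the same.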
[Figure: MARE on the test sets]
REAL DATASET: GALAXIES
We consider the galaxies data example. The observations correspond to
n = 292,766 different galaxies. The response variable Y is the stellar formation
rate. The predictor X is of dimension p = 46 and is composed of spectral
characteristics of the galaxies.
We applied collaborative SIR on the whole dataset to investigate the presence
of subgroups and different directions.
After several runs with different numbers k of clusters, we observe that there
are two different subgroups, and hence two directions β1, β2.
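One way to quantify how the two subgroup directions differ is the absolute cosine between them, together with the coordinates (spectral features) where they disagree most. A small sketch with hypothetical directions in place of the estimated β1, β2:

```python
import numpy as np

def direction_overlap(b1, b2):
    """Absolute cosine between two e.d.r. directions (their sign is arbitrary in SIR)."""
    return abs(float(b1 @ b2) / (np.linalg.norm(b1) * np.linalg.norm(b2)))

# Hypothetical directions over the p = 46 spectral features.
rng = np.random.default_rng(1)
beta1, beta2 = rng.normal(size=46), rng.normal(size=46)
print(direction_overlap(beta1, beta2))   # small when the subgroups weight different features

# Per-coordinate differences, after normalizing and aligning signs.
u1, u2 = beta1 / np.linalg.norm(beta1), beta2 / np.linalg.norm(beta2)
if u1 @ u2 < 0:
    u2 = -u2
print(np.argsort(np.abs(u1 - u2))[-5:])  # features where the two directions differ most
```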
[Figure: stellar formation rate versus X^T βi, i = 1, 2]
[Figure: β1, β2, and the differences between β1 and β2]
CONCLUSIONS
Sliced Inverse Regression is a definitely underestimated method; give it a try!
Since data nowadays are complex and rich, our collaborative SIR may catch the
subgroups and allows dimensionality to be reduced in an easy and fast way.
FUTURE WORK
Robust SIR
Publications
Chini, Marco, Alessandro Chiancone, and Salvatore Stramondo. “Scale Object Selection (SOS) through a hierarchical segmentation by a
multi-spectral per-pixel classification.” Pattern Recognition Letters 49 (2014): 214-223.
Papers in preparation
Alessandro Chiancone, Stephane Girard and Jocelyn Chanussot. “Collaborative Sliced Inverse Regression.”
REFERENCES
Bernard-Michel, C., Gardes, L., & Girard, S. (2009). Gaussian regularized sliced inverse regression. Statistics and Computing, 19(1), 85-98.
Chavent, M., et al. (2013). A sliced inverse regression approach for data stream. Computational Statistics, 1-24.
Hall, P., & Li, K.-C. (1993). On almost linearity of low dimensional projections from high dimensional data. The Annals of Statistics, 867-889.
Kuentz, V., & Saracco, J. (2010). Cluster-based sliced inverse regression. Journal of the Korean Statistical Society, 39(2), 251-267.
Li, L., & Li, H. (2004). Dimension reduction methods for microarrays with application to censored survival data. Bioinformatics, 20(18), 3406-3412.
thank you