Combining Lazy Learning, Racing
and Subsampling
for Effective Feature Selection
Gianluca Bontempi, Mauro Birattari, Patrick E. Meyer
{gbonte,mbiro,pmeyer}@ulb.ac.be
ULB, Université Libre de Bruxelles
Boulevard de Triomphe - CP 212
Bruxelles, Belgium
http://www.ulb.ac.be/di/mlg
Outline
• Local vs. global modeling
• Wrapper feature selection and local modeling
• F-Racing and subsampling
• Experimental results
The global modeling approach
[Figure sequence: scatter plot of input x vs. output y, with a query point q on the x-axis.]
1. Input-output regression problem.
2. Training data set.
3. Global model fitting.
4. Prediction by discarding the data and using the fitted global model.
5. Another prediction by using the fitted global model.
The local modeling approach
[Figure sequence: scatter plot of input x vs. output y, with a query point q on the x-axis.]
1. Input-output regression problem.
2. Training data set.
3. Ranking of the data according to a metric, selection of neighbours, local fitting and prediction.
4. Another prediction: again ranking of the data according to a metric, selection of neighbours, local fitting and prediction.
Global models: pros and cons
• Examples of global models are linear regression models and
neural networks.
• PRO: even for huge datasets, a parametric model can be stored in a small amount of memory.
• CON:
• in the nonlinear case learning procedures are typically slow
and analytically intractable.
• validation methods, which address the problem of assessing a
global model on the basis of a finite amount of noisy samples,
are computationally prohibitive.
Local models: pros and cons
• Examples of local models are locally weighted regression and nearest neighbours.
• We will consider here a Lazy Learning algorithm [2, 5, 4] published in previous works.
• PRO: fast and easy local linear learning procedures for parametric identification and validation (sketched below).
• CON:
• the dataset of observed input/output data must always be kept in memory;
• each prediction requires a repetition of the learning procedure.
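A minimal sketch of the lazy, local-linear prediction step is given below. This is not the authors' Lazy Learning toolbox [1]; it is a plain NumPy illustration with a hypothetical fixed neighbourhood size k (the actual algorithm selects the neighbourhood by a leave-one-out criterion):

```python
import numpy as np

def lazy_predict(X, y, q, k=20):
    """Predict the output at query point q by fitting a linear model
    on the k nearest neighbours of q (Euclidean metric)."""
    # 1. Rank the training points by distance to the query.
    dist = np.linalg.norm(X - q, axis=1)
    idx = np.argsort(dist)[:k]
    # 2. Fit a local linear model on the selected neighbourhood.
    Xk = np.column_stack([np.ones(k), X[idx]])      # add intercept
    beta, *_ = np.linalg.lstsq(Xk, y[idx], rcond=None)
    # 3. Answer the query, then discard the local model.
    return np.concatenate(([1.0], q)) @ beta
```

Each call repeats the ranking, selection and fitting steps from scratch, which is why the whole training set must stay in memory.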
Complexity in global and local modeling
• Consider a nonlinear regression problem where we have N
training samples, n given features and Q query points (i.e. Q
predictions to be performed).
• Let us compare the computational cost of a nonlinear global
learner (e.g. a neural network) and a local learner (with k << N
neighbors).
• Suppose that the nonlinear global learning procedure relies on a
nonlinear parametric identification step (e.g. backpropagation to
compute the weights) and a structural identification step (e.g.
K-fold cross-validation to define the number of hidden nodes).
• Suppose that the local learning relies on a local leave-one-out
linear criterion (PRESS statistic).
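As a reminder (a standard identity for linear least squares, not specific to these slides; the hat-matrix notation is ours), the PRESS statistic yields the leave-one-out residuals of a linear fit without any refitting: with hat matrix $H = X(X^\top X)^{-1}X^\top$ and ordinary residuals $e_i = y_i - \hat{y}_i$,

$$
e_i^{\mathrm{loo}} = \frac{e_i}{1 - h_{ii}},
\qquad
\mathrm{PRESS} = \sum_{i=1}^{k} \bigl(e_i^{\mathrm{loo}}\bigr)^2,
$$

so validating a local linear model costs little more than fitting it once.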
Complexity in global and local modeling

                                           GLOBAL            LOCAL
Parametric identification                  C_NLS             O(Nn) + C_LS
Structural identification (K-fold CV)      K C_NLS           small
Cost of Q predictions                      (K + 1) C_NLS     Q (O(Nn) + C_LS)

where C_NLS and C_LS denote the costs of a nonlinear and a linear least-squares fit, respectively.
The global modeling approach is computationally advantageous with respect to the local one when the same model is expected to be used for many predictions. Otherwise, a local approach is to be preferred.
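The break-even point follows directly from the table: the local approach is cheaper whenever

$$
Q\,\bigl(O(Nn) + C_{\mathrm{LS}}\bigr) \;<\; (K+1)\,C_{\mathrm{NLS}}.
$$

For instance (numbers ours, purely illustrative): with K = 10 folds and a nonlinear fit a thousand times more expensive than one local linear fit, the local learner wins for roughly Q < 11000 queries.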
Feature selection
• In recent years many applications of data mining (text mining, bioinformatics, sensor networks) deal with a very large number n of features (e.g. tens or hundreds of thousands of variables) and often comparably few samples.
• In these cases, it is common practice to adopt feature selection algorithms [7] to improve the generalization accuracy.
• Several techniques exist for feature selection: we focus here on wrapper search techniques.
• Wrapper methods assess subsets of variables according to their usefulness to a given learning machine. These methods conduct a search for a good subset using the learning algorithm itself as part of the evaluation function. The problem thus boils down to a stochastic state-space search.
• A well-known example of a greedy wrapper search is forward selection (sketched below).
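A minimal sketch of a greedy forward-selection wrapper follows. The assess callback is a hypothetical stand-in for "train the learner on this feature subset and return its estimated error" (e.g. by cross-validation):

```python
def forward_selection(n_features, assess, max_size=10):
    """Greedy wrapper search: grow the feature subset one variable
    at a time, keeping the extension that minimises the error
    estimated with the learning algorithm itself."""
    selected, remaining = [], set(range(n_features))
    best_err = float("inf")
    while remaining and len(selected) < max_size:
        # Assess every one-feature extension of the current subset.
        errs = {f: assess(selected + [f]) for f in remaining}
        f_best = min(errs, key=errs.get)
        if errs[f_best] >= best_err:        # no improvement: stop
            break
        best_err = errs[f_best]
        selected.append(f_best)
        remaining.remove(f_best)
    return selected, best_err
```

Every call to assess is a full training-and-validation cycle, and it is precisely this cost that the next slides attack.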
Why being local in feature selection?
• Suppose that we have F feature set candidates, N training samples and that the assessment is performed by leave-one-out.
• The conventional approach is to test all the F leave-one-out models on all the N samples and choose the best.
• This requires the training of F × N different models, each one used for a single prediction.
• The use of a global model demands a huge retraining cost.
• Local approaches appear to be an effective alternative.
Racing and subsampling: an analogy
• You are a national football team trainer who has to select the goalkeeper among a set of four candidates for the next World Cup, starting next month.
• You have available only twenty days of training sessions and eight days to let the players play matches.
• Two options:
1. (i) Train all the candidates during the first twenty days, (ii) test all of them with matches during the last eight days, and (iii) make a decision.
2. (i) Alternate each week of training with two matches, (ii) after each week, assess the candidates and, if someone is significantly worse than the others, discard him, and (iii) keep selecting among the others.
• In our analogy the players are the feature subsets, the training days are the training data, and the matches are the test data.
The racing idea
• Suppose that we have F feature set candidates, N training samples and that the assessment is performed by leave-one-out.
• The conventional approach is to test all the F models on all the N samples and eventually choose the best.
• The racing idea [8] is to test each feature set on one point at a time.
• After only a small number of points, by using statistical tests, we can detect that some feature sets are significantly worse than others.
• We can discard them and keep focusing on the others (see the sketch below).
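A minimal sketch of the racing loop, with a paired one-sided Wilcoxon test as a simple stand-in for the elimination test (the Friedman-based test actually proposed is discussed later); loo_error is a hypothetical callback returning the leave-one-out squared error of a candidate on point i:

```python
import numpy as np
from scipy import stats

def race(candidates, loo_error, n_points, alpha=0.01, min_points=5):
    """Evaluate the surviving candidates one leave-one-out point at a
    time; drop a candidate as soon as a paired test declares it
    significantly worse than the current best."""
    alive = list(candidates)
    errors = {c: [] for c in alive}          # per-point squared errors
    for i in range(n_points):
        for c in alive:
            errors[c].append(loo_error(c, i))
        if i + 1 < min_points or len(alive) == 1:
            continue
        best = min(alive, key=lambda c: np.mean(errors[c]))
        for c in [c for c in alive if c != best]:
            # One-sided paired test: are c's errors larger than best's?
            _, p = stats.wilcoxon(errors[c], errors[best],
                                  alternative="greater")
            if p < alpha:
                alive.remove(c)              # out of the race
    return alive, errors
```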
Non-racing approach
Consider this simple example: we have F = 5 feature subsets and N = 10 samples to select the best feature set by leave-one-out cross-validation.

Squared error per leave-one-out point:

        F1     F2     F3     F4     F5
i=1     0.1    0.3    0.2    0.0    0.05
i=2     0.4    0.6    0.5    0.1    0.2
i=3     0.3    1.7    0.4    0.1    0.4
i=4     0.7    2.5    1.2    0.9    0.8
i=5     0.5    2      1      0.4    0.5
i=6     2      3.1    2.7    1.9    2.4
i=7     0.1    4      3.5    0.0    3.0
i=8     4      5.2    5.3    3.5    8.4
i=9     3.2    4      3.9    3.4    4.2
i=10    4      4      4      0.2    3.9

ESTIMATED MSE: F1 = 1.5, F2 = 2.7, F3 = 2.2, F4 = 1.0 (WINNER), F5 = 2.4.

After 50 training and test procedures, we have the best candidate.
Racing approach
Squared error per leave-one-out point (after i = 2):

        F1     F2     F3     F4     F5
i=1     0.1    0.3    0.2    0.0    0.05
i=2     0.4    0.6    0.5    0.1    0.2
               OUT

F2 is already significantly worse and leaves the race.
Racing approach
Squared error per leave-one-out point (after i = 5):

        F1     F2     F3     F4     F5
i=1     0.1    0.3    0.2    0.0    0.05
i=2     0.4    0.6    0.5    0.1    0.2
i=3     0.3           0.4    0.1    0.4
i=4     0.7           1.2    0.9    0.8
i=5     0.5           1      0.4    0.5
               OUT    OUT

F3 is the next candidate to leave the race.
Racing approach
Squared error per leave-one-out point (after i = 6):

        F1     F2     F3     F4     F5
i=1     0.1    0.3    0.2    0.0    0.05
i=2     0.4    0.6    0.5    0.1    0.2
i=3     0.3           0.4    0.1    0.4
i=4     0.7           1.2    0.9    0.8
i=5     0.5           1      0.4    0.5
i=6     2                    1.9    2.4
               OUT    OUT           OUT

F5 leaves the race as well.
Racing approach
Squared error per leave-one-out point (complete race):

        F1     F2     F3     F4     F5
i=1     0.1    0.3    0.2    0.0    0.05
i=2     0.4    0.6    0.5    0.1    0.2
i=3     0.3           0.4    0.1    0.4
i=4     0.7           1.2    0.9    0.8
i=5     0.5           1      0.4    0.5
i=6     2                    1.9    2.4
i=7     0.1                  0.0
i=8     4                    3.5
i=9     3.2                  3.4
i=10    4                    0.2
        OUT    OUT    OUT  WINNER   OUT
                           MSE 1.0

After only 33 training and test procedures (10 + 2 + 5 + 10 + 6 for F1 to F5), we have the best candidate.
F-racing for feature selection
• We propose a nonparametric multiple test, the Friedman test [6], to compare different configurations of input variables and to select the ones to be eliminated from the race.
• The use of the Friedman test for racing was first proposed by one of the authors in the context of a technique for comparing metaheuristics for combinatorial optimization problems [3]. This is the first time that the technique is used in a feature selection setting.
• The main merit of this nonparametric approach is that it does not require formulating hypotheses on the distribution of the observations.
• The idea of F-racing consists in using blocking and a paired multiple test to compare different models in similar conditions and discard the worst ones as soon as possible (see the sketch below).
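A sketch of one F-racing elimination step, assuming the per-point errors of the surviving candidates are collected in a matrix err (rows = leave-one-out points, i.e. blocks; columns = candidates). scipy.stats.friedmanchisquare implements the omnibus Friedman test; the rank-based post-test below is a simplified stand-in for the one used in [3]:

```python
import numpy as np
from scipy import stats

def fracing_step(err, names, alpha=0.01):
    """One elimination step: err is an (n_points x n_candidates)
    array of errors. Returns the candidates still in the race."""
    # Omnibus Friedman test: block by test point and rank the
    # candidates within each block; no distributional assumptions.
    _, p = stats.friedmanchisquare(*err.T)
    if p >= alpha:
        return list(names)            # no significant difference yet
    ranks = stats.rankdata(err, axis=1)      # within-block ranks
    best = int(ranks.mean(axis=0).argmin())
    keep = [names[best]]
    for j in range(len(names)):
        if j == best:
            continue
        # Paired one-sided test on the block ranks vs. the best.
        _, pj = stats.wilcoxon(ranks[:, j], ranks[:, best],
                               alternative="greater")
        if pj >= alpha:
            keep.append(names[j])     # not yet significantly worse
    return keep
```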
Sub-sampling and LL
• The goal of feature selection is to find the best subset in a set of alternatives.
• Given a set of alternative subsets, what we expect is a correct ranking of their generalization accuracy (e.g. F2 > F3 > F5 > F1 > F4).
• By subsampling we mean using a random subset of the training set to perform the assessment of the different feature sets.
• The rationale of subsampling is that by reducing the training set size N we deteriorate the accuracy of each single feature subset without affecting their ranking.
• In LL, reducing the training set size N reduces the computational cost.
• This makes the LL approach more competitive (see the sketch below).
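A minimal sketch of the subsampling step (plain NumPy; the sample size m is the quantity the race will later grow):

```python
import numpy as np

rng = np.random.default_rng()

def subsample(X, y, m):
    """Draw a random training subset of size m on which the competing
    feature sets are assessed; for lazy learning the per-query cost
    O(Nn) shrinks proportionally with the subsample size."""
    idx = rng.choice(len(X), size=m, replace=False)
    return X[idx], y[idx]
```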
RACSAM for feature selection
We propose the RACSAM (RACing + SAMpling) algorithm (a high-level sketch follows the list):
1. Define an initial group of promising feature subsets.
2. Start with small training and test sets.
3. Discard by racing all the feature subsets that appear significantly worse than the others.
4. Increase the training and test size until at most W winning models remain.
5. Update the group with new candidates proposed by the search strategy and go back to step 3.
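A high-level sketch of this loop, under our own naming: race_on would wrap one elimination step such as fracing_step above on a subsample of the given size, and propose_candidates stands for the wrapper search strategy (e.g. forward-selection moves around the current winners):

```python
def racsam(initial, propose_candidates, race_on,
           n0=100, grow=2.0, n_max=10_000, W=5):
    """RACing + SAMpling: race the pool on growing random subsamples
    until at most W winners survive, then let the search strategy
    refill the pool and repeat until it has nothing new to propose."""
    pool, n = list(initial), n0
    while True:
        # Steps 2-4: enlarge the subsample until <= W candidates remain.
        while len(pool) > W and n < n_max:
            pool = race_on(pool, n)   # discard significantly worse subsets
            n = int(n * grow)
        # Step 5: ask the search strategy for new candidates.
        new = propose_candidates(pool)
        if not new:
            return pool
        pool, n = pool + new, n0      # restart with a small subsample
```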
Experimental session
• We compare the prediction accuracy of the LL algorithm enhanced by the RACSAM procedure to the accuracy of two state-of-the-art algorithms, an SVM for regression and a regression tree (RTREE).
• Two versions of the RACSAM algorithm were tested: the first (LL-RAC1) takes as feature set the best one (in terms of estimated Mean Absolute Error (MAE)) among the W winning candidates; the second (LL-RAC2) averages the predictions of the best W LL predictors.
• W = 5, and the p-value is 0.01.
Experimental results
Five-fold cross-validation on six real datasets of high dimensionality:
Ailerons (N = 14308, n = 40), Pole (N = 15000, n = 48),
Elevators (N = 16599, n = 18), Triazines (N = 186, n = 60),
Wisconsin (N = 194, n = 32) and Census (N = 22784, n = 137).

Dataset    AIL      POL    ELE      TRI    WIS     CEN
LL-RAC1    9.7e-5   3.12   1.6e-3   0.21   27.39   0.17
LL-RAC2    9.0e-5   3.13   1.5e-3   0.12   27.41   0.16
SVM        1.3e-4   26.5   1.9e-3   0.11   29.91   0.21
RTREE      1.8e-4   8.80   3.1e-3   0.11   33.02   0.17
Statistical significance
• LL-RAC1 vs. LL-RAC2:
• LL-RAC2 is significantly better than LL-RAC1 3 times out of 6;
• LL-RAC2 is never significantly worse than LL-RAC1.
• LL-RAC2 vs. state-of-the-art techniques:
• LL-RAC2 is never significantly worse than SVM or RTREE;
• LL-RAC2 is significantly better than SVM 5 times out of 6 and significantly better than RTREE 6 times out of 6.
Software
• MATLAB toolbox on Lazy Learning [1].
• R contributed packages:
• lazy package.
• racing package.
• Web page: http://iridia.ulb.ac.be/~lazy.
• About 5000 accesses since October 2002.
Conclusions
• Wrapper strategies ask for a huge number of assessments. It is important to make this process faster and less prone to instability.
• Local strategies reduce the computational cost of training models that have to be used for few predictions.
• Racing speeds up the evaluation by discarding bad candidates as soon as they appear to be statistically significantly worse than the others.
• Sub-sampling combined with local learning can speed up the training phase in the preliminary stages, when it is important to discard the largest number of bad candidates.
ULB Machine Learning Group (MLG)
• 7 researchers (1 professor, 6 PhD students) and 4 graduate students.
• Research topics: local learning, classification, computational statistics, data mining, regression, time series prediction, sensor networks, bioinformatics.
• Computing facilities: cluster of 16 processors, LEGO Robotics Lab.
• Website: www.ulb.ac.be/di/mlg.
• Scientific collaborations in ULB: IRIDIA (Sciences Appliquées), Physiologie Moléculaire de la Cellule (IBMM), Conformation des Macromolécules Biologiques et Bioinformatique (IBMM), CENOLI (Sciences), Microarray Unit (Hôpital Jules Bordet), Service d'Anesthésie (ERASME).
• Scientific collaborations outside ULB: UCL Machine Learning Group (B), Politecnico di Milano (I), Università del Sannio (I), George Mason University (US).
• The MLG is part of the "Groupe de Contact FNRS" on Machine Learning.
ULB-MLG: running projects
1. "Integrating experimental and theoretical approaches to decipher the molecular networks of nitrogen utilisation in yeast": ARC (Action de Recherche Concertée) funded by the Communauté Française de Belgique (2004-2009). Partners: IBMM (Gosselies and La Plaine), CENOLI.
2. "COMP²SYS" (COMPutational intelligence methods for COMPlex SYStems): MARIE CURIE Early Stage Research Training project funded by the European Union (2004-2008). Main contractor: IRIDIA (ULB).
3. "Predictive data mining techniques in anaesthesia": FIRST Europe Objectif 1 project funded by the Région wallonne and the Fonds Social Européen (2004-2009). Partners: Service d'Anesthésie (ERASME).
4. "AIDAR - Adressage et Indexation de Documents Multimédias Assistés par des techniques de Reconnaissance Vocale": funded by the Région Bruxelles-Capitale (2004-2006). Partners: Voice Insight, RTBF, Titan.
References
[1] M. Birattari and G. Bontempi. The Lazy Learning Toolbox, for use with MATLAB. Technical Report TR/IRIDIA/99-7, IRIDIA-ULB, Brussels, Belgium, 1999.
[2] M. Birattari, G. Bontempi, and H. Bersini. Lazy learning meets the recursive least-squares algorithm. In M. S. Kearns, S. A. Solla, and D. A. Cohn, editors, NIPS 11, pages 375–381, Cambridge, 1999. MIT Press.
[3] M. Birattari, T. Stützle, L. Paquete, and K. Varrentrapp. A racing algorithm for configuring metaheuristics. In W. B. Langdon, editor, GECCO 2002, pages 11–18. Morgan Kaufmann, 2002.
[4] G. Bontempi, M. Birattari, and H. Bersini. Lazy learning for modeling and control design. International Journal of Control, 72(7/8):643–658, 1999.
[5] G. Bontempi, M. Birattari, and H. Bersini. A model selection approach for local learning. Artificial Intelligence Communications, 13(1), 2000.
[6] W. J. Conover. Practical Nonparametric Statistics. John Wiley & Sons, New York, NY, USA, third edition, 1999.
[7] I. Guyon and A. Elisseeff. An introduction to variable and feature selection. Journal of Machine Learning Research, 3:1157–1182, 2003.
[8] O. Maron and A. Moore. The racing algorithm: Model selection for lazy learners. Artificial Intelligence Review, 11(1–5):193–225, 1997.
