Cross-validation aggregation for forecasting
www.lancs.ac.uk
Devon K. Barrow
Sven F. Crone
Outline
1. Motivation
2. Cross-validation and model selection
3. Cross-validation aggregation
4. Empirical evaluation
5. Conclusions and future work
Cross validation aggregation for forecasting Motivation 1
Motivation
• Scenario:
– The statistician constructs a model and wishes to estimate the error rate of this model when used to predict future values
Bootstrapping (Efron, 1979) and cross-validation (Stone, 1974) compared:
• Goal
– Bootstrapping: estimating generalisation error
– Cross-validation: estimating generalisation error
• Procedure
– Bootstrapping: random sampling with replacement from a single learning set (bootstrap samples). The validation set is the same as the original learning set.
– Cross-validation: splits the data into mutually exclusive subsets, using part of the data to train each model and the remaining part as a validation sample (Arlot & Celisse, 2010)
• Properties
– Bootstrapping: low variance but downward biased (Efron and Tibshirani, 1997)
– Cross-validation: generalisation error estimate is nearly unbiased but can be highly variable (Efron and Tibshirani, 1997)
• Forecast aggregation
– Bootstrapping: Bagging (Breiman 1996) aggregates the outputs of models trained on bootstrap samples
• 1996: Breiman introduces bootstrapping and aggregation
[Figure: Citation results for publications on bagging for time series: (a) published items in each year, (b) citations in each year]
• Bagging for time series forecasting:
– Forecasting with many predictors (Watson 2005)
– Macro-economic time series, e.g. consumer price inflation (Inoue & Kilian 2008)
– Volatility prediction (Hillebrand & Medeiros 2010)
– Small datasets with few observations (Langella 2010)
– With other approaches, e.g. feature selection with PCA (Lin and Zhu 2007)
• Research gap:
– In contrast to bootstrapping, cross-validation has not been used for forecast aggregation
• Research contribution:
– We propose to combine the benefits of cross-validation and forecast aggregation: Crogging
Motivation: The Bagging algorithm
• Inputs: learning set $S = \{(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_N, y_N)\}$
• Select the number of bootstraps $K$
• For k = 1 to K {
– Generate a bootstrap sample $S^k$ from $S$ (using your favorite bootstrap method)
– Using training set $S^k$, estimate a model $\hat{m}_k(\mathbf{x})$ such that $\hat{m}_k(\mathbf{x}_i) \approx y_i$ }
• Combine the models to obtain: $\hat{M}(\mathbf{x}) = \frac{1}{K} \sum_{k=1}^{K} \hat{m}_k(\mathbf{x})$
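The bagging loop can be sketched in a few lines of Python. This is a minimal illustration, not the setup used in the talk: the base learner `fit_line` (a least-squares line), the toy learning set and the choice K=25 are all assumptions made for the example.

```python
import random
from statistics import mean


def fit_line(sample):
    """Least-squares line y = a + b*x; a stand-in base model (assumption)."""
    xs = [x for x, _ in sample]
    ys = [y for _, y in sample]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:  # degenerate bootstrap sample: fall back to the mean
        return lambda x: y_bar
    b = sum((x - x_bar) * (y - y_bar) for x, y in sample) / sxx
    a = y_bar - b * x_bar
    return lambda x: a + b * x


def bagging(S, K, fit=fit_line, seed=0):
    """Fit K models on bootstrap samples of S and average their outputs."""
    rng = random.Random(seed)
    models = []
    for _ in range(K):
        # Bootstrap sample S^k: N draws from S with replacement.
        S_k = [rng.choice(S) for _ in S]
        models.append(fit(S_k))
    # Combined model M(x) = (1/K) * sum_k m_k(x).
    return lambda x: mean(m(x) for m in models)


# Toy learning set (hypothetical): an exact linear relationship.
S = [(float(x), 2.0 * x + 1.0) for x in range(20)]
M_hat = bagging(S, K=25)
```

Any other base learner with the same `fit(sample) -> predict` shape could be passed in through the `fit` argument.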
2. Cross-validation and model selection
• Cross-validation is a widely used strategy for:
– Estimating the predictive accuracy of a model
– Performing model selection e.g.:
• Choosing among variables in a regression or the degrees of
freedom of a nonparametric model (selection for identification)
• Parameter estimation and tuning (selection for estimation)
Cross-validation: Background
• Main features:
– Main idea: test the model on data not used in estimation
– Split data once or several times
– Part of data is used for training each model (the training
sample), and the remaining part is used for estimating the
prediction error of the model (the validation sample)
Cross-validation: How it works?
• K-fold cross-validation:
[Figure: the data are split into K samples (Sample 1, Sample 2, ..., Sample K-1, Sample K), each of one or more observations; samples are labelled Estimation or Validation, and the split is repeated K times]
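The K-fold scheme can be made concrete with index lists. The sketch below is one possible implementation under assumptions: the function name `k_fold_indices` is hypothetical, and the folds are taken as contiguous blocks, which is only one of several ways to assign observations to samples.

```python
def k_fold_indices(n, k):
    """Yield (estimation, validation) index lists for each of k rounds."""
    # Fold sizes differ by at most one, so folds are approximately equal.
    sizes = [n // k + (1 if j < n % k else 0) for j in range(k)]
    start = 0
    for size in sizes:
        validation = list(range(start, start + size))
        # Everything outside the current fold is used for estimation.
        estimation = list(range(0, start)) + list(range(start + size, n))
        yield estimation, validation
        start += size


# Example: 10 observations split into 3 samples.
splits = list(k_fold_indices(n=10, k=3))
```

Each observation appears in exactly one validation sample and in the estimation data of the other K-1 rounds.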
Cross-validation strategies
• k-fold cross-validation
– Divides the data into k non-overlapping, mutually exclusive sub-samples of approximately equal size
– If k=2, 2-fold cross-validation
– If k=10, 10-fold cross-validation
– If k=N, leave-one-out cross-validation (LOOCV)
• Monte Carlo cross-validation
– Randomly split the data into two sub-samples (training and validation) multiple times, each time randomly drawing without replacement
• Hold-out method
– A single split into two data sub-samples
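The remaining strategies differ only in how the index sets are drawn. The following sketch shows them side by side; the function names, the validation sizes and the choice to hold out the last observations in the hold-out split are assumptions for illustration.

```python
import random


def monte_carlo_splits(n, n_val, repeats, seed=0):
    """Repeated random training/validation splits."""
    rng = random.Random(seed)
    for _ in range(repeats):
        # Validation indices are drawn without replacement.
        validation = sorted(rng.sample(range(n), n_val))
        held_out = set(validation)
        training = [i for i in range(n) if i not in held_out]
        yield training, validation


def hold_out_split(n, n_val):
    """A single split; the last n_val observations form the validation sample."""
    return list(range(n - n_val)), list(range(n - n_val, n))


def leave_one_out_splits(n):
    """k-fold with k = N: each observation is validated exactly once."""
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]


mc = list(monte_carlo_splits(n=12, n_val=3, repeats=5))
ho = hold_out_split(n=12, n_val=3)
loo = list(leave_one_out_splits(4))
```

Unlike k-fold, Monte Carlo validation samples may overlap between repetitions, so an observation can be validated several times or not at all.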
Cross validation: model selection
• Goal: select a model having the smallest generalisation error
• Compute an approximation of the generalisation error, defined as follows:
$E_{gen}(m) = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{m}(\mathbf{x}_i) \right)^2$
• Estimate model m on the training set; the error on the validation set for sample k is:
$E_k(m) = \frac{1}{N/K} \sum_{i=1}^{N/K} \left( y_i^{val} - \hat{m}(\mathbf{x}_i^{val}) \right)^2$
• Estimate the generalisation error after K repetitions as the average error across all repetitions:
$\hat{E}_{gen}(m) = \frac{1}{K} \sum_{k=1}^{K} E_k(m)$
• Select the model with the smallest generalisation error
• What about the K models estimated on the different data sets?
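The selection procedure can be sketched as follows: compute E_k(m) on each validation sample, average over the K repetitions, and keep the candidate with the smallest estimate. The two candidate models (`fit_mean`, `fit_line`), the interleaved fold assignment and the toy data are assumptions chosen to keep the example short.

```python
from statistics import mean


def fit_mean(train):
    """Candidate 1: predict the training mean everywhere."""
    y_bar = mean(y for _, y in train)
    return lambda x: y_bar


def fit_line(train):
    """Candidate 2: least-squares line."""
    xs = [x for x, _ in train]
    x_bar = mean(xs)
    y_bar = mean(y for _, y in train)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in train) / sxx
    return lambda x: y_bar + b * (x - x_bar)


def cv_error(data, fit, K):
    """Estimate of E_gen(m): the average of the K validation errors E_k(m)."""
    fold_errors = []
    for k in range(K):
        val = data[k::K]  # every K-th point forms validation sample k
        train = [p for i, p in enumerate(data) if i % K != k]
        m_hat = fit(train)
        # E_k(m): mean squared error on the validation sample.
        fold_errors.append(mean((y - m_hat(x)) ** 2 for x, y in val))
    return mean(fold_errors)


# Toy data (hypothetical): linear trend plus an alternating disturbance.
data = [(float(x), 3.0 * x + (1.0 if x % 2 else -1.0)) for x in range(20)]
candidates = {"mean": fit_mean, "line": fit_line}
errors = {name: cv_error(data, fit, K=5) for name, fit in candidates.items()}
best = min(errors, key=errors.get)
```

Note that the K fitted `m_hat` models are discarded once their errors are recorded, which is exactly the point the closing question raises.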
3. Cross-validation aggregation
Cross-validation aggregation: Crogging
• In model selection, the model obtained is the one built on all the data (no data reserved for validation)
– However, predictive accuracy is judged on models built on different parts of the data
– These supplementary models are thrown away after they have served their purpose
• The proposed approach:
– We save the predictions made by the K estimated models
– This gives us a prediction for every observation in the training sample derived from a model that was built when that observation was in the validation sample
– We then average across the predictions from the K models to produce a final prediction: $\hat{M}(\mathbf{x}_t) = \frac{1}{K} \sum_{k=1}^{K} \hat{m}_k(\mathbf{x}_t)$
– In the case of neural networks, we also use the validation samples for early stopping
– We average across multiple initialisations together with cross-validation aggregation (to reduce variance)
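A minimal sketch of the aggregation step is given below. It keeps the K models fitted during cross-validation and averages their predictions with equal weights. The linear base model, the interleaved folds and the toy series are assumptions; the talk uses neural networks, and early stopping and multiple initialisations are not shown here.

```python
from statistics import mean


def fit_line(train):
    """Least-squares line; stands in for the base forecasting model (assumption)."""
    xs = [x for x, _ in train]
    x_bar = mean(xs)
    y_bar = mean(y for _, y in train)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in train) / sxx
    return lambda x: y_bar + b * (x - x_bar)


def crogging(data, fit, K):
    """Train K models, one per fold, and return their equally weighted average."""
    models = []
    for k in range(K):
        # Model k is trained with validation sample k held out.
        train = [p for i, p in enumerate(data) if i % K != k]
        models.append(fit(train))
    # M(x_t) = (1/K) * sum_k m_k(x_t)
    return lambda x: mean(m(x) for m in models)


# Toy series (hypothetical): time index t as the only input.
data = [(float(t), 0.5 * t + 2.0) for t in range(24)]
M_hat = crogging(data, fit_line, K=4)
forecast = M_hat(24.0)
```

Compared with bagging, the only change is how the K training sets are formed: by leaving out one validation sample at a time instead of resampling with replacement.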
4. Empirical evaluation
Evaluation: Design and implementation
Summary description of NN3 competition time series dataset
               Complete Dataset
                                   Reduced Dataset
               Short     Long      Normal    Difficult   SUM
Non-Seasonal   25 (NS)   25 (NL)   4 (NN)    3 (ND)      57
Seasonal       25 (SS)   25 (SL)   4 (SN)    -           54
SUM            50        50        8         3           111
• Time series data
– NN3 dataset: 111 time series from the NN3 competition (Crone, Hibon, and Nikolopoulos 2011)
[Figure: Plot of 10 time series from the NN3 dataset; surviving panel titles include NN3_101 to NN3_108]
• The following experimental setup is used:
– Forecast horizon: 12 months
– Holdout period: 18 months
– Error measures: SMAPE and MASE
– Rolling origin evaluation (Tashman, 2000)
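For readers unfamiliar with the error measures, the sketch below uses common textbook definitions of SMAPE and MASE and a simple rolling-origin generator. These are assumptions: the exact variants used in the NN3 competition and in this study may differ (for example in the MASE scaling period `m`), and the function names are hypothetical.

```python
from statistics import mean


def smape(actual, forecast):
    """Symmetric MAPE in percent (undefined if actual and forecast are both 0)."""
    return 100.0 * mean(
        abs(a - f) / ((abs(a) + abs(f)) / 2.0) for a, f in zip(actual, forecast)
    )


def mase(actual, forecast, insample, m=1):
    """MAE scaled by the in-sample MAE of the naive forecast at lag m."""
    scale = mean(abs(insample[t] - insample[t - m]) for t in range(m, len(insample)))
    return mean(abs(a - f) for a, f in zip(actual, forecast)) / scale


def rolling_origins(n, holdout, horizon):
    """Yield (origin, target indices) for every origin in the holdout period."""
    for origin in range(n - holdout, n):
        # Observations 0..origin-1 are available; forecast up to `horizon` steps.
        targets = list(range(origin, min(origin + horizon, n)))
        yield origin, targets


# Setup from the slide: 18-month holdout, 12-month horizon; n=30 is a toy length.
origins = list(rolling_origins(n=30, holdout=18, horizon=12))
smape_value = smape([100.0, 100.0], [110.0, 90.0])
mase_value = mase([5, 6], [4, 4], insample=[1, 2, 3, 4])
```

With a rolling origin, later origins have fewer remaining targets, so long lead times are evaluated on fewer forecasts than short ones.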
• Neural network specification:
– A univariate Multilayer Perceptron (MLP) with lagged inputs Yt up to Yt-13
– Each MLP network contains a single hidden layer, two hidden nodes and a single output node with a linear identity function. The hyperbolic tangent transfer function is used.
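The network shape described above can be sketched as a forward pass. This is an illustration only: it assumes the hyperbolic tangent is applied in the two hidden nodes, uses a hypothetical random initialisation range, and omits training entirely.

```python
import math
import random

N_LAGS = 14  # inputs Yt back to Yt-13


def lagged_inputs(series, n_lags=N_LAGS):
    """Pair each window of n_lags consecutive values with the next value."""
    return [
        (series[t - n_lags:t], series[t])
        for t in range(n_lags, len(series))
    ]


def init_mlp(n_inputs=N_LAGS, n_hidden=2, seed=0):
    """Random starting weights; the range [-0.5, 0.5] is an assumption."""
    rng = random.Random(seed)
    return {
        "W1": [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "W2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(params, x):
    """Single hidden layer with tanh nodes, then one linear (identity) output."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(params["W1"], params["b1"])
    ]
    return sum(w * h for w, h in zip(params["W2"], hidden)) + params["b2"]


# Toy series (hypothetical) just to exercise the shapes.
pairs = lagged_inputs([float(v) for v in range(20)])
params = init_mlp()
x0, y0 = pairs[0]
output = forward(params, x0)
```

In practice the inputs would be scaled before being fed to the tanh nodes; that step is left out here.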
Evaluation: Findings
• Across all time series
– On the validation set, Monte Carlo cross-validation is always best
– All Crogging variants outperform the benchmark Bagging algorithm and hold-out method (NN model averaging)

MASE and SMAPE averaged over all time series on the training, validation and test datasets

MASE
Method      Train   Validation   Test
BESTMLP     1.25    0.96         1.49
HOLDOUT     0.64    0.75         1.20
BAG         0.76    0.70         1.21
MONTECV     0.76    0.41         1.16
10FOLDCV    0.69    0.45         1.07
2FOLDCV     0.73    0.60         1.15

SMAPE
Method      Train   Validation   Test
BESTMLP     12.36   11.10        17.89
HOLDOUT     11.78   12.57        16.08
BAG         12.95   13.17        16.32
MONTECV     13.81   8.29         15.35
10FOLDCV    12.65   8.94         15.52
2FOLDCV     13.68   11.19        15.29

[Figure: Boxplots of the MASE and SMAPE averaged over all time series for the different methods. The line of reference represents the median value of the distributions.]
SMAPE on test set averaged over long time series for short, medium and long forecast horizon
                    Forecast Horizon
Length  Method      1-3     4-12    13-18   1-18
Long    BESTMLP     10.79   16.59   20.02   16.77
        HOLDOUT     9.34    14.96   16.20   14.43
        BAG         9.74    15.46   16.38   14.81
        MONTECV     10.86   15.16   15.43   14.54
        10FOLDCV    10.39   14.04   14.82   13.69
        2FOLDCV     9.03    14.64   15.69   14.06
• Data conditions:
– Long time series: 10-fold cross-validation has the smallest error for medium to long horizons, and over forecast lead times 1-18
SMAPE on test set averaged over short time series for short, medium and long forecast horizon
                    Forecast Horizon
Length  Method      1-3     4-12    13-18   1-18
Short   BESTMLP     16.83   17.03   20.66   18.20
        HOLDOUT     17.59   17.04   20.12   18.16
        BAG         17.20   17.27   20.96   18.49
        MONTECV     15.47   14.71   19.05   16.28
        10FOLDCV    16.00   15.91   20.25   17.37
        2FOLDCV     15.86   14.51   18.95   16.21
• Data conditions:
– Short time series: 2-fold cross-validation and Monte Carlo cross-validation outperform 10-fold cross-validation for all forecast horizons
[Figure: Boxplots of the SMAPE averaged across long (left) and short (right) time series]
• NN3 Competition:
                    Average errors   Ranking all methods   Ranking NN/CI
ID   Method         SMAPE   MASE     SMAPE   MASE          SMAPE   MASE
B09  Wildi          14.84   1.13     1       2             −       −
B07  Theta          14.89   1.13     2       2             −       −
C27  Illies         15.18   1.25     3       9             1       7
**   2FOLDCV        15.29   1.15     4       3             2       2
**   MONTECV        15.35   1.16     5       4             3       3
B03  ForecastPro    15.44   1.17     6       5             −       −
…    …              …       …        …       …             …       …
**   BAG            16.32   1.21     13      8             7       5
…    …              …       …        …       …             …       …
B00  AutomatANN     16.81   1.21     14      8             8       5
**   MLP            17.89   1.50     15      10            9       6
5. Conclusions and future work
Conclusions and future work
Not a forecasting method! A general method for improving the accuracy of a forecast model
• Conclusion
– Cross-validation aggregation outperforms model selection, Bagging and the current approaches to model averaging, which use a single hold-out (validation) sample
– It is especially effective when the amount of data available for training the model is limited, as shown for short time series
– Improvements in forecast accuracy increase with the forecast horizon
– It offers promising results on the NN3 competition
• Future work
– Perform bias-variance decomposition and analysis
– Consider base model types other than neural networks
– Evaluate forecast accuracy for a larger set of time series: M3 Competition data (3003 time series, established benchmark)
Devon K. Barrow
Lancaster University Management School
Centre for Forecasting
Lancaster, LA1 4YX, UK
Tel.: +44 (0) 7960271368
Email: d.barrow@lancaster.ac.uk

Buy Epson EcoTank L3210 Colour Printer Online.pptxEasyPrinterHelp
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIES VE
 
Syngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon
 
Agentic RAG What it is its types applications and implementation.pdf
Agentic RAG What it is its types applications and implementation.pdfAgentic RAG What it is its types applications and implementation.pdf
Agentic RAG What it is its types applications and implementation.pdfChristopherTHyatt
 

Recently uploaded (20)

Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for Success
 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John Staveley
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
 
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCustom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
 
AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101
 
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfLinux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
 
Buy Epson EcoTank L3210 Colour Printer Online.pdf
Buy Epson EcoTank L3210 Colour Printer Online.pdfBuy Epson EcoTank L3210 Colour Printer Online.pdf
Buy Epson EcoTank L3210 Colour Printer Online.pdf
 
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
 
Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024
 
PLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. StartupsPLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. Startups
 
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
 
Intro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераIntro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджера
 
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfWhere to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
 
The Metaverse: Are We There Yet?
The  Metaverse:    Are   We  There  Yet?The  Metaverse:    Are   We  There  Yet?
The Metaverse: Are We There Yet?
 
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
 
Buy Epson EcoTank L3210 Colour Printer Online.pptx
Buy Epson EcoTank L3210 Colour Printer Online.pptxBuy Epson EcoTank L3210 Colour Printer Online.pptx
Buy Epson EcoTank L3210 Colour Printer Online.pptx
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and Planning
 
Syngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdf
 
Agentic RAG What it is its types applications and implementation.pdf
Agentic RAG What it is its types applications and implementation.pdfAgentic RAG What it is its types applications and implementation.pdf
Agentic RAG What it is its types applications and implementation.pdf
 

Euro 2013 barrow crone - slideshare

  • 2. Outline (Cross validation aggregation for forecasting)
      1. Motivation
      2. Cross-validation and model selection
      3. Cross-validation aggregation
      4. Empirical evaluation
      5. Conclusions and future work
  • 3. Motivation
      • Scenario:
          – The statistician constructs a model and wishes to estimate the error rate of this model when it is used to predict future values
  • 6. Motivation: Bootstrapping (Efron, 1979) versus cross validation (Stone, 1974)
      • Goal
          – Bootstrapping: estimating generalisation error
          – Cross validation: estimating generalisation error
      • Procedure
          – Bootstrapping: random sampling with replacement from a single learning set (bootstrap samples). The validation set is the same as the original learning set.
          – Cross validation: splits the data into mutually exclusive subsets, using one subset as a set to train each model, and the remaining part as a validation sample (Arlot & Celisse, 2010)
      • Properties
          – Bootstrapping: low variance but downward biased (Efron and Tibshirani, 1997)
          – Cross validation: the generalisation error estimate is nearly unbiased but can be highly variable (Efron and Tibshirani, 1997)
  • 7. Motivation: 1996, Breiman introduces bootstrap aggregation (bagging)
  • 8. Motivation: Forecast aggregation
      • Bagging (Breiman 1996) aggregates the outputs of models trained on bootstrap samples
  • 9. Motivation: Bagging for time series forecasting
      • Forecasting with many predictors (Watson 2005)
      • Macro-economic time series, e.g. consumer price inflation (Inoue & Kilian 2008)
      • Volatility prediction (Hillebrand & M. C. Medeiros 2010)
      • Small datasets with few observations (Langella 2010)
      • With other approaches, e.g. feature selection with PCA (Lin and Zhu 2007)
      [Figure: Citation results for publications on bagging for time series. (a) Published items in each year; (b) Citations in each year.]
  • 10. Motivation: Research gap
      • In contrast to bootstrapping, cross-validation has not been used for forecast aggregation
  • 11. Motivation: Research contribution
      • We propose to combine the benefits of cross-validation and forecast aggregation: Crogging
  • 17. Motivation: The Bagging algorithm
      • Inputs: learning set $S = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$
      • Select the number of bootstraps, $K$
      • For k = 1 to K {
          – Generate a bootstrap sample $S^k$ from $S$ (using your favorite bootstrap method)
          – Using training set $S^k$, estimate a model $\hat{m}_k(x)$ such that $y_i \approx \hat{m}_k(x_i)$
        }
      • Combine the models to obtain: $\hat{M}(x) = \frac{1}{K} \sum_{k=1}^{K} \hat{m}_k(x)$
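
The bagging steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the base model is a least-squares straight line (the slides use neural networks), and the names `fit_line`, `bagging` and the toy learning set `S` are hypothetical, not taken from the slides.

```python
import random

def fit_line(points):
    """Least-squares fit of y = a + b*x to a list of (x, y) pairs (assumed base model)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx if sxx > 0 else 0.0  # degenerate sample: fall back to the mean
    return lambda x: mean_y + slope * (x - mean_x)

def bagging(learning_set, n_bootstraps, fit=fit_line, seed=0):
    """Return M(x) = (1/K) * sum_k m_k(x), each m_k fitted on one bootstrap sample."""
    rng = random.Random(seed)
    n = len(learning_set)
    models = []
    for _ in range(n_bootstraps):
        # Bootstrap sample S^k: N draws from S with replacement.
        sample = [learning_set[rng.randrange(n)] for _ in range(n)]
        models.append(fit(sample))
    return lambda x: sum(m(x) for m in models) / len(models)

# Hypothetical toy learning set: y = 2x + 1 plus a small alternating disturbance.
S = [(float(x), 2.0 * x + 1.0 + 0.1 * (-1) ** x) for x in range(20)]
M = bagging(S, n_bootstraps=25)
print(round(M(10.0), 2))
```

The combination step uses equal weights, matching the average in the last bullet; any other bootstrap scheme could be swapped in where the sample is drawn.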
  • 18. Outline: 2. Cross-validation and model selection
  • 19. Cross-validation: Background
      • Cross validation is a widely used strategy for:
          – Estimating the predictive accuracy of a model
          – Performing model selection, e.g.:
              • Choosing among variables in a regression or the degrees of freedom of a nonparametric model (selection for identification)
              • Parameter estimation and tuning (selection for estimation)
  • 20. Cross-validation: Background
      • Main features:
          – Main idea: test the model on data not used in estimation
          – Split the data once or several times
          – Part of the data is used for training each model (the training sample), and the remaining part is used for estimating the prediction error of the model (the validation sample)
  • 27. Cross-validation: How it works?
      • K-fold cross-validation
      [Figure: the data split into K samples (Sample 1, Sample 2, …, Sample K-1, Sample K), each of one or more observations; the samples are labelled Estimation or Validation, and the assignment is repeated K times.]
  • 29. Cross-validation strategies
      • k-fold cross-validation
          – Divides the data into k non-overlapping, mutually exclusive sub-samples of approximately equal size
          – If k=2, 2-fold cross-validation
          – If k=10, 10-fold cross-validation
  • 30. Cross-validation strategies
      • If k=N, leave-one-out cross-validation (LOOCV)
  • 31. Cross-validation strategies
      • Monte Carlo cross-validation
          – Randomly splits the data into two sub-samples (training and validation) multiple times, each time drawing randomly without replacement
  • 32. Cross-validation strategies
      • Hold-out method
          – A single split into two data sub-samples
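
The four splitting strategies can be written as index generators. The sketch below is a minimal illustration, not the authors' implementation; the function names are hypothetical, the folds are taken as contiguous blocks, and the hold-out split uses the last observations as the validation sample, which are assumptions made here for simplicity.

```python
import random

def k_fold_splits(n, k):
    """Yield (train, validation) index lists for k non-overlapping folds."""
    indices = list(range(n))
    base, extra = divmod(n, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)  # fold sizes differ by at most 1
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

def leave_one_out_splits(n):
    """k-fold with k = N: each observation is the validation sample exactly once."""
    return k_fold_splits(n, n)

def monte_carlo_splits(n, n_validation, repetitions, seed=0):
    """Yield repeated random splits; each split draws without replacement."""
    rng = random.Random(seed)
    for _ in range(repetitions):
        validation = sorted(rng.sample(range(n), n_validation))
        held_out = set(validation)
        train = [i for i in range(n) if i not in held_out]
        yield train, validation

def hold_out_split(n, n_validation):
    """A single split; the last n_validation observations are held out (assumption)."""
    return list(range(n - n_validation)), list(range(n - n_validation, n))

for train, validation in k_fold_splits(10, 3):
    print(validation)
```

Unlike k-fold, Monte Carlo validation samples from different repetitions may overlap, because each repetition is drawn independently.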
  • 39. Cross validation: model selection
      • Goal: select a model having the smallest generalisation error
      • Compute an approximation of the generalisation error, defined as follows:
          $E_{gen}(m) = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{m}(x_i) \right)^2$
      • Estimate model m on the training set; the error on the validation set for sample k is:
          $E_k(m) = \frac{1}{N/K} \sum_{i=1}^{N/K} \left( y_i^{val} - \hat{m}(x_i^{val}) \right)^2$
      • Estimate the generalisation error after K repetitions as the average error across all repetitions:
          $\hat{E}_{gen}(m) = \frac{1}{K} \sum_{k=1}^{K} E_k(m)$
      • Select the model with the smallest estimated generalisation error
      • What about the K models estimated on the different data sets?
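
As a rough illustration of the selection rule above, the following sketch computes the per-fold validation error and its average for two candidate models, then picks the one with the smaller estimate. The candidates (`fit_mean`, `fit_line`), the interleaved fold assignment and the toy data are assumptions made for this example only.

```python
def fit_mean(train):
    """Candidate 1: constant model that predicts the training mean."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y

def fit_line(train):
    """Candidate 2: least-squares straight line."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    return lambda x: mean_y + slope * (x - mean_x)

def cv_error(points, fit, k):
    """Average of the K validation mean squared errors E_k(m)."""
    fold_errors = []
    for fold in range(k):
        # Interleaved folds (assumption): observation i belongs to fold i mod k.
        validation = [p for i, p in enumerate(points) if i % k == fold]
        train = [p for i, p in enumerate(points) if i % k != fold]
        model = fit(train)
        mse = sum((y - model(x)) ** 2 for x, y in validation) / len(validation)
        fold_errors.append(mse)
    return sum(fold_errors) / k

# Hypothetical toy data: y = 2x + 1 plus a small alternating disturbance.
data = [(float(x), 2.0 * x + 1.0 + 0.1 * (-1) ** x) for x in range(20)]
candidates = {"mean": fit_mean, "line": fit_line}
errors = {name: cv_error(data, fit, k=5) for name, fit in candidates.items()}
best = min(errors, key=errors.get)
print(best, errors)
```

Note that the K fitted models inside `cv_error` are discarded once their errors are recorded, which is exactly the point the last bullet raises.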
  • 40. Outline: 3. Cross-validation aggregation
  • 41. Cross-validation aggregation: Crogging
      • In model selection, the model obtained is the one built on all the data (no data reserved for validation)
          – However, predictive accuracy is judged on models built on different parts of the data
          – These supplementary models are thrown away after they have served their purpose
  • 47. Cross-validation aggregation: Crogging
      • The proposed approach:
          – We save the predictions made by the K estimated models
          – This gives us a prediction for every observation in the training sample derived from a model that was built when that observation was in the validation sample
          – We then average across the predictions from the K models to produce a final prediction: $\hat{M}(x_t) = \frac{1}{K} \sum_{k=1}^{K} \hat{m}_k(x_t)$
          – In the case of neural networks, we also use the validation samples for early stopping during training
          – We average across multiple initialisations together with cross-validation aggregation (to reduce variance)
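
A minimal sketch of the aggregation step follows, assuming a least-squares line as the base model instead of a neural network and interleaved folds. The names `fit_line`, `crogging` and the toy data are hypothetical; early stopping on the validation sample and averaging over multiple initialisations, which apply to neural networks, are left out.

```python
def fit_line(train):
    """Least-squares straight line (stand-in for the base forecasting model)."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    return lambda x: mean_y + slope * (x - mean_x)

def crogging(points, fit, k):
    """Fit one model per cross-validation fold and return their equal-weight average."""
    models = []
    for fold in range(k):
        # Training part of fold k: everything outside the validation sample.
        train = [p for i, p in enumerate(points) if i % k != fold]
        models.append(fit(train))  # the model is kept rather than thrown away
    return lambda x: sum(m(x) for m in models) / len(models)

# Hypothetical toy data: y = 2x + 1 plus a small alternating disturbance.
data = [(float(x), 2.0 * x + 1.0 + 0.1 * (-1) ** x) for x in range(20)]
M = crogging(data, fit_line, k=10)
print(round(M(25.0), 2))
```

The only difference from ordinary K-fold model selection is that the K fitted models are retained and combined instead of being used solely to score a candidate.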
  • 48. Outline: 4. Empirical evaluation
  • 49. Evaluation: Design and implementation
      • Time series data
      • NN3 dataset: 111 time series from the NN3 competition (Crone, Hibon, and Nikolopoulos 2011)
      Summary description of the NN3 competition time series dataset:
                        Complete Dataset      Reduced Dataset
                        Short      Long       Normal    Difficult   SUM
      Non-Seasonal      25 (NS)    25 (NL)    4 (NN)    3 (ND)      57
      Seasonal          25 (SS)    25 (SL)    4 (SN)    -           54
      SUM               50         50         8         3           111
  • 50. Evaluation: Design and implementation
      [Figure: Plot of 10 time series from the NN3 dataset; recoverable panel titles: NN3_101 to NN3_108.]
  • 51. Evaluation: Design and implementation
      • The following experimental setup is used:
          – Forecast horizon: 12 months
          – Holdout period: 18 months
          – Error measures: SMAPE and MASE
          – Rolling origin evaluation (Tashman, 2000)
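
For readers unfamiliar with the two error measures, here is a small sketch of their commonly used definitions: SMAPE as the mean absolute error relative to the average of the absolute actual and forecast values, and MASE as the forecast MAE scaled by the in-sample MAE of a naive forecast. The slides do not state which exact variants were used, so the formulas, the function names and the example numbers below are assumptions.

```python
def smape(actual, forecast):
    """Symmetric MAPE in percent: mean of |y - f| / ((|y| + |f|) / 2).

    Undefined when an actual value and its forecast are both zero.
    """
    terms = [abs(y - f) / ((abs(y) + abs(f)) / 2.0)
             for y, f in zip(actual, forecast)]
    return 100.0 * sum(terms) / len(terms)

def mase(actual, forecast, in_sample, season=1):
    """Forecast MAE divided by the in-sample MAE of the (seasonal) naive forecast."""
    naive_errors = [abs(in_sample[t] - in_sample[t - season])
                    for t in range(season, len(in_sample))]
    scale = sum(naive_errors) / len(naive_errors)
    mae = sum(abs(y - f) for y, f in zip(actual, forecast)) / len(actual)
    return mae / scale

# Hypothetical numbers, for illustration only.
history = [100.0, 110.0, 120.0, 130.0]
actual = [140.0, 150.0]
forecast = [130.0, 160.0]
print(round(smape(actual, forecast), 4), mase(actual, forecast, history))
```

A MASE below 1 means the forecast beats the in-sample naive benchmark on average, which makes the measure comparable across series of different scale.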
  • 52. Evaluation: Design and implementation
      • Neural network specification:
          – A univariate Multilayer Perceptron (MLP) with lagged inputs $Y_t$ up to $Y_{t-13}$
          – Each MLP network contains a single hidden layer, two hidden nodes, and a single output node with a linear identity function. The hyperbolic tangent transfer function is used.
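
The network specification can be made concrete with a short sketch: building the 14 lagged inputs ($Y_t$ back to $Y_{t-13}$) for a one-step-ahead target, and a forward pass through two tanh hidden nodes and one linear output node. Placing the hyperbolic tangent in the hidden nodes is an assumption, as are the function names and the placeholder weights; training is not shown.

```python
import math

def lagged_patterns(series, n_lags=14):
    """Build (inputs, target) pairs from a univariate series.

    Inputs are the n_lags most recent values, most recent first
    (Y_t, Y_{t-1}, ..., Y_{t-13} for n_lags=14); the target is the next value.
    """
    patterns = []
    for t in range(n_lags, len(series)):
        inputs = series[t - n_lags:t][::-1]
        patterns.append((inputs, series[t]))
    return patterns

def mlp_forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One hidden layer of tanh nodes followed by a single linear (identity) output."""
    hidden = [math.tanh(sum(w * x for w, x in zip(weights, inputs)) + b)
              for weights, b in zip(hidden_weights, hidden_biases)]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias

# Hypothetical series and placeholder (untrained) weights for two hidden nodes.
series = [float(v) for v in range(1, 21)]
patterns = lagged_patterns(series)
hidden_weights = [[0.01] * 14, [-0.01] * 14]
prediction = mlp_forward(patterns[0][0], hidden_weights, [0.0, 0.0], [1.0, 1.0], 0.5)
print(len(patterns), prediction)
```

With 14 inputs, two hidden nodes and one output, such a network has 14·2 + 2 + 2 + 1 = 33 free parameters, counting biases.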
  • 53. Evaluation: Findings
      • Across all time series
          – On the validation set, Monte Carlo cross-validation is always best
          – All Crogging variants outperform the benchmark Bagging algorithm and hold-out method (NN model averaging)
      MASE and SMAPE on the training, validation and test datasets, averaged across all time series:
      MASE
      Method      Train    Validation   Test
      BESTMLP     1.25     0.96         1.49
      HOLDOUT     0.64     0.75         1.20
      BAG         0.76     0.70         1.21
      MONTECV     0.76     0.41         1.16
      10FOLDCV    0.69     0.45         1.07
      2FOLDCV     0.73     0.60         1.15
      SMAPE
      Method      Train    Validation   Test
      BESTMLP     12.36    11.10        17.89
      HOLDOUT     11.78    12.57        16.08
      BAG         12.95    13.17        16.32
      MONTECV     13.81    8.29         15.35
      10FOLDCV    12.65    8.94         15.52
      2FOLDCV     13.68    11.19        15.29
  • 54. Evaluation: Findings
      • Across all time series
      [Figure: Boxplots of the MASE and SMAPE averaged over all time series for the different methods. The line of reference represents the median value of the distributions.]
  • 55. Evaluation: Findings
      • Data conditions:
          – Long time series: 10-fold cross-validation has the smallest error for medium to long horizons, and over forecast lead times 1-18
      SMAPE on the test set averaged over long time series for short, medium and long forecast horizons:
                             Forecast Horizon
      Length   Method        1-3      4-12     13-18    1-18
      Long     BESTMLP       10.79    16.59    20.02    16.77
               HOLDOUT       9.34     14.96    16.20    14.43
               BAG           9.74     15.46    16.38    14.81
               MONTECV       10.86    15.16    15.43    14.54
               10FOLDCV      10.39    14.04    14.82    13.69
               2FOLDCV       9.03     14.64    15.69    14.06
  • 56. Evaluation: Findings
      • Data conditions:
          – Short time series: 2-fold cross-validation and Monte Carlo cross-validation outperform 10-fold cross-validation for all forecast horizons
      SMAPE on the test set averaged over short time series for short, medium and long forecast horizons:
                             Forecast Horizon
      Length   Method        1-3      4-12     13-18    1-18
      Short    BESTMLP       16.83    17.03    20.66    18.20
               HOLDOUT       17.59    17.04    20.12    18.16
               BAG           17.20    17.27    20.96    18.49
               MONTECV       15.47    14.71    19.05    16.28
               10FOLDCV      16.00    15.91    20.25    17.37
               2FOLDCV       15.86    14.51    18.95    16.21
  • 57. Evaluation: Findings
      • Data conditions:
      [Figure: Boxplots of the SMAPE averaged across long (left) and short (right) time series.]
  • 58. Evaluation: Findings
      • NN3 Competition:
                              Average errors     Ranking all methods    Ranking NN/CI
      ID    Method            SMAPE    MASE      SMAPE    MASE          SMAPE    MASE
      B09   Wildi             14.84    1.13      1        2             −        −
      B07   Theta             14.89    1.13      2        2             −        −
      C27   Illies            15.18    1.25      3        9             1        7
      **    2FOLDCV           15.29    1.15      4        3             2        2
      **    MONTECV           15.35    1.16      5        4             3        3
      B03   ForecastPro       15.44    1.17      6        5             −        −
      …     …                 …        …         …        …             …        …
      **    BAG               16.32    1.21      13       8             7        5
      …     …                 …        …         …        …             …        …
      B00   AutomatANN        16.81    1.21      14       8             8        5
      **    MLP               17.89    1.50      15       10            9        6
  • 62. Outline: 5. Conclusions and future work
  • 65. Conclusions and future work: Not a Forecasting Method!
  • 66. Conclusions and future work: A general method for improving the accuracy of a forecast model
  • 67. Conclusions and future work
      • Conclusion
          – Cross-validation aggregation outperforms model selection, Bagging and the current approaches to model averaging, which use a single hold-out (validation) sample
  • 68. Conclusions and future work
      • Conclusion
          – It is especially effective when the amount of data available for training the model is limited, as shown for short time series
  • 69. Conclusions and future work
      • Conclusion
          – Improvements in forecast accuracy increase with the forecast horizon
  • 70. Conclusions and future work
      • Conclusion
          – It offers promising results on the NN3 competition
  • 71. Conclusions and future work
      • Future work
          – Perform bias-variance decomposition and analysis
          – Consider base model types other than neural networks
          – Evaluate forecast accuracy for a larger set of time series: M3 Competition data (3003 time series, established benchmark)
  • 72. Devon K. Barrow Lancaster University Management School Centre for Forecasting Lancaster, LA1 4YX, UK Tel.: +44 (0) 7960271368 Email: d.barrow@lancaster.ac.uk