Cross-validation aggregation for forecasting
Devon K. Barrow
Sven F. Crone
www.lancs.ac.uk
Outline
1. Motivation
2. Cross-validation and model selection
3. Cross-validation aggregation
4. Empirical evaluation
5. Conclusions and future work
Motivation
• Scenario: the statistician constructs a model and wishes to estimate the error rate of this model when used to predict future values.

Bootstrapping (Efron, 1979):
– Goal: estimating generalisation error.
– Procedure: random sampling with replacement from a single learning set (bootstrap samples); the validation set is the same as the original learning set.
– Properties: low variance, but downward biased (Efron and Tibshirani, 1997).

Cross-validation (Stone, 1974):
– Goal: estimating generalisation error.
– Procedure: splits the data into mutually exclusive subsets, using one subset to train each model and the remaining part as a validation sample (Arlot & Celisse, 2010).
– Properties: the generalisation error estimate is nearly unbiased but can be highly variable (Efron and Tibshirani, 1997).

Forecast aggregation: Bagging (Breiman, 1996) aggregates the outputs of models trained on bootstrap samples. Applications of bagging to time series forecasting include:
• Forecasting with many predictors (Watson 2005)
• Macro-economic time series, e.g. consumer price inflation (Inoue & Kilian 2008)
• Volatility prediction (Hillebrand & Medeiros 2010)
• Small datasets with few observations (Langella 2010)
• Combined with other approaches, e.g. feature selection via PCA (Lin and Zhu 2007)

[Figure: (a) published items and (b) citations in each year for publications on bagging for time series, starting from 1996, when Breiman introduced bagging (bootstrap aggregating).]

Research gap: in contrast to bootstrapping, cross-validation has not been used for forecast aggregation.

Research contribution: we propose to combine the benefits of cross-validation and forecast aggregation: Crogging.
Motivation: The Bagging algorithm
• Inputs: learning set $S = \{(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_N, y_N)\}$
• Select the number of bootstraps $K$
• For $k = 1$ to $K$:
– Generate a bootstrap sample $S_k$ from $S$ (using your favourite bootstrap method)
– Using training set $S_k$, estimate a model $m_k$ such that $\hat{y}_i = m_k(\mathbf{x}_i)$
• Combine the $K$ models to obtain:

$$M(\mathbf{x}) = \frac{1}{K} \sum_{k=1}^{K} m_k(\mathbf{x})$$
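To make the procedure above concrete, here is a minimal Python sketch of bagging, assuming scikit-learn-style regressors; the base learner (a decision tree) and the function names are illustrative assumptions, not the methods used in the talk.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bag_fit(X, y, K=10, make_model=DecisionTreeRegressor, seed=0):
    """Train K models on bootstrap samples of the learning set S = (X, y)."""
    rng = np.random.default_rng(seed)
    N = len(y)
    models = []
    for _ in range(K):
        idx = rng.integers(0, N, size=N)   # bootstrap: sample N rows with replacement
        models.append(make_model().fit(X[idx], y[idx]))
    return models

def bag_predict(models, X):
    """Combine the K models: M(x) = (1/K) * sum_k m_k(x)."""
    return np.mean([m.predict(X) for m in models], axis=0)
```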
Outline
1. Motivation
2. Cross-validation and model selection
3. Cross-validation aggregation
4. Empirical evaluation
5. Conclusions and future work
Cross-validation: Background
• Cross-validation is a widely used strategy for:
– Estimating the predictive accuracy of a model
– Performing model selection, e.g.:
• Choosing among variables in a regression or the degrees of freedom of a nonparametric model (selection for identification)
• Parameter estimation and tuning (selection for estimation)
• Main features:
– Main idea: test the model on data not used in estimation
– Split the data once or several times
– Part of the data is used for training each model (the training sample), and the remaining part is used for estimating the prediction error of the model (the validation sample)
Cross-validation: How it works?
• K-fold cross-validation:
[Diagram: the data is divided into K samples, each containing one or more observations; the procedure repeats K times, each time holding out one sample for validation and using the remaining K-1 samples for estimation.]
Cross-validation strategies
• k-fold cross-validation
– Divides the data into k non-overlapping and mutually exclusive sub-samples of approximately equal size
– k = 2: 2-fold cross-validation; k = 10: 10-fold cross-validation
– k = N: leave-one-out cross-validation (LOOCV)
• Monte Carlo cross-validation
– Randomly splits the data into two sub-samples (training and validation) multiple times, each time drawing randomly without replacement
• Hold-out method
– A single split into two data sub-samples
A sketch of these splitting schemes follows below.
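A minimal sketch of the three splitting schemes in Python; the function names and default fractions are illustrative assumptions.

```python
import numpy as np

def kfold_splits(N, k, seed=0):
    """Yield (train_idx, val_idx) for k non-overlapping, mutually exclusive folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(N)
    for fold in np.array_split(idx, k):
        yield np.setdiff1d(idx, fold), fold

def monte_carlo_splits(N, n_splits, val_frac=0.2, seed=0):
    """Yield repeated random train/validation splits, drawn without replacement."""
    rng = np.random.default_rng(seed)
    n_val = int(N * val_frac)
    for _ in range(n_splits):
        idx = rng.permutation(N)
        yield idx[n_val:], idx[:n_val]

def holdout_split(N, val_frac=0.2, seed=0):
    """Hold-out method: a single split into training and validation sub-samples."""
    return next(monte_carlo_splits(N, 1, val_frac, seed))
```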
Cross-validation: model selection
• Goal: select the model having the smallest generalisation error.
• Compute an approximation of the generalisation error, defined as:

$$E_{gen}(m) = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} \big( y_i - m(\mathbf{x}_i) \big)^2$$

• Estimate model $m$ on the training set; the error on the validation set for sample $k$ is:

$$E_k(m) = \frac{K}{N} \sum_{i=1}^{N/K} \big( y_i^{val} - m(\mathbf{x}_i^{val}) \big)^2$$

• Estimate the generalisation error after $K$ repetitions as the average error across all repetitions:

$$\hat{E}_{gen}(m) = \frac{1}{K} \sum_{k=1}^{K} E_k(m)$$

• Select the model with the smallest estimated generalisation error.
• What about the K models estimated on the different data sets?
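A minimal sketch of fold-based model selection, reusing `kfold_splits` from the earlier sketch; the candidate models and the synthetic data are illustrative assumptions, not the talk's setup.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def cv_error(model_factory, X, y, k=10, seed=0):
    """Estimate E_gen(m) as the average squared validation error over k folds."""
    fold_errors = []
    for train_idx, val_idx in kfold_splits(len(y), k, seed):
        m = model_factory().fit(X[train_idx], y[train_idx])
        resid = y[val_idx] - m.predict(X[val_idx])
        fold_errors.append(np.mean(resid ** 2))
    return np.mean(fold_errors)

# Illustrative candidates and toy data for the selection step.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = X @ rng.normal(size=4) + 0.1 * rng.normal(size=200)
candidates = {
    "mlp_2_nodes": lambda: MLPRegressor(hidden_layer_sizes=(2,), max_iter=2000, random_state=0),
    "mlp_4_nodes": lambda: MLPRegressor(hidden_layer_sizes=(4,), max_iter=2000, random_state=0),
}
best = min(candidates, key=lambda name: cv_error(candidates[name], X, y))
print(best)  # the candidate with the smallest estimated generalisation error
```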
Outline
1. Motivation
2. Cross-validation and model selection
3. Cross-validation aggregation
4. Empirical evaluation
5. Conclusions and future work
Cross-validation aggregation: Crogging
• In model selection, the model finally used is the one built on all the data (no data reserved for validation).
– However, predictive accuracy is judged on models built on different parts of the data.
– These supplementary models are thrown away after they have served their purpose.
• The proposed approach:
– We save the predictions made by the K estimated models.
– This gives us a prediction for every observation in the training sample, derived from a model that was built while that observation was in the validation sample.
– We then average across the predictions from the K models to produce the final prediction:

$$\hat{y}_t = M(\mathbf{x}_t) = \frac{1}{K} \sum_{k=1}^{K} m_k(\mathbf{x}_t)$$

– In the case of neural networks, we also use the validation samples for early stopping during training.
– We average across multiple initialisations together with cross-validation aggregation (to reduce variance).
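A minimal sketch of the Crogging idea, reusing `kfold_splits` from the earlier sketch and scikit-learn's `MLPRegressor`. Note an assumption: scikit-learn's built-in early stopping re-splits the training data internally rather than using the CV fold itself, so this approximates, not reproduces, the procedure described; the hyperparameters are illustrative.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def crogging_fit(X, y, k=10, seed=0):
    """Fit one MLP per cross-validation fold and keep all K models."""
    models = []
    for train_idx, val_idx in kfold_splits(len(y), k, seed):
        # In the talk's method, the fold's validation sample (val_idx) also drives
        # early stopping; sklearn's early_stopping approximates this with an
        # internal re-split of the training data.
        m = MLPRegressor(hidden_layer_sizes=(2,), activation="tanh",
                         early_stopping=True, max_iter=2000, random_state=seed)
        models.append(m.fit(X[train_idx], y[train_idx]))
    return models

def crogging_predict(models, X):
    """Final forecast: average the K fold models' predictions."""
    return np.mean([m.predict(X) for m in models], axis=0)
```

Averaging over several random initialisations per fold (e.g. looping over `random_state` values and pooling all models before averaging) would add the variance reduction mentioned in the last bullet.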
Outline
1. Motivation
2. Cross-validation and model selection
3. Cross-validation aggregation
4. Empirical evaluation
5. Conclusions and future work
Evaluation: Design and implementation
• Time series data: the NN3 dataset, 111 time series from the NN3 competition (Crone, Hibon, and Nikolopoulos 2011).

Summary description of the NN3 competition time series dataset:

                Complete Dataset      Reduced Dataset
                Short      Long       Normal    Difficult    SUM
Non-Seasonal    25 (NS)    25 (NL)    4 (NN)    3 (ND)       57
Seasonal        25 (SS)    25 (SL)    4 (SN)    -            54
SUM             50         50         8         3            111
[Figure: plot of 10 time series from the NN3 dataset, including NN3_101 to NN3_108.]
• The following experimental setup is used:
– Forecast horizon: 12 months
– Holdout period: 18 months
– Error measures: SMAPE and MASE (a sketch of both follows below)
– Rolling origin evaluation (Tashman, 2000)
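The two error measures in a minimal Python sketch, using their standard definitions; the exact SMAPE variant used in the NN3 competition may differ slightly, so treat this as an assumption.

```python
import numpy as np

def smape(actual, forecast):
    """Symmetric MAPE (%): mean of |y - f| over the mean of |y| and |f|."""
    actual, forecast = np.asarray(actual), np.asarray(forecast)
    denom = (np.abs(actual) + np.abs(forecast)) / 2.0
    return 100.0 * np.mean(np.abs(actual - forecast) / denom)

def mase(actual, forecast, insample, m=1):
    """MAE scaled by the in-sample MAE of the naive (lag-m) forecast."""
    actual, forecast, insample = map(np.asarray, (actual, forecast, insample))
    scale = np.mean(np.abs(insample[m:] - insample[:-m]))
    return np.mean(np.abs(actual - forecast)) / scale
```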
• Neural network specification (a sketch follows below):
– A univariate Multilayer Perceptron (MLP) with lagged inputs Yt up to Yt-13.
– Each MLP contains a single hidden layer with two hidden nodes and a single output node with a linear identity function; the hyperbolic tangent transfer function is used in the hidden layer.
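A minimal sketch of this specification, assuming scikit-learn's `MLPRegressor` and 13 lagged inputs; how the talk encodes "Yt up to Yt-13" is an assumption here.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def make_lagged(y, n_lags=13):
    """Design matrix of [y_{t-1}, ..., y_{t-n_lags}] with target y_t."""
    y = np.asarray(y, dtype=float)
    X = np.array([y[t - n_lags:t][::-1] for t in range(n_lags, len(y))])
    return X, y[n_lags:]

mlp = MLPRegressor(hidden_layer_sizes=(2,),  # one hidden layer, two nodes
                   activation="tanh",        # hyperbolic tangent transfer function
                   max_iter=2000)
# MLPRegressor's output unit is linear (identity), matching the slide's spec.
```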
Evaluation: Findings
• Across all time series:
– On the validation set, Monte Carlo cross-validation is always best.
– All Crogging variants outperform the benchmark Bagging algorithm and the hold-out method (NN model averaging).

MASE averaged over all time series on the training, validation and test datasets:

Method      Train   Validation   Test
BESTMLP     1.25    0.96         1.49
HOLDOUT     0.64    0.75         1.20
BAG         0.76    0.70         1.21
MONTECV     0.76    0.41         1.16
10FOLDCV    0.69    0.45         1.07
2FOLDCV     0.73    0.60         1.15

SMAPE averaged over all time series on the training, validation and test datasets:

Method      Train   Validation   Test
BESTMLP     12.36   11.10        17.89
HOLDOUT     11.78   12.57        16.08
BAG         12.95   13.17        16.32
MONTECV     13.81   8.29         15.35
10FOLDCV    12.65   8.94         15.52
2FOLDCV     13.68   11.19        15.29

[Figure: boxplots of the MASE and SMAPE averaged over all time series for the different methods; the reference line marks the median of each distribution.]
Evaluation: Findings
• Data conditions:
– Long time series: 10-fold cross-validation has the smallest error for medium to long horizons, and over forecast lead times 1-18.
– Short time series: 2-fold cross-validation and Monte Carlo cross-validation outperform 10-fold cross-validation for all forecast horizons.

SMAPE on the test set averaged over long time series, by forecast horizon:

Length  Method      1-3     4-12    13-18   1-18
Long    BESTMLP     10.79   16.59   20.02   16.77
        HOLDOUT     9.34    14.96   16.20   14.43
        BAG         9.74    15.46   16.38   14.81
        MONTECV     10.86   15.16   15.43   14.54
        10FOLDCV    10.39   14.04   14.82   13.69
        2FOLDCV     9.03    14.64   15.69   14.06

SMAPE on the test set averaged over short time series, by forecast horizon:

Length  Method      1-3     4-12    13-18   1-18
Short   BESTMLP     16.83   17.03   20.66   18.20
        HOLDOUT     17.59   17.04   20.12   18.16
        BAG         17.20   17.27   20.96   18.49
        MONTECV     15.47   14.71   19.05   16.28
        10FOLDCV    16.00   15.91   20.25   17.37
        2FOLDCV     15.86   14.51   18.95   16.21

[Figure: boxplots of the SMAPE averaged across long (left) and short (right) time series.]
Evaluation: Findings
• NN3 Competition (** marks our methods):

                    Average errors    Ranking (all)    Ranking (NN/CI)
Method              SMAPE   MASE      SMAPE   MASE     SMAPE   MASE
B09 Wildi           14.84   1.13      1       2        −       −
B07 Theta           14.89   1.13      2       2        −       −
C27 Illies          15.18   1.25      3       9        1       7
** 2FOLDCV          15.29   1.15      4       3        2       2
** MONTECV          15.35   1.16      5       4        3       3
B03 ForecastPro     15.44   1.17      6       5        −       −
…                   …       …         …       …        …       …
** BAG              16.32   1.21      13      8        7       5
…                   …       …         …       …        …       …
B00 AutomatANN      16.81   1.21      14      8        8       5
** MLP              17.89   1.50      15      10       9       6
Outline
1. Motivation
2. Cross-validation and model selection
3. Cross-validation aggregation
4. Empirical evaluation
5. Conclusions and future work
Conclusions and future work
• Conclusion
– Cross-validation aggregation outperforms model selection, Bagging, and the current approaches to model averaging that use a single hold-out (validation) sample.
– It is especially effective when the amount of data available for training the model is limited, as shown for short time series.
– Improvements in forecast accuracy increase with the forecast horizon.
– It offers promising results on the NN3 competition.
• Future work
– Perform bias-variance decomposition and analysis.
– Consider base model types other than neural networks.
– Evaluate forecast accuracy on a larger set of time series: the M3 Competition data (3003 time series, an established benchmark).
Not a Forecasting Method!
A general method for improving the accuracy of a forecast model.
Devon K. Barrow
Lancaster University Management School
Centre for Forecasting
Lancaster, LA1 4YX, UK
Tel.: +44 (0) 7960271368
Email: d.barrow@lancaster.ac.uk