This session was recorded in NYC on October 22nd, 2019 and can be viewed here: https://www.youtube.com/watch?v=eF4Oa0ZzXdQ&list=PLNtMya54qvOE3AvWRCNF2tybxNobUbAYp&index=6&t=3s
Time Series in H2O Driverless AI
Time series is a unique field in predictive modelling where standard feature engineering techniques and models are employed to get the most accurate results. In this session we will examine some of the most important features of Driverless AI’s newest recipe regarding Time Series. It will cover validation strategies, feature engineering, feature selection and modelling. The capabilities will be showcased through several cases.
Bio: Dmitry has more than 10 years of experience in IT. He started in data warehousing and BI and now works in big data and data science. He has extensive experience in predictive analytics software development for different domains and tasks.
He is also a Kaggle Grandmaster who loves to use his machine learning and data science skills on Kaggle competitions.
3. • Some input data
• A target variable
• An objective (or a success metric) like RMSE or MAE
• Some allocated resources (time and hardware)
x1    x2    x3    x4    y (e.g. sales)
0.14  0.69  0.01  0.71  300
0.22  0.44  0.45  0.69  100
0.12  0.35  0.51  0.23  40
0.22  0.42  0.79  0.60  23
0.93  0.82  0.72  0.50  1900
0.32  0.58  0.28  0.22  231
0.95  0.59  0.68  0.09  700
0.34  0.58  0.35  0.81  423
0.05  0.80  0.28  0.86  222
0.23  0.49  0.63  0.03  190
0.05  0.34  0.53  0.73  890
0.74  0.02  0.33  0.56  1000
Driverless AI Process
- Data visualization (AutoViz)
- Feature engineering & selection
- Automated Modeling
- Model interpretability (MLI)
- Scoring pipeline (predictions)
4. What is a Time Series Problem?
[Figures: two "Sales over time" line charts (sales vs. date), one illustrating a linear relationship and one a nonlinear (seasonal) relationship]
5. Time Groups
[Figure: "Sales per day (all groups)", line chart of sales vs. date]
[Figure: "Sales by group", line chart of sales vs. date with one line each for group 1, group 2 and group 3]
time        groups  sales
01/01/2018  group1  30
01/01/2018  group2  100
01/01/2018  group3  10
02/01/2018  group1  60.2
02/01/2018  group2  200.2
02/01/2018  group3  20.2
03/01/2018  group1  90.3
03/01/2018  group2  300.3
03/01/2018  group3  30.3
04/01/2018  group1  120.4
04/01/2018  group2  400.4
04/01/2018  group3  40.4
6. Modeling Foundation
[Diagram: timeline of periods 1-12 split into train, [Gap] and test]
[Diagram: timeline of periods 1-12 split into tvs train, [Gap], tvs valid, [Gap] and test]
[Diagram: time axis showing Gap | Forecast Horizon, marking which lag sizes are invalid and which are valid]
8. Feature Engineering (cont.)
Date       Sales  Lag1  Lag2  Moving Average
1/1/2018   100    -     -     -
2/1/2018   150    100   -     100
3/1/2018   160    150   100   125
4/1/2018   200    160   150   155
5/1/2018   210    200   160   180
6/1/2018   150    210   200   205
7/1/2018   160    150   210   180
8/1/2018   120    160   150   155
9/1/2018   80     120   160   140
10/1/2018  70     80    120   100
• Lags on subsets of the specified group columns (e.g. {Store, Department} vs. {Department} vs. {Store})
• Exponentially Weighted Moving Averages (EWMA) of n-th order differentiated lags
• Aggregation of lags (mean, std, sums, etc.)
• Interactions of lags (e.g. Lag2 - Lag1)
• Linear regression on lags (taking slope and/or intercept as new features)
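The lag and moving-average columns in the table on this slide can be reproduced with a few lines of pandas. This is a minimal sketch, not Driverless AI's internal implementation: the column names `Lag2_minus_Lag1` and `EWMA_diff_Lag1` and the smoothing factor `alpha=0.5` are assumptions chosen for illustration.

```python
import pandas as pd

# Sales values copied from the table on this slide.
df = pd.DataFrame({
    "Date": ["1/1/2018", "2/1/2018", "3/1/2018", "4/1/2018", "5/1/2018",
             "6/1/2018", "7/1/2018", "8/1/2018", "9/1/2018", "10/1/2018"],
    "Sales": [100, 150, 160, 200, 210, 150, 160, 120, 80, 70],
})

# Plain lags: the value observed 1 or 2 rows earlier.
df["Lag1"] = df["Sales"].shift(1)
df["Lag2"] = df["Sales"].shift(2)

# Moving average over the previous two observations (i.e. the mean of
# Lag1 and Lag2). min_periods=1 yields 100 in the second row, where
# only Lag1 exists.
df["MovingAverage"] = df["Sales"].shift(1).rolling(window=2, min_periods=1).mean()

# Interaction of lags, as in the "Lag2 - Lag1" bullet.
df["Lag2_minus_Lag1"] = df["Lag2"] - df["Lag1"]

# EWMA of a first-order differenced lag (alpha is an illustrative choice).
df["EWMA_diff_Lag1"] = df["Lag1"].diff().ewm(alpha=0.5).mean()

print(df)
```

For data with time groups, the same `shift` and `rolling` calls would typically be applied per group (for example after a `groupby` on the group columns) so that lags never cross from one series into another.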
9. What’s new?
Training Holdout Predictions / Backtesting
• The final pipeline will be refitted on various train/valid splits to generate holdout predictions:
Split  Time periods
1      1 2 3 4 5 6 7 8 9 10 11 12 13 14
2      1 2 3 4 5 6 7 8 9 10 11 12
3      1 2 3 4 5 6 7 8 9 10
4      1 2 3 4 5 6 7 8
5      1 2 3 4 5 6
[Diagram "Training & Validation/Holdout Data": each split is shaded along the time axis as training data, validation/holdout data, or optional training data]
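The split table above can be generated programmatically. The sketch below assumes that each split holds out its last `horizon` periods and that consecutive splits step back by the same amount; the slide's shading is not recoverable from the text, so this layout and the function name `backtest_splits` are illustrative assumptions rather than the product's exact scheme.

```python
def backtest_splits(n_periods, horizon, n_splits):
    """Return (train_periods, holdout_periods) pairs, most recent split first.

    Periods are numbered from 1, as on the slide.
    """
    splits = []
    for i in range(n_splits):
        end = n_periods - i * horizon      # last period covered by this split
        first_holdout = end - horizon + 1  # first held-out period
        if first_holdout <= 1:             # no training data would remain
            break
        train = list(range(1, first_holdout))
        holdout = list(range(first_holdout, end + 1))
        splits.append((train, holdout))
    return splits


for number, (train, holdout) in enumerate(backtest_splits(14, 2, 5), start=1):
    print(number, "train:", train, "holdout:", holdout)
```

Refitting the final pipeline on each training range and scoring on the matching holdout range gives out-of-sample predictions across several historical windows.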
10. Test Time Augmentation & Rolling Predictions
• If the test set is longer than the forecast horizon, the predictions beyond the horizon are inferior
• Lookup tables used to create features (lags, aggregates, etc.) are missing the necessary data, hence models operate with many missing values
[Diagram legend: missing data, present data]
11. 1st Solution: Test Time Augmentation (TTA)
Model stays the same; only the memory of the fitted transformers is updated (TTA).
Keep rolling the prediction window over the whole test set to get valid predictions.
Pros: no model changes, fast
Cons: model degradation over time
12. 2nd Solution: Extend Train and Refit
To get valid predictions for the whole test set, we extend the train set with the latest data and refit the original model to generate precise predictions.
Roll the prediction window step by step over the whole test period and keep retraining models.
Pros: most precise
Cons: time consuming
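The difference between the two solutions can be illustrated with a toy example. The sketch below is an assumption-laden stand-in, not the Driverless AI pipeline: the "model" is a least-squares fit of y[t] on y[t - horizon], and the `history` array plays the role of the transformers' memory. With `refit=False` only the history is updated between windows (the TTA idea); with `refit=True` the model is also refitted on the extended history.

```python
import numpy as np


def fit_lag_model(history, horizon):
    """Least-squares fit of y[t] ~ a * y[t - horizon] + b on the history."""
    a, b = np.polyfit(history[:-horizon], history[horizon:], deg=1)
    return a, b


def predict_window(model, history, horizon):
    """Forecast the next `horizon` steps from values already in the history."""
    a, b = model
    return a * history[-horizon:] + b


def rolling_forecast(train, test, horizon, refit):
    history = np.asarray(train, dtype=float)
    test = np.asarray(test, dtype=float)
    model = fit_lag_model(history, horizon)
    predictions = []
    for start in range(0, len(test), horizon):
        window = test[start:start + horizon]
        predictions.append(predict_window(model, history, horizon)[:len(window)])
        # The actuals of this window become known before the next one.
        history = np.concatenate([history, window])
        if refit:
            # 2nd solution: retrain on the extended training data.
            model = fit_lag_model(history, horizon)
    return np.concatenate(predictions)


train = 2.0 * np.arange(20)
test = 2.0 * np.arange(20, 30)
print(rolling_forecast(train, test, horizon=3, refit=False))
print(rolling_forecast(train, test, horizon=3, refit=True))
```

On this perfectly linear toy series both variants give the same forecasts; the trade-off named on the slides (speed versus degradation over time) only shows up when the underlying relationship drifts.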
13. What’s new?
Bring Your Own Recipe (BYOR)
• Custom time series transformers or models to be used within Driverless AI
• Interface to bring in domain specific (or just additional) feature transformers
• Interface to bring in popular algorithms like ARIMA, LSTM, Prophet etc.
• Either as custom models or as feature transformations (i.e. using their predictions as input features for DAI)
• Example implementations available
• FBProphetModel
• ExponentialSmoothingModel
• AutoArimaTransformer
• ProphetTransformer
• …
https://github.com/h2oai/driverlessai-recipes/tree/master/transformers/timeseries
https://github.com/h2oai/driverlessai-recipes/tree/master/models/timeseries
14. Will be released soon:
Unknown Features at Prediction Time
• Some features might not be known at the time a prediction is made
• Driverless AI will make sure that only historical information for these features is used
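One simple way to restrict a feature to historical information is to replace it with a lagged copy of itself. The sketch below is only an illustration of that idea under stated assumptions: the column names (`temperature`, `sales`), the helper `use_only_history` and the choice of lagging by exactly `horizon` rows are hypothetical, and the rows are assumed to be sorted by time within a single series.

```python
import pandas as pd


def use_only_history(df, unknown_cols, horizon):
    """Replace each column that is unknown at prediction time by its value
    from `horizon` rows earlier, then drop the original column."""
    out = df.copy()
    for col in unknown_cols:
        out[f"{col}_lag{horizon}"] = out[col].shift(horizon)
    return out.drop(columns=list(unknown_cols))


df = pd.DataFrame({
    "day": [1, 2, 3, 4, 5],
    "temperature": [10.0, 11.0, 12.0, 13.0, 14.0],  # not known in advance
    "sales": [100, 120, 130, 125, 140],
})

features = use_only_history(df, ["temperature"], horizon=2)
print(features)
```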
15. Will be released soon:
Time Aware Target Transformations
• Detrending
• Fast linear (least squares)
• Robust linear (RANSAC regression)
• Logistic growth
• Centering
• y'(t) = y(t) - c
• Differencing
• y'(t) = y(t) - y(t - k)
• Ratio
• y'(t) = y(t) / y(t - k)
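The transformations listed above are straightforward to write down with NumPy. This is a minimal sketch using hypothetical helper names; it covers only the fast linear (least squares) detrending variant plus centering, differencing and ratio, and is not the product's implementation.

```python
import numpy as np


def center(y, c):
    """y'(t) = y(t) - c"""
    return y - c


def difference(y, k):
    """y'(t) = y(t) - y(t - k); the first k values have no predecessor."""
    return y[k:] - y[:-k]


def ratio(y, k):
    """y'(t) = y(t) / y(t - k); the first k values have no predecessor."""
    return y[k:] / y[:-k]


def fit_linear_trend(t, y):
    """Least-squares line through (t, y), returned as (slope, intercept)."""
    slope, intercept = np.polyfit(t, y, deg=1)
    return slope, intercept


def detrend(t, y, trend):
    slope, intercept = trend
    return y - (slope * t + intercept)


def retrend(t, residual, trend):
    """Inverse of detrend: add the fitted line back."""
    slope, intercept = trend
    return residual + (slope * t + intercept)


t = np.arange(10, dtype=float)
y = 3.0 * t + 5.0
trend = fit_linear_trend(t, y)
residual = detrend(t, y, trend)
print(residual)
```

A model would then be trained on the transformed target, and the inverse transformation (for example `retrend`) applied to its predictions to return to the original scale.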
16. Time Aware Target Transformations (cont.)
• Example: Capture trends with tree based models
[Figure: two panels comparing results "Without detrending" and "With detrending"]
17. Will be released soon:
Prediction Intervals
• Based on the method from Williams & Goodman (1971)
• Very general approach:
• Makes no assumptions about the distribution of forecast errors
• Makes no assumptions about the model used to create forecasts
• General idea:
• Using time-based holdout predictions to determine real forecast errors
• Constructing empirical prediction intervals based on forecast error quantiles
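The general idea above can be sketched in a few lines: collect the errors the model made on time-based holdout predictions, take their empirical quantiles, and add those quantiles to new point forecasts. The function name `empirical_interval`, the `coverage` parameter and the use of a single pooled error distribution are assumptions for illustration; the released feature may differ in its details.

```python
import numpy as np


def empirical_interval(holdout_actual, holdout_pred, new_pred, coverage=0.9):
    """Prediction interval from the quantiles of observed holdout errors.

    No distribution is assumed for the errors, and the model that produced
    the predictions is treated as a black box.
    """
    errors = np.asarray(holdout_actual, dtype=float) - np.asarray(holdout_pred, dtype=float)
    alpha = (1.0 - coverage) / 2.0
    lower_q, upper_q = np.quantile(errors, [alpha, 1.0 - alpha])
    new_pred = np.asarray(new_pred, dtype=float)
    return new_pred + lower_q, new_pred + upper_q


# Toy holdout results: errors range from -5 to +5 in steps of 1.
holdout_pred = np.full(11, 100.0)
holdout_actual = holdout_pred + np.arange(-5, 6)

lower, upper = empirical_interval(holdout_actual, holdout_pred, [200.0, 300.0], coverage=0.8)
print(lower, upper)
```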