Time Series in Driverless AI
Dmitry Larko
Sr. Data Scientist
H2O.ai
Background
• Some input data
• A target variable
• An objective (or a success metric) like RMSE or MAE
• Some allocated resources (time and hardware)
Example data (target y, e.g. sales):
x1    x2    x3    x4    y
0.14  0.69  0.01  0.71   300
0.22  0.44  0.45  0.69   100
0.12  0.35  0.51  0.23    40
0.22  0.42  0.79  0.60    23
0.93  0.82  0.72  0.50  1900
0.32  0.58  0.28  0.22   231
0.95  0.59  0.68  0.09   700
0.34  0.58  0.35  0.81   423
0.05  0.80  0.28  0.86   222
0.23  0.49  0.63  0.03   190
0.05  0.34  0.53  0.73   890
0.74  0.02  0.33  0.56  1000
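Both example objectives are averages over the prediction errors. A minimal Python sketch; the function names are our own, not Driverless AI's API, and the predictions are made up for illustration:

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: squaring penalises large errors more heavily."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error: the average magnitude of the errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# First three targets from the example table, with made-up predictions.
y_true = [300, 100, 40]
y_pred = [280, 130, 40]
print(rmse(y_true, y_pred), mae(y_true, y_pred))
```

With errors of 20, -30 and 0, RMSE (about 20.8) sits above MAE (about 16.7) because the single 30-unit miss dominates the squared average.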
Driverless AI Process
- Data visualization (AutoViz)
- Feature engineering & selection
- Automated Modeling
- Model interpretability (MLI)
- Scoring pipeline (predictions)
What is a Time Series Problem?
[Figure: "Sales over time", 12/31/2017 to 1/14/2018, annotated "Linear relationship" and "Nonlinear (seasonal) relationship"]
[Figure: "Sales over time", 12/21/2017 to 2/19/2018]
Time Groups
[Figures: "Sales per day (all groups)" and "Sales by group" (legend: group 1, group 2, group 3), 12/21/2017 to 3/11/2018]
time        groups  sales
01/01/2018  group1  30
01/01/2018  group2  100
01/01/2018  group3  10
02/01/2018  group1  60.2
02/01/2018  group2  200.2
02/01/2018  group3  20.2
03/01/2018  group1  90.3
03/01/2018  group2  300.3
03/01/2018  group3  30.3
04/01/2018  group1  120.4
04/01/2018  group2  400.4
04/01/2018  group3  40.4
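With time groups, each group is its own series, so a lag feature must be computed within a group: group1's previous value should never be taken from group2. A minimal sketch, assuming rows are already sorted by time and there is one row per group per timestamp, as in the table; `group_lag` is a hypothetical helper, not a Driverless AI function:

```python
from collections import defaultdict

def group_lag(rows, group_cols, target, k=1):
    """Lag `target` by k steps separately within each time group.

    rows: list of dicts sorted by time; group_cols: columns identifying one series.
    Returns one value per row, or None when the group has fewer than k earlier rows.
    """
    history = defaultdict(list)  # group key -> target values seen so far
    lagged = []
    for row in rows:
        key = tuple(row[c] for c in group_cols)
        seen = history[key]
        lagged.append(seen[-k] if len(seen) >= k else None)
        seen.append(row[target])
    return lagged

data = [
    ("01/01/2018", "group1", 30), ("01/01/2018", "group2", 100), ("01/01/2018", "group3", 10),
    ("02/01/2018", "group1", 60.2), ("02/01/2018", "group2", 200.2), ("02/01/2018", "group3", 20.2),
    ("03/01/2018", "group1", 90.3), ("03/01/2018", "group2", 300.3), ("03/01/2018", "group3", 30.3),
    ("04/01/2018", "group1", 120.4), ("04/01/2018", "group2", 400.4), ("04/01/2018", "group3", 40.4),
]
rows = [dict(zip(("time", "groups", "sales"), r)) for r in data]
print(group_lag(rows, ["groups"], "sales"))
```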
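Modeling Foundation
[Diagram, top row: training periods 1 to 12, a [Gap], then the test period. Bottom row: the training data is split again into "tvs train" and "tvs valid" (train/validation split), with a [Gap] before both the validation and the test window. Time axis annotations: "Gap | Forecast Horizon", "invalid lag size", "valid lag size".]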
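The diagram's split and its "valid" versus "invalid" lag sizes can be made concrete. If training data ends at period T, the furthest test row lies gap + horizon periods later, so a lag of k can only be filled from training data for every test row when k >= gap + horizon. A minimal sketch of that reasoning; `time_split` and `is_valid_lag` are hypothetical helpers, and this is our reading of the slide, not Driverless AI's internal code:

```python
def time_split(n_periods, gap, horizon):
    """Split period indices 0..n_periods-1 into (train, test) ranges.

    The last `horizon` periods form the test window; the `gap` periods before it are
    skipped, mimicking data that is not yet available when the forecast is made.
    """
    test = range(n_periods - horizon, n_periods)
    train_end = test.start - gap
    if train_end <= 0:
        raise ValueError("not enough periods for this gap and horizon")
    return range(0, train_end), test

def is_valid_lag(lag, gap, horizon):
    """True if, for every test row, the lagged period falls inside the training data.

    The furthest test row sits gap + horizon periods after the last training period.
    """
    return lag >= gap + horizon

# Outer split: train | gap | test.
train, test = time_split(n_periods=40, gap=3, horizon=12)
# Inner split on the training part only, with the same gap and horizon ("tvs train" / "tvs valid").
tvs_train, tvs_valid = time_split(len(train), gap=3, horizon=12)
print(train, test, tvs_train, tvs_valid)
print([k for k in range(1, 20) if is_valid_lag(k, gap=3, horizon=12)])
```

Reusing the same gap and horizon for the inner split means the validation score is measured under the same information constraints the model will face on the test period.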
Feature Engineering
Date       Day  Month  Year  Weekday  Weeknum  IsHoliday
1/1/2018   1    1      2018  2        1        1
2/1/2018   2    1      2018  3        1        0
3/1/2018   3    1      2018  4        1        0
4/1/2018   4    1      2018  5        1        0
5/1/2018   5    1      2018  6        1        0
6/1/2018   6    1      2018  7        1        0
7/1/2018   7    1      2018  1        2        0
8/1/2018   8    1      2018  2        2        0
9/1/2018   9    1      2018  3        2        0
10/1/2018  10   1      2018  4        2        0
(Dates are day/month/year; Weekday counts Sunday as 1, so Monday 1 January 2018 is 2.)
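The calendar columns in the table can be derived from the date alone. A minimal sketch using only the standard library, matching the table's conventions (Sunday = 1 for Weekday; weeks start on Sunday and the week containing 1 January is week 1). The holiday set is a hypothetical one-entry calendar for illustration; `date_features` is our own name:

```python
from datetime import date, timedelta

# Hypothetical holiday calendar for the example; a real pipeline would use a full calendar.
HOLIDAYS = {date(2018, 1, 1)}

def date_features(d, holidays=HOLIDAYS):
    """Split a date into the calendar features shown in the table."""
    # Table convention: Sunday = 1, Monday = 2, ..., Saturday = 7.
    # date.weekday() uses Monday = 0, ..., Sunday = 6, hence the shift.
    weekday = (d.weekday() + 1) % 7 + 1
    # Weeks start on Sunday and the week containing January 1 is week 1.
    jan1 = date(d.year, 1, 1)
    jan1_offset = (jan1.weekday() + 1) % 7  # days from the preceding Sunday to Jan 1
    day_of_year = d.timetuple().tm_yday
    weeknum = (day_of_year - 1 + jan1_offset) // 7 + 1
    return {
        "Day": d.day,
        "Month": d.month,
        "Year": d.year,
        "Weekday": weekday,
        "Weeknum": weeknum,
        "IsHoliday": int(d in holidays),
    }

for i in range(10):
    d = date(2018, 1, 1) + timedelta(days=i)
    print(d.isoformat(), date_features(d))
```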
Feature Engineering (cont.)
Date       Sales  Lag1  Lag2  Moving Average
1/1/2018   100    -     -     -
2/1/2018   150    100   -     100
3/1/2018   160    150   100   125
4/1/2018   200    160   150   155
5/1/2018   210    200   160   180
6/1/2018   150    210   200   205
7/1/2018   160    150   210   180
8/1/2018   120    160   150   155
9/1/2018   80     120   160   140
10/1/2018  70     80    120   100
(Moving Average = mean of the available Lag1 and Lag2 values.)
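The lag and moving-average columns can be reproduced from the Sales column alone. A minimal sketch, assuming (as the numbers in the table imply) that the moving average is the mean of Lag1 and Lag2, falling back to Lag1 alone when Lag2 does not exist yet. Every feature looks only backwards, so a day's own sales never leak into its features; `lag` and `mean_of_lags` are our own helper names:

```python
def lag(values, k):
    """Value from k steps earlier; None where no such history exists."""
    return [values[i - k] if i >= k else None for i in range(len(values))]

def mean_of_lags(values, ks=(1, 2)):
    """Row-wise mean of the available lagged values (None if none is available yet)."""
    columns = [lag(values, k) for k in ks]
    result = []
    for row in zip(*columns):
        present = [v for v in row if v is not None]
        result.append(sum(present) / len(present) if present else None)
    return result

sales = [100, 150, 160, 200, 210, 150, 160, 120, 80, 70]
print(lag(sales, 1))
print(lag(sales, 2))
print(mean_of_lags(sales))
```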
• Lags on subsets of the specified group columns (e.g. {Store, Department} vs. {Department} vs. {Store})
• Exponentially Weighted Moving Averages (EWMA) of n-th order differenced lags
• Aggregation of lags (mean, std, sums, etc.)
• Interactions of lags (e.g. Lag2 - Lag1)
• Linear regression on lags (taking slope and/or intercept as new features)
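Several of the bullet points above can be computed from one window of past values. A minimal sketch for a single series (group-subset lags would apply the same function per group), assuming a window of at least two lags; the feature names, the EWMA smoothing factor `alpha` and the population standard deviation are illustrative choices, not Driverless AI's exact definitions:

```python
def difference(values, order=1):
    """n-th order differencing: apply x[i] - x[i-1] `order` times."""
    out = list(values)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out

def ewma(values, alpha=0.5):
    """Exponentially weighted moving average; later values get more weight."""
    smoothed = values[0]
    for x in values[1:]:
        smoothed = alpha * x + (1 - alpha) * smoothed
    return smoothed

def lag_features(history, window=4, alpha=0.5):
    """Features for the next time step, computed only from the past values in `history`."""
    lags = history[-window:]  # oldest ... newest, i.e. Lag<window> ... Lag1
    n = len(lags)
    mean = sum(lags) / n
    ts = range(n)
    mt = sum(ts) / n
    slope = (sum((t - mt) * (v - mean) for t, v in zip(ts, lags))
             / sum((t - mt) ** 2 for t in ts))
    return {
        "ewma_diff1": ewma(difference(lags, 1), alpha),   # EWMA of 1st-order differenced lags
        "lag_mean": mean,                                 # aggregations of lags
        "lag_std": (sum((v - mean) ** 2 for v in lags) / n) ** 0.5,
        "lag_sum": sum(lags),
        "lag2_minus_lag1": lags[-2] - lags[-1],           # interaction of two lags
        "lag_slope": slope,                               # linear regression on lags
        "lag_intercept": mean - slope * mt,
    }

sales = [100, 150, 160, 200, 210, 150, 160, 120, 80, 70]
print(lag_features(sales[:4]))  # features for 5/1/2018, using 1/1 to 4/1 only
```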
What’s new?
Training Holdout Predictions / Backtesting
• The final pipeline will be refitted on several train/validation splits to generate holdout predictions:
[Diagram: five train/validation splits on a time axis, labelled with periods 1 to 14 (split 1), 1 to 12 (split 2), 1 to 10 (split 3), 1 to 8 (split 4) and 1 to 6 (split 5). Legend: training data, validation/holdout data, optional training data.]
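One reading of the diagram is rolling-origin backtesting: each split moves the cut-off back by a fixed step (two periods in the diagram: 14, 12, 10, 8, 6), the pipeline is refitted on the data before the cut-off, and the predictions for the window after it are kept as holdout predictions. A minimal sketch under that assumption; the helper names and the naive last-value model are hypothetical stand-ins:

```python
def backtest_splits(n_periods, horizon, n_splits, gap=0):
    """Rolling-origin splits: each split moves the cut-off back by `horizon` periods.

    Returns (train, holdout) index ranges, most recent split first.
    """
    splits = []
    for s in range(n_splits):
        holdout_end = n_periods - s * horizon
        holdout = range(holdout_end - horizon, holdout_end)
        train = range(0, holdout.start - gap)
        if len(train) == 0:
            break  # not enough history left for another split
        splits.append((train, holdout))
    return splits

def holdout_predictions(series, splits, fit_predict):
    """Refit on each split's training data and collect out-of-sample predictions."""
    preds = {}
    for train, holdout in splits:
        forecast = fit_predict([series[i] for i in train], len(holdout))
        for i, p in zip(holdout, forecast):
            preds[i] = p
    return preds

def naive_last_value(train_values, steps):
    """Hypothetical stand-in model: repeat the last observed value."""
    return [train_values[-1]] * steps

series = list(range(100, 116))  # 16 made-up observations, periods 0..15
splits = backtest_splits(len(series), horizon=2, n_splits=5)
print([len(train) for train, _ in splits])  # [14, 12, 10, 8, 6]
print(holdout_predictions(series, splits, naive_last_value))
```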
Test Time Augmentation & Rolling Predictions
• If the test set is longer than the forecast horizon, predictions beyond the horizon are less accurate
• The lookup tables used to create features (lags, aggregates, etc.) lack the necessary recent data, so the model operates on many missing values
[Figure legend: missing data, present data]
1st Solution: Test Time Augmentation (TTA)
The model stays the same; only the memory of the fitted transformers is updated (TTA). Keep rolling the prediction window over the whole test set to get valid predictions.
Pros: no model changes, fast
Cons: model degradation over time
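A minimal sketch of rolling TTA, assuming the actual values of each window become available before the next window is predicted. The "model" is a hypothetical stand-in, a least-squares fit of y[t] = a + b * y[t - lag], fitted once; only the history used to build the lag feature grows. The lag equals the horizon so that every row in a window can find its lag value in history:

```python
def fit_lag_model(series, lag):
    """Least-squares fit of y[t] = a + b * y[t - lag] (a hypothetical stand-in model)."""
    xs, ys = series[:-lag], series[lag:]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b * mx, b

def rolling_tta(train, test, horizon):
    """Predict the whole test set window by window without refitting the model."""
    a, b = fit_lag_model(train, horizon)  # fitted once, then frozen
    history = list(train)                 # the transformers' "memory" of past values
    preds = []
    for start in range(0, len(test), horizon):
        window = test[start:start + horizon]
        n = len(history)
        # With lag == horizon, every row in the window finds its lag value in history.
        preds.extend(a + b * history[n - horizon + i] for i in range(len(window)))
        history.extend(window)            # TTA: append the revealed actuals, keep the model
    return preds

train = [1, 2, 3, 4, 5, 6, 7, 8]
test = [9, 10, 11, 12, 13]
print(rolling_tta(train, test, horizon=2))  # [9.0, 10.0, 11.0, 12.0, 13.0]
```

On this perfectly regular series the frozen model stays exact; the "degradation over time" in the Cons arises when the real relationship drifts away from the one the model was fitted on.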
2nd Solution: Extend Train and Refit
To get valid predictions for the whole test set, we extend the train set with the latest data and refit the original model to generate precise predictions. Roll the prediction window step by step over the whole test period and keep retraining models.
Pros: most precise
Cons: time consuming
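The refit strategy differs from TTA in one line: the model is re-estimated on all data seen so far before every window. A minimal sketch with the same hypothetical stand-in model (y[t] = a + b * y[t - lag], lag equal to the horizon), again assuming each window's actuals arrive before the next window is predicted:

```python
def fit_lag_model(series, lag):
    """Least-squares fit of y[t] = a + b * y[t - lag] (a hypothetical stand-in model)."""
    xs, ys = series[:-lag], series[lag:]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b * mx, b

def rolling_refit(train, test, horizon):
    """Extend the training data with the latest actuals and refit before every window."""
    history = list(train)
    preds, params = [], []
    for start in range(0, len(test), horizon):
        window = test[start:start + horizon]
        a, b = fit_lag_model(history, horizon)  # refit on all data seen so far
        params.append((a, b))
        n = len(history)
        # lag == horizon keeps every lagged value inside the known history
        preds.extend(a + b * history[n - horizon + i] for i in range(len(window)))
        history.extend(window)                  # reveal the actuals for the next refit
    return preds, params

preds, params = rolling_refit([1, 2, 3, 4, 5, 6, 7, 8], [9, 10, 11, 12, 13], horizon=2)
print(preds, params)
```

Each window costs one full refit, which is where the "time consuming" trade-off comes from; with a real pipeline the refit would include feature engineering and model training.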
What’s new?
Bring Your Own Recipe (BYOR)
• Custom time series transformers or models to be used within Driverless AI
• Interface to bring in domain-specific (or just additional) feature transformers
• Interface to bring in popular algorithms like ARIMA, LSTM, Prophet, etc.
  • Either as custom models or as feature transformations (i.e. using their predictions as input features for DAI)
• Example implementations available:
  • FBProphetModel
  • ExponentialSmoothingModel
  • AutoArimaTransformer
  • ProphetTransformer
  • …
https://github.com/h2oai/driverlessai-recipes/tree/master/transformers/timeseries
https://github.com/h2oai/driverlessai-recipes/tree/master/models/timeseries
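The "predictions as input features" idea can be illustrated without Driverless AI: a classic forecaster produces one-step-ahead forecasts, and that column is handed to the main model as a feature. A minimal sketch using simple exponential smoothing with a small grid search over the smoothing factor. The class and its fit/transform shape are hypothetical and are NOT the Driverless AI recipe interface; the real recipe base classes and method signatures are defined by Driverless AI (see the repositories linked above):

```python
class SimpleExpSmoothingFeature:
    """Hypothetical transformer: simple exponential smoothing forecasts as a feature."""

    def __init__(self, alphas=(0.1, 0.3, 0.5, 0.7, 0.9)):
        self.alphas = alphas  # candidate smoothing factors to try
        self.alpha = None

    @staticmethod
    def _one_step(series, alpha):
        """Forecast for each point made before seeing it (None for the first point)."""
        preds, level = [None], series[0]
        for y in series[1:]:
            preds.append(level)
            level = alpha * y + (1 - alpha) * level
        return preds

    def fit(self, series):
        """Pick the alpha with the lowest one-step-ahead squared error."""
        def sse(a):
            p = self._one_step(series, a)
            return sum((y - f) ** 2 for y, f in zip(series[1:], p[1:]))
        self.alpha = min(self.alphas, key=sse)
        return self

    def transform(self, series):
        return self._one_step(series, self.alpha)

sales = [100, 150, 160, 200, 210, 150, 160, 120, 80, 70]
feature = SimpleExpSmoothingFeature().fit(sales)
print(feature.alpha, feature.transform(sales))
```

Because each forecast uses only earlier values, the new column can sit next to lag features without leaking the target.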
Will be released soon:
Unknown Features at Prediction Time
• Some features might not be known at the time a prediction is made
• Driverless AI will make sure that only historical values of these features are used
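One way to honour this rule, sketched as an assumption rather than Driverless AI's actual mechanism: drop the raw column whose future values are unknown and keep only a lagged copy of it. For the same reason as target lags (see the "valid lag size" annotation on the Modeling Foundation slide), the lag should in practice reach past the gap and forecast horizon. The column names and values below are made up; `historical_only` is a hypothetical helper:

```python
def historical_only(rows, unknown_cols, lag):
    """Replace each column in `unknown_cols` with its value `lag` rows earlier.

    rows: list of dicts for a single series, sorted by time.
    Known columns are kept unchanged; the raw unknown columns are dropped.
    """
    out = []
    for i, row in enumerate(rows):
        new_row = {k: v for k, v in row.items() if k not in unknown_cols}
        for col in unknown_cols:
            new_row[f"{col}_lag{lag}"] = rows[i - lag][col] if i >= lag else None
        out.append(new_row)
    return out

# Made-up example: a planned price is known in advance, the temperature is not.
rows = [
    {"date": "1/1/2018", "price": 9.5, "temperature": 3.0, "sales": 100},
    {"date": "2/1/2018", "price": 9.5, "temperature": 5.0, "sales": 150},
    {"date": "3/1/2018", "price": 8.0, "temperature": 4.0, "sales": 160},
]
for r in historical_only(rows, ["temperature"], lag=1):
    print(r)
```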
Will be released soon:
Time Aware Target Transformations
• Detrending
  • Fast linear (least squares)
  • Robust linear (RANSAC regression)
  • Logistic growth
• Centering
  • y'(t) = y(t) - c
• Differencing
  • y'(t) = y(t) - y(t - k)
• Ratio
  • y'(t) = y(t) / y(t - k)
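A target transformation is only useful if predictions can be mapped back to the original scale, so each transform needs an inverse. A minimal sketch of centering, differencing, ratio and least-squares linear detrending as written in the formulas above; RANSAC and logistic-growth detrending are not covered, and the function names are our own:

```python
def center(y, c):
    """Centering: y'(t) = y(t) - c."""
    return [v - c for v in y]

def difference(y, k):
    """Differencing: y'(t) = y(t) - y(t - k); the first k points have no transform."""
    return [y[t] - y[t - k] for t in range(k, len(y))]

def undo_difference(y_diff, seed):
    """Invert differencing, given the k values `seed` that precede y_diff."""
    k = len(seed)
    y = list(seed)
    for d in y_diff:
        y.append(d + y[-k])  # y(t) = y'(t) + y(t - k)
    return y[k:]

def ratio(y, k):
    """Ratio: y'(t) = y(t) / y(t - k)."""
    return [y[t] / y[t - k] for t in range(k, len(y))]

def undo_ratio(y_ratio, seed):
    """Invert the ratio transform, given the k values `seed` that precede y_ratio."""
    k = len(seed)
    y = list(seed)
    for r in y_ratio:
        y.append(r * y[-k])  # y(t) = y'(t) * y(t - k)
    return y[k:]

def detrend_linear(y):
    """Least-squares linear detrending: returns residuals and (intercept, slope)."""
    t = range(len(y))
    mt, my = sum(t) / len(y), sum(y) / len(y)
    slope = sum((ti - mt) * (yi - my) for ti, yi in zip(t, y)) / sum((ti - mt) ** 2 for ti in t)
    intercept = my - slope * mt
    return [yi - (intercept + slope * ti) for ti, yi in zip(t, y)], (intercept, slope)

sales = [100, 150, 160, 200, 210, 150, 160, 120, 80, 70]
print(difference(sales, 1))
print(undo_difference(difference(sales, 1), sales[:1]))
```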
Time Aware Target Transformations (cont.)
• Example: capturing trends with tree-based models
[Figure panels: Without detrending | With detrending]
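Tree-based models predict a constant per leaf, so beyond the training range they cannot follow a trend. A minimal sketch with a depth-1 regression tree (a "stump") on a pure upward trend: fitted on the raw target it predicts a flat value for the future, while fitting a least-squares line first and the stump on the residuals recovers the trend. The stump and the toy data are illustrative stand-ins, not the models from the slide:

```python
def fit_line(ts, ys):
    """Least-squares line y = a + b * t."""
    mt, my = sum(ts) / len(ts), sum(ys) / len(ys)
    b = sum((t - mt) * (y - my) for t, y in zip(ts, ys)) / sum((t - mt) ** 2 for t in ts)
    return my - b * mt, b

def fit_stump(ts, ys):
    """Depth-1 regression tree on time: one threshold, a mean prediction per leaf."""
    pairs = sorted(zip(ts, ys))
    best = None
    for i in range(1, len(pairs)):
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, ml, mr)
    _, threshold, ml, mr = best
    return lambda t: ml if t <= threshold else mr

ts = list(range(10))
ys = [5 + 3 * t for t in ts]  # a pure upward trend

raw_tree = fit_stump(ts, ys)                                      # without detrending
a, b = fit_line(ts, ys)
resid_tree = fit_stump(ts, [y - (a + b * t) for t, y in zip(ts, ys)])

def with_detrending(t):
    """Trend from the line plus the tree's correction on the residuals."""
    return a + b * t + resid_tree(t)

print(raw_tree(15), with_detrending(15))  # prints 26.0 50.0 (the true value at t = 15 is 50)
```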
Will be released soon:
Prediction Intervals
• Based on the method from Williams & Goodman (1971)
• Very general approach:
  • Makes no assumptions about the distribution of forecast errors
  • Makes no assumptions about the model used to create forecasts
• General idea:
  • Use time-based holdout predictions to determine real forecast errors
  • Construct empirical prediction intervals from forecast error quantiles
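The general idea fits in a few lines: collect the errors of time-based holdout predictions, take their lower and upper quantiles, and add them to a new point forecast. A minimal sketch with linear-interpolation quantiles; the helper names and numbers are made up, and a fuller version would likely compute separate quantiles for each step of the forecast horizon, since errors tend to grow with distance:

```python
def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list (0 <= q <= 1)."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)

def empirical_interval(point_forecast, holdout_actuals, holdout_preds, coverage=0.9):
    """Prediction interval from the empirical distribution of holdout forecast errors."""
    errors = sorted(a - p for a, p in zip(holdout_actuals, holdout_preds))
    tail = (1 - coverage) / 2
    return (point_forecast + quantile(errors, tail),
            point_forecast + quantile(errors, 1 - tail))

holdout_actuals = [10, 11, 12, 13, 14]
holdout_preds = [12, 12, 12, 12, 12]  # made-up holdout predictions
print(empirical_interval(100, holdout_actuals, holdout_preds, coverage=0.5))  # (99.0, 101.0)
```

No distribution is assumed for the errors and the forecasting model is treated as a black box, which is what makes the approach general.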
Masterminds behind DAI time series
• Data Scientists
• Former #1 & #4
Thank You
Twitter: @DmitryLarko


Dmitry Larko, H2O.ai - Time Series in H2O Driverless AI - #H2OWorld 2019 NYC
