Sequence-2-Sequence Learning
For Time Series Forecasting
ARUN KEJARIWAL
IRA COHEN
ABOUT US
TIME SERIES
FORECASTING
Meteorology
Machine Translation
Operations
Transportation
Econometrics
Marketing, Sales
Finance
Speech Synthesis
AN
EXAMPLE
# Figure borrowed from Brockwell and Davis.
STRUCTURAL
CHARACTERISTICS
Heteroscedasticity
Changepoint
Anomalies, Extreme Values
Trend + Seasonality
* Figure borrowed from Hyndman et al. 2015.
FLAVORS
TIME SERIES
FORECASTING
# Figure borrowed from Tao et al. 2018.
Long History
Research
Books
[Faullkner, Comstock, Fossum]
[Craw]
[Brockwell, Davis]
[Chatfield]
[Bowerman, O’Connell, Koehler]
[Granger, Newbold]
[Gilchrist]
[Hyndman, Athanasopoulos]
[Box et al.]
[Wilson, Keating]
[Makridakis et al.]
[Mallios]
[Montgomery et al.]
[Pankratz]
WHY
DEEP LEARNING?
PROPERTIES
Seasonality
Multiple levels: weekly, monthly, yearly or Non-seasonal (aperiodic)
Stationarity
Time varying mean and variance (heteroskedasticity), Exogenous shocks
Structural
Unevenly Spaced, Missing Data, Anomalies, Changepoints, Small sample size, Skewness, Kurtosis, Chaos, Noise
Trend
Growth, Virality (network effects), Non-linearity
TEMPORAL CREDIT
ASSIGNMENT (TCA)
DEEP LEARNING
UBIQUITOUS
S2S
# http://karpathy.github.io/2015/05/21/rnn-effectiveness/
[2014]
BACKPROPAGATION
THROUGH TIME
# Figure borrowed from Lillicrap and Santoro, 2019.
[1986]
[1990]
[1986]
[EARLY WORK]
[1990]
REAL-TIME RECURRENT
LEARNING#*
# A Learning Algorithm for Continually Running Fully Recurrent Neural Networks [Williams and Zipser, 1989]
* A Method for Improving the Real-Time Recurrent Learning Algorithm [Catfolis, 1993]
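A minimal sketch of the RTRL idea for a one-unit network. The scalar recurrence h_t = tanh(w*h_{t-1} + u*x_t), the squared-error loss, and the name rtrl_gradient are illustrative assumptions, not taken from the slides. The sensitivities dh/dw and dh/du are carried forward in time, so no past activations need to be stored.

```python
import math

def rtrl_gradient(w, u, xs, targets):
    """Forward-mode (RTRL-style) gradient for h_t = tanh(w*h_{t-1} + u*x_t).

    Returns (loss, dloss/dw, dloss/du) for a summed squared-error loss.
    The recurrence and loss are assumptions made for illustration.
    """
    h = 0.0
    dh_dw = 0.0  # sensitivity of the current state to w
    dh_du = 0.0  # sensitivity of the current state to u
    loss = grad_w = grad_u = 0.0
    for x, y in zip(xs, targets):
        h_prev = h
        h = math.tanh(w * h_prev + u * x)
        local = 1.0 - h * h  # derivative of tanh at the pre-activation
        # Sensitivity recursion: direct term plus the term through h_{t-1}.
        dh_dw = local * (h_prev + w * dh_dw)
        dh_du = local * (x + w * dh_du)
        err = h - y
        loss += 0.5 * err * err
        grad_w += err * dh_dw
        grad_u += err * dh_du
    return loss, grad_w, grad_u
```

The forward-carried gradient can be checked against a finite-difference estimate of the loss, which is a quick way to validate the recursion.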
APPROXIMATE
RTRL
UORO [Unbiased Online Recurrent Optimization]
Works in a streaming fashion
Online, Memoryless
Avoids backtracking through past activations and inputs
Low-rank approximation to forward-mode automatic differentiation
Reduced computation and storage
KF-RTRL [Kronecker Factored RTRL]
Kronecker product decomposition to approximate the gradients
Reduces noise in the approximation
Asymptotically, smaller by a factor of n
Memory requirement equivalent to UORO
Higher computation than UORO
Not applicable to arbitrary architectures
# Unbiased Online Recurrent Optimization [Tallec and Ollivier, 2017]
* Approximating Real-Time Recurrent Learning with Random Kronecker Factors [Mujika et al. 2018]
ARCHITECTURE TYPES
OF RNNs
MEMORY BASED
ATTENTION BASED
MEMORY-BASED RNN
ARCHITECTURES
BRNN: Bi-directional RNN
[Schuster and Paliwal, 1997]
GLU: Gated Linear Unit
[Dauphin et al. 2016]
Long Short-Term Memory: LSTM
[Hochreiter and Schmidhuber, 1996]
Gated Recurrent Unit: GRU
[Cho et al. 2014]
Recurrent Highway Network: RHN
[Zilly et al. 2017]
Neural Computation, 1997
* Figure borrowed from http://colah.github.io/posts/2015-08-Understanding-LSTMs/
(a) Forget gate (b) Input gate
(c) Output gate
St: hidden state
“The LSTM’s main idea is that, instead of computing St
from St-1 directly with a matrix-vector product followed
by a nonlinearity, the LSTM directly computes ΔSt, which
is then added to St-1 to obtain St.” [Jozefowicz et al. 2015]
Resistant to vanishing gradient problem
Achieve better results when dropout is used
Adding bias of 1 to LSTM’s forget gate
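A minimal sketch of a single-unit LSTM step, showing where a forget-gate bias of 1 enters. The scalar parameterization, the parameter names in the dict p, and the function name lstm_step are assumptions for illustration; real implementations use weight matrices.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p, forget_bias=1.0):
    """One step of a single-unit LSTM; p maps (hypothetical) parameter names to floats."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"] + forget_bias)  # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])                # input gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])                # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])              # candidate update
    c = f * c_prev + i * g  # additive update of the cell state
    h = o * math.tanh(c)
    return h, c
```

With all parameters at zero, the extra bias keeps the forget gate at sigmoid(1), about 0.73, instead of 0.5, so more of the previous cell state survives each step early in training.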
Stacking d RNNs
Recurrence depth d
LONG CREDIT ASSIGNMENT
PATHS
Incorporates Highway layers inside the recurrent
transition
Highway layers in RHNs perform adaptive computation
Transform
Carry
H, T, C: Non-linear transforms
Regularization
Variational inference based dropout
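A minimal sketch of one highway layer, y = H(x)*T(x) + x*C(x), applied elementwise. The scalar weights, the choice of tanh for H and sigmoid for T, and the coupling C = 1 - T are assumptions for illustration; the slide only states that H, T and C are non-linear transforms.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def highway_layer(x, w_h, b_h, w_t, b_t):
    """Elementwise highway layer with scalar (hypothetical) weights."""
    out = []
    for xi in x:
        h = math.tanh(w_h * xi + b_h)  # H: candidate transform
        t = sigmoid(w_t * xi + b_t)    # T: transform gate
        c = 1.0 - t                    # C: carry gate (coupled to T, an assumption)
        out.append(h * t + xi * c)
    return out
```

A strongly negative gate bias makes the layer carry its input through almost unchanged, while a strongly positive one makes it behave like a plain non-linear layer.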
* Figure borrowed from Zilly et al. 2017
NEW FLAVORS
OF RNNs
# Figure borrowed from https://distill.pub/2016/augmented-rnns/
What caught your eye at first glance?
And this one?
* Figure borrowed from Golub et al. 2012
ATTENTION
Psychology, Neuroscience, Cognitive Sciences
[1959]
[1974]
[1956]
Span of absolute judgement
[2014]
[2017]
ATTENTION
MECHANISM
# Figure borrowed from https://distill.pub/2016/augmented-rnns/
# Figure borrowed from Lillicrap and Santoro, 2019.
ATTENTION
CONTENT BASED
LOCATION BASED
ATTENTION FAMILY
Self
Relates different positions of a single sequence in order to compute a representation of the same sequence
Also referred to as intra-attention
Global vs. Local
Global: alignment weights a_t are inferred from the current target state and all the source states
Local: alignment weights a_t are inferred from the current target state and those source states in the window
Soft vs. Hard
Soft: alignment weights are learned and placed “softly” over all patches in the source image
Hard: only selects one patch of the image to attend to at a time
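The global/local and soft/hard distinctions can be sketched with dot-product scores. The function names, the dot-product scoring, and the argmax stand-in for hard attention (which is usually sampled) are assumptions for illustration; the window is assumed to be non-empty.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, states, center=None, half_width=None, hard=False):
    """Alignment weights over source states.

    Global: center is None, every state is scored.
    Local: only states within half_width of center are scored.
    Hard: all weight goes to the single best-scoring state (argmax stand-in).
    """
    n = len(states)
    if center is None:
        window = list(range(n))
    else:
        window = [j for j in range(n) if abs(j - center) <= half_width]
    scores = [sum(q * s for q, s in zip(query, states[j])) for j in window]
    local = softmax(scores)
    if hard:
        best = local.index(max(local))
        local = [1.0 if k == best else 0.0 for k in range(len(local))]
    weights = [0.0] * n
    for k, j in enumerate(window):
        weights[j] = local[k]
    return weights

def context_vector(weights, states):
    """Weighted sum of the source states."""
    dim = len(states[0])
    return [sum(w * s[d] for w, s in zip(weights, states)) for d in range(dim)]
```

Soft global attention spreads weight over every state, local attention zeroes everything outside the window, and the hard variant returns a one-hot weight vector.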
ATTENTION-BASED
Models
[A SNAPSHOT]
Sparse Attentive Backtracking [Ke et al. 2018]
Hierarchical Attention-Based RHN [Tao et al. 2018]
Long Short-Term Memory-Networks [Cheng et al. 2016]
Self-Attention GAN [Zhang et al. 2018]
HIERARCHICAL ATTENTION-BASED
RECURRENT HIGHWAY NETWORK
# Figure borrowed from Tao et al. 2018.
SPARSE ATTENTIVE BACKTRACKING
TCA THROUGH
REMINDING
✦ Inspired by the cognitive analogy of reminding
๏ Designed to retrieve one or very few past states
✦ Incorporates a differentiable, sparse (hard) attention mechanism to select from past states
# Figure borrowed from Ke et al. 2018.
HEALTH
CARE
# Figure borrowed from Song et al. 2018.
Multi-head Attention
Additional masking to enable causality
Inference
Diagnoses, Length of stay
Future illness, Mortality
Temporal ordering
Positional Encoding & Dense interpolation embedding
MULTI-VARIATE
Sensor measurement, Test results
Irregular sampling, Missing values and measurement errors
Heterogeneous, Presence of long range dependencies
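A minimal sketch of the masking that enforces causality in self-attention: position t only scores positions up to and including t. Using the inputs directly as queries, keys and values (no learned projections, a single head) and the name causal_self_attention are simplifying assumptions, not the model from the slide.

```python
import math

def causal_self_attention(xs):
    """Single-head self-attention over a list of vectors with a causal mask."""
    dim = len(xs[0])
    scale = math.sqrt(dim)
    outputs = []
    for t, query in enumerate(xs):
        visible = xs[: t + 1]  # positions after t are masked out
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in visible]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, visible)) for d in range(dim)]
        )
    return outputs
```

Because of the mask, changing a later observation leaves all earlier outputs untouched, which is the property needed when forecasting from a prefix of the series.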
TIME SERIES
FORECASTING:
ON THE ROLE OF
PRE-PROCESSING
TO GET IT RIGHT
ANODOT MISSION: MAKING BI AUTONOMOUS
Business Monitoring: Trend, Anomaly, Root Cause
Business Forecast: Forecast, What If, Optimization
Auto ML
Real-time
No Code
No Data Scientist
SOME OF OUR CUSTOMERS
GAMING
ECOMMERCE
AD TECH
TELCO
ENTERPRISE
INTERNET
IOT
FINTECH
BIG SOCIAL NETWORK
FORECAST USE CASES
DEMAND FORECAST
Anticipate demand for inventory, products, service calls and much more.
GROWTH FORECAST
Anticipate revenue growth, expenses, cash flow and other KPIs.
TRANSPORTATION / DATA SCIENCE DEPARTMENT, BUSINESS OPERATIONS
How many drivers will I need tomorrow?
FINTECH / TREASURY DEPARTMENT
How many funds do I need to allocate per currency?
ALL INDUSTRIES / FINANCE DEPARTMENT
Will we hit our targets next quarter?
AI-POWERED FORECASTING IN A TURN-KEY
EXPERIENCE
Correlate with Public Data
PRODUCT COMPONENTS
CONSIDERATIONS FOR ACCURATE FORECAST
1. Discovering influencing metrics and events
2. Ensemble of models
3. Identify and account for data anomalies
4. Identify and account for different time series behaviors
HOW TO DISCOVER
INFLUENCING METRICS/EVENTS?
INPUT:
• Target time series + forecast horizon
• Millions of measures/events that can be used as features
PROCEDURE:
STEP 1: Compute correlation between target and each measure/event (shifted by the horizon)
STEP 2: Choose X most correlated measures
STEP 3: Train forecast model
CHALLENGES:
• Step 1 is computationally expensive for long sequences: use LSH for speed
• Which correlation function to use?
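Steps 1 and 2 of the procedure can be sketched as a brute-force scan. Pearson correlation is only one possible answer to the "which correlation function" question, and the names pearson and top_lagged_features are hypothetical; the LSH speed-up mentioned under challenges is not implemented here.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def top_lagged_features(target, candidates, horizon, top_x):
    """Rank candidate series by |correlation| with the target shifted by the horizon.

    candidates maps a name to a series aligned with target; horizon must be >= 1.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    scored = []
    for name, series in candidates.items():
        # Pair the candidate at time t with the target at time t + horizon.
        r = pearson(series[:-horizon], target[horizon:])
        scored.append((abs(r), name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_x]]
```

The selected names would then be fed as features into whatever forecast model is trained in step 3.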
THE EFFECT ON ACCURACY
IDENTIFYING AND ACCOUNTING
FOR DATA ANOMALIES
ANOMALIES
DEGRADE
FORECASTING
ACCURACY
How to remedy the
situation?
Discover anomalies and
use the information to
create new features:
Case 1: Anomalies can be explained by
external factors – enhance the anomalies
Case 2: Anomalies can’t be explained by
external factors – weight down the anomalies
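One way to turn detected anomalies into features and training weights is sketched below. The slides do not say how anomalies are detected; the robust z-score based on the median absolute deviation, the 3.5 threshold, the 0.1 down-weight, and the name anomaly_features are all assumptions for illustration.

```python
import statistics

def anomaly_features(series, threshold=3.5, down_weight=0.1):
    """Return (flags, weights) for a series.

    flags: 1 where a point looks anomalous, usable as an extra input feature
           when the anomaly can be explained by external factors (case 1).
    weights: per-point training weights that shrink unexplained anomalies (case 2).
    """
    med = statistics.median(series)
    mad = statistics.median([abs(v - med) for v in series])
    if mad == 0:
        return [0] * len(series), [1.0] * len(series)
    flags, weights = [], []
    for v in series:
        robust_z = 0.6745 * abs(v - med) / mad  # modified z-score
        is_anomaly = robust_z > threshold
        flags.append(1 if is_anomaly else 0)
        weights.append(down_weight if is_anomaly else 1.0)
    return flags, weights
```

The weights would typically be passed to the loss of the forecast model so that unexplained spikes contribute less to the fit.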
IDENTIFYING AND ACCOUNTING FOR
DATA ANOMALIES: RESULT
1-15% accuracy
improvement
Varying behaviors:
• Seasonality (length/strength)
• Stationarity
• Trends
• Sparseness
• Spikiness
• …
length
trend
Seasonal strength
linearity
curvature
e_acf1
e_acf10
peak
trough
stability
entropy
x_acf1
x_acf10
diff1_acf1
diff1_acf10
diff2_acf1
diff2_acf10
seas_acf1
arch_acf
garch_acf
arch_r2
garch_r2
hurst
lumpiness
spike
max_level_shift
time_level_shift
max_var_shift
time_var_shift
max_kl_shift
time_kl_shift
unitroot_kpss
unitroot_pp
x_pacf5
diff1x_pacf5
diff2x_pacf5
seas_pacf
crossing_points
flat_spots
*Rob Hyndman - tsfeatures
https://pkg.robjhyndman.com/tsfeatures
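A few of the autocorrelation features in the list can be sketched in plain Python. This follows my understanding of the tsfeatures definitions (x_acf1 as the lag-1 autocorrelation, x_acf10 as the sum of squares of the first ten autocorrelations, diff1_acf1 as the same on the differenced series); the function names are hypothetical and the package itself is written in R.

```python
def acf(series, lag):
    """Sample autocorrelation at the given lag (nan for a constant series)."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    if denom == 0:
        return float("nan")
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return num / denom

def acf10(series):
    """Sum of squares of the first ten autocorrelation coefficients."""
    return sum(acf(series, k) ** 2 for k in range(1, 11))

def diff(series):
    """First differences of the series."""
    return [b - a for a, b in zip(series, series[1:])]

def acf_features(series):
    """A small subset of the features named on the slide."""
    d1 = diff(series)
    return {
        "x_acf1": acf(series, 1),
        "x_acf10": acf10(series),
        "diff1_acf1": acf(d1, 1),
    }
```

Features like these give each series a fixed-length description of its behavior, which is what the comparison of behaviors in the following slides relies on.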
Handling
Variations
In Time
Series
Behaviors
A SINGLE MODEL FOR THEM ALL?
POTENTIAL ADVANTAGES
● Train one model for many time series
● Less data required per time series
OPEN QUESTIONS
● Will a single model be more accurate than individual ones?
● Which types of differing behaviors impact the ability to train a single model adversely, and which do not?
TESTING THE IMPACT OF EACH BEHAVIOR TYPE
[Figure: experiment design]
Dataset: compute strength of behavior for each TS (by feature); split into high strength TS and low strength TS
Train: LSTM for each TS and one LSTM for all TS; one LSTM for all TS on the low/high mix
Benchmark forecast: horizontal line
Benchmark loss: absolute error
Score = model loss / benchmark loss
Score 1: low; Score 2: low; Score 3: low/high; Score 4: high; Score 5: high
Compared: impact of the behavior on mixed training; impact of the behavior on ability to forecast
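The score used in the experiment, model loss divided by benchmark loss, can be sketched as follows. Taking the horizontal-line benchmark to be the last training value held constant is an assumption (the slide does not say which level the line sits at), as are the function names.

```python
def mean_abs_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def relative_score(train, actual, model_forecast):
    """Model loss / benchmark loss, both measured as absolute error.

    The benchmark repeats the last training value over the horizon
    (an assumed reading of "horizontal line"). Values below 1 mean the
    model beats the benchmark.
    """
    benchmark = [train[-1]] * len(actual)
    bench_loss = mean_abs_error(actual, benchmark)
    if bench_loss == 0:
        raise ValueError("benchmark is perfect; the ratio is undefined")
    return mean_abs_error(actual, model_forecast) / bench_loss
```

Dividing by the benchmark loss puts series of different scales on a common footing, so scores from the high-strength, low-strength and mixed groups can be compared directly.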
[Figure: behavior features plotted by impact on accuracy for joint training vs. impact on accuracy for variability of the behavior. Highlighted groups: Seasonality, Homoscedasticity. Labeled points: seasonal_strength, curvature, x_pacf5, linearity, hurst, x_acf1, entropy, max_level_shift, time_level_shift, max_var_shift, time_kl_shift, unitroot_kpss, unitroot_pp, seasonal frequency, arch_acf, garch_acf, seas_pacf, trough, peak, stability, lumpiness, diff2_acf10, e_acf10, diff1_acf10, x_acf10, max_kl_shift]
MAIN
CONCLUSIONS /
SOLUTIONS
● Two main factors preventing simple training of single models
● Seasonality: the frequency is the important factor, not the shape
● Homoscedasticity (same variance): prevents mixing, but its strength impacts accuracy overall
● Other behaviors have lower mixing impact
SOLUTIONS
● Separate TS for training based on behavior
● Embed behavior related features for a single model training. Requires efficient feature selection
KEY TAKEAWAYS
● Preprocessing before training boosts forecast accuracy
● Seasonality and homoscedasticity are the key behaviors impacting ability to train joint models
1. Discovering influencing metrics and events
2. Identify and account for data anomalies
3. Identify and account for different time series behaviors
Thank you
READINGS
[BOOKS]
Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms [Rosenblatt]
Neurocomputing: Foundations of Research [Eds. Anderson and Rosenfeld]
Parallel Distributed Processing [Eds. Rumelhart and McClelland]
The Roots of Backpropagation: From Ordered Derivatives to Neural Networks and Political Forecasting [Werbos]
Backpropagation: Theory, Architectures and Applications [Eds. Chauvin and Rumelhart]
Neural Networks: A Systematic Introduction [Rojas]
[EARLY WORKS]
Perceptrons [Minsky and Papert, 1969]
Une procedure d'apprentissage pour reseau a seuil assymetrique [Le Cun, 1985]
The problem of serial order in behavior [Lashley, 1951]
Beyond regression: New tools for prediction and analysis in the behavioral sciences [Werbos, 1974]
Connectionist models and their properties [Feldman and Ballard, 1982]
Learning-logic [Parker, 1985]
[BACKPROPAGATION]
Learning internal representations by error propagation [Rumelhart, Hinton, and Williams, Chapter 8 in D. Rumelhart and J. McClelland, Eds., Parallel Distributed Processing, Vol. 1, 1986] (Generalized Delta Rule)
Generalization of backpropagation with application to a recurrent gas market model [Werbos, 1988]
Generalization of backpropagation to recurrent and higher order networks [Pineda, 1987]
Backpropagation in perceptrons with feedback [Almeida, 1987]
Second-order backpropagation: Implementing an optimal O(n) approximation to Newton's method in an artificial neural network [Parker, 1987]
Learning phonetic features using connectionist networks: an experiment in speech recognition [Watrous and Shastri, 1987] (Time-delay NN)
[BACKPROPAGATION]
Backpropagation: Past and future [Werbos, 1988]
Adaptive state representation and estimation using recurrent connectionist networks [Williams, 1990]
Generalization of back propagation to recurrent and higher order neural networks [Pineda, 1988]
Learning state space trajectories in recurrent neural networks [Pearlmutter 1989]
Parallelism, hierarchy, scaling in time-delay neural networks for spotting Japanese phonemes/CV-syllables [Sawai et al. 1989]
The role of time in natural intelligence: implications for neural network and artificial intelligence research [Klopf and Morgan, 1990]
[REGULARIZATION of RNNs]
Recurrent Neural Network Regularization [Zaremba et al. 2014]
Regularizing RNNs by Stabilizing Activations [Krueger and Memisevic, 2016]
Sampling-based Gradient Regularization for Capturing Long-Term Dependencies in Recurrent Neural Networks [Chernodub and Nowicki 2016]
A Theoretically Grounded Application of Dropout in Recurrent Neural Networks [Gal and Ghahramani, 2016]
Noisin: Unbiased Regularization for Recurrent Neural Networks [Dieng et al. 2018]
State-Regularized Recurrent Neural Networks [Wang and Niepert, 2019]
[ATTENTION & TRANSFORMERS]
A Decomposable Attention Model for Natural Language Inference [Parikh et al. 2016]
Hybrid Computing Using A Neural Network With Dynamic External Memory [Graves et al. 2016]
Image Transformer [Parmar et al. 2018]
Universal Transformers [Dehghani et al. 2019]
The Evolved Transformer [So et al. 2019]
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context [Dai et al. 2019]
[TIME SERIES PREDICTION]
Financial Time Series Prediction using hybrids of Chaos Theory, Multi-layer Perceptron and Multi-objective Evolutionary Algorithms [Ravi et al. 2017]
Model-free Prediction of Noisy Chaotic Time Series by Deep Learning [Yeo, 2017]
DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks [Salinas et al. 2017]
Real-Valued (Medical) Time Series Generation With Recurrent Conditional GANs [Hyland et al. 2017]
R2N2: Residual Recurrent Neural Networks for Multivariate Time Series Forecasting [Goel et al. 2017]
Temporal Pattern Attention for Multivariate Time Series Forecasting [Shih et al. 2018]
[POTPOURRI]
Unbiased Online Recurrent Optimization [Tallec and Ollivier, 2017]
Approximating real-time recurrent learning with random Kronecker factors [Mujika et al. 2018]
Theory and Algorithms for Forecasting Time Series [Kuznetsov and Mohri, 2018]
Foundations of Sequence-to-Sequence Modeling for Time Series [Kuznetsov and Mariet, 2018]
On the Variance of Unbiased Online Recurrent Optimization [Cooijmans and Martens, 2019]
Backpropagation through time and the brain [Lillicrap and Santoro, 2019]
RESOURCES
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
A review of Dropout as applied to RNNs
https://medium.com/@bingobee01/a-review-of-dropout-as-applied-to-rnns-72e79ecd5b7b
https://distill.pub/2016/augmented-rnns/
https://distill.pub/2019/memorization-in-rnns/
https://lilianweng.github.io/lil-log/2018/06/24/attention-attention.html
Using the latest advancements in deep learning to predict stock price movements
https://towardsdatascience.com/aifortrading-2edd6fac689d
How to Use Weight Regularization with LSTM Networks for Time Series Forecasting
https://machinelearningmastery.com/use-weight-regularization-lstm-networks-time-series-forecasting/

Weitere ähnliche Inhalte

Was ist angesagt?

K-means Clustering Method for the Analysis of Log Data
K-means Clustering Method for the Analysis of Log DataK-means Clustering Method for the Analysis of Log Data
K-means Clustering Method for the Analysis of Log Dataidescitation
 
Ikdd co ds2017presentation_v2
Ikdd co ds2017presentation_v2Ikdd co ds2017presentation_v2
Ikdd co ds2017presentation_v2Ram Mohan
 
Learning To Rank data2day 2017
Learning To Rank data2day 2017Learning To Rank data2day 2017
Learning To Rank data2day 2017Stefan Kühn
 
Feature Reduction Techniques
Feature Reduction TechniquesFeature Reduction Techniques
Feature Reduction TechniquesVishal Patel
 
Controversial news detection
Controversial news detectionControversial news detection
Controversial news detectionMILIND GAIKWAD
 
Dynamic approach to k means clustering algorithm-2
Dynamic approach to k means clustering algorithm-2Dynamic approach to k means clustering algorithm-2
Dynamic approach to k means clustering algorithm-2IAEME Publication
 
Introduction to Linear Discriminant Analysis
Introduction to Linear Discriminant AnalysisIntroduction to Linear Discriminant Analysis
Introduction to Linear Discriminant AnalysisJaclyn Kokx
 
Statistical Clustering
Statistical ClusteringStatistical Clustering
Statistical Clusteringtim_hare
 

Was ist angesagt? (9)

K-means Clustering Method for the Analysis of Log Data
K-means Clustering Method for the Analysis of Log DataK-means Clustering Method for the Analysis of Log Data
K-means Clustering Method for the Analysis of Log Data
 
Ikdd co ds2017presentation_v2
Ikdd co ds2017presentation_v2Ikdd co ds2017presentation_v2
Ikdd co ds2017presentation_v2
 
Learning To Rank data2day 2017
Learning To Rank data2day 2017Learning To Rank data2day 2017
Learning To Rank data2day 2017
 
Feature Reduction Techniques
Feature Reduction TechniquesFeature Reduction Techniques
Feature Reduction Techniques
 
Principal Component Analysis
Principal Component AnalysisPrincipal Component Analysis
Principal Component Analysis
 
Controversial news detection
Controversial news detectionControversial news detection
Controversial news detection
 
Dynamic approach to k means clustering algorithm-2
Dynamic approach to k means clustering algorithm-2Dynamic approach to k means clustering algorithm-2
Dynamic approach to k means clustering algorithm-2
 
Introduction to Linear Discriminant Analysis
Introduction to Linear Discriminant AnalysisIntroduction to Linear Discriminant Analysis
Introduction to Linear Discriminant Analysis
 
Statistical Clustering
Statistical ClusteringStatistical Clustering
Statistical Clustering
 

Ähnlich wie Sequence-to-Sequence Modeling for Time Series

Machine Learning for Forecasting: From Data to Deployment
Machine Learning for Forecasting: From Data to DeploymentMachine Learning for Forecasting: From Data to Deployment
Machine Learning for Forecasting: From Data to DeploymentAnant Agarwal
 
Episode 12 : Research Methodology ( Part 2 )
Episode 12 :  Research Methodology ( Part 2 )Episode 12 :  Research Methodology ( Part 2 )
Episode 12 : Research Methodology ( Part 2 )SAJJAD KHUDHUR ABBAS
 
205250 crystall ball
205250 crystall ball205250 crystall ball
205250 crystall ballp6academy
 
FPP 1. Getting started
FPP 1. Getting startedFPP 1. Getting started
FPP 1. Getting startedRob Hyndman
 
FIRE ADMIN UNIT 1 .orct121320#ffffff#fa951a#FFFFFF#e7b3513VERSON.docx
FIRE ADMIN UNIT 1 .orct121320#ffffff#fa951a#FFFFFF#e7b3513VERSON.docxFIRE ADMIN UNIT 1 .orct121320#ffffff#fa951a#FFFFFF#e7b3513VERSON.docx
FIRE ADMIN UNIT 1 .orct121320#ffffff#fa951a#FFFFFF#e7b3513VERSON.docxAKHIL969626
 
Episode 18 : Research Methodology ( Part 8 )
Episode 18 :  Research Methodology ( Part 8 )Episode 18 :  Research Methodology ( Part 8 )
Episode 18 : Research Methodology ( Part 8 )SAJJAD KHUDHUR ABBAS
 
Risk And Uncertainty Analysis: A Primer for Floodplain Managers
Risk And Uncertainty Analysis:  A Primer for Floodplain ManagersRisk And Uncertainty Analysis:  A Primer for Floodplain Managers
Risk And Uncertainty Analysis: A Primer for Floodplain ManagersMichael DePue
 
Forecasting and Quantification - Presentation - EMBA Degree Individual Assign...
Forecasting and Quantification - Presentation - EMBA Degree Individual Assign...Forecasting and Quantification - Presentation - EMBA Degree Individual Assign...
Forecasting and Quantification - Presentation - EMBA Degree Individual Assign...WilfredGitaari2
 
presentation data fusion methods ex.pptx
presentation data fusion methods ex.pptxpresentation data fusion methods ex.pptx
presentation data fusion methods ex.pptxJulius346776
 
Pillar III presentation 2 27-15 - redacted version
Pillar III presentation 2 27-15 - redacted versionPillar III presentation 2 27-15 - redacted version
Pillar III presentation 2 27-15 - redacted versionBenjamin Huston
 
Forecasting COVID-19 using Polynomial Regression and Support Vector Machine
Forecasting COVID-19 using Polynomial Regression and Support Vector MachineForecasting COVID-19 using Polynomial Regression and Support Vector Machine
Forecasting COVID-19 using Polynomial Regression and Support Vector MachineIRJET Journal
 
Final Time series analysis part 2. pptx
Final Time series analysis part 2.  pptxFinal Time series analysis part 2.  pptx
Final Time series analysis part 2. pptxSHUBHAMMBA3
 
Pentaho Meeting 2008 - Statistics & BI
Pentaho Meeting 2008 - Statistics & BIPentaho Meeting 2008 - Statistics & BI
Pentaho Meeting 2008 - Statistics & BIStudio Synthesis
 
Roberti: NEON's approach to uncertainty estimation for sensor-based measurem...
Roberti:  NEON's approach to uncertainty estimation for sensor-based measurem...Roberti:  NEON's approach to uncertainty estimation for sensor-based measurem...
Roberti: NEON's approach to uncertainty estimation for sensor-based measurem...questRCN
 
The Treatment of Uncertainty in Models
The Treatment of Uncertainty in ModelsThe Treatment of Uncertainty in Models
The Treatment of Uncertainty in ModelsIES / IAQM
 

Ähnlich wie Sequence-to-Sequence Modeling for Time Series (20)

Machine Learning for Forecasting: From Data to Deployment
Machine Learning for Forecasting: From Data to DeploymentMachine Learning for Forecasting: From Data to Deployment
Machine Learning for Forecasting: From Data to Deployment
 
Episode 12 : Research Methodology ( Part 2 )
Episode 12 :  Research Methodology ( Part 2 )Episode 12 :  Research Methodology ( Part 2 )
Episode 12 : Research Methodology ( Part 2 )
 
205250 crystall ball
205250 crystall ball205250 crystall ball
205250 crystall ball
 
FPP 1. Getting started
FPP 1. Getting startedFPP 1. Getting started
FPP 1. Getting started
 
FIRE ADMIN UNIT 1 .orct121320#ffffff#fa951a#FFFFFF#e7b3513VERSON.docx
FIRE ADMIN UNIT 1 .orct121320#ffffff#fa951a#FFFFFF#e7b3513VERSON.docxFIRE ADMIN UNIT 1 .orct121320#ffffff#fa951a#FFFFFF#e7b3513VERSON.docx
FIRE ADMIN UNIT 1 .orct121320#ffffff#fa951a#FFFFFF#e7b3513VERSON.docx
 
Episode 18 : Research Methodology ( Part 8 )
Episode 18 :  Research Methodology ( Part 8 )Episode 18 :  Research Methodology ( Part 8 )
Episode 18 : Research Methodology ( Part 8 )
 
Risk And Uncertainty Analysis: A Primer for Floodplain Managers
Risk And Uncertainty Analysis:  A Primer for Floodplain ManagersRisk And Uncertainty Analysis:  A Primer for Floodplain Managers
Risk And Uncertainty Analysis: A Primer for Floodplain Managers
 
Forecasting and Quantification - Presentation - EMBA Degree Individual Assign...
Forecasting and Quantification - Presentation - EMBA Degree Individual Assign...Forecasting and Quantification - Presentation - EMBA Degree Individual Assign...
Forecasting and Quantification - Presentation - EMBA Degree Individual Assign...
 
presentation data fusion methods ex.pptx
presentation data fusion methods ex.pptxpresentation data fusion methods ex.pptx
presentation data fusion methods ex.pptx
 
man0 ppt.pptx
man0 ppt.pptxman0 ppt.pptx
man0 ppt.pptx
 
Pillar III presentation 2 27-15 - redacted version
Pillar III presentation 2 27-15 - redacted versionPillar III presentation 2 27-15 - redacted version
Pillar III presentation 2 27-15 - redacted version
 
Forecasting COVID-19 using Polynomial Regression and Support Vector Machine
Forecasting COVID-19 using Polynomial Regression and Support Vector MachineForecasting COVID-19 using Polynomial Regression and Support Vector Machine
Forecasting COVID-19 using Polynomial Regression and Support Vector Machine
 
Qa improvement
Qa improvementQa improvement
Qa improvement
 
Final Time series analysis part 2. pptx
Final Time series analysis part 2.  pptxFinal Time series analysis part 2.  pptx
Final Time series analysis part 2. pptx
 
Change Point Analysis
Change Point AnalysisChange Point Analysis
Change Point Analysis
 
Ch9 slides
Ch9 slidesCh9 slides
Ch9 slides
 
Pentaho Meeting 2008 - Statistics & BI
Pentaho Meeting 2008 - Statistics & BIPentaho Meeting 2008 - Statistics & BI
Pentaho Meeting 2008 - Statistics & BI
 
Ch9_slides.ppt
Ch9_slides.pptCh9_slides.ppt
Ch9_slides.ppt
 
Roberti: NEON's approach to uncertainty estimation for sensor-based measurem...
Roberti:  NEON's approach to uncertainty estimation for sensor-based measurem...Roberti:  NEON's approach to uncertainty estimation for sensor-based measurem...
Roberti: NEON's approach to uncertainty estimation for sensor-based measurem...
 
The Treatment of Uncertainty in Models
The Treatment of Uncertainty in ModelsThe Treatment of Uncertainty in Models
The Treatment of Uncertainty in Models
 

Mehr von Arun Kejariwal

Anomaly Detection At The Edge
Anomaly Detection At The EdgeAnomaly Detection At The Edge
Anomaly Detection At The EdgeArun Kejariwal
 
Serverless Streaming Architectures and Algorithms for the Enterprise
Serverless Streaming Architectures and Algorithms for the EnterpriseServerless Streaming Architectures and Algorithms for the Enterprise
Serverless Streaming Architectures and Algorithms for the EnterpriseArun Kejariwal
 
Model Serving via Pulsar Functions
Model Serving via Pulsar FunctionsModel Serving via Pulsar Functions
Model Serving via Pulsar FunctionsArun Kejariwal
 
Designing Modern Streaming Data Applications
Designing Modern Streaming Data ApplicationsDesigning Modern Streaming Data Applications
Designing Modern Streaming Data ApplicationsArun Kejariwal
 
Correlation Analysis on Live Data Streams
Correlation Analysis on Live Data StreamsCorrelation Analysis on Live Data Streams
Correlation Analysis on Live Data StreamsArun Kejariwal
 
Correlation Analysis on Live Data Streams
Correlation Analysis on Live Data StreamsCorrelation Analysis on Live Data Streams
Correlation Analysis on Live Data StreamsArun Kejariwal
 
Live Anomaly Detection
Live Anomaly DetectionLive Anomaly Detection
Live Anomaly DetectionArun Kejariwal
 
Modern real-time streaming architectures
Modern real-time streaming architecturesModern real-time streaming architectures
Modern real-time streaming architecturesArun Kejariwal
 
Anomaly detection in real-time data streams using Heron
Anomaly detection in real-time data streams using HeronAnomaly detection in real-time data streams using Heron
Anomaly detection in real-time data streams using HeronArun Kejariwal
 
Data Data Everywhere: Not An Insight to Take Action Upon
Data Data Everywhere: Not An Insight to Take Action UponData Data Everywhere: Not An Insight to Take Action Upon
Data Data Everywhere: Not An Insight to Take Action UponArun Kejariwal
 
Real Time Analytics: Algorithms and Systems
Real Time Analytics: Algorithms and SystemsReal Time Analytics: Algorithms and Systems
Real Time Analytics: Algorithms and SystemsArun Kejariwal
 
Finding bad apples early: Minimizing performance impact
Finding bad apples early: Minimizing performance impactFinding bad apples early: Minimizing performance impact
Finding bad apples early: Minimizing performance impactArun Kejariwal
 
Statistical Learning Based Anomaly Detection @ Twitter
Statistical Learning Based Anomaly Detection @ TwitterStatistical Learning Based Anomaly Detection @ Twitter
Statistical Learning Based Anomaly Detection @ TwitterArun Kejariwal
 
Days In Green (DIG): Forecasting the life of a healthy service
Days In Green (DIG): Forecasting the life of a healthy serviceDays In Green (DIG): Forecasting the life of a healthy service
Days In Green (DIG): Forecasting the life of a healthy serviceArun Kejariwal
 
Gimme More! Supporting User Growth in a Performant and Efficient Fashion
Gimme More! Supporting User Growth in a Performant and Efficient FashionGimme More! Supporting User Growth in a Performant and Efficient Fashion
Gimme More! Supporting User Growth in a Performant and Efficient FashionArun Kejariwal
 
A Systematic Approach to Capacity Planning in the Real World
A Systematic Approach to Capacity Planning in the Real WorldA Systematic Approach to Capacity Planning in the Real World
A Systematic Approach to Capacity Planning in the Real WorldArun Kejariwal
 
Isolating Events from the Fail Whale
Isolating Events from the Fail WhaleIsolating Events from the Fail Whale
Isolating Events from the Fail WhaleArun Kejariwal
 
Techniques for Minimizing Cloud Footprint
Techniques for Minimizing Cloud FootprintTechniques for Minimizing Cloud Footprint
Techniques for Minimizing Cloud FootprintArun Kejariwal
 
A Tool for Practical Garbage Collection Analysis In the Cloud
A Tool for Practical Garbage Collection Analysis In the CloudA Tool for Practical Garbage Collection Analysis In the Cloud
A Tool for Practical Garbage Collection Analysis In the CloudArun Kejariwal
 

Mehr von Arun Kejariwal (20)

Anomaly Detection At The Edge
Anomaly Detection At The EdgeAnomaly Detection At The Edge
Anomaly Detection At The Edge
 
Serverless Streaming Architectures and Algorithms for the Enterprise
Serverless Streaming Architectures and Algorithms for the Enterprise
Model Serving via Pulsar Functions
Designing Modern Streaming Data Applications
Correlation Analysis on Live Data Streams
Live Anomaly Detection
Modern real-time streaming architectures
Anomaly detection in real-time data streams using Heron
Data Data Everywhere: Not An Insight to Take Action Upon
Real Time Analytics: Algorithms and Systems
Finding bad apples early: Minimizing performance impact
Velocity 2015-final
Statistical Learning Based Anomaly Detection @ Twitter
Days In Green (DIG): Forecasting the life of a healthy service
Gimme More! Supporting User Growth in a Performant and Efficient Fashion
A Systematic Approach to Capacity Planning in the Real World
Isolating Events from the Fail Whale
Techniques for Minimizing Cloud Footprint
A Tool for Practical Garbage Collection Analysis In the Cloud
 


Sequence-to-Sequence Modeling for Time Series

  • 1. Sequence-2-Sequence Learning for Time Series Forecasting. Arun Kejariwal, Ira Cohen
  • 3. TIME SERIES FORECASTING: Meteorology; Machine Translation; Operations; Transportation; Econometrics; Marketing, Sales; Finance; Speech Synthesis
  • 4. AN EXAMPLE (Figure borrowed from Brockwell and Davis.)
  • 6. FLAVORS: TIME SERIES FORECASTING (Figure borrowed from Tao et al. 2018.)
  • 8. [Gilchrist] [Hyndman, Athanasopoulos] [Box et al.] [Wilson, Keating] [Makridakis et al.] [Mallios] [Montgomery et al.] [Pankratz]
  • 10. PROPERTIES. Seasonality: multiple levels (weekly, monthly, yearly) or non-seasonal (aperiodic). Stationarity: time-varying mean and variance (heteroskedasticity), exogenous shocks. Structural: unevenly spaced, missing data, anomalies, changepoints, small sample size, skewness, kurtosis, chaos, noise. Trend: growth, virality (network effects), non-linearity.
  • 15. BACKPROPAGATION THROUGH TIME (Figure borrowed from Lillicrap and Santoro, 2019.)
  • 17. REAL-TIME RECURRENT LEARNING: A Learning Algorithm for Continually Running Fully Recurrent Neural Networks [Williams and Zipser, 1989]; A Method for Improving the Real-Time Recurrent Learning Algorithm [Catfolis, 1993]
  • 18. APPROXIMATE RTRL. UORO [Unbiased Online Recurrent Optimization]: works in a streaming fashion; online, memoryless; avoids backtracking through past activations and inputs; low-rank approximation to forward-mode automatic differentiation; reduced computation and storage. KF-RTRL [Kronecker Factored RTRL]: Kronecker product decomposition to approximate the gradients; reduces noise in the approximation (asymptotically, smaller by a factor of n); memory requirement equivalent to UORO; higher computation than UORO; not applicable to arbitrary architectures. References: Unbiased Online Recurrent Optimization [Tallec and Ollivier, 2017]; Approximating Real-Time Recurrent Learning with Random Kronecker Factors [Mujika et al. 2018]
  • 20. MEMORY-BASED RNN ARCHITECTURES. BRNN: Bi-directional RNN [Schuster and Paliwal, 1997]; GLU: Gated Linear Unit [Dauphin et al. 2016]; LSTM: Long Short-Term Memory [Hochreiter and Schmidhuber, 1996]; GRU: Gated Recurrent Unit [Cho et al. 2014]; GHN: Gated Highway Network [Zilly et al. 2017]
  • 21. Neural Computation, 1997. Gates: (a) forget gate, (b) input gate, (c) output gate. St: hidden state. “The LSTM’s main idea is that, instead of computing St from St-1 directly with a matrix-vector product followed by a nonlinearity, the LSTM directly computes ΔSt, which is then added to St-1 to obtain St.” [Jozefowicz et al. 2015] Resistant to the vanishing gradient problem; achieves better results when dropout is used; adding a bias of 1 to the LSTM’s forget gate. (Figure borrowed from http://colah.github.io/posts/2015-08-Understanding-LSTMs/)
  • 22. LONG CREDIT ASSIGNMENT PATHS. Stacking d RNNs; recurrence depth d. Incorporates Highway layers inside the recurrent transition; Highway layers in RHNs perform adaptive computation. H, T, C: non-linear transforms (T: transform, C: carry). Regularization: variational inference based dropout. (Figure borrowed from Zilly et al. 2017.)
  • 23. NEW FLAVORS OF RNNs (Figure borrowed from https://distill.pub/2016/augmented-rnns/)
  • 24. What caught your eye at first glance?
  • 25. And this one? (Figure borrowed from Golub et al. 2012.)
  • 26. Psychology, Neuroscience, Cognitive Sciences [1959] [1974] [1956] Span of absolute judgement
  • 28. ATTENTION MECHANISM (Figure borrowed from https://distill.pub/2016/augmented-rnns/)
  • 29. ATTENTION MECHANISM (Figure borrowed from Lillicrap and Santoro, 2019.)
  • 31. ATTENTION FAMILY. Self: relates different positions of a single sequence in order to compute a representation of the same sequence; also referred to as intra-attention. Global vs. Local: in global attention, the alignment weights a_t are inferred from the current target state and all the source states; in local attention, the alignment weights a_t are inferred from the current target state and only those source states inside the window. Soft vs. Hard: in soft attention, alignment weights are learned and placed “softly” over all patches in the source image; hard attention selects only one patch of the image to attend to at a time.
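The global, soft case described on slide 31 can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the dot-product scoring function, the name `soft_global_attention`, and the toy vectors are not from the slides, which do not say how the alignment weights are scored.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_global_attention(target_state, source_states):
    """Global soft attention: weight ALL source states against one target state.

    Assumption: alignment is scored with a plain dot product.
    """
    scores = [
        sum(t * s for t, s in zip(target_state, src)) for src in source_states
    ]
    weights = softmax(scores)
    dim = len(target_state)
    # The context vector is the weighted average of the source states.
    context = [
        sum(w * src[i] for w, src in zip(weights, source_states))
        for i in range(dim)
    ]
    return weights, context


# Toy example (hypothetical values).
source_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
target_state = [2.0, 0.0]
weights, context = soft_global_attention(target_state, source_states)
```

A local variant would apply the same function to only the slice of `source_states` inside the window, and a hard variant would pick a single source state instead of averaging.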
  • 32. ATTENTION-BASED MODELS [A SNAPSHOT]: Sparse Attentive Backtracking [Ke et al. 2018]; Hierarchical Attention-Based RHN [Tao et al. 2018]; Long Short-Term Memory-Networks [Cheng et al. 2016]; Self-Attention GAN [Zhang et al. 2018]
  • 33. HIERARCHICAL ATTENTION-BASED RECURRENT HIGHWAY NETWORK (Figure borrowed from Tao et al. 2018.)
  • 34. SPARSE ATTENTIVE BACKTRACKING: TCA THROUGH REMINDING. Inspired by the cognitive analogy of reminding; designed to retrieve one or very few past states. Incorporates a differentiable, sparse (hard) attention mechanism to select from past states. (Figure borrowed from Ke et al. 2018.)
  • 35. HEALTH CARE. Multi-variate: sensor measurements, test results; irregular sampling, missing values and measurement errors; heterogeneous, presence of long-range dependencies. Multi-head attention: additional masking to enable causality. Temporal ordering: positional encoding and dense interpolation embedding. Inference: diagnoses, length of stay, future illness, mortality. (Figure borrowed from Song et al. 2018.)
  • 36. TIME SERIES FORECASTING: ON THE ROLE OF PRE-PROCESSING TO GET IT RIGHT
  • 37. ANODOT MISSION: MAKING BI AUTONOMOUS. Auto ML; Trend, Anomaly, Root Cause, Forecast, What If, Optimization; Real-time; No Code; Business Monitoring; Business Forecast; No Data Scientist
  • 39. FORECAST USE CASES. DEMAND FORECAST: anticipate demand for inventory, products, service calls and much more. GROWTH FORECAST: anticipate revenue growth, expenses, cash flow and other KPIs. Departments: Fintech / Treasury; Transportation / Data Science; Transportation / Business Operations; All Industries / Finance. Questions: How many drivers will I need tomorrow? How much funding do I need to allocate per currency? Will we hit our targets next quarter?
  • 40. AI-POWERED FORECASTING IN A TURN-KEY EXPERIENCE
  • 41. PRODUCT COMPONENTS: Correlate with Public Data
  • 42. CONSIDERATIONS FOR ACCURATE FORECAST: 1. Discovering influencing metrics and events; 2. Ensemble of models; 3. Identify and account for data anomalies; 4. Identify and account for different time series behaviors
  • 43. HOW TO DISCOVER INFLUENCING METRICS/EVENTS? INPUT: target time series + forecast horizon; millions of measures/events that can be used as features. PROCEDURE: Step 1, compute the correlation between the target and each measure/event (shifted by the horizon); Step 2, choose the X most correlated measures; Step 3, train the forecast model. CHALLENGES: Step 1 is computationally expensive for long sequences (use LSH for speed); which correlation function to use?
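Steps 1 and 2 of the procedure on slide 43 can be sketched as follows. This is a minimal illustration, not the presenters' implementation: it assumes Pearson correlation as the correlation function (the slide leaves that choice open), does a brute-force scan with no LSH speed-up, and uses made-up names (`top_lagged_features`) and toy data.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def top_lagged_features(target, candidates, horizon, top_x):
    """Rank candidates by |corr(candidate[t], target[t + horizon])| and keep top_x."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    scores = {}
    for name, series in candidates.items():
        # Step 1: pair each candidate value with the target `horizon` steps later.
        past = series[:-horizon]
        future = target[horizon:]
        scores[name] = abs(pearson(past, future))
    # Step 2: keep the X most correlated measures.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_x], scores


# Toy data (hypothetical): "leading" moves in lockstep with the target.
target = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
candidates = {
    "leading": [2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
    "flat": [5] * 10,
    "noise": [3, 1, 4, 1, 5, 9, 2, 6, 5, 3],
}
selected, scores = top_lagged_features(target, candidates, horizon=2, top_x=1)
```

The selected measures would then be passed as features to the forecast model in Step 3. With millions of candidates, the exhaustive loop is the part the slide suggests replacing with LSH.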
  • 44. THE EFFECT ON ACCURACY
  • 45. IDENTIFYING AND ACCOUNTING FOR DATA ANOMALIES. Anomalies degrade forecasting accuracy. How to remedy the situation? Discover anomalies and use the information to create new features. Case 1: anomalies can be explained by external factors – enhance the anomalies. Case 2: anomalies can’t be explained by external factors – weight down the anomalies.
  • 46. IDENTIFYING AND ACCOUNTING FOR DATA ANOMALIES. SOLUTION: discover anomalies and use the information to create new features. Case 1: anomalies can be explained by external factors – enhance the anomalies. Case 2: anomalies can’t be explained by external factors – weight down the anomalies.
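One way to read the two cases on slides 45 and 46 is sketched below. Everything concrete here is an assumption made for illustration: the slides do not say how anomalies are detected (a simple median/MAD rule stands in), how "enhancing" is encoded (an indicator feature is used), or how much to down-weight (0.1 is arbitrary).

```python
from statistics import median


def flag_anomalies(series, threshold=3.0):
    """Flag points far from the median, measured in MAD units (stand-in detector)."""
    med = median(series)
    mad = median(abs(v - med) for v in series)
    if mad == 0:
        return [False] * len(series)
    return [abs(v - med) / mad > threshold for v in series]


def anomaly_features(series, explained, down_weight=0.1):
    """Build an event feature (case 1) and per-sample training weights (case 2).

    `explained[i]` is True when an external factor accounts for point i.
    """
    flags = flag_anomalies(series)
    # Case 1: explained anomaly -> expose it to the model as an indicator feature.
    event_feature = [1 if f and e else 0 for f, e in zip(flags, explained)]
    # Case 2: unexplained anomaly -> reduce its influence during training.
    weights = [down_weight if f and not e else 1.0 for f, e in zip(flags, explained)]
    return event_feature, weights


# Toy series (hypothetical): the spike at index 4 coincides with a known event,
# the spike at index 7 has no known cause.
series = [10, 11, 10, 12, 50, 11, 10, 40, 11, 10]
explained = [False, False, False, False, True, False, False, False, False, False]
event_feature, weights = anomaly_features(series, explained)
```

The `weights` list could be passed as sample weights to whichever forecast model is trained, and `event_feature` joined to the input features.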
  • 47. IDENTIFYING AND ACCOUNTING FOR DATA ANOMALIES: RESULT 1-15% accuracy improvement
  • 48. HANDLING VARIATIONS IN TIME SERIES BEHAVIORS. Varying behaviors: seasonality (length/strength), stationarity, trends, sparseness, spikiness, … Features (Rob Hyndman, tsfeatures, https://pkg.robjhyndman.com/tsfeatures): length, trend, seasonal strength, linearity, curvature, e_acf1, e_acf10, peak, trough, stability, entropy, x_acf1, x_acf10, diff1_acf1, diff1_acf10, diff2_acf1, diff2_acf10, seas_acf1, arch_acf, garch_acf, arch_r2, garch_r2, hurst, lumpiness, spike, max_level_shift, time_level_shift, max_var_shift, time_var_shift, max_kl_shift, time_kl_shift, unitroot_kpss, unitroot_pp, x_pacf5, diff1x_pacf5, diff2x_pacf5, seas_pacf, crossing_points, flat_spots
  • 49. A SINGLE MODEL FOR THEM ALL? POTENTIAL ADVANTAGES: train one model for many time series; less data required per time series. OPEN QUESTIONS: Will a single model be more accurate than individual ones? Which types of differing behaviors impact the ability to train a single model adversely, and which do not?
  • 50. A SINGLE MODEL FOR THEM ALL? TESTING THE IMPACT OF EACH BEHAVIOR TYPE. Dataset: compute the strength of the behavior for each TS; high-strength TS and low-strength TS. Train: an LSTM for each TS, or one LSTM for all TS. Benchmark: forecast a horizontal line; benchmark loss (absolute error). Scores 1 to 5: model loss / benchmark loss.
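The score on slide 50 is the model loss divided by the loss of a horizontal-line benchmark, with absolute error as the loss. A small sketch of that ratio follows; it assumes the horizontal line sits at the last training value (the slide does not say where the line is placed), and the function names and numbers are illustrative only.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def relative_score(train, actual, model_forecast):
    """Model loss / benchmark loss; values below 1 mean the model beats the benchmark."""
    # Assumption: the horizontal-line benchmark repeats the last training value.
    baseline = [train[-1]] * len(actual)
    benchmark_loss = mean_absolute_error(actual, baseline)
    if benchmark_loss == 0:
        raise ValueError("benchmark is perfect; the ratio is undefined")
    model_loss = mean_absolute_error(actual, model_forecast)
    return model_loss / benchmark_loss


# Toy numbers (hypothetical).
train = [10, 12, 11, 13]
actual = [14, 15, 16]
model_forecast = [13.5, 15.5, 16.5]
score = relative_score(train, actual, model_forecast)
```

Dividing by the benchmark loss puts series with very different scales on a common footing, which is what allows scores to be compared across the high-strength and low-strength groups.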
  • 51. A SINGLE MODEL FOR THEM ALL? TESTING THE IMPACT OF EACH BEHAVIOR TYPE (by feature). Score 1: low; Score 2: low; Score 3: low/high; Score 4: high; Score 5: high. Axes: impact of the behavior on mixed training; impact of the behavior on ability to forecast.
  • 52. Axes: impact on accuracy for joint training; impact on accuracy for variability of the behavior. Features: seasonal_strength, curvature, x_pacf5, linearity, hurst, x_acf1, entropy, max_level_shift, time_level_shift, max_var_shift, time_kl_shift, unitroot_kpss, unitroot_pp, seasonal frequency, arch acf, garch_acf, seas_pacf, trough, peak, stability, lumpiness, diff2_acf10, e_acf10, diff1_acf10, x_acf10, arch_acf, max_kl_shift. Highlighted: 1. Homoscedasticity; 2. Seasonality.
  • 53. MAIN CONCLUSIONS / SOLUTIONS. (Axes: impact on accuracy for joint training; impact on accuracy for variability of the behavior.) Two main factors prevent simple training of single models. Seasonality: the frequency is the important factor, not the shape. Homoscedasticity (same variance): prevents mixing, but its strength impacts accuracy overall. Other behaviors have lower mixing impact. SOLUTIONS: separate TS for training based on behavior; embed behavior-related features for a single model training.
  • 54. KEY TAKEAWAYS. 1. Requires efficient feature selection; 2. Preprocessing before training boosts forecast accuracy; 3. Seasonality and homoscedasticity are the key behaviors impacting ability to train joint models. 1. Discovering influencing metrics and events; 2. Identify and account for data anomalies; 3. Identify and account for different time series behaviors.
  • 56. READINGS [BOOKS]: [Rosenblatt] Principles of Neurodynamics: Perceptrons and the theory of brain mechanisms; [Eds. Anderson and Rosenfeld] Neurocomputing: Foundations of Research; [Eds. Rumelhart and McClelland] Parallel Distributed Processing; [Werbos] The Roots of Backpropagation: From Ordered Derivatives to Neural Networks and Political Forecasting; [Eds. Chauvin and Rumelhart] Backpropagation: Theory, Architectures and Applications; [Rojas] Neural Networks: A Systematic Introduction
  • 57. READINGS [EARLY WORKS]: Perceptrons [Minsky and Papert, 1969]; Une procedure d'apprentissage pour reseau a seuil assymetrique [Le Cun, 1985]; The problem of serial order in behavior [Lashley, 1951]; Beyond regression: New tools for prediction and analysis in the behavioral sciences [Werbos, 1974]; Connectionist models and their properties [Feldman and Ballard, 1982]; Learning-logic [Parker, 1985]
  • 58. READINGS [BACKPROPAGATION]: Learning internal representations by error propagation [Rumelhart, Hinton, and Williams, Chapter 8 in D. Rumelhart and J. McClelland, Eds., Parallel Distributed Processing, Vol. 1, 1986] (Generalized Delta Rule); Generalization of backpropagation with application to a recurrent gas market model [Werbos, 1988]; Generalization of backpropagation to recurrent and higher order networks [Pineda, 1987]; Backpropagation in perceptrons with feedback [Almeida, 1987]; Second-order backpropagation: Implementing an optimal O(n) approximation to Newton's method in an artificial neural network [Parker, 1987]; Learning phonetic features using connectionist networks: an experiment in speech recognition [Watrous and Shastri, 1987] (Time-delay NN)
  • 59. READINGS [BACKPROPAGATION]: Backpropagation: Past and future [Werbos, 1988]; Adaptive state representation and estimation using recurrent connectionist networks [Williams, 1990]; Generalization of back propagation to recurrent and higher order neural networks [Pineda, 1988]; Learning state space trajectories in recurrent neural networks [Pearlmutter, 1989]; Parallelism, hierarchy, scaling in time-delay neural networks for spotting Japanese phonemes/CV-syllables [Sawai et al. 1989]; The role of time in natural intelligence: implications for neural network and artificial intelligence research [Klopf and Morgan, 1990]
  • 60. READINGS [REGULARIZATION of RNNs]: Recurrent Neural Network Regularization [Zaremba et al. 2014]; Regularizing RNNs by Stabilizing Activations [Krueger and Memisevic, 2016]; Sampling-based Gradient Regularization for Capturing Long-Term Dependencies in Recurrent Neural Networks [Chernodub and Nowicki, 2016]; A Theoretically Grounded Application of Dropout in Recurrent Neural Networks [Gal and Ghahramani, 2016]; Noisin: Unbiased Regularization for Recurrent Neural Networks [Dieng et al. 2018]; State-Regularized Recurrent Neural Networks [Wang and Niepert, 2019]
  • 61. READINGS [ATTENTION & TRANSFORMERS]: A Decomposable Attention Model for Natural Language Inference [Parikh et al. 2016]; Hybrid Computing Using A Neural Network With Dynamic External Memory [Graves et al. 2017]; Image Transformer [Parmar et al. 2018]; Universal Transformers [Dehghani et al. 2019]; The Evolved Transformer [So et al. 2019]; Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context [Dai et al. 2019]
  • 62. READINGS [TIME SERIES PREDICTION]: Financial Time Series Prediction using hybrids of Chaos Theory, Multi-layer Perceptron and Multi-objective Evolutionary Algorithms [Ravi et al. 2017]; Model-free Prediction of Noisy Chaotic Time Series by Deep Learning [Yeo, 2017]; DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks [Salinas et al. 2017]; Real-Valued (Medical) Time Series Generation With Recurrent Conditional GANs [Hyland et al. 2017]; R2N2: Residual Recurrent Neural Networks for Multivariate Time Series Forecasting [Goel et al. 2017]; Temporal Pattern Attention for Multivariate Time Series Forecasting [Shih et al. 2018]
  • 63. READINGS [POTPOURRI]: Unbiased Online Recurrent Optimization [Tallec and Ollivier, 2017]; Approximating real-time recurrent learning with random Kronecker factors [Mujika et al. 2018]; Theory and Algorithms for Forecasting Time Series [Kuznetsov and Mohri, 2018]; Foundations of Sequence-to-Sequence Modeling for Time Series [Kuznetsov and Mariet, 2018]; On the Variance of Unbiased Online Recurrent Optimization [Cooijmans and Martens, 2019]; Backpropagation through time and the brain [Lillicrap and Santoro, 2019]
  • 64. RESOURCES: http://colah.github.io/posts/2015-08-Understanding-LSTMs/; http://karpathy.github.io/2015/05/21/rnn-effectiveness/; A review of Dropout as applied to RNNs, https://medium.com/@bingobee01/a-review-of-dropout-as-applied-to-rnns-72e79ecd5b7b; https://distill.pub/2016/augmented-rnns/; https://distill.pub/2019/memorization-in-rnns/; https://lilianweng.github.io/lil-log/2018/06/24/attention-attention.html; Using the latest advancements in deep learning to predict stock price movements, https://towardsdatascience.com/aifortrading-2edd6fac689d; How to Use Weight Regularization with LSTM Networks for Time Series Forecasting, https://machinelearningmastery.com/use-weight-regularization-lstm-networks-time-series-forecasting/