Data Analysis Course
Time Series Analysis & Forecasting
Venkat Reddy
Contents
• ARIMA
• Stationarity
• AR process
• MA process
• Main steps in ARIMA
• Forecasting using ARIMA model
• Goodness of fit
Drawbacks of traditional models
• There is no systematic approach for the identification and
selection of an appropriate model, and therefore, the
identification process is mainly trial-and-error
• There is difficulty in verifying the validity of the model
• Most traditional methods were developed from intuitive and
practical considerations rather than from a statistical foundation
ARIMA Models
• Autoregressive Integrated Moving-average
• A “stochastic” modeling approach that can be used to
calculate the probability of a future value lying between two
specified limits
AR & MA Models
• Autoregressive AR process:
• The series' current value depends on its own previous values
• AR(p) - the current value depends on its own p previous values
• p is the order of the AR process
• Moving average MA process:
• The current deviation from mean depends on previous deviations
• MA(q) - the current deviation from mean depends on q previous deviations
• q is the order of the MA process
• Autoregressive Moving average ARMA process
AR Process
AR(1) yt = a1*yt-1 + εt
AR(2) yt = a1*yt-1 + a2*yt-2 + εt
AR(3) yt = a1*yt-1 + a2*yt-2 + a3*yt-3 + εt
MA Process
MA(1) yt = εt + b1*εt-1
MA(2) yt = εt + b1*εt-1 + b2*εt-2
MA(3) yt = εt + b1*εt-1 + b2*εt-2 + b3*εt-3
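For intuition, a minimal SAS data-step sketch that simulates an AR(1) and an MA(1) series of the form shown above (the seed, coefficients, and dataset name are illustrative, not from the course data):

data sim_ar1_ma1;
   call streaminit(1234);      /* illustrative seed */
   y_ar = 0; e_prev = 0;       /* starting value and previous shock */
   do t = 1 to 200;
      e = rand('normal');      /* white-noise error term */
      y_ar = 0.8*y_ar + e;     /* AR(1): yt = 0.8*yt-1 + εt */
      y_ma = e + 0.7*e_prev;   /* MA(1): yt = εt + 0.7*εt-1 */
      e_prev = e;
      output;
   end;
run;

Running PROC ARIMA's IDENTIFY statement on y_ar and y_ma should reproduce the ACF/PACF patterns used for identification later in the deck.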
ARIMA Models
• Autoregressive (AR) process:
• The series' current value depends on its own previous values
• Moving average (MA) process:
• The current deviation from mean depends on previous deviations
• Autoregressive Moving average (ARMA) process
• Autoregressive Integrated Moving average (ARIMA) process
• ARIMA is also known as the Box-Jenkins approach. It is popular because of its generality: it can handle any series, with or without seasonal elements, and it has well-documented computer programs
ARIMA Model
→ AR filter → Integration filter → MA filter → εt
(long term) (stochastic trend) (short term) (white noise error)
ARIMA (2,0,1) yt = a1yt-1 + a2yt-2 + εt + b1εt-1
ARIMA (3,0,1) yt = a1yt-1 + a2yt-2 + a3yt-3 + εt + b1εt-1
ARIMA (1,1,0) Δyt = a1 Δyt-1 + εt , where Δyt = yt - yt-1
ARIMA (2,1,0) Δyt = a1 Δyt-1 + a2 Δyt-2 + εt , where Δyt = yt - yt-1
To build a time series model using ARIMA, we need to study the time series and identify p, d, q
ARIMA equations
• ARIMA(1,0,0)
• yt = a1yt-1 + εt
• ARIMA(2,0,0)
• yt = a1yt-1 + a2yt-2 + εt
• ARIMA (2,1,1)
• Δyt = a1 Δyt-1 + a2 Δyt-2 + εt + b1εt-1 where Δyt = yt - yt-1
Overall Time Series Analysis & Forecasting Process
• Prepare the data for model building - make it stationary
• Identify the model type
• Estimate the parameters
• Forecast the future values
ARIMA (p,d,q) modeling
To build a time series model using ARIMA, we need to study the time series and identify p, d, q
• Ensuring Stationarity
• Determine the appropriate values of d
• Identification:
• Determine the appropriate values of p & q using the ACF, PACF, and
unit root tests
• p is the AR order, d is the integration order, q is the MA order
• Estimation:
• Estimate an ARIMA model using the values of p, d, & q you think are appropriate.
• Diagnostic checking:
• Check the residuals of the estimated ARIMA model(s) to see if they are white noise; pick the best model with well-behaved residuals.
• Forecasting:
• Produce out-of-sample forecasts, or set aside the last few data points for in-sample forecasting.
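These four steps map onto a single PROC ARIMA run, using the same statements that appear in the demos later in this deck; a minimal sketch, with the dataset, variable name, and orders as placeholders:

proc arima data=my_series plots=all;
   /* d: difference once and run the ADF unit-root test; p & q: ACF/PACF plus SCAN and ESACF tables */
   identify var=y(1) scan esacf stationarity=(adf=(0,1,2));
   /* fit a candidate ARIMA(p,d,q); residual diagnostics and AIC/SBC are printed with the estimates */
   estimate p=2 q=1 noint method=ml;
   /* out-of-sample forecasts with standard errors and 95% limits */
   forecast lead=4 out=my_forecasts;
run;

Here p=2, q=1 and the single difference are placeholders; in practice they come out of the identification and diagnostic-checking steps above.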
The Box-Jenkins Approach
1. Difference the series to achieve stationarity
2. Identify the model
3. Estimate the parameters of the model
Diagnostic checking: is the model adequate? If no, return to step 2.
4. If yes, use the model for forecasting
Step 1: Stationarity
Some non-stationary series
[Figure: four example plots of non-stationary series]
Stationarity
• In order to model a time series with the Box-Jenkins approach,
the series has to be stationary
• In practical terms, the series is stationary if it tends to wander more or less uniformly about some fixed level
• In statistical terms, a stationary process is assumed to be in a particular state of statistical equilibrium, i.e., p(xt) is the same for all t
• In particular, if zt is a stationary process, then the first difference ∇zt = zt - zt-1 and higher differences ∇^d zt are stationary
Testing Stationarity
• Dickey-Fuller test
• The p-value has to be less than 0.05 (5%)
• If the p-value is greater than 0.05, you fail to reject the null hypothesis and conclude that the time series has a unit root.
• In that case, you should first-difference the series before proceeding with the analysis.
• What is the DF test?
• Imagine a series where a fraction of the current value depends on a fraction of the previous value of the series.
• DF fits a regression of the change Δyt on the previous value yt-1 (Δyt = δyt-1 + εt) and tests whether δ = 0
• The usual t-statistic is not valid here, so Dickey and Fuller developed appropriate critical values. If the p-value of the DF test is < 5%, then the series is stationary
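In SAS, the ADF test shown in the following demo can be requested directly on the IDENTIFY statement; a minimal sketch on the Sales_1 data (the NLAG= value is arbitrary):

proc arima data=sales_1;
   identify var=sales nlag=12 stationarity=(adf=(0,1,2)); /* prints the Augmented Dickey-Fuller table */
run;

The Pr < Tau columns carry the p-values to compare against 0.05.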
Demo: Testing Stationarity
• Sales_1 data
Stochastic trend: Inexplicable changes in direction
Demo: Testing Stationarity
Augmented Dickey-Fuller Unit Root Tests

Type          Lags        Rho   Pr < Rho     Tau   Pr < Tau      F   Pr > F
Zero Mean        0     0.3251     0.7547    0.74     0.8695
                 1     0.3768     0.7678    1.26     0.9435
                 2     0.3262     0.7539    1.05     0.9180
Single Mean      0    -6.9175     0.2432   -1.77     0.3858   2.05   0.5618
                 1    -3.5970     0.5662   -1.06     0.7163   1.52   0.6913
                 2    -3.7030     0.5522   -0.88     0.7783   1.02   0.8116
Trend            0   -11.8936     0.2428   -2.50     0.3250   3.16   0.5624
                 1    -7.1620     0.6017   -1.60     0.7658   1.34   0.9063
                 2    -9.0903     0.4290   -1.53     0.7920   1.35   0.9041
Achieving Stationarity
• Differencing : Transformation of the series to a new time series
where the values are the differences between consecutive values
• Procedure may be applied consecutively more than once, giving rise
to the "first differences", "second differences", etc.
• Regular differencing (RD)
(1st order) ∇xt = xt - xt-1
(2nd order) ∇²xt = ∇(xt - xt-1) = xt - 2xt-1 + xt-2
• It is unlikely that more than two regular differencings would ever be needed
• Sometimes regular differencing by itself is not sufficient and a prior transformation is also needed
Differencing
[Figure: the actual series and the series after differencing]
Demo: Achieving Stationarity
data lagsales_1;
set sales_1;
sales1 = sales - lag1(sales); /* first difference of sales */
run;
Augmented Dickey-Fuller Unit Root Tests

Type          Lags        Rho   Pr < Rho     Tau   Pr < Tau       F   Pr > F
Zero Mean        0   -37.7155     <.0001   -7.46     <.0001
                 1   -32.4406     <.0001   -3.93     0.0003
                 2   -19.3900     0.0006   -2.38     0.0191
Single Mean      0   -38.9718     <.0001   -7.71     0.0002   29.70   0.0010
                 1   -37.3049     <.0001   -4.10     0.0036    8.43   0.0010
                 2   -25.6253     0.0002   -2.63     0.0992    3.50   0.2081
Trend            0   -39.0703     <.0001   -7.58     0.0001   28.72   0.0010
                 1   -37.9046     <.0001   -4.08     0.0180    8.35   0.0163
                 2   -25.7179     0.0023   -2.59     0.2875    3.37   0.5234
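The same first difference can also be taken inside PROC ARIMA, without a separate DATA step, by attaching the differencing order to the variable; a minimal sketch on the same data:

proc arima data=sales_1;
   identify var=sales(1) stationarity=(adf=(0,1,2)); /* sales(1) = first-differenced series */
run;

The (1) here is exactly what the d in ARIMA(p,d,q) encodes.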
Achieving Stationarity - Other methods
• Is the trend stochastic or deterministic?
• If stochastic (inexplicable changes in direction): use differencing
• If deterministic (plausible physical explanation for a trend or seasonal cycle): use regression
• Check if there is variance that changes with time
• YES : make variance constant with log or square root
transformation
• Remove the trend in mean with:
• 1st/2nd order differencing
• Smoothing and differencing (seasonality)
• If there is seasonality in the data:
• Moving average and differencing
• Smoothing
Step 2: Identification
Identification of orders p and q
• Identification starts with d
• ARIMA(p,d,q)
• What is Integration here?
• First we need to make the time series stationary
• We need to learn about ACF & PACF to identify p,q
• Once we are working with a stationary time series, we can
examine the ACF and PACF to help identify the proper number
of lagged y (AR) terms and ε (MA) terms.
Autocorrelation Function (ACF)
• Autocorrelation is a correlation coefficient. However, instead
of correlation between two different variables, the correlation
is between two values of the same variable at times Xi and
Xi+k.
• Correlation with lag-1, lag2, lag3 etc.,
• The ACF represents the degree of persistence over respective
lags of a variable.
ρk = γk / γ0 = covariance at lag k / variance
ρk = E[(yt - μ)(yt-k - μ)] / E[(yt - μ)²]
ACF(0) = 1, ACF(k) = ACF(-k)
ACF Graph
[Figure: sample ACF over lags 0-40, with Bartlett's formula MA(q) 95% confidence bands]
Partial Autocorrelation Function (PACF)
• The exclusive correlation coefficient
• Partial regression coefficient - The lag k partial autocorrelation is
the partial regression coefficient, θkk in the kth order auto regression
• In general, the "partial" correlation between two variables is the
amount of correlation between them which is not explained by their
mutual correlations with a specified set of other variables.
• For example, if we are regressing a variable Y on other variables X1,
X2, and X3, the partial correlation between Y and X3 is the amount
of correlation between Y and X3 that is not explained by their
common correlations with X1 and X2.
• yt = θk1yt-1 + θk2yt-2 + …+ θkkyt-k + εt
• Partial correlation measures the degree of association between
two random variables, with the effect of a set of controlling random
variables removed.
PACF Graph
[Figure: sample PACF over lags 0-40, with 95% confidence bands, se = 1/sqrt(n)]
Identification of AR Processes & its order - p
• For AR models, the ACF will dampen exponentially
• The PACF will identify the order of the AR model:
• The AR(1) model (yt = a1yt-1 + εt) would have one significant spike
at lag 1 on the PACF.
• The AR(3) model (yt = a1yt-1+a2yt-2+a3yt-3+εt) would have
significant spikes on the PACF at lags 1, 2, & 3.
AR(1) model
yt = 0.8yt-1 + εt
AR(1) model
yt = 0.77yt-1 + εt
AR(1) model
yt = 0.95yt-1 + εt
AR(2) model
yt = 0.44yt-1 + 0.4yt-2 + εt
AR(2) model
yt = 0.5yt-1 + 0.2yt-2 + εt
AR(3) model
yt = 0.3yt-1 + 0.3yt-2 + 0.1yt-3 + εt
Identification of MA Processes & its order - q
• Recall that a MA(q) can be represented as an AR(∞), thus we expect the
opposite patterns for MA processes.
• The PACF will dampen exponentially.
• The ACF will be used to identify the order of the MA process.
• MA(1) (yt = εt + b1 εt-1) has one significant spike in the ACF at lag 1.
• MA (3) (yt = εt + b1 εt-1 + b2 εt-2 + b3 εt-3) has three significant spikes in the
ACF at lags 1, 2, & 3.
MA(1)
yt = εt - 0.9εt-1
MA(1)
yt = εt + 0.7εt-1
MA(1)
yt = εt + 0.99εt-1
MA(2)
yt = εt + 0.5εt-1 + 0.5εt-2
MA(2)
yt = εt + 0.8εt-1 + 0.9εt-2
MA(3)
yt = εt + 0.8εt-1 + 0.9εt-2 + 0.6εt-3
ARMA(1,1)
yt = 0.6yt-1 + εt + 0.8εt-1
ARMA(1,1)
yt = 0.78yt-1 + εt + 0.9εt-1
ARMA(2,1)
yt = 0.4yt-1 + 0.3yt-2 + εt + 0.9εt-1
ARMA(1,2)
yt = 0.8yt-1 + εt + 0.4εt-1 + 0.55εt-2
ARMA Model Identification
[Table summarizing the identification rules above: AR(p) - ACF dampens, PACF cuts off after lag p; MA(q) - ACF cuts off after lag q, PACF dampens; ARMA(p,q) - both dampen]
Demo1: Identification of the model
• ACF is dampening, PACF graph cuts off - a perfect example of an AR process
proc arima data=chem_readings plots=all;
identify var=reading scan esacf center;
run;
Demo: Identification of the model
PACF cuts off after lag 2
d = 0, p = 2, q = 0
SAS ARMA(p+d,q) Tentative Order Selection Tests

       SCAN              ESACF
    p+d     q         p+d     q
      2     0           2     3
      1     5           4     4
                        5     3
yt = a1yt-1 + a2yt-2 + εt
LAB: Identification of model
• Download the web views data
• Use sgplot to create a trend chart (a minimal sketch follows this list)
• What do the ACF & PACF graphs say?
• Identify the model using the identification table above
• Write the model equation
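For the trend-chart step above, a minimal sgplot sketch; the dataset name webviews and the variables date and views are placeholders for whatever the downloaded data actually contains:

proc sgplot data=webviews;
   series x=date y=views; /* simple trend chart of the series */
run;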
Step 3: Estimation
Parameter Estimation
• We already know the model form: ARIMA(1,0,0), ARIMA(2,1,0) or ARIMA(2,1,1)
• We need to estimate the coefficients by least squares (minimizing the sum of squared deviations) or by maximum likelihood, as in the demo below
Demo1: Parameter Estimation
• Chemical reading data
proc arima data=chem_readings;
identify var=reading scan esacf center;
estimate p=2 q=0 noint method=ml;
run;
yt = 0.424yt-1 + 0.2532yt-2 + εt
Maximum Likelihood Estimation

Parameter   Estimate   Standard Error   t Value   Approx Pr > |t|   Lag
AR1,1        0.42444          0.06928      6.13            <.0001     1
AR1,2        0.25315          0.06928      3.65            0.0003     2
Lab: Parameter Estimation
• Estimate the parameters for webview data
Step 4: Forecasting
Forecasting
• Now the model is ready
• We simply need to use this model for forecasting
proc arima data=chem_readings;
identify var=reading scan esacf center;
estimate p=2 q=0 noint method=ml;
forecast lead=4;
run;
Forecasts for variable Reading

Obs   Forecast   Std Error   95% Confidence Limits
198    17.2405      0.3178     16.6178    17.8633
199    17.2235      0.3452     16.5469    17.9000
200    17.1759      0.3716     16.4475    17.9043
201    17.1514      0.3830     16.4007    17.9020
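To keep the forecasts for plotting or validation, the FORECAST statement can write them to a dataset; a sketch under the assumption that the OUT= dataset contains the usual FORECAST, L95, U95 and RESIDUAL columns alongside the original READING variable:

proc arima data=chem_readings;
   identify var=reading scan esacf center;
   estimate p=2 q=0 noint method=ml;
   forecast lead=4 out=fc;             /* keep actuals, forecasts and limits */
run;

data fc;                               /* add a simple plotting index */
   set fc;
   obs = _n_;
run;

proc sgplot data=fc;
   band x=obs lower=l95 upper=u95;     /* 95% confidence band */
   series x=obs y=reading;             /* actual readings */
   series x=obs y=forecast;            /* one-step fits and the 4 forecasts */
run;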
LAB: Forecasting using ARIMA
• Forecast the number of sunspots for next three hours
Validation: How good is my model?
• Does our model really give an adequate description of the data?
• Two criteria to check the goodness of fit:
• Akaike information criterion (AIC)
• Schwarz Bayesian criterion (SBC), also known as the Bayesian information criterion (BIC)
• These two measures are useful in comparing two models
• The smaller the AIC & SBC, the better the model
Goodness of fit
• Remember… residual analysis and mean deviation, mean absolute deviation and root mean square errors?
• Four common measures are:
• Mean absolute deviation (MAD)
• Mean absolute percent error (MAPE)
• Mean square error (MSE)
• Root mean square error (RMSE)
MAD = (1/n) Σ |Yi - Ŷi|
MSE = (1/n) Σ (Yi - Ŷi)²
RMSE = √MSE
MAPE = (100/n) Σ |Yi - Ŷi| / Yi
(sums taken over i = 1, …, n)
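A sketch of computing these measures from the residuals kept in the OUT= dataset of the forecasting demo (assumes the fc dataset and its RESIDUAL column from the earlier sketch; column names should be checked against your output):

proc sql;
   select mean(abs(residual))                 as MAD,
          mean(residual**2)                   as MSE,
          sqrt(calculated MSE)                as RMSE,
          mean(abs(residual / reading)) * 100 as MAPE
   from fc
   where residual is not null;                /* forecast-only rows carry no residual */
quit;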
Lab: Overall Steps on the sunspot example
• Import the time series data
• Prepare the data for model building - make it stationary
• Identify the model type
• Estimate the parameters
• Forecast the future values
Thank you