1. State space methods for signal extraction, parameter estimation and forecasting
Siem Jan Koopman
http://personal.vu.nl/s.j.koopman
Department of Econometrics
VU University Amsterdam
Tinbergen Institute
2012
2. Regression Model in time series
For a time series of a dependent variable yt and a set of regressors
Xt , we can consider the regression model
yt = µ + Xt β + εt , εt ∼ NID(0, σ²), t = 1, . . . , n.
For large n, is it reasonable to assume constant coefficients µ, β
and σ²?
If we have strong reasons to believe that µ and β are time-varying,
the model may be written as
yt = µt + Xt βt + εt , µt+1 = µt + ηt , βt+1 = βt + ξt .
This is an example of a linear state space model.
3. State Space Model
The linear Gaussian state space model is defined in three parts:
→ State equation:
αt+1 = Tt αt + Rt ζt , ζt ∼ NID(0, Qt ),
→ Observation equation:
yt = Zt αt + εt , εt ∼ NID(0, Ht ),
→ Initial state distribution α1 ∼ N (a1 , P1 ).
Notice that
• ζt and εs are independent for all t, s, and independent of α1 ;
• observation yt can be multivariate;
• state vector αt is unobserved;
• matrices Tt , Zt , Rt , Qt , Ht determine structure of model.
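To make the definition concrete, here is a minimal simulation sketch in Python (NumPy), assuming time-invariant system matrices; the function name and interface are ours, not from the slides:

```python
import numpy as np

def simulate_ssm(T, Z, R, Q, H, a1, P1, n, seed=0):
    """Simulate y_1, ..., y_n from the linear Gaussian state space model
    alpha_{t+1} = T alpha_t + R zeta_t,  zeta_t ~ NID(0, Q),
    y_t = Z alpha_t + eps_t,             eps_t ~ NID(0, H),
    with alpha_1 ~ N(a1, P1) and time-invariant system matrices."""
    rng = np.random.default_rng(seed)
    p, r = Z.shape[0], Q.shape[0]
    alpha = rng.multivariate_normal(a1, P1)      # draw the initial state
    y = np.empty((n, p))
    for t in range(n):
        y[t] = Z @ alpha + rng.multivariate_normal(np.zeros(p), H)
        alpha = T @ alpha + R @ rng.multivariate_normal(np.zeros(r), Q)
    return y
```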
4. State Space Model
• state space model is linear and Gaussian: therefore properties
and results of multivariate normal distribution apply;
• state vector αt evolves as a VAR(1) process;
• system matrices usually contain unknown parameters;
• estimation therefore has two aspects:
• measuring the unobservable state (prediction, filtering and
smoothing);
• estimation of unknown parameters (maximum likelihood
estimation);
• state space methods offer a unified approach to a wide range
of models and techniques: dynamic regression, ARIMA, UC
models, latent variable models, spline-fitting and many ad-hoc
filters;
• next, some well-known model specifications in state space
form ...
5. Regression with time-varying coefficients
General state space model:
αt+1 = Tt αt + Rt ζt , ζt ∼ NID(0, Qt ),
yt = Zt αt + εt , εt ∼ NID(0, Ht ).
Put the regressors in Zt (that is, Zt = Xt ) and set
Tt = I , Rt = I .
The result is the regression model yt = Xt βt + εt with time-varying
coefficient βt = αt following a random walk.
If Qt = 0, the time-varying regression model reduces to the standard
linear regression model yt = Xt β + εt with β = α1 .
Many dynamic specifications for the regression coefficient can be
considered, see below.
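As a small illustration (the helper name is ours), the snippet below assembles the system matrices for this time-varying regression; q = 0 recovers the fixed-coefficient model:

```python
import numpy as np

def tvp_regression_ssm(X, q):
    """Random-walk coefficient regression y_t = X_t beta_t + eps_t
    in state space form: Z_t = X_t (row t of X), T_t = R_t = I,
    Q_t = q * I. With q = 0 the coefficients stay fixed at beta = alpha_1."""
    n, k = X.shape
    Z = X.reshape(n, 1, k)   # Z_t is the 1 x k row of regressors at time t
    T = np.eye(k)
    R = np.eye(k)
    Q = q * np.eye(k)
    return Z, T, R, Q
```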
6. AR(1) process in State Space Form
Consider the AR(1) model yt+1 = φyt + ζt . In state space form:
αt+1 = Tt αt + Rt ζt , ζt ∼ NID(0, Qt ),
yt = Zt αt + εt , εt ∼ NID(0, Ht ).
with state vector αt and system matrices
Zt = 1, Ht = 0,
Tt = φ, Rt = 1, Qt = σ²,
we have
• Zt and Ht = 0 imply that αt = yt ;
• State equation implies yt+1 = φyt + ζt with ζt ∼ NID(0, σ²);
• This is the AR(1) model!
7. AR(2) in State Space Form
Consider the AR(2) model yt+1 = φ1 yt + φ2 yt−1 + ζt . In state space
form:
αt+1 = Tt αt + Rt ζt , ζt ∼ NID(0, Qt ),
yt = Zt αt + εt , εt ∼ NID(0, Ht ).
with 2 × 1 state vector αt and system matrices
Zt = (1 0), Ht = 0,
Tt = [φ1 1; φ2 0], Rt = [1; 0], Qt = σ²,
writing matrix rows separated by semicolons.
we have
• Zt and Ht = 0 imply that α1t = yt ;
• First state equation implies yt+1 = φ1 yt + α2t + ζt with
ζt ∼ NID(0, σ²);
• Second state equation implies α2,t+1 = φ2 yt , so α2t = φ2 yt−1 .
8. MA(1) in State Space Form
Consider the MA(1) model yt+1 = ζt + θζt−1 . In state space form:
αt+1 = Tt αt + Rt ζt , ζt ∼ NID(0, Qt ),
yt = Zt αt + εt , εt ∼ NID(0, Ht ).
with 2 × 1 state vector αt and system matrices:
Zt = (1 0), Ht = 0,
Tt = [0 1; 0 0], Rt = [1; θ], Qt = σ²,
we have
• Zt and Ht = 0 imply that α1t = yt ;
• First state equation implies yt+1 = α2t + ζt with
ζt ∼ NID(0, σ²);
• Second state equation implies α2,t+1 = θζt , so α2t = θζt−1 .
9. General ARMA in State Space Form
Consider the ARMA(2,1) model
yt = φ1 yt−1 + φ2 yt−2 + ζt + θζt−1 .
In state space form, take the state vector
αt = (yt , φ2 yt−1 + θζt )′
and system matrices
Zt = (1 0), Ht = 0,
Tt = [φ1 1; φ2 0], Rt = [1; θ], Qt = σ².
All ARIMA(p, d, q) models have a (non-unique) state space
representation.
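The same recipe applies to the AR(1), AR(2) and MA(1) examples above. As an illustration, here is a sketch (the helper name is ours) that assembles the ARMA(2,1) system matrices of this slide:

```python
import numpy as np

def arma21_ssm(phi1, phi2, theta, sigma2):
    """System matrices for the ARMA(2,1) state space form on this slide,
    with state vector alpha_t = (y_t, phi2*y_{t-1} + theta*zeta_t)'."""
    Z = np.array([[1.0, 0.0]])
    H = np.array([[0.0]])                 # no observation noise
    T = np.array([[phi1, 1.0],
                  [phi2, 0.0]])
    R = np.array([[1.0],
                  [theta]])
    Q = np.array([[sigma2]])
    return Z, H, T, R, Q
```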
12. Kalman Filter
• The Kalman filter calculates the mean and variance of the
unobserved state, given the observations.
• The state is Gaussian: the complete distribution is
characterized by the mean and variance.
• The filter is a recursive algorithm; the current best estimate is
updated whenever a new observation is obtained.
• To start the recursion, we need a1 and P1 , which we assumed
given.
• There are various ways to initialize when a1 and P1 are
unknown, which we will not discuss here.
13. Kalman Filter
The unobserved state αt can be estimated from the observations
with the Kalman filter:
vt = yt − Z t a t ,
Ft = Zt Pt Zt′ + Ht ,
Kt = Tt Pt Zt′ Ft−1 ,
at+1 = Tt at + Kt vt ,
Pt+1 = Tt Pt Tt′ + Rt Qt Rt′ − Kt Ft Kt′ ,
for t = 1, . . . , n and starting with given values for a1 and P1 .
• Writing Yt = {y1 , . . . , yt }, we have
at+1 = E(αt+1 |Yt ), Pt+1 = Var(αt+1 |Yt ).
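A direct transcription of these recursions in Python, as a sketch: it assumes time-invariant system matrices, and numerical refinements such as avoiding explicit inverses are omitted.

```python
import numpy as np

def kalman_filter(y, Z, H, T, R, Q, a1, P1):
    """Kalman filter recursions from this slide for time-invariant
    system matrices. y has shape (n, p); returns the innovations v_t,
    their variances F_t, and the predicted moments a_t, P_t."""
    n, p = y.shape
    m = a1.shape[0]
    a, P = a1.copy(), P1.copy()
    v = np.empty((n, p)); F = np.empty((n, p, p))
    A = np.empty((n, m)); Ps = np.empty((n, m, m))
    for t in range(n):
        A[t], Ps[t] = a, P                       # store a_t, P_t
        v[t] = y[t] - Z @ a                      # v_t = y_t - Z a_t
        F[t] = Z @ P @ Z.T + H                   # F_t = Z P_t Z' + H
        K = T @ P @ Z.T @ np.linalg.inv(F[t])    # K_t = T P_t Z' F_t^{-1}
        a = T @ a + K @ v[t]                     # a_{t+1} = T a_t + K_t v_t
        P = T @ P @ T.T + R @ Q @ R.T - K @ F[t] @ K.T  # P_{t+1}
    return v, F, A, Ps
```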
14. Lemma I : multivariate Normal regression
Suppose x and y are jointly Normally distributed vectors with means
µx = E(x) and µy = E(y), variance matrices Σxx and Σyy , and
covariance matrix Σxy . Then
E(x|y) = µx + Σxy Σyy−1 (y − µy ),
Var(x|y) = Σxx − Σxy Σyy−1 Σxy′ .
Define e = x − E(x|y). Then
Var(e) = Var([x − µx ] − Σxy Σyy−1 [y − µy ])
= Σxx − Σxy Σyy−1 Σxy′
= Var(x|y),
and
Cov(e, y) = Cov([x − µx ] − Σxy Σyy−1 [y − µy ], y)
= Σxy − Σxy = 0.
15. Lemma II : an extension
The proof of the Kalman filter uses a slight extension of the lemma
for multivariate Normal regression theory.
Suppose x, y and z are jointly Normally distributed vectors with
E(z) = 0 and Σyz = 0. Then
E(x|y, z) = E(x|y) + Σxz Σzz−1 z,
Var(x|y, z) = Var(x|y) − Σxz Σzz−1 Σxz′ .
Can you give a proof of Lemma II?
Hint: apply Lemma I to x and y∗ = (y′ , z′ )′ .
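A sketch of the answer, following the hint:

```latex
% Apply Lemma I to x and y* = (y', z')'. Because Sigma_{yz} = 0 and
% E(z) = 0, the variance matrix of y* is block-diagonal, so its
% inverse is block-diagonal too:
\begin{aligned}
\mathrm{E}(x \mid y, z)
 &= \mu_x
  + \begin{pmatrix} \Sigma_{xy} & \Sigma_{xz} \end{pmatrix}
    \begin{pmatrix} \Sigma_{yy} & 0 \\ 0 & \Sigma_{zz} \end{pmatrix}^{-1}
    \begin{pmatrix} y - \mu_y \\ z \end{pmatrix} \\
 &= \mu_x + \Sigma_{xy}\Sigma_{yy}^{-1}(y - \mu_y)
  + \Sigma_{xz}\Sigma_{zz}^{-1} z
  = \mathrm{E}(x \mid y) + \Sigma_{xz}\Sigma_{zz}^{-1} z .
\end{aligned}
```

The variance formula follows in the same way from the second line of Lemma I.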
16. Kalman Filter : derivation
State space model: αt+1 = Tt αt + Rt ζt , yt = Zt αt + εt .
• Writing Yt = {y1 , . . . , yt }, define
at+1 = E(αt+1 |Yt ), Pt+1 = Var(αt+1 |Yt );
• The prediction error is
vt = yt − E(yt |Yt−1 )
= yt − E(Zt αt + εt |Yt−1 )
= yt − Zt E(αt |Yt−1 )
= yt − Z t a t ;
• It follows that vt = Zt (αt − at ) + εt and E(vt ) = 0;
• The prediction error variance is Ft = Var(vt ) = Zt Pt Zt′ + Ht .
17. Kalman Filter : derivation
State space model: αt+1 = Tt αt + Rt ζt , yt = Zt αt + εt .
• We have Yt = {Yt−1 , yt } = {Yt−1 , vt } and E(vt yt−j ) = 0 for
j = 1, . . . , t − 1;
• Apply the lemma E(x|y, z) = E(x|y) + Σxz Σzz−1 z with x = αt+1 ,
y = Yt−1 and z = vt = Zt (αt − at ) + εt ;
• It follows that E(αt+1 |Yt−1 ) = Tt at ;
• Further, E(αt+1 vt′ ) = Tt E(αt vt′ ) + Rt E(ζt vt′ ) = Tt Pt Zt′ ;
• We apply the lemma and obtain the state update
at+1 = E(αt+1 |Yt−1 , yt )
= Tt at + Tt Pt Zt′ Ft−1 vt
= Tt at + Kt vt ;
with Kt = Tt Pt Zt′ Ft−1 .
18. Kalman Filter : derivation
• Our best prediction of yt is Zt at . When the actual
observation arrives, calculate the prediction error
vt = yt − Zt at and its variance Ft = Zt Pt Zt′ + Ht . The new
best estimate of the state mean is based on both the old
estimate at and the new information vt :
at+1 = Tt at + Kt vt ,
similarly for the variance:
Pt+1 = Tt Pt Tt′ + Rt Qt Rt′ − Kt Ft Kt′ .
Can you derive the updating equation for Pt+1 ?
• The Kalman gain
Kt = Tt Pt Zt′ Ft−1
is the optimal weighting matrix for the new evidence.
• You should be able to replicate the proof of the Kalman filter
for the Local Level Model (DK, Chapter 2).
20. Prediction and Filtering
This version of the Kalman filter computes the predictions of the
states directly:
at+1 = E(αt+1 |Yt ), Pt+1 = Var(αt+1 |Yt ).
The filtered estimates are defined by
at|t = E(αt |Yt ), Pt|t = Var(αt |Yt ),
for t = 1, . . . , n.
The Kalman filter can be “re-ordered” to compute both:
at|t = at + Mt vt , Pt|t = Pt − Mt Ft Mt′ ,
at+1 = Tt at|t , Pt+1 = Tt Pt|t Tt′ + Rt Qt Rt′ ,
with Mt = Pt Zt′ Ft−1 for t = 1, . . . , n and starting with a1 and P1 .
Compare Mt with Kt . Verify this version of the KF using the earlier
derivations.
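Given the output of the kalman_filter sketch shown earlier, the filtered moments follow in one extra pass (again an illustrative helper, not course code):

```python
import numpy as np

def filtered_moments(v, F, A, Ps, Z):
    """Filtered estimates a_{t|t}, P_{t|t} from the predicted moments
    (A, Ps) and innovations (v, F) returned by kalman_filter()."""
    n, m = A.shape
    a_f = np.empty((n, m)); P_f = np.empty((n, m, m))
    for t in range(n):
        M = Ps[t] @ Z.T @ np.linalg.inv(F[t])   # M_t = P_t Z' F_t^{-1}
        a_f[t] = A[t] + M @ v[t]                # a_{t|t} = a_t + M_t v_t
        P_f[t] = Ps[t] - M @ F[t] @ M.T         # P_{t|t} = P_t - M_t F_t M_t'
    return a_f, P_f
```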
21. Smoothing
• The filter calculates the mean and variance conditional on Yt ;
• The Kalman smoother calculates the mean and variance
conditional on the full set of observations Yn ;
• After the filtered estimates are calculated, the smoothing
recursion starts at the last observation and runs back to the
first. Define
α̂t = E(αt |Yn ), Vt = Var(αt |Yn ),
rt = weighted sum of future innovations, Nt = Var(rt ),
Lt = Tt − Kt Zt .
Starting with rn = 0 and Nn = 0, the smoothing recursions are given
by
rt−1 = Zt′ Ft−1 vt + Lt′ rt , Nt−1 = Zt′ Ft−1 Zt + Lt′ Nt Lt ,
α̂t = at + Pt rt−1 , Vt = Pt − Pt Nt−1 Pt ,
for t = n, . . . , 1.
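A backward-pass sketch of these recursions in Python, consuming the arrays produced by the kalman_filter sketch shown earlier:

```python
import numpy as np

def kalman_smoother(v, F, A, Ps, Z, T):
    """Smoothing recursions from this slide, run backwards from t = n.
    Inputs are the arrays returned by the kalman_filter() sketch."""
    n, m = A.shape
    r = np.zeros(m); N = np.zeros((m, m))        # r_n = 0, N_n = 0
    alpha_hat = np.empty((n, m)); V = np.empty((n, m, m))
    for t in range(n - 1, -1, -1):
        Finv = np.linalg.inv(F[t])
        K = T @ Ps[t] @ Z.T @ Finv               # K_t = T P_t Z' F_t^{-1}
        L = T - K @ Z                            # L_t = T_t - K_t Z_t
        r = Z.T @ Finv @ v[t] + L.T @ r          # r_{t-1}
        N = Z.T @ Finv @ Z + L.T @ N @ L         # N_{t-1}
        alpha_hat[t] = A[t] + Ps[t] @ r          # alpha_hat_t = a_t + P_t r_{t-1}
        V[t] = Ps[t] - Ps[t] @ N @ Ps[t]         # V_t = P_t - P_t N_{t-1} P_t
    return alpha_hat, V
```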
24. Missing Observations
Missing observations are very easy to handle in Kalman filtering:
• suppose yj is missing;
• put vj = 0, Kj = 0 and Fj = ∞ in the algorithm;
• proceed with the remaining calculations as normal.
The filter algorithm extrapolates according to the state equation
until a new observation arrives. The smoother interpolates
between observations.
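A concrete univariate sketch for the local level model (Tt = Zt = Rt = 1), treating NaNs as missing observations; the function name and interface are ours:

```python
import numpy as np

def local_level_filter_missing(y, s2_eps, s2_eta, a1, P1):
    """Local level Kalman filter that skips missing (NaN) observations:
    there v_t = 0 and K_t = 0, so the state is simply extrapolated."""
    n = len(y)
    a, P = a1, P1
    a_pred = np.empty(n + 1); P_pred = np.empty(n + 1)
    for t in range(n):
        a_pred[t], P_pred[t] = a, P
        if np.isnan(y[t]):           # missing: no update, only extrapolation
            P = P + s2_eta           # a_{t+1} = a_t, P_{t+1} = P_t + s2_eta
            continue
        v = y[t] - a                 # prediction error
        F = P + s2_eps               # prediction error variance
        K = P / F                    # Kalman gain (T_t = Z_t = 1 here)
        a = a + K * v
        P = P * (1.0 - K) + s2_eta
    a_pred[n], P_pred[n] = a, P
    return a_pred, P_pred
```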
27. Forecasting
Forecasting requires no extra theory: just treat future observations
as missing:
• put vj = 0, Kj = 0 and Fj = ∞ for j = n + 1, . . . , n + k;
• proceed with the remaining calculations as normal;
• the forecast for yj is Zj aj (see the sketch below).
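With the missing-data filter sketched above, forecasting amounts to appending k NaNs and filtering through them; since Zt = 1 in the local level model, the extrapolated state predictions are the forecasts directly (y, k and the variances are assumed to be defined):

```python
import numpy as np

# Append k "missing" observations and keep filtering: the extrapolated
# a_{n+j} are the forecasts of y_{n+j}, with variances P_{n+j} + s2_eps.
y_ext = np.concatenate([y, np.full(k, np.nan)])
a_pred, P_pred = local_level_filter_missing(y_ext, s2_eps, s2_eta, a1, P1)
forecasts = a_pred[len(y):-1]            # Z_j a_j with Z_j = 1
forecast_var = P_pred[len(y):-1] + s2_eps
```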
29. Parameter Estimation
The system matrices in a state space model typically depend on a
parameter vector ψ. The model is completely Gaussian; we estimate ψ
by Maximum Likelihood.
The loglikelihood of a time series is
log L = ∑t=1,…,n log p(yt |Yt−1 ).
In the state space model, p(yt |Yt−1 ) is a Gaussian density with
mean Zt at and variance Ft :
log L = −(nN/2) log 2π − (1/2) ∑t=1,…,n ( log |Ft | + vt′ Ft−1 vt ),
where N is the dimension of yt , and vt and Ft come from the Kalman
filter. This is called the prediction error decomposition of the
likelihood. Estimation proceeds by numerically maximising log L.
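A sketch for the local level model, parameterising the variances in logs to keep them positive; all names and the starting point x0 are ours:

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(params, y):
    """Negative prediction-error-decomposition loglikelihood for the
    local level model (N = 1), evaluated via the Kalman filter."""
    s2_eps, s2_eta = np.exp(params)       # log-variance parameterisation
    a, P = y[0], 1e7                      # crude (near-diffuse) initialisation
    ll = 0.0
    for t in range(len(y)):
        v = y[t] - a
        F = P + s2_eps
        ll -= 0.5 * (np.log(2.0 * np.pi) + np.log(F) + v * v / F)
        K = P / F
        a = a + K * v
        P = P * (1.0 - K) + s2_eta
    return -ll

# Example usage (y is the observed series, e.g. the Nile data):
# res = minimize(neg_loglik, x0=np.log([1e4, 1e3]), args=(y,), method="BFGS")
# s2_eps_hat, s2_eta_hat = np.exp(res.x)
```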
30. ML Estimate of Nile Data
[Figure: Nile data, 1870–1970: observations and estimated level for q = 0, q = 1000 and q = 0.0973 (ML estimate).]
31. Diagnostics
• Null hypothesis: the standardised residuals satisfy
vt /√Ft ∼ NID(0, 1);
• Apply standard tests for Normality, heteroskedasticity and serial
correlation (a small sketch follows below);
• A recursive algorithm is available to calculate smoothed
disturbances (auxiliary residuals), which can be used to detect
breaks and outliers;
• Model comparison and parameter restrictions: use likelihood
based procedures (LR test, AIC, BIC).
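A minimal sketch of the first two checks with scipy, assuming v and F come from the kalman_filter sketch shown earlier (univariate case):

```python
import numpy as np
from scipy import stats

# Standardised one-step-ahead residuals e_t = v_t / sqrt(F_t).
e = v.ravel() / np.sqrt(F.ravel())

print(stats.jarque_bera(e))                         # Normality check
h = len(e) // 3
H_stat = np.sum(e[-h:] ** 2) / np.sum(e[:h] ** 2)   # heteroskedasticity, ~ F(h, h)
print(H_stat, stats.f.sf(H_stat, h, h))
r1 = np.corrcoef(e[:-1], e[1:])[0, 1]               # first-order serial correlation
print(r1)
```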
33. Assignment 1 - Exercises 1 and 2
1. Formulate the following models in state space form:
• ARMA(3,1) process;
• ARMA(1,3) process;
• ARIMA(3,1,1) process;
• AR(3) plus noise model;
• Random Walk plus AR(3) process.
Please discuss the initial conditions for αt in all cases.
2. Derive a Kalman filter for the local level model
yt = µt + ξt , ξt ∼ N(0, σξ²), ∆µt+1 = ηt ∼ N(0, ση²),
with E(ξt ηt ) = σξη = 0 and E(ξt ηs ) = 0 for all t, s with t ≠ s.
Also discuss the problem of missing observations in this case.
34. Assignment 1 - Exercise 3
3. Consider a time series yt that is modelled by a state space model,
together with a time series vector of k exogenous variables Xt . The
dependent time series yt depends on the explanatory variables Xt in a
linear way.
• How would you introduce Xt as regression effects in the state
space model ?
• Can you allow the transition matrix Tt to depend on Xt too ?
• Can you allow the transition matrix Tt to depend on yt ?
For the last two questions, please also consider maximum
likelihood estimation of coefficients within the system matrices.
35. Assignment 1 - Computing work
• Consider Chapter 2 of the Durbin and Koopman book.
• There are 8 figures in Chapter 2.
• Write computer code that can reproduce all these figures in
Chapter 2, except Fig. 2.4.
• Write a short documentation for the Ox program.