Economic Time Series Models and their Properties with an
Application to Forecasting Unemployment
Robert Zima
April 9, 2012
1 Introduction
Time series models operate as a tool in understanding the underlying factors and theory that pro-
duce observed data. The results allow us to fit these models for predictive forecasting. Applications
include economic forecasting, stock market analysis, quality control, and yield projections.
Many models exist for time series analysis; however, the most commonly used classes include
autoregressive (AR) models and moving average (MA) models. These models are often intertwined
to generate new models. For example, autoregressive moving average (ARMA) models combine
AR and MA models, which allow for increased flexibility in model selection.
State-space representations for ARMA models are a useful tool when situations with data distur-
bances occur, specifically with missing data values and measurement error. Furthermore, state-
space representations allow the use of a mathematical formulation called the Kalman filter, which
provides a means for estimating and predicting future data recursively.
An application to forecasting unemployment data will be considered. The two data sets consist of
seasonally adjusted and non-seasonally adjusted data. An ARMA(3,3) model will characterize the
seasonally adjusted data, whereas we will apply the Kalman filter to the non-seasonally adjusted
data. Furthermore, a comparative study between forecasts of both seasonally adjusted and non-
seasonally adjusted data at varying time intervals will also be explored.
2 Compositions of ARMA models
Before we fully consider ARMA models, it is necessary to understand the underlying factors that
comprise them. These time series models are based upon the notion of cycles, or dynamic behavior
linking past to present events, and hence present to future events.
It is desired (for forecasting) that a time series' mean and covariance structure (covariances between
past and present values) be stable over time. This is called covariant (wide-sense) stationarity.¹
If a series' mean is stable over time, we can write E(yt) = µ, for all t. For stability in covariance
structure, we use the autocovariance function. Autocovariance at displacement τ is the covariance
between two events yt and yt−τ . Specifically, we write
γ(t, τ) = Cov(yt, yt−τ ) = E(yt − µ)(yt−τ − µ).
¹ When we say covariant stationary
Stability over time means that the autocovariance depends only on displacement and not time, so
we write γ(t, τ) = γ(τ) for all t.
Covariant stationarity appears to be a stringent constraint; however, ARMA models require this
condition. For example, many applications in economics and business have seasonality with a
varying mean.² Certain models have also been developed that provide special treatment to trends
and seasonality. Specifically, state-space ARMA models with Kalman filtering handle seasonal
trends well, and will be discussed in section 5.
Another useful tool that characterizes the relationship between two events is the correlation. In general,
correlation is a much more useful statistic because it is bounded by one in absolute value, rather
than the unbounded covariance between two variables. For example, if x and y have a covariance of
1 million, it is not necessarily clear whether the two events are very strongly associated, whereas a
correlation of .95 indicates a very strong association. For this reason, the autocorrelation function is
preferred over the autocovariance function. The autocorrelation function is obtained by dividing
the autocovariance function γ(τ) by the variance γ(0). We write
ρτ = γ(τ)/γ(0), τ = 0, 1, 2, . . .
The structure of autocovariance and autocorrelation functions for each time series model will be
explored in section 3.
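As a rough illustration (not part of the original paper), the sample analogues of γ(τ) and ρτ can be computed directly from an observed series; the function names and the white-noise example below are our own.

import numpy as np

def sample_autocovariance(y, tau):
    # gamma(tau) = (1/T) * sum_t (y_t - ybar)(y_{t-tau} - ybar)
    y = np.asarray(y, dtype=float)
    T, ybar = len(y), np.mean(y)
    d = y - ybar
    return np.sum(d[tau:] * d[:T - tau]) / T

def sample_autocorrelation(y, tau):
    # rho(tau) = gamma(tau) / gamma(0)
    return sample_autocovariance(y, tau) / sample_autocovariance(y, 0)

rng = np.random.default_rng(0)
y = rng.normal(size=500)   # white noise, so rho(tau) should be near zero for tau > 0
print([round(sample_autocorrelation(y, k), 3) for k in range(4)])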
2.1 White Noise Processes
A white noise process is a fundamental building block of the time series models we will consider.
A white noise process εt ∼ (0, σ²) has three properties:
1. E(εt) = E(εt|εt−1, εt−2, . . .) = 0.
2. Var(εt) = Var(εt|εt−1, εt−2, . . .) = σ².
3. Cov(εt, εt−j) = E(εtεt−j) = 0, j > 0.
Properties 1 and 3 describe the absence of serial correlation, or correlation over time. In other words,
we can think of white noise processes as being memoryless. A white noise process that is serially
uncorrelated is not necessarily serially independent, since zero correlation implies independence only
in the normal case. Hence, if εt is serially uncorrelated and normally distributed, εt is Gaussian
white noise.
By itself, εt is not a very interesting process, since values of εt are uncorrelated with εt+1. In other
words, large values for white noise at one time period do not correspond to large values of white
noise at the next time period, and vice versa. More realistic time series models are constructed
taking combinations of εt, as we see in the next section.
2.2 Basic ARMA Models
Each of the time series models we are concerned with is primarily composed of past observations
and white noise processes. That is, AR models are a class of linear regressions of past observations,
² Data acquired and analyzed in this section has been subject to seasonal adjustments to remove the seasonality.
whereas MA models are another class of linear regressions of unobserved white noise error terms or
shocks. More specifically, ARMA models are a linear combination of a class of both AR and MA
models. Let p ≥ 1 represent the order of an AR process, and q ≥ 1 represent the order of an MA
process. Then by definition,
AR(1) : yt = φ1yt−1 + εt
MA(1) : yt = εt + θ1εt−1
AR(p) : yt = φ1yt−1 + φ2yt−2 + . . . + φpyt−p + εt
MA(q) : yt = εt + θ1εt−1 + . . . + θqεt−q
ARMA(p, q) : yt = φ1yt−1 + . . . + φpyt−p + εt + θ1εt−1 + . . . + θqεt−q
Note that all of the above models are special cases of the ARMA(p, q) process. In other words, an
AR(1) model is just an ARMA(1,0) process. The same applies for MA processes. Thus, we can
think of every process in terms of a more general ARMA(p, q) process, which provides much
greater flexibility in model selection.
Model selection is based on a variety of factors. In general, p and q are selected to be minimal while
still providing an acceptable fit to the data.³ Each of these models is constructed via a sequence
{yt}, given a sequence of white noise processes {εt} and an initial value y0. Notice that each model
has mean zero; however modification of the models to include some mean µ is simple. For example,
if a series has mean µ and follows an AR(1) model, we have
(yt − µ) = φ1(yt−1 − µ) + εt
or equivalently
yt = (1 − φ1)µ + φ1yt−1 + εt,
where (1 − φ1)µ is simply a constant. From this point forward, we will consider cases with mean
zero since adding means and other deterministic trends is straightforward.
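A minimal sketch (with illustrative parameter values of our own choosing) of constructing such a series: the recursion below generates an AR(1) process with mean µ from a white noise sequence and an initial value.

import numpy as np

def simulate_ar1(phi1, mu, sigma, T, seed=0):
    # y_t = (1 - phi1)*mu + phi1*y_{t-1} + eps_t, eps_t ~ N(0, sigma^2)
    rng = np.random.default_rng(seed)
    y = np.empty(T)
    y[0] = mu                      # initial value y0
    for t in range(1, T):
        y[t] = (1 - phi1) * mu + phi1 * y[t - 1] + rng.normal(scale=sigma)
    return y

y = simulate_ar1(phi1=0.8, mu=5.0, sigma=1.0, T=2000)
print(round(y.mean(), 2))          # close to mu = 5 when |phi1| < 1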
2.3 Lag Operators and Lag Polynomials
In order to appropriately represent one model in terms of another, we use lag operator (sometimes
called backshift operator) notation. The lag operator acts on an element of a series to produce the
previous element. In symbols,
Lyt = yt−1
L²yt = L(Lyt) = Lyt−1 = yt−2
...
A lag polynomial is just a linear function of powers of L up through the mth power, where
φ(L)yt = (a0 + a1L + a2L² + . . . + amL^m)yt = a0yt + a1yt−1 + . . . + amyt−m.
³ Among two competing models, another common practice is to select the model with the smallest Akaike Information Criterion (AIC) or Schwarz Information Criterion (SIC), which include penalty terms for larger choices of p and q.
Using this notation, we can rewrite the ARMA models above as
AR(1) : (1 − φ1L)yt = εt
MA(1) : yt = (1 + θ1L)εt
AR(p) : (1 − φ1L − φ2L² − . . . − φpL^p)yt = εt
MA(q) : yt = (1 + θ1L + . . . + θqL^q)εt
or simply
AR : φ(L)yt = εt
MA : yt = θ(L)εt
ARMA : φ(L)yt = θ(L)εt
where φ(L) and θ(L) represent polynomials for the AR and MA terms, respectively. We will see in
the following section the applicability that lag operators and lag polynomials have for determining
stationarity.
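As a small illustration (the coefficients are arbitrary and the helper is ours, not the paper's), applying a lag polynomial a0 + a1L + . . . + amL^m to a series simply forms the weighted sum of current and lagged values:

import numpy as np

def apply_lag_polynomial(a, y):
    # a = [a0, a1, ..., am]; returns a0*y_t + a1*y_{t-1} + ... + am*y_{t-m} for t >= m
    a, y = np.asarray(a, float), np.asarray(y, float)
    m = len(a) - 1
    return np.array([np.dot(a, y[t - np.arange(m + 1)]) for t in range(m, len(y))])

y = np.arange(10, dtype=float)
print(apply_lag_polynomial([1.0, -0.5], y))   # (1 - 0.5L) applied to y: y_t - 0.5*y_{t-1}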
2.4 Wold Decomposition Theorem
Subsequent analysis of the listed time series models depends primarily on the Wold Decomposition
Theorem. The theorem states that any covariant stationary discrete time series process can be
decomposed into a pair of uncorrelated processes, with one infinite order MA process and some
deterministic trend. That is, any stationary process has this seemingly special representation.
Realistically, when selecting a model, one that depends on an infinite number of parameters will
not be chosen, since the goal is to minimize values of p and q to maintain simplicity for forecasting
purposes.
Theorem 2.1. Suppose that {yt} is a covariance stationary process with E(yt) = 0 and covariance
function γ(j) = E(ytyt−j), ∀j. Then
yt = Σ_{j=0}^{∞} bjεt−j
where
b0 = 1,  Σ_{j=0}^{∞} bj² < ∞,  E(εt²) = σ²,  E(εtεs) = 0 for t ≠ s.
This theorem is incredibly beneficial when characterizing data. In particular, any data set that
has the aforementioned properties can be represented as an MA(+∞) process. Unfortunately, this
property is not very useful when forecasting future data points. Models with a larger number of
parameters have a higher level of residual effects when forecasting. So a model must be chosen
in such a way that it effectively characterizes the data with minimal parameters for forecasting
purposes.
ARMA models were constructed to provide interesting models of the joint distribution of some time
series {yt}. Autocovariance and autocorrelation are a way to characterize the joint distribution of
a time series produced as a result. Hence, the correlation of yt and yt+1 is a clear measure of the
persistence, or the strength with which an observation today affects future observations.
3 Stationarity of ARMA Models
Stationarity in ARMA(p, q) models is necessary in order to forecast models effectively. One benefit
of working with ARMA(p, q) models is that the AR(p) component is the only component required
to determine stationarity. Finite order MA processes will always be stationary.⁴
It is important to note that checking stationarity of an AR(p) process is relatively straightforward
and can be done via computational software. In general, if the characteristic polynomial (this
can also be thought of as the lag polynomial) associated with an AR(p) model has roots strictly
outside the unit circle, the process is stationary. If any root of the characteristic polynomial lies on
or inside the unit circle, the process is non-stationary and a different model must be chosen. Non-stationary
models will be explored in section 4.
Specifically, consider an ARMA(p, q) process defined as
yt = φ1yt−1 + . . . + φpyt−p + εt + θ1εt−1 + . . . + θqεt−q.
Rewriting in terms of lag polynomials, we have
(1 − φ1L − φ2L² − . . . − φpL^p)yt = εt + θ1εt−1 + . . . + θqεt−q.
Finding the roots of this characteristic polynomial, 1 − φ1L − φ2L² − . . . − φpL^p, determines
the stationarity of the process. The use of lag polynomials provides a much simpler
understanding of the data and the model we wish to choose. Again, it is important to note that
our goal is to select appropriate models that characterize the data effectively, but in a simple
manner.
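A hedged sketch of this check (the coefficient values are illustrative only): the AR part is stationary when every root of 1 − φ1L − . . . − φpL^p lies strictly outside the unit circle, which can be verified numerically.

import numpy as np

def ar_is_stationary(phi):
    # phi = [phi_1, ..., phi_p] from y_t = phi_1*y_{t-1} + ... + phi_p*y_{t-p} + eps_t
    # Roots of 1 - phi_1*L - ... - phi_p*L^p; np.roots expects the highest power first.
    coeffs = np.r_[-np.asarray(phi, float)[::-1], 1.0]
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) > 1.0))

print(ar_is_stationary([0.5, 0.3]))   # True: both roots lie outside the unit circle
print(ar_is_stationary([1.1]))        # False: root 1/1.1 lies inside the unit circle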
3.1 Autocovariance and Autocorrelation for ARMA models
Another means of selecting an appropriate model is to consider the autocovariance and autocorrela-
tion of each model. Recall that autocovariance measures the linear dependence between two observations
of the same series at different times. Autocorrelation measures the linear predictability
of a series at some time t. We will specifically identify the autocovariances and autocorrelations of each
ARMA model.
3.1.1 White Noise:
Before characterizing ARMA model autocovariances and autocorrelations, we observe the underly-
ing white noise process structure. Recall that εt ∼ (0, σ²), hence
γ(0) = σ²,  γ(j) = 0 for j ≠ 0
ρ0 = 1,  ρj = 0 for j ≠ 0.
⁴ Infinite order MA processes are not necessarily stationary; they require the additional condition that all coefficients associated with the process be square summable: Hamilton (1994).
3.1.2 MA(1):
Recall that an MA(1) model is defined as
yt = εt + θ1εt−1.
Then the autocovariance is
γ(0) = Var(yt) = Var(εt + θ1εt−1) = σ² + θ1²σ² = σ²(1 + θ1²)
γ(1) = E(ytyt−1) = E[(εt + θ1εt−1)(εt−1 + θ1εt−2)] = E(θ1εt−1²) = σ²θ1
γ(2) = E(ytyt−2) = E[(εt + θ1εt−1)(εt−2 + θ1εt−3)] = 0
γ(k) = 0 for k ≥ 2.
The autocorrelation for an MA(1) process is
ρ0 = 1
ρ1 = γ(1)/γ(0) = θ1/(1 + θ1²)
ρ2 = γ(2)/γ(0) = 0
ρk = 0 for k ≥ 2.
For this particular process, note that both the autocovariance and the autocorrelation are zero for
displacements greater than 1.
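A quick numerical check of these formulas (with an assumed θ1 and sample size of our own choosing): a long simulated MA(1) series should reproduce ρ1 = θ1/(1 + θ1²) and give ρ2 near zero.

import numpy as np

theta1, T = 0.6, 200_000
rng = np.random.default_rng(1)
eps = rng.normal(size=T + 1)
y = eps[1:] + theta1 * eps[:-1]            # y_t = eps_t + theta1*eps_{t-1}

def rho(y, k):
    d = y - y.mean()
    return np.sum(d[k:] * d[:len(y) - k]) / np.sum(d * d)

print(round(theta1 / (1 + theta1**2), 3))        # theoretical rho_1
print(round(rho(y, 1), 3), round(rho(y, 2), 3))  # sample rho_1 and rho_2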
3.1.3 MA(q) and MA(+∞):
MA(q) processes have exactly q nonzero autocorrelations. Recall that, by definition, an MA(q)
process can be written
yt = θ(L)εt = Σ_{j=0}^{∞} θjL^j εt,
where θ0 = 1 and, for a finite MA(q), θj = 0 for j > q.
The autocovariances are defined as
γ(0) = Var(yt) = σ² Σ_{j=0}^{∞} θj²
...
γ(k) = σ² Σ_{j=0}^{∞} θjθj+k.
Autocorrelations follow immediately from the above formulas. A very nice property here is that
MA processes have second moments that are easy to compute, primarily because the E(εjεk) terms
are zero for j ≠ k.
3.1.4 AR(1):
Recall that an AR(1) model is defined as
yt = φ1yt−1 + εt.
Note that this model is equivalent to
(1 − φ1L)yt = εt =⇒ yt = (1 − φ1L)⁻¹εt = Σ_{j=0}^{∞} φ1^j εt−j.
This is beneficial for finding autocovariances, since
γ(0) = σ² Σ_{j=0}^{∞} φ1^(2j) = σ²/(1 − φ1²)
γ(1) = σ² Σ_{j=0}^{∞} φ1^j φ1^(j+1) = σ²φ1 Σ_{j=0}^{∞} φ1^(2j) = σ²φ1/(1 − φ1²)
...
γ(k) = σ²φ1^k/(1 − φ1²).
The corresponding autocorrelations are then
ρ0 = 1
ρ1 = φ1
...
ρk = φ1^k.
Another, often more useful, way of finding the autocorrelations of an AR(1) is
γ(1) = E(ytyt−1) = E[(φ1yt−1 + εt)yt−1] = φ1γ(0); ρ1 = φ1
γ(2) = E(ytyt−2) = E[(φ1²yt−2 + φ1εt−1 + εt)yt−2] = φ1²γ(0); ρ2 = φ1²
...
γ(k) = E(ytyt−k) = E[(φ1^k yt−k + . . .)yt−k] = φ1^k γ(0); ρk = φ1^k.
3.2 AR(p), Yule-Walker Equations:
Using the latter method applied to an AR(1) process, we can easily find autocovariances and
autocorrelations for AR(p) processes; the resulting relations are called the Yule-Walker equations.
The method is illustrated with an AR(3) model, and the result extends directly to AR(p). Note that
an AR(3) process is defined by
yt = φ1yt−1 + φ2yt−2 + φ3yt−3 + εt.
Then multiplying both sides by yt, yt−1, . . ., taking expectations, and dividing by γ(0), we have
1 = φ1ρ1 + φ2ρ2 + φ3ρ3 + σ²/γ(0)
ρ1 = φ1 + φ2ρ1 + φ3ρ2
ρ2 = φ1ρ1 + φ2 + φ3ρ1
ρ3 = φ1ρ2 + φ2ρ1 + φ3
...
ρk = φ1ρk−1 + φ2ρk−2 + φ3ρk−3.
Solving the second, third, and fourth equations for ρ1, ρ2, and ρ3 is straightforward. The remaining
recursion gives ρk in terms of ρk−1, ρk−2, and ρk−3, which means we can solve for each ρ. The
first equation can then be solved for the variance, i.e.
γ(0) = σ² / [1 − (φ1ρ1 + φ2ρ2 + φ3ρ3)].
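A minimal numerical sketch of this procedure (the coefficient values are illustrative, not estimates from the unemployment data): rearrange the second through fourth equations into a linear system, solve for ρ1, ρ2, ρ3, and then recover γ(0) from the first equation.

import numpy as np

phi1, phi2, phi3, sigma2 = 0.5, 0.2, 0.1, 1.0

# Rearranged from: rho1 = phi1 + phi2*rho1 + phi3*rho2
#                  rho2 = phi1*rho1 + phi2 + phi3*rho1
#                  rho3 = phi1*rho2 + phi2*rho1 + phi3
A = np.array([[1.0 - phi2,       -phi3, 0.0],
              [-(phi1 + phi3),    1.0,  0.0],
              [-phi2,            -phi1, 1.0]])
b = np.array([phi1, phi2, phi3])
rho1, rho2, rho3 = np.linalg.solve(A, b)

gamma0 = sigma2 / (1.0 - (phi1 * rho1 + phi2 * rho2 + phi3 * rho3))
print(round(rho1, 4), round(rho2, 4), round(rho3, 4), round(gamma0, 4))
# Higher-order terms follow rho_k = phi1*rho_{k-1} + phi2*rho_{k-2} + phi3*rho_{k-3}.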
3.3 Summary
MA(q) processes have q non-zero autocorrelations, with the rest zero. AR(p) processes have up to
p initial autocorrelations with no particular pattern, followed by a damped sine or exponential
decay for the rest. In particular, ARMA models do a very nice job of capturing autocorrelation
behavior. Characterizing autocorrelation functions provides a means of determining stationarity of
a particular process.
4 State Space Representations and the Kalman Filter
The concept of stationarity is crucial because when a time series is non-stationary, the mean,
variance, covariance, and correlation lose their meaning; therefore, common identification and esti-
mation methods are not applicable. A particularly useful approach to forecasting non-stationary
time series is to apply the Kalman filter. Prior to discussing the Kalman filter applied to ARMA
processes, it is necessary to explore ARMA models in state-space form.⁵
A state space model consists of two equations, namely
St+1 = FSt + Get (4.1)
Yt = HSt + ϵt (4.2)
where St represents a state vector of dimension m, Yt is some observed time series, F, G, H are
parameter matrices, and {et} and {ϵt} are i.i.d. random vectors satisfying
E(et) = 0, E(ϵt) = 0, Cov(et) = Q, Cov(ϵt) = R
such that {et} and {ϵt} are independent white noise processes.⁶ Practically, we can think of the state
vector as an unobserved vector describing the “status” of the system. Hence, a state vector
may be regarded as a vector with the information necessary to predict future observations. For our
purposes, we will consider Yt a scalar and F, G, H constants. Generally speaking, state space models
allow for vector time series and time-varying parameters.
⁵ The ARMA(3,3) process that we will use is stationary for both the seasonally adjusted data and the non-seasonally adjusted data.
⁶ We will specifically be considering Gaussian white noise, as specified by the Kalman filter.
4.1 ARMA Models
The relation between state space models and ARMA models inherently goes both ways. In other
words, an ARMA model can be put into state-space form in an infinite number of ways, but also,
for any state space model of the form (4.1)-(4.2), there exists an ARMA model.⁷ We now describe the
former.
ARMA to State-Space: Interestingly, ARMA models can be represented in state-space form.
As we will see in the next section, the benefits can be applied to the Kalman filter. Consider an
ARMA(p, q) model
Yt = Σ_{i=1}^{p} φiYt−i + at − Σ_{j=1}^{q} θjat−j,
where at represents a white noise process. Fix m = max{p, q}. Define φi = 0 for i > p and θj = 0
for j > q. Then, we have
Yt = Σ_{i=1}^{m} φiYt−i + at − Σ_{i=1}^{m} θiat−i.
For simplicity, we will use this max{p, q} representation. If we let ψ(L) = θ(L)/φ(L), we can obtain
the ψ-weights of the model by equating coefficients of L^j in the equation
(1 − θ1L − . . . − θmL^m) = (1 − φ1L − . . . − φmL^m)(ψ0 + ψ1L + . . . + ψmL^m + . . .),
where ψ0 = 1. We now consider the coefficient of L^m; in particular, we have
−θm = −φmψ0 − φm−1ψ1 − . . . − φ1ψm−1 + ψm.
Hence, we have that
ψm = Σ_{i=1}^{m} φiψm−i − θm. (4.3)
Then, from the ψ-weight representation, we have
Yt+m−i = at+m−i + ψ1at+m−i−1 + ψ2at+m−i−2 + . . .
Now, we can think of Yt+m−i|t as the conditional expectation of Yt+m−i given information available
through time t. A further extension shows
Yt+m−i|t = ψm−iat + ψm−i+1at−1 + ψm−i+2at−2 + . . .
Yt+m−i|t−1 = ψm−i+1at−1 + ψm−i+2at−2 + . . .
In other words, Yt+m−i|t can be represented as
Yt+m−i|t = Yt+m−i|t−1 + ψm−iat, where m − i > 0. (4.4)
Now we can set up a state space model. Let St = (Yt|t−1, Yt+1|t−1, . . . , Yt+m−1|t−1)′. Then using
Yt = Yt|t−1 + at, our observed equation is just
Yt = [1, 0, . . . , 0]St + at.
⁷ Tsay, R.S. (2008)
The state-transition equation is obtained by (4.3) and (4.4). For the first m − 1 elements of St+1,
we can use (4.4). For the last element of St+1, we have
Yt+m|t = Σ_{i=1}^{m} φiYt+m−i|t − θmat.
By (4.4), we have that
Yt+m|t = Σ_{i=1}^{m} φi(Yt+m−i|t−1 + ψm−iat) − θmat
       = Σ_{i=1}^{m} φiYt+m−i|t−1 + (Σ_{i=1}^{m} φiψm−i − θm)at
       = Σ_{i=1}^{m} φiYt+m−i|t−1 + ψmat,
where the final equality follows from (4.3). Then, the state-transition equation is
St+1 = FSt + Gat
where
F = [  0     1     0    · · ·   0
       0     0     1    · · ·   0
                  ...
       φm   φm−1   · · ·  φ2   φ1  ],    G = (ψ1, ψ2, ψ3, . . . , ψm)′.
This is only one of three methods by which an ARMA model can be represented in state-space
form. Two other methods were proposed by Akaike and Aoki; however, this method is more commonly
used in modern literature.
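The following is a sketch of the max{p, q} construction described above, written in Python rather than eViews; the coefficient values at the bottom are purely illustrative. It computes the ψ-weights from (4.3) and assembles F, G, and the observation vector H = [1, 0, . . . , 0].

import numpy as np

def arma_to_state_space(phi, theta):
    phi, theta = np.asarray(phi, float), np.asarray(theta, float)
    m = max(len(phi), len(theta))
    phi = np.r_[phi, np.zeros(m - len(phi))]        # phi_i = 0 for i > p
    theta = np.r_[theta, np.zeros(m - len(theta))]  # theta_j = 0 for j > q

    # psi-weights: psi_0 = 1, psi_j = sum_{i=1}^{j} phi_i*psi_{j-i} - theta_j
    psi = np.zeros(m + 1)
    psi[0] = 1.0
    for j in range(1, m + 1):
        psi[j] = sum(phi[i - 1] * psi[j - i] for i in range(1, j + 1)) - theta[j - 1]

    F = np.zeros((m, m))
    F[:-1, 1:] = np.eye(m - 1)       # shift structure in the first m-1 rows
    F[-1, :] = phi[::-1]             # last row: (phi_m, ..., phi_1)
    G = psi[1:].reshape(-1, 1)       # (psi_1, ..., psi_m)'
    H = np.zeros((1, m)); H[0, 0] = 1.0
    return F, G, H

F, G, H = arma_to_state_space(phi=[0.5, 0.2, 0.1], theta=[0.3, 0.1, 0.05])
print(F); print(G.ravel()); print(H)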
4.2 Kalman Filter
A Kalman filter is a set of recursive equations providing a relatively simple means of both predicting
and correcting information in a state space model. It sequentially decomposes each observation
into a conditional mean and a predictive residual. It has a wide array of uses in engineering as
well as in statistical analysis.
Derivation of the Kalman recursions is simplest when applying normality assumptions. However, we
should observe that the recursion is a result of the least squares principle, not normality. This
implies that the recursion holds for non-normal cases. The difference is that the optimal solution
is only obtained within the class of linear solutions. With normality, the solution is optimal among
all possible solutions (both linear and nonlinear cases). Hence, under normality we know that a
normal prior plus normal likelihood results in a normal posterior, and if a random vector (X, Y ) is
jointly normal, that is
(X, Y)′ ∼ N((µx, µy)′, [Σxx Σxy; Σyx Σyy]),
then the conditional distribution of X given Y = y is normal, or
X|Y = y ∼ N(µx + ΣxyΣyy⁻¹(y − µy), Σxx − ΣxyΣyy⁻¹Σyx).
With this information, we can now derive the Kalman filter. Let Pt+j|t be the conditional covariance
matrix of St+j given {Yt, Yt−1, . . .} for j ≥ 0 and St+j|t be the conditional mean of St+j given
{Yt, Yt−1, . . .}.
By the state space model, we have
St+1|t = FSt|t (4.5)
Yt+1|t = HSt+1|t (4.6)
Pt+1|t = FPt|tF′ + GQG′ (4.7)
Vt+1|t = HPt+1|tH′ + R (4.8)
Ct+1|t = HPt+1|t, (4.9)
where Vt+1|t is the conditional variance of Yt+1 given {Yt, Yt−1, . . .} and Ct+1|t denotes the condi-
tional covariance between Yt+1 and St+1. The joint conditional distribution between St+1 and Yt+1
is given as
(St+1, Yt+1)′ ∼ N( (St+1|t, Yt+1|t)′, [ Pt+1|t   Pt+1|tH′ ; HPt+1|t   HPt+1|tH′ + R ] ).
When Yt+1 is available, we use the property of normality to update the distribution of St+1, i.e.
St+1|t+1 = St+1|t + Pt+1|tH′[HPt+1|tH′ + R]⁻¹(Yt+1 − Yt+1|t) (4.10)
and
Pt+1|t+1 = Pt+1|t − Pt+1|tH′[HPt+1|tH′ + R]⁻¹HPt+1|t. (4.11)
Clearly, we have that
rt+1|t = Yt+1 − Yt+1|t = Yt+1 − HSt+1|t
which is the predictive residual for time t + 1. The update equation (4.10) tells us that when rt+1|t
is non-zero, there is new information about the system, hence the state vector should be modified.
Therefore, the contribution of rt+1|t to the state vector needs to be weighted by the variance of
rt+1|t and the conditional covariance matrix of St+1.
Practically, when initializing the Kalman filter, we begin with prior information, that is, S0|0 and
P0|0, and then predict Y1|0 and V1|0. Once the observation Y1 is available, the update equations are used
to compute S1|1 and P1|1. This estimate becomes the prior information for the next observation.
Specifically, we take S0|0 and P0|0 be some initial value. From (4.5) and (4.6), we have predictions
S1|0 and Y1|0. Then from (4.7), we obtain P1|0, which will provide V1|0 and C1|0 from (4.8) and
(4.9). Then, when Y1 is observed, the residual value e1|0 = Y1 − Y1|0 can update the state vector.
In other words, we will have S1|1 and P1|1. This is a single iteration of the Kalman filter.
In summary, the Kalman filter consists of a set of predicting equations, (4.5)-(4.8), and a set of
updating equations, (4.10) and (4.11). Note that the effect of the initial values S0|0
and P0|0 decreases as t increases. The reason is that for a stationary time series, the eigenvalues
of the coefficient matrix F are less than one in absolute value. Therefore, Kalman filter recursion
provides a very nice means of reducing the effect of initial values as t increases.
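A compact sketch of these recursions (all matrices and the data below are assumed for illustration; this is not the unemployment model): each pass applies the predicting equations (4.5)-(4.8) and then the updating equations (4.10)-(4.11).

import numpy as np

def kalman_filter(y, F, G, H, Q, R, S0, P0):
    # Returns the one-step predictions Y_{t|t-1} and the final filtered state.
    S, P = S0, P0
    preds = []
    for yt in y:
        # Prediction: (4.5)-(4.8)
        S_pred = F @ S
        P_pred = F @ P @ F.T + G @ Q @ G.T
        y_pred = (H @ S_pred).item()
        V = (H @ P_pred @ H.T + R).item()
        preds.append(y_pred)
        # Update: (4.10)-(4.11), weighting the predictive residual by its variance
        K = P_pred @ H.T / V
        S = S_pred + K * (yt - y_pred)
        P = P_pred - K @ H @ P_pred
    return np.array(preds), S, P

F = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.1, 0.2, 0.5]])
G = np.array([[0.8], [0.5], [0.3]])
H = np.array([[1.0, 0.0, 0.0]])
Q, R = np.array([[1.0]]), np.array([[0.25]])
y = np.random.default_rng(2).normal(size=20)
preds, S_T, P_T = kalman_filter(y, F, G, H, Q, R, np.zeros((3, 1)), np.eye(3))
print(np.round(preds[:5], 3))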
5 Analytical Summary
The use of ARMA processes to model and forecast data has many applications in business and
economics. Section 2 discussed the primary composition of ARMA models, their structure, and
some of the underlying properties. Time series models written as lag operators have the benefit of
identifying non-stationarity in the process, which was discussed in section 3. In particular, station-
ary processes are much more valuable for forecasting than non-stationary processes, where trends
are far more evident. Furthermore, the autocovariance and autocorrelation functions characterize
processes, which provide a means of selecting a particular model.
Time series models can also be converted into an equivalent state-space representation. The benefit
in applying this method is that certain filters can be applied to the process in question. In particular,
we will emphasize use of the Kalman filter for application in forecasting.
6 Application
Two data sets will be considered for forecasting purposes: seasonally adjusted and non-seasonally
adjusted unemployment data. The goal is to identify whether a seasonal adjustment is necessary
for forecasting purposes with the traditional ARMA model, or whether the Kalman filter is a more
effective means of forecasting non-seasonally adjusted data.
The data used for modeling purposes consists of civilian unemployment rate from the U.S. De-
partment of Labor: Bureau of Labor Statistics. Data has been released on a monthly basis from
January 1948 to February 2012. The unemployment rate represents the number of unemployed as
a percentage of the labor force. Labor force data are restricted to people 16 years of age and older,
who currently reside in one of the fifty states or the District of Columbia, who do not reside in
institutions (e.g., penal and mental facilities, homes for the aged), and who are not on active duty
in the Armed Forces.⁸
6.1 Methodology:
Two data sets of unemployment will be used. The first consists of seasonally adjusted data. An
ARMA(3,3) process⁹ will be used to model the data as well as the behavior of residual values over the
range of time from 1948 to 2012. We will initiate predictive forecasts for 1996-1997,¹⁰ 2008-2009,¹¹
and 2011-2012¹² as a baseline measurement of how well the forecast model compares to the true
data. We will then apply a forecast for 2012-2013.
The second set of data consists of non-seasonally adjusted data. We will apply an ARMA(3,3) model
as we did for the data above, apply a Kalman filter for this model, and show behavior of residual
values over the same time period of 1948 to 2012. The same forecasting procedure as above will
be applied. Comparison of the seasonally adjusted (X-12-ARIMA) data with an ARMA(3,3) process
and the non-seasonally adjusted Kalman filtered data will provide insight into the consistency and
reliability of the two processes.
The statistical software package eViews was used to determine coefficient values for the ARMA(3,3)
model and state-space model. Furthermore, eViews has the capability of applying the Kalman filter
and allows for predictive forecasts based on the information up to the forecast initiation. The
optimal forecast method will be chosen based on comparison of means over the time interval of the
forecast. That is, if the least squares method of approximation for an ARMA(3,3) is optimal, the
difference between the mean of the true data and the mean of the forecast should have a smaller
absolute value than the difference between the mean of the true data and the mean of the forecast
for the Kalman filtered data. In symbols,
Mopt = min | (1/12) Σ_{i=t}^{T} (yi − y*i) |.
This is a more feasible method than comparing residuals, since the seasonally adjusted data would
naturally have smaller residual values than the non-seasonally adjusted data.
⁸ http://research.stlouisfed.org/fred2/data/UNRATE.txt
⁹ This model was selected based on minimal AIC and SIC values.
¹⁰ 1996-1997 had a relatively stable unemployment rate.
¹¹ 2008-2009 saw a sharp increase due to recession.
¹² 2011-2012 was a recovery from recession.
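A small helper illustrating this comparison rule (the function and the example numbers are ours, not output from the study): the method whose forecast mean lies closer to the mean of the realized data would be preferred.

import numpy as np

def mean_forecast_gap(actual, forecast):
    # |(1/n) * sum_i (y_i - y*_i)| over the forecast window
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return float(abs(np.mean(actual - forecast)))

print(mean_forecast_gap([8.3, 8.5, 8.2], [8.1, 8.4, 8.6]))   # smaller is better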
6.2 Seasonally Adjusted and Non-Seasonally Adjusted Data:
6.3 Forecast for 1996 (Seasonally Adjusted) using ARMA(3,3):
Applying an ARMA(3,3) model¹³ to seasonally adjusted data produces the following results:
It is important to note that eViews requires a backcast of MA terms in order to appropriately model
the data. In other words, the MA process order affects the data fitting. This data was backcasted
three months in order to appropriately model the behavior of the error terms. Two particular
values of interest are the Akaike information criterion (AIC) and the Schwarz information criterion
(SIC), which are quite low when compared to the non-seasonally adjusted data, which we will see
next. The forecast model based on a least squares approximation (dynamically fitted based on
coefficient values) is shown below. For this data, we find that (1/12) Σ_{i=t}^{T} (yi − y*i) = 0.154426.
¹³ Based on AIC and SIC values.
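The estimates in this section were produced in eViews. As a rough open-source sketch (not a reproduction of the numbers reported here), the same ARMA(3,3) specification could be fit with Python's statsmodels; the file name and series handling below are assumptions.

import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Assumes the UNRATE series has been downloaded and saved locally as UNRATE.csv
unrate = pd.read_csv("UNRATE.csv", index_col=0, parse_dates=True).squeeze()
history = unrate[:"1995-12"]               # data available at the forecast origin

model = ARIMA(history, order=(3, 0, 3))    # ARMA(3,3): no differencing
res = model.fit()
print(res.aic, res.bic)                    # information criteria analogous to eViews' AIC/SIC
print(res.forecast(steps=12))              # dynamic forecast for the next twelve months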
6.4 Forecast for 1996 (Non-Seasonally Adjusted) using Kalman Filter:
A particularly nice property of eViews is that it has a built-in function specifically designed to
convert an ARMA(p, q) model into state-space form. That is, eViews uses the process listed in
section 4.1. The benefit of this approach is that the process is entirely automated; however, there
are controls that one can implement to obtain the appropriate state-space form, specifically with the
are controls that one can implement to have the appropriate state-space form, specifically with the
covariance structure in both the signal and state equations. Listed below the state-space form is
the Kalman filtered data with structure and statistics listed. Notice that the AIC and SIC values
for each model are higher with the non-seasonally adjusted data than the seasonally adjusted. The
reason for this is more volatility in the data.
From the data, eViews has a variety of forecasting methods that can be applied to the Kalman
filtered data, specifically dynamic forecasting. Dynamic forecasting allows for more predictive
flexibility further in the forecast. In other words, forecast data at some point during the forecast
period comes from previous forecast predictions at each data point.
Notice that both the seasonally adjusted forecast and the non-seasonally adjusted forecast
performed quite well if a predictive scale of one year is used. We also find that
(1/12) Σ_{i=t}^{T} (yi − y*i) = 0.251177. For the 1996 forecast, it appears the ARMA(3,3) least squares
estimate is the better forecast.
6.5 Forecast for 2008 (Seasonally Adjusted) using ARMA(3,3):
The previous forecast from 1996-1997 was at a relatively stable time for unemployment. Now we
will look at data up to 2008 and create a forecast for 2008-2009, which was the beginning of the
most recent recession.
Note that the spike at the end of the data is a result of the program knowing what the true data
actually looks like, so the forecast model in the program corrects itself by a significant margin when
the forecast period has ended. It is important to note that this is a flaw in the program itself, not
the forecast or method applied. One of the benefits of implementing a forecast with confidence
bands is that it accounts for data fluctuations or time periods where there may be a significant
increase or decrease in the data. We find that (1/12) Σ_{i=t}^{T} (yi − y*i) = 0.609830.
6.6 Forecast for 2008 (Non-Seasonally Adjusted) using Kalman Filter:
Here we will look at a Kalman filter approach to forecasting using the non-seasonally adjusted data.
Listed below is the state-space form of the ARMA(3,3) model as well as the Kalman filter applied
to the model.
From a visual standpoint, it appears the ARMA(3,3) least squares estimate does a much better job
of forecasting the data. Specifically, for the Kalman filtered forecast, we have
(1/12) Σ_{i=t}^{T} (yi − y*i) = 1.11242. Therefore, it is clear that the ARMA(3,3) least squares forecast is a better choice in this
case.
6.7 Forecast for 2011 (Seasonally Adjusted) using ARMA(3,3):
This data set is known to have a slight drop-off from the spike experienced during the recession in
2008-2009. Based on the level of volatility in the model up to this time period, we may expect a
significant deviation of the forecast from the true data, however this is not the case.
Accordingly, we find that (1/12) Σ_{i=t}^{T} (yi − y*i) = 0.343110. Qualitatively, from appearance, this is
not a terrible forecast, as the difference between the true mean and forecast mean was less than
half a percentage point over a full year.
6.8 Forecast for 2011 (Non-Seasonally Adjusted) using Kalman Filter:
When we look at the data between 2010-2012, we notice there appears to be a higher level of
volatility in the non-seasonally adjusted data. Fortunately, the Kalman filter does a very good job
of transforming the data and in fact has a very good forecast for the next year, as we will see below.
The seasonal trend is quite evident in this graph. It is clear that Q1 of every year has a significant
increase in unemployment, while Q4 has a significant decrease. When we observe the Kalman filter
forecast, we notice that the difference between the forecasted value at the end of the forecast period
and the true value deviates by a small percentage. Accordingly, (1/12) Σ_{i=t}^{T} (yi − y*i) = 0.090709.
Therefore, the optimal model in this instance is the Kalman filter.
6.9 Forecast for 2012 (Seasonally Adjusted) using ARMA(3,3):
Here is the final case in which we present a forecast for data that we currently do not have. Here
we start with the ARMA(3,3) case:
According to this estimate, unemployment is expected to decrease to around 7.8% by the end of
December 2012. This particular forecast was initiated in January 2012. Although we have the
numbers for February 2012, the goal was to create a forecast for the entire calendar year beginning in
January and ending in December. The mean of the forecast is defined as ȳ* = (1/12) Σ_{i=t}^{T} y*i = 8.06.
6.10 Forecast for 2012 (Non-Seasonally Adjusted) using Kalman Filter:
All previous models had the luxury of forecasts and true data associated with them. This particular
forecast is unique in that all we have is the forecast and no data.
The Kalman filtered forecast listed above appears to have a very large 95% confidence band associ-
ated with it. In other words, the forecast is telling us that unemployment could fluctuate anywhere
between 5.5% and 10.75%, which would be the highest level of unemployment on record. This
forecast has mean ȳ* = (1/12) Σ_{i=t}^{T} y*i = 8.36.
7 Results
The ARMA(3,3) least squares approximation for forecasting was optimal in two of the three cases
considered. Furthermore, the confidence bands were more tightly bound to the forecast in question.
This has the advantage of providing a very high level of certainty within a smaller constraint, which
is generally thought of as intrinsically better. Therefore, seasonally adjusted data appears to be
better for forecasting, which is what we would expect. Less variation in the overall structure of the
data implies smaller residual values, which means there is a higher probability that the forecast model
more closely resembles the true data.
The Kalman filter should not be completely disregarded as a potential forecasting method. In
the 2011 forecast, the mean of the true data and the mean of the forecasted data differed by less
than a tenth of a percentage point. Qualitatively, it appears that de-seasoning the data is much more
effective for forecasting; however, the true non-seasonally adjusted data is raw. Forecasting seasonally
adjusted data and applying a correction term to more accurately forecast non-seasonally adjusted
data may be of research interest in the future.
8 Acknowledgements
I would like to thank my committee chair, Dr. Bozenna Pasik-Duncan for her work and constant
support over the past 3 years during my time here. I have always treasured our conversations and
discussions on education and the role that mathematics education serves. It was her guidance that
allowed me to develop a proper motivation for this particular project.
I would also like to thank my committee, Dr. Tyrone Duncan, Dr. Zsolt Talata, and Dr. Xuemin Tu
for their contributions that allowed me to question every aspect of this project. Professor Duncan’s
class in mathematical finance gave me an appreciation for applied mathematics that I had not
experienced before. Professor Talata’s applied regression analysis course developed the basis for my
understanding of linear regressions and ultimately time series models. Professor Tu’s discussion
on filtering allowed me to consider other potential models that may serve as a basis for further
research.
I would also like to thank my office mate, Cody Clifton for carefully reading through this paper
and providing feedback for improvement. A very big “thank you” goes out to Nathan Welch for
discussing the topics with me and providing insight on the computational aspect of the project.
References
[1] Basu, S. and Reinsel G. C. (1996), Relationship Between Missing Data Likelihoods and Complete Data Restricted
Likelihoods for Regression Time Series Models: An Application to Total Ozone Data, Journal of the Royal
Statistical Society. Series C (Applied Statistics) 45(1), 63-72.
[2] Box, G. E. P., Jenkins G. M., and Reinsel G. C. (1994) Time Series Analysis: Forecasting and Control (3rd
Edition), Prentice-Hall Inc.
[3] Brockwell, P. J. and Davis R. A. (1987), Time Series: Theory and Models, Springer-Verlag, New York Inc.
[4] Cochrane J. H. (1997) Time Series for Macroeconomics and Finance, University of Chicago, Chicago IL.
[5] De Jong, P. (1988), The Likelihood for a State Space Model, Biometrika 75(1), 165-169.
[6] De Jong, P. (1991), The Diffuse Kalman Filter, Annals of Statistics 19(2), 1073-1083.
[7] De Jong, P. and Penzer, J. R. (1998), Diagnosing Shocks in Time Series, Journal of the American Statistical
Association 93(442), 796-806.
[8] De Jong, P. and Penzer, J. R. (2004), The ARMA Model in State Space Form, Statistics and Probability Letters
70(1), 119-125.
[9] Diebold, F. X. (2007), Elements of Forecasting (4th Edition), Thomson South-Western.
[10] Hamilton, J. D. (1994), Time Series Analysis, Princeton: Princeton University Press.
[11] Harvey, A. C. (1993), Time Series Models (2nd Edition), London: Harvester Wheatsheaf.
[12] Harvey, A. C. and Phillips, G. D. A. (1979), Maximum Likelihood Estimation of Regression Models with
Autoregressive-Moving Average Disturbances, Biometrika 66(1), 49-58.
[13] Johnston, J. (1984), Econometric Methods (3rd Edition), Singapore: McGraw-Hill.
[14] Jones, R. H. (1980) Maximum Likelihood Fitting of ARMA Models to Time Series with Missing Observations,
Technometrics 22(3), 389-395.
[15] Kohn, R. and Ansley C. F. (1986), Estimation, Prediction, and Interpolation for ARIMA Models with Missing
Data, Journal of the American Statistical Association, 81(395), 751-761.
[16] Pearlman, J.G. (1980), An Algorithm for the Exact Likelihood of a High-Order Autoregressive-Moving Average
Process, Biometrika 67(1), 232-233.
[17] Sargent, T. J. (1979) Macroeconomic Theory, Academic Press.
[18] Tsay, R. S. (2008) Time Series Analysis for Forecasting and Model Building, University of Chicago, Booth School
of Business.
Weitere ähnliche Inhalte

Was ist angesagt?

Graph theoretic approach to solve measurement placement problem for power system
Graph theoretic approach to solve measurement placement problem for power systemGraph theoretic approach to solve measurement placement problem for power system
Graph theoretic approach to solve measurement placement problem for power systemIAEME Publication
 
Applied Numerical Methods Curve Fitting: Least Squares Regression, Interpolation
Applied Numerical Methods Curve Fitting: Least Squares Regression, InterpolationApplied Numerical Methods Curve Fitting: Least Squares Regression, Interpolation
Applied Numerical Methods Curve Fitting: Least Squares Regression, InterpolationBrian Erandio
 
Adesanya dissagregation of data corrected
Adesanya dissagregation of data correctedAdesanya dissagregation of data corrected
Adesanya dissagregation of data correctedAlexander Decker
 
An alternative scheme for approximating a periodic function
An alternative scheme for approximating a periodic functionAn alternative scheme for approximating a periodic function
An alternative scheme for approximating a periodic functionijscmcj
 
Data Approximation in Mathematical Modelling Regression Analysis and Curve Fi...
Data Approximation in Mathematical Modelling Regression Analysis and Curve Fi...Data Approximation in Mathematical Modelling Regression Analysis and Curve Fi...
Data Approximation in Mathematical Modelling Regression Analysis and Curve Fi...Dr.Summiya Parveen
 
case study of curve fitting
case study of curve fittingcase study of curve fitting
case study of curve fittingAdarsh Patel
 
Stochastic Analysis of Van der Pol OscillatorModel Using Wiener HermiteExpans...
Stochastic Analysis of Van der Pol OscillatorModel Using Wiener HermiteExpans...Stochastic Analysis of Van der Pol OscillatorModel Using Wiener HermiteExpans...
Stochastic Analysis of Van der Pol OscillatorModel Using Wiener HermiteExpans...IJERA Editor
 
Stochastic Analysis of Van der Pol OscillatorModel Using Wiener HermiteExpans...
Stochastic Analysis of Van der Pol OscillatorModel Using Wiener HermiteExpans...Stochastic Analysis of Van der Pol OscillatorModel Using Wiener HermiteExpans...
Stochastic Analysis of Van der Pol OscillatorModel Using Wiener HermiteExpans...IJERA Editor
 
Standard Charted PLC
Standard Charted PLCStandard Charted PLC
Standard Charted PLCThomas Cox
 
Maths ppt partial diffrentian eqn
Maths ppt partial diffrentian eqnMaths ppt partial diffrentian eqn
Maths ppt partial diffrentian eqnDheerendraKumar43
 
Lecture notes on Johansen cointegration
Lecture notes on Johansen cointegrationLecture notes on Johansen cointegration
Lecture notes on Johansen cointegrationMoses sichei
 
Applications Section 1.3
Applications   Section 1.3Applications   Section 1.3
Applications Section 1.3mobart02
 
Approximate Methods
Approximate MethodsApproximate Methods
Approximate MethodsTeja Ande
 
Admissions in India 2015
Admissions in India 2015Admissions in India 2015
Admissions in India 2015Edhole.com
 
Direct Methods For The Solution Of Systems Of
Direct Methods For The Solution Of Systems OfDirect Methods For The Solution Of Systems Of
Direct Methods For The Solution Of Systems OfMarcela Carrillo
 

Was ist angesagt? (20)

Graph theoretic approach to solve measurement placement problem for power system
Graph theoretic approach to solve measurement placement problem for power systemGraph theoretic approach to solve measurement placement problem for power system
Graph theoretic approach to solve measurement placement problem for power system
 
Econometric modelling
Econometric modellingEconometric modelling
Econometric modelling
 
Applied Numerical Methods Curve Fitting: Least Squares Regression, Interpolation
Applied Numerical Methods Curve Fitting: Least Squares Regression, InterpolationApplied Numerical Methods Curve Fitting: Least Squares Regression, Interpolation
Applied Numerical Methods Curve Fitting: Least Squares Regression, Interpolation
 
Adesanya dissagregation of data corrected
Adesanya dissagregation of data correctedAdesanya dissagregation of data corrected
Adesanya dissagregation of data corrected
 
An alternative scheme for approximating a periodic function
An alternative scheme for approximating a periodic functionAn alternative scheme for approximating a periodic function
An alternative scheme for approximating a periodic function
 
Time series Analysis
Time series AnalysisTime series Analysis
Time series Analysis
 
Data Approximation in Mathematical Modelling Regression Analysis and Curve Fi...
Data Approximation in Mathematical Modelling Regression Analysis and Curve Fi...Data Approximation in Mathematical Modelling Regression Analysis and Curve Fi...
Data Approximation in Mathematical Modelling Regression Analysis and Curve Fi...
 
case study of curve fitting
case study of curve fittingcase study of curve fitting
case study of curve fitting
 
Stochastic Analysis of Van der Pol OscillatorModel Using Wiener HermiteExpans...
Stochastic Analysis of Van der Pol OscillatorModel Using Wiener HermiteExpans...Stochastic Analysis of Van der Pol OscillatorModel Using Wiener HermiteExpans...
Stochastic Analysis of Van der Pol OscillatorModel Using Wiener HermiteExpans...
 
Stochastic Analysis of Van der Pol OscillatorModel Using Wiener HermiteExpans...
Stochastic Analysis of Van der Pol OscillatorModel Using Wiener HermiteExpans...Stochastic Analysis of Van der Pol OscillatorModel Using Wiener HermiteExpans...
Stochastic Analysis of Van der Pol OscillatorModel Using Wiener HermiteExpans...
 
Statistical Physics Assignment Help
Statistical Physics Assignment HelpStatistical Physics Assignment Help
Statistical Physics Assignment Help
 
Standard Charted PLC
Standard Charted PLCStandard Charted PLC
Standard Charted PLC
 
ARIMA Models - [Lab 3]
ARIMA Models - [Lab 3]ARIMA Models - [Lab 3]
ARIMA Models - [Lab 3]
 
Maths ppt partial diffrentian eqn
Maths ppt partial diffrentian eqnMaths ppt partial diffrentian eqn
Maths ppt partial diffrentian eqn
 
Lecture notes on Johansen cointegration
Lecture notes on Johansen cointegrationLecture notes on Johansen cointegration
Lecture notes on Johansen cointegration
 
Applications Section 1.3
Applications   Section 1.3Applications   Section 1.3
Applications Section 1.3
 
Approximate Methods
Approximate MethodsApproximate Methods
Approximate Methods
 
Admissions in India 2015
Admissions in India 2015Admissions in India 2015
Admissions in India 2015
 
Jmestn42351212
Jmestn42351212Jmestn42351212
Jmestn42351212
 
Direct Methods For The Solution Of Systems Of
Direct Methods For The Solution Of Systems OfDirect Methods For The Solution Of Systems Of
Direct Methods For The Solution Of Systems Of
 

Andere mochten auch

Progetto estivo volontariato e vacanza coeb - mare e volontariato
Progetto estivo   volontariato e vacanza coeb - mare e volontariatoProgetto estivo   volontariato e vacanza coeb - mare e volontariato
Progetto estivo volontariato e vacanza coeb - mare e volontariatoNiki Pecora
 
Icfesac201324931457
Icfesac201324931457Icfesac201324931457
Icfesac201324931457ginapaomosi
 
La población_española.pptx_
 La población_española.pptx_ La población_española.pptx_
La población_española.pptx_Ellen Sanz
 
Administraci ã³n y-competitividad-empresarial
Administraci ã³n y-competitividad-empresarialAdministraci ã³n y-competitividad-empresarial
Administraci ã³n y-competitividad-empresarialShield Baumgarten
 
The History and Significance of the Myers-Briggs Personality Test
The History and Significance of the Myers-Briggs Personality TestThe History and Significance of the Myers-Briggs Personality Test
The History and Significance of the Myers-Briggs Personality Testnumerousnurture80
 
Colección de instrumentos y partituras de Antonio Gil Martín. Matabuena
Colección de instrumentos y partituras de Antonio Gil Martín. MatabuenaColección de instrumentos y partituras de Antonio Gil Martín. Matabuena
Colección de instrumentos y partituras de Antonio Gil Martín. MatabuenaDiego Sobrino López
 
Leveraging Corporate Integrity Agreements for Healthcare Compliance
Leveraging Corporate Integrity Agreements for Healthcare ComplianceLeveraging Corporate Integrity Agreements for Healthcare Compliance
Leveraging Corporate Integrity Agreements for Healthcare CompliancePolsinelli PC
 
Investor protection
Investor protectionInvestor protection
Investor protectionShalini W
 

Andere mochten auch (11)

Progetto estivo volontariato e vacanza coeb - mare e volontariato
Progetto estivo   volontariato e vacanza coeb - mare e volontariatoProgetto estivo   volontariato e vacanza coeb - mare e volontariato
Progetto estivo volontariato e vacanza coeb - mare e volontariato
 
Minoxidil foam
Minoxidil foamMinoxidil foam
Minoxidil foam
 
Icfesac201324931457
Icfesac201324931457Icfesac201324931457
Icfesac201324931457
 
La población_española.pptx_
 La población_española.pptx_ La población_española.pptx_
La población_española.pptx_
 
Administraci ã³n y-competitividad-empresarial
Administraci ã³n y-competitividad-empresarialAdministraci ã³n y-competitividad-empresarial
Administraci ã³n y-competitividad-empresarial
 
The History and Significance of the Myers-Briggs Personality Test
The History and Significance of the Myers-Briggs Personality TestThe History and Significance of the Myers-Briggs Personality Test
The History and Significance of the Myers-Briggs Personality Test
 
Minoxidil 5%
Minoxidil 5%Minoxidil 5%
Minoxidil 5%
 
Colección de instrumentos y partituras de Antonio Gil Martín. Matabuena
Colección de instrumentos y partituras de Antonio Gil Martín. MatabuenaColección de instrumentos y partituras de Antonio Gil Martín. Matabuena
Colección de instrumentos y partituras de Antonio Gil Martín. Matabuena
 
Leveraging Corporate Integrity Agreements for Healthcare Compliance
Leveraging Corporate Integrity Agreements for Healthcare ComplianceLeveraging Corporate Integrity Agreements for Healthcare Compliance
Leveraging Corporate Integrity Agreements for Healthcare Compliance
 
Buen vendedor
Buen vendedor Buen vendedor
Buen vendedor
 
Investor protection
Investor protectionInvestor protection
Investor protection
 

Ähnlich wie ETSATPWAATFU

Project time series ppt
Project time series pptProject time series ppt
Project time series pptamar patil
 
fb69b412-97cb-4e8d-8a28-574c09557d35-160618025920
fb69b412-97cb-4e8d-8a28-574c09557d35-160618025920fb69b412-97cb-4e8d-8a28-574c09557d35-160618025920
fb69b412-97cb-4e8d-8a28-574c09557d35-160618025920Karl Rudeen
 
Mathematical models ismail seyrik 2019
Mathematical models ismail seyrik 2019Mathematical models ismail seyrik 2019
Mathematical models ismail seyrik 2019Ismail Seyrik
 
Arima model
Arima modelArima model
Arima modelJassika
 
arimamodel-170204090012.pdf
arimamodel-170204090012.pdfarimamodel-170204090012.pdf
arimamodel-170204090012.pdfssuserdca880
 
Time alignment techniques for experimental sensor data
Time alignment techniques for experimental sensor dataTime alignment techniques for experimental sensor data
Time alignment techniques for experimental sensor dataIJCSES Journal
 
Introduction to Hybrid Vehicle System Modeling and Control - 2013 - Liu - App...
Introduction to Hybrid Vehicle System Modeling and Control - 2013 - Liu - App...Introduction to Hybrid Vehicle System Modeling and Control - 2013 - Liu - App...
Introduction to Hybrid Vehicle System Modeling and Control - 2013 - Liu - App...sravan66
 
Threshold autoregressive (tar) &momentum threshold autoregressive (mtar) mode...
Threshold autoregressive (tar) &momentum threshold autoregressive (mtar) mode...Threshold autoregressive (tar) &momentum threshold autoregressive (mtar) mode...
Threshold autoregressive (tar) &momentum threshold autoregressive (mtar) mode...Alexander Decker
 
PCB_Lect02_Pairwise_allign (1).pdf
PCB_Lect02_Pairwise_allign (1).pdfPCB_Lect02_Pairwise_allign (1).pdf
PCB_Lect02_Pairwise_allign (1).pdfssusera1eccd
 
Byungchul Yea (Project)
Byungchul Yea (Project)Byungchul Yea (Project)
Byungchul Yea (Project)Byung Chul Yea
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentIJERD Editor
 
APPROACHES IN USING EXPECTATIONMAXIMIZATION ALGORITHM FOR MAXIMUM LIKELIHOOD ...
APPROACHES IN USING EXPECTATIONMAXIMIZATION ALGORITHM FOR MAXIMUM LIKELIHOOD ...APPROACHES IN USING EXPECTATIONMAXIMIZATION ALGORITHM FOR MAXIMUM LIKELIHOOD ...
APPROACHES IN USING EXPECTATIONMAXIMIZATION ALGORITHM FOR MAXIMUM LIKELIHOOD ...cscpconf
 

Ähnlich wie ETSATPWAATFU (20)

04_AJMS_288_20.pdf
04_AJMS_288_20.pdf04_AJMS_288_20.pdf
04_AJMS_288_20.pdf
 
Arellano bond
Arellano bondArellano bond
Arellano bond
 
Project time series ppt
Project time series pptProject time series ppt
Project time series ppt
 
fb69b412-97cb-4e8d-8a28-574c09557d35-160618025920
fb69b412-97cb-4e8d-8a28-574c09557d35-160618025920fb69b412-97cb-4e8d-8a28-574c09557d35-160618025920
fb69b412-97cb-4e8d-8a28-574c09557d35-160618025920
 
Project Paper
Project PaperProject Paper
Project Paper
 
Mathematical models ismail seyrik 2019
Mathematical models ismail seyrik 2019Mathematical models ismail seyrik 2019
Mathematical models ismail seyrik 2019
 
04_AJMS_371_22.pdf
04_AJMS_371_22.pdf04_AJMS_371_22.pdf
04_AJMS_371_22.pdf
 
SwingOptions
SwingOptionsSwingOptions
SwingOptions
 
Implimenting_HJM
Implimenting_HJMImplimenting_HJM
Implimenting_HJM
 
Arima model
Arima modelArima model
Arima model
 
arimamodel-170204090012.pdf
arimamodel-170204090012.pdfarimamodel-170204090012.pdf
arimamodel-170204090012.pdf
 
Time alignment techniques for experimental sensor data
Time alignment techniques for experimental sensor dataTime alignment techniques for experimental sensor data
Time alignment techniques for experimental sensor data
 
Introduction to Hybrid Vehicle System Modeling and Control - 2013 - Liu - App...
Introduction to Hybrid Vehicle System Modeling and Control - 2013 - Liu - App...Introduction to Hybrid Vehicle System Modeling and Control - 2013 - Liu - App...
Introduction to Hybrid Vehicle System Modeling and Control - 2013 - Liu - App...
 
solver (1)
solver (1)solver (1)
solver (1)
 
Threshold autoregressive (tar) &momentum threshold autoregressive (mtar) mode...
Threshold autoregressive (tar) &momentum threshold autoregressive (mtar) mode...Threshold autoregressive (tar) &momentum threshold autoregressive (mtar) mode...
Threshold autoregressive (tar) &momentum threshold autoregressive (mtar) mode...
 
PCB_Lect02_Pairwise_allign (1).pdf
PCB_Lect02_Pairwise_allign (1).pdfPCB_Lect02_Pairwise_allign (1).pdf
PCB_Lect02_Pairwise_allign (1).pdf
 
Byungchul Yea (Project)
Byungchul Yea (Project)Byungchul Yea (Project)
Byungchul Yea (Project)
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and Development
 
APPROACHES IN USING EXPECTATIONMAXIMIZATION ALGORITHM FOR MAXIMUM LIKELIHOOD ...
APPROACHES IN USING EXPECTATIONMAXIMIZATION ALGORITHM FOR MAXIMUM LIKELIHOOD ...APPROACHES IN USING EXPECTATIONMAXIMIZATION ALGORITHM FOR MAXIMUM LIKELIHOOD ...
APPROACHES IN USING EXPECTATIONMAXIMIZATION ALGORITHM FOR MAXIMUM LIKELIHOOD ...
 
recko_paper
recko_paperrecko_paper
recko_paper
 

ETSATPWAATFU

  • 1. Economic Time Series Models and their Properties with an Application to Forecasting Unemployment Robert Zima April 9, 2012 1 Introduction Time series models operate as a tool in understanding the underlying factors and theory that pro- duce observed data. The results allow us to fit these models for predictive forecasting. Applications include economic forecasting, stock market analysis, quality control, and yield projections. Many models exist for time series analysis; however, the most commonly used classes include autoregressive (AR) models and moving average (MA) models. These models are often intertwined to generate new models. For example, autoregressive moving average (ARMA) models combine AR and MA models, which allow for increased flexibility in model selection. State-space representations for ARMA models are a useful tool when situations with data distur- bances occur, specifically with missing data values and measurement error. Furthermore, state- space representations allow the use of a mathematical formulation called the Kalman filter, which provides a means for estimating and predicting future data recursively. An application to forecasting unemployment data will be considered. The two data sets consist of seasonally adjusted and non-seasonally adjusted data. An ARMA(3,3) model will characterize the seasonally adjusted data, whereas we will apply the Kalman filter to the non-seasonally adjusted data. Furthermore, a comparative study between forecasts of both seasonally adjusted and non- seasonally adjusted data at varying time intervals will also be explored. 2 Compositions of ARMA models Before we fully consider ARMA models, it is necessary to understand the underlying factors that comprise them. These time series models are based upon the notion of cycles, or dynamic behavior linking past to present events, and hence present to future events. It is desired (for forecasting) that time series mean and covariance structure (covariances between past and present values) be stable over time. This is called covariant (wide sense) stationarity.1 If a series mean is stable over time, we can write E(yt) = µ, for all t. For stability in covariance structure, we use the autocovariance function. Autocovariance at displacement τ is the covariance between two events yt and yt−τ . Specifically, we write γ(t, τ) = Cov(yt, yt−τ ) = E(yt − µ)(yt−τ − µ). 1 When we say covariant stationary 1
  • 2. Stability over time means that the autocovariance depends only on displacement and not time, so we write γ(t, τ) = γ(τ) for all t. Covariant stationarity appears to be a stringent constraint; however ARMA models require this condition. For example, many applications in economics and business have seasonality with a varying mean.2 Certain models have also been developed that provide special treatment to trends and seasonality. Specifically, state-space ARMA models with Kalman filtering are very good with seasonal trends, and will be discussed in section 5. Another useful tool that characterize relationship between two events is the correlation. In general, correlation is a much more useful statistic because it is bounded by one in absolute value, rather than the unbounded covariance between two variables. For example, if x and y have a covariance of 1 million, it is not necessarily clear whether the two events are very strongly associated, whereas a correlation of .95 indicates a very strong association. For this reason, the autocorrelation function preferred over the autocovariance function. The autocorrelation function is obtained by dividing the autocovariance function γ(τ) by the variance γ(0). We write ρτ = γ(τ) γ(0) , τ = 0, 1, 2, . . . The structure of autocovariance and autocorrelation functions for each time series model will be explored in section 3. 2.1 White Noise Processes A white noise process is a fundamental characteristic of the time series models we will consider. White noise processes have three properties: εt ∼ (0, σ2 ). 1. E(εt) = E(εt|εt−1, εt−2, . . .) = 0. 2. Var(εt) = Var(εt|εt−1, εt−2, . . .) = σ2. 3. Cov(εtεt−j) = E(εtεt−j) = 0, j > 0. Properties 1 and 3 describe the absence of serial correlation, or correlation over time. In other words, we can think of white noise processes as being memoryless. A white noise process that is serially uncorrelated is not necessarily serially independent, since zero correlation implies independence only in the normal case. Hence, if εt is serially uncorrelated and normally distributed, εt is Gaussian white noise. By itself, εt is not a very interesting process, since values of εt do not correlate to εt+1. In other words, large values for white noise at one time period do not correspond to large values of white noise at the next time period, and vice versa. More realistic time series models are constructed taking combinations of εt, as we see in the next section. 2.2 Basic ARMA Models Each of the time series models we are concerned with are primarily composed of past observations and white noise processes. That is, AR models are a class of linear regressions of past observations, 2 Data acquired and analyzed in this section has been subject to seasonal adjustments to remove the seasonality 2
2.2 Basic ARMA Models

Each of the time series models we are concerned with is composed primarily of past observations and white noise processes. That is, AR models are a class of linear regressions on past observations, whereas MA models are a class of linear regressions on unobserved white noise error terms, or shocks; ARMA models combine the two. Let p ≥ 1 denote the order of an AR process and q ≥ 1 the order of an MA process. Then by definition,

AR(1): y_t = φ_1 y_{t−1} + ε_t
MA(1): y_t = ε_t + θ_1 ε_{t−1}
AR(p): y_t = φ_1 y_{t−1} + φ_2 y_{t−2} + . . . + φ_p y_{t−p} + ε_t
MA(q): y_t = ε_t + θ_1 ε_{t−1} + . . . + θ_q ε_{t−q}
ARMA(p, q): y_t = φ_1 y_{t−1} + . . . + φ_p y_{t−p} + ε_t + θ_1 ε_{t−1} + . . . + θ_q ε_{t−q}

All of the above models are special cases of the ARMA(p, q) process; an AR(1) model, for example, is just an ARMA(1,0) process, and likewise for MA processes. Thus we can think of every process in terms of the more general ARMA(p, q) process, which provides greater flexibility in model selection. Model selection is based on a variety of factors; in general, p and q are chosen to be as small as possible while still providing an acceptable fit to the data. (Among competing models, another common practice is to select the model with the smallest Akaike Information Criterion (AIC) or Schwarz Information Criterion (SIC), both of which penalize larger choices of p and q.)

Each of these models generates a sequence {y_t} from a sequence of white noise shocks {ε_t} and an initial value y_0. Each model as written has mean zero; however, modifying a model to include a mean µ is simple. For example, if a series has mean µ and follows an AR(1) model, we have

(y_t − µ) = φ_1 (y_{t−1} − µ) + ε_t,   or equivalently   y_t = (1 − φ_1) µ + φ_1 y_{t−1} + ε_t,

where (1 − φ_1) µ is simply a constant. From this point forward we consider the mean-zero case, since adding means and other deterministic trends is straightforward.
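To make these definitions concrete, the processes above can be simulated directly. The sketch below (Python with statsmodels, assumed available; the coefficient values 0.7 and 0.4 are illustrative, not estimates) builds AR(1), MA(1), and ARMA(1,1) processes and draws a sample path from the last one:

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

np.random.seed(0)

# statsmodels specifies models by their lag polynomials, including the lag-0 term:
# the AR polynomial 1 - phi_1 L is passed as [1, -phi_1]; the MA polynomial 1 + theta_1 L as [1, theta_1].
ar1  = ArmaProcess(ar=[1, -0.7], ma=[1])        # AR(1) with phi_1 = 0.7
ma1  = ArmaProcess(ar=[1], ma=[1, 0.4])         # MA(1) with theta_1 = 0.4
arma = ArmaProcess(ar=[1, -0.7], ma=[1, 0.4])   # ARMA(1,1)

y = arma.generate_sample(nsample=500)           # simulate 500 observations
print(y[:5])
```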
2.3 Lag Operators and Lag Polynomials

In order to represent one model in terms of another, we use lag operator (sometimes called backshift operator) notation. The lag operator maps an element of a series to the previous element. In symbols,

L y_t = y_{t−1},   L^2 y_t = L(L y_t) = L y_{t−1} = y_{t−2},   and so on.

A lag polynomial is a linear function of powers of L up through the mth power:

φ(L) y_t = (a_0 + a_1 L + a_2 L^2 + . . . + a_m L^m) y_t = a_0 y_t + a_1 y_{t−1} + . . . + a_m y_{t−m}.

Using this notation, we can rewrite the ARMA models above as

AR(1): (1 − φ_1 L) y_t = ε_t
MA(1): y_t = (1 + θ_1 L) ε_t
AR(p): (1 − φ_1 L − φ_2 L^2 − . . . − φ_p L^p) y_t = ε_t
MA(q): y_t = (1 + θ_1 L + . . . + θ_q L^q) ε_t

or simply

AR: φ(L) y_t = ε_t
MA: y_t = θ(L) ε_t
ARMA: φ(L) y_t = θ(L) ε_t

where φ(L) and θ(L) are the lag polynomials of the AR and MA terms, respectively. We will see in the following section how lag operators and lag polynomials are used to determine stationarity.

2.4 Wold Decomposition Theorem

Subsequent analysis of these time series models depends primarily on the Wold Decomposition Theorem. The theorem states that any covariance stationary discrete time series can be decomposed into a pair of uncorrelated processes: an infinite-order MA process and a deterministic trend. That is, every stationary process has this seemingly special representation. Realistically, a model that depends on an infinite number of parameters will not be selected in practice, since the goal is to keep p and q small for forecasting purposes.

Theorem 2.1. Suppose that {y_t} is a covariance stationary process with E(y_t) = 0 and covariance function γ(j) = E(y_t y_{t−j}) for all j. Then

y_t = \sum_{j=0}^{∞} b_j ε_{t−j},

where b_0 = 1, \sum_{j=0}^{∞} b_j^2 < ∞, E(ε_t^2) = σ^2, and E(ε_t ε_s) = 0 for t ≠ s.

This theorem is very useful for characterizing data: any data set with the above properties can be represented as an MA(∞) process. Unfortunately, the representation itself is not very useful for forecasting, because models with a large number of parameters carry larger residual effects when forecasting. A model must therefore be chosen so that it characterizes the data effectively with a minimal number of parameters.
ARMA models were constructed to provide tractable models of the joint distribution of a time series {y_t}, and the autocovariance and autocorrelation functions are a way to characterize that joint distribution. The correlation of y_t and y_{t+1}, for instance, is a clear measure of persistence, that is, of how strongly an observation today affects future observations.

3 Stationarity of ARMA Models

Stationarity of an ARMA(p, q) model is necessary in order to forecast effectively. One convenient feature of ARMA(p, q) models is that only the AR(p) component is needed to determine stationarity: finite-order MA processes are always stationary. (Infinite-order MA processes are not necessarily stationary; they additionally require that the coefficients of the process be square summable; see Hamilton (1994).)

Checking stationarity of an AR(p) process is relatively straightforward and can be done with computational software. In general, if the characteristic function (which can also be thought of as the lag polynomial) associated with an AR(p) model has all of its roots strictly outside the unit circle, the process is stationary. If any root of the characteristic function lies inside the unit circle, the process is non-stationary and a different model must be chosen; non-stationary models are explored in section 4. Specifically, given an ARMA(p, q) process

y_t = φ_1 y_{t−1} + . . . + φ_p y_{t−p} + ε_t + θ_1 ε_{t−1} + . . . + θ_q ε_{t−q},

rewriting in terms of lag polynomials gives

(1 − φ_1 L − φ_2 L^2 − . . . − φ_p L^p) y_t = ε_t + θ_1 ε_{t−1} + . . . + θ_q ε_{t−q}.

Finding the roots of the characteristic function 1 − φ_1 L − φ_2 L^2 − . . . − φ_p L^p determines the stationarity of the process. Lag polynomials thus provide a much simpler view of the data and of the model we wish to choose. Again, the goal is to select a model that characterizes the data effectively but in a simple manner.
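The root condition is easy to check numerically. A minimal sketch (Python with NumPy; the AR(2) coefficients 0.5 and 0.3 are assumed purely for illustration):

```python
import numpy as np

# Illustrative AR(2): y_t = 0.5 y_{t-1} + 0.3 y_{t-2} + e_t
# Characteristic (lag) polynomial 1 - 0.5 L - 0.3 L^2, coefficients in ascending powers of L.
phi = np.array([0.5, 0.3])
poly = np.r_[1.0, -phi]

roots = np.polynomial.polynomial.polyroots(poly)
print(roots, np.abs(roots))

# Stationary if every root lies strictly outside the unit circle.
print("stationary:", bool(np.all(np.abs(roots) > 1.0)))
```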
3.1 Autocovariance and Autocorrelation for ARMA Models

Another means of selecting an appropriate model is to consider the autocovariance and autocorrelation of each model. Recall that autocovariance measures the linear dependence between two points of the same series observed at different times, while autocorrelation measures the linear predictability of the series at time t. We now identify the autocovariances and autocorrelations of each ARMA model.

3.1.1 White Noise

Before characterizing the ARMA models, we record the structure of the underlying white noise process. Since ε_t ∼ (0, σ^2),

γ(0) = σ^2,   γ(j) = 0 for j ≠ 0,
ρ_0 = 1,   ρ_j = 0 for j ≠ 0.

3.1.2 MA(1)

Recall that an MA(1) model is defined as y_t = ε_t + θ_1 ε_{t−1}. The autocovariances are

γ(0) = Var(y_t) = Var(ε_t + θ_1 ε_{t−1}) = σ^2 + θ_1^2 σ^2 = σ^2 (1 + θ_1^2)
γ(1) = E(y_t y_{t−1}) = E[(ε_t + θ_1 ε_{t−1})(ε_{t−1} + θ_1 ε_{t−2})] = E(θ_1 ε_{t−1}^2) = σ^2 θ_1
γ(2) = E(y_t y_{t−2}) = E[(ε_t + θ_1 ε_{t−1})(ε_{t−2} + θ_1 ε_{t−3})] = 0,

and γ(k) = 0 for all k ≥ 2. The autocorrelations of an MA(1) process are therefore

ρ_0 = 1,   ρ_1 = γ(1)/γ(0) = θ_1 / (1 + θ_1^2),   ρ_k = γ(k)/γ(0) = 0 for k ≥ 2.

For this process, both the autocovariance and the autocorrelation are zero at all displacements greater than 1.

3.1.3 MA(q) and MA(∞)

An MA(q) process has exactly q non-zero autocorrelations beyond lag zero. By definition, writing the process in ψ-weight form,

y_t = θ(L) ε_t = \sum_{j=0}^{∞} θ_j L^j ε_t,

the autocovariances are

γ(0) = Var(y_t) = σ^2 \sum_{j=0}^{∞} θ_j^2,   . . . ,   γ(k) = σ^2 \sum_{j=0}^{∞} θ_j θ_{j+k}.

The autocorrelations follow immediately from these formulas. A very nice property here is that MA processes have second moments that are easy to compute, primarily because the E(ε_j ε_k) terms vanish for j ≠ k.
3.1.4 AR(1)

Recall that an AR(1) model is defined as y_t = φ_1 y_{t−1} + ε_t. This model is equivalent to

(1 − φ_1 L) y_t = ε_t   ⟹   y_t = (1 − φ_1 L)^{−1} ε_t = \sum_{j=0}^{∞} φ_1^j ε_{t−j}.

This representation is convenient for finding the autocovariances:

γ(0) = σ^2 \sum_{j=0}^{∞} φ_1^{2j} = σ^2 / (1 − φ_1^2)
γ(1) = σ^2 \sum_{j=0}^{∞} φ_1^j φ_1^{j+1} = σ^2 φ_1 \sum_{j=0}^{∞} φ_1^{2j} = σ^2 φ_1 / (1 − φ_1^2)
. . .
γ(k) = σ^2 φ_1^k / (1 − φ_1^2).

The corresponding autocorrelations are ρ_0 = 1, ρ_1 = φ_1, . . . , ρ_k = φ_1^k. Another, often more useful, way of finding the autocorrelations of an AR(1) is to multiply by lagged values and take expectations:

γ(1) = E(y_t y_{t−1}) = E[(φ_1 y_{t−1} + ε_t) y_{t−1}] = φ_1 γ(0),   so ρ_1 = φ_1
γ(2) = E(y_t y_{t−2}) = E[(φ_1^2 y_{t−2} + φ_1 ε_{t−1} + ε_t) y_{t−2}] = φ_1^2 γ(0),   so ρ_2 = φ_1^2
. . .
γ(k) = E(y_t y_{t−k}) = E[(φ_1^k y_{t−k} + . . .) y_{t−k}] = φ_1^k γ(0),   so ρ_k = φ_1^k.
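The closed form ρ_k = φ_1^k can be checked against the model-implied autocorrelations computed by software. A small sketch (Python with statsmodels; φ_1 = 0.7 is an assumed illustrative value):

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

phi = 0.7
proc = ArmaProcess(ar=[1, -phi], ma=[1])   # AR(1) with phi_1 = 0.7

print(proc.acf(lags=5))                    # model-implied autocorrelations rho_0, rho_1, ...
print(phi ** np.arange(5))                 # closed form rho_k = phi_1^k derived above
```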
3.2 AR(p) and the Yule-Walker Equations

Applying the latter method used for the AR(1) process, we can find autocovariances and autocorrelations for AR(p) processes; the resulting relations are called the Yule-Walker equations. We illustrate the method on an AR(3) model and then extrapolate to AR(p). An AR(3) process is defined by

y_t = φ_1 y_{t−1} + φ_2 y_{t−2} + φ_3 y_{t−3} + ε_t.

Multiplying both sides by y_t, y_{t−1}, . . . , taking expectations, and dividing by γ(0), we obtain

1 = φ_1 ρ_1 + φ_2 ρ_2 + φ_3 ρ_3 + σ^2 / γ(0)
ρ_1 = φ_1 + φ_2 ρ_1 + φ_3 ρ_2
ρ_2 = φ_1 ρ_1 + φ_2 + φ_3 ρ_1
ρ_3 = φ_1 ρ_2 + φ_2 ρ_1 + φ_3
. . .
ρ_k = φ_1 ρ_{k−1} + φ_2 ρ_{k−2} + φ_3 ρ_{k−3}.

Solving the second, third, and fourth equations for ρ_1, ρ_2, and ρ_3 is straightforward. The final equation then gives ρ_k in terms of the three preceding autocorrelations, so every ρ_k can be computed recursively. Finally, the first equation can be solved for the variance:

γ(0) = σ^2 / (1 − (φ_1 ρ_1 + φ_2 ρ_2 + φ_3 ρ_3)).
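These relations are easy to verify numerically. The sketch below (Python with NumPy; the AR(3) coefficients are assumed illustrative values, not estimates from the unemployment data) solves the three linear Yule-Walker equations for ρ_1, ρ_2, ρ_3 and then recovers γ(0):

```python
import numpy as np

# Illustrative AR(3) coefficients and innovation variance.
phi1, phi2, phi3 = 0.5, 0.2, 0.1
sigma2 = 1.0

# Linear system from
#   rho_1 = phi1       + phi2*rho_1 + phi3*rho_2
#   rho_2 = phi1*rho_1 + phi2       + phi3*rho_1
#   rho_3 = phi1*rho_2 + phi2*rho_1 + phi3
A = np.array([
    [1 - phi2,       -phi3, 0.0],
    [-(phi1 + phi3),  1.0,  0.0],
    [-phi2,          -phi1, 1.0],
])
b = np.array([phi1, phi2, phi3])
rho1, rho2, rho3 = np.linalg.solve(A, b)

# Higher-order autocorrelations follow rho_k = phi1*rho_{k-1} + phi2*rho_{k-2} + phi3*rho_{k-3};
# the first Yule-Walker equation then gives the variance.
gamma0 = sigma2 / (1 - (phi1 * rho1 + phi2 * rho2 + phi3 * rho3))
print(rho1, rho2, rho3, gamma0)
```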
3.3 Summary

MA(q) processes have q non-zero autocorrelations, with the rest equal to zero. AR(p) processes have up to p autocorrelations with no particular pattern, after which the autocorrelations decay as a damped exponential or sine wave. ARMA models therefore do a very nice job of capturing autocorrelation behavior, and characterizing the autocorrelation function provides a means of assessing a particular process.

4 State Space Representations and the Kalman Filter

The concept of stationarity is crucial because when a time series is non-stationary, the mean, variance, covariance, and correlation lose their meaning, and the usual identification and estimation methods are not applicable. A particularly useful approach to forecasting such series is the Kalman filter. Before discussing the Kalman filter applied to ARMA processes, we need to express ARMA models in state-space form. (The ARMA(3,3) process used in the application below is stationary for both the seasonally adjusted and the non-seasonally adjusted data.)

A state space model consists of two equations,

S_{t+1} = F S_t + G e_t   (4.1)
Y_t = H S_t + ε_t   (4.2)

where S_t is a state vector of dimension m, Y_t is an observed time series, F, G, H are parameter matrices, and {e_t} and {ε_t} are i.i.d. random vectors satisfying E(e_t) = 0, E(ε_t) = 0, Cov(e_t) = Q, Cov(ε_t) = R, with {e_t} and {ε_t} independent white noise processes. (We will specifically be considering Gaussian white noise, as required by the Kalman filter.) Practically, the state vector is an unobserved vector describing the "status" of the system; it may be regarded as containing the information necessary to predict future observations. For our purposes Y_t is a scalar and F, G, H are constant, although state space models in general allow for vector time series and time-varying parameters.

4.1 ARMA Models

The relation between state space models and ARMA models goes both ways: an ARMA model can be put into state-space form in infinitely many ways, and conversely, for any state space model of the form (4.1)-(4.2) there exists an ARMA model (Tsay, 2008). We now describe the former direction.

ARMA to state-space. Consider an ARMA(p, q) model

Y_t = \sum_{i=1}^{p} φ_i Y_{t−i} + a_t − \sum_{j=1}^{q} θ_j a_{t−j},

where a_t is a white noise process. Fix m = max{p, q} and define φ_i = 0 for i > p and θ_j = 0 for j > q. Then

Y_t = \sum_{i=1}^{m} φ_i Y_{t−i} + a_t − \sum_{i=1}^{m} θ_i a_{t−i}.

For simplicity we use this max{p, q} representation. If we let ψ(L) = θ(L)/φ(L), we obtain the ψ-weights of the model by equating coefficients of L^j in

(1 − θ_1 L − . . . − θ_m L^m) = (1 − φ_1 L − . . . − φ_m L^m)(ψ_0 + ψ_1 L + . . . + ψ_m L^m + . . .),

where ψ_0 = 1. Considering the coefficient of L^m in particular,

−θ_m = −φ_m ψ_0 − φ_{m−1} ψ_1 − . . . − φ_1 ψ_{m−1} + ψ_m,

hence

ψ_m = \sum_{i=1}^{m} φ_i ψ_{m−i} − θ_m.   (4.3)

From the ψ-weight representation we have

Y_{t+m−i} = a_{t+m−i} + ψ_1 a_{t+m−i−1} + ψ_2 a_{t+m−i−2} + . . .

Think of Y_{t+m−i|t} as the forecast of Y_{t+m−i} given information through time t. Then

Y_{t+m−i|t} = ψ_{m−i} a_t + ψ_{m−i+1} a_{t−1} + ψ_{m−i+2} a_{t−2} + . . .
Y_{t+m−i|t−1} = ψ_{m−i+1} a_{t−1} + ψ_{m−i+2} a_{t−2} + . . .

In other words,

Y_{t+m−i|t} = Y_{t+m−i|t−1} + ψ_{m−i} a_t,   for m − i > 0.   (4.4)

Now we can set up a state space model. Let S_t = (Y_{t|t−1}, Y_{t+1|t−1}, . . . , Y_{t+m−1|t−1})′. Using Y_t = Y_{t|t−1} + a_t, the observation equation is

Y_t = [1, 0, . . . , 0] S_t + a_t.
The state-transition equation is obtained from (4.3) and (4.4). For the first m − 1 elements of S_{t+1} we can use (4.4) directly. For the last element of S_{t+1} we have

Y_{t+m|t} = \sum_{i=1}^{m} φ_i Y_{t+m−i|t} − θ_m a_t.

By (4.4),

Y_{t+m|t} = \sum_{i=1}^{m} φ_i (Y_{t+m−i|t−1} + ψ_{m−i} a_t) − θ_m a_t
          = \sum_{i=1}^{m} φ_i Y_{t+m−i|t−1} + (\sum_{i=1}^{m} φ_i ψ_{m−i} − θ_m) a_t
          = \sum_{i=1}^{m} φ_i Y_{t+m−i|t−1} + ψ_m a_t,

where the final equality uses (4.3). The state-transition equation is therefore

S_{t+1} = F S_t + G a_t,

with

F = [ 0     1        0    · · ·  0
      0     0        1    · · ·  0
      ⋮                          ⋮
      φ_m   φ_{m−1}  · · ·  φ_2  φ_1 ],
G = (ψ_1, ψ_2, ψ_3, . . . , ψ_m)′.

This is only one of three methods by which an ARMA model can be represented in state-space form; two others are due to Akaike and Aoki, but the method above is the one most commonly used in modern literature.
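The construction of F and G can be automated once the ψ-weights are available. A minimal sketch (Python with NumPy and statsmodels; the ARMA(3,3) coefficients are assumed placeholder values, not the coefficients estimated later in the paper):

```python
import numpy as np
from statsmodels.tsa.arima_process import arma2ma

# Placeholder ARMA(3,3) coefficients for illustration.
phi   = np.array([0.5, 0.2, 0.1])
theta = np.array([0.3, 0.1, 0.05])
m = max(len(phi), len(theta))

# psi-weights of theta(L)/phi(L). arma2ma expects full lag polynomials; under the paper's
# convention Y_t = sum(phi_i Y_{t-i}) + a_t - sum(theta_j a_{t-j}), the MA polynomial is 1 - theta_1 L - ...
psi = arma2ma(np.r_[1.0, -phi], np.r_[1.0, -theta], lags=m + 1)   # psi_0, ..., psi_m

# Companion-form transition matrix F and loading vector G from Section 4.1.
F = np.zeros((m, m))
F[:-1, 1:] = np.eye(m - 1)     # shift rows: element i of S_{t+1} picks up element i+1 of S_t
F[-1, :] = phi[::-1]           # last row: phi_m, ..., phi_1
G = psi[1:m + 1]               # psi_1, ..., psi_m

print(F)
print(G)
```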
4.2 Kalman Filter

The Kalman filter is a set of recursive equations that provides a relatively simple means of both predicting and correcting information in a state space model. It sequentially decomposes each observation into a conditional mean and a predictive residual, and it is widely used in engineering as well as in statistical analysis.

The derivation of the Kalman recursions is simplest under normality assumptions; however, the recursion follows from the least squares principle, not from normality, so it also holds in non-normal cases. The difference is that without normality the solution is optimal only within the class of linear solutions, whereas under normality it is optimal among all solutions, linear and nonlinear. Under normality, a normal prior combined with a normal likelihood yields a normal posterior, and if a random vector (X, Y) is jointly normal,

(X, Y)′ ∼ N( (µ_x, µ_y)′, [ Σ_xx  Σ_xy ; Σ_yx  Σ_yy ] ),

then the conditional distribution of X given Y = y is normal as well:

X | Y = y ∼ N( µ_x + Σ_xy Σ_yy^{−1} (y − µ_y),  Σ_xx − Σ_xy Σ_yy^{−1} Σ_yx ).

With this information we can derive the Kalman filter. Let P_{t+j|t} be the conditional covariance matrix of S_{t+j} given {Y_t, Y_{t−1}, . . .} for j ≥ 0, and let S_{t+j|t} be the conditional mean of S_{t+j} given the same information. From the state space model we have

S_{t+1|t} = F S_{t|t}   (4.5)
Y_{t+1|t} = H S_{t+1|t}   (4.6)
P_{t+1|t} = F P_{t|t} F′ + G Q G′   (4.7)
V_{t+1|t} = H P_{t+1|t} H′ + R   (4.8)
C_{t+1|t} = H P_{t+1|t}   (4.9)

where V_{t+1|t} is the conditional variance of Y_{t+1} given {Y_t, Y_{t−1}, . . .} and C_{t+1|t} is the conditional covariance between Y_{t+1} and S_{t+1}. The joint conditional distribution of S_{t+1} and Y_{t+1} is

(S_{t+1}, Y_{t+1})′ ∼ N( (S_{t+1|t}, Y_{t+1|t})′, [ P_{t+1|t}  P_{t+1|t} H′ ; H P_{t+1|t}  H P_{t+1|t} H′ + R ] ).

When Y_{t+1} becomes available, normality is used to update the distribution of S_{t+1}:

S_{t+1|t+1} = S_{t+1|t} + P_{t+1|t} H′ [H P_{t+1|t} H′ + R]^{−1} (Y_{t+1} − Y_{t+1|t})   (4.10)
P_{t+1|t+1} = P_{t+1|t} − P_{t+1|t} H′ [H P_{t+1|t} H′ + R]^{−1} H P_{t+1|t}.   (4.11)

Here

r_{t+1|t} = Y_{t+1} − Y_{t+1|t} = Y_{t+1} − H S_{t+1|t}

is the predictive residual for time t + 1. The update equation (4.10) says that when r_{t+1|t} is non-zero there is new information about the system, so the state vector should be modified; the contribution of r_{t+1|t} to the state vector is weighted by the variance of r_{t+1|t} and the conditional covariance matrix of S_{t+1}.

In practice the Kalman filter is initialized with prior information S_{0|0} and P_{0|0}. From (4.5) and (4.6) we obtain the predictions S_{1|0} and Y_{1|0}; from (4.7) we obtain P_{1|0}, which gives V_{1|0} and C_{1|0} via (4.8) and (4.9). When Y_1 is observed, the residual r_{1|0} = Y_1 − Y_{1|0} is used to update the state vector, yielding S_{1|1} and P_{1|1}, which become the prior information for the next observation. This is a single iteration of the Kalman filter.

In summary, the Kalman filter consists of a set of prediction equations, (4.5)-(4.8), and a set of updating equations, (4.10) and (4.11). The effect of the initial values S_{0|0} and P_{0|0} decreases as t increases: for a stationary time series the eigenvalues of the coefficient matrix F are less than one in absolute value, so the recursion steadily damps out the influence of the initial values.
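The recursions translate almost line by line into code. The following is a minimal sketch (Python with NumPy) of equations (4.5)-(4.11) for a scalar observed series, together with a small synthetic usage example; all numerical values are assumptions for illustration only:

```python
import numpy as np

def kalman_filter(y, F, G, H, Q, R, S0, P0):
    """Run the prediction/update recursions (4.5)-(4.11) on a scalar observed series y.
    F: (m, m), G: (m, 1), H: (1, m); Q, R: scalar noise variances; S0: (m, 1); P0: (m, m)."""
    S, P = S0.astype(float), P0.astype(float)
    y_pred, resid = [], []
    for yt in y:
        # Prediction step, equations (4.5)-(4.8)
        S = F @ S
        P = F @ P @ F.T + (G * Q) @ G.T
        yhat = float(H @ S)
        V = float(H @ P @ H.T) + R
        # Update step, equations (4.10)-(4.11), driven by the predictive residual
        r = yt - yhat
        K = (P @ H.T) / V              # weight applied to the residual
        S = S + K * r
        P = P - K @ (H @ P)
        y_pred.append(yhat)
        resid.append(r)
    return np.array(y_pred), np.array(resid)

# Toy usage: a one-dimensional state observed with measurement noise.
rng = np.random.default_rng(1)
F = np.array([[0.7]]); G = np.array([[1.0]]); H = np.array([[1.0]])
s, ys = 0.0, []
for _ in range(200):
    s = 0.7 * s + rng.normal()
    ys.append(s + 0.5 * rng.normal())
pred, resid = kalman_filter(np.array(ys), F, G, H, Q=1.0, R=0.25,
                            S0=np.zeros((1, 1)), P0=np.eye(1))
print(pred[:5], resid[:5])
```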
5 Analytical Summary

The use of ARMA processes to model and forecast data has many applications in business and economics. Section 2 discussed the basic composition of ARMA models, their structure, and some of their underlying properties. Writing time series models with lag operators makes it possible to identify non-stationarity in a process, as discussed in section 3; stationary processes are much more valuable for forecasting than non-stationary processes, in which trends are far more evident. Furthermore, the autocovariance and autocorrelation functions characterize these processes and so provide a means of selecting a particular model. Time series models can also be converted into an equivalent state-space representation, which allows filters to be applied to the process in question; in particular, we emphasize the Kalman filter as a forecasting tool.

6 Application

Two data sets will be considered for forecasting purposes: seasonally adjusted and non-seasonally adjusted unemployment data. The goal is to identify whether a seasonal adjustment is necessary for forecasting with a traditional ARMA model, or whether the Kalman filter is a more effective means of forecasting the non-seasonally adjusted data.

The data consist of the civilian unemployment rate from the U.S. Department of Labor, Bureau of Labor Statistics, released monthly from January 1948 to February 2012 (source: http://research.stlouisfed.org/fred2/data/UNRATE.txt). The unemployment rate is the number of unemployed as a percentage of the labor force. Labor force data are restricted to people 16 years of age and older who reside in one of the fifty states or the District of Columbia, who do not reside in institutions (e.g., penal and mental facilities, homes for the aged), and who are not on active duty in the Armed Forces.
6.1 Methodology

Two unemployment data sets will be used. The first consists of seasonally adjusted data. An ARMA(3,3) process, selected on the basis of minimal AIC and SIC values, will be used to model the data, and the behavior of the residuals will be examined over the period 1948 to 2012. We initiate predictive forecasts for 1996-1997 (a period with a relatively stable unemployment rate), 2008-2009 (a sharp increase due to recession), and 2011-2012 (a recovery from recession) as baseline measurements of how well the forecast model compares to the true data, and then produce a forecast for 2012-2013.

The second data set consists of non-seasonally adjusted data. We apply an ARMA(3,3) model as above, apply the Kalman filter to this model, and examine the behavior of the residuals over the same period, 1948 to 2012. The same forecasting procedure is then applied. Comparing the seasonally adjusted (X-12-ARIMA) data modeled with an ARMA(3,3) process against the non-seasonally adjusted, Kalman-filtered data provides insight into the consistency and reliability of the two approaches.

The statistical software package eViews was used to determine coefficient values for the ARMA(3,3) model and the state-space model. eViews can apply the Kalman filter and produce predictive forecasts based on the information available up to the forecast initiation. The optimal forecast method is chosen by comparing means over the forecast interval: if the least squares ARMA(3,3) approximation is optimal, the difference between the mean of the true data and the mean of its forecast should be smaller in absolute value than the corresponding difference for the Kalman-filtered forecast. In symbols,

M_opt = min | (1/12) \sum_{i=t}^{T} (y_i − y_i^*) |.

This is more practical than comparing residuals, since the seasonally adjusted data would naturally have smaller residuals than the non-seasonally adjusted data.
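For reference, the comparison statistic is simply the absolute gap between the mean of the realized values and the mean of the forecast over the 12-month window. A small helper (Python; the function name is hypothetical):

```python
import numpy as np

def mean_gap(y_true, y_forecast):
    """Absolute difference between the mean of the realized values and the mean of the
    forecast over the comparison window (the statistic used to compare methods above)."""
    y_true = np.asarray(y_true, dtype=float)
    y_forecast = np.asarray(y_forecast, dtype=float)
    return float(abs(np.mean(y_true - y_forecast)))
```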
6.2 Seasonally Adjusted and Non-Seasonally Adjusted Data

[Figures: the seasonally adjusted and non-seasonally adjusted civilian unemployment rate series, 1948-2012.]

6.3 Forecast for 1996 (Seasonally Adjusted) using ARMA(3,3)

Applying an ARMA(3,3) model (chosen on the basis of its AIC and SIC values) to the seasonally adjusted data produces the following results. It is important to note that eViews requires a backcast of the MA terms in order to model the data appropriately; in other words, the order of the MA process affects the fit, and this data set was backcast three months to model the behavior of the error terms appropriately. Two values of particular interest are the Akaike information criterion (AIC) and the Schwarz information criterion (SIC), which are quite low compared with those for the non-seasonally adjusted data, as we will see next.

The forecast model based on a least squares approximation (dynamically fitted from the coefficient values) is shown below. For this data we find that (1/12) \sum_{i=t}^{T} (y_i − y_i^*) = 0.154426.
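The same exercise can be reproduced outside eViews. The sketch below (Python with statsmodels; `unrate_sa` is an assumed pandas Series holding the seasonally adjusted monthly rate, e.g. FRED series UNRATE, and the sample split mirrors the 1996 forecast) fits an ARMA(3,3) with a constant and produces a 12-month forecast with confidence bands, plus the mean-gap statistic used above:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# unrate_sa: monthly seasonally adjusted unemployment rate as a pandas Series (assumed loaded).
train = unrate_sa[:"1995-12"]            # data available at the forecast origin
test  = unrate_sa["1996-01":"1996-12"]   # realized values for comparison

res = ARIMA(train, order=(3, 0, 3)).fit()   # ARMA(3,3) with a constant mean

fc = res.get_forecast(steps=12)
print(fc.predicted_mean)                    # point forecast for 1996
print(fc.conf_int())                        # confidence bands
print(abs(np.mean(test.values - fc.predicted_mean.values)))   # mean-gap statistic
```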
6.4 Forecast for 1996 (Non-Seasonally Adjusted) using Kalman Filter

A particularly nice property of eViews is its built-in function for converting an ARMA(p, q) model into state-space form; that is, eViews uses the procedure described in section 4.1. The benefit of this approach is that the process is entirely automated, although there are controls one can use to obtain the appropriate state-space form, specifically for the covariance structure in both the signal and state equations. Shown below the state-space form is the Kalman-filtered data with its structure and statistics. Notice that the AIC and SIC values for each model are higher for the non-seasonally adjusted data than for the seasonally adjusted data; the reason is the greater volatility in the data.
eViews offers a variety of forecasting methods that can be applied to the Kalman-filtered data, in particular dynamic forecasting. Dynamic forecasting allows for more predictive flexibility further into the forecast: forecast values at each point in the forecast period are built from the forecasts at the preceding points. Notice that both the seasonally adjusted and the non-seasonally adjusted models performed quite well over a predictive horizon of one year. We also find that (1/12) \sum_{i=t}^{T} (y_i − y_i^*) = 0.251177. For the 1996 forecast, then, the ARMA(3,3) least squares estimate appears to be the better forecast.
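As an aside, statsmodels estimates ARMA models in state-space form and runs the Kalman filter internally, so a comparable dynamic forecast can be sketched as follows (Python; `unrate_nsa` is an assumed pandas Series of the non-seasonally adjusted monthly rate, and the dates mirror the 1996 exercise):

```python
from statsmodels.tsa.arima.model import ARIMA

# unrate_nsa: monthly non-seasonally adjusted unemployment rate (assumed loaded),
# truncated at December 1996 so that the dynamic window covers 1996.
res = ARIMA(unrate_nsa[:"1996-12"], order=(3, 0, 3)).fit()

# With dynamic=True, prediction from the start date onward feeds earlier forecasts
# (rather than realized values) back into the model, as in the dynamic forecasts above.
pred = res.get_prediction(start="1996-01", end="1996-12", dynamic=True)
print(pred.predicted_mean)
print(pred.conf_int())
```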
6.5 Forecast for 2008 (Seasonally Adjusted) using ARMA(3,3)

The previous forecast, for 1996-1997, covered a relatively stable period for unemployment. Now we look at data up to 2008 and create a forecast for 2008-2009, the beginning of the most recent recession.

Note that the spike at the end of the plotted data is a result of the program knowing what the true data actually look like: the forecast model in the program corrects itself by a significant margin once the forecast period has ended. This is a flaw in the program itself, not in the forecast or the method applied. One of the benefits of producing a forecast with confidence bands is that it accounts for data fluctuations and for periods in which there may be a significant increase or decrease in the data. We find that (1/12) \sum_{i=t}^{T} (y_i − y_i^*) = 0.609830.
6.6 Forecast for 2008 (Non-Seasonally Adjusted) using Kalman Filter

Here we take a Kalman filter approach to forecasting using the non-seasonally adjusted data. Listed below are the state-space form of the ARMA(3,3) model and the Kalman filter applied to the model.
From a visual standpoint, the ARMA(3,3) least squares estimate does a much better job of forecasting the data. Specifically, for the Kalman-filtered forecast we have (1/12) \sum_{i=t}^{T} (y_i − y_i^*) = 1.11242, so the ARMA(3,3) least squares forecast is clearly the better choice in this case.

6.7 Forecast for 2011 (Seasonally Adjusted) using ARMA(3,3)

This data set is known to have a slight drop-off from the spike experienced during the 2008-2009 recession. Based on the level of volatility in the model up to this point, we might expect a significant deviation of the forecast from the true data; however, this is not the case.
Accordingly, we find that (1/12) \sum_{i=t}^{T} (y_i − y_i^*) = 0.343110. Qualitatively, this is not a terrible forecast, as the difference between the true mean and the forecast mean was less than half a percentage point over a full year.

6.8 Forecast for 2011 (Non-Seasonally Adjusted) using Kalman Filter

Looking at the data between 2010 and 2012, there appears to be a higher level of volatility in the non-seasonally adjusted series. Fortunately, the Kalman filter does a very good job of transforming the data and in fact produces a very good forecast for the following year, as we will see below.
The seasonal trend is quite evident in this graph: the first quarter of every year shows a significant increase in unemployment, while the fourth quarter shows a significant decrease. Observing the Kalman filter forecast, the forecasted value at the end of the forecast period deviates from the true value by only a small percentage. Accordingly, (1/12) \sum_{i=t}^{T} (y_i − y_i^*) = 0.090709, and the optimal model in this instance is the Kalman filter.

6.9 Forecast for 2012 (Seasonally Adjusted) using ARMA(3,3)

Here is the final case, in which we present a forecast for data that we do not yet have. We start with the ARMA(3,3) case.
According to this estimate, unemployment is expected to decrease to around 7.8% by the end of December 2012. This particular forecast was initiated in January 2012; although the numbers for February 2012 are available, the goal was to create a forecast for the entire fiscal year beginning in January and ending in December. The mean of the forecast is

ȳ* = (1/12) \sum_{i=t}^{T} y_i^* = 8.06.

6.10 Forecast for 2012 (Non-Seasonally Adjusted) using Kalman Filter

All of the previous models had the luxury of both a forecast and the true data associated with it. This particular forecast is unique in that all we have is the forecast and no data.
[Figure: Kalman filter forecast of the non-seasonally adjusted unemployment rate for 2012, with 95% confidence bands.]
The Kalman-filtered forecast shown above has a very large 95% confidence band: the forecast is telling us that unemployment could fluctuate anywhere between 5.5% and 10.75%, the latter of which would be the highest level of unemployment on record. This forecast has mean

ȳ* = (1/12) \sum_{i=t}^{T} y_i^* = 8.36.

7 Results

The ARMA(3,3) least squares approximation for forecasting was optimal in two of the three cases considered. Furthermore, its confidence bands were more tightly bound to the forecast, which has the advantage of providing a high level of certainty within a smaller range and is generally regarded as intrinsically better. Seasonally adjusted data therefore appear to be better for forecasting, which is what we would expect: less variation in the overall structure of the data implies smaller residuals, which in turn means a higher probability that the forecast model closely resembles the true data.

The Kalman filter should not be completely disregarded as a forecasting method. In the 2011 forecast, the difference between the mean of the true data and the mean of the forecasted data was less than a tenth of a percentage point. Qualitatively, de-seasoning the data appears to be much more effective for forecasting; however, the true non-seasonally adjusted data is the raw series. Forecasting seasonally adjusted data and then applying a correction term to forecast the non-seasonally adjusted data more accurately may be of research interest in the future.

8 Acknowledgements

I would like to thank my committee chair, Dr. Bozenna Pasik-Duncan, for her work and constant support over the past three years during my time here. I have always treasured our conversations and discussions on education and the role that mathematics education serves. It was her guidance that allowed me to develop a proper motivation for this particular project.
I would also like to thank my committee, Dr. Tyrone Duncan, Dr. Zsolt Talata, and Dr. Xuemin Tu, for contributions that led me to question every aspect of this project. Professor Duncan's class in mathematical finance gave me an appreciation for applied mathematics that I had not experienced before. Professor Talata's applied regression analysis course developed the basis for my understanding of linear regressions and ultimately of time series models. Professor Tu's discussion of filtering allowed me to consider other potential models that may serve as a basis for further research.

I would also like to thank my office mate, Cody Clifton, for carefully reading through this paper and providing feedback for improvement. A very big "thank you" goes out to Nathan Welch for discussing the topics with me and providing insight on the computational aspects of the project.
References

[1] Basu, S. and Reinsel, G. C. (1996), "Relationship Between Missing Data Likelihoods and Complete Data Restricted Likelihoods for Regression Time Series Models: An Application to Total Ozone Data", Journal of the Royal Statistical Society, Series C (Applied Statistics) 45(1), 63-72.
[2] Box, G. E. P., Jenkins, G. M., and Reinsel, G. C. (1994), Time Series Analysis: Forecasting and Control (3rd Edition), Prentice-Hall Inc.
[3] Brockwell, P. J. and Davis, R. A. (1987), Time Series: Theory and Methods, Springer-Verlag New York Inc.
[4] Cochrane, J. H. (1997), Time Series for Macroeconomics and Finance, University of Chicago, Chicago, IL.
[5] De Jong, P. (1988), "The Likelihood for a State Space Model", Biometrika 75(1), 165-169.
[6] De Jong, P. (1991), "The Diffuse Kalman Filter", Annals of Statistics 19(2), 1073-1083.
[7] De Jong, P. and Penzer, J. R. (1998), "Diagnosing Shocks in Time Series", Journal of the American Statistical Association 93(442), 796-806.
[8] De Jong, P. and Penzer, J. R. (2004), "The ARMA Model in State Space Form", Statistics and Probability Letters 70(1), 119-125.
[9] Diebold, F. X. (2007), Elements of Forecasting (4th Edition), Thomson South-Western.
[10] Hamilton, J. D. (1994), Time Series Analysis, Princeton: Princeton University Press.
[11] Harvey, A. C. (1993), Time Series Models (2nd Edition), London: Harvester Wheatsheaf.
[12] Harvey, A. C. and Phillips, G. D. A. (1979), "Maximum Likelihood Estimation of Regression Models with Autoregressive-Moving Average Disturbances", Biometrika 66(1), 49-58.
[13] Johnston, J. (1984), Econometric Methods (3rd Edition), Singapore: McGraw-Hill.
[14] Jones, R. H. (1980), "Maximum Likelihood Fitting of ARMA Models to Time Series with Missing Observations", Technometrics 22(3), 389-395.
[15] Kohn, R. and Ansley, C. F. (1986), "Estimation, Prediction, and Interpolation for ARIMA Models with Missing Data", Journal of the American Statistical Association 81(395), 751-761.
[16] Pearlman, J. G. (1980), "An Algorithm for the Exact Likelihood of a High-Order Autoregressive-Moving Average Process", Biometrika 67(1), 232-233.
[17] Sargent, T. J. (1979), Macroeconomic Theory, Academic Press.
[18] Tsay, R. S. (2008), Time Series Analysis for Forecasting and Model Building, University of Chicago, Booth School of Business.