Time Series Analysis III

Advanced Topics

Table of Contents

ARMAX: ARMA Analysis for Regression Residuals

Auto-Regressive Conditional Heteroscedasticity

Multi-Equation Time Series Models

State-Space Models

Readings


ARMAX: ARMA Analysis for Regression Residuals

Yt = Xtβ + εt
εt = ρ1εt-1 + ρ2εt-2 + ... + ρpεt-p - θ1ut-1 - θ2ut-2 - ... - θqut-q + ut

or

Yt = Xtβ + ρ(B)-1θ(B)ut

or

ρ(B)Yt = ρ(B)Xtβ + θ(B)ut where ut ~ nii(0,σ2).

AR(1) Process

εt = ρ εt-1 + ut

We assume |ρ| < 1 for model stability. It is clear that

σ2 = Var(ut) = (1-ρ2) Var(εt).

Denote the variable transformations Yt* = Yt - ρ Yt-1 and Xt* = Xt - ρ Xt-1. Since u1 = (1-ρ2)½ ε1, the otherwise lost first observation is kept with the transformations Y1* = (1-ρ2)½Y1 and X1* = (1-ρ2)½X1.

Thus model for estimation is

ut = Yt* - Xt*β

with the following Jacobian transformation from ut to Yt (depending on ρ only):

Jt(ρ) = |∂ut / ∂Yt| = (1-ρ2)½ for t=1
1 for t>1

Therefore, the (exact) concentrated log-likelihood function is:

ll*(β,ρ|Y,X) = -½N (1+ln(2π)-ln(N)) +½ ln(1-ρ2) -½N ln(u'u)

Extension: AR(2)

The model is defined as εt = ρ1εt-1 + ρ2εt-2 + ut with the following proper data transformation (Z is referenced as either X or Y below):

Pre-Sample Data Initialization

The alternative pre-sample data initialization may be used to transform the time series:

The resulting maximum likelihood estimation is conditional to the pre-sample data initialization.

MA(1) Process

εt = ut - θut-1

Again, we assume |θ| < 1 for model stability. The model is

ut = Yt - Xtβ + θut-1

Notice that the one-period lag of error terms, ut-1, is used to define the model error ut. A recursive calculation is needed with proper initialization of u0. For example, set the initial value u0 = E(ut) = 0 (or alternatively the sample mean of ut), then u1 = Y1-X1β and ut = Yt-Xtβ + θut-1 for t=2,...,N.

Since each log-jacobian terms vanish in this case, the (conditional) concentrated log-likelihood function is simply

ll*(β,θ|Y,X) = -½N (1+ln(2π)-ln(N)) -½N ln(u'u)

ARMA(1,1) Process

εt = ρ εt-1 + ut - θ ut-1

This is the mixed process of AR(1) and MA(1). Using the variable transformations as of AR(1) and data initialization as of MA(1), the model is written as

ut = Yt* - Xt*β + θ ut-1

and the (conditional) concentrated log-likelihood function for parameter estimation is

ll*(β,ρ,θ|Y,X) = -½N (1+ln(2π)-ln(N)) +½ ln(1-ρ2) -½N ln(u'u)

Two-Variable Transfer Function Model

ρ(B)Yt = δ + β(B)Xt + θ(B)εt

where β(B) = β0 + β1B + β2B2 + ... + βKBK. Model analysis including model identification, estimation, and forecasting is the same as (although more complicate than) the univariate ARMA analysis. Regression parameters βs and ARMA parameters ρs and θs must be simultaneously estimated through iterations of nonlinear functional (sum-of-squares or log-likelihood) optimization. For statistical reference, the degrees of freedom must be adjusted.


Auto-Regressive Conditional Heteroscedasticity

In many financial and monetary economic applications, serial correlations over time are characterized not only in the means but also in the variances. The latter is the so-called Auto-Regressive Conditional Heteroscedasticy or ARCH models. It is possible that the variance is unconditionally homogenous.

The Model

Consider the time series regression model:

Yt = Xtβ + εt

At time t, conditional to the available historical information Ht, we assume that the error structure follows a normal distribution:

εt|Ht ~ n(0,σ2t)

where σ2t = α0 + δ1σ2t-1 + ... + δpσ2t-p + α1ε2t-1 + ... + αqε2t-q
= α0 + ∑i=1,2,...pδiσ2t-i + ∑j=1,2,...qαjε2t-j

Let υt = ε2t2t, αi = 0 for i > q, δj = 0 for j > p, and m = max(p,q), the above GARCH(p,q) process may be conveniently re-written as an ARMA(m,p) model for ε2t. That is,

ε2t = α0 + ∑i=1,2,...mii2t-i - ∑j=1,2,...pδjυt-j + υt

This is the general specification of auto-regressive conditional heteroscedasticity, or GARCH(p,q), according to Bollerslev [1986]. If p = 0, then it is the GARCH(0,q) or simply ARCH(q) process:

σ2t = α0 + ∑j=1,2,...qαjε2t-j

ARCH(1) Process

The simplest case is q = 1, or ARCH(1), originated in Engle [1982] as follows:

σ2t = α0 + α1ε2t-1

ARCH(1) model can be summarized as follows:

Yt = Xtβ + εt
εt = ut0 + α1ε2t-1)½   where ut ~ nii(0,1)

Then, the conditional means E(εtt-1) = 0 and the conditional variances σ2t = E(ε2tt-1) = α0 + α1ε2t-1

Note that the unconditional variance of εt is

E(ε2t) = E(E(ε2tt-1)) = α0 + α1E(ε2t-1).

If σ2 = E(ε2t) = E(ε2t-1), then σ2 = α0/(1-α1) provided that |α1| < 1. Therefore, the model may be free of general heteroscedasticity although the conditional heteroscedasticity is assumed.

The ARCH(1) process can be generalized (therefore the name Generalized Auto-Regressive Conditional Heteroscedasticity) to:

GARCH(1,1) Process

σ2t = α0 + α1 ε2t-1 + δ1 σ2t-1

This resembles the mixed auto-regressive moving-average process ARMA(1,1) as described in autocorrelation. Presample variances and squared error terms can be initialized with ∑t=1,2,...,N ε2t/N. The following parameter restrictions are necessary to preserve stationaity of the error process:

Another extension is ARCH or GARCH in mean (ARCH-M or GARCH-M model) which adds the heteroscedastic variance term directly into the regression equation (assuming linear model):

ARCH-M(1) or GARCH-M(1,1) Model

εt = Yt - Xtβ - γσ2t

σ2t = α0 + α1 ε2t-1 (or σ2t = α0 + α1 ε2t-1 + δ1 σ2t-1)

The last variance term of the regression may be expressed in log form or in standard error σt. For example, Yt = Xtβ + γln2t) + εt. Moreover, constraints on the parameters in the conditional variance equation may be required to ensure the positivity of variances: α0 > 0, 0 ≤ α1 < 1 (or α1 + δ1 < 1, δ1 ≥ 0).

Model Identification for ARCH and GARCH Processes

Model Estimation

Recall the normal log-likelihood of a heteroscedastic regression model:

ll = -½N ln(2π) - ½ ∑t=1,2,...,Nln2t) - ½ ∑t=1,2,...,N2t / σ2t)

with the general conditional heteroscedastic variance GARCH(p,q) process:

σ2t = α0 + α1ε2t-1 + α2ε2t-2 + ... + αqε2t-q + δ1σ2t-1 + δ2σ2t-2 + ... + δpσ2t-p

The parameter vector (α δ) is estimated together with the regression parameters (e.g., εt = Yt - Xtβ) by maximizing the log-likelihood, conditional to the starting values ε02, ε2-1, ..., ε2-q, σ20, σ2-1, ..., σ2-p and satisfying the nonnegativity requirement for the estimated variances: σ2t > 0, t=1,2,...,N.

We note that the presample series: ε02, ε2-1, ..., ε2-q, σ20, σ2-1, ..., σ2-p may be initialized by the estimated (homoschedastic) unconditional variance:

1 / [1 - (∑i=1,2,...,qαi + ∑j=1,2,...,pδj)]

or by the estimated sample variance of residuals:

t=1,2,...,Nε2t/N,

GARCH(1,1) Models Based on Non-Normal Distributions

Consider the standard GARCH(1,1) model represented by:

Yt = Xtβ + εt, εt = σtut
σt2 = α0 + α1εt-12 + δ1σt-12

Generalized Exponential Distribution (GED) (Nelson, 1991)

ut ~ GED(v) with zero mean and unit variance, v is the thickness of tails for the underlying GED. If v > 2 the distribution has thinner tails than normal. If v < 2 the distribution has thicker tails than normal.

The p.d.f of GED(v) is written as:

f(ut) =
v

λ
exp[-½|ut/λ|v]

2(1+1/v)Γ(1/v)

where λ =
  Γ(1/2)1/2
|
|
22/v Γ(3/v)

Therefore,

f(εt) = f(Yt) =
v

λ
exp[-½|(Yt-Xtβ)/(λσt)|v]

    2(1+1/v)Γ(1/v)σt

The component log-likelihood function for each observation is:

llt = ln(v/λ) - (1+1/v)ln(2) - ln(Γ(1/v))
- ½|(Yt-Xtβ)/(λσt)|v - ½lnt2)

Student t-Distribution (Bollerslev, 1987)

ut ~ t(d), d > 2 is the degree of freedom of the underlying Student t distribution. The p.d.f of Student t distribution (normalized with zero mean and unit variance) is written as:

f(ut) = C
|
1 +
ut2-(d+1)/2

|
d-2

Therefore,

f(εt) = f(Yt) =
C

σt
|
1 +
1

d-2
|
Yt-Xtβ

σt
2
|
-(d+1)/2
|

where

C =
Γ((d+1)/2)

Γ(d/2)[(d-2)π]1/2

The component log-likelihood function for each observation is:

llt = ln(C) - ½lnt2) - ½(d+1)ln[1+(1/(d-2))((Yt-Xtβ)/σt))2]

where ln(C) = ln(Γ((d+1)/2)) - ln(Γ(d/2)) - ½ln(d-2) - ½ln(π)

Skewed Student t-Distribution (Hansen, 1994)

ut ~ t(d,s), d > 2 is the degree of freedom and -1< s < 1 is the skewedness of the underlying Skewed Student t-distribution. It specializes to the Student t-distribution by setting s = 0.

The p.d.f of Skewed Student t distribution (normalized with zero mean and unit variance) is written as:

f(ut) =
BC
| 1 +
1

|
d-2
A+But2

|
1+s-2s*It
-(d+1)/2
|

where

It = 1 if A+But < 0
0 if A+But ≥ 0

or, equivalently

vt = 1+s-2sIt = 1-s if A+But < 0
1+s if A+But ≥ 0

Therefore,

f(εt) = f(Yt) =
(BC/σt)
| 1 +
1

|
d-2
A+B((Yt-Xtβ)/σt)2

|
1+s-2sIt
-(d+1)/2
|

where

It = 1 if A+B((Yt-Xtβ)/σt) < 0
0 if A+B((Yt-Xtβ)/σt) ≥ 0

C =
   Γ((d+1)/2)

Γ(d/2)[(d-2)π]1/2
   A = 4sC[(d-2)/(d-1)]    B2 = 1 + 3s2 - A2

The component log-likelihood function for each observation is:

llt = ½ln(1+3s2-A2) + ln(C) - ½lnt2)
- ½(d+1) ln{1+(1/(d-2))[(A+B((Yt-Xtβ)/σt))/(1+s-2sIt)]2}

GARCH(1,1) Models with Asymmetry Behavior (Leverage Effect)

There are many evidences in the financial markets that a negative surprise (change in asset returns) tends to increase volatility (variance or risk) more than positive surprise. Therefore, not only the size of the return but also the sign (negative or positive) are important in describing the characteristics of the variance of the asset returns. Consider the following simple model:

Yt = Xtβ + εt
εt = σtut

GJR Specification (Glosten-Jagannathan-Runkle, 1993)

σt2 = α0 + α1εt-12 + δ1σt-12 + γ1t-12Dt-1)

where Dt-1 = 1     if εt-1 > 0
0     otherwise

The parameter γ1 < 0 is sometimes referred as the Leverage Effect. The non-negativity of σt2 is satisfied provided that α0 > 0, δ1 ≥ 0 α11 ≥ 0.

The asymmetric consquences of positive and negative innovations in the GARCH models can be studied based on various distributional assumptions (e.g., normal, t, GED) as described above.

EGARCH Specification (Nelson, 1991)

lnt2) = α0 + δ1lnt-12) + α11ut-1 + (|ut-1| - E|ut-1|)]

where ut = εtt is independently distributed with zero mean and unit variance. The parameter of ut-1, or α1γ1 < 0, is interpreted as the Leverage Effect. We note that the parameter of |ut-1| or α1 > 0 measures the symmetric effect while α1γ1 is the Leverage. The advantage of the Nelson's specification of the variance equation is that log of σt2 is used, then the estimated σt2 is positive no matter what is the sign of the estimated parameters.

Nelson's EGARCH(1,1) model assumes ut ~ GED(v) in which E(ut) = 0 and Var(ut) = 1. Furthermore,

E(|ut|) =
λ 21/v Γ(2/v)

   Γ(1/v)
-> (2/π)1/2 as v -> 2 (normal distribution)

We note that λ =
  Γ(1/2)1/2
|
|
22/v Γ(3/v)
and the parameter v measures the thickness of the underlying GED distribution.

Aternatively, normal EGARCH(1,1) assumes ut ~ Normal(0,1). Then the conditional variance equation is simply

lnt2) = α0 + δ1lnt-12) + α11ut-1 + |ut-1| - (2/π)1/2]

Multi-Equation Time Series Models

Vector Autocorrelation Model

Generalizing from the univariate time series AR(1) model:

Yt = μ + ρYt-1 + εt

the mutivariate system of G variables can be written as follows:

Yit = μi + ∑j=1,2,...,G ρijYj,t-1 + εit (i=1,2,...,G)

This is called Vector Autocorrelation of order 1, or VAR(1). The matrix representation of the model as a simultaneous linear equations system looks like this:

[Y1t,Y2t,...,YGt] = [μ12,...,μG] + [Y1,t-1,Y2,t-1,...,YG,t-1]
|
|
ρ11ρ21..ρG1
ρ12ρ22..ρG2
::::
ρ1Gρ2G..ρGG
|
|
+ [ε12,...,εG]
The alternative is the stacked form suitable for estimation as a system of regression equations:

|
|
Y1t
Y2t
..
YGt
|
|
=
|
|
μ1
μ2
..
μG
|
|
+
|
|
ρ11ρ12..ρ1G
ρ21ρ22..ρ2G
::::
ρG1ρG2..ρGG
|
|
|
|
Y1,t-1
Y2,t-1
..
YG,t-1
|
|
+
|
|
ε1t
ε2t
:
εGt
|
|

In a shorthand notation,

Yt = μ + ρ Yt-1 + εt

Extension: VAR(p)

First, we can write the univariate AR(p) model as the system:

Yt = μ + ρ1Yt-1 + ρ2Yt-2 + ... +ρpYt-p + εt
Yt-1 = Yt-1
Yt-2 = Yt-2
:
Yt-p+1 = Yt-p+1

Or,

|
|
Yt
Yt-1
:
Yt-p+1
|
|
=
|
|
μ
0
:
0
|
|
+
|
|
ρ1 ρ2.. ρp
10..0
::::
0..10
|
|
|
|
Yt-1
Yt-2
:
Yt-p
|
|
+
|
|
εt
0
:
0
|
|

That is,

Yt = μ + ρ Yt-1 + εt

This is a system of p equations with restricted parameters matrix. The usable time series observations are from p+1 to N (N-p in total).

Similarly, for the multivariate VAR(p) system, the model can be expressed in terms of the stacked G endogenous variables. Therefore, Yt, Yt-1, ..., and Yt-p are Gx1 vectors. The size of the problem is (N-p)Gp. Then the parameter matrix ρ of the lag variable Yt-1 is

ρ =
|
|
ρ1ρ2....ρp
I0....0
0I::0
00..I0
|
|

where, for each k = 1,2,...,p, ρk = [ρij,k (i,j=1,2...,G)]. Furthermore, I is GxG identity matrix, and 0 is GxG zeros matrix.

Impulse Response Functions

Deriving from a general VAR(1) system, Yt = μ + ρ Yt-1 + εt, we write:

[I-ρ(B)]Yt = μ + εt

where B is the backshift operator. Then,

Yt = [I-ρ]-1μ + ∑i=0,2...,∞ ρiεt-i

= Y* + (εt + ρ1εt-1 + ρ2εt-2 + ...)

Y* is the equilibrium and εt is the innovation. By shocking one element of εt, says εjt, Yt will move away from the equilibrium Y*. Note that the effect of Yt due to change of εjt is not just on the jth variable alone but also on other variables in the system. The path whereby the variables returns to equilibirum is called the Impulse Responses of a stable VAR system. The Impulse Response Function traces the effects of a one-time innovation εjt on the k-th variable over time (i=0,1,2,...) as ρikj (k,j = 1,2,...,G).

Vector Error Correction Models

To Be Added

Multivariate GARCH(1,1) Model

To consider variances and covariances of multiple markets, similar to VAR model for the mean process, the multivariate GARCH model is presented as follows (for a two-variable case):

|
σ1,t2
σ2,t2
σ12,t
|
=
|
δ1
δ2
δ3
|
+
|
α11α12α13
α21α22α23
α31α32α33
|
|
ε1,t-12
ε2,t-12
ε1,t-1ε2,t-1
|
+
|
γ11γ12γ13
γ21γ22γ23
γ31γ32γ33
|
|
σ1,t-12
σ2,t-12
σ12,t-1
|

The above VEC representation of the model consists of many unknown parameters must be estimated (21 for the 2-variable case). Some parameter restrictions are necessary to ensure the positivity of the conditional varianes and to achieve the parsimony of the model. A diagonalied version of VEC model assumes αij = γij = 0 for i≠j. This is the so-called VECH model (9 parameters for the 2-variable case):

σ1,t2 = δ1 + α11ε1,t-12 + γ11σ1,t-12
σ2,t2 = δ2 + α22ε2,t-12 + γ22σ2,t-12
σ12,t = δ3 + α33ε1,t-1ε2,t-1 + γ33σ12,t-1

A popular BEK model of Engle and Kroner (1995) ensures that the conditional variances are positive as follows:

Ht = C'C + A'εt-1εt-1'A + B'Ht-1B

where for the 2-variable case,

Ht =
σ11,tσ12,t
σ12,tσ22,t
  εt-1 =
ε1,t-1
ε2,t-1
  C =
c11c12
c12c22
  A =
α11α12
α21α22
  B =
β11β12
β21β22

In general, σij,t2 depends on the squared residuals, cross products of the residuals, and the conditional variances and covariances of all variables in the system. The model is difficult to estimate. If we define the conditional correlation as

ρij,t = σij,t/(σii,tσjj,t)½

Then the Constant Conditional Correlation (CCC) Model (see Bollerslev, 1990) is defined by

ρij,t = ρij for all t.

In a sense, the CCC model is a compromise in that the variance terms need not be diagonalized, but the covariance terms are always proportional to (σii,tσjj,t)½. Hence, the covariance equation entails only one parameter. In particular, for the 2-variable VECH case,

σ1,t2 = δ1 + α11ε1,t-12 + γ11σ1,t-12
σ2,t2 = δ2 + α22ε2,t-12 + γ22σ2,t-12
σ12,t = ρ12ii,tσjj,t)½

A more general Dynamic Conditional Correlation (DCC) model is to allow for time-varying conditional correlation ρij,t. Let the variance-covariance matrix Ht = DtRtDt', where Dt is the diagonal matrix of conditional standard errors σi,t, and Rt is the matrix of conditional correlation ρij,t. The complete time series regression model of K variables can be described as:

Yt = Xtβ + εt
εt = Htut, ut ~ Normal(0,I)
Ht = DtRtDt', Dt = Diag[σi,t], Rt = [ρij,t]

The model is estimated by the maximum likelihood method, with the log-likelihood function defined by

L = -½ ∑t [Kln(2π) + ln(|Ht|) + εtHt-1εt]

For more information, see


State-Space Models

State-space analysis deals with dynamic time series models that involve unobserved state variables such as inflation expectation, permanent income, time-varying parameters, etc.. The basic tool used to study the state-space model is the Kalman Filter, which is a recursive algorithm for estimating the unobserved component or state vector at time t, based on available information through time t-1.

Model Representation

A state-space model consists of two equations:

Conditional to the information available at time t-1, the expected value of βt is Et-1t) = ct + FtEt-1t-1). Similarly, the conditional covariance is Vart-1t) = FtVart-1t-1)Ft' + Q. For notational convenience, let βt|t-1 = Et-1t) and Ωt|t-1 = Vart-1t). Then,

βt|t-1 = ct + Ftβt-1|t-1
Ωt|t-1 = FtΩt-1|t-1Ft' + Q

Combining the measurement and transition equations, we have

Yt = (HtFtt-1 + (Htct+at) + (Htvt+ut)

Given the information at time t-1, the conditional expectation and covariance of Yt are:

Yt|t-1 = Et-1(Yt) = Htβt|t-1 + at
Σt|t-1 = Vart-1(Yt) = HtΩt|t-1Ht' + R

Since Yt is distributed according to normal(Yt|t-1t|t-1), the log-likelihood is evaluated as:

llt = - ½ ln(2πΣt|t-1) - ½ (Yt-Yt|t-1)'Σt|t-1-1(Yt-Yt|t-1)

Kalman Fileter

The computation of log-likelihood function for parameter estimation is based on the algorithm of Kalman Filter as follows:

The above basic filter (prediction and updating) is carried out iteratively from t=1 to t=T. At the end, the sum of log-likelihoods is maximized with respect to the model paramters. To begin at time t=1, the initial values β0|0 and Ω0|0 must be given. If βt is stationary, then the unconditional expectation and covariance may be used:

β0|0 = (I-F)-1c
vec(Ω0|0) = (I-F⊗F)-1vec(Q)

If βt is nonstationary, then we can use a wild guess of β0|0 (e.g. zeros vector) with large diagonal elements in the covariance matrix Ω0|0. In this case, the evaluation of log-likelihood and inference should not include the first few observations of the guess values.

As a by product of maximum likelihood estimation, we obtain the estimated (updated) parameter vector and the corresponding covariance matrix at time t: βt|t and Ωt|t, for t=1,...,T. For a better inference, the smoothed parameter vector and the corresponding covariance matrix based on all information in the sample are:

βt|T = βt|t + K*t+1t+1|T-ct+1-Ft+1βt|t)
Ωt|T = Ωt|t + K*t+1t+1|Tt+1|t)K*t+1'

where K*t+1 = Ωt|tFt+1t+1|t-1. The smoothing is performed from t=T-1 down to t=1 with the initial values βT|T and ΩT|T obtained from the last iteration of the basic filter.

Applications

Example 7

C-J. Kim and C. R. Nelson, "The Time-Varying-Parameter Model for Modeling Changing Conditional Variance: The case of the Lucas Hypothesis," Journal of Business and Economic Statistics, 1989, 433-440.

The State-Space Model Representation

Data Description (Data)

ΔM = Quarterly M1 growth rate
ΔR = Change in 3-month T-bill interest rate
ΔP = Inflation rate as measured by the CPI
SURP = Detrended full employment budget surplus

Fixed Parameters

σ2, σ02, σ12, σ22, σ32, σ42.

Time-Varying Parameters

β0t, β1t, β2t, β3t, β4t.


Last updated: 01/25/2016