Time Series Analysis III

Advanced Topics

ARMAX: ARMA Analysis for Regression Residuals

AR(1), MA(1), ARMA(1,1)
Two-Variable Transfer Function Model

Auto-Regressive Conditional Heteroscedasticity

The Model: ARCH(1), ARCH-M(1), GARCH(1,1)
Model Identification for ARCH Process
Model Estimation
GARCH(1,1) Model Based on Non-Normal Distributions

Multi-Equation Time Series Models

VAR Analysis
VEC Model
Multivariate GARCH Model

State-Space Models

Model Representation
Kalman Filter
Applications

Readings

R. S. Tsay, Chapters 3, 8, 10, 11.
W. Enders, Chapters 3, 5.
W. H. Greene, 7th ed., Chapter 20.
Additional Readings:
- T. Bollerslev, "Generalized Autoregressive Conditional Heteroskedasticity," Journal of Econometrics 31, 1986, 307-327.
- T. Bollerslev, "A Conditionally Heteroskedastic Time Series Model for Speculative Prices and Rates of Return," Review of Economics and Statistics 69, 1987, 542-547.
- T. Bollerslev and E. Ghysels, "Periodic Autoregressive Conditional Heteroscedasticity," American Statistical Association Journal of Business and Economic Statistics 14, 1996, 139-151.
- R. F. Engle, "Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation," Econometrica 50, 1982, 987-1006.
- R. F. Engle, D. M. Lilien, and R. P. Robins, "Estimating Time-Varying Risk Premia in the Term Structure: the ARCH-M Model," Econometrica 55, 1987, 391-407.
- L. R. Glosten, R. Jagannathan, and D. Runkle, "On the Relation Between the Expected Value and the Volatility of the Nominal Excess Return on Stocks," Journal of Finance 48, 1993, 1779-1801.
- D. B. Nelson, "Conditional Heteroskedasticity in Asset Returns: A New Approach," Econometrica 59, 1991, 347-370.

ARMAX: ARMA Analysis for Regression Residuals

Y_t = X_tβ + ε_t
ε_t = ρ₁ε_t-1 + ρ₂ε_t-2 + ... + ρ_pε_t-p - θ₁u_t-1 - θ₂u_t-2 - ... - θ_qu_t-q + u_t

Y_t = X_tβ + ρ(B)^-1θ(B)u_t

ρ(B)Y_t = ρ(B)X_tβ + θ(B)u_t where u_t ~ nii(0,σ²).

AR(1) Process

ε_t = ρ ε_t-1 + u_t

We assume |ρ| < 1 for model stability. It is clear that

σ² = Var(u_t) = (1-ρ²) Var(ε_t).

Denote the variable transformations Y_t^* = Y_t - ρ Y_t-1 and X_t^* = X_t - ρ X_t-1. Since u₁ = (1-ρ²)^½ ε₁, the otherwise lost first observation is kept with the transformations Y₁^* = (1-ρ²)^½Y₁ and X₁^* = (1-ρ²)^½X₁.

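Thus the model for estimation is

u_t = Y_t^* - X_t^*β

with the following Jacobian of the transformation from u_t to Y_t (depending on ρ only):

$$J_t(\rho) = \left|\frac{\partial u_t}{\partial Y_t}\right| = \begin{cases} (1-\rho^2)^{1/2} & \text{for } t = 1 \\ 1 & \text{for } t > 1 \end{cases}$$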

Therefore, the (exact) concentrated log-likelihood function is:

ll^*(β,ρ|Y,X) = -½N (1+ln(2π)-ln(N)) +½ ln(1-ρ²) -½N ln(u'u)
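The concentrated log-likelihood above can be evaluated directly once the transformed errors are built. The following is a minimal sketch, not a definitive implementation: the function name `ar1_concentrated_loglik` and the list-of-rows data layout are assumptions made for illustration. It uses the fact that Y_t^* - X_t^*β equals e_t - ρe_t-1, where e_t = Y_t - X_tβ.

```python
import math

def ar1_concentrated_loglik(beta, rho, y, X):
    """Exact concentrated log-likelihood of a regression with AR(1) errors.

    y: list of N observations; X: list of N regressor rows;
    beta: list of coefficients; rho: AR(1) coefficient with |rho| < 1.
    (Illustrative helper; names and layout are assumptions.)
    """
    n = len(y)
    # Untransformed regression residuals e_t = Y_t - X_t beta.
    e = [y[t] - sum(x * b for x, b in zip(X[t], beta)) for t in range(n)]
    # First observation is rescaled by (1 - rho^2)^(1/2); the rest are quasi-differenced.
    u = [math.sqrt(1.0 - rho ** 2) * e[0]]
    u += [e[t] - rho * e[t - 1] for t in range(1, n)]
    ssr = sum(v * v for v in u)  # u'u
    return (-0.5 * n * (1.0 + math.log(2.0 * math.pi) - math.log(n))
            + 0.5 * math.log(1.0 - rho ** 2)
            - 0.5 * n * math.log(ssr))
```

In practice this function would be passed (with its sign flipped) to a numerical optimizer over (β, ρ); setting ρ = 0 reduces it to the ordinary concentrated least-squares log-likelihood.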

Extension: AR(2)

The model is defined as ε_t = ρ₁ε_t-1 + ρ₂ε_t-2 + u_t with the following proper data transformation (Z is referenced as either X or Y below):

Z₁^* = [(1+ρ₂)((1-ρ₂)²-ρ₁²) / (1-ρ₂)]^½ Z₁
Z₂^* = (1-ρ₂²)^½Z₂ - [ρ₁(1-ρ₂²)^½/(1-ρ₂)]Z₁
Z_t^* = Z_t - ρ₁Z_t-1 - ρ₂Z_t-2, t=3,4,...,N.

Pre-Sample Data Initialization

The alternative pre-sample data initialization may be used to transform the time series:

Y₀ = Y_-1 = ... = ∑_t=1,2,...,NY_t/N
X₀ = X_-1 = ... = ∑_t=1,2,...,NX_t/N

The resulting maximum likelihood estimation is conditional to the pre-sample data initialization.

MA(1) Process

ε_t = u_t - θu_t-1

Again, we assume |θ| < 1 for model stability. The model is

u_t = Y_t - X_tβ + θu_t-1

Notice that the one-period lag of error terms, u_t-1, is used to define the model error u_t. A recursive calculation is needed with proper initialization of u₀. For example, set the initial value u₀ = E(u_t) = 0 (or alternatively the sample mean of u_t), then u₁ = Y₁-X₁β and u_t = Y_t-X_tβ + θu_t-1 for t=2,...,N.

Since every log-Jacobian term vanishes in this case, the (conditional) concentrated log-likelihood function is simply

ll^*(β,θ|Y,X) = -½N (1+ln(2π)-ln(N)) -½N ln(u'u)
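The recursive calculation of u_t described above can be sketched as follows. This is an illustrative sketch only: the function names and the list-based data layout are assumptions, and the presample value u₀ defaults to zero as in the text.

```python
import math

def ma1_residuals(beta, theta, y, X, u0=0.0):
    """Recursively build u_t = Y_t - X_t beta + theta * u_{t-1}, starting from u_0."""
    u, prev = [], u0
    for yt, xt in zip(y, X):
        ut = yt - sum(x * b for x, b in zip(xt, beta)) + theta * prev
        u.append(ut)
        prev = ut
    return u

def ma1_concentrated_loglik(beta, theta, y, X, u0=0.0):
    """Conditional concentrated log-likelihood for MA(1) errors (no Jacobian term)."""
    u = ma1_residuals(beta, theta, y, X, u0)
    n = len(u)
    ssr = sum(v * v for v in u)  # u'u
    return -0.5 * n * (1.0 + math.log(2.0 * math.pi) - math.log(n)) - 0.5 * n * math.log(ssr)
```

Because each u_t depends on the whole history of residuals, the sum of squares is nonlinear in θ even though the regression itself is linear in β.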

ARMA(1,1) Process

ε_t = ρ ε_t-1 + u_t - θ u_t-1

This is the mixed process of AR(1) and MA(1). Using the variable transformations of the AR(1) case and the data initialization of the MA(1) case, the model is written as

u_t = Y_t^* - X_t^*β + θ u_t-1

and the (conditional) concentrated log-likelihood function for parameter estimation is

ll^*(β,ρ,θ|Y,X) = -½N (1+ln(2π)-ln(N)) +½ ln(1-ρ²) -½N ln(u'u)

Two-Variable Transfer Function Model

ρ(B)Y_t = δ + β(B)X_t + θ(B)ε_t

where β(B) = β₀ + β₁B + β₂B² + ... + β_KB^K. Model analysis, including model identification, estimation, and forecasting, is the same as (although more complicated than) the univariate ARMA analysis. The regression parameters βs and the ARMA parameters ρs and θs must be estimated simultaneously through iterations of nonlinear functional (sum-of-squares or log-likelihood) optimization. For statistical inference, the degrees of freedom must be adjusted.

Auto-Regressive Conditional Heteroscedasticity

In many financial and monetary economic applications, serial correlation over time is present not only in the means but also in the variances. Models of the latter are the so-called Auto-Regressive Conditional Heteroscedasticity, or ARCH, models. The variance may nonetheless be unconditionally homoscedastic.

The Model

Consider the time series regression model:

Y_t = X_tβ + ε_t

At time t, conditional to the available historical information H_t, we assume that the error structure follows a normal distribution:

ε_t|H_t ~ n(0,σ²_t)

where σ²_t = α₀ + δ₁σ²_t-1 + ... + δ_pσ²_t-p + α₁ε²_t-1 + ... + α_qε²_t-q

= α₀ + ∑_i=1,2,...pδ_iσ²_t-i + ∑_j=1,2,...qα_jε²_t-j

Let υ_t = ε²_t-σ²_t, α_i = 0 for i > q, δ_j = 0 for j > p, and m = max(p,q). Then the above GARCH(p,q) process may be conveniently rewritten as an ARMA(m,p) model for ε²_t. That is,

ε²_t = α₀ + ∑_i=1,2,...m (α_i+δ_i)ε²_t-i - ∑_j=1,2,...pδ_jυ_t-j + υ_t

This is the generalized specification of auto-regressive conditional heteroscedasticity, or GARCH(p,q), according to Bollerslev [1986]. If p = 0, then it is the GARCH(0,q) or simply ARCH(q) process:

σ²_t = α₀ + ∑_j=1,2,...qα_jε²_t-j

ARCH(1) Process

The simplest case is q = 1, or ARCH(1), originated in Engle [1982] as follows:

σ²_t = α₀ + α₁ε²_t-1

The ARCH(1) model can be summarized as follows:

Y_t = X_tβ + ε_t
ε_t = u_t(α₀ + α₁ε²_t-1)^½ where u_t ~ nii(0,1)

Then the conditional mean is E(ε_t|ε_t-1) = 0 and the conditional variance is σ²_t = E(ε²_t|ε_t-1) = α₀ + α₁ε²_t-1.

Note that the unconditional variance of ε_t is

E(ε²_t) = E(E(ε²_t|ε_t-1)) = α₀ + α₁E(ε²_t-1).

If σ² = E(ε²_t) = E(ε²_t-1), then σ² = α₀/(1-α₁) provided that |α₁| < 1. Therefore, the model may be unconditionally homoscedastic even though conditional heteroscedasticity is assumed.

The ARCH(1) process can be generalized (therefore the name Generalized Auto-Regressive Conditional Heteroscedasticity) to:

GARCH(1,1) Process

σ²_t = α₀ + α₁ ε²_t-1 + δ₁ σ²_t-1

This resembles the mixed auto-regressive moving-average process ARMA(1,1) described above for autocorrelated errors. Presample variances and squared error terms can be initialized with ∑_t=1,2,...,N ε²_t/N. The following parameter restrictions are necessary to preserve stationarity of the error process:

α₀ > 0
α₁ ≥ 0
δ₁ ≥ 0
α₁ + δ₁ < 1

Another extension is ARCH or GARCH in mean (ARCH-M or GARCH-M model) which adds the heteroscedastic variance term directly into the regression equation (assuming linear model):

ARCH-M(1) or GARCH-M(1,1) Model

ε_t = Y_t - X_tβ - γσ²_t

σ²_t = α₀ + α₁ ε²_t-1 (or σ²_t = α₀ + α₁ ε²_t-1 + δ₁ σ²_t-1)

The last variance term of the regression may be expressed in log form or in standard error σ_t. For example, Y_t = X_tβ + γln(σ²_t) + ε_t. Moreover, constraints on the parameters in the conditional variance equation may be required to ensure the positivity of variances: α₀ > 0, 0 ≤ α₁ < 1 (or α₁ + δ₁ < 1, δ₁ ≥ 0).

Model Identification for ARCH and GARCH Processes

- Autocorrelation Function and Partial Autocorrelation Function based on the squares of the regression residuals ε_t (or of the standardized residuals ε_t/σ_t if σ_t is suspected of non-constancy).
- Engle-Bollerslev LM Test of GARCH Effects (Bollerslev [1986]): testing H₀: α₁ = α₂ = ... = α_q = 0 for the linear regression equation ε²_t = α₀ + α₁ε²_t-1 + α₂ε²_t-2 + ... + α_qε²_t-q + υ_t, based on the test statistic NR² ~ Chi-Square(q).
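The NR² statistic can be illustrated for the simplest case q = 1, where the auxiliary regression has a single regressor and R² is the squared sample correlation between ε²_t and ε²_t-1. This is a sketch under stated assumptions: the function name `arch_lm_stat_q1` is hypothetical, N is taken as the number of observations used in the auxiliary regression, and the squared residuals are assumed to have nonzero sample variance.

```python
def arch_lm_stat_q1(resid):
    """LM statistic N*R^2 from regressing e_t^2 on a constant and e_{t-1}^2 (q = 1 only)."""
    sq = [e * e for e in resid]
    y, x = sq[1:], sq[:-1]          # dependent variable and its one-period lag
    n = len(y)                       # observations used in the auxiliary regression
    my, mx = sum(y) / n, sum(x) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r2 = sxy * sxy / (sxx * syy)     # R^2 of a simple regression = squared correlation
    return n * r2
```

The statistic is compared with a Chi-Square(1) critical value (3.841 at the 5% level); a larger value rejects the null hypothesis of no ARCH effect. For q > 1 the same idea applies with a multiple regression on q lags.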

Model Estimation

Recall the normal log-likelihood of a heteroscedastic regression model:

ll = -½N ln(2π) - ½ ∑_t=1,2,...,Nln(σ²_t) - ½ ∑_t=1,2,...,N(ε²_t / σ²_t)

with the general conditional heteroscedastic variance GARCH(p,q) process:

σ²_t = α₀ + α₁ε²_t-1 + α₂ε²_t-2 + ... + α_qε²_t-q + δ₁σ²_t-1 + δ₂σ²_t-2 + ... + δ_pσ²_t-p

The parameter vector (α δ) is estimated together with the regression parameters (e.g., ε_t = Y_t - X_tβ) by maximizing the log-likelihood, conditional to the starting values ε₀², ε²_-1, ..., ε²_-q, σ²₀, σ²_-1, ..., σ²_-p and satisfying the nonnegativity requirement for the estimated variances: σ²_t > 0, t=1,2,...,N.

We note that the presample series: ε₀², ε²_-1, ..., ε²_-q, σ²₀, σ²_-1, ..., σ²_-p may be initialized by the estimated (homoscedastic) unconditional variance:

α₀ / [1 - (∑_i=1,2,...,qα_i + ∑_j=1,2,...,pδ_j)]

or by the estimated sample variance of residuals:

∑_t=1,2,...,Nε²_t/N.
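For the GARCH(1,1) case, the normal log-likelihood and the variance recursion can be sketched as below. This is a minimal illustration, not a full estimator: the function name `garch11_loglik` is an assumption, the residuals ε_t are taken as given (in estimation they depend on β), and the presample ε₀² and σ₀² are both initialized with the sample variance of the residuals.

```python
import math

def garch11_loglik(eps, a0, a1, d1):
    """Gaussian log-likelihood of residuals eps under a GARCH(1,1) variance process.

    Returns (log-likelihood, list of conditional variances).
    """
    n = len(eps)
    init = sum(e * e for e in eps) / n      # presample eps_0^2 and sigma_0^2
    prev_e2, prev_s2 = init, init
    ll = -0.5 * n * math.log(2.0 * math.pi)
    variances = []
    for e in eps:
        s2 = a0 + a1 * prev_e2 + d1 * prev_s2   # sigma_t^2 recursion
        variances.append(s2)
        ll -= 0.5 * (math.log(s2) + e * e / s2)
        prev_e2, prev_s2 = e * e, s2
    return ll, variances
```

An optimizer would maximize the returned log-likelihood over (β, α₀, α₁, δ₁), subject to the restrictions that keep every σ²_t positive.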

GARCH(1,1) Models Based on Non-Normal Distributions

Consider the standard GARCH(1,1) model represented by:

Y_t = X_tβ + ε_t, ε_t = σ_tu_t
σ_t² = α₀ + α₁ε_t-1² + δ₁σ_t-1²

Generalized Error Distribution (GED) (Nelson, 1991)

u_t ~ GED(v) with zero mean and unit variance, where v determines the thickness of the tails of the underlying GED. If v > 2 the distribution has thinner tails than normal. If v < 2 the distribution has thicker tails than normal.

The p.d.f. of GED(v) is written as:

$$f(u_t) = \frac{v \exp\left[-\tfrac{1}{2}\left|u_t/\lambda\right|^{v}\right]}{\lambda\, 2^{(1+1/v)}\, \Gamma(1/v)}$$

where

$$\lambda = \left[\frac{\Gamma(1/v)}{2^{2/v}\, \Gamma(3/v)}\right]^{1/2}$$

Therefore,

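$$f(\varepsilon_t) = f(Y_t) = \frac{v \exp\left[-\tfrac{1}{2}\left|(Y_t - X_t\beta)/(\lambda\sigma_t)\right|^{v}\right]}{\lambda\, 2^{(1+1/v)}\, \Gamma(1/v)\, \sigma_t}$$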

The component log-likelihood function for each observation is:

ll_t = ln(v/λ) - (1+1/v)ln(2) - ln(Γ(1/v))

- ½|(Y_t-X_tβ)/(λσ_t)|^v - ½ln(σ_t²)
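The component log-likelihood above can be coded with the log-gamma function for numerical stability. This is a sketch with assumed names (`ged_loglik_t`, `resid`, `sigma2`); it takes λ = [2^(-2/v) Γ(1/v)/Γ(3/v)]^½ as in Nelson (1991), which makes the GED have unit variance.

```python
import math

def ged_loglik_t(resid, sigma2, v):
    """Log-density of one residual (Y_t - X_t beta) with conditional variance sigma2 under GED(v)."""
    # ln(lambda), with lambda = [2^(-2/v) * Gamma(1/v) / Gamma(3/v)]^(1/2)
    log_lam = 0.5 * (-(2.0 / v) * math.log(2.0)
                     + math.lgamma(1.0 / v) - math.lgamma(3.0 / v))
    z = abs(resid) / (math.exp(log_lam) * math.sqrt(sigma2))
    return (math.log(v) - log_lam
            - (1.0 + 1.0 / v) * math.log(2.0)
            - math.lgamma(1.0 / v)
            - 0.5 * z ** v
            - 0.5 * math.log(sigma2))
```

With v = 2 the expression collapses to the normal log-density, which gives a convenient check of an implementation.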

Student t-Distribution (Bollerslev, 1987)

u_t ~ t(d), where d > 2 is the degrees of freedom of the underlying Student t distribution. The p.d.f. of the Student t distribution (normalized with zero mean and unit variance) is written as:

$$f(u_t) = C\left[1 + \frac{u_t^2}{d-2}\right]^{-(d+1)/2}$$

Therefore,

$$f(\varepsilon_t) = f(Y_t) = \frac{C}{\sigma_t}\left[1 + \frac{1}{d-2}\left(\frac{Y_t - X_t\beta}{\sigma_t}\right)^{2}\right]^{-(d+1)/2}$$

where

$$C = \frac{\Gamma((d+1)/2)}{\Gamma(d/2)\,[(d-2)\pi]^{1/2}}$$

The component log-likelihood function for each observation is:

ll_t = ln(C) - ½ln(σ_t²) - ½(d+1)ln[1+(1/(d-2))((Y_t-X_tβ)/σ_t)²]

where ln(C) = ln(Γ((d+1)/2)) - ln(Γ(d/2)) - ½ln(d-2) - ½ln(π)
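A direct transcription of this component log-likelihood is sketched below. The function and argument names are assumptions for illustration; `math.lgamma` is used for ln Γ(·) and `math.log1p` for ln(1 + x).

```python
import math

def student_t_loglik_t(resid, sigma2, d):
    """Log-density of one residual under a unit-variance Student t with d > 2 degrees of freedom."""
    log_c = (math.lgamma((d + 1.0) / 2.0) - math.lgamma(d / 2.0)
             - 0.5 * math.log(d - 2.0) - 0.5 * math.log(math.pi))
    z2 = resid * resid / sigma2          # squared standardized residual
    return log_c - 0.5 * math.log(sigma2) - 0.5 * (d + 1.0) * math.log1p(z2 / (d - 2.0))
```

As d grows large the value approaches the normal log-density, so a normal GARCH model is the limiting case of this specification.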

Skewed Student t-Distribution (Hansen, 1994)

u_t ~ t(d,s), where d > 2 is the degrees of freedom and -1 < s < 1 is the skewness parameter of the underlying skewed Student t-distribution. It specializes to the Student t-distribution by setting s = 0.

The p.d.f. of the skewed Student t distribution (normalized with zero mean and unit variance) is written as:

$$f(u_t) = BC\left[1 + \frac{1}{d-2}\left(\frac{A + Bu_t}{1 + s - 2sI_t}\right)^{2}\right]^{-(d+1)/2}$$

where

$$I_t = \begin{cases} 1 & \text{if } A + Bu_t < 0 \\ 0 & \text{if } A + Bu_t \geq 0 \end{cases}$$

or, equivalently

$$v_t = 1 + s - 2sI_t = \begin{cases} 1 - s & \text{if } A + Bu_t < 0 \\ 1 + s & \text{if } A + Bu_t \geq 0 \end{cases}$$

Therefore,

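$$f(\varepsilon_t) = f(Y_t) = \frac{BC}{\sigma_t}\left[1 + \frac{1}{d-2}\left(\frac{A + B\,(Y_t - X_t\beta)/\sigma_t}{1 + s - 2sI_t}\right)^{2}\right]^{-(d+1)/2}$$

where

$$I_t = \begin{cases} 1 & \text{if } A + B\,(Y_t - X_t\beta)/\sigma_t < 0 \\ 0 & \text{if } A + B\,(Y_t - X_t\beta)/\sigma_t \geq 0 \end{cases}$$

$$C = \frac{\Gamma((d+1)/2)}{\Gamma(d/2)\,[(d-2)\pi]^{1/2}}$$

$$A = 4sC\,\frac{d-2}{d-1}, \qquad B^2 = 1 + 3s^2 - A^2$$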

The component log-likelihood function for each observation is:

ll_t = ½ln(1+3s²-A²) + ln(C) - ½ln(σ_t²)

- ½(d+1) ln{1+(1/(d-2))[(A+B((Y_t-X_tβ)/σ_t))/(1+s-2sI_t)]²}
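The skewed Student t component log-likelihood can be sketched in the same way. The names below are illustrative assumptions; the constants A, B, and C follow the definitions given above.

```python
import math

def skewed_t_loglik_t(resid, sigma2, d, s):
    """Log-density of one residual under the skewed Student t (d > 2, -1 < s < 1)."""
    log_c = (math.lgamma((d + 1.0) / 2.0) - math.lgamma(d / 2.0)
             - 0.5 * math.log(d - 2.0) - 0.5 * math.log(math.pi))
    a = 4.0 * s * math.exp(log_c) * (d - 2.0) / (d - 1.0)
    b = math.sqrt(1.0 + 3.0 * s * s - a * a)
    w = a + b * resid / math.sqrt(sigma2)          # A + B * standardized residual
    scale = (1.0 - s) if w < 0 else (1.0 + s)      # 1 + s - 2 s I_t
    return (math.log(b) + log_c - 0.5 * math.log(sigma2)
            - 0.5 * (d + 1.0) * math.log1p((w / scale) ** 2 / (d - 2.0)))
```

Setting s = 0 gives A = 0 and B = 1, so the function returns the symmetric Student t log-density.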

GARCH(1,1) Models with Asymmetry Behavior (Leverage Effect)

There is considerable evidence in financial markets that a negative surprise (change in asset returns) tends to increase volatility (variance or risk) more than a positive surprise. Therefore, not only the size of the return but also its sign (negative or positive) is important in describing the characteristics of the variance of asset returns. Consider the following simple model:

Y_t = X_tβ + ε_t
ε_t = σ_tu_t

GJR Specification (Glosten-Jagannathan-Runkle, 1993)

σ_t² = α₀ + α₁ε_t-1² + δ₁σ_t-1² + γ₁(ε_t-1²D_t-1)

where

$$D_{t-1} = \begin{cases} 1 & \text{if } \varepsilon_{t-1} > 0 \\ 0 & \text{otherwise} \end{cases}$$

A parameter value γ₁ < 0 is sometimes referred to as the Leverage Effect. The non-negativity of σ_t² is satisfied provided that α₀ > 0, δ₁ ≥ 0, and α₁+γ₁ ≥ 0.
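The GJR variance recursion can be sketched as follows. The function name is hypothetical, and one assumption is made about initialization: the first conditional variance is set to the sample mean of ε²_t, and the recursion starts from the second observation.

```python
def gjr_garch_variance(eps, a0, a1, d1, g1):
    """Conditional variances under the GJR specification, with D_{t-1} = 1 when eps_{t-1} > 0."""
    n = len(eps)
    variances = [sum(e * e for e in eps) / n]   # assumed initialization of sigma_1^2
    for t in range(1, n):
        e_prev = eps[t - 1]
        dummy = 1.0 if e_prev > 0 else 0.0
        s2 = (a0 + a1 * e_prev ** 2 + d1 * variances[-1]
              + g1 * e_prev ** 2 * dummy)
        variances.append(s2)
    return variances
```

With γ₁ < 0, a positive shock of a given size adds (α₁+γ₁)ε² to next period's variance, while a negative shock of the same size adds the larger amount α₁ε².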

The asymmetric consequences of positive and negative innovations in the GARCH models can be studied based on various distributional assumptions (e.g., normal, t, GED) as described above.

EGARCH Specification (Nelson, 1991)

ln(σ_t²) = α₀ + δ₁ln(σ_t-1²) + α₁[γ₁u_t-1 + (|u_t-1| - E|u_t-1|)]

where u_t = ε_t/σ_t is independently distributed with zero mean and unit variance. The coefficient of |u_t-1|, α₁ > 0, measures the symmetric effect, while the coefficient of u_t-1, α₁γ₁ < 0, is interpreted as the Leverage Effect. The advantage of Nelson's specification of the variance equation is that it is written in terms of the log of σ_t², so the estimated σ_t² is positive whatever the signs of the estimated parameters.

Nelson's EGARCH(1,1) model assumes u_t ~ GED(v) in which E(u_t) = 0 and Var(u_t) = 1. Furthermore,

$$E(|u_t|) = \frac{\lambda\, 2^{1/v}\, \Gamma(2/v)}{\Gamma(1/v)} \to (2/\pi)^{1/2} \text{ as } v \to 2 \text{ (normal distribution)}$$

We note that

$$\lambda = \left[\frac{\Gamma(1/v)}{2^{2/v}\, \Gamma(3/v)}\right]^{1/2}$$

and the parameter v measures the tail thickness of the underlying GED.

Alternatively, the normal EGARCH(1,1) model assumes u_t ~ Normal(0,1). Then the conditional variance equation is simply

ln(σ_t²) = α₀ + δ₁ln(σ_t-1²) + α₁[γ₁u_t-1 + |u_t-1| - (2/π)^1/2]
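The normal EGARCH(1,1) recursion can be sketched as below. The function name and the argument `log_s2_init` (the starting value of ln σ₁²) are assumptions for illustration; the standardized shock is computed as u_t-1 = ε_t-1/σ_t-1.

```python
import math

def egarch11_normal_variance(eps, a0, d1, a1, g1, log_s2_init=0.0):
    """Conditional variances from the normal EGARCH(1,1) log-variance equation."""
    e_abs = math.sqrt(2.0 / math.pi)        # E|u_t| for a standard normal u_t
    log_s2 = log_s2_init                    # assumed starting value of ln(sigma_1^2)
    variances = []
    for t in range(len(eps)):
        if t > 0:
            u_prev = eps[t - 1] / math.sqrt(variances[-1])
            log_s2 = a0 + d1 * log_s2 + a1 * (g1 * u_prev + abs(u_prev) - e_abs)
        variances.append(math.exp(log_s2))  # positive for any parameter signs
    return variances
```

Because the recursion is in logs, no sign restrictions are needed for positivity; with α₁ > 0 and γ₁ < 0 a negative shock raises the next variance by more than a positive shock of the same size.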

Multi-Equation Time Series Models

Vector Autoregression Model

Generalizing from the univariate time series AR(1) model:

Y_t = μ + ρY_t-1 + ε_t

the multivariate system of G variables can be written as follows:

Y_it = μ_i + ∑_j=1,2,...,G ρ_ijY_j,t-1 + ε_it (i=1,2,...,G)

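This is called Vector Autoregression of order 1, or VAR(1). The matrix representation of the model as a simultaneous linear equations system looks like this:

$$\begin{bmatrix} Y_{1t} & Y_{2t} & \cdots & Y_{Gt} \end{bmatrix} = \begin{bmatrix} \mu_1 & \mu_2 & \cdots & \mu_G \end{bmatrix} + \begin{bmatrix} Y_{1,t-1} & Y_{2,t-1} & \cdots & Y_{G,t-1} \end{bmatrix} \begin{bmatrix} \rho_{11} & \rho_{21} & \cdots & \rho_{G1} \\ \rho_{12} & \rho_{22} & \cdots & \rho_{G2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{1G} & \rho_{2G} & \cdots & \rho_{GG} \end{bmatrix} + \begin{bmatrix} \varepsilon_{1t} & \varepsilon_{2t} & \cdots & \varepsilon_{Gt} \end{bmatrix}$$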

The alternative is the stacked form suitable for estimation as a system of regression equations:

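$$\begin{bmatrix} Y_{1t} \\ Y_{2t} \\ \vdots \\ Y_{Gt} \end{bmatrix} = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_G \end{bmatrix} + \begin{bmatrix} \rho_{11} & \rho_{12} & \cdots & \rho_{1G} \\ \rho_{21} & \rho_{22} & \cdots & \rho_{2G} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{G1} & \rho_{G2} & \cdots & \rho_{GG} \end{bmatrix} \begin{bmatrix} Y_{1,t-1} \\ Y_{2,t-1} \\ \vdots \\ Y_{G,t-1} \end{bmatrix} + \begin{bmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \\ \vdots \\ \varepsilon_{Gt} \end{bmatrix}$$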

In a shorthand notation,

Y_t = μ + ρ Y_t-1 + ε_t

Extension: VAR(p)

First, we can write the univariate AR(p) model as the system:

Y_t = μ + ρ₁Y_t-1 + ρ₂Y_t-2 + ... +ρ_pY_t-p + ε_t
Y_t-1 = Y_t-1
Y_t-2 = Y_t-2
:
Y_t-p+1 = Y_t-p+1

Or,

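$$\begin{bmatrix} Y_t \\ Y_{t-1} \\ \vdots \\ Y_{t-p+1} \end{bmatrix} = \begin{bmatrix} \mu \\ 0 \\ \vdots \\ 0 \end{bmatrix} + \begin{bmatrix} \rho_1 & \rho_2 & \cdots & \rho_{p-1} & \rho_p \\ 1 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix} \begin{bmatrix} Y_{t-1} \\ Y_{t-2} \\ \vdots \\ Y_{t-p} \end{bmatrix} + \begin{bmatrix} \varepsilon_t \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$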

That is,

Y_t = μ + ρ Y_t-1 + ε_t

This is a system of p equations with a restricted parameter matrix. The usable time series observations run from p+1 to N (N-p in total).

Similarly, for the multivariate VAR(p) system, the model can be expressed in terms of the stacked G endogenous variables. Therefore, Y_t, Y_t-1, ..., and Y_t-p are Gx1 vectors. The size of the problem is (N-p)Gp. Then the parameter matrix ρ of the lag variable Y_t-1 is

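$$\rho = \begin{bmatrix} \rho_1 & \rho_2 & \cdots & \rho_{p-1} & \rho_p \\ I & 0 & \cdots & 0 & 0 \\ 0 & I & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & I & 0 \end{bmatrix}$$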

where, for each k = 1,2,...,p, ρ_k = [ρ_ij,k (i,j=1,2,...,G)]. Furthermore, I is the GxG identity matrix and 0 is the GxG zero matrix.

Impulse Response Functions

Deriving from a general VAR(1) system, Y_t = μ + ρ Y_t-1 + ε_t, we write:

[I-ρ(B)]Y_t = μ + ε_t

where B is the backshift operator. Then,

Y_t = [I-ρ]^-1μ + ∑_i=0,1,2,...,∞ ρⁱε_t-i

= Y^* + (ε_t + ρ¹ε_t-1 + ρ²ε_t-2 + ...)

Y^* is the equilibrium and ε_t is the innovation. A shock to one element of ε_t, say ε_jt, moves Y_t away from the equilibrium Y^*. Note that a change in ε_jt affects not just the j-th variable but also the other variables in the system. The path along which the variables return to equilibrium is called the impulse response of a stable VAR system. The Impulse Response Function traces the effect of a one-time innovation ε_jt on the k-th variable over time (i=0,1,2,...) as ρⁱ_kj, the (k,j) element of ρⁱ (k,j = 1,2,...,G).
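Since the impulse responses of a VAR(1) are the elements of the matrix powers ρⁱ, they can be computed by repeated multiplication. The sketch below uses plain nested lists; the function names are assumptions made for illustration.

```python
def matmul(p, q):
    """Product of two matrices stored as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

def impulse_responses(rho, horizons):
    """Return [rho^0, rho^1, ..., rho^horizons]; element [i][k][j] is the response
    of variable k at horizon i to a unit innovation in variable j."""
    g = len(rho)
    power = [[1.0 if r == c else 0.0 for c in range(g)] for r in range(g)]  # rho^0 = I
    out = []
    for _ in range(horizons + 1):
        out.append(power)
        power = matmul(rho, power)
    return out
```

For a stable system (all eigenvalues of ρ inside the unit circle) the responses die out as the horizon grows, which is the return to equilibrium described above.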

Vector Error Correction Models

To Be Added

Multivariate GARCH(1,1) Model

To consider variances and covariances of multiple markets, similar to VAR model for the mean process, the multivariate GARCH model is presented as follows (for a two-variable case):

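$$\begin{bmatrix} \sigma_{1,t}^2 \\ \sigma_{2,t}^2 \\ \sigma_{12,t} \end{bmatrix} = \begin{bmatrix} \delta_1 \\ \delta_2 \\ \delta_3 \end{bmatrix} + \begin{bmatrix} \alpha_{11} & \alpha_{12} & \alpha_{13} \\ \alpha_{21} & \alpha_{22} & \alpha_{23} \\ \alpha_{31} & \alpha_{32} & \alpha_{33} \end{bmatrix} \begin{bmatrix} \varepsilon_{1,t-1}^2 \\ \varepsilon_{2,t-1}^2 \\ \varepsilon_{1,t-1}\varepsilon_{2,t-1} \end{bmatrix} + \begin{bmatrix} \gamma_{11} & \gamma_{12} & \gamma_{13} \\ \gamma_{21} & \gamma_{22} & \gamma_{23} \\ \gamma_{31} & \gamma_{32} & \gamma_{33} \end{bmatrix} \begin{bmatrix} \sigma_{1,t-1}^2 \\ \sigma_{2,t-1}^2 \\ \sigma_{12,t-1} \end{bmatrix}$$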

The above VEC representation of the model contains many unknown parameters that must be estimated (21 for the 2-variable case). Some parameter restrictions are necessary to ensure the positivity of the conditional variances and to achieve parsimony of the model. A diagonalized version of the VEC model assumes α_ij = γ_ij = 0 for i≠j. This is the so-called VECH model (9 parameters for the 2-variable case):

σ_1,t² = δ₁ + α₁₁ε_1,t-1² + γ₁₁σ_1,t-1²
σ_2,t² = δ₂ + α₂₂ε_2,t-1² + γ₂₂σ_2,t-1²
σ_12,t = δ₃ + α₃₃ε_1,t-1ε_2,t-1 + γ₃₃σ_12,t-1

The popular BEKK model of Engle and Kroner (1995) ensures that the conditional variances are positive as follows:

H_t = C'C + A'ε_t-1ε_t-1'A + B'H_t-1B

where for the 2-variable case,

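$$H_t = \begin{bmatrix} \sigma_{11,t} & \sigma_{12,t} \\ \sigma_{12,t} & \sigma_{22,t} \end{bmatrix}, \qquad \varepsilon_{t-1} = \begin{bmatrix} \varepsilon_{1,t-1} \\ \varepsilon_{2,t-1} \end{bmatrix}$$

$$C = \begin{bmatrix} c_{11} & c_{12} \\ c_{12} & c_{22} \end{bmatrix}, \qquad A = \begin{bmatrix} \alpha_{11} & \alpha_{12} \\ \alpha_{21} & \alpha_{22} \end{bmatrix}, \qquad B = \begin{bmatrix} \beta_{11} & \beta_{12} \\ \beta_{21} & \beta_{22} \end{bmatrix}$$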
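One step of the BEKK recursion H_t = C'C + A'ε_t-1ε_t-1'A + B'H_t-1B can be sketched with nested lists as below. The helper names and the numerical values in the usage note are illustrative assumptions, not estimates.

```python
def matmul(p, q):
    """Product of two matrices stored as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def bekk_update(C, A, B, eps_prev, H_prev):
    """Return H_t = C'C + A' e e' A + B' H_{t-1} B for the shock vector e = eps_prev."""
    outer = [[ei * ej for ej in eps_prev] for ei in eps_prev]   # e e'
    terms = [matmul(transpose(C), C),
             matmul(matmul(transpose(A), outer), A),
             matmul(matmul(transpose(B), H_prev), B)]
    n = len(C)
    return [[sum(term[i][j] for term in terms) for j in range(n)] for i in range(n)]
```

For example, `bekk_update([[0.3, 0.1], [0.1, 0.2]], [[0.4, 0.1], [-0.2, 0.3]], [[0.8, 0.0], [0.1, 0.7]], [1.0, -0.5], [[1.0, 0.0], [0.0, 1.0]])` returns a symmetric matrix with positive diagonal and positive determinant. Each term is a quadratic form, which is why the result stays positive semi-definite whatever the signs of the elements of A and B.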

In general, each element σ_ij,t of H_t depends on the squared residuals, cross products of the residuals, and the conditional variances and covariances of all variables in the system. The model is difficult to estimate. If we define the conditional correlation as

ρ_ij,t = σ_ij,t/(σ_ii,tσ_jj,t)^½

Then the Constant Conditional Correlation (CCC) Model (see Bollerslev, 1990) is defined by

ρ_ij,t = ρ_ij for all t.

In a sense, the CCC model is a compromise in that the variance terms need not be diagonalized, but the covariance terms are always proportional to (σ_ii,tσ_jj,t)^½. Hence, the covariance equation entails only one parameter. In particular, for the 2-variable VECH case,

σ_1,t² = δ₁ + α₁₁ε_1,t-1² + γ₁₁σ_1,t-1²
σ_2,t² = δ₂ + α₂₂ε_2,t-1² + γ₂₂σ_2,t-1²
σ_12,t = ρ₁₂(σ_1,t²σ_2,t²)^½

A more general Dynamic Conditional Correlation (DCC) model allows for a time-varying conditional correlation ρ_ij,t. Let the variance-covariance matrix be H_t = D_tR_tD_t', where D_t is the diagonal matrix of conditional standard deviations σ_i,t, and R_t is the matrix of conditional correlations ρ_ij,t. The complete time series regression model of K variables can be described as:

Y_t = X_tβ + ε_t
ε_t = H_t^½u_t, u_t ~ Normal(0,I)
H_t = D_tR_tD_t', D_t = Diag[σ_i,t], R_t = [ρ_ij,t]

The model is estimated by the maximum likelihood method, with the log-likelihood function defined by

L = -½ ∑_t [Kln(2π) + ln(|H_t|) + ε_t'H_t^-1ε_t]

For more information, see

L. Bauwens, S. Laurent, and J.V.K. Rombouts, "Multivariate GARCH Models: A Survey," Journal of Applied Econometrics 21, 2006, 79-110.
T. Bollerslev, "Modeling the Coherence in Short-run Nominal Exchange Rates: A Multivariate Generalized ARCH Model," Review of Economics and Statistics 72, 1990, 498-505.
R. F. Engle, "Dynamic Conditional Correlation: A Simple Class of Multivariate Generalized Autoregressive Conditional Heteroskedasticity Models," Journal of Business and Economic Statistics 20(3), 2002, 339-350.
R. F. Engle and K. F. Kroner, "Multivariate Simultaneous Generalized ARCH," Econometric Theory 11, 1995, 122-150.

State-Space Models

State-space analysis deals with dynamic time series models that involve unobserved state variables such as inflation expectations, permanent income, and time-varying parameters. The basic tool used to study the state-space model is the Kalman Filter, which is a recursive algorithm for estimating the unobserved component or state vector at time t, based on available information through time t-1.

Model Representation

A state-space model consists of two equations:

Measurement Equation (Observation Equation): The relationship between observed variables (nx1 data vector Y_t) and unobserved state variables (kx1 parameter vector β_t).
Y_t = H_tβ_t + a_t + u_t
where H_t is an nxk matrix and a_t is an nx1 vector, which may be either data on exogenous variables or constant parameters. That is, given the exogenous or predetermined observed variables X_t, we may define H_t = H(X_t) and a_t = a(X_t).
We assume u_t ~ nii(0_nx1,R_nxn). Note that the covariance matrix R may also depend on X_t.
Transition Equation (State Equation): The first-order difference equation describing the dynamics of the state variables.
β_t = c_t + F_tβ_t-1 + v_t
where F_t is a kxk matrix and c_t is a kx1 vector.
We assume v_t ~ nii(0_kx1,Q_kxk) and Cov(u_t,v_s) = E(u_tv_s') = 0_nxk. Note that c_t = c(X_t), F_t = F(X_t), and the covariance matrix Q may depend on X_t.

Conditional to the information available at time t-1, the expected value of β_t is E_t-1(β_t) = c_t + F_tE_t-1(β_t-1). Similarly, the conditional covariance is Var_t-1(β_t) = F_tVar_t-1(β_t-1)F_t' + Q. For notational convenience, let β_t|t-1 = E_t-1(β_t) and Ω_t|t-1 = Var_t-1(β_t). Then,

β_t|t-1 = c_t + F_tβ_t-1|t-1
Ω_t|t-1 = F_tΩ_t-1|t-1F_t' + Q

Combining the measurement and transition equations, we have

Y_t = (H_tF_t)β_t-1 + (H_tc_t+a_t) + (H_tv_t+u_t)

Given the information at time t-1, the conditional expectation and covariance of Y_t are:

Y_t|t-1 = E_t-1(Y_t) = H_tβ_t|t-1 + a_t
Σ_t|t-1 = Var_t-1(Y_t) = H_tΩ_t|t-1H_t' + R

Since Y_t is distributed according to normal(Y_t|t-1,Σ_t|t-1), the log-likelihood is evaluated as:

ll_t = - ½ ln(det(2πΣ_t|t-1)) - ½ (Y_t-Y_t|t-1)'Σ_t|t-1^-1(Y_t-Y_t|t-1)

Kalman Filter

The computation of log-likelihood function for parameter estimation is based on the algorithm of Kalman Filter as follows:

Prediction
β_t|t-1 = c_t + F_tβ_t-1|t-1
Ω_t|t-1 = F_tΩ_t-1|t-1F_t' + Q
Define the prediction error ε_t|t-1 = Y_t - Y_t|t-1. Then
ε_t|t-1 = Y_t - H_tβ_t|t-1 - a_t
Σ_t|t-1 = H_tΩ_t|t-1H_t' + R
Then the log-likelihood is defined by
ll_t = - ½ ln(det(2πΣ_t|t-1)) - ½ ε_t|t-1'Σ_t|t-1^-1ε_t|t-1
Updating
β_t|t = β_t|t-1 + K_tε_t|t-1
Ω_t|t = Ω_t|t-1 - K_tH_tΩ_t|t-1
where K_t = Ω_t|t-1H_t'Σ_t|t-1^-1 is the Kalman gain.
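The prediction and updating steps above can be sketched for the scalar case (n = k = 1), where all matrices reduce to numbers. This is an illustration under stated assumptions: the function name is hypothetical, and the system quantities F, H, c, a, Q, and R are taken to be constant over time.

```python
import math

def kalman_filter_scalar(y, F, H, c, a, Q, R, beta0, omega0):
    """Scalar Kalman filter; returns (log-likelihood, [(beta_t|t, omega_t|t), ...])."""
    beta, omega = beta0, omega0
    loglik = 0.0
    filtered = []
    for yt in y:
        # Prediction
        beta_pred = c + F * beta
        omega_pred = F * omega * F + Q
        err = yt - H * beta_pred - a            # prediction error
        sigma = H * omega_pred * H + R          # prediction error variance
        loglik += -0.5 * math.log(2.0 * math.pi * sigma) - 0.5 * err * err / sigma
        # Updating
        gain = omega_pred * H / sigma           # Kalman gain
        beta = beta_pred + gain * err
        omega = omega_pred - gain * H * omega_pred
        filtered.append((beta, omega))
    return loglik, filtered
```

For example, `kalman_filter_scalar([2.0], 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0)` gives a Kalman gain of 0.5, so the updated state is 1.0 with variance 0.5. The matrix version replaces the divisions by matrix inverses and the products by matrix products with transposes.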

The above basic filter (prediction and updating) is carried out iteratively from t=1 to t=T. At the end, the sum of log-likelihoods is maximized with respect to the model parameters. To begin at time t=1, the initial values β_0|0 and Ω_0|0 must be given. If β_t is stationary, then the unconditional expectation and covariance may be used:

β_0|0 = (I-F)^-1c
vec(Ω_0|0) = (I-F⊗F)^-1vec(Q)

If β_t is nonstationary, then we can use a wild guess of β_0|0 (e.g. zeros vector) with large diagonal elements in the covariance matrix Ω_0|0. In this case, the evaluation of log-likelihood and inference should not include the first few observations of the guess values.

As a by-product of maximum likelihood estimation, we obtain the estimated (updated) parameter vector and the corresponding covariance matrix at time t: β_t|t and Ω_t|t, for t=1,...,T. For better inference, the smoothed parameter vector and the corresponding covariance matrix based on all information in the sample are:

β_t|T = β_t|t + K^*_t+1(β_t+1|T-c_t+1-F_t+1β_t|t)
Ω_t|T = Ω_t|t + K^*_t+1(Ω_t+1|T-Ω_t+1|t)K^*_t+1'

where K^*_t+1 = Ω_t|tF_t+1'Ω_t+1|t^-1. The smoothing is performed from t=T-1 down to t=1 with the initial values β_T|T and Ω_T|T obtained from the last iteration of the basic filter.
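The backward smoothing pass can also be sketched for the scalar case. The function name and the constant F, c, and Q are assumptions for illustration; the inputs are the filtered values β_t|t and Ω_t|t from the basic filter.

```python
def kalman_smoother_scalar(beta_filt, omega_filt, F, c, Q):
    """Backward smoothing recursion from t = T-1 down to t = 1 (scalar state)."""
    T = len(beta_filt)
    beta_s, omega_s = list(beta_filt), list(omega_filt)   # values at t = T stay as filtered
    for t in range(T - 2, -1, -1):
        beta_pred = c + F * beta_filt[t]            # beta_{t+1|t}
        omega_pred = F * omega_filt[t] * F + Q      # omega_{t+1|t}
        k_star = omega_filt[t] * F / omega_pred     # K*_{t+1}
        beta_s[t] = beta_filt[t] + k_star * (beta_s[t + 1] - beta_pred)
        omega_s[t] = omega_filt[t] + k_star * (omega_s[t + 1] - omega_pred) * k_star
    return beta_s, omega_s
```

The smoothed variance at each date is no larger than the filtered one, because the smoother uses the observations after time t as well as those up to time t.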

Applications

AR(p) Model
Y_t = δ + ρ₁Y_t-1 + ... + ρ_pY_t-p + ε_t
ε_t ~ nii(0,σ²)
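- Measurement Equation: Y_t = Hβ_t + a + u_t ~ nii(0,R), or
  
  $$Y_t = \begin{bmatrix} 1 & 0 & \cdots & 0 \end{bmatrix} \begin{bmatrix} Y_t \\ Y_{t-1} \\ \vdots \\ Y_{t-p+1} \end{bmatrix}$$
  
  where a = 0, u_t = 0, and R = 0
- Transition Equation: β_t = Fβ_t-1 + c + v_t ~ nii(0,Q), or
  
  $$\begin{bmatrix} Y_t \\ Y_{t-1} \\ \vdots \\ Y_{t-p+1} \end{bmatrix} = \begin{bmatrix} \rho_1 & \rho_2 & \cdots & \rho_{p-1} & \rho_p \\ 1 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix} \begin{bmatrix} Y_{t-1} \\ Y_{t-2} \\ \vdots \\ Y_{t-p} \end{bmatrix} + \begin{bmatrix} \delta \\ 0 \\ \vdots \\ 0 \end{bmatrix} + \begin{bmatrix} \varepsilon_t \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$
  
  where
  
  $$Q = \begin{bmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix}$$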

MA(q) Model

Y_t = μ + ε_t - θ₁ε_t-1 - ... - θ_qε_t-q
ε_t ~ nii(0,σ²)

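- Measurement Equation: Y_t = Hβ_t + a + u_t ~ nii(0,R), or
  
  $$Y_t = \begin{bmatrix} 1 & -\theta_1 & \cdots & -\theta_q \end{bmatrix} \begin{bmatrix} \varepsilon_t \\ \varepsilon_{t-1} \\ \vdots \\ \varepsilon_{t-q} \end{bmatrix} + \mu$$
  
  where u_t = 0, and R = 0
- Transition Equation: β_t = Fβ_t-1 + c + v_t ~ nii(0,Q), or
  
  $$\begin{bmatrix} \varepsilon_t \\ \varepsilon_{t-1} \\ \vdots \\ \varepsilon_{t-q} \end{bmatrix} = \begin{bmatrix} 0 & 0 & \cdots & 0 & 0 \\ 1 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix} \begin{bmatrix} \varepsilon_{t-1} \\ \varepsilon_{t-2} \\ \vdots \\ \varepsilon_{t-q-1} \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} + \begin{bmatrix} \varepsilon_t \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$
  
  where
  
  $$Q = \begin{bmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix}$$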

Time-Varying Parameters Model
Y_t = X_tβ_t + ε_t
ε_t ~ nii(0,σ²)
- Measurement Equation: Y_t = H_tβ_t + a + u_t ~ nii(0,R)
  where H_t = X_t, a = 0, u_t = ε_t, R = σ².
- Transition Equation: β_t = Fβ_t-1 + c + v_t ~ nii(0,Q)
  where F, c and Q may be defined according to a model specification.

Example 7

C-J. Kim and C. R. Nelson, "The Time-Varying-Parameter Model for Modeling Changing Conditional Variance: The case of the Lucas Hypothesis," Journal of Business and Economic Statistics, 1989, 433-440.

The State-Space Model Representation

Measurement Equation:
ΔM_t = β_0t + β_1tΔR_t-1 + β_2tΔP_t-1 + β_3tSURP_t-1 + β_4tΔM_t-1 + u_t
u_t ~ nii(0,σ²)
Transition Equation:
β_it = β_it-1 + v_it
v_it ~ nii(0,σ_i²) i = 0,1,...,4.

Data Description

ΔM = Quarterly M1 growth rate
ΔR = Change in 3-month T-bill interest rate
ΔP = Inflation rate as measured by the CPI
SURP = Detrended full employment budget surplus

Fixed Parameters

σ², σ₀², σ₁², σ₂², σ₃², σ₄².

Time-Varying Parameters

β_0t, β_1t, β_2t, β_3t, β_4t.

Last updated: 01/25/2016
