or
Yt = Xtβ + ρ(B)-1θ(B)ut
or
ρ(B)Yt = ρ(B)Xtβ + θ(B)ut where ut ~ nii(0,σ2).
We assume |ρ| < 1 for model stability. It is clear that
σ2 = Var(ut) = (1-ρ2) Var(εt).
Denote the variable transformations Yt* = Yt - ρ Yt-1 and Xt* = Xt - ρ Xt-1. Since u1 = (1-ρ2)½ ε1, the otherwise lost first observation is kept with the transformations Y1* = (1-ρ2)½Y1 and X1* = (1-ρ2)½X1.
Thus model for estimation is
ut = Yt* - Xt*β
with the following Jacobian transformation from ut to Yt (depending on ρ only):
Jt(ρ) = |∂ut / ∂Yt| = | (1-ρ2)½ for t=1 | |
1 | for t>1 |
Therefore, the (exact) concentrated log-likelihood function is:
ll*(β,ρ|Y,X) = -½N (1+ln(2π)-ln(N)) +½ ln(1-ρ2) -½N ln(u'u)
Extension: AR(2)
The model is defined as εt = ρ1εt-1 + ρ2εt-2 + ut with the following proper data transformation (Z is referenced as either X or Y below):
Pre-Sample Data Initialization
The alternative pre-sample data initialization may be used to transform the time series:
Again, we assume |θ| < 1 for model stability. The model is
ut = Yt - Xtβ + θut-1
Notice that the one-period lag of error terms, ut-1, is used to define the model error ut. A recursive calculation is needed with proper initialization of u0. For example, set the initial value u0 = E(ut) = 0 (or alternatively the sample mean of ut), then u1 = Y1-X1β and ut = Yt-Xtβ + θut-1 for t=2,...,N.
Since each log-jacobian terms vanish in this case, the (conditional) concentrated log-likelihood function is simply
ll*(β,θ|Y,X) = -½N (1+ln(2π)-ln(N)) -½N ln(u'u)
This is the mixed process of AR(1) and MA(1). Using the variable transformations as of AR(1) and data initialization as of MA(1), the model is written as
ut = Yt* - Xt*β + θ ut-1
and the (conditional) concentrated log-likelihood function for parameter estimation is
ll*(β,ρ,θ|Y,X) = -½N (1+ln(2π)-ln(N)) +½ ln(1-ρ2) -½N ln(u'u)
where β(B) = β0 + β1B + β2B2 + ... + βKBK. Model analysis including model identification, estimation, and forecasting is the same as (although more complicate than) the univariate ARMA analysis. Regression parameters βs and ARMA parameters ρs and θs must be simultaneously estimated through iterations of nonlinear functional (sum-of-squares or log-likelihood) optimization. For statistical reference, the degrees of freedom must be adjusted.
Consider the time series regression model:
Yt = Xtβ + εt
At time t, conditional to the available historical information Ht, we assume that the error structure follows a normal distribution:
εt|Ht ~ n(0,σ2t)
where σ2t | = α0 + δ1σ2t-1 + ... + δpσ2t-p + α1ε2t-1 + ... + αqε2t-q |
= α0 + ∑i=1,2,...pδiσ2t-i + ∑j=1,2,...qαjε2t-j |
Let υt = ε2t-σ2t, αi = 0 for i > q, δj = 0 for j > p, and m = max(p,q), the above GARCH(p,q) process may be conveniently re-written as an ARMA(m,p) model for ε2t. That is,
ε2t = α0 + ∑i=1,2,...m (αi+δi)ε2t-i - ∑j=1,2,...pδjυt-j + υt
This is the general specification of auto-regressive conditional heteroscedasticity, or GARCH(p,q), according to Bollerslev [1986]. If p = 0, then it is the GARCH(0,q) or simply ARCH(q) process:
σ2t = α0 + ∑j=1,2,...qαjε2t-j
ARCH(1) Process
The simplest case is q = 1, or ARCH(1), originated in Engle [1982] as follows:
σ2t = α0 + α1ε2t-1
ARCH(1) model can be summarized as follows:
Yt = Xtβ + εt
εt = ut(α0 + α1ε2t-1)½
where ut ~ nii(0,1)
Then, the conditional means E(εt|εt-1) = 0 and the conditional variances σ2t = E(ε2t|εt-1) = α0 + α1ε2t-1
Note that the unconditional variance of εt is
E(ε2t) = E(E(ε2t|εt-1)) = α0 + α1E(ε2t-1).
If σ2 = E(ε2t) = E(ε2t-1), then σ2 = α0/(1-α1) provided that |α1| < 1. Therefore, the model may be free of general heteroscedasticity although the conditional heteroscedasticity is assumed.
The ARCH(1) process can be generalized (therefore the name Generalized Auto-Regressive Conditional Heteroscedasticity) to:
GARCH(1,1) Process
σ2t = α0 + α1 ε2t-1 + δ1 σ2t-1
This resembles the mixed auto-regressive moving-average process ARMA(1,1) as described in autocorrelation. Presample variances and squared error terms can be initialized with ∑t=1,2,...,N ε2t/N. The following parameter restrictions are necessary to preserve stationaity of the error process:
Another extension is ARCH or GARCH in mean (ARCH-M or GARCH-M model) which adds the heteroscedastic variance term directly into the regression equation (assuming linear model):
ARCH-M(1) or GARCH-M(1,1) Model
εt = Yt - Xtβ - γσ2t
σ2t = α0 + α1 ε2t-1 (or σ2t = α0 + α1 ε2t-1 + δ1 σ2t-1)
The last variance term of the regression may be expressed in log form or in standard error σt. For example, Yt = Xtβ + γln(σ2t) + εt. Moreover, constraints on the parameters in the conditional variance equation may be required to ensure the positivity of variances: α0 > 0, 0 ≤ α1 < 1 (or α1 + δ1 < 1, δ1 ≥ 0).
ll = -½N ln(2π) - ½ ∑t=1,2,...,Nln(σ2t) - ½ ∑t=1,2,...,N(ε2t / σ2t)
with the general conditional heteroscedastic variance GARCH(p,q) process:
σ2t = α0 + α1ε2t-1 + α2ε2t-2 + ... + αqε2t-q + δ1σ2t-1 + δ2σ2t-2 + ... + δpσ2t-p
The parameter vector (α δ) is estimated together with the regression parameters (e.g., εt = Yt - Xtβ) by maximizing the log-likelihood, conditional to the starting values ε02, ε2-1, ..., ε2-q, σ20, σ2-1, ..., σ2-p and satisfying the nonnegativity requirement for the estimated variances: σ2t > 0, t=1,2,...,N.
We note that the presample series: ε02, ε2-1, ..., ε2-q, σ20, σ2-1, ..., σ2-p may be initialized by the estimated (homoschedastic) unconditional variance:
1 / [1 - (∑i=1,2,...,qαi + ∑j=1,2,...,pδj)]
or by the estimated sample variance of residuals:
∑t=1,2,...,Nε2t/N,
Yt = Xtβ + εt,
εt = σtut
σt2 = α0 +
α1εt-12 +
δ1σt-12
Generalized Exponential Distribution (GED) (Nelson, 1991)
ut ~ GED(v) with zero mean and unit variance, v is the thickness of tails for the underlying GED. If v > 2 the distribution has thinner tails than normal. If v < 2 the distribution has thicker tails than normal.
The p.d.f of GED(v) is written as:
f(ut) = |
|
|
where λ = |
|
Therefore,
f(εt) = f(Yt) = |
|
|
The component log-likelihood function for each observation is:
llt = | ln(v/λ) - (1+1/v)ln(2) - ln(Γ(1/v)) |
- ½|(Yt-Xtβ)/(λσt)|v - ½ln(σt2) |
Student t-Distribution (Bollerslev, 1987)
ut ~ t(d), d > 2 is the degree of freedom of the underlying Student t distribution. The p.d.f of Student t distribution (normalized with zero mean and unit variance) is written as:
f(ut) = | C |
| 1 + |
|
Therefore,
f(εt) = f(Yt) = |
|
| 1 + |
|
|
|
|
|
where
C = |
|
The component log-likelihood function for each observation is:
llt = ln(C) - ½ln(σt2) - ½(d+1)ln[1+(1/(d-2))((Yt-Xtβ)/σt))2]
where ln(C) = ln(Γ((d+1)/2)) - ln(Γ(d/2)) - ½ln(d-2) - ½ln(π)
Skewed Student t-Distribution (Hansen, 1994)
ut ~ t(d,s), d > 2 is the degree of freedom and -1< s < 1 is the skewedness of the underlying Skewed Student t-distribution. It specializes to the Student t-distribution by setting s = 0.
The p.d.f of Skewed Student t distribution (normalized with zero mean and unit variance) is written as:
f(ut) = |
|
where
It = | 1 if A+But < 0 |
0 if A+But ≥ 0 |
or, equivalently
vt = 1+s-2sIt = | 1-s if A+But < 0 |
1+s if A+But ≥ 0 |
Therefore,
f(εt) = f(Yt) = |
|
where
It = | 1 if A+B((Yt-Xtβ)/σt) < 0 |
0 if A+B((Yt-Xtβ)/σt) ≥ 0 |
C = |
| A = 4sC[(d-2)/(d-1)] | B2 = 1 + 3s2 - A2 |
The component log-likelihood function for each observation is:
llt = | ½ln(1+3s2-A2) + ln(C) - ½ln(σt2) |
- ½(d+1) ln{1+(1/(d-2))[(A+B((Yt-Xtβ)/σt))/(1+s-2sIt)]2} |
Yt = Xtβ + εt
εt = σtut
GJR Specification (Glosten-Jagannathan-Runkle, 1993)
σt2 = α0 + α1εt-12 + δ1σt-12 + γ1(εt-12Dt-1)
where Dt-1 = | 1 if εt-1 > 0 |
0 otherwise |
The parameter γ1 < 0 is sometimes referred as the Leverage Effect. The non-negativity of σt2 is satisfied provided that α0 > 0, δ1 ≥ 0 α1+γ1 ≥ 0.
The asymmetric consquences of positive and negative innovations in the GARCH models can be studied based on various distributional assumptions (e.g., normal, t, GED) as described above.
EGARCH Specification (Nelson, 1991)
ln(σt2) = α0 + δ1ln(σt-12) + α1[γ1ut-1 + (|ut-1| - E|ut-1|)]
where ut = εt/σt is independently distributed with zero mean and unit variance. The parameter of ut-1, or α1γ1 < 0, is interpreted as the Leverage Effect. We note that the parameter of |ut-1| or α1 > 0 measures the symmetric effect while α1γ1 is the Leverage. The advantage of the Nelson's specification of the variance equation is that log of σt2 is used, then the estimated σt2 is positive no matter what is the sign of the estimated parameters.
Nelson's EGARCH(1,1) model assumes ut ~ GED(v) in which E(ut) = 0 and Var(ut) = 1. Furthermore,
E(|ut|) = |
| -> (2/π)1/2 as v -> 2 (normal distribution) |
We note that λ = |
|
Aternatively, normal EGARCH(1,1) assumes ut ~ Normal(0,1). Then the conditional variance equation is simply
ln(σt2) = α0 + δ1ln(σt-12) + α1[γ1ut-1 + |ut-1| - (2/π)1/2]
Yt = μ + ρYt-1 + εt
the mutivariate system of G variables can be written as follows:
Yit = μi + ∑j=1,2,...,G ρijYj,t-1 + εit (i=1,2,...,G)
This is called Vector Autocorrelation of order 1, or VAR(1). The matrix representation of the model as a simultaneous linear equations system looks like this:
[Y1t,Y2t,...,YGt] = [μ1,μ2,...,μG] + [Y1,t-1,Y2,t-1,...,YG,t-1] |
|
|
| + | [ε1,ε2,...,εG] |
|
|
| = |
|
|
| + |
|
|
|
|
|
| + |
|
|
|
In a shorthand notation,
Yt = μ + ρ Yt-1 + εt
Yt = μ + ρ1Yt-1 +
ρ2Yt-2 + ... +ρpYt-p +
εt
Yt-1 = Yt-1
Yt-2 = Yt-2
:
Yt-p+1 = Yt-p+1
Or,
|
|
| = |
|
|
| + |
|
|
|
|
|
| + |
|
|
|
That is,
Yt = μ + ρ Yt-1 + εt
This is a system of p equations with restricted parameters matrix. The usable time series observations are from p+1 to N (N-p in total).
Similarly, for the multivariate VAR(p) system, the model can be expressed in terms of the stacked G endogenous variables. Therefore, Yt, Yt-1, ..., and Yt-p are Gx1 vectors. The size of the problem is (N-p)Gp. Then the parameter matrix ρ of the lag variable Yt-1 is
ρ = |
|
|
|
where, for each k = 1,2,...,p, ρk = [ρij,k (i,j=1,2...,G)]. Furthermore, I is GxG identity matrix, and 0 is GxG zeros matrix.
[I-ρ(B)]Yt = μ + εt
where B is the backshift operator. Then,
Yt = [I-ρ]-1μ + ∑i=0,2...,∞ ρiεt-i
= Y* + (εt + ρ1εt-1 + ρ2εt-2 + ...)
Y* is the equilibrium and εt is the innovation. By shocking one element of εt, says εjt, Yt will move away from the equilibrium Y*. Note that the effect of Yt due to change of εjt is not just on the jth variable alone but also on other variables in the system. The path whereby the variables returns to equilibirum is called the Impulse Responses of a stable VAR system. The Impulse Response Function traces the effects of a one-time innovation εjt on the k-th variable over time (i=0,1,2,...) as ρikj (k,j = 1,2,...,G).
|
|
| = |
|
|
| + |
|
|
|
|
|
| + |
|
|
|
|
|
|
The above VEC representation of the model consists of many unknown parameters must be estimated (21 for the 2-variable case). Some parameter restrictions are necessary to ensure the positivity of the conditional varianes and to achieve the parsimony of the model. A diagonalied version of VEC model assumes αij = γij = 0 for i≠j. This is the so-called VECH model (9 parameters for the 2-variable case):
σ1,t2 = δ1 + α11ε1,t-12
+ γ11σ1,t-12
σ2,t2 = δ2 + α22ε2,t-12
+ γ22σ2,t-12
σ12,t = δ3 + α33ε1,t-1ε2,t-1
+ γ33σ12,t-1
A popular BEK model of Engle and Kroner (1995) ensures that the conditional variances are positive as follows:
Ht = C'C + A'εt-1εt-1'A + B'Ht-1B
where for the 2-variable case,
Ht = |
|
|
| εt-1 = |
|
|
| C = |
|
|
| A = |
|
|
| B = |
|
|
|
In general, σij,t2 depends on the squared residuals, cross products of the residuals, and the conditional variances and covariances of all variables in the system. The model is difficult to estimate. If we define the conditional correlation as
ρij,t = σij,t/(σii,tσjj,t)½
Then the Constant Conditional Correlation (CCC) Model (see Bollerslev, 1990) is defined by
ρij,t = ρij for all t.
In a sense, the CCC model is a compromise in that the variance terms need not be diagonalized, but the covariance terms are always proportional to (σii,tσjj,t)½. Hence, the covariance equation entails only one parameter. In particular, for the 2-variable VECH case,
σ1,t2 = δ1 + α11ε1,t-12
+ γ11σ1,t-12
σ2,t2 = δ2 + α22ε2,t-12
+ γ22σ2,t-12
σ12,t = ρ12(σii,tσjj,t)½
A more general Dynamic Conditional Correlation (DCC) model is to allow for time-varying conditional correlation ρij,t. Let the variance-covariance matrix Ht = DtRtDt', where Dt is the diagonal matrix of conditional standard errors σi,t, and Rt is the matrix of conditional correlation ρij,t. The complete time series regression model of K variables can be described as:
Yt = Xtβ + εt
εt = Htut, ut ~ Normal(0,I)
Ht = DtRtDt',
Dt = Diag[σi,t], Rt = [ρij,t]
The model is estimated by the maximum likelihood method, with the log-likelihood function defined by
L = -½ ∑t [Kln(2π) + ln(|Ht|) + εtHt-1εt]
For more information, see
Yt = Htβt + at + ut
where Ht is an nxk matrix and at is an nx1 vector, which may be either data on exogenous variables or constant paramters. That is, given the exogenous or predetermined observed variables Xt, we may define Ht = H(Xt) and at = a(Xt).
We assume ut ~ nii(0nx1,Rnxn). Note that the covariance matrix R may also depend on Xt.
βt = ct + Ftβt-1 + vt
where Ft is an kxk matrix and ct is an kx1 vector.
We assume vt ~ nii(0kx1,Qkxk) and Cov(ut,vs) = E(utvs') = 0nxk. Note that ct = c(Xt), Ft = F(Xt), and the covariance matrix Q may depend on Xt.
Conditional to the information available at time t-1, the expected value of βt is Et-1(βt) = ct + FtEt-1(βt-1). Similarly, the conditional covariance is Vart-1(βt) = FtVart-1(βt-1)Ft' + Q. For notational convenience, let βt|t-1 = Et-1(βt) and Ωt|t-1 = Vart-1(βt). Then,
βt|t-1 = ct + Ftβt-1|t-1
Ωt|t-1 = FtΩt-1|t-1Ft' + Q
Combining the measurement and transition equations, we have
Yt = (HtFt)βt-1 + (Htct+at) + (Htvt+ut)
Given the information at time t-1, the conditional expectation and covariance of Yt are:
Yt|t-1 = Et-1(Yt) =
Htβt|t-1 + at
Σt|t-1 = Vart-1(Yt) =
HtΩt|t-1Ht' + R
Since Yt is distributed according to normal(Yt|t-1,Σt|t-1), the log-likelihood is evaluated as:
llt = - ½ ln(2πΣt|t-1) - ½ (Yt-Yt|t-1)'Σt|t-1-1(Yt-Yt|t-1)
βt|t-1 = ct + Ftβt-1|t-1
Ωt|t-1 = FtΩt-1|t-1Ft' + Q
Define the prediction error εt|t-1 = Yt - Yt|t-1. Then
εt|t-1 = Yt -
Htβt|t-1 - at
Σt|t-1 =
HtΩt|t-1Ht' + R
Then the log-likelihood is defined by
llt = - ½ ln(2πΣt|t-1) - ½ εt|t-1'Σt|t-1-1εt|t-1
βt|t = βt|t-1 +
Ktεt|t-1
Ωt|t = Ωt|t-1 -
KtHtΩt|t-1
where Kt = Ωt|t-1Ht'Σt|t-1-1 is the Kalman gain.
β0|0 = (I-F)-1c
vec(Ω0|0) = (I-F⊗F)-1vec(Q)
If βt is nonstationary, then we can use a wild guess of β0|0 (e.g. zeros vector) with large diagonal elements in the covariance matrix Ω0|0. In this case, the evaluation of log-likelihood and inference should not include the first few observations of the guess values.
As a by product of maximum likelihood estimation, we obtain the estimated (updated) parameter vector and the corresponding covariance matrix at time t: βt|t and Ωt|t, for t=1,...,T. For a better inference, the smoothed parameter vector and the corresponding covariance matrix based on all information in the sample are:
βt|T = βt|t +
K*t+1(βt+1|T-ct+1-Ft+1βt|t)
Ωt|T = Ωt|t +
K*t+1(Ωt+1|T-Ωt+1|t)K*t+1'
where K*t+1 = Ωt|tFt+1'Ωt+1|t-1. The smoothing is performed from t=T-1 down to t=1 with the initial values βT|T and ΩT|T obtained from the last iteration of the basic filter.
Yt = δ +
ρ1Yt-1 + ... +
ρpYt-p + εt
εt ~ nii(0,σ2)
Yt = [1 0 ... 0] |
|
where a = 0, ut = 0, and R = 0
| = |
|
| + |
| + |
|
where Q = |
|
Yt = μ + εt -
θ1εt-1 - ... -
θqεt-q
εt ~ nii(0,σ2)
Yt = [1 -θ1 ... -θq] |
| + μ |
where ut = 0, and R = 0
| = |
|
| + |
| + |
|
where Q = |
|
Yt = Xtβt + εt
εt ~ nii(0,σ2)
where Ht = Xt, a = 0, ut = εt, R = σ2.
where F, c and Q may be defined according to a model specification.
The State-Space Model Representation
ΔMt = β0t + β1tΔRt-1 + β2tΔPt-1 + β3tSURPt-1 + β4tΔMt-1 + ut
ut ~ nii(0,σ2)
βit = βit-1 + vit
vit ~ nii(0,σi2) i = 0,1,...,4.
Data Description (Data)
ΔM = Quarterly M1 growth rate
ΔR = Change in 3-month T-bill interest rate
ΔP = Inflation rate as measured by the CPI
SURP = Detrended full employment budget surplus
Fixed Parameters
σ2, σ02, σ12, σ22, σ32, σ42.
Time-Varying Parameters
β0t, β1t, β2t, β3t, β4t.