Model misspecification due to violation of the classical assumption of no serial correlation (Assumption 4) is considered here. In particular, the covariance between observations i and j is
Cov(εi,εj|Xi,Xj) = γ|i-j| ≠ 0, for i, j = 1,2,...,N, and i≠j.
For simplicity, we assume homoscedasticity as Var(εi|Xi) = σ2 = γ0, for i = 1,2,...,N.
Define the autocorrelation coefficient between observation i and j as
ρij = ρji = γ|i-j| / γ0.
Then, the variance-covariance matrix of ε is written as
Var(ε|X) = E(εε'|X) = σ2Ω
  = [ γ0    γ1    γ2   ...  γN-1 ]
    [ γ1    γ0    γ1   ...  γN-2 ]
    [ γ2    γ1    γ0   ...  γN-3 ]
    [ ...   ...   ...  ...  ...  ]
    [ γN-1  γN-2  ...  γ1   γ0   ]
  = γ0 × [ 1    ρ12  ρ13  ...  ρ1N ]
         [ ρ21  1    ρ23  ...  ρ2N ]
         [ ρ31  ρ32  1    ...  ρ3N ]
         [ ...  ...  ...  ...  ... ]
         [ ρN1  ρN2  ...  ...  1   ]
An important special case is the first-order autoregressive, or AR(1), disturbance:
Yi = Xiβ + εi
εi = ρεi-1 + υi, |ρ| < 1, i = 1,2,...,N.
Therefore, Yi = Xiβ + ρ(Yi-1-Xi-1β) + υi
We assume homoscedasticity as
Var(υi|Xi,Yi-1,Xi-1) = σu2 for i = 1,2,...,N.
Then, γ0 = Var(εi|Xi) = σ2 = σu2/(1-ρ2).
Furthermore, γ|i-j| = Cov(εi,εj|Xi,Xj) = ρ|i-j|σ2.
That is, γ1 = ρσ2, γ2 = ρ2σ2, γ3 = ρ3σ2, ....
Therefore,
Var(ε|X) = E(εε'|X) = (σu2/(1-ρ2)) × [ 1     ρ     ρ2    ...  ρN-1 ]
                                     [ ρ     1     ρ     ...  ρN-2 ]
                                     [ ρ2    ρ     1     ...  ρN-3 ]
                                     [ ...   ...   ...   ...  ...  ]
                                     [ ρN-1  ρN-2  ...   ρ    1    ]
From the estimated model Y = Xb + e, we have:
b = (X'X)-1X'Y = β + (X'X)-1X'ε
e = Y-Xb = [I-X(X'X)-1X']ε
s2 = e'e/(N-K) = ε'[I-X(X'X)-1X']ε/(N-K).
E(b|X) = β, by Assumption 3.
But, in general, E(s2) ≠ σ2. However, it can be shown that if b is consistent then s2 is a consistent estimator of σ2: if plim(b) = β, then plim(s2) = σ2.
Var(b|X) = E[(b-β)(b-β)'|X]
         = σ2(X'X)-1X'ΩX(X'X)-1, under the assumption of autocorrelation
         = σ2(X'X)-1{∑i=1,...,N∑j=1,...,N ρijXi'Xj}(X'X)-1
         = (σ2/N)(X'X/N)-1{∑i=1,...,N∑j=1,...,N ρijXi'Xj/N}(X'X/N)-1
Therefore, b ~a Normal(β,(σ2/N)Q-1Q*Q-1),
where Q = plim(X'X/N), and Q* = plim(X'ΩX/N) =
plim(∑i=1,...,N∑j=1,...,NρijXi'Xj/N).
Following Newey and West, the term σ2Q* may be consistently estimated by
S0 + ∑j=1,...,p(1-j/(p+1))(Sj+Sj')
where
S0 = ∑i=1,...,Nei2Xi'Xi/N, and
Sj = ∑i=j+1,...,Neiei-jXi'Xi-j/N.
Therefore the estimated heteroscedasticity- and autocorrelation-consistent (HAC, or robust) variance-covariance matrix of b is obtained by
Var(b|X) = (1/N)(X'X/N)-1{S0 + ∑j=1,...,p(1-j/(p+1))(Sj+Sj')}(X'X/N)-1
         = (X'X)-1{∑i=1,...,N ei2Xi'Xi + ∑j=1,...,p∑i=j+1,...,N (1-j/(p+1))[eiei-jXi'Xi-j + ei-jeiXi-j'Xi]}(X'X)-1
We note that both heteroscedasticity and autocorrelation are considered in the construction of the robust variance-covariance matrix. For autocorrelated disturbances, we need to specify the number of dominant lags p. For convenience, this can be set as p ≈ N1/4.
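To make the computation concrete, here is a minimal NumPy sketch of the robust variance-covariance matrix above; the function name newey_west_cov and the default choice p = ⌊N1/4⌋ are our own illustration, not part of the text:

```python
import numpy as np

def newey_west_cov(X, e, p=None):
    """HAC (robust) variance-covariance matrix of the OLS estimator b,
    following the formula above. X: (N, K) regressor matrix; e: OLS residuals."""
    N, K = X.shape
    if p is None:
        p = int(np.floor(N ** 0.25))   # rule of thumb: p ~ N^(1/4)
    Xe = X * e[:, None]                # row i is e_i * X_i
    S = Xe.T @ Xe                      # S0 term: sum_i e_i^2 X_i'X_i
    for j in range(1, p + 1):
        w = 1.0 - j / (p + 1.0)        # Bartlett weight (1 - j/(p+1))
        Sj = Xe[j:].T @ Xe[:-j]        # sum_{i=j+1..N} e_i e_{i-j} X_i'X_{i-j}
        S += w * (Sj + Sj.T)
    XtX_inv = np.linalg.inv(X.T @ X)
    return XtX_inv @ S @ XtX_inv       # (X'X)^{-1} {...} (X'X)^{-1}
```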
An autocorrelated disturbance is typically specified as an autoregressive process:
AR(1): εi = ρεi-1 + υi
AR(2): εi = ρ1εi-1 + ρ2εi-2 + υi
  ...
AR(p): εi = ρ1εi-1 + ... + ρpεi-p + υi
In the simplest case of AR(1), the estimator of the autocorrelation coefficient ρ is
r = ∑i=2,...,Neiei-1 / ∑i=1,...,Nei2
where ei = Yi-Xib is the estimated error or residual.
The statistical significance of the estimated autocorrelation coefficients
is the basis for testing autocorrelation in the model specification:
H0: ρ1 = ρ2 = ... = ρp = 0
H1: not H0
From the least squares regression residual ei = Yi - Xib, we define
DW = ∑i=2,...,N(ei-ei-1)2 / ∑i=1,...,Nei2
   = {∑i=2,...,Nei2 - 2∑i=2,...,Neiei-1 + ∑i=2,...,Nei-12} / ∑i=1,...,Nei2
   ≈ 2(1-r)
Therefore, 0 < DW < 4, and
DW ≈ 2 if r = 0
DW → 0 as r → +1
DW → 4 as r → -1
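For illustration, a minimal sketch computing r and DW from a residual vector (the function name is ours):

```python
import numpy as np

def dw_and_r(e):
    """Return (DW, r): the Durbin-Watson statistic and the first-order
    residual autocorrelation coefficient, with DW ~= 2(1 - r)."""
    e = np.asarray(e, dtype=float)
    r = np.sum(e[1:] * e[:-1]) / np.sum(e ** 2)
    dw = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
    return dw, r
```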
The DW statistic depends on N and K as well as the data X. Because its distribution depends on the data, there are no fixed critical values of DW for testing first-order autocorrelation. For hypothesis testing, the test relies on two bounds statistics which depend on N and K but not on the data X. Let
DWL = DWL(N,K) = critical value of lower bound of DW(N,K,X)
DWU = DWU(N,K) = critical value of upper bound of DW(N,K,X)
Consider the following three cases of hypothesis testing:
H0: ρ = 0 vs. H1: ρ > 0 (0 < DW < 2):
If DW < DWL, ρ > 0 (H0 is rejected).
If DW > DWU, ρ = 0 (H0 is not rejected).
If DWL < DW < DWU, the test is inconclusive.

H0: ρ = 0 vs. H1: ρ < 0 (2 < DW < 4):
If DW > 4-DWL, ρ < 0 (H0 is rejected).
If DW < 4-DWU, ρ = 0 (H0 is not rejected).
If 4-DWU < DW < 4-DWL, the test is inconclusive.

H0: ρ = 0 vs. H1: ρ ≠ 0 (0 < DW < 4):
If DW < DWL or DW > 4-DWL, ρ ≠ 0 (H0 is rejected).
If DWU < DW < 4-DWU, ρ = 0 (H0 is not rejected).
If DWL < DW < DWU or 4-DWU < DW < 4-DWL, the test is inconclusive.
The shortcoming of the Durbin-Watson test is its inconclusive region, which is large when N is small. It can only be used to test for first-order autocorrelation. We warn that the Durbin-Watson test cannot be applied to a model with a lagged dependent variable.
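As a sketch, the bounds decision rule for the one-sided test H0: ρ = 0 against H1: ρ > 0 can be coded as follows; the critical bounds dwl and dwu must be taken from a Durbin-Watson table for the given N and K (they are arguments here, not computed):

```python
def dw_test_positive(dw, dwl, dwu):
    """Bounds decision rule for H0: rho = 0 vs H1: rho > 0,
    given table values dwl = DWL(N,K) and dwu = DWU(N,K)."""
    if dw < dwl:
        return "reject H0 (rho > 0)"
    if dw > dwu:
        return "do not reject H0 (rho = 0)"
    return "inconclusive"
```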
From the least squares regression residual ei = Yi - Xib, we estimate the auxiliary regression of the form:
ei = Xic + ρ1ei-1 + ... + ρpei-p + ui
and test the null hypothesis that all the ρs are jointly zero.
The Breusch-Godfrey test statistic is based on the R-square of the auxiliary regression equation as follows:
NR2 ~ χ2(p)
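A minimal sketch of the test, setting the unavailable pre-sample residuals to zero (a common convention; the function name is ours):

```python
import numpy as np
from scipy.stats import chi2

def breusch_godfrey(X, e, p):
    """Breusch-Godfrey LM test: regress e on X and p lags of e
    (pre-sample lags set to zero); under H0 that all rho's are
    zero, N*R^2 ~ chi2(p) asymptotically."""
    e = np.asarray(e, dtype=float)
    N = len(e)
    lags = np.column_stack([np.concatenate([np.zeros(j), e[:-j]])
                            for j in range(1, p + 1)])
    Z = np.column_stack([X, lags])
    ehat = Z @ np.linalg.lstsq(Z, e, rcond=None)[0]
    R2 = 1.0 - np.sum((e - ehat) ** 2) / np.sum((e - e.mean()) ** 2)
    return N * R2, chi2.sf(N * R2, p)   # statistic, p-value
```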
More generally, let rj = ∑i=j+1,...,Neiei-j / ∑i=1,...,Nei2 be the estimator of the j-th order autocorrelation coefficient.
To correct for autocorrelation, consider again the model with AR(1) disturbances:
Yi = Xiβ + εi
εi = ρεi-1 + υi
where υ is assumed to be white noise (that is, independently and identically distributed with mean zero and homoscedastic variance σu2). In general, the specification of the autocorrelated disturbance must be known or consistently estimated before any correction procedure can be applied to the estimation of the regression model. Recall that for first-order autocorrelation, we have
Var(ε|X) = E(εε'|X) = (σu2/(1-ρ2))Ω = σ2Ω
where Ω = [ 1     ρ     ρ2    ...  ρN-1 ]
          [ ρ     1     ρ     ...  ρN-2 ]
          [ ρ2    ρ     1     ...  ρN-3 ]
          [ ...   ...   ...   ...  ...  ]
          [ ρN-1  ρN-2  ...   ρ    1    ]
It is easy to check that
Ω-1 = (1/(1-ρ2)) × [ 1    -ρ     0     ...   0     0   ]
                   [ -ρ   1+ρ2  -ρ     ...   0     0   ]
                   [ 0    -ρ    1+ρ2   ...   0     0   ]
                   [ ...  ...   ...    ...   ...   ... ]
                   [ 0    0     0      ...   1+ρ2  -ρ  ]
                   [ 0    0     0      ...   -ρ    1   ]
    = P'P, and
P = (1/(1-ρ2)½) × [ (1-ρ2)½  0    0    ...  0    0  ]
                  [ -ρ       1    0    ...  0    0  ]
                  [ 0        -ρ   1    ...  0    0  ]
                  [ ...      ...  ...  ...  ...  ...]
                  [ 0        0    0    ...  -ρ   1  ]
Let
Y1* = (1-ρ2)½Y1,
Yi* = Yi-ρYi-1, i = 2,...,N;
X1* = (1-ρ2)½X1,
Xi* = Xi-ρXi-1, i = 2,...,N.
Then least squares estimation with the transformed data matrices is a special case of the generalized least squares estimation (see Appendix for a review):
b* = (X*'X*)-1X*'Y* = (X'Ω-1X)-1X'Ω-1Y.
(Note that the common scalar factor 1/(1-ρ2)½ of P is dropped in the data transformation above; this does not affect the least squares estimates.)
To use Ω (or Ω-1 and P) in generalized least squares estimation, ρ must be estimated.
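A minimal sketch of this (Prais-Winsten) transformation and the resulting estimate, for a given value of ρ (the function name is ours):

```python
import numpy as np

def prais_winsten(X, y, rho):
    """GLS for AR(1) disturbances: apply the transformation above
    (keeping the first observation) and run OLS on the starred data."""
    w = np.sqrt(1.0 - rho ** 2)
    Xs = np.vstack([w * X[:1], X[1:] - rho * X[:-1]])
    ys = np.concatenate([w * y[:1], y[1:] - rho * y[:-1]])
    return np.linalg.lstsq(Xs, ys, rcond=None)[0]   # b*
```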
The Cochrane-Orcutt approach simply drops the first observation from the data transformation, so that there is one less observation for model estimation.
Estimate the transformed model Y* = X*β + ε* using ordinary least squares; thus, b*(1) = (X*'X*)-1X*'Y*. The residuals based on b*(1) give an updated estimate of ρ, and the transformation and estimation are repeated until convergence.
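A minimal sketch of the iterative Cochrane-Orcutt procedure (the function name and convergence tolerance are ours):

```python
import numpy as np

def cochrane_orcutt(X, y, tol=1e-8, max_iter=100):
    """Alternate between estimating beta (OLS on transformed data,
    first observation dropped) and rho (from the residuals) until
    the estimate of rho converges."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]   # initial OLS estimate
    rho = 0.0
    for _ in range(max_iter):
        e = y - X @ b
        rho_new = np.sum(e[1:] * e[:-1]) / np.sum(e ** 2)
        Xs, ys = X[1:] - rho_new * X[:-1], y[1:] - rho_new * y[:-1]
        b = np.linalg.lstsq(Xs, ys, rcond=None)[0]
        converged = abs(rho_new - rho) < tol
        rho = rho_new
        if converged:
            break
    return b, rho   # (b*, r*)
```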
Alternatively, the Hildreth-Lu procedure performs a grid search over values of ρ in (-1,1). For each trial value, estimate the transformed model Y* = X*β + ε* using ordinary least squares. Select the optimal ρ, denoted by r(j), which corresponds to either the least sum of squared residuals or the maximum log-likelihood. The corresponding estimator of β is b*(j) = (X*'X*)-1X*'Y*.
The advantage of the Hildreth-Lu grid search procedure is that it is likely to find the globally optimal solution, provided that the final search interval for the autocorrelation coefficient is made sufficiently small.
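A minimal sketch of the grid search (the grid spacing and names are ours); in practice the search can be repeated on a finer grid around the first-round optimum:

```python
import numpy as np

def hildreth_lu(X, y, grid=None):
    """Search rho over a grid in (-1, 1), estimating the transformed
    (Cochrane-Orcutt) regression for each value, and keep the rho
    with the smallest sum of squared residuals."""
    if grid is None:
        grid = np.arange(-0.99, 1.0, 0.01)
    best = None
    for rho in grid:
        Xs, ys = X[1:] - rho * X[:-1], y[1:] - rho * y[:-1]
        b = np.linalg.lstsq(Xs, ys, rcond=None)[0]
        ssr = np.sum((ys - Xs @ b) ** 2)
        if best is None or ssr < best[0]:
            best = (ssr, rho, b)
    return best[1], best[2]   # (r, b*)
```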
For higher-order autocorrelation AR(p), the coefficients may be estimated from the residual autoregression
ei = r1ei-1 + ... + rpei-p + υi
where ei = Yi - Xib is the residual of the regression model.
Based on the Cochrane-Orcutt procedure, the first p observations of the data series are dropped. Least squares estimation is carried out using the transformed data:
Yi* = Yi-r1Yi-1-...-rpYi-p,
i = p+1,...,N;
Xi* = Xi-r1Xi-1-...-rpXi-p,
i = p+1,...,N.
The iterations will continue until a convergent (local) solution (b*,r*) of (β,ρ) is found, where β = (β1,...,βK)' and ρ = (ρ1,...,ρp)'.
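A minimal sketch of this generalized (AR(p)) Cochrane-Orcutt iteration (names and tolerance are ours):

```python
import numpy as np

def cochrane_orcutt_arp(X, y, p, tol=1e-8, max_iter=100):
    """Estimate r_1..r_p from the residual autoregression, transform
    the data (dropping the first p observations), re-estimate beta,
    and iterate until the r's converge."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    r = np.zeros(p)
    for _ in range(max_iter):
        e = y - X @ b
        # residual AR(p): regress e_i on e_{i-1}, ..., e_{i-p}
        E = np.column_stack([e[p - j:-j] for j in range(1, p + 1)])
        r_new = np.linalg.lstsq(E, e[p:], rcond=None)[0]
        ys = y[p:] - sum(r_new[j - 1] * y[p - j:-j] for j in range(1, p + 1))
        Xs = X[p:] - sum(r_new[j - 1] * X[p - j:-j] for j in range(1, p + 1))
        b = np.linalg.lstsq(Xs, ys, rcond=None)[0]
        converged = np.max(np.abs(r_new - r)) < tol
        r = r_new
        if converged:
            break
    return b, r   # (b*, r*)
```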
For a time series model, we first test for, and then correct, a specific AR(p) structure, estimating with the transformed data:
Y* = X*β + ε*
Based on the estimated errors or residuals e*, the Breusch-Pagan test for heteroscedasticity is performed. If needed, correct for heteroscedasticity using weighted least squares. One possible choice of weight is the (square-rooted) inverse of the fitted values from the estimated variance equation in the Breusch-Pagan test procedure.
Autocorrelation may also appear in the conditional variance of the disturbance. Consider the model
Yi = Xiβ + εi,
where the disturbance εi has conditional mean zero and conditional variance
σi2 = E(εi2|εi-1,εi-2,...) = α0 + α1εi-12 + α2εi-22 + ...
The simplest case of the first-order autoregressive conditional heteroscedasticity or ARCH(1) is:
σi2 = α0 + α1εi-12.
This can be re-written as:
εi2 = α0 + α1εi-12 + vi, where vi = εi2-σi2.
The ARCH(1) process is simply an AR(1) process in the squares of the regression residuals. A more general time series model may include autoregressive structures in both the mean (i.e., the residuals) and the variance (i.e., the squares of residuals). For many financial applications, the former structure is useful for studying rates of return, while the latter is applicable to volatility analysis.
A generalization of ARCH(1), the GARCH(1,1) process, is defined by:
σi2 = α0 + α1εi-12 + δ1σi-12.
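Since εi2 = α0 + α1εi-12 + vi is an ordinary regression in the squared residuals, ARCH effects can be tested with the same NR2 device used in the Breusch-Godfrey test. A minimal sketch of such an LM test (our illustration, not part of the text):

```python
import numpy as np
from scipy.stats import chi2

def arch_test(e, q=1):
    """LM test for ARCH(q): regress e_i^2 on a constant and q lags of
    e^2; under H0 (no ARCH), N*R^2 ~ chi2(q). q=1 tests ARCH(1)."""
    e2 = np.asarray(e, dtype=float) ** 2
    u = e2[q:]
    Z = np.column_stack([np.ones(len(u))] +
                        [e2[q - j:-j] for j in range(1, q + 1)])
    uhat = Z @ np.linalg.lstsq(Z, u, rcond=None)[0]
    R2 = 1.0 - np.sum((u - uhat) ** 2) / np.sum((u - u.mean()) ** 2)
    return len(u) * R2, chi2.sf(len(u) * R2, q)   # statistic, p-value
```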
Appendix: A Review of Generalized Least Squares
Suppose the symmetric positive definite matrix Ω is known.
There exists P (the "square root" matrix) such that Ω-1 = P'P, or
Ω = P-1P-1'.
Let Y* = PY, X* = PX, and ε* = Pε. Then the transformed linear regression model is
Y* = X*β + ε*, with
E(ε*|X*) = 0 and
Var(ε*|X*) = PVar(ε|X)P' = σ2PΩP' = σ2I.
Since the classical assumptions for the transformed linear regression model are satisfied, least squares estimation is applied to minimize sum-of-squared transformed errors ε*'ε*:
b* = (X*'X*)-1X*'Y*
   = β + (X*'X*)-1X*'ε*
   = β + (X'Ω-1X)-1X'Ω-1ε
E(b*|X) = β
Var(b*|X) = E[(b*-β)(b*-β)'|X]
          = σ2(X*'X*)-1
          = σ2(X'Ω-1X)-1
Therefore, b* ~a Normal(β,s*2(X'Ω-1X)-1), where s*2 = e*'e*/(N-K) is the estimated variance based on the transformed residuals e* = Y* - X*b*.
Statistical inferences must be based on the generalized least squares estimator b*, provided that the covariance structure Ω is known. If Ω is not known, it must be estimated. If Ω can be estimated consistently, then in large samples the generalized least squares estimator b* is consistent and asymptotically efficient.
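A minimal numerical sketch of the GLS estimator for a known Ω (explicit matrix inverses are used for clarity; in practice one works with the factor P):

```python
import numpy as np

def gls(X, y, Omega):
    """Generalized least squares with known Omega:
    b* = (X' Om^-1 X)^-1 X' Om^-1 y, Var(b*|X) = s*^2 (X' Om^-1 X)^-1."""
    Oinv = np.linalg.inv(Omega)
    A = np.linalg.inv(X.T @ Oinv @ X)
    b = A @ (X.T @ Oinv @ y)
    e = y - X @ b
    # s*^2 from transformed residuals: e*'e* = e' Om^-1 e
    s2 = (e @ Oinv @ e) / (X.shape[0] - X.shape[1])
    return b, s2 * A   # (b*, estimated Var(b*|X))
```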