Autocorrelation

Introduction

Recall the basic assumptions for least squares model estimation:

  1. Linearity: Y = Xβ + ε

  2. Full Rank Condition:
    X may be fixed or random variables, and rank(X) = K.

  3. Exogeneity:
    1. Strict Exogeneity: E(ε|X) = 0
      That is, E(εi|X) = 0, i=1,2,...,N.
    2. E(εi|Xi) = 0,
      E(Xiεi|Xi) = 0, i=1,2,...,N.

  4. Spherical Disturbance:
    1. Var(ε|X) = E(εε'|X) = σ2I. That is,
      Var(εi|X) = σ2,
      Cov(εi,εj|X) = 0, i≠j, i,j=1,2,...,N.
    2. Var(εi|Xi) = σ2,
      Cov(εi,εj|Xi,Xj) = 0, i≠j, i,j=1,2,...,N.

  5. Normality: ε|X ~ Normal(0,σ2I)

Model misspecification due to violation of the classical assumption of no serial correlation (Assumption 4) is considered here. In particular, the covariance between observations i and j is

Cov(εi,εj|Xi,Xj) = γ|i-j| ≠ 0, for i, j = 1,2,...,N, and i≠j.

For simplicity, we assume homoscedasticity as Var(εi|Xi) = σ2 = γ0, for i = 1,2,...,N.

Define the autocorrelation coefficient between observations i and j as

ρij = ρji = γ|i-j| / γ0.

Then, the variance-covariance matrix of ε is written as

Var(ε|X) = E(εε'|X) = σ2Ω =

| γ0    γ1    ...  γN-1 |
| γ1    γ0    ...  γN-2 |
| :     :          :    |
| γN-1  γN-2  ...  γ0   |

= γ0

| 1    ρ12  ...  ρ1N |
| ρ21  1    ...  ρ2N |
| :    :         :   |
| ρN1  ρN2  ...  1   |

Example

For example, we consider the simplest case of first order serial correlation or autocorrelation as follows:

Yi = Xiβ + εi
εi = ρεi-1 + υi, |ρ|<1, and i = 1,2,...,N.

Therefore, Yi = Xiβ + ρ(Yi-1-Xi-1β) + υi

We assume homoscedasticity as Var(υi|Xi,Yi-1,Xi-1) = σu2 for i = 1,2,...,N.
Then, γ0 = Var(εi|Xi) = σ2 = σu2/(1-ρ2).
Furthermore, γ|i-j| = Cov(εi,εj|Xi,Xj) = ρ|i-j|σ2.
That is, γ1 = ρσ2, γ2 = ρ2σ2, γ3 = ρ3σ2, ....

Therefore,

Var(ε|X) = E(εε'|X) = (σu2/(1-ρ2))

| 1     ρ     ...  ρN-1 |
| ρ     1     ...  ρN-2 |
| :     :          :    |
| ρN-1  ρN-2  ...  1    |
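
As a numerical illustration, the AR(1) covariance matrix above can be built directly from ρ and σu2. The following is a minimal Python sketch (numpy; the function name ar1_covariance is our own):

  import numpy as np

  def ar1_covariance(rho, sigma_u2, N):
      """Covariance matrix of AR(1) disturbances:
      Var(eps_i) = sigma_u2/(1-rho^2), Cov(eps_i, eps_j) = rho^|i-j| * Var(eps_i)."""
      sigma2 = sigma_u2 / (1.0 - rho**2)                   # gamma_0
      idx = np.arange(N)
      Omega = rho ** np.abs(idx[:, None] - idx[None, :])   # Toeplitz matrix of rho^|i-j|
      return sigma2 * Omega

  # Example: N = 5, rho = 0.6, sigma_u2 = 1
  print(ar1_covariance(0.6, 1.0, 5))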

Least Squares Estimation

By ignoring serial correlation in ordinary least squares estimation, the parameter estimators are inefficient, although they remain unbiased, consistent, and asymptotically normally distributed.

From the estimated model Y = Xb + e, we have:
b = (X'X)-1X'Y = β + (X'X)-1X'ε
e = Y-Xb = [I-X(X'X)-1X']ε
s2 = e'e/(N-K) = ε'[I-X(X'X)-1X']ε/(N-K).

E(b|X) = β, by Assumption 3. But in general E(s2) ≠ σ2.
However, it can be shown that if b is consistent then s2 is a consistent estimator of σ2: If plim(b) = β, then plim(s2) = σ2.

Var(b|X) = E[(b-β)(b-β)'|X]
= σ2(X'X)-1X'ΩX(X'X)-1, under autocorrelation
= σ2(X'X)-1X'

| 1    ρ12  ...  ρ1N |
| ρ21  1    ...  ρ2N |
| :    :         :   |
| ρN1  ρN2  ...  1   |

X(X'X)-1
= σ2(X'X)-1{∑i=1,...,N∑j=1,...,N ρijXi'Xj}(X'X)-1
= (σ2/N)(X'X/N)-1{∑i=1,...,N∑j=1,...,N ρijXi'Xj/N}(X'X/N)-1

Therefore, b ~a Normal(β,(σ2/N)Q-1Q*Q-1),
where Q = plim(X'X/N), and Q* = plim(X'ΩX/N) = plim(∑i=1,...,N∑j=1,...,N ρijXi'Xj/N).

Heteroscedasticity-Autocorrelation-Consistent Variance-Covariance Matrix

σ2Q* = σ2∑i=1,...,N∑j=1,...,N ρijXi'Xj/N may be consistently estimated by ∑i=1,...,N∑j=1,...,N eiejXi'Xj/N, where ei = Yi-Xib is the estimated least squares residual. However, the latter matrix need not be positive definite. Parallel to the White estimator for heteroscedasticity, the Newey-West estimator of σ2Q* for autocorrelated disturbances with an unspecified structure is

S0 + ∑j=1,...,p(1-j/(p+1))(Sj+Sj')

where
S0 = ∑i=1,...,Nei2Xi'Xi/N, and
Sj = ∑i=j+1,...,Neiei-jXi'Xi-j/N.

Therefore the estimated heteroscedasticity-autocorrelation-consistent (robust) variance-covariance matrix of b is obtained by

Var(b|X) = (1/N)(X'X/N)-1{S0 + ∑j=1,...,p(1-j/(p+1))(Sj+Sj')}(X'X/N)-1
= (X'X)-1{∑i=1,...,Nei2Xi'Xi + ∑j=1,...,p∑i=j+1,...,N (1-j/(p+1))[eiei-jXi'Xi-j+ei-jeiXi-j'Xi]}(X'X)-1

We note that both heteroscedasticity and autocorrelation are accounted for in the construction of the robust variance-covariance matrix. For autocorrelated disturbances, we need to specify the number of lags p to be included. For convenience, this can be set as p ≈ N^(1/4).
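
The Newey-West estimator is straightforward to compute from the least squares residuals. Below is a minimal Python sketch (numpy only; the function and variable names are our own) that forms S0, the weighted Sj terms, and the robust variance-covariance matrix of b, with p set to roughly N^(1/4) by default:

  import numpy as np

  def newey_west_vcov(X, e, p=None):
      """HAC covariance of b = (X'X)^-1 X'Y using Newey-West weights (1 - j/(p+1))."""
      N, K = X.shape
      if p is None:
          p = int(np.floor(N ** 0.25))             # rule of thumb: p ~ N^(1/4)
      Xe = X * e[:, None]                          # rows are e_i * X_i
      S = Xe.T @ Xe / N                            # S_0
      for j in range(1, p + 1):
          Sj = Xe[j:].T @ Xe[:-j] / N              # S_j = sum_i e_i e_{i-j} X_i'X_{i-j} / N
          S += (1.0 - j / (p + 1.0)) * (Sj + Sj.T)
      XX_inv = np.linalg.inv(X.T @ X / N)
      return XX_inv @ S @ XX_inv / N               # Var(b|X)

  # Usage: b = np.linalg.solve(X.T @ X, X.T @ Y); e = Y - X @ b; V = newey_west_vcov(X, e)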

Hypothesis Testing for Autocorrelation

Given the linear model Yi = Xiβ + εi with autocorrelated disturbance:
AR(1): εi = ρεi-1 + υi
AR(2): εi = ρ1εi-1 + ρ2εi-2 + υi
::
AR(p): εi = ρ1εi-1 + ... + ρpεi-p + υi

In the simplest case of AR(1), the estimator of autocorrelation coefficient ρ is

r = ∑i=2,...,Neiei-1 / ∑i=1,...,Nei2

where ei = Yi-Xib is the estimated error or residual.
The statistical significance of the estimated autocorrelation coefficients is the basis for testing autocorrelation in the model specification:

H0: ρ1 = ρ2 = ... = ρp = 0
H1: not H0

Durbin-Watson Bounds Test for First-Order Autocorrelation

From the least squares regression residual ei = Yi - Xib, we define

DW = ∑i=2,...,N(ei-ei-1)2 / ∑i=1,...,Nei2
= {∑i=2,...,Nei2 -2∑i=2,...,Neiei-1 +∑i=2,...,Nei-12} / ∑i=1,...,Nei2
≈ 2(1-r)
This is because ∑i=1,...,Nei2 ≈ ∑i=2,...,Nei2 ≈ ∑i=2,...,Nei-12, and
r = ∑i=2,...,Neiei-1 / ∑i=1,...,Nei2

Therefore, 0 < DW < 4, and
DW ≈ 2 as r = 0
DW → 0 as r → +1
DW → 4 as r → -1
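
A minimal sketch of computing DW and r from the least squares residuals (Python/numpy; the function names are our own):

  import numpy as np

  def durbin_watson(e):
      """DW statistic from least squares residuals; DW is approximately 2(1 - r)."""
      d = np.diff(e)                               # e_i - e_{i-1}, i = 2,...,N
      return (d @ d) / (e @ e)

  def ar1_coefficient(e):
      """Estimated first-order autocorrelation coefficient r."""
      return (e[1:] @ e[:-1]) / (e @ e)

  # e = Y - X @ b  (least squares residuals)
  # print(durbin_watson(e), 2 * (1 - ar1_coefficient(e)))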

The DW statistic depends on N and K as well as the data X, so there is no fixed distribution whose critical values can be used for testing first-order autocorrelation. Instead, hypothesis testing relies on two bound statistics which depend on N and K but not on the data X. Let

DWL = DWL(N,K) = critical value of lower bound of DW(N,K,X)
DWU = DWU(N,K) = critical value of upper bound of DW(N,K,X)

Consider the following three cases of hypothesis testing:

H0: ρ = 0    
H1: ρ > 0    
0 < DW < 2
If DW < DWL, ρ > 0 (H0 is rejected).
If DW > DWU, ρ = 0 (H0 is not rejected).
If DWL < DW < DWU, inconclusive!

H0: ρ = 0    
H1: ρ < 0    
2 < DW < 4
If DW > 4-DWL, ρ < 0 (H0 is rejected).
If DW < 4-DWU, ρ = 0 (H0 is not rejected).
If 4-DWU < DW < 4-DWL, inconclusive!

H0: ρ = 0    
H1: ρ ≠ 0    
0 < DW < 4
If DW < DWL or DW > 4-DWL, ρ ≠ 0 (H0 is rejected).
If DWU < DW < 4-DWU, ρ = 0 (H0 is not rejected).
If DWL < DW < DWU or 4-DWU < DW < 4-DWL, inconclusive!

The shortcoming of the Durbin-Watson test is that it has an inconclusive region, which is large when N is small, and it can only be used to test for first-order autocorrelation. We also warn that the Durbin-Watson test cannot be applied to a model with a lagged dependent variable.

Breusch-Godfrey LM Test for Autocorrelation

From the least squares regression residual ei = Yi - Xib, we estimate the auxiliary regression of the form:

ei = Xic + ρ1ei-1 + ... + ρpei-p + ui

and test the null hypothesis that all the ρs are jointly zero.
The Breusch-Godfrey test statistic is based on the R-square of the auxiliary regression equation and, under H0, is asymptotically distributed as follows:

NR2 ~ χ2(p)
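
A minimal Python sketch of the Breusch-Godfrey test (numpy and scipy; the function name is our own, and we assume the common convention of setting pre-sample residuals to zero):

  import numpy as np
  from scipy import stats

  def breusch_godfrey(X, e, p):
      """LM test: regress e_i on X_i and e_{i-1},...,e_{i-p}; NR^2 ~ chi2(p) under H0."""
      N = len(e)
      lags = np.column_stack([np.r_[np.zeros(j), e[:-j]] for j in range(1, p + 1)])
      Z = np.hstack([X, lags])                     # auxiliary regressors
      coef, *_ = np.linalg.lstsq(Z, e, rcond=None)
      resid = e - Z @ coef
      # R-square of the auxiliary regression (assumes X contains a constant term)
      R2 = 1.0 - (resid @ resid) / ((e - e.mean()) @ (e - e.mean()))
      LM = N * R2
      return LM, 1.0 - stats.chi2.cdf(LM, p)       # statistic and p-value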

Box-Pierce and Ljung-Box Q Tests for Autocorrelation

Let rj = ∑i=j+1,...,Neiei-j / ∑i=1,...,Nei2 be the estimator of j-th order autocorrelation coefficient.
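
The Box-Pierce statistic is Q = N∑j=1,...,p rj2, and the Ljung-Box refinement is Q' = N(N+2)∑j=1,...,p rj2/(N-j). Under the null hypothesis of no autocorrelation up to order p, both statistics are asymptotically distributed as χ2(p).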

Correction for Autocorrelation

Consider the simplest case of autocorrelated disturbance AR(1). The model of interest is:

Yi = Xiβ + εi
εi = ρεi-1 + υi

where υ is assumed to be white noise (that is, independently and identically distributed with mean zero and constant variance σu2). In general, the specification of the autocorrelated disturbance must be known or consistently estimated before any correction procedure can be applied to the estimation of the regression model. Recall that for first-order autocorrelation, we have

Var(ε|X) = E(εε'|X) = (σu2/(1-ρ2))Ω = σ2Ω
where Ω =

| 1     ρ     ...  ρN-1 |
| ρ     1     ...  ρN-2 |
| :     :          :    |
| ρN-1  ρN-2  ...  1    |

It is easy to check that

Ω-1 = (1/(1-ρ2)) P'P, where

P'P =

| 1    -ρ     0     ...  0     0   |
| -ρ   1+ρ2   -ρ    ...  0     0   |
| 0    -ρ     1+ρ2  ...  0     0   |
| :    :      :          :     :   |
| 0    0      0     ...  1+ρ2  -ρ  |
| 0    0      0     ...  -ρ    1   |

and

P =

| (1-ρ2)½  0    0   ...  0   0 |
| -ρ       1    0   ...  0   0 |
| 0        -ρ   1   ...  0   0 |
| :        :    :        :   : |
| 0        0    0   ...  -ρ  1 |

The scalar factor 1/(1-ρ2) cancels in the generalized least squares formulas below, so the data may be transformed with P directly.

Let Y1* = (1-ρ2)½Y1, Yi* = Yi-ρYi-1, i = 2,...,N;
X1* = (1-ρ2)½X1, Xi* = Xi-ρXi-1, i = 2,...,N.

Then the least squares estimation with the transformed data matrices is a special case of generalized least squares estimation (see Appendix for a review) as follows:

b*= (X*'X*)-1X*'Y* = (X'Ω-1X)-1X'Ω-1Y
e* = Y* - X*b*
s*2 = e*'e*/(N-K)
Var(b*|X) = s*2(X*'X*)-1

To use Ω (or Ω-1 and P) with the GLS, ρ must be estimated.

Prais-Winsten/Cochrane-Orcutt Iterative Procedure

  1. From the least squares residuals ei = Yi - Xib, i = 1,2,...,N,
    compute the estimator of autocorrelation coefficient:
    r(1) = ∑i=2,...,Neiei-1 / ∑i=1,...,Nei2

  2. According to Prais-Winsten, transform the data of Y and X as:
    Y1* = (1-r(1)2)½Y1, Yi* = Yi-r(1)Yi-1, i = 2,...,N;
    X1* = (1-r(1)2)½X1, Xi* = Xi-r(1)Xi-1, i = 2,...,N.
    In addition,
    ε1* = (1-ρ2)½ε1, εi* = εi-ρεi-1, i = 2,...,N.

    Cochrane-Orcutt suggests a simpler approach that drops the first observation from the data transformation, so there is one fewer observation for model estimation. Estimate the transformed model Y* = X*β + ε* using ordinary least squares.
    Thus, b*(1) = (X*'X*)-1X*'Y*.

  3. Repeat step (1) and obtain the new r(2).

  4. Repeat step (2) with r(2) and obtain the new b*(2).

  5. Repeat steps (3) and (4) until |r(j)-r(j+1)| < α and |b*(j)-b*(j+1)| < α, where α is the tolerance level for convergence, e.g., α = 0.001.
There is no guarantee that the final estimates of ρ and β obtained from the Cochrane-Orcutt procedure will be the global optimal solution.
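
The iterative procedure can be sketched in a few lines of Python (numpy only; the names and convergence details are our own simplifications). Setting drop_first=True gives the Cochrane-Orcutt variant; otherwise the first observation is kept with the Prais-Winsten weighting:

  import numpy as np

  def prais_winsten(Y, X, tol=1e-3, max_iter=50, drop_first=False):
      """Iterative estimation of Y = Xb + e with AR(1) errors."""
      b = np.linalg.lstsq(X, Y, rcond=None)[0]
      r = 0.0
      for _ in range(max_iter):
          e = Y - X @ b
          r_new = (e[1:] @ e[:-1]) / (e @ e)       # step 1: estimate rho
          Ys = Y[1:] - r_new * Y[:-1]              # step 2: quasi-difference the data
          Xs = X[1:] - r_new * X[:-1]
          if not drop_first:                       # Prais-Winsten keeps observation 1
              w = np.sqrt(1.0 - r_new**2)
              Ys = np.r_[w * Y[:1], Ys]
              Xs = np.vstack([w * X[:1], Xs])
          b_new = np.linalg.lstsq(Xs, Ys, rcond=None)[0]
          if abs(r_new - r) < tol and np.max(np.abs(b_new - b)) < tol:
              return b_new, r_new                  # converged
          b, r = b_new, r_new
      return b, r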

Hildreth-Lu Grid Search Procedure

This is a grid search procedure on the interval -1 < ρ < 1. Set j = 0.

  1. Divide the range of ρ into 20 sub-intervals with 19 grid values:
    For example, the first set of grid points is -0.9, ..., -0.1, 0.0, 0.1, ..., 0.9 for the interval (-1, 1).
    Set j = j+1.

  2. For each grid value ρ, transform the variables Y and X as:
    Y1* = (1-ρ2)½Y1, Yi* = Yi-ρYi-1, i = 2,...,N;
    X1* = (1-ρ2)½X1, Xi* = Xi-ρXi-1, i = 2,...,N.
    In addition,
    ε1* = (1-ρ2)½ε1, εi* = εi-ρεi-1, i = 2,...,N.

    Estimate the transformed model Y* = X*β + ε* using the ordinary least squares.

    Select the optimal ρ, denoted by r(j), which corresponds to either the smallest sum of squared residuals or the maximum log-likelihood. The corresponding estimator of β is b*(j) = (X*'X*)-1X*'Y*.

  3. Refine the range to r(j)-1/10^j < ρ < r(j)+1/10^j, and repeat steps (1) and (2) to obtain r(j+1) and b*(j+1).

  4. Repeat step (3) until |r(j)-r(j+1)| < α and |b*(j)-b*(j+1)| < α, where α is the tolerance level for convergence, e.g., α = 0.001.

The advantage of the Hildreth-Lu grid search procedure is that it is likely to find the global optimal solution, provided that the final search interval for the autocorrelation coefficient is made sufficiently small.
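
A minimal Python sketch of the grid search, selecting ρ by the smallest sum of squared residuals (numpy only; the names and the exact refinement rule are our own simplifications):

  import numpy as np

  def hildreth_lu(Y, X, tol=1e-3, max_refine=5):
      """Grid search over rho in (-1, 1), refining the grid around the best value."""
      def sse(rho):
          w = np.sqrt(1.0 - rho**2)
          Ys = np.r_[w * Y[:1], Y[1:] - rho * Y[:-1]]          # Prais-Winsten transform
          Xs = np.vstack([w * X[:1], X[1:] - rho * X[:-1]])
          b = np.linalg.lstsq(Xs, Ys, rcond=None)[0]
          e = Ys - Xs @ b
          return e @ e, b

      r, step = 0.0, 0.1                           # first grid: -0.9, -0.8, ..., 0.9
      for _ in range(max_refine):
          grid = np.arange(r - 9 * step, r + 9 * step + 1e-12, step)
          grid = grid[(grid > -1.0) & (grid < 1.0)]
          results = [sse(rho) for rho in grid]
          best = int(np.argmin([s for s, _ in results]))
          r_new, b = grid[best], results[best][1]
          if abs(r_new - r) < tol:
              return b, r_new                      # converged
          r, step = r_new, step / 10.0             # refine the search interval
      return b, r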

Extension

To extend the AR(1) model estimation to higher-order autocorrelation, say AR(p) for p>1, we need to estimate the following auxiliary regression equation:

ei = r1ei-1 + ... + rpei-p + υi
where ei = Yi - Xib is the residual of the regression model.

Based on the Cochrane-Orcutt procedure, the first p observations of the data series are dropped. Least squares estimation is carried out using the transformed data as:

Yi* = Yi-r1Yi-1-...-rpYi-p, i = p+1,...,N;
Xi* = Xi-r1Xi-1-...-rpXi-p, i = p+1,...,N.

The iterations will continue until a convergent (local) solution (b*,r*) of (β,ρ) is found, where β = (β1,...,βK)' and ρ = (ρ1,...,ρp)'.

Heteroscedasticity in Time Series Regressions

So far, we have presented hypothesis tests and corrections for autocorrelation without considering heteroscedasticity. Similarly, when we studied heteroscedasticity we assumed no serial correlation. The problems of heteroscedasticity and autocorrelation can be a mixed bag in time series models. We can compute the heteroscedasticity-autocorrelation-consistent (robust) variance-covariance matrix to account for unspecified forms of heteroscedasticity and autocorrelation. Using methods of generalized least squares, more efficient estimators can be obtained by assuming and correcting for specific forms of heteroscedasticity (e.g., weighted least squares) or autocorrelation (e.g., iterative least squares).

For a time series model, we first test and then correct a specific AR(p) structure with the transformed data as:

Y* = X*β + ε*

Based on the estimated errors or residuals e*, the Breusch-Pagan test for heteroscedasticity is performed. If needed, correct for heteroscedasticity using weighted least squares. One possible choice of weight is the (square-rooted) inverted fitted value of the estimated variance equation in the Breusch-Pagan test procedure.

Autoregressive Conditional Heteroscedasticity

The recent development of time series analysis includes a dynamic specification of conditional heteroscedasticity as follows:

Yi = Xiβ + εi, εi ~ (0,σi2), and
σi2 = E(εi2|εi-1,εi-2,...) = α0 + α1εi-12 + α2εi-22 + ...

The simplest case of the first-order autoregressive conditional heteroscedasticity or ARCH(1) is:

σi2 = α0 + α1εi-12.

This can be re-written as:

εi2 = α0 + α1εi-12 + vi, where vi = εi2 - σi2.

The ARCH(1) process is simply an AR(1) process in the squares of the regression residuals. A more general time series model may include autoregressive structures in the mean (i.e., the residuals) and in the variance (i.e., the squares of the residuals). For many financial applications, the former structure is useful for studying rates of return, while the latter is applicable to volatility analysis.
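
Because ARCH(q) is an autoregression in the squared residuals, a common diagnostic is the LM test for ARCH effects: regress ei2 on a constant and its own q lags, and compare the resulting NR2 with χ2(q). A minimal Python sketch (numpy and scipy; the function name and small implementation details are our own):

  import numpy as np
  from scipy import stats

  def arch_lm_test(e, q=1):
      """LM test for ARCH(q): regress e_i^2 on a constant and e_{i-1}^2,...,e_{i-q}^2."""
      e2 = e**2
      Y = e2[q:]                                   # drop the first q observations
      Z = np.column_stack([np.ones(len(Y))] + [e2[q - j:-j] for j in range(1, q + 1)])
      coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
      resid = Y - Z @ coef
      R2 = 1.0 - (resid @ resid) / ((Y - Y.mean()) @ (Y - Y.mean()))
      LM = len(Y) * R2                             # ~ chi2(q) under no ARCH effects
      return LM, 1.0 - stats.chi2.cdf(LM, q)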

A generalization of ARCH(1), the GARCH(1,1) model, is defined by:

σi2 = α0 + α1εi-12 + δ1σi-12.

Example: Great Moderation?

Has the volatility of U.S. GDP growth decreased since the 1980s?


Appendix

Generalized Least Squares Estimation: A Review

Considering the violation of spherical disturbance assumption (Assumption 4), the linear regression model is:
Y = Xβ + ε, and
E(ε|X) = 0
Var(ε|X) = σ2Ω

Suppose the symmetric positive definite matrix Ω is known.
There exists P (the "square root" matrix) such that Ω-1 = P'P, or Ω = P-1P-1'.

Let Y* = PY, X* = PX, and ε* = Pε, then the transformed linear regression model is:
Y* = X*β + ε*, and
E(ε*|X*) = 0
Var(ε*|X*) = PVar(ε|X)P' = σ2PΩP' = σ2I

Since the classical assumptions for the transformed linear regression model are satisfied, least squares estimation is applied to minimize the sum of squared transformed errors ε*'ε*:

b*= (X*'X*)-1X*'Y*
= β + (X*'X*)-1X*'ε*
= β + (X'Ω-1X)-1X'Ω-1ε
e* = Y*-X*b*

E(b*|X) = β
Var(b*|X) = E[(b*-β)(b*-β)'|X]
= σ2(X*'X*)-1
= σ2(X'Ω-1X)-1
s*2 = e*'e*/(N-K), and E(s*2) = σ2

Therefore, b* ~a Normal(β,s*2(X'Ω-1X)-1).

Statistical inferences must be based on the generalized least squares estimator b*, provided that the covariance structure Ω is known. If Ω is not known, it must be estimated. If Ω can be estimated consistently, then in large samples the (feasible) generalized least squares estimator b* is consistent and asymptotically efficient.
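
A minimal Python sketch of the generalized least squares formulas above, assuming the covariance structure Ω is known (or has been estimated consistently); numpy only, with function and variable names of our own:

  import numpy as np

  def gls(Y, X, Omega):
      """Generalized least squares: b* = (X'Omega^-1 X)^-1 X'Omega^-1 Y."""
      N, K = X.shape
      Omega_inv = np.linalg.inv(Omega)
      XOX = X.T @ Omega_inv @ X
      b_star = np.linalg.solve(XOX, X.T @ Omega_inv @ Y)
      e = Y - X @ b_star
      s2_star = (e @ Omega_inv @ e) / (N - K)      # e*'e*/(N-K) in the transformed model
      vcov = s2_star * np.linalg.inv(XOX)          # s*^2 (X'Omega^-1 X)^-1
      return b_star, vcov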


Copyright © Kuan-Pin Lin
Last updated: January 12, 2010