Autocorrelation

Introduction

Recall the basic assumptions for least squares model estimation:

  1. Linearity: Y = Xβ + ε

  2. Full Rank Condition:
    X may be fixed or random variables, and rank(X) = K.

  3. Exogeneity:
    1. Strict Exogeneity: E(ε|X) = 0
      That is, E(εi|X) = 0, i=1,2,...,N.
    2. E(εi|Xi) = 0,
      E(Xiεi|Xi) = 0, i=1,2,...,N.

  4. Spherical Disturbance:
    1. Var(ε|X) = E(εε'|X) = σ2I. That is,
      Var(εi|X) = σ2,
      Cov(εi,εj|X) = 0, i≠j, i,j=1,2,...,N.
    2. Var(εi|Xi) = σ2,
      Cov(εi,εj|Xi,Xj) = 0, i≠j, i,j=1,2,...,N.

  5. Normality: ε|X ~ Normal(0,σ2I)

Model misspecification due to violation of the classical assumption of no serial correlation (Assumption 4) is considered here. In particular, the covariance between observations i and j is

Cov(εi,εj|Xi,Xj) = γ|i-j| ≠ 0, for i, j = 1,2,...,N, and i≠j.

For simplicity, we assume homoscedasticity as Var(εi|Xi) = σ2 = γ0, for i = 1,2,...,N.

Define the autocorrelation coefficient between observations i and j as

ρij = ρji = γ|i-j| / γ0.

Then, the variance-covariance matrix of ε is written as

Var(ε|X) = E(εε'|X) = σ2Ω =

| γ0    γ1    ...  γN-1 |
| γ1    γ0    ...  γN-2 |
| :     :          :    |
| γN-1  γN-2  ...  γ0   |

= γ0

| 1    ρ12  ...  ρ1N |
| ρ21  1    ...  ρ2N |
| :    :         :   |
| ρN1  ρN2  ...  1   |

Example

For example, we consider the simplest case of first order serial correlation or autocorrelation as follows:

Yi = Xiβ + εi
εi = ρεi-1 + υi, |ρ|<1, and i = 1,2,...,N.

Therefore, Yi = Xiβ + ρ(Yi-1-Xi-1β) + υi

We assume homoscedasticity as Var(υi|Xi,Yi-1,Xi-1) = σu2 for i = 1,2,...,N.
Then, γ0 = Var(εi|Xi) = σ2 = σu2/(1-ρ2).
Furthermore, γ|i-j| = Cov(εi,εj|Xi,Xj) = ρ|i-j|σ2.
That is, γ1 = ρσ2, γ2 = ρ2σ2, γ3 = ρ3σ2, ....

Therefore,

Var(ε|X) = E(εε'|X) = (σu2/(1-ρ2))

| 1     ρ     ...  ρN-1 |
| ρ     1     ...  ρN-2 |
| :     :          :    |
| ρN-1  ρN-2  ...  1    |
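
As a numerical illustration, the AR(1) covariance matrix above can be built directly from ρ and σu2. The following is a minimal Python sketch (numpy; the function name ar1_covariance is our own):

  import numpy as np

  def ar1_covariance(rho, sigma_u2, N):
      """Covariance matrix of AR(1) disturbances:
      Var(eps_i) = sigma_u2/(1-rho^2), Cov(eps_i, eps_j) = rho^|i-j| * Var(eps_i)."""
      sigma2 = sigma_u2 / (1.0 - rho**2)                   # gamma_0
      idx = np.arange(N)
      Omega = rho ** np.abs(idx[:, None] - idx[None, :])   # Toeplitz matrix of rho^|i-j|
      return sigma2 * Omega

  # Example: N = 5, rho = 0.6, sigma_u2 = 1
  print(ar1_covariance(0.6, 1.0, 5))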

Least Squares Estimation

By ignoring serial correlation in ordinary least squares estimation, the parameter estimators are inefficient, although they remain unbiased, consistent, and asymptotically normally distributed.

From the estimated model Y = Xb + e, we have:
b = (X'X)-1X'Y = β + (X'X)-1X'ε
e = Y-Xb = [I-X(X'X)-1X']ε
s2 = e'e/(N-K) = ε'[I-X(X'X)-1X']ε/(N-K).

E(b|X) = β, by Assumption 3. But in general E(s2) ≠ σ2.
However, it can be shown that if b is consistent then s2 is a consistent estimator of σ2: If plim(b) = β, then plim(s2) = σ2.

Var(b|X) = E[(b-β)(b-β)'|X]
= σ2(X'X)-1X'ΩX(X'X)-1, under autocorrelation
= σ2(X'X)-1X'

| 1    ρ12  ...  ρ1N |
| ρ21  1    ...  ρ2N |
| :    :         :   |
| ρN1  ρN2  ...  1   |

X(X'X)-1
= σ2(X'X)-1{∑i=1,...,N∑j=1,...,N ρijXi'Xj}(X'X)-1
= (σ2/N)(X'X/N)-1{∑i=1,...,N∑j=1,...,N ρijXi'Xj/N}(X'X/N)-1

Therefore, b ~a Normal(β,(σ2/N)Q-1Q*Q-1),
where Q = plim(X'X/N), and Q* = plim(X'ΩX/N) = plim(∑i=1,...,N∑j=1,...,N ρijXi'Xj/N).

Heteroscedasticity-Autocorrelation-Consistent Variance-Covariance Matrix

σ2Q* = σ2∑i=1,...,N∑j=1,...,N ρijXi'Xj/N may be consistently estimated by ∑i=1,...,N∑j=1,...,N eiejXi'Xj/N, where ei = Yi-Xib is the estimated least squares residual. However, the latter matrix need not be positive definite. Parallel to the White estimator for heteroscedasticity, the Newey-West estimator of σ2Q* for autocorrelated disturbances with an unspecified structure is

S0 + ∑j=1,...,p(1-j/(p+1))(Sj+Sj')

where
S0 = ∑i=1,...,Nei2Xi'Xi/N, and
Sj = ∑i=j+1,...,Neiei-jXi'Xi-j/N.

Therefore the estimated heteroscedasticity-autocorrelation-consistent (robust) variance-covariance matrix of b is obtained by

Var(b|X) = (1/N)(X'X/N)-1{S0 + ∑j=1,...,p(1-j/(p+1))(Sj+Sj')}(X'X/N)-1
= (X'X)-1{∑i=1,...,Nei2Xi'Xi + ∑j=1,...,p∑i=j+1,...,N (1-j/(p+1))[eiei-jXi'Xi-j+ei-jeiXi-j'Xi]}(X'X)-1

We note that both heteroscedasticity and autocorrelation are accounted for in the construction of the robust variance-covariance matrix. For autocorrelated disturbances, we need to specify the number of lags p to be included. For convenience, this can be set as p ≈ N^(1/4).
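
The Newey-West estimator is straightforward to compute from the least squares residuals. Below is a minimal Python sketch (numpy only; the function and variable names are our own) that forms S0, the weighted Sj terms, and the robust variance-covariance matrix of b, with p set to roughly N^(1/4) by default:

  import numpy as np

  def newey_west_vcov(X, e, p=None):
      """HAC covariance of b = (X'X)^-1 X'Y using Newey-West weights (1 - j/(p+1))."""
      N, K = X.shape
      if p is None:
          p = int(np.floor(N ** 0.25))             # rule of thumb: p ~ N^(1/4)
      Xe = X * e[:, None]                          # rows are e_i * X_i
      S = Xe.T @ Xe / N                            # S_0
      for j in range(1, p + 1):
          Sj = Xe[j:].T @ Xe[:-j] / N              # S_j = sum_i e_i e_{i-j} X_i'X_{i-j} / N
          S += (1.0 - j / (p + 1.0)) * (Sj + Sj.T)
      XX_inv = np.linalg.inv(X.T @ X / N)
      return XX_inv @ S @ XX_inv / N               # Var(b|X)

  # Usage: b = np.linalg.solve(X.T @ X, X.T @ Y); e = Y - X @ b; V = newey_west_vcov(X, e)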

Hypothesis Testing for Autocorrelation

Given the linear model Yi = Xiβ + εi with autocorrelated disturbance:
AR(1): εi = ρεi-1 + υi
AR(2): εi = ρ1εi-1 + ρ2εi-2 + υi
::
AR(p): εi = ρ1εi-1 + ... + ρpεi-p + υi

In the simplest case of AR(1), the estimator of autocorrelation coefficient ρ is

r = ∑i=2,...,Neiei-1 / ∑i=1,...,Nei2

where ei = Yi-Xib is the estimated error or residual.
The statistical significance of the estimated autocorrelation coefficients is the basis for testing autocorrelation in the model specification:

H0: ρ1 = ρ2 = ... = ρp = 0
H1: not H0

Durbin-Watson Bounds Test for First-Order Autocorrelation

From the least squares regression residual ei = Yi - Xib, we define

DW = ∑i=2,...,N(ei-ei-1)2 / ∑i=1,...,Nei2
= {∑i=2,...,Nei2 -2∑i=2,...,Neiei-1 +∑i=2,...,Nei-12} / ∑i=1,...,Nei2
≈ 2(1-r)
This is because ∑i=1,...,Nei2 ≈ ∑i=2,...,Nei2 ≈ ∑i=2,...,Nei-12, and
r = ∑i=2,...,Neiei-1 / ∑i=1,...,Nei2

Therefore, 0 < DW < 4, and
DW ≈ 2 as r = 0
DW → 0 as r → +1
DW → 4 as r → -1
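
A minimal sketch of computing DW and r from the least squares residuals (Python/numpy; the function names are our own):

  import numpy as np

  def durbin_watson(e):
      """DW statistic from least squares residuals; DW is approximately 2(1 - r)."""
      d = np.diff(e)                               # e_i - e_{i-1}, i = 2,...,N
      return (d @ d) / (e @ e)

  def ar1_coefficient(e):
      """Estimated first-order autocorrelation coefficient r."""
      return (e[1:] @ e[:-1]) / (e @ e)

  # e = Y - X @ b  (least squares residuals)
  # print(durbin_watson(e), 2 * (1 - ar1_coefficient(e)))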

The DW statistic depends on N and K as well as the data X, so there is no fixed distribution whose critical values can be used for testing first-order autocorrelation. Instead, hypothesis testing relies on two bound statistics which depend on N and K but not on the data X. Let

DWL = DWL(N,K) = critical value of lower bound of DW(N,K,X)
DWU = DWU(N,K) = critical value of upper bound of DW(N,K,X)

Consider the following three cases of hypothesis testing:

H0: ρ = 0    
H1: ρ > 0    
0 < DW < 2
If DW < DWL, ρ > 0 (H0 is rejected).
If DW > DWU, ρ = 0 (H0 is not rejected).
If DWL < DW < DWU, inconclusive!

H0: ρ = 0    
H1: ρ < 0    
2 < DW < 4
If DW > 4-DWL, ρ < 0 (H0 is rejected).
If DW < 4-DWU, ρ = 0 (H0 is not rejected).
If 4-DWU < DW < 4-DWL, inconclusive!

H0: ρ = 0    
H1: ρ ≠ 0    
0 < DW < 4
If DW < DWL or DW > 4-DWL, ρ ≠ 0 (H0 is rejected).
If DWU < DW < 4-DWU, ρ = 0 (H0 is not rejected).
If DWL < DW < DWU or 4-DWU < DW < 4-DWL, inconclusive!

The shortcoming of the Durbin-Watson test is that it has an inconclusive region, which is large when N is small, and it can only be used to test for first-order autocorrelation. We also warn that the Durbin-Watson test cannot be applied to a model with a lagged dependent variable.

Breusch-Godfrey LM Test for Autocorrelation

From the least squares regression residual ei = Yi - Xib, we estimate the auxiliary regression of the form:

ei = Xic + ρ1ei-1 + ... + ρpei-p + ui

and test the null hypothesis that all the ρs are jointly zero.
The Breusch-Godfrey test statistic is based on the R-square of the auxiliary regression equation and, under H0, is asymptotically distributed as follows:

NR2 ~ χ2(p)
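
A minimal Python sketch of the Breusch-Godfrey test (numpy and scipy; the function name is our own, and we assume the common convention of setting pre-sample residuals to zero):

  import numpy as np
  from scipy import stats

  def breusch_godfrey(X, e, p):
      """LM test: regress e_i on X_i and e_{i-1},...,e_{i-p}; NR^2 ~ chi2(p) under H0."""
      N = len(e)
      lags = np.column_stack([np.r_[np.zeros(j), e[:-j]] for j in range(1, p + 1)])
      Z = np.hstack([X, lags])                     # auxiliary regressors
      coef, *_ = np.linalg.lstsq(Z, e, rcond=None)
      resid = e - Z @ coef
      # R-square of the auxiliary regression (assumes X contains a constant term)
      R2 = 1.0 - (resid @ resid) / ((e - e.mean()) @ (e - e.mean()))
      LM = N * R2
      return LM, 1.0 - stats.chi2.cdf(LM, p)       # statistic and p-value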

Box-Pierce and Ljung-Box Q Tests for Autocorrelation

Let rj = ∑i=j+1,...,Neiei-j / ∑i=1,...,Nei2 be the estimator of j-th order autocorrelation coefficient.
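
The Box-Pierce statistic is Q = N∑j=1,...,p rj2, and the Ljung-Box refinement is Q' = N(N+2)∑j=1,...,p rj2/(N-j). Under the null hypothesis of no autocorrelation up to order p, both statistics are asymptotically distributed as χ2(p).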

Correction for Autocorrelation

Consider the simplest case of autocorrelated disturbance AR(1). The model of interest is:

Yi = Xiβ + εi
εi = ρεi-1 + υi

where υ is assumed to be white noise (that is, independently and identically distributed with mean zero and constant variance σu2). In general, the specification of the autocorrelated disturbance must be known or consistently estimated before any correction procedure can be applied to the estimation of the regression model. Recall that for first-order autocorrelation, we have

Var(ε|X) = E(εε'|X) = (σu2/(1-ρ2))Ω = σ2Ω
where Ω =

| 1     ρ     ...  ρN-1 |
| ρ     1     ...  ρN-2 |
| :     :          :    |
| ρN-1  ρN-2  ...  1    |

It is easy to check that

Ω-1 = (1/(1-ρ2)) P'P, where

P'P =

| 1    -ρ     0     ...  0     0   |
| -ρ   1+ρ2   -ρ    ...  0     0   |
| 0    -ρ     1+ρ2  ...  0     0   |
| :    :      :          :     :   |
| 0    0      0     ...  1+ρ2  -ρ  |
| 0    0      0     ...  -ρ    1   |

and

P =

| (1-ρ2)½  0    0   ...  0   0 |
| -ρ       1    0   ...  0   0 |
| 0        -ρ   1   ...  0   0 |
| :        :    :        :   : |
| 0        0    0   ...  -ρ  1 |

The scalar factor 1/(1-ρ2) cancels in the generalized least squares formulas below, so the data may be transformed with P directly.

Let Y1* = (1-ρ2)½Y1, Yi* = Yi-ρYi-1, i = 2,...,N;
X1* = (1-ρ2)½X1, Xi* = Xi-ρXi-1, i = 2,...,N.

Then the least squares estimation with the transformed data matrices is a special case of generalized least squares estimation (see Appendix for a review) as follows:

b*= (X*'X*)-1X*'Y* = (X'Ω-1X)-1X'Ω-1Y
e* = Y* - X*b*
s*2 = e*'e*/(N-K)
Var(b*|X) = s*2(X*'X*)-1

To use Ω (or Ω-1 and P) with the GLS, ρ must be estimated.

Prais-Winsten/Cochrane-Orcutt Iterative Procedure

  1. From the least squares residuals ei = Yi - Xib, i = 1,2,...,N,
    compute the estimator of autocorrelation coefficient:
    r(1) = ∑i=2,...,Neiei-1 / ∑i=1,...,Nei2

  2. According to Prais-Winsten, transform the data of Y and X as:
    Y1* = (1-r(1)2)½Y1, Yi* = Yi-r(1)Yi-1, i = 2,...,N;
    X1* = (1-r(1)2)½X1, Xi* = Xi-r(1)Xi-1, i = 2,...,N.
    In addition,
    ε1* = (1-ρ2)½ε1, εi* = εi-ρεi-1, i = 2,...,N.

    Cochrane-Orcutt suggests a simpler approach that drops the first observation from the data transformation, so there is one fewer observation for model estimation. Estimate the transformed model Y* = X*β + ε* using ordinary least squares.
    Thus, b*(1) = (X*'X*)-1X*'Y*.

  3. Repeat step (1) and obtain the new r(2).

  4. Repeat step (2) with r(2) and obtain the new b*(2).

  5. Repeat steps (3) and (4) until |r(j)-r(j+1)| < α and |b*(j)-b*(j+1)| < α, where α is the tolerance level for convergence, e.g., α = 0.001.
There is no guarantee that the final estimates of ρ and β obtained from the Cochrane-Orcutt procedure will be the global optimal solution.
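
The iterative procedure can be sketched in a few lines of Python (numpy only; the names and convergence details are our own simplifications). Setting drop_first=True gives the Cochrane-Orcutt variant; otherwise the first observation is kept with the Prais-Winsten weighting:

  import numpy as np

  def prais_winsten(Y, X, tol=1e-3, max_iter=50, drop_first=False):
      """Iterative estimation of Y = Xb + e with AR(1) errors."""
      b = np.linalg.lstsq(X, Y, rcond=None)[0]
      r = 0.0
      for _ in range(max_iter):
          e = Y - X @ b
          r_new = (e[1:] @ e[:-1]) / (e @ e)       # step 1: estimate rho
          Ys = Y[1:] - r_new * Y[:-1]              # step 2: quasi-difference the data
          Xs = X[1:] - r_new * X[:-1]
          if not drop_first:                       # Prais-Winsten keeps observation 1
              w = np.sqrt(1.0 - r_new**2)
              Ys = np.r_[w * Y[:1], Ys]
              Xs = np.vstack([w * X[:1], Xs])
          b_new = np.linalg.lstsq(Xs, Ys, rcond=None)[0]
          if abs(r_new - r) < tol and np.max(np.abs(b_new - b)) < tol:
              return b_new, r_new                  # converged
          b, r = b_new, r_new
      return b, r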

Hildreth-Lu Grid Search Procedure

This is a grid search procedure on the interval -1 < ρ < 1. Set j = 0.

  1. Divide the range of ρ into 20 sub-intervals with 19 grid values:
    For example, the first set of grid points is -0.9, ..., -0.1, 0.0, 0.1, ..., 0.9 for the interval (-1, 1).
    Set j = j+1.

  2. For each grid value ρ, transform the variables Y and X as:
    Y1* = (1-ρ2)½Y1, Yi* = Yi-ρYi-1, i = 2,...,N;
    X1* = (1-ρ2)½X1, Xi* = Xi-ρXi-1, i = 2,...,N.
    In addition,
    ε1* = (1-ρ2)½ε1, εi* = εi-ρεi-1, i = 2,...,N.

    Estimate the transformed model Y* = X*β + ε* using the ordinary least squares.

    Select the optimal ρ, denoted by r(j), which corresponds to either the smallest sum of squared residuals or the maximum log-likelihood. The corresponding estimator of β is b*(j) = (X*'X*)-1X*'Y*.

  3. Refine the range to r(j)-1/10^j < ρ < r(j)+1/10^j, and repeat steps (1) and (2) to obtain r(j+1) and b*(j+1).

  4. Repeat step (3) until |r(j)-r(j+1)| < α and |b*(j)-b*(j+1)| < α, where α is the tolerance level for convergence, e.g., α = 0.001.

The advantage of the Hildreth-Lu grid search procedure is that it is likely to find the global optimal solution, provided that the final search interval for the autocorrelation coefficient is made sufficiently small.
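
A minimal Python sketch of the grid search, selecting ρ by the smallest sum of squared residuals (numpy only; the names and the exact refinement rule are our own simplifications):

  import numpy as np

  def hildreth_lu(Y, X, tol=1e-3, max_refine=5):
      """Grid search over rho in (-1, 1), refining the grid around the best value."""
      def sse(rho):
          w = np.sqrt(1.0 - rho**2)
          Ys = np.r_[w * Y[:1], Y[1:] - rho * Y[:-1]]          # Prais-Winsten transform
          Xs = np.vstack([w * X[:1], X[1:] - rho * X[:-1]])
          b = np.linalg.lstsq(Xs, Ys, rcond=None)[0]
          e = Ys - Xs @ b
          return e @ e, b

      r, step = 0.0, 0.1                           # first grid: -0.9, -0.8, ..., 0.9
      for _ in range(max_refine):
          grid = np.arange(r - 9 * step, r + 9 * step + 1e-12, step)
          grid = grid[(grid > -1.0) & (grid < 1.0)]
          results = [sse(rho) for rho in grid]
          best = int(np.argmin([s for s, _ in results]))
          r_new, b = grid[best], results[best][1]
          if abs(r_new - r) < tol:
              return b, r_new                      # converged
          r, step = r_new, step / 10.0             # refine the search interval
      return b, r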

Extension

To extend the AR(1) model estimation to higher-order autocorrelation, say AR(p) for p>1, we need to estimate the following auxiliary regression equation:

ei = r1ei-1 + ... + rpei-p + υi
where ei = Yi - Xib is the residual of the regression model.

Based on the Cochrane-Orcutt procedure, the first p observations of the data series are dropped. Least squares estimation is carried out using the transformed data as:

Yi* = Yi-r1Yi-1-...-rpYi-p, i = p+1,...,N;
Xi* = Xi-r1Xi-1-...-rpXi-p, i = p+1,...,N.

The iterations will continue until a convergent (local) solution (b*,r*) of (β,ρ) is found, where β = (β1,...,βK)' and ρ = (ρ1,...,ρp)'.

Heteroscedasticity in Time Series Regressions

So far, we have presented hypothesis tests and corrections for autocorrelation without considering heteroscedasticity. Similarly, when we studied heteroscedasticity we assumed no serial correlation. The problems of heteroscedasticity and autocorrelation can be a mixed bag in time series models. We can compute the heteroscedasticity-autocorrelation-consistent (robust) variance-covariance matrix to account for unspecified forms of heteroscedasticity and autocorrelation. Using methods of generalized least squares, more efficient estimators can be obtained by assuming and correcting for specific forms of heteroscedasticity (e.g., weighted least squares) or autocorrelation (e.g., iterative least squares).

For a time series model, we first test and then correct a specific AR(p) structure with the transformed data as:

Y* = X*β + ε*

Based on the estimated errors or residuals e*, the Breusch-Pagan test for heteroscedasticity is performed. If needed, correct for heteroscedasticity using weighted least squares. One possible choice of weight is the (square-rooted) inverted fitted value of the estimated variance equation in the Breusch-Pagan test procedure.

Autoregressive Conditional Heteroscedasticity

The recent development of time series analysis includes a dynamic specification of conditional heteroscedasticity as follows:

Yi = Xiβ + εi, εi ~ (0,σi2), and
σi2 = E(εi2|εi-1,εi-2,...) = α0 + α1εi-12 + α2εi-22 + ...

The simplest case of the first-order autoregressive conditional heteroscedasticity or ARCH(1) is:

σi2 = α0 + α1εi-12.

This can be re-written as:

εi2 = α0 + α1εi-12 + vi, where vi = εi2 - σi2.

The ARCH(1) process is simply an AR(1) process in the squares of the regression residuals. A more general time series model may include autoregressive structures in the mean (i.e., the residuals) and in the variance (i.e., the squares of the residuals). For many financial applications, the former structure is useful for studying rates of return, while the latter is applicable to volatility analysis.
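
Because ARCH(q) is an autoregression in the squared residuals, a common diagnostic is the LM test for ARCH effects: regress ei2 on a constant and its own q lags, and compare the resulting NR2 with χ2(q). A minimal Python sketch (numpy and scipy; the function name and small implementation details are our own):

  import numpy as np
  from scipy import stats

  def arch_lm_test(e, q=1):
      """LM test for ARCH(q): regress e_i^2 on a constant and e_{i-1}^2,...,e_{i-q}^2."""
      e2 = e**2
      Y = e2[q:]                                   # drop the first q observations
      Z = np.column_stack([np.ones(len(Y))] + [e2[q - j:-j] for j in range(1, q + 1)])
      coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
      resid = Y - Z @ coef
      R2 = 1.0 - (resid @ resid) / ((Y - Y.mean()) @ (Y - Y.mean()))
      LM = len(Y) * R2                             # ~ chi2(q) under no ARCH effects
      return LM, 1.0 - stats.chi2.cdf(LM, q)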

A generalization of ARCH(1), the GARCH(1,1) model, is defined by:

σi2 = α0 + α1εi-12 + δ1σi-12.

Example: Great Moderation?

Has the volatility of U.S. GDP growth decreased since the 1980s?


Appendix

Generalized Least Squares Estimation: A Review

Considering the violation of spherical disturbance assumption (Assumption 4), the linear regression model is:
Y = Xβ + ε, and
E(ε|X) = 0
Var(ε|X) = σ2Ω

Suppose the symmetric positive definite matrix Ω is known.
There exists P (the "square root" matrix) such that Ω-1 = P'P, or Ω = P-1P-1'.

Let Y* = PY, X* = PX, and ε* = Pε, then the transformed linear regression model is:
Y* = X*β + ε*, and
E(ε*|X*) = 0
Var(ε*|X*) = PVar(ε|X)P' = σ2PΩP' = σ2I

Since the classical assumptions for the transformed linear regression model are satisfied, least squares estimation is applied to minimize the sum of squared transformed errors ε*'ε*:

b*= (X*'X*)-1X*'Y*
= β + (X*'X*)-1X*'ε*
= β + (X'Ω-1X)-1X'Ω-1ε
e* = Y*-X*b*

E(b*|X) = β
Var(b*|X) = E[(b*-β)(b*-β)'|X]
= σ2(X*'X*)-1
= σ2(X'Ω-1X)-1
s*2 = e*'e*/(N-K), and E(s*2) = σ2

Therefore, b* ~a Normal(β,s*2(X'Ω-1X)-1).

Statistical inferences must be based on the generalized least squares estimator b*, provided that the covariance structure Ω is known. If Ω is not known, it must be estimated. If Ω can be estimated consistently, then in large samples the (feasible) generalized least squares estimator b* is consistent and asymptotically efficient.
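
A minimal Python sketch of the generalized least squares formulas above, assuming the covariance structure Ω is known (or has been estimated consistently); numpy only, with function and variable names of our own:

  import numpy as np

  def gls(Y, X, Omega):
      """Generalized least squares: b* = (X'Omega^-1 X)^-1 X'Omega^-1 Y."""
      N, K = X.shape
      Omega_inv = np.linalg.inv(Omega)
      XOX = X.T @ Omega_inv @ X
      b_star = np.linalg.solve(XOX, X.T @ Omega_inv @ Y)
      e = Y - X @ b_star
      s2_star = (e @ Omega_inv @ e) / (N - K)      # e*'e*/(N-K) in the transformed model
      vcov = s2_star * np.linalg.inv(XOX)          # s*^2 (X'Omega^-1 X)^-1
      return b_star, vcov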


Copyright © Kuan-Pin Lin
Last updated: January 12, 2010