Heteroscedasticity

Recall the basic assumptions for least squares model estimation:

  1. Linearity: Y = Xβ + ε

  2. Full Rank Condition:
    X may be fixed or random variables, and rank(X) = K.

  3. Exogeneity:
    1. Strict Exogeneity: E(ε|X) = 0
      That is, E(εi|X) = 0, i=1,2,...,N.
    2. E(εi|Xi) = 0,
      E(Xiεi|Xi) = 0, i=1,2,...,N.

  4. Spherical Disturbances:
    1. Var(ε|X) = E(εε'|X) = σ²I. That is,
      Var(εi|X) = σ²,
      Cov(εi,εj|X) = 0, i≠j, i,j=1,2,...,N.
    2. Var(εi|Xi) = σ²,
      Cov(εi,εj|Xi,Xj) = 0, i≠j, i,j=1,2,...,N.

  5. Normality: ε|X ~ Normal(0,σ²I)

Model misspecification due to violation of the classical assumption of homoscedasticity (Assumption 4) is considered here. First, we review the implications of the normality assumption in small samples, or of the asymptotic normality property in large samples.

Asymptotic Normality

For a small or finite sample, the desired probability distribution of the estimated parameters of a regression model is derived from the normality assumption (Assumption 5). For a large sample, the asymptotic normal distribution of the estimated parameters is established by the central limit theorem. Asymptotic normality is a requirement for useful statistical inference about the estimated model and its test statistics.

The least squares residuals ei = Yi - Xib ~a Normal(0, s²[1 - Xi(X'X)⁻¹Xi']), i=1,2,...,N, can be tested using the Bera-Jarque test statistic (for asymptotic normality) as follows:

Compute: Variance = ∑i=1,...,N ei²/N = s*²
Skewness = [∑i=1,...,N ei³/N]/(s*²)^(3/2)
Kurtosis = [∑i=1,...,N ei⁴/N]/(s*²)²
For a normal distribution, Skewness → 0 and Kurtosis → 3.

The Bera-Jarque test statistic for asymptotic normality is defined as
BJ = N[Skewness²/6 + (Kurtosis-3)²/24] ~ χ²(2).

For example, given a level of significance of 0.05 (the critical value from χ²(2) is χ²0.95 = 5.99), if BJ > 5.99, then the null hypothesis of normality is rejected. On the other hand, if BJ ≤ 5.99, then normality cannot be rejected.
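
The Bera-Jarque computation above is easy to carry out directly. The following Python sketch (not part of the original notes; the simulated data set is a hypothetical example for illustration) computes the statistic from least squares residuals:

  import numpy as np

  rng = np.random.default_rng(0)
  N, K = 200, 3
  X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
  Y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=N)

  b, *_ = np.linalg.lstsq(X, Y, rcond=None)    # least squares estimates
  e = Y - X @ b                                # residuals

  s2 = np.mean(e**2)                           # variance (divided by N)
  skewness = np.mean(e**3) / s2**1.5           # third standardized moment
  kurtosis = np.mean(e**4) / s2**2             # fourth standardized moment
  BJ = N * (skewness**2 / 6 + (kurtosis - 3)**2 / 24)

  # compare with the chi-square(2) critical value 5.99 at the 0.05 level
  print("BJ =", BJ, "; reject normality" if BJ > 5.99 else "; cannot reject")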

Heteroscedasticity

In a linear regression model, Yi = Xiβ + εi, if Var(εi|Xi) = σi² for i = 1,2,...,N, the model is subject to conditional heteroscedasticity. For simplicity, we assume no serial correlation: E(εiεj|Xi,Xj) = 0 for i≠j. In other words,

Var(ε|X) = E(εε'|X) = diag(σ1², σ2², ..., σN²) = σ² diag(ω1, ω2, ..., ωN) = σ²Ω

For convenience, we use the normalization ∑i=1,...,N ωi = N. Therefore σ² = (1/N) ∑i=1,...,N σi².

If heteroscedasticity is ignored in ordinary least squares estimation, the parameter estimators are inefficient, although they remain unbiased, consistent, and asymptotically normally distributed.

From the estimated model Y = Xb + e, we have:
b = (X'X)⁻¹X'Y = β + (X'X)⁻¹X'ε
e = Y - Xb = [I - X(X'X)⁻¹X']ε
s² = e'e/(N-K) = ε'[I - X(X'X)⁻¹X']ε/(N-K).

E(b|X) = β, by Assumption 3. But, in general, E(s²) ≠ σ².
However, it can be shown that if b is consistent, then s² is a consistent estimator of σ²: if plim(b) = β, then plim(s²) = σ².
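
This consistency result can be checked numerically. The following Python sketch (an illustrative simulation, not part of the original notes; the heteroscedastic design Var(εi|Xi) = Xi² is an assumption) shows s² approaching σ² = (1/N)∑σi² as N grows:

  import numpy as np

  rng = np.random.default_rng(0)
  for N in (100, 10_000, 1_000_000):
      x = rng.uniform(1.0, 3.0, size=N)
      X = np.column_stack([np.ones(N), x])
      Y = X @ np.array([1.0, 2.0]) + rng.normal(scale=x)  # Var(eps_i) = x_i^2
      b, *_ = np.linalg.lstsq(X, Y, rcond=None)
      e = Y - X @ b
      s2 = e @ e / (N - 2)             # s^2 = e'e/(N-K) with K = 2
      print(N, s2, np.mean(x**2))      # s^2 -> sigma^2 = (1/N) sum of sigma_i^2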

Var(b|X) = E[(b-β)(b-β)'|X]
= σ²(X'X)⁻¹X'ΩX(X'X)⁻¹, by assuming heteroscedasticity
= (X'X)⁻¹X' diag(σ1², σ2², ..., σN²) X(X'X)⁻¹
= (X'X)⁻¹{∑i=1,...,N σi²Xi'Xi}(X'X)⁻¹
= (1/N)(X'X/N)⁻¹{∑i=1,...,N σi²Xi'Xi/N}(X'X/N)⁻¹
= (σ²/N)(X'X/N)⁻¹{∑i=1,...,N ωiXi'Xi/N}(X'X/N)⁻¹

Therefore, b ~a Normal(β, (σ²/N)Q⁻¹Q*Q⁻¹),
where Q = plim(X'X/N) and Q* = plim(X'ΩX/N) = plim(∑i=1,...,N ωiXi'Xi/N).

Heteroscedasticity-Consistent Variance-Covariance Matrix

By definition, σi² = E(εi²|Xi). Then an estimate of σi² is ei², where ei = Yi - Xib is the estimated least squares residual. The important large sample property is that

plim ∑i=1,...,N σi²Xi'Xi/N = plim ∑i=1,...,N ei²Xi'Xi/N

Therefore the estimated heteroscedasticity-consistent (robust) variance-covariance matrix of b is obtained by

Var(b|X) = (X'X)⁻¹{∑i=1,...,N ei²Xi'Xi}(X'X)⁻¹
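
A minimal Python sketch of this robust (White-type) covariance estimator follows; it is not from the original notes, and the simulated heteroscedastic data are assumptions for illustration:

  import numpy as np

  def robust_cov(X, e):
      # (X'X)^-1 { sum_i e_i^2 X_i'X_i } (X'X)^-1
      XtX_inv = np.linalg.inv(X.T @ X)
      meat = X.T @ (e[:, None] ** 2 * X)
      return XtX_inv @ meat @ XtX_inv

  rng = np.random.default_rng(0)
  N = 500
  x = rng.uniform(1.0, 5.0, size=N)
  X = np.column_stack([np.ones(N), x])
  Y = X @ np.array([1.0, 2.0]) + rng.normal(scale=x)  # Var(eps_i) grows with x_i

  b, *_ = np.linalg.lstsq(X, Y, rcond=None)
  e = Y - X @ b
  print("robust standard errors:", np.sqrt(np.diag(robust_cov(X, e))))

Unlike the usual formula s²(X'X)⁻¹, the sandwich form above remains consistent when the σi² vary with Xi.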

Hypothesis Testing for Heteroscedasticity

Generalized Least Squares Estimation

Considering the violation of the homoscedasticity assumption, the linear regression model is:
Y = Xβ + ε, with
E(ε|X) = 0
Var(ε|X) = σ²Ω

Suppose the symmetric positive definite matrix Ω is known.
There exists P (the "square root" matrix) such that Ω⁻¹ = P'P, or Ω = P⁻¹(P⁻¹)'.

Let Y* = PY, X* = PX, and ε* = Pε. Then the transformed linear regression model is:
Y* = X*β + ε*, with
E(ε*|X*) = 0
Var(ε*|X*) = P Var(ε|X) P' = σ²PΩP' = σ²I

Since the classical assumptions for the transformed linear regression model are satisfied, least squares estimation is applied to minimize the sum of squared transformed errors ε*'ε*:

b* = (X*'X*)⁻¹X*'Y*
= β + (X*'X*)⁻¹X*'ε*
= β + (X'Ω⁻¹X)⁻¹X'Ω⁻¹ε
e* = Y* - X*b*

E(b*|X) = β
Var(b*|X) = E[(b*-β)(b*-β)'|X]
= σ²(X*'X*)⁻¹
= σ²(X'Ω⁻¹X)⁻¹
s*² = e*'e*/(N-K), and E(s*²) = σ²

Therefore, b* ~a Normal(β, s*²(X'Ω⁻¹X)⁻¹).

Statistical inferences must be based on the generalized least squares estimator b*, provided that the covariance structure Ω is known. If Ω is not known, it must be estimated. If Ω can be estimated consistently, then in large samples the generalized least squares estimator b* is consistent and asymptotically efficient.
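
The transformation argument above translates directly into code. The following Python sketch (not from the original notes; the Cholesky-based construction of P and the known diagonal Ω are assumptions for illustration) computes b* and its estimated covariance:

  import numpy as np

  def gls(X, Y, Omega):
      # With Omega = LL' (Cholesky), P = L^-1 satisfies Omega^-1 = P'P
      L = np.linalg.cholesky(Omega)
      P = np.linalg.inv(L)
      Xs, Ys = P @ X, P @ Y                    # transformed data
      b_star, *_ = np.linalg.lstsq(Xs, Ys, rcond=None)
      e_star = Ys - Xs @ b_star
      N, K = X.shape
      s2_star = e_star @ e_star / (N - K)      # s*^2 = e*'e*/(N-K)
      return b_star, s2_star * np.linalg.inv(Xs.T @ Xs)

  rng = np.random.default_rng(0)
  N = 200
  x = rng.uniform(1.0, 4.0, size=N)
  X = np.column_stack([np.ones(N), x])
  omega = x**2                                 # known diagonal Omega
  Y = X @ np.array([1.0, 2.0]) + rng.normal(scale=np.sqrt(omega))

  b_star, V = gls(X, Y, np.diag(omega))
  print(b_star, np.sqrt(np.diag(V)))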

Correction for Heteroscedasticity: Weighted Least Squares Estimation

In general, the heteroscedastic specification of the variances must be known or consistently estimated before any correction procedure can be applied to the estimation of the regression model Y = Xβ + ε. Recall that

Var(ε|X) = σ²Ω = σ² diag(ω1, ω2, ..., ωN)

Let wi = 1/√ωi, Xi* = wiXi, and Yi* = wiYi, for i=1,2,...,N. Then least squares estimation with the weighted data matrices is a special case of generalized least squares estimation as follows:

b* = (X*'X*)⁻¹X*'Y* = (X'Ω⁻¹X)⁻¹X'Ω⁻¹Y
= [∑i=1,...,N (Xi*'Xi*)]⁻¹ [∑i=1,...,N (Xi*'Yi*)]
= [∑i=1,...,N (wiXi)'(wiXi)]⁻¹ [∑i=1,...,N (wiXi)'(wiYi)]
e* = Y* - X*b*
Var(b*|X) = s*²(X*'X*)⁻¹ = s*²(X'Ω⁻¹X)⁻¹
= s*²[∑i=1,...,N (Xi*'Xi*)]⁻¹
= s*²[∑i=1,...,N (wiXi)'(wiXi)]⁻¹
where s*² = e*'e*/(N-K)

We note that the interpretation of the estimated model with weighted least squares, Y* = X*b*, is the same as that of the original model Y = Xb*.
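
Because Ω is diagonal here, the GLS sketch above reduces to simple reweighting of the rows of X and Y. A minimal Python version (again an illustrative sketch with assumed simulated data, not the original author's code):

  import numpy as np

  def wls(X, Y, omega):
      # Weighted least squares with weights w_i = 1/sqrt(omega_i)
      w = 1.0 / np.sqrt(omega)
      Xs, Ys = w[:, None] * X, w * Y          # X_i* = w_i X_i, Y_i* = w_i Y_i
      b_star, *_ = np.linalg.lstsq(Xs, Ys, rcond=None)
      e_star = Ys - Xs @ b_star
      N, K = X.shape
      s2_star = e_star @ e_star / (N - K)
      return b_star, s2_star * np.linalg.inv(Xs.T @ Xs)

  rng = np.random.default_rng(0)
  N = 200
  x = rng.uniform(1.0, 4.0, size=N)
  X = np.column_stack([np.ones(N), x])
  Y = X @ np.array([1.0, 2.0]) + rng.normal(scale=x)  # omega_i = x_i^2
  b_star, V = wls(X, Y, x**2)
  print(b_star, np.sqrt(np.diag(V)))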

If the source of heteroscedasticity is found to be one of the exogenous variables, say Xk, then 1/√Xk or 1/Xk may be used to weight the data matrices X and Y, and weighted least squares estimation is carried out to correct for heteroscedasticity. In general, the heteroscedastic variance is a function of X (in part or in all). Consider the following cases:

  1. σi² = σ²Xik, corrected with the weight wi = 1/√Xik;
  2. σi² = σ²Xik², corrected with the weight wi = 1/Xik;
  3. σi² = σ²Xi1^α1 Xi2^α2 ... XiK^αK (multiplicative heteroscedasticity).

The last case of multiplicative heteroscedasticity may be expressed in log form as:
ln(σi²) = ln(σ²) + α1 ln(Xi1) + α2 ln(Xi2) + ... + αK ln(XiK)

This log-variance equation can be estimated as:
ln(ei²) = α0 + α1 ln(Xi1) + α2 ln(Xi2) + ... + αK ln(XiK) + υi

The exponential transformation of the fitted values of ln(ei²) is used to approximate the heteroscedastic variances σi². We may apply hypothesis testing for the significance of each αi; if αi = 0 for all i=1,2,...,K, then the null hypothesis of homoscedasticity cannot be rejected.
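
Putting the steps together, the following Python sketch (illustrative only; the single-regressor design and simulated data are assumptions) estimates the log-variance equation and reweights accordingly:

  import numpy as np

  rng = np.random.default_rng(0)
  N = 400
  x = rng.uniform(1.0, 5.0, size=N)
  X = np.column_stack([np.ones(N), x])
  Y = X @ np.array([1.0, 2.0]) + rng.normal(scale=np.sqrt(0.5 * x**1.5))

  # Step 1: ordinary least squares and residuals
  b, *_ = np.linalg.lstsq(X, Y, rcond=None)
  e = Y - X @ b

  # Step 2: regress ln(e_i^2) on a constant and ln(X_ik)
  Z = np.column_stack([np.ones(N), np.log(x)])
  a, *_ = np.linalg.lstsq(Z, np.log(e**2), rcond=None)

  # Step 3: exp of fitted values approximates sigma_i^2; reweight and re-estimate
  omega_hat = np.exp(Z @ a)
  w = 1.0 / np.sqrt(omega_hat)
  b_star, *_ = np.linalg.lstsq(w[:, None] * X, w * Y, rcond=None)
  print("OLS:", b, " weighted LS:", b_star)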


Copyright © Kuan-Pin Lin
Last updated: November 23, 2009