Heteroscedasticity

Recall the basic assumptions for least squares model estimation:

  1. Linearity: Y = Xβ + ε

  2. Full Rank Condition:
    X may be fixed or random variables, and rank(X) = K.

  3. Exogeneity:
    1. Strict Exogeneity: E(ε|X) = 0
      That is, E(εi|X) = 0, i=1,2,...,N.
    2. E(εi|Xi) = 0,
      E(Xiεi|Xi) = 0, i=1,2,...,N.

  4. Spherical Disturbances:
    1. Var(ε|X) = E(εε'|X) = σ²I. That is,
      Var(εi|X) = σ²,
      Cov(εi,εj|X) = 0, i≠j, i,j=1,2,...,N.
    2. Var(εi|Xi) = σ²,
      Cov(εi,εj|Xi,Xj) = 0, i≠j, i,j=1,2,...,N.

  5. Normality: ε|X ~ Normal(0,σ²I)

Model misspecification due to violation of the classical assumption of homoscedasticity (Assumption 4) is considered here. First, we review the implications of the normality assumption in small samples, or of the asymptotic normality property in large samples.

Asymptotic Normality

For a small or finite sample, the desired probability distribution of the estimated parameters of a regression model is derived from the normality assumption (Assumption 5). For a large sample, the asymptotic normal distribution of the estimated parameters is established by the central limit theorem. Asymptotic normality is a requirement for useful statistical inference about the estimated model and its test statistics.

The least squares residuals ei = Yi - Xib ~a Normal(0, s²[1 - Xi(X'X)⁻¹Xi']), i=1,2,...,N, can be tested using the Bera-Jarque test statistic (for asymptotic normality) as follows:

Compute: Variance = ∑i=1,...,N ei²/N = s*²
Skewness = [∑i=1,...,N ei³/N]/(s*²)^(3/2)
Kurtosis = [∑i=1,...,N ei⁴/N]/(s*²)²
For a normal distribution, Skewness → 0 and Kurtosis → 3.

The Bera-Jarque test statistic for asymptotic normality is defined as
BJ = N[Skewness²/6 + (Kurtosis-3)²/24] ~ χ²(2).

For example, given a level of significance of 0.05 (the critical value from χ²(2) is χ²0.95 = 5.99), if BJ > 5.99, then the null hypothesis of normality is rejected. On the other hand, if BJ ≤ 5.99, then normality cannot be rejected.
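
The Bera-Jarque computation above is easy to carry out directly. The following Python sketch (not part of the original notes; the simulated data set is a hypothetical example for illustration) computes the statistic from least squares residuals:

  import numpy as np

  rng = np.random.default_rng(0)
  N, K = 200, 3
  X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
  Y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=N)

  b, *_ = np.linalg.lstsq(X, Y, rcond=None)    # least squares estimates
  e = Y - X @ b                                # residuals

  s2 = np.mean(e**2)                           # variance (divided by N)
  skewness = np.mean(e**3) / s2**1.5           # third standardized moment
  kurtosis = np.mean(e**4) / s2**2             # fourth standardized moment
  BJ = N * (skewness**2 / 6 + (kurtosis - 3)**2 / 24)

  # compare with the chi-square(2) critical value 5.99 at the 0.05 level
  print("BJ =", BJ, "; reject normality" if BJ > 5.99 else "; cannot reject")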

Heteroscedasticity

In a linear regression model, Yi = Xiβ + εi, if Var(εi|Xi) = σi² for i = 1,2,...,N, the model is subject to conditional heteroscedasticity. For simplicity, we assume no serial correlation: E(εiεj|Xi,Xj) = 0 for i≠j. In other words,

Var(ε|X) = E(εε'|X) = diag(σ1², σ2², ..., σN²) = σ² diag(ω1, ω2, ..., ωN) = σ²Ω

For convenience, we use the normalization ∑i=1,...,N ωi = N. Therefore σ² = (1/N) ∑i=1,...,N σi².

If heteroscedasticity is ignored in ordinary least squares estimation, the parameter estimators are inefficient, although they remain unbiased, consistent, and asymptotically normally distributed.

From the estimated model Y = Xb + e, we have:
b = (X'X)⁻¹X'Y = β + (X'X)⁻¹X'ε
e = Y - Xb = [I - X(X'X)⁻¹X']ε
s² = e'e/(N-K) = ε'[I - X(X'X)⁻¹X']ε/(N-K).

E(b|X) = β, by Assumption 3. But, in general, E(s²) ≠ σ².
However, it can be shown that if b is consistent, then s² is a consistent estimator of σ²: if plim(b) = β, then plim(s²) = σ².
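
This consistency result can be checked numerically. The following Python sketch (an illustrative simulation, not part of the original notes; the heteroscedastic design Var(εi|Xi) = Xi² is an assumption) shows s² approaching σ² = (1/N)∑σi² as N grows:

  import numpy as np

  rng = np.random.default_rng(0)
  for N in (100, 10_000, 1_000_000):
      x = rng.uniform(1.0, 3.0, size=N)
      X = np.column_stack([np.ones(N), x])
      Y = X @ np.array([1.0, 2.0]) + rng.normal(scale=x)  # Var(eps_i) = x_i^2
      b, *_ = np.linalg.lstsq(X, Y, rcond=None)
      e = Y - X @ b
      s2 = e @ e / (N - 2)             # s^2 = e'e/(N-K) with K = 2
      print(N, s2, np.mean(x**2))      # s^2 -> sigma^2 = (1/N) sum of sigma_i^2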

Var(b|X) = E[(b-β)(b-β)'|X]
= σ²(X'X)⁻¹X'ΩX(X'X)⁻¹, by assuming heteroscedasticity
= (X'X)⁻¹X' diag(σ1², σ2², ..., σN²) X(X'X)⁻¹
= (X'X)⁻¹{∑i=1,...,N σi²Xi'Xi}(X'X)⁻¹
= (1/N)(X'X/N)⁻¹{∑i=1,...,N σi²Xi'Xi/N}(X'X/N)⁻¹
= (σ²/N)(X'X/N)⁻¹{∑i=1,...,N ωiXi'Xi/N}(X'X/N)⁻¹

Therefore, b ~a Normal(β, (σ²/N)Q⁻¹Q*Q⁻¹),
where Q = plim(X'X/N) and Q* = plim(X'ΩX/N) = plim(∑i=1,...,N ωiXi'Xi/N).

Heteroscedasticity-Consistent Variance-Covariance Matrix

By definition, σi² = E(εi²|Xi). Then an estimate of σi² is ei², where ei = Yi - Xib is the estimated least squares residual. The important large sample property is that

plim ∑i=1,...,N σi²Xi'Xi/N = plim ∑i=1,...,N ei²Xi'Xi/N

Therefore the estimated heteroscedasticity-consistent (robust) variance-covariance matrix of b is obtained by

Var(b|X) = (X'X)⁻¹{∑i=1,...,N ei²Xi'Xi}(X'X)⁻¹
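
A minimal Python sketch of this robust (White-type) covariance estimator follows; it is not from the original notes, and the simulated heteroscedastic data are assumptions for illustration:

  import numpy as np

  def robust_cov(X, e):
      # (X'X)^-1 { sum_i e_i^2 X_i'X_i } (X'X)^-1
      XtX_inv = np.linalg.inv(X.T @ X)
      meat = X.T @ (e[:, None] ** 2 * X)
      return XtX_inv @ meat @ XtX_inv

  rng = np.random.default_rng(0)
  N = 500
  x = rng.uniform(1.0, 5.0, size=N)
  X = np.column_stack([np.ones(N), x])
  Y = X @ np.array([1.0, 2.0]) + rng.normal(scale=x)  # Var(eps_i) grows with x_i

  b, *_ = np.linalg.lstsq(X, Y, rcond=None)
  e = Y - X @ b
  print("robust standard errors:", np.sqrt(np.diag(robust_cov(X, e))))

Unlike the usual formula s²(X'X)⁻¹, the sandwich form above remains consistent when the σi² vary with Xi.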

Hypothesis Testing for Heteroscedasticity

Generalized Least Squares Estimation

Considering the violation of the homoscedasticity assumption, the linear regression model is:
Y = Xβ + ε, with
E(ε|X) = 0
Var(ε|X) = σ²Ω

Suppose the symmetric positive definite matrix Ω is known.
There exists P (the "square root" matrix) such that Ω⁻¹ = P'P, or Ω = P⁻¹(P⁻¹)'.

Let Y* = PY, X* = PX, and ε* = Pε. Then the transformed linear regression model is:
Y* = X*β + ε*, with
E(ε*|X*) = 0
Var(ε*|X*) = P Var(ε|X) P' = σ²PΩP' = σ²I

Since the classical assumptions for the transformed linear regression model are satisfied, least squares estimation is applied to minimize the sum of squared transformed errors ε*'ε*:

b* = (X*'X*)⁻¹X*'Y*
= β + (X*'X*)⁻¹X*'ε*
= β + (X'Ω⁻¹X)⁻¹X'Ω⁻¹ε
e* = Y* - X*b*

E(b*|X) = β
Var(b*|X) = E[(b*-β)(b*-β)'|X]
= σ²(X*'X*)⁻¹
= σ²(X'Ω⁻¹X)⁻¹
s*² = e*'e*/(N-K), and E(s*²) = σ²

Therefore, b* ~a Normal(β, s*²(X'Ω⁻¹X)⁻¹).

Statistical inferences must be based on the generalized least squares estimator b*, provided that the covariance structure Ω is known. If Ω is not known, it must be estimated. If Ω can be estimated consistently, then in large samples the generalized least squares estimator b* is consistent and asymptotically efficient.
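
The transformation argument above translates directly into code. The following Python sketch (not from the original notes; the Cholesky-based construction of P and the known diagonal Ω are assumptions for illustration) computes b* and its estimated covariance:

  import numpy as np

  def gls(X, Y, Omega):
      # With Omega = LL' (Cholesky), P = L^-1 satisfies Omega^-1 = P'P
      L = np.linalg.cholesky(Omega)
      P = np.linalg.inv(L)
      Xs, Ys = P @ X, P @ Y                    # transformed data
      b_star, *_ = np.linalg.lstsq(Xs, Ys, rcond=None)
      e_star = Ys - Xs @ b_star
      N, K = X.shape
      s2_star = e_star @ e_star / (N - K)      # s*^2 = e*'e*/(N-K)
      return b_star, s2_star * np.linalg.inv(Xs.T @ Xs)

  rng = np.random.default_rng(0)
  N = 200
  x = rng.uniform(1.0, 4.0, size=N)
  X = np.column_stack([np.ones(N), x])
  omega = x**2                                 # known diagonal Omega
  Y = X @ np.array([1.0, 2.0]) + rng.normal(scale=np.sqrt(omega))

  b_star, V = gls(X, Y, np.diag(omega))
  print(b_star, np.sqrt(np.diag(V)))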

Correction for Heteroscedasticity: Weighted Least Squares Estimation

In general, the heteroscedastic specification of the variances must be known or consistently estimated before any correction procedure can be applied to the estimation of the regression model Y = Xβ + ε. Recall that

Var(ε|X) = σ²Ω = σ² diag(ω1, ω2, ..., ωN)

Let wi = 1/√ωi, Xi* = wiXi, and Yi* = wiYi, for i=1,2,...,N. Then least squares estimation with the weighted data matrices is a special case of generalized least squares estimation as follows:

b* = (X*'X*)⁻¹X*'Y* = (X'Ω⁻¹X)⁻¹X'Ω⁻¹Y
= [∑i=1,...,N (Xi*'Xi*)]⁻¹ [∑i=1,...,N (Xi*'Yi*)]
= [∑i=1,...,N (wiXi)'(wiXi)]⁻¹ [∑i=1,...,N (wiXi)'(wiYi)]
e* = Y* - X*b*
Var(b*|X) = s*²(X*'X*)⁻¹ = s*²(X'Ω⁻¹X)⁻¹
= s*²[∑i=1,...,N (Xi*'Xi*)]⁻¹
= s*²[∑i=1,...,N (wiXi)'(wiXi)]⁻¹
where s*² = e*'e*/(N-K)

We note that the interpretation of the estimated model with weighted least squares, Y* = X*b*, is the same as that of the original model Y = Xb*.
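
Because Ω is diagonal here, the GLS sketch above reduces to simple reweighting of the rows of X and Y. A minimal Python version (again an illustrative sketch with assumed simulated data, not the original author's code):

  import numpy as np

  def wls(X, Y, omega):
      # Weighted least squares with weights w_i = 1/sqrt(omega_i)
      w = 1.0 / np.sqrt(omega)
      Xs, Ys = w[:, None] * X, w * Y          # X_i* = w_i X_i, Y_i* = w_i Y_i
      b_star, *_ = np.linalg.lstsq(Xs, Ys, rcond=None)
      e_star = Ys - Xs @ b_star
      N, K = X.shape
      s2_star = e_star @ e_star / (N - K)
      return b_star, s2_star * np.linalg.inv(Xs.T @ Xs)

  rng = np.random.default_rng(0)
  N = 200
  x = rng.uniform(1.0, 4.0, size=N)
  X = np.column_stack([np.ones(N), x])
  Y = X @ np.array([1.0, 2.0]) + rng.normal(scale=x)  # omega_i = x_i^2
  b_star, V = wls(X, Y, x**2)
  print(b_star, np.sqrt(np.diag(V)))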

If the source of heteroscedasticity is found to be one of the exogenous variables, say Xk, then 1/√Xk or 1/Xk may be used to weight the data matrices X and Y, and weighted least squares estimation is carried out to correct for heteroscedasticity. In general, the heteroscedastic variance is a function of X (in part or in all). Consider the following cases:

  1. σi² = σ²Xik, corrected with the weight wi = 1/√Xik;
  2. σi² = σ²Xik², corrected with the weight wi = 1/Xik;
  3. σi² = σ²Xi1^α1 Xi2^α2 ... XiK^αK (multiplicative heteroscedasticity).

The last case of multiplicative heteroscedasticity may be expressed in log form as:
ln(σi²) = ln(σ²) + α1 ln(Xi1) + α2 ln(Xi2) + ... + αK ln(XiK)

This log-variance equation can be estimated as:
ln(ei²) = α0 + α1 ln(Xi1) + α2 ln(Xi2) + ... + αK ln(XiK) + υi

The exponential transformation of the fitted values of ln(ei²) is used to approximate the heteroscedastic variances σi². We may apply hypothesis testing for the significance of each αi; if αi = 0 for all i=1,2,...,K, then the null hypothesis of homoscedasticity cannot be rejected.
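
Putting the steps together, the following Python sketch (illustrative only; the single-regressor design and simulated data are assumptions) estimates the log-variance equation and reweights accordingly:

  import numpy as np

  rng = np.random.default_rng(0)
  N = 400
  x = rng.uniform(1.0, 5.0, size=N)
  X = np.column_stack([np.ones(N), x])
  Y = X @ np.array([1.0, 2.0]) + rng.normal(scale=np.sqrt(0.5 * x**1.5))

  # Step 1: ordinary least squares and residuals
  b, *_ = np.linalg.lstsq(X, Y, rcond=None)
  e = Y - X @ b

  # Step 2: regress ln(e_i^2) on a constant and ln(X_ik)
  Z = np.column_stack([np.ones(N), np.log(x)])
  a, *_ = np.linalg.lstsq(Z, np.log(e**2), rcond=None)

  # Step 3: exp of fitted values approximates sigma_i^2; reweight and re-estimate
  omega_hat = np.exp(Z @ a)
  w = 1.0 / np.sqrt(omega_hat)
  b_star, *_ = np.linalg.lstsq(w[:, None] * X, w * Y, rcond=None)
  print("OLS:", b, " weighted LS:", b_star)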


Copyright © Kuan-Pin Lin
Last updated: November 23, 2009