Select the model with the highest R-square, which is equivalent to selecting the model with the smallest RSS.
Selecting the model with the highest adjusted R-square is equivalent to selecting the model with the smallest estimated variance s2 = RSS/(N-K).
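As a quick illustration, here is a minimal Python/NumPy sketch (the function and variable names are hypothetical, and X is assumed to include a constant term) that computes RSS, R-square, adjusted R-square, and s2 for an OLS fit:

import numpy as np

def ols_summary(y, X):
    # OLS fit with RSS, R-square, adjusted R-square, and s2 = RSS/(N-K)
    N, K = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)      # estimated coefficients
    e = y - X @ b                                  # residuals
    rss = e @ e                                    # residual sum of squares
    tss = np.sum((y - y.mean()) ** 2)              # total sum of squares (about the mean)
    r2 = 1 - rss / tss
    s2 = rss / (N - K)                             # estimated variance
    adj_r2 = 1 - s2 / (tss / (N - 1))              # highest adj. R2 <=> smallest s2
    return b, rss, r2, adj_r2, s2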
In a regression model, a higher R2 can always be obtained by including more explanatory variables. With the adjustment for degrees of freedom, however, the adjusted R2 need not increase. There are problems with using too many (irrelevant) variables as well as too few (omitted) variables in a regression model.
The consequence of omitted variables is that the regression estimates are biased; including irrelevant variables leaves the estimates unbiased but less efficient.
It is easy to check that for sufficient degrees of freedom (N-K>2), BIC > ln(APC) > AIC. As the number of explanatory variables K increases, the gap between BIC and AIC (or ln(APC)) widens. Model selection based on BIC, with its heavier penalty for the loss of degrees of freedom, will lean toward a simpler model.
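The exact forms of these criteria vary across texts; assuming the log-forms AIC = ln(RSS/N) + 2K/N, BIC = ln(RSS/N) + K ln(N)/N, and APC = (RSS/N)(N+K)/(N-K), a sketch of the computation is:

import numpy as np

def selection_criteria(rss, N, K):
    # Assumed log-forms of the model selection criteria (conventions vary):
    #   AIC     = ln(RSS/N) + 2K/N
    #   BIC     = ln(RSS/N) + K ln(N)/N
    #   ln(APC) = ln(RSS/N) + ln((N+K)/(N-K))
    aic = np.log(rss / N) + 2 * K / N
    bic = np.log(rss / N) + K * np.log(N) / N
    ln_apc = np.log(rss / N) + np.log((N + K) / (N - K))
    return aic, bic, ln_apc

# For example, with N = 100, K = 5, RSS = 42, these give BIC > ln(APC) > AIC.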
Wald and Lagrange Multiplier tests were discussed earlier.
The Likelihood-Ratio (LR) test is based on the normality assumption as follows:
LR = -2(ll* - ll) = N ln(RSS*/RSS) ~ χ2(J)
where ll* and ll are the log-likelihoods of the restricted and unrestricted models, respectively; RSS* and RSS are the sums of squared residuals of the restricted and unrestricted models, respectively; and J is the number of restrictions. Both models under comparison must satisfy the normal likelihood requirement (by assumption or as an asymptotic property).
This test is useful when the omitted variables are not known a priori. However, it neither suggests the unrestricted model specification nor identifies possible omitted variables.
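A minimal sketch of the LR statistic under the normal likelihood (the function and argument names are hypothetical; rss_r and rss_u are the residual sums of squares of the restricted and unrestricted fits, and J is the number of restrictions):

import numpy as np
from scipy import stats

def lr_test(rss_r, rss_u, N, J):
    # LR = N ln(RSS*/RSS) ~ chi-square(J) under H0, assuming normal errors
    lr = N * np.log(rss_r / rss_u)
    p_value = stats.chi2.sf(lr, J)     # right-tail probability
    return lr, p_value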
Model A: Y = Xβ + ε
Model B: ln(Y) = ln(X)γ + υ
If the dependent variables of different models are related by a continuous functional relationship, then the proper inverse transformation is needed to convert the predicted (or fitted) values of the dependent variable to the same units as those of the other model. The squared correlation between the observed dependent variable and the transformed fitted values can then be compared with the R-square of the other model.
For example, to compare a linear model with the log transformation of the same model, consider:
Model A: Y = Xβ + ε
Model B: ln(Y) = Xγ + υ
If we assume that ε (Model A) and υ (Model B) follow a normal probability distribution, then the dependent variable Y in Model A and ln(Y) in Model B are normally distributed. In particular, Y in Model B follows a log-normal probability distribution, with mean and variance given by:
E(Y) = exp[E(ln(Y)) + Var(ln(Y))/2]
Var(Y) = exp[2E(ln(Y)) + Var(ln(Y))] [exp(Var(ln(Y))) - 1]
Let Yp* = exp[ln(Y)p + Var(ln(Y)p)/2], where ln(Y)p = Xγ is the fitted value of Model B and Var(ln(Y)p) is the estimated model variance. Compute the squared correlation of Y and Yp*, and compare it with the R-square of Model A.
The alternative is to adjust the sum of squared residuals RSS of Model B by multiplying it by the squared geometric mean of the variable Y. Let Ygm = (Y1Y2...YN)^(1/N), and RSS* = (Ygm)^2 × RSS (of Model B). Compare RSS* with the RSS of Model A.
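Both comparisons can be sketched as follows (Python/NumPy; the names are hypothetical, X is assumed to include a constant, and all values of Y must be positive):

import numpy as np

def compare_linear_vs_log(y, X):
    N, K = X.shape

    # Model A: Y = Xb + e
    bA, *_ = np.linalg.lstsq(X, y, rcond=None)
    eA = y - X @ bA
    rssA = eA @ eA
    r2A = 1 - rssA / np.sum((y - y.mean()) ** 2)

    # Model B: ln(Y) = Xg + u
    ln_y = np.log(y)
    bB, *_ = np.linalg.lstsq(X, ln_y, rcond=None)
    fitB = X @ bB                       # fitted ln(Y)
    eB = ln_y - fitB
    rssB = eB @ eB
    s2B = rssB / (N - K)                # estimated variance of Model B

    # (1) Log-normal retransformation: Yp* = exp(fitted ln(Y) + s2/2);
    #     compare corr(Y, Yp*)^2 with the R-square of Model A.
    yp_star = np.exp(fitB + s2B / 2)
    r2B_comparable = np.corrcoef(y, yp_star)[0, 1] ** 2

    # (2) Geometric-mean adjustment: RSS* = (Ygm)^2 * RSS of Model B;
    #     compare RSS* with the RSS of Model A.
    y_gm = np.exp(np.mean(np.log(y)))   # geometric mean of Y
    rss_star = y_gm ** 2 * rssB

    return r2A, r2B_comparable, rssA, rss_star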
Model A: Y = Xβ + ε
Model B: Y = Zγ + υ
where X ≠ Z.
The above nonnested models can be reformulated so that each is a special case of an extended (encompassing) model; the classical hypothesis-testing procedures are then applicable. For example,
Model A: Y = Xβ + ε
Model B: Y = Zγ + υ
Model C: Y = Xβ* + Zγ* + ω

Perform the hypothesis tests:

Model A vs. C: H0: γ* = 0; H1: γ* ≠ 0
Model B vs. C: H0: β* = 0; H1: β* ≠ 0
If X and Z overlap, then the above test is modified as follows: Let W = X∩Z ≠ ∅. Define X* = X\Z, Z* = Z\X. Then,
Model A: Y = Xβ + ε
Model B: Y = Zγ + υ
Model C: Y = X*β* + Z*γ* + Wδ + ω

Perform the hypothesis tests:

Model A vs. C: H0: γ* = 0; H1: γ* ≠ 0
Model B vs. C: H0: β* = 0; H1: β* ≠ 0
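For the non-overlapping case above, a sketch of the encompassing test (hypothetical names; an F test of the zero restrictions on the coefficients belonging to the other model):

import numpy as np
from scipy import stats

def encompassing_test(y, X, Z):
    # Model A vs. C: fit Model C on [X Z] and test H0: gamma* = 0.
    # X and Z are assumed not to share columns (otherwise use X*, Z*, W as above).
    N = len(y)
    XZ = np.hstack([X, Z])
    K = XZ.shape[1]
    J = Z.shape[1]                                        # number of restrictions

    eC = y - XZ @ np.linalg.lstsq(XZ, y, rcond=None)[0]   # unrestricted (Model C)
    eA = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]     # restricted (Model A)
    rssC, rssA = eC @ eC, eA @ eA

    F = ((rssA - rssC) / J) / (rssC / (N - K))            # standard F statistic
    p_value = stats.f.sf(F, J, N - K)
    return F, p_value

# Swapping the roles of X and Z gives the Model B vs. C test (H0: beta* = 0).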
The alternative Davidson-MacKinnon approach (J test) formulates the augmented models as follows:

Model A: Y = Xβ + ε
Model B: Y = Zγ + υ

Model A vs. C, where Model C: Y = Xβ* + YBpγ* + ω and YBp is the vector of fitted values of Y from Model B:
H0: γ* = 0; H1: γ* ≠ 0
Model B vs. C, where Model C: Y = Zγ* + YApβ* + ω and YAp is the vector of fitted values of Y from Model A:
H0: β* = 0; H1: β* ≠ 0
Consider two competing (nonnested) model specifications in terms of their likelihood functions:
H0 (Model A): f(Y|X,β)
H1 (Model B): g(Y|Z,γ)
Suppose there is a "true" (unknown) model defined by the likelihood function h(Y|W,δ). Which of the known models (A or B) is closer to the "true" model?
Kullback-Leibler Information Criterion (KLIC) measures the distance between the "true" model and a hypothesized model (A or B) in terms of the likelihood function. That is,
KLIC0 = E(ln h(Y|W,δ)|h is true) - E(ln f(Y|X,β)|h is true) > 0
KLIC1 = E(ln h(Y|W,δ)|h is true) - E(ln g(Y|Z,γ)|h is true) > 0
The hypothesis testing is
H0 (Model A is better): KLIC0 < KLIC1
H1 (Model B is better): KLIC1 < KLIC0
Vuong Test
Consider a sample of N observations, i=1,2,...,N, denote lli,0 = ln f(Yi|Xi,β) and lli,1 = ln g(Yi|Zi,γ). Define the likelihood ratio lri = lli,0-lli,1. Then the statistic
KLIC1-KLIC0 = E(ln f(Y|X,β)|h is true) - E(ln g(Y|Z,γ)|h is true)
is estimated by (1/N) ∑i=1,2,...,N (lli,0 - lli,1) = (1/N) ∑i=1,2,...,N lri.
Vuong (1989) showed that
V = √N(lrm/lrse) ~a Normal(0,1) if Model A and B are equivalent.
where lrm = (1/N) ∑i=1,2,...,N lri and lrse = [(1/N) ∑i=1,2,...,N (lri - lrm)2]½ are the sample mean and standard deviation of lri, respectively.
Therefore, at the 5% level of significance, if V > 1.96 then Model A is the better model; if V < -1.96 then Model B is better. Otherwise, the test cannot discriminate between the two models.
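A sketch of the Vuong statistic for two linear models with normal likelihoods (hypothetical names; lli0 and lli1 are the per-observation log-likelihoods of Models A and B):

import numpy as np
from scipy import stats

def normal_loglik_per_obs(y, X):
    # Per-observation normal log-likelihood evaluated at the OLS/ML estimates
    e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    s2 = (e @ e) / len(y)                       # ML estimate of the error variance
    return -0.5 * (np.log(2 * np.pi * s2) + e ** 2 / s2)

def vuong_test(lli0, lli1):
    N = len(lli0)
    lr = lli0 - lli1                            # lr_i = ll_i,0 - ll_i,1
    lr_mean = lr.mean()
    lr_sd = lr.std()                            # [1/N sum (lr_i - lr_mean)^2]^(1/2)
    V = np.sqrt(N) * lr_mean / lr_sd            # ~ N(0,1) if the models are equivalent
    p_value = 2 * stats.norm.sf(abs(V))         # two-sided p-value
    return V, p_value

# Example: V, p = vuong_test(normal_loglik_per_obs(y, X), normal_loglik_per_obs(y, Z));
# V > 1.96 favors Model A, V < -1.96 favors Model B at the 5% level.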