Select the model with the highest R-square, which is equivalent to selecting the model with the smallest RSS.
Selecting the model with the highest adjusted R-square is equivalent to selecting the model with the smallest estimated variance s2 = RSS/(N-K).
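As a quick illustration, here is a minimal Python/NumPy sketch (the function and variable names are hypothetical, and X is assumed to include a constant term) that computes RSS, R-square, adjusted R-square, and s2 for an OLS fit:

import numpy as np

def ols_summary(y, X):
    # OLS fit with RSS, R-square, adjusted R-square, and s2 = RSS/(N-K)
    N, K = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)      # estimated coefficients
    e = y - X @ b                                  # residuals
    rss = e @ e                                    # residual sum of squares
    tss = np.sum((y - y.mean()) ** 2)              # total sum of squares (about the mean)
    r2 = 1 - rss / tss
    s2 = rss / (N - K)                             # estimated variance
    adj_r2 = 1 - s2 / (tss / (N - 1))              # highest adj. R2 <=> smallest s2
    return b, rss, r2, adj_r2, s2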
In a regression model, a higher R2 can always be obtained by including more explanatory variables. With the adjustment for degrees of freedom, however, the adjusted R2 need not increase. There are problems with using too many (irrelevant) variables as well as too few (omitted) variables in a regression model.
The consequence of omitted variables is that the regression estimates are biased; including irrelevant variables leaves the estimates unbiased but less efficient.
It is easy to check that for sufficient degrees of freedom (N-K>2), BIC > ln(APC) > AIC. As the number of explanatory variables K increases, the gap between BIC and AIC (or ln(APC)) widens. Model selection based on BIC, with its heavier penalty for the loss of degrees of freedom, will lean toward a simpler model.
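The exact forms of these criteria vary across texts; assuming the log-forms AIC = ln(RSS/N) + 2K/N, BIC = ln(RSS/N) + K ln(N)/N, and APC = (RSS/N)(N+K)/(N-K), a sketch of the computation is:

import numpy as np

def selection_criteria(rss, N, K):
    # Assumed log-forms of the model selection criteria (conventions vary):
    #   AIC     = ln(RSS/N) + 2K/N
    #   BIC     = ln(RSS/N) + K ln(N)/N
    #   ln(APC) = ln(RSS/N) + ln((N+K)/(N-K))
    aic = np.log(rss / N) + 2 * K / N
    bic = np.log(rss / N) + K * np.log(N) / N
    ln_apc = np.log(rss / N) + np.log((N + K) / (N - K))
    return aic, bic, ln_apc

# For example, with N = 100, K = 5, RSS = 42, these give BIC > ln(APC) > AIC.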
Wald and Lagrange Multiplier tests were discussed earlier.
The Likelihood-Ratio (LR) test is based on the normality assumption as follows:
LR = -2(ll* - ll) = N ln(RSS*/RSS) ~ χ2(J)
where ll* and ll are the log-likelihoods of the restricted and unrestricted models, respectively; RSS* and RSS are the sums of squared residuals of the restricted and unrestricted models, respectively; and J is the number of restrictions. Both models under comparison must satisfy the normal likelihood requirement (by assumption or as an asymptotic property).
This test is useful when the omitted variables are not known a priori. However, it neither suggests the unrestricted model specification nor identifies possible omitted variables.
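A minimal sketch of the LR statistic under the normal likelihood (the function and argument names are hypothetical; rss_r and rss_u are the residual sums of squares of the restricted and unrestricted fits, and J is the number of restrictions):

import numpy as np
from scipy import stats

def lr_test(rss_r, rss_u, N, J):
    # LR = N ln(RSS*/RSS) ~ chi-square(J) under H0, assuming normal errors
    lr = N * np.log(rss_r / rss_u)
    p_value = stats.chi2.sf(lr, J)     # right-tail probability
    return lr, p_value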
Model A: Y = Xβ + ε
Model B: ln(Y) = ln(X)γ + υ
If the dependent variables of different models are related by a continuous functional relationship, then the proper inverse transformation is needed to convert the predicted (or fitted) values of the dependent variable to the same units as those of the other model. The squared correlation between the observed dependent variable and the transformed fitted values can then be compared with the R-square of the other model.
For example, to compare a linear model with the log transformation of the same model, consider:
Model A: Y = Xβ + ε
Model B: ln(Y) = Xγ + υ
If we assume that ε (Model A) and υ (Model B) follow a normal probability distribution, then the dependent variable Y in Model A and ln(Y) in Model B are normally distributed. In particular, Y in Model B follows a log-normal probability distribution, with mean and variance given by:
E(Y) = exp[E(ln(Y)) + Var(ln(Y))/2]
Var(Y) = exp[2E(ln(Y)) + Var(ln(Y))] [exp(Var(ln(Y))) - 1]
Let Yp* = exp[ln(Y)p + Var(ln(Y)p)/2], where ln(Y)p = Xγ is the fitted value of Model B and Var(ln(Y)p) is the estimated model variance. Compute the squared correlation of Y and Yp*, and compare it with the R-square of Model A.
The alternative is to adjust the sum of squared residuals RSS of Model B by multiplying it by the squared geometric mean of the variable Y. Let Ygm = (Y1Y2...YN)^(1/N), and RSS* = (Ygm)^2 × RSS (of Model B). Compare RSS* with the RSS of Model A.
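Both comparisons can be sketched as follows (Python/NumPy; the names are hypothetical, X is assumed to include a constant, and all values of Y must be positive):

import numpy as np

def compare_linear_vs_log(y, X):
    N, K = X.shape

    # Model A: Y = Xb + e
    bA, *_ = np.linalg.lstsq(X, y, rcond=None)
    eA = y - X @ bA
    rssA = eA @ eA
    r2A = 1 - rssA / np.sum((y - y.mean()) ** 2)

    # Model B: ln(Y) = Xg + u
    ln_y = np.log(y)
    bB, *_ = np.linalg.lstsq(X, ln_y, rcond=None)
    fitB = X @ bB                       # fitted ln(Y)
    eB = ln_y - fitB
    rssB = eB @ eB
    s2B = rssB / (N - K)                # estimated variance of Model B

    # (1) Log-normal retransformation: Yp* = exp(fitted ln(Y) + s2/2);
    #     compare corr(Y, Yp*)^2 with the R-square of Model A.
    yp_star = np.exp(fitB + s2B / 2)
    r2B_comparable = np.corrcoef(y, yp_star)[0, 1] ** 2

    # (2) Geometric-mean adjustment: RSS* = (Ygm)^2 * RSS of Model B;
    #     compare RSS* with the RSS of Model A.
    y_gm = np.exp(np.mean(np.log(y)))   # geometric mean of Y
    rss_star = y_gm ** 2 * rssB

    return r2A, r2B_comparable, rssA, rss_star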
Model A: Y = Xβ + ε
Model B: Y = Zγ + υ
where X ≠ Z.
The above nonnested models can be reformulated so that each is a special case of an extended (encompassing) model; the classical hypothesis-testing procedures are then applicable. For example,
Model A: Y = Xβ + ε
Model B: Y = Zγ + υ
Model C: Y = Xβ* + Zγ* + ω

Perform the hypothesis tests:

Model A vs. C: H0: γ* = 0; H1: γ* ≠ 0
Model B vs. C: H0: β* = 0; H1: β* ≠ 0
If X and Z overlap, then the above test is modified as follows: Let W = X∩Z ≠ ∅. Define X* = X\Z, Z* = Z\X. Then,
Model A: Y = Xβ + ε
Model B: Y = Zγ + υ
Model C: Y = X*β* + Z*γ* + Wδ + ω

Perform the hypothesis tests:

Model A vs. C: H0: γ* = 0; H1: γ* ≠ 0
Model B vs. C: H0: β* = 0; H1: β* ≠ 0
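For the non-overlapping case above, a sketch of the encompassing test (hypothetical names; an F test of the zero restrictions on the coefficients belonging to the other model):

import numpy as np
from scipy import stats

def encompassing_test(y, X, Z):
    # Model A vs. C: fit Model C on [X Z] and test H0: gamma* = 0.
    # X and Z are assumed not to share columns (otherwise use X*, Z*, W as above).
    N = len(y)
    XZ = np.hstack([X, Z])
    K = XZ.shape[1]
    J = Z.shape[1]                                        # number of restrictions

    eC = y - XZ @ np.linalg.lstsq(XZ, y, rcond=None)[0]   # unrestricted (Model C)
    eA = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]     # restricted (Model A)
    rssC, rssA = eC @ eC, eA @ eA

    F = ((rssA - rssC) / J) / (rssC / (N - K))            # standard F statistic
    p_value = stats.f.sf(F, J, N - K)
    return F, p_value

# Swapping the roles of X and Z gives the Model B vs. C test (H0: beta* = 0).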
The alternative Davidson-MacKinnon approach (J test) formulates the augmented models as follows:

Model A: Y = Xβ + ε
Model B: Y = Zγ + υ

Model A vs. C, where Model C: Y = Xβ* + YBpγ* + ω and YBp is the vector of fitted values of Y from Model B:
H0: γ* = 0; H1: γ* ≠ 0
Model B vs. C, where Model C: Y = Zγ* + YApβ* + ω and YAp is the vector of fitted values of Y from Model A:
H0: β* = 0; H1: β* ≠ 0
Consider two competing (nonnested) model specifications in terms of their likelihood functions:
H0 (Model A): f(Y|X,β)
H1 (Model B): g(Y|Z,γ)
Suppose there is a "true" (unknown) model defined by the likelihood function h(Y|W,δ). Which of the known models (A or B) is closer to the "true" model?
Kullback-Leibler Information Criterion (KLIC) measures the distance between the "true" model and a hypothesized model (A or B) in terms of the likelihood function. That is,
KLIC0 = E(ln h(Y|W,δ)|h is true) - E(ln f(Y|X,β)|h is true) > 0
KLIC1 = E(ln h(Y|W,δ)|h is true) - E(ln g(Y|Z,γ)|h is true) > 0
The hypothesis testing is
H0 (Model A is better): KLIC0 < KLIC1
H1 (Model B is better): KLIC1 < KLIC0
Vuong Test
Consider a sample of N observations, i=1,2,...,N, denote lli,0 = ln f(Yi|Xi,β) and lli,1 = ln g(Yi|Zi,γ). Define the likelihood ratio lri = lli,0-lli,1. Then the statistic
KLIC1-KLIC0 = E(ln f(Y|X,β)|h is true) - E(ln g(Y|Z,γ)|h is true)
is estimated by (1/N) ∑i=1,2,...,N (lli,0 - lli,1) = (1/N) ∑i=1,2,...,N lri.
Vuong (1989) showed that
V = √N(lrm/lrse) ~a Normal(0,1) if Model A and B are equivalent.
where lrm = (1/N) ∑i=1,2,...,N lri and lrse = [(1/N) ∑i=1,2,...,N (lri - lrm)2]½ are the sample mean and standard deviation of lri, respectively.
Therefore, at the 5% level of significance, if V > 1.96 then Model A is the better model; if V < -1.96 then Model B is better. Otherwise, the test cannot discriminate between the two models.
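A sketch of the Vuong statistic for two linear models with normal likelihoods (hypothetical names; lli0 and lli1 are the per-observation log-likelihoods of Models A and B):

import numpy as np
from scipy import stats

def normal_loglik_per_obs(y, X):
    # Per-observation normal log-likelihood evaluated at the OLS/ML estimates
    e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    s2 = (e @ e) / len(y)                       # ML estimate of the error variance
    return -0.5 * (np.log(2 * np.pi * s2) + e ** 2 / s2)

def vuong_test(lli0, lli1):
    N = len(lli0)
    lr = lli0 - lli1                            # lr_i = ll_i,0 - ll_i,1
    lr_mean = lr.mean()
    lr_sd = lr.std()                            # [1/N sum (lr_i - lr_mean)^2]^(1/2)
    V = np.sqrt(N) * lr_mean / lr_sd            # ~ N(0,1) if the models are equivalent
    p_value = 2 * stats.norm.sf(abs(V))         # two-sided p-value
    return V, p_value

# Example: V, p = vuong_test(normal_loglik_per_obs(y, X), normal_loglik_per_obs(y, Z));
# V > 1.96 favors Model A, V < -1.96 favors Model B at the 5% level.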