$$E(\varepsilon \mid X) = 0$$
That is, $E(\varepsilon_i \mid X) = 0$, $i = 1, 2, \dots, N$. This implies:
There are many occasions, such as omitted variables or errors in the explanatory variables, that result in a clear violation of Assumption 3 of the classical linear regression model. As a consequence, the least squares estimator is biased, inconsistent, and inefficient.
The estimation consists of two steps as follows:
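$$b = (X_p'X_p)^{-1}X_p'Y = [X'Z(Z'Z)^{-1}Z'X]^{-1}X'Z(Z'Z)^{-1}Z'Y$$
where $X_p = Z(Z'Z)^{-1}Z'X$ denotes the fitted values from regressing $X$ on the instruments $Z$.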
It is clear that the selection of instrumental variables is crucial for successful estimation of the model parameters. In practice, in addition to the replacements for the endogenous explanatory variables, the instrumental variables include the exogenous explanatory variables already in the model. The instrumental variable (IV) estimation is therefore summarized as:
Define $W = Z(Z'Z)^{-1}Z'X$, and note that
$$W'X = W'W.$$
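The identity holds because $Z(Z'Z)^{-1}Z'$ is a symmetric, idempotent projection matrix. A short derivation, writing $P_Z$ for that projection (a symbol introduced here for convenience, not used elsewhere in these notes):

```latex
\begin{aligned}
P_Z &= Z(Z'Z)^{-1}Z', \qquad P_Z' = P_Z, \qquad P_Z P_Z = P_Z, \\
W'W &= (P_Z X)'(P_Z X) = X'P_Z'P_Z X = X'P_Z X = (P_Z X)'X = W'X.
\end{aligned}
```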
$$b = (W'X)^{-1}W'Y = [X'Z(Z'Z)^{-1}Z'X]^{-1}X'Z(Z'Z)^{-1}Z'Y$$
$$\mathrm{Var}(b) = s^2(W'X)^{-1} = s^2[X'Z(Z'Z)^{-1}Z'X]^{-1}$$
where $s^2 = e'e/(N-K)$ and $e = Y - Xb$.
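The formulas above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `iv_2sls`, the simulated data, and the coefficient values are all assumptions made for the example. Note that the residuals are computed from the original $X$, not from the projected $W$.

```python
import numpy as np

def iv_2sls(Y, X, Z):
    """IV estimate b = (W'X)^{-1} W'Y and Var(b) = s2 (W'X)^{-1}, W = Z(Z'Z)^{-1}Z'X."""
    N, K = X.shape
    # Project X on the instruments: W = Z (Z'Z)^{-1} Z'X.
    W = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
    WX = W.T @ X
    b = np.linalg.solve(WX, W.T @ Y)
    e = Y - X @ b                  # residuals use the original X
    s2 = (e @ e) / (N - K)
    var_b = s2 * np.linalg.inv(WX)
    return b, var_b

# Simulated data (hypothetical): x is endogenous because it shares u with eps.
rng = np.random.default_rng(0)
N = 5000
z1, z2, u = rng.normal(size=(3, N))
eps = 0.8 * u + rng.normal(size=N)
x = 1.0 * z1 + 0.5 * z2 + u
Y = 1.0 + 2.0 * x + eps

X = np.column_stack([np.ones(N), x])        # constant + endogenous regressor
Z = np.column_stack([np.ones(N), z1, z2])   # constant + excluded instruments

b_iv, var_iv = iv_2sls(Y, X, Z)
b_ols = np.linalg.solve(X.T @ X, X.T @ Y)   # biased here, for comparison
print("IV :", b_iv, "std. errors:", np.sqrt(np.diag(var_iv)))
print("OLS:", b_ols)
```

In this simulated design the OLS slope is pulled away from the true value of 2 by the correlation between `x` and `eps`, while the IV slope should land close to it.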
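$$Y = X\beta + \varepsilon, \qquad \varepsilon \mid Z \sim \text{iid}(0, \Omega), \qquad \Omega \neq \sigma^2 I$$
$$b = (X_p'X)^{-1}X_p'Y = [X'Z(Z'Z)^{-1}Z'X]^{-1}X'Z(Z'Z)^{-1}Z'Y$$
$$\mathrm{Var}(b) = (X_p'X)^{-1}(X_p'\Omega X_p)(X_p'X)^{-1}$$
where
$$X_p'\Omega X_p = X'Z(Z'Z)^{-1}(Z'\Omega Z)(Z'Z)^{-1}Z'X$$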
Estimating Var(b) requires a consistent estimator of Z'ΩZ = Z'E(εε')Z. A robust estimate of the variance-covariance matrix can be based on the Newey-West estimator, which allows for general heteroscedasticity and autocorrelation. That is,
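$$\Sigma = Z'\Omega Z = S_0 + \sum_{j=1}^{J}\left[1 - \frac{j}{J+1}\right](S_j + S_j')$$
where
$$S_0 = \frac{1}{N}\sum_{i=1}^{N} e_i^2\, z_i z_i'$$
$$S_j = \frac{1}{N}\sum_{i=j+1}^{N} e_i e_{i-j}\, z_i z_{i-j}'$$
Note: $e_i = Y_i - X_i b$, $i = 1, \dots, N$.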
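As a rough sketch, the Newey-West sum can be coded directly from the definitions of S0 and Sj. The function name `newey_west_sigma` and the random inputs are hypothetical. The 1/N scaling is taken as written above; whether the result must be rescaled before it is plugged into Var(b) depends on a convention the notes do not spell out.

```python
import numpy as np

def newey_west_sigma(e, Z, J):
    """S0 + sum_{j=1..J} [1 - j/(J+1)] (Sj + Sj'), with the 1/N scaling used in the text."""
    N = Z.shape[0]
    Ze = Z * e[:, None]               # row i holds e_i * z_i'
    sigma = Ze.T @ Ze / N             # S0
    for j in range(1, J + 1):
        # Sj = (1/N) * sum over i = j+1..N of e_i e_{i-j} z_i z_{i-j}'
        Sj = Ze[j:].T @ Ze[:-j] / N
        sigma += (1.0 - j / (J + 1)) * (Sj + Sj.T)
    return sigma

# Hypothetical inputs, only to exercise the function.
rng = np.random.default_rng(1)
Z = rng.normal(size=(200, 3))
e = rng.normal(size=200)
sigma0 = newey_west_sigma(e, Z, J=0)  # heteroscedasticity only
sigma4 = newey_west_sigma(e, Z, J=4)  # autocorrelation up to lag 4
print(sigma4)
```

With J = 0 the loop is skipped and the result reduces to S0, the heteroscedasticity-only case.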
$$E(\varepsilon \mid Z) = 0$$
This implies the moment conditions $E(Z_i\varepsilon_i) = E(Z_i(Y_i - X_i\beta)) = 0$. The GMM estimator of $\beta$ minimizes the objective function, which is a weighted quadratic form of the sample moment functions:
$$Q(\beta) = (Z'\varepsilon/N)'\,W\,(Z'\varepsilon/N)$$
where $W$ is a symmetric positive definite weighting matrix (not the projected regressors $W$ defined earlier). We have,
$$b = [X'ZWZ'X]^{-1}X'ZWZ'Y$$
$$\mathrm{Var}(b) = [X'ZWZ'X]^{-1}[X'Z(W\Sigma W')Z'X][X'ZWZ'X]^{-1}$$
$$\Sigma = E(Z'\varepsilon\varepsilon'Z) = Z'\Omega Z = S_0 + \sum_{j=1}^{J}\left[1 - \frac{j}{J+1}\right](S_j + S_j')$$
as defined above, allowing for general heteroscedasticity and autocorrelation up to the J-th order.
If $W = \Sigma^{-1}$ (the optimal weighting matrix), then it is the optimal or efficient GMM estimator:
$$b = [X'Z\Sigma^{-1}Z'X]^{-1}X'Z\Sigma^{-1}Z'Y$$
$$\mathrm{Var}(b) = [X'Z\Sigma^{-1}Z'X]^{-1}$$
If $W = (Z'Z)^{-1}$, then it is the IV estimator.
If $W = I$, then it is the Minimum Distance (MD) estimator.
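To make the role of the weighting matrix concrete, the sketch below evaluates the closed-form GMM estimator for the three choices just listed. The function name `linear_gmm` and the simulated data are assumptions for illustration. For the efficient case, $\Sigma$ is approximated by S0 alone (J = 0, heteroscedasticity only), computed from first-step IV residuals.

```python
import numpy as np

def linear_gmm(Y, X, Z, W):
    """b = [X'Z W Z'X]^{-1} X'Z W Z'Y for a symmetric positive definite W."""
    XZ = X.T @ Z
    return np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ (Z.T @ Y))

# Simulated over-identified model (hypothetical): 2 parameters, 3 instruments.
rng = np.random.default_rng(0)
N = 5000
z1, z2, u = rng.normal(size=(3, N))
eps = 0.8 * u + rng.normal(size=N)
x = 1.0 * z1 + 0.5 * z2 + u
Y = 1.0 + 2.0 * x + eps
X = np.column_stack([np.ones(N), x])
Z = np.column_stack([np.ones(N), z1, z2])

# W = (Z'Z)^{-1}: the IV estimator.
b_iv = linear_gmm(Y, X, Z, np.linalg.inv(Z.T @ Z))
# W = I: the minimum distance estimator.
b_md = linear_gmm(Y, X, Z, np.eye(Z.shape[1]))
# W = Sigma^{-1}: efficient GMM, with Sigma estimated by S0 from IV residuals.
e = Y - X @ b_iv
Ze = Z * e[:, None]
S0 = Ze.T @ Ze / N
b_opt = linear_gmm(Y, X, Z, np.linalg.inv(S0))

print("IV :", b_iv)
print("MD :", b_md)
print("GMM:", b_opt)
```

All three are consistent under $E(\varepsilon \mid Z) = 0$, so in a sample this large they should agree closely; they differ in efficiency, not in what they estimate.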
Is a specific regressor endogenous? Given the null hypothesis that one (or more) of the X's is exogenous (in which case instrumental variables are not needed for it), the Durbin-Wu-Hausman (DWH) test is formulated as follows:
It is clear that the more strongly Z is correlated with X, the more precise the instrumental variable estimator is. Although the instrumental variable estimator is consistent, it is not unbiased. The extent of the bias depends on the quality or validity of the instrumental variables used to remove endogeneity in the explanatory variables. The more good instruments there are in Z, the better the model estimates; however, not all candidate instruments are valid.
How many good instruments are enough? We need to verify that the extra or excluded instruments (those beyond the exogenous X's) do not violate the assumption $E(\varepsilon \mid Z) = 0$, which is the null hypothesis. Therefore, a simple Hausman test for over-identification is performed as follows:
Many variations of over-identification tests are available in the literature depending on the estimation methods (LIML, GMM, etc.) and consideration of robustness of the estimators.
The use of instrumental variables must be justified by first checking the correlations of the instruments with the endogenous variables. A good fit in the first stage of 2SLS is required. In addition, a reasonable partial R-square for the endogenous variables ($X_2$, after controlling for the effects of the exogenous variables $X_1$) is expected.
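One way to compute these first-stage diagnostics is to compare the regression of an endogenous variable on $X_1$ alone with its regression on $X_1$ plus the excluded instruments. The sketch below is an assumption-laden illustration: the function name `first_stage_diagnostics`, the argument `Z2` for the excluded instruments, and the simulated data are all hypothetical, and it handles a single endogenous variable.

```python
import numpy as np

def first_stage_diagnostics(x2, X1, Z2):
    """Partial R-square and F statistic of excluded instruments Z2 for x2, controlling for X1."""
    N = len(x2)

    def resid(y, A):
        # Least squares residuals of y on the columns of A.
        return y - A @ np.linalg.lstsq(A, y, rcond=None)[0]

    e_r = resid(x2, X1)                             # restricted: X1 only
    e_u = resid(x2, np.column_stack([X1, Z2]))      # unrestricted: X1 and Z2
    rss_r, rss_u = e_r @ e_r, e_u @ e_u
    q = Z2.shape[1]                                 # number of excluded instruments
    k = X1.shape[1] + q                             # first-stage regressors in total
    partial_r2 = 1.0 - rss_u / rss_r
    F = ((rss_r - rss_u) / q) / (rss_u / (N - k))
    return partial_r2, F

# Hypothetical data: one strongly and one weakly instrumented variable.
rng = np.random.default_rng(2)
N = 2000
z1, z2, u = rng.normal(size=(3, N))
X1 = np.ones((N, 1))                                # exogenous part: constant only
Z2 = np.column_stack([z1, z2])
x_strong = 1.0 * z1 + 0.5 * z2 + u
x_weak = 0.01 * z1 + u

r2_strong, F_strong = first_stage_diagnostics(x_strong, X1, Z2)
r2_weak, F_weak = first_stage_diagnostics(x_weak, X1, Z2)
print("strong:", r2_strong, F_strong)
print("weak  :", r2_weak, F_weak)
```

A partial R-square near zero, as in the weak case here, signals that the excluded instruments explain almost none of the variation in the endogenous variable once $X_1$ is controlled for.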
A more formal test for weak instruments can be found in J. H. Stock and M. Yogo, "Testing for Weak Instruments in Linear IV Regression," in Identification and Inference for Econometric Models: Essays in Honor of Thomas Rothenberg, ed. D. W. K. Andrews and J. H. Stock, 80-108, Cambridge University Press, 2005.
In brief, under the homoscedasticity assumption, their test evaluates the bias of IV relative to that of OLS and provides a measure of the size distortion of the Wald test (of the hypothesis that the parameters of the endogenous variables are zero) at the 5% level of significance.