E[m(X,θ)] = 0.
We can think of m(X,θ) = ∂ll(θ|X)/∂θ with E[∂ll(θ|X)/∂θ] = 0, where ll(θ|X) is the log-likelihood function with unknown parameters θ introduced earlier. Moments describe the characteristics of a distribution, and much of statistical estimation focuses on moment functions (or orthogonality conditions). Further, a function of moments is also a moment. A general class of estimators based on moment functions has been shown to exhibit desirable asymptotic (large-sample) properties. By allowing the moment functions to depend on unknown parameters, Generalized Method of Moments (GMM) estimators have attracted considerable attention in empirical research. Denote the N by L matrix of the parameterized moment functions as:
m(X,θ) = [mj(Xi,θ),i=1,2,...,N,j=1,2,...,L]
where Xi is a sample observation of the random variable and θ is the vector of unknown parameters. GMM estimation is based on the sample mean of the moment functions:
m(θ) = 1/N ∑i=1,2,...,N m(Xi,θ)' = 0
If there are K parameters (i.e., θ = (θ1, θ2, ..., θK)'), we need at least as many moment equations to estimate the parameters (i.e., m(θ) = (m1(θ), m2(θ), ..., mL(θ))', L ≥ K). The optimization problem is to minimize the quadratic function:
Q(θ) = m(θ)'W m(θ)
where W is a positive definite weighting matrix. Optimally, W is chosen to be the inverse of a consistent estimate of the covariance matrix of m(θ). That is, W = W(θ) = [Var(m(θ))]-1 and
Var(m(θ)) = 1/N2 ∑∑i,j=1,2,...,N m(Xi,θ)'m(Xj,θ)
To ensure the positive definiteness of this matrix, assumptions on the structure of autocovariances may be necessary. For example,
Var(m(θ)) = S0(θ) + ∑j=1,2,...,p (1 - j/(p+1)) (Sj(θ) + Sj(θ)')
S0(θ) = m(θ)m(θ)' = 1/N2 ∑i=1,2,...,N m(Xi,θ)'m(Xi,θ)
Sj(θ) = m(θ)m-j(θ)' = 1/N2 ∑i=j+1,...,N m(Xi,θ)'m(Xi-j,θ), j = 1, ..., p < N.
where p is the number of autocovariance lags assumed in the model. This is the White-Newey-West estimator of Var(m(θ)), which guarantees positive definiteness by down-weighting higher-order autocovariances. The lag weights used are p/(p+1), (p-1)/(p+1), ..., 1/(p+1) for a given p.
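For illustration, a minimal Python/NumPy sketch of this estimator might look as follows (the function name newey_west_cov and the array layout are choices made here, not part of the text):

import numpy as np

def newey_west_cov(m, p):
    # White-Newey-West estimate of Var(m(theta)).
    # m: (N, L) array whose i-th row is m(X_i, theta); p: autocovariance lags, p < N.
    # The 1/N^2 scaling follows the text, so the result estimates the variance of the sample mean.
    N = m.shape[0]
    S = m.T @ m / N**2                    # S0: contemporaneous term
    for j in range(1, p + 1):
        w = 1.0 - j / (p + 1.0)           # Bartlett lag weight (p/(p+1), ..., 1/(p+1))
        Sj = m[j:].T @ m[:-j] / N**2      # j-th order autocovariance Sj
        S += w * (Sj + Sj.T)
    return S

With p = 0 the estimate reduces to S0 alone.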
Typically, GMM estimation starts with the special case of W = I (the identity matrix). In other words, we find the consistent estimator θ0 of θ that minimizes the quadratic function: Q(θ) = m(θ)'m(θ), with the associated asymptotic covariance matrix:
Var(θ0) = [G(θ0)'G(θ0)]-1G(θ0)'[Var(m(θ0))] G(θ0) [G(θ0)'G(θ0)]-1
where G(θ0) = ∂m(θ0)/∂θ is the L by K matrix of derivatives. With the initial estimates θ0, let W = W(θ0) = [Var(m(θ0))]-1 and then minimize the quadratic function:
Q(θ) = m(θ)'W m(θ)
The asymptotic covariance matrix for the resulting GMM estimator θ1 of θ is:
Var(θ1) = [G(θ1)'W G(θ1)]-1G(θ1)'W [Var(m(θ1))] WG(θ1) [G(θ1)'W G(θ1)]-1
Updating the weighting matrix W = W(θ1) = [Var(m(θ1))]-1 and reiterating the optimization process until convergence, the final GMM estimator θ* of θ is obtained with the following asymptotic covariance matrix:
Var(θ*) = [G(θ*)'W(θ*)G(θ*)]-1
We note that convergence of the above iterations is not necessary for a consistent GMM estimator of θ. However, for an asymptotically efficient estimator θ*, the convergent optimal weighting matrix W = W(θ*) must be used. That is, θ* is asymptotically normally distributed with mean θ and covariance Var(θ*). The value of the quadratic function Q at the optimal solution θ*:
Q* = Q(θ*) = m(θ*)'W(θ*)m(θ*)
serves as the basis for hypothesis testing of moment restrictions. If there are L moment equations with K parameters (L > K), Q* follows a Chi-square distribution with L-K degrees of freedom.
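A schematic sketch of the iterative procedure, under the assumption that the user supplies a function moments(theta) returning the N by L matrix of moment functions and reusing the newey_west_cov sketch above, could be written as follows (scipy.optimize.minimize is used only for illustration):

import numpy as np
from scipy.optimize import minimize

def iterative_gmm(moments, theta0, p=0, maxit=20, tol=1e-8):
    # Start with W = I, estimate theta, update W = [Var(m(theta))]^-1, and repeat.
    mbar = lambda t: moments(t).mean(axis=0)              # sample moments m(theta)
    theta = np.asarray(theta0, dtype=float)
    W = np.eye(moments(theta).shape[1])
    for _ in range(maxit):
        Q = lambda t: mbar(t) @ W @ mbar(t)               # quadratic criterion Q(theta)
        theta_new = minimize(Q, theta, method="Nelder-Mead").x
        W = np.linalg.inv(newey_west_cov(moments(theta_new), p))
        converged = np.max(np.abs(theta_new - theta)) < tol
        theta = theta_new
        if converged:
            break
    return theta, W

At the returned solution, the criterion value m(θ*)'W m(θ*) can be compared with the Chi-square(L-K) critical value as a test of the over-identifying restrictions.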
For example, consider estimating the parameters λ and ρ of a gamma distribution from a random sample X1, ..., XN. Four moment functions of the unknown parameters θ = (λ,ρ) are:
m1(λ,ρ) = 1/N ∑i=1,2,...,N [Xi - ρ/λ]
m2(λ,ρ) = 1/N ∑i=1,2,...,N [Xi2 - ρ(ρ+1)/λ2]
m3(λ,ρ) = 1/N ∑i=1,2,...,N [ln(Xi) - dlnΓ(ρ)/dρ + ln(λ)]
m4(λ,ρ) = 1/N ∑i=1,2,...,N [1/Xi - λ/(ρ-1)]
The GMM estimator of θ = (λ,ρ) is obtained by minimizing the weighted sum of squares (or distance):
Q(θ) = m(θ)'W m(θ)
where m(θ) = (m1(θ), m2(θ), m3(θ), m4(θ))' and W is a positive definite symmetric matrix. Conditional on the weighting scheme W, the estimated variance-covariance matrix of θ is
Var(θ) = [G(θ)'W G(θ)]-1 G(θ)'W [Var(m(θ))] WG(θ) [G(θ)'W G(θ)]-1
If W = [Var(m(θ))]-1, the inverse of covariance matrix of m(θ), then Var(θ) = [G(θ)'W G(θ)]-1.
GMM includes the maximum likelihood estimator as a special case. Based on the gamma distribution, the log-likelihood function is:
ll(θ|X) = N [ρln(λ) - lnΓ(ρ)] - λ ∑i=1,2,...,N Xi + (ρ-1) ∑i=1,2,...,N ln(Xi)
Solving the score equations of the above log-likelihood function:
∂ll/∂λ = Nρ/λ - ∑i=1,2,...,N Xi = 0
∂ll/∂ρ = Nln(λ) - N dlnΓ(ρ)/dρ + ∑i=1,2,...,N ln(Xi) = 0
It is clear that the maximum likelihood estimate of θ = (λ,ρ) is an exactly identified GMM estimate with m(θ) = (m1(θ), m3(θ))'. The weighting matrix W is irrelevant in the exactly identified case, and the GMM criterion is exactly zero at the solution.
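As a numerical illustration, the four gamma moment functions above can be stacked into the required N by 4 matrix and handed to an iterative GMM routine such as the one sketched earlier (hypothetical code; the data array x and the starting values are placeholders, and scipy.special.digamma supplies dlnΓ(ρ)/dρ):

import numpy as np
from scipy.special import digamma

def gamma_moments(theta, x):
    # N by 4 matrix of moment functions for the gamma(lambda, rho) example.
    lam, rho = theta
    return np.column_stack([
        x - rho / lam,                            # m1: E[X] = rho/lambda
        x**2 - rho * (rho + 1) / lam**2,          # m2: E[X^2] = rho(rho+1)/lambda^2
        np.log(x) - digamma(rho) + np.log(lam),   # m3: E[ln X] = dlnGamma(rho)/drho - ln(lambda)
        1.0 / x - lam / (rho - 1),                # m4: E[1/X] = lambda/(rho-1), for rho > 1
    ])

# Using only (m1, m3) gives the exactly identified case that reproduces maximum
# likelihood; using all four moments (L = 4 > K = 2) gives an over-identified
# GMM estimator, e.g.:
# theta_star, W = iterative_gmm(lambda t: gamma_moments(t, x), theta0=[0.5, 2.0])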
For notational convenience, let X be a combined data matrix of endogenous (dependent) and predetermined (independent or explanatory) variables in the model, and let β be a K-element vector of unknown parameters. Suppose there are L moment equations, m(X,β) = (m1(X,β), ..., mL(X,β)), where L ≥ K. The formulation is not limited to a single equation; generalization to a system of linear or nonlinear equations is straightforward. As before, GMM estimation is based on the sample mean of the moment functions:
m(β) = 1/N ∑i=1,2,...,N m(Xi,β)' = 0
Assuming p-th order autocovariances, the well-known White-Newey-West estimator of the covariance matrix of the sample moments is
Var(m(β)) = S0 + ∑j=1,2,...,p (1 - j/(p+1)) (Sj + Sj')
S0 = m(β)m(β)' = 1/N2 ∑i=1,2,...,N m(Xi,β)'m(Xi,β)
Sj = m(β)m-j(β)' = 1/N2 ∑i=j+1,...,N m(Xi,β)'m(Xi-j,β), j = 1, ..., p < N.
The GMM estimator of β is obtained by minimizing the quadratic criterion function:
Q(β) = m(β)'W m(β)
Optimally, W is chosen to be the inverse of a consistent estimator of the asymptotic covariance matrix of m(β). That is, W = W(β) = [Var(m(β))]-1.
Although computationally challenging, it is possible to directly minimize:
Q(β) = m(β)'[Var(m(β))]-1m(β)
The asymptotic covariance matrix of the resulting efficient GMM estimator β* is:
Var(β*) = [G(β*)'[Var(m(β*))]-1G(β*)]-1
The asymptotic efficient estimator β* is normally distributed with mean β and covariance matrix Var(β*).
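As a concrete special case, when the moment functions are the products of L instruments with a single-equation residual, the N by L moment matrix can be formed directly. The sketch below assumes hypothetical data arrays y (dependent variable), x (regressors, possibly endogenous), and z (instruments or predetermined variables):

import numpy as np

def iv_moments(beta, y, x, z):
    # m(X_i, beta) = Z_i' e_i(beta) for a linear single-equation model,
    # with y: (N,), x: (N, K), z: (N, L), L >= K.
    e = y - x @ beta               # residuals e_i(beta)
    return z * e[:, None]          # row i is e_i * Z_i; its sample mean is m(beta)

# The same iterative routine applies:
# beta_star, W = iterative_gmm(lambda b: iv_moments(b, y, x, z),
#                              theta0=np.zeros(x.shape[1]))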
Consider testing a set of (linear or nonlinear) restrictions on the parameters, written as c(β) = 0, where β* denotes the unconstrained GMM estimator and b* the constrained estimator obtained by minimizing Q(β) subject to c(β) = 0. The following test statistics can be used. The Wald test is based on the unconstrained estimator β*:
W = c(β*)'[Var(c(β*))]-1c(β*)
  = c(β*)' {(∂c(β*)/∂β) [Var(β*)] (∂c(β*)/∂β)'}-1 c(β*)
The Lagrangian multiplier (LM) test is based on the constrained estimator b*. Let α = m(b*)'W G(b*), where G(b*) = ∂m(b*)/∂β. Then
LM = α[Var(α)]-1α'
   = m(b*)'W G(b*)[G(b*)'W G(b*)]-1G(b*)'W m(b*)
The likelihood-ratio-type (LR) test is based on the difference of the criterion values:
LR = Q(b*) - Q(β*)
Both β* and b* are computed using the same consistent estimator of the weighting matrix W.
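A schematic computation of the three statistics, using simple forward-difference derivatives, is sketched below (hypothetical code; mbar is the sample moment function m(β), c is the restriction function with c(β) = 0 under the null, beta_star and b_star are arrays holding the unconstrained and constrained estimates, and W is the common optimal weighting matrix, so that Var(β*) = [G'WG]-1):

import numpy as np

def jacobian(f, x, h=1e-6):
    # Forward-difference Jacobian of a vector-valued function f at x.
    x = np.asarray(x, dtype=float)
    f0 = np.atleast_1d(f(x))
    J = np.empty((f0.size, x.size))
    for k in range(x.size):
        xk = x.copy()
        xk[k] += h
        J[:, k] = (np.atleast_1d(f(xk)) - f0) / h
    return J

def gmm_tests(mbar, c, beta_star, b_star, W):
    # Wald, LM, and LR-type statistics for the restrictions c(beta) = 0.
    G = jacobian(mbar, beta_star)                      # G(beta*), L by K
    var_beta = np.linalg.inv(G.T @ W @ G)              # Var(beta*) under optimal W
    C = jacobian(c, beta_star)                         # dc(beta*)/dbeta
    cv = np.atleast_1d(c(beta_star))
    wald = cv @ np.linalg.inv(C @ var_beta @ C.T) @ cv

    Gb = jacobian(mbar, b_star)                        # G(b*) at the constrained estimate
    mb = mbar(b_star)
    lm = mb @ W @ Gb @ np.linalg.inv(Gb.T @ W @ Gb) @ Gb.T @ W @ mb

    lr = mb @ W @ mb - mbar(beta_star) @ W @ mbar(beta_star)   # Q(b*) - Q(beta*)
    return wald, lm, lr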
As an example, consider a consumer who maximizes the expected discounted stream of lifetime utility:
∑τ=0,...,∞ βτ E{u(Ct+τ) | Zt}
where Zt is the information available to the consumer at time t, 0 < β < 1 is the discount factor of time preference, and Ct is consumption at time t. Given N different stocks, the optimal consumption-investment plan satisfies
u'(Ct) = β E{u'(Ct+1) [(Pi,t+1+Di,t+1)/Pi,t] | Zt}, for i = 1,...,N.
where u'(Ct) = ∂u/∂Ct is the marginal utility of consumption, Pi,t+1 is the price of stock i at time t+1, and Di,t+1 is the dividend per share of stock i at time t+1. The ratio (Pi,t+1+Di,t+1)/Pi,t is the return on investment in stock i between periods t and t+1. Assume that the utility function exhibits constant relative risk aversion:
u(Ct) = Ctα/α for α<1.
Then, for each i = 1, ..., N, the decision rule is
Ctα-1 = β E{Ct+1α-1 [(Pi,t+1+Di,t+1)/Pi,t] | Zt}
Equivalently, for each stock i = 1, ..., N, we must have
β E{[(Ct+1/Ct)α-1] [(Pi,t+1+Di,t+1)/Pi,t] | Zt} = 1
The hypothesis of rational expectations implies that the error of this intertemporal decision should be uncorrelated with the information available at the time the decision is made. Therefore, the derived orthogonality condition for each stock i = 1, ..., N is:
E{Zt (β[(Ct+1/Ct)α-1] [(Pi,t+1+Di,t+1)/Pi,t] - 1)} = 0
For a more detailed description of the model, see L. P. Hansen and K. J. Singleton (1982). For a computational implementation of the model, see the Hansen-Heaton-Ogaki GMM package.
Let X = [X1,X2,X3] and θ = (β,α), and define
ε(X,θ) = (βX1α-1X2 - 1, βX1α-1X3 - 1)'
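Comparing ε(X,θ) with the orthogonality condition above, X1 plays the role of the consumption ratio Ct+1/Ct and X2, X3 are the two asset returns. A hypothetical sketch of the corresponding moment matrix (with z an N by J matrix of instruments drawn from the information set Zt, e.g. a constant and lagged values) is:

import numpy as np

def euler_moments(theta, x1, x2, x3, z):
    # Orthogonality conditions E{ Z_t (beta (C_{t+1}/C_t)^(alpha-1) R_{t+1} - 1) } = 0.
    beta, alpha = theta
    eps1 = beta * x1**(alpha - 1) * x2 - 1.0       # Euler equation error, asset 1
    eps2 = beta * x1**(alpha - 1) * x3 - 1.0       # Euler equation error, asset 2
    # Interact each error with every instrument: an N by 2J moment matrix.
    return np.hstack([z * eps1[:, None], z * eps2[:, None]])

# theta = (beta, alpha) can then be estimated with the iterative routine sketched earlier:
# theta_star, W = iterative_gmm(lambda t: euler_moments(t, x1, x2, x3, z),
#                               theta0=[0.99, 0.5])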
The data file GMMQ.TXT (from 1/59 to 12/78, not the original Hansen-Singleton data) consists of three variables: