Generalized Method of Moments

Nonlinear Generalized Method of Moments (GMM)

An alternative to maximum likelihood estimation of a probability distribution for a random variable is to formulate and estimate its moment functions. A moment function is defined as the expectation of some continuous vector-valued function m of a random variable X with parameter vector θ:

E[m(X,θ)] = 0.

We can think of m(X,θ) = ∂ll(θ|X)/∂θ with E(∂ll(θ|X)/∂θ) = 0, where ll(θ|X) is the log-likelihood function with unknown parameters θ introduced earlier. Moments are used to describe the characteristics of a distribution, and much of statistical estimation focuses on the problem of moment functions (or orthogonality conditions). Further, a function of moments is also a moment. A more general class of estimators based on moment functions has been shown to exhibit desirable asymptotic (large sample) properties. By allowing the moment functions to depend on unknown parameters, Generalized Method of Moments (GMM) estimators have attracted considerable attention in empirical research. Denote the N by L matrix of parameterized moment functions as:

m(X,θ) = [mj(Xi,θ),i=1,2,...,N,j=1,2,...,L]

Where Xi is a sample observation of the random variable and θ is the vector of unknown parameters. GMM estimation is based on the sample mean of the moment functions:

m(θ) = 1/N ∑i=1,2,...,N m(Xi,θ)' = 0

If there are K parameters (i.e., θ = (θ1, θ2, ..., θK)'), we need at least as many moment equations as parameters in order to identify and estimate them (i.e., m(θ) = (m1(θ), m2(θ), ..., mL(θ))', L ≥ K). The optimization problem is to minimize the quadratic function:

Q(θ) = m(θ)'W m(θ)

Where W is a positive definite weighting matrix. Optimally, it is chosen to be the inverse of a consistent estimate of the covariance matrix of m(θ). That is, W = W(θ) = [Var(m(θ))]-1 and

Var(m(θ)) = 1/N2 ∑∑i,j=1,2,...,N m(Xi,θ)'m(Xj,θ)

To ensure the positive definiteness of the matrix, assumptions on the structure of autocovariances may be necessary. For example,

Var(m(θ)) = S0(θ) + ∑j=1,2,...,p(1 - j/(p+1)) (Sj(θ) + Sj(θ)')
S0(θ) = m(θ)m(θ)' = 1/N2 ∑i=1,2,...,N m(Xi,θ)'m(Xi,θ)
Sj(θ) = m(θ)m-j(θ)' = 1/N2 ∑i=j+1,...,N m(Xi,θ)'m(Xi-j,θ)
j = 1, ..., p < N.

Where p is the number of autocovariance lags assumed in the model. This is the White-Newey-West estimator of Var(m(θ)), which guarantees positive definiteness by down-weighting higher-order autocovariances. The lag weights used are p/(p+1), (p-1)/(p+1), ..., 1/(p+1) for a given p.
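
To make the computation concrete, here is a minimal sketch in Python (assuming numpy is available and that the moment functions have been evaluated into an N by L matrix M, one row per observation); the function name and arguments are illustrative, not part of any particular package:

    import numpy as np

    def newey_west_moment_cov(M, p):
        """White-Newey-West estimate of Var(m(theta)).

        M : (N, L) array whose i-th row is m(X_i, theta).
        p : number of autocovariance lags assumed (p < N).
        """
        N = M.shape[0]
        # S0: contemporaneous term, 1/N^2 * sum_i m_i' m_i
        S = (M.T @ M) / N**2
        for j in range(1, p + 1):
            # Sj: j-th autocovariance, 1/N^2 * sum_{i=j+1}^{N} m_i' m_{i-j}
            Sj = (M[j:].T @ M[:-j]) / N**2
            S += (1.0 - j / (p + 1.0)) * (Sj + Sj.T)   # Bartlett lag weight
        return S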

Typically, GMM estimation starts with the special case of W = I (the identity matrix). In other words, we find the consistent estimator θ0 of θ that minimizes the quadratic function: Q(θ) = m(θ)'m(θ), with the associated asymptotic covariance matrix:

Var(θ0) = [G(θ0)'G(θ0)]-1G(θ0)'[Var(m(θ0))] G(θ0) [G(θ0)'G(θ0)]-1

Where G(θ0) = ∂m(θ0)/∂θ is the L by K matrix of derivatives. With the initial estimates θ0, let W = W(θ0) = [Var(m(θ0))]-1 and then minimize the quadratic function:

Q(θ) = m(θ)'W m(θ)

The asymptotic covariance matrix for the resulting GMM estimator θ1 of θ is:

Var(θ1) = [G(θ1)'W G(θ1)]-1G(θ1)'W [Var(m(θ1))] WG(θ1) [G(θ1)'W G(θ1)]-1

Updating the weighting matrix W = W(θ1) = [Var(m(θ1))]-1 and reiterating the optimization process until convergence, the final GMM estimator θ* of θ is obtained with the following asymptotic covariance matrix:

Var(θ*) = [G(θ*)'W(θ*)G(θ*)]-1

We note that convergence of the above iterations is not necessary for a consistent GMM estimator of θ. It is only for an asymptotically efficient estimator θ* that the convergent optimal weighting matrix W = W(θ*) must be used. That is, θ* is asymptotically normally distributed with mean θ and covariance Var(θ*). The value of the quadratic function Q at the optimal solution θ*:

Q* = Q(θ*) = m(θ*)'W(θ*)m(θ*)

serves as the basis for hypothesis testing of moment restrictions. If there are L moment equations with K parameters (L > K), Q* follows a Chi-square distribution with L-K degrees of freedom.
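
The iterative scheme above translates directly into code. The following Python sketch (assuming numpy and scipy, and reusing the newey_west_moment_cov helper sketched earlier) iterates the weighting matrix to convergence and reports Q* with its Chi-square p-value; the function names and default settings are illustrative only:

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import chi2

    def iterated_gmm(moments, theta0, p=0, iters=20, tol=1e-8):
        """moments(theta) must return the (N, L) matrix [m_j(X_i, theta)]."""
        theta = np.asarray(theta0, dtype=float)
        W = np.eye(moments(theta).shape[1])          # first step: W = I
        for _ in range(iters):
            def Q(t):                                # Q(theta) = m(theta)'W m(theta)
                mbar = moments(t).mean(axis=0)
                return mbar @ W @ mbar
            theta_new = minimize(Q, theta, method="Nelder-Mead").x
            converged = np.max(np.abs(theta_new - theta)) < tol
            theta = theta_new
            if converged:
                break
            W = np.linalg.inv(newey_west_moment_cov(moments(theta), p))
        # Q* tests the L - K overidentifying moment restrictions
        L, K = moments(theta).shape[1], theta.size
        mbar = moments(theta).mean(axis=0)
        Qstar = mbar @ W @ mbar
        return theta, Qstar, chi2.sf(Qstar, L - K)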

Example: Estimating Gamma Distribution

Returning to the example of estimating the gamma distribution of the INCOME variable, consider four sample moment functions of the gamma distribution:

m1(λ,ρ) = 1/N ∑i=1,2,...,N Xi - ρ/λ
m2(λ,ρ) = 1/N ∑i=1,2,...,N Xi2 - ρ(ρ+1)/λ2
m3(λ,ρ) = 1/N ∑i=1,2,...,N ln(Xi) - dlnΓ(ρ)/dρ + ln(λ)
m4(λ,ρ) = 1/N ∑i=1,2,...,N 1/Xi - λ/(ρ-1)

The GMM estimator of θ = (λ,ρ) is obtained by minimizing the weighted sum of squares (or distance):

Q(θ) = m(θ)'W m(θ)

Where m(θ) = (m1(θ), m2(θ), m3(θ), m4(θ))' and W is a positive definite symmetric matrix. Conditional on the weighting matrix W, the estimated variance-covariance matrix of θ is

Var(θ) = [G(θ)'W G(θ)]-1 G(θ)'W [Var(m(θ))] WG(θ) [G(θ)'W G(θ)]-1

If W = [Var(m(θ))]-1, the inverse of covariance matrix of m(θ), then Var(θ) = [G(θ)'W G(θ)]-1.

GMM includes the maximum likelihood estimator as a special case. The log-likelihood function based on the gamma distribution is:

ll(θ|X) = N [ρln(λ) - lnΓ(ρ)] - λ ∑i=1,2,...,N Xi + (ρ-1) ∑i=1,2,...,N ln(Xi)

Setting the scores of the above log-likelihood function to zero:

∂ll/∂λ = Nρ/λ - ∑i=1,2,...,N Xi = 0
∂ll/∂ρ = Nln(λ) - N dlnΓ(ρ)/dρ + ∑i=1,2,...,N ln(Xi) = 0

It is clear that the maximum likelihood estimate of θ = (λ,ρ) is an exactly identified GMM estimator with m(θ) = (m1(θ), m3(θ))'. The weighting matrix W is irrelevant in the exactly identified case, and the GMM criterion is exactly zero at the solution.
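
As an illustration, the four moment functions can be coded as follows and the estimator obtained with the iterated_gmm sketch above (assuming numpy and scipy; digamma from scipy.special computes dlnΓ(ρ)/dρ; the variable name income and the starting values are placeholders):

    import numpy as np
    from scipy.special import digamma

    def gamma_moments(theta, X):
        """The four sample moment functions of the gamma distribution,
        evaluated observation by observation (an N x 4 matrix)."""
        lam, rho = theta
        return np.column_stack([
            X - rho / lam,                           # m1: E(X) = rho/lambda
            X**2 - rho * (rho + 1) / lam**2,         # m2: E(X^2)
            np.log(X) - digamma(rho) + np.log(lam),  # m3: E(ln X)
            1.0 / X - lam / (rho - 1),               # m4: E(1/X), needs rho > 1
        ])

    # e.g., theta_hat, Qstar, pval = iterated_gmm(
    #     lambda t: gamma_moments(t, income), theta0=[0.5, 2.0])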


GMM Estimation of Econometric Models

When we apply GMM estimation to econometric models, it can be considered an extension of the instrumental variables (IV) estimation method. IV estimation is widely used for models with random regressors (e.g., a lagged dependent variable) and for simultaneous equations, in which regressors are correlated with the model errors. The advantage of GMM is that the model need not be homoscedastic and serially independent: heteroscedasticity and autocorrelation are taken into account through the estimated covariance matrix of the sample moments used in the GMM criterion function.

For notational convenience, let X be a combined data matrix of the endogenous (dependent) and predetermined (independent or explanatory) variables in the model, and let β be a K-element vector of unknown parameters. Suppose there are L moment equations, m(X,β) = (m1(X,β), ..., mL(X,β)), where L ≥ K. The model formulation is not limited to the single-equation case; generalization to a system of linear or nonlinear equations is straightforward.

Sample Moment Conditions

Corresponding to the moment conditions E(m(X,β)) = 0, we write the sample moment equations as follows:

m(β) = 1/N ∑i=1,2,...,N m(Xi,β)' = 0

Assuming p-th order autocovariances, the well-known White-Newey-West estimator of the covariance matrix of the sample moments is

Var(m(β)) = S0 + ∑j=1,2,...,p(1 - j/(p+1))(Sj + Sj')
S0 = m(β)m(β)' = 1/N2 ∑i=1,2,...,N m(Xi,β)'m(Xi,β)
Sj = m(β)m-j(β)' = 1/N2 ∑i=j+1,...,N m(Xi,β)'m(Xi-j,β)
j = 1,..., p < N.

GMM Criterion Function

Given a positive definite symmetric weighting matrix W, the goal is to minimize the quadratic function:

Q(β) = m(β)'W m(β)

Optimally, W is chosen to be the inverse of a consistent estimator of the asymptotic covariance matrix of m(β). That is, W = W(β) = [Var(m(β))]-1.

Although computationally challenging, it is possible to directly minimize:

Q(β) = m(β)'[Var(m(β))]-1m(β)

GMM Estimation

The GMM estimator β* of β is obtained by solving the zero gradient conditions: ∂Q(β*)/∂β = 0. Let G(β*) = ∂m(β*)/∂β, the L by K matrix of derivatives. The estimated variance-covariance matrix of β* is

Var(β*) = [G(β*)'[Var(m(β*))]-1G(β*)]-1

The asymptotically efficient estimator β* is normally distributed with mean β and covariance matrix Var(β*).

Nonlinear IV Estimation

Consider the model ε = ε(β) = F(Y,X,β) (or Y - f(X,β)), where Y is the endogenous or dependent variable and X consists of predetermined or independent variables. β is a K-element parameter vector. Suppose there is a set of L instrumental variables Z, with L ≥ K. The moment condition is E(Z'ε) = 0, under the general assumptions that E(ε) = 0 and Var(ε) = E(εε') = Σ = σ2Ω.

Linear IV Estimation

If the model is linear, or ε = ε(β) = Y - Xβ, then the GMM estimator of β is equivalent to the IV estimator. Minimizing Q(β) = (Y-Xβ)'Z W Z'(Y-Xβ) with W = [Z'ΣZ]-1 yields:

β* = [X'Z(Z'ΣZ)-1Z'X]-1X'Z(Z'ΣZ)-1Z'Y

Special Case

If the model is homoscedastic and serially uncorrelated, that is Σ = σ2I, then β* = [X'Z(Z'Z)-1Z'X]-1X'Z(Z'Z)-1Z'Y, the familiar two-stage least squares estimator. If Z is of the same dimension as X, then β* = (Z'X)-1Z'Y.
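
A minimal numpy sketch of this special case (the function name is illustrative; Y, X, Z are arrays of conformable dimensions):

    import numpy as np

    def iv_gmm_linear(Y, X, Z):
        """Linear IV/GMM under Sigma = sigma^2 I, i.e. the 2SLS estimator
        b = [X'Z(Z'Z)^-1 Z'X]^-1 X'Z(Z'Z)^-1 Z'Y.  When Z has the same
        dimension as X this reduces algebraically to (Z'X)^-1 Z'Y."""
        A = X.T @ Z @ np.linalg.inv(Z.T @ Z)     # X'Z(Z'Z)^-1
        return np.linalg.solve(A @ Z.T @ X, A @ Z.T @ Y)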

Hypothesis Testing

Based on the statistical inference for nonlinear regression models, there are three corresponding test statistics for testing the GMM estimator of β under J constraint equations expressed as c(β) = 0. Let β* be the unconstrained GMM estimator of β, and let b* be the constrained estimator. All three test statistics have a Chi-square distribution with J degrees of freedom.

Wald Test

W = c(β*)'[Var(c(β*))]-1c(β*)
= c(β*)' {(∂c(β*)/∂β) [Var(β*)] (∂c(β*)/∂β)'}-1 c(β*)

Lagrange Multiplier (LM) Test

If the constraints hold, then α = ∂Q(b*)/∂β = 2 m(b*)'W G(b*) → 0

where G(b*) = ∂m(b*)/∂β.

LM = α[Var(α)]-1α'
= m(b*)'W G(b*)[G(b*)'W G(b*)]-1G(b*)'W m(b*)

Likelihood Ratio (LR) Test

LR = Q(b*) - Q(β*)

Both β* and b* are computed using the same consistent estimator of the weighting matrix W.
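
For instance, the Wald statistic above can be computed in a few lines of numpy (assuming scipy for the p-value; the argument names are placeholders, where c returns the J constraint values and C_jac their J by K Jacobian):

    import numpy as np
    from scipy.stats import chi2

    def wald_test(c, C_jac, beta_star, var_beta):
        """Wald test of c(beta) = 0 at the unconstrained estimate beta*."""
        cval = c(beta_star)                      # J constraint values
        C = C_jac(beta_star)                     # J x K Jacobian dc/dbeta
        W = cval @ np.linalg.inv(C @ var_beta @ C.T) @ cval
        return W, chi2.sf(W, cval.size)          # statistic and p-value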


Application: A Nonlinear Rational Expectation Model

An important application of GMM estimation is to estimate the first-order conditions (or Euler equations) of a dynamic optimization problem. Suppose a representative consumer (stockholder) seeks to maximize a concave utility function of consumption over time:

∑τ=0,...,∞ βτ E{u(Ct+τ) | Zt}

Where Zt is the information available to the consumer at time t. 0 < β < 1 is the discount factor of time preference. Given N different stocks, the optimal consumption-investment plan is

u'(Ct) = β E{u'(Ct+1) [(Pi,t+1+Di,t+1)/Pi,t] | Zt}, for i = 1,...,N.

Where u'(Ct) = ∂u/∂Ct is the marginal utility of consumption, Pi,t+1 is the price of stock i at time t+1, and Di,t+1 is the dividend per share of stock i at t+1. The ratio (Pi,t+1+Di,t+1)/Pi,t represents the return on investment in stock i between periods t and t+1. Assume that the utility function takes the constant relative risk aversion form:

u(Ct) = Ctα/α     for α<1.

Then, for each i = 1, ..., N, the decision rule is

Ctα-1 = β E{Ct+1α-1 [(Pi,t+1+Di,t+1)/Pi,t] | Zt}

Equivalently, for each stock i = 1, ..., N, we must have

βE{[(Ct+1/Ct)α-1] [(Pi,t+1+Di,t+1)/Pi,t] | Zt} = 1

The hypothesis of rational expectations implies that the expectational error of the intertemporal decision is orthogonal to the information available at the time the decision is made. Therefore, the derived orthogonality condition for each i = 1, ..., N is:

E{Zt(β[(Ct+1/Ct)α-1] [(Pi,t+1+Di,t+1)/Pi,t] - 1)} = 0

For a more detailed description of the model, see L. P. Hansen and K. J. Singleton (1982). For a computational implementation of the model, see the Hansen-Heaton-Ogaki GMM package.

The Model

E(Zε(X,θ)) = 0, where

X = [X1,X2,X3], θ = (β,α), and
ε(X,θ) = (ε1(X,θ), ε2(X,θ))', where
ε1(X,θ) = βX1α-1X2 - 1
ε2(X,θ) = βX1α-1X3 - 1

The instrumental variables Z consist of a constant and lags of X. We note that this is a system of two nonlinear equations; a computational sketch of the moment functions follows the variable list below.

The data file GMMQ.TXT (from 1/59 to 12/78, not the original Hansen-Singleton data) consists of three variables:

  1. X1: Ratio of two-period consumption, Ct+1/Ct
  2. X2: Value-weighted returns of the NYSE stock market, (Pt+1+Dt+1)/Pt, where Pt+1 is the price and Dt+1 is the dividend payoff of the stock at t+1
  3. X3: Risk-free rate of return (T-Bill rate)
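
Here is a sketch of those moment functions in Python (assuming numpy; the file name GMMQ.TXT is from the text, while the function name, the single-lag instrument choice, and the loading step are illustrative assumptions):

    import numpy as np

    # data = np.loadtxt("GMMQ.TXT")            # T x 3 matrix [X1, X2, X3]

    def euler_moments(theta, X, nlags=1):
        """Sample moment functions E(Z eps(X, theta)) = 0 for the
        two-equation Euler system, with instruments Z consisting of
        a constant and nlags lags of all three variables."""
        beta, alpha = theta
        X1, X2, X3 = X[:, 0], X[:, 1], X[:, 2]
        eps = np.column_stack([beta * X1**(alpha - 1) * X2 - 1.0,
                               beta * X1**(alpha - 1) * X3 - 1.0])
        T = X.shape[0]
        Z = [np.ones((T - nlags, 1))]            # constant term
        for j in range(1, nlags + 1):
            Z.append(X[nlags - j : T - j])       # j-th lag, aligned in time
        Z = np.hstack(Z)
        eps = eps[nlags:]                        # drop initial observations
        # each row stacks every instrument times every residual: Z_t (x) eps_t
        return np.einsum('ti,tj->tij', Z, eps).reshape(len(eps), -1)

    # e.g., theta_hat, Qstar, pval = iterated_gmm(
    #     lambda t: euler_moments(t, data), theta0=[0.99, 0.5], p=1)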


Copyright © Kuan-Pin Lin
Last updated: 11/12/2012