Generalized Method of Moments

Nonlinear Generalized Method of Moments (GMM)

An alternative to maximum likelihood estimation of a probability distribution for a random variable is to formulate and estimate its moment functions. A moment function is defined as the expectation of some continuous vector-valued function m of a random variable X with parameter vector θ:

E[m(X,θ)] = 0.

We can think of m(X,θ) = ∂ll(θ|X)/∂θ with E(∂ll(θ|X)/∂θ) = 0, where ll(θ|X) is the log-likelihood function with unknown parameters θ introduced earlier. Moments are used to describe the characteristics of a distribution, and much of statistical estimation focuses on the problem of moment functions (or orthogonality conditions). Further, a function of moments is also a moment. A more general class of estimators based on moment functions has been shown to exhibit desirable asymptotic (large sample) properties. By allowing the moment functions to depend on unknown parameters, Generalized Method of Moments (GMM) estimators have attracted considerable attention in empirical research. Denote the N by L matrix of parameterized moment functions as:

m(X,θ) = [mj(Xi,θ),i=1,2,...,N,j=1,2,...,L]

Where Xi is a sample observation of the random variable and θ is the vector of unknown parameters. GMM estimation is based on the sample mean of the moment functions:

m(θ) = 1/N ∑i=1,2,...,N m(Xi,θ)' = 0

If there are K parameters (i.e., θ = (θ1, θ2, ..., θK)'), we need at least as many moment equations as parameters in order to identify and estimate them (i.e., m(θ) = (m1(θ), m2(θ), ..., mL(θ))', L ≥ K). The optimization problem is to minimize the quadratic function:

Q(θ) = m(θ)'W m(θ)

Where W is a positive definite weighting matrix. Optimally, it is chosen to be the inverse of a consistent estimate of the covariance matrix of m(θ). That is, W = W(θ) = [Var(m(θ))]-1 and

Var(m(θ)) = 1/N2 ∑∑i,j=1,2,...,N m(Xi,θ)'m(Xj,θ)

To ensure the positive definiteness of the matrix, assumptions on the structure of autocovariances may be necessary. For example,

Var(m(θ)) = S0(θ) + ∑j=1,2,...,p(1 - j/(p+1)) (Sj(θ) + Sj(θ)')
S0(θ) = m(θ)m(θ)' = 1/N2 ∑i=1,2,...,N m(Xi,θ)'m(Xi,θ)
Sj(θ) = m(θ)m-j(θ)' = 1/N2 ∑i=j+1,...,N m(Xi,θ)'m(Xi-j,θ)
j = 1, ..., p < N.

Where p is the number of autocovariance lags assumed in the model. This is the White-Newey-West estimator of Var(m(θ)), which guarantees positive definiteness by down-weighting higher-order autocovariances. The lag weights used are p/(p+1), (p-1)/(p+1), ..., 1/(p+1) for a given p.
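
To make the computation concrete, here is a minimal sketch in Python (assuming numpy is available and that the moment functions have been evaluated into an N by L matrix M, one row per observation); the function name and arguments are illustrative, not part of any particular package:

    import numpy as np

    def newey_west_moment_cov(M, p):
        """White-Newey-West estimate of Var(m(theta)).

        M : (N, L) array whose i-th row is m(X_i, theta).
        p : number of autocovariance lags assumed (p < N).
        """
        N = M.shape[0]
        # S0: contemporaneous term, 1/N^2 * sum_i m_i' m_i
        S = (M.T @ M) / N**2
        for j in range(1, p + 1):
            # Sj: j-th autocovariance, 1/N^2 * sum_{i=j+1}^{N} m_i' m_{i-j}
            Sj = (M[j:].T @ M[:-j]) / N**2
            S += (1.0 - j / (p + 1.0)) * (Sj + Sj.T)   # Bartlett lag weight
        return S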

Typically, GMM estimation starts with the special case of W = I (the identity matrix). In other words, we find the consistent estimator θ0 of θ that minimizes the quadratic function: Q(θ) = m(θ)'m(θ), with the associated asymptotic covariance matrix:

Var(θ0) = [G(θ0)'G(θ0)]-1G(θ0)'[Var(m(θ0))] G(θ0) [G(θ0)'G(θ0)]-1

Where G(θ0) = ∂m(θ0)/∂θ is the L by K matrix of derivatives. With the initial estimates θ0, let W = W(θ0) = [Var(m(θ0))]-1 and then minimize the quadratic function:

Q(θ) = m(θ)'W m(θ)

The asymptotic covariance matrix for the resulting GMM estimator θ1 of θ is:

Var(θ1) = [G(θ1)'W G(θ1)]-1G(θ1)'W [Var(m(θ1))] WG(θ1) [G(θ1)'W G(θ1)]-1

Updating the weighting matrix W = W(θ1) = [Var(m(θ1))]-1 and reiterating the optimization process until convergence, the final GMM estimator θ* of θ is obtained with the following asymptotic covariance matrix:

Var(θ*) = [G(θ*)'W(θ*)G(θ*)]-1

We note that convergence of the above iterations is not necessary for a consistent GMM estimator of θ. It is only for an asymptotically efficient estimator θ* that the convergent optimal weighting matrix W = W(θ*) must be used. That is, θ* is asymptotically normally distributed with mean θ and covariance Var(θ*). The value of the quadratic function Q at the optimal solution θ*:

Q* = Q(θ*) = m(θ*)'W(θ*)m(θ*)

serves as the basis for hypothesis testing of moment restrictions. If there are L moment equations with K parameters (L > K), Q* follows a Chi-square distribution with L-K degrees of freedom.
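
The iterative scheme above translates directly into code. The following Python sketch (assuming numpy and scipy, and reusing the newey_west_moment_cov helper sketched earlier) iterates the weighting matrix to convergence and reports Q* with its Chi-square p-value; the function names and default settings are illustrative only:

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import chi2

    def iterated_gmm(moments, theta0, p=0, iters=20, tol=1e-8):
        """moments(theta) must return the (N, L) matrix [m_j(X_i, theta)]."""
        theta = np.asarray(theta0, dtype=float)
        W = np.eye(moments(theta).shape[1])          # first step: W = I
        for _ in range(iters):
            def Q(t):                                # Q(theta) = m(theta)'W m(theta)
                mbar = moments(t).mean(axis=0)
                return mbar @ W @ mbar
            theta_new = minimize(Q, theta, method="Nelder-Mead").x
            converged = np.max(np.abs(theta_new - theta)) < tol
            theta = theta_new
            if converged:
                break
            W = np.linalg.inv(newey_west_moment_cov(moments(theta), p))
        # Q* tests the L - K overidentifying moment restrictions
        L, K = moments(theta).shape[1], theta.size
        mbar = moments(theta).mean(axis=0)
        Qstar = mbar @ W @ mbar
        return theta, Qstar, chi2.sf(Qstar, L - K)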

Example: Estimating Gamma Distribution

Returning to the example of estimating the gamma distribution of the INCOME variable, consider four sample moment functions of the gamma distribution:

m1(λ,ρ) = 1/N ∑i=1,2,...,N Xi - ρ/λ
m2(λ,ρ) = 1/N ∑i=1,2,...,N Xi2 - ρ(ρ+1)/λ2
m3(λ,ρ) = 1/N ∑i=1,2,...,N ln(Xi) - dlnΓ(ρ)/dρ + ln(λ)
m4(λ,ρ) = 1/N ∑i=1,2,...,N 1/Xi - λ/(ρ-1)

The GMM estimator of θ = (λ,ρ) is obtained by minimizing the weighted sum of squares (or distance):

Q(θ) = m(θ)'W m(θ)

Where m(θ) = (m1(θ), m2(θ), m3(θ), m4(θ))' and W is a positive definite symmetric matrix. Conditional on the weighting matrix W, the estimated variance-covariance matrix of θ is

Var(θ) = [G(θ)'W G(θ)]-1 G(θ)'W [Var(m(θ))] WG(θ) [G(θ)'W G(θ)]-1

If W = [Var(m(θ))]-1, the inverse of covariance matrix of m(θ), then Var(θ) = [G(θ)'W G(θ)]-1.

GMM includes the maximum likelihood estimator as a special case. The log-likelihood function based on the gamma distribution is:

ll(θ|X) = N [ρln(λ) - lnΓ(ρ)] - λ ∑i=1,2,...,N Xi + (ρ-1) ∑i=1,2,...,N ln(Xi)

Setting the scores of the above log-likelihood function to zero:

∂ll/∂λ = Nρ/λ - ∑i=1,2,...,N Xi = 0
∂ll/∂ρ = Nln(λ) - N dlnΓ(ρ)/dρ + ∑i=1,2,...,N ln(Xi) = 0

It is clear that the maximum likelihood estimate of θ = (λ,ρ) is an exactly identified GMM estimator with m(θ) = (m1(θ), m3(θ))'. The weighting matrix W is irrelevant in the exactly identified case, and the GMM criterion is exactly zero at the solution.
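
As an illustration, the four moment functions can be coded as follows and the estimator obtained with the iterated_gmm sketch above (assuming numpy and scipy; digamma from scipy.special computes dlnΓ(ρ)/dρ; the variable name income and the starting values are placeholders):

    import numpy as np
    from scipy.special import digamma

    def gamma_moments(theta, X):
        """The four sample moment functions of the gamma distribution,
        evaluated observation by observation (an N x 4 matrix)."""
        lam, rho = theta
        return np.column_stack([
            X - rho / lam,                           # m1: E(X) = rho/lambda
            X**2 - rho * (rho + 1) / lam**2,         # m2: E(X^2)
            np.log(X) - digamma(rho) + np.log(lam),  # m3: E(ln X)
            1.0 / X - lam / (rho - 1),               # m4: E(1/X), needs rho > 1
        ])

    # e.g., theta_hat, Qstar, pval = iterated_gmm(
    #     lambda t: gamma_moments(t, income), theta0=[0.5, 2.0])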


GMM Estimation of Econometric Models

When we apply GMM estimation to econometric models, it can be considered an extension of the instrumental variables (IV) estimation method. IV estimation is widely used for models with random regressors (e.g., a lagged dependent variable) and for simultaneous equations, in which regressors are correlated with the model errors. The advantage of GMM is that the model need not be homoscedastic and serially independent: heteroscedasticity and autocorrelation are taken into account through the estimated covariance matrix of the sample moments used in the GMM criterion function.

For notational convenience, let X be a combined data matrix of the endogenous (dependent) and predetermined (independent or explanatory) variables in the model, and let β be a K-element vector of unknown parameters. Suppose there are L moment equations, m(X,β) = (m1(X,β), ..., mL(X,β)), where L ≥ K. The model formulation is not limited to the single-equation case; generalization to a system of linear or nonlinear equations is straightforward.

Sample Moment Conditions

Corresponding to the moment conditions E(m(X,β)) = 0, we write the sample moment equations as follows:

m(β) = 1/N ∑i=1,2,...,N m(Xi,β)' = 0

Assuming p-th order autocovariances, the well-known White-Newey-West estimator of the covariance matrix of the sample moments is

Var(m(β)) = S0 + ∑j=1,2,...,p(1 - j/(p+1))(Sj + Sj')
S0 = m(β)m(β)' = 1/N2 ∑i=1,2,...,N m(Xi,β)'m(Xi,β)
Sj = m(β)m-j(β)' = 1/N2 ∑i=j+1,...,N m(Xi,β)'m(Xi-j,β)
j = 1,..., p < N.

GMM Criterion Function

Given a positive definite symmetric weighting matrix W, the goal is to minimize the quadratic function:

Q(β) = m(β)'W m(β)

Optimally, W is chosen to be the inverse of a consistent estimator of the asymptotic covariance matrix of m(β). That is, W = W(β) = [Var(m(β))]-1.

Although computationally challenging, it is possible to directly minimize:

Q(β) = m(β)'[Var(m(β))]-1m(β)

GMM Estimation

The GMM estimator β* of β is obtained by solving the zero gradient conditions: ∂Q(β*)/∂β = 0. Let G(β*) = ∂m(β*)/∂β, the L by K matrix of derivatives. The estimated variance-covariance matrix of β* is

Var(β*) = [G(β*)'[Var(m(β*))]-1G(β*)]-1

The asymptotically efficient estimator β* is normally distributed with mean β and covariance matrix Var(β*).

Nonlinear IV Estimation

Consider the model ε = ε(β) = F(Y,X,β) (or Y - f(X,β)), where Y is the endogenous or dependent variable and X consists of predetermined or independent variables. β is a K-element parameter vector. Suppose there is a set of L instrumental variables Z, with L ≥ K. The moment condition is E(Z'ε) = 0, under the general assumptions that E(ε) = 0 and Var(ε) = E(εε') = Σ = σ2Ω.

Linear IV Estimation

If the model is linear, or ε = ε(β) = Y - Xβ, then the GMM estimator of β is equivalent to the IV estimator. Minimizing Q(β) = (Y-Xβ)'Z W Z'(Y-Xβ) with W = [Z'ΣZ]-1 yields:

β* = [X'Z(Z'ΣZ)-1Z'X]-1X'Z(Z'ΣZ)-1Z'Y

Special Case

If the model is homoscedastic and serially uncorrelated, that is Σ = σ2I, then β* = [X'Z(Z'Z)-1Z'X]-1X'Z(Z'Z)-1Z'Y, the familiar two-stage least squares estimator. If Z is of the same dimension as X, then β* = (Z'X)-1Z'Y.
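
A minimal numpy sketch of this special case (the function name is illustrative; Y, X, Z are arrays of conformable dimensions):

    import numpy as np

    def iv_gmm_linear(Y, X, Z):
        """Linear IV/GMM under Sigma = sigma^2 I, i.e. the 2SLS estimator
        b = [X'Z(Z'Z)^-1 Z'X]^-1 X'Z(Z'Z)^-1 Z'Y.  When Z has the same
        dimension as X this reduces algebraically to (Z'X)^-1 Z'Y."""
        A = X.T @ Z @ np.linalg.inv(Z.T @ Z)     # X'Z(Z'Z)^-1
        return np.linalg.solve(A @ Z.T @ X, A @ Z.T @ Y)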

Hypothesis Testing

Based on the statistical inference for nonlinear regression models, there are three corresponding test statistics for testing the GMM estimator of β under J constraint equations expressed as c(β) = 0. Let β* be the unconstrained GMM estimator of β, and let b* be the constrained estimator. All three test statistics have a Chi-square distribution with J degrees of freedom.

Wald Test

W = c(β*)'[Var(c(β*))]-1c(β*)
= c(β*)' {(∂c(β*)/∂β) [Var(β*)] (∂c(β*)/∂β)'}-1 c(β*)

Lagrange Multiplier (LM) Test

If the constraints hold, then α = ∂Q(b*)/∂β = 2 m(b*)'W G(b*) → 0

where G(b*) = ∂m(b*)/∂β.

LM = α[Var(α)]-1α'
= m(b*)'W G(b*)[G(b*)'W G(b*)]-1G(b*)'W m(b*)

Likelihood Ratio (LR) Test

LR = Q(b*) - Q(β*)

Both β* and b* are computed using the same consistent estimator of the weighting matrix W.
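
For instance, the Wald statistic above can be computed in a few lines of numpy (assuming scipy for the p-value; the argument names are placeholders, where c returns the J constraint values and C_jac their J by K Jacobian):

    import numpy as np
    from scipy.stats import chi2

    def wald_test(c, C_jac, beta_star, var_beta):
        """Wald test of c(beta) = 0 at the unconstrained estimate beta*."""
        cval = c(beta_star)                      # J constraint values
        C = C_jac(beta_star)                     # J x K Jacobian dc/dbeta
        W = cval @ np.linalg.inv(C @ var_beta @ C.T) @ cval
        return W, chi2.sf(W, cval.size)          # statistic and p-value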


Application: A Nonlinear Rational Expectation Model

An important application of GMM estimation is to estimate the first-order conditions (or Euler equations) of a dynamic optimization problem. Suppose a representative consumer (stockholder) seeks to maximize a concave utility function of consumption over time:

∑τ=0,...,∞ βτ E{u(Ct+τ) | Zt}

Where Zt is the information available to the consumer at time t. 0 < β < 1 is the discount factor of time preference. Given N different stocks, the optimal consumption-investment plan is

u'(Ct) = β E{u'(Ct+1) [(Pi,t+1+Di,t+1)/Pi,t] | Zt}, for i = 1,...,N.

Where u'(Ct) = ∂u/∂Ct is the marginal utility of consumption, Pi,t+1 is the price of stock i at time t+1, and Di,t+1 is the dividend per share of stock i at t+1. The ratio (Pi,t+1+Di,t+1)/Pi,t represents the return on investment in stock i between periods t and t+1. Assume that the utility function takes the constant relative risk aversion form:

u(Ct) = Ctα/α     for α<1.

Then, for each i = 1, ..., N, the decision rule is

Ctα-1 = β E{Ct+1α-1 [(Pi,t+1+Di,t+1)/Pi,t] | Zt}

Equivalently, for each stock i = 1, ..., N, we must have

βE{[(Ct+1/Ct)α-1] [(Pi,t+1+Di,t+1)/Pi,t] | Zt} = 1

The hypothesis of rational expectations implies that the expectational error of the intertemporal decision is orthogonal to the information available at the time the decision is made. Therefore, the derived orthogonality condition for each i = 1, ..., N is:

E{Zt(β[(Ct+1/Ct)α-1] [(Pi,t+1+Di,t+1)/Pi,t] - 1)} = 0

For a more detailed description of the model, see L. P. Hansen and K. J. Singleton (1982). For a computational implementation of the model, see the Hansen-Heaton-Ogaki GMM package.

The Model

E(Zε(X,θ)) = 0, where

X = [X1,X2,X3], θ = (β,α), and
ε(X,θ) = (ε1(X,θ), ε2(X,θ))', where
ε1(X,θ) = βX1α-1X2 - 1
ε2(X,θ) = βX1α-1X3 - 1

The instrumental variables Z consist of a constant and lags of X. We note that this is a system of two nonlinear equations; a computational sketch of the moment functions follows the variable list below.

The data file GMMQ.TXT (from 1/59 to 12/78, not the original Hansen-Singleton data) consists of three variables:

  1. X1: Ratio of two-period consumption, Ct+1/Ct
  2. X2: Value-weighted returns of the NYSE stock market, (Pt+1+Dt+1)/Pt, where Pt+1 is the price and Dt+1 is the dividend payoff of the stock at t+1
  3. X3: Risk-free rate of return (T-Bill rate)
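
Here is a sketch of those moment functions in Python (assuming numpy; the file name GMMQ.TXT is from the text, while the function name, the single-lag instrument choice, and the loading step are illustrative assumptions):

    import numpy as np

    # data = np.loadtxt("GMMQ.TXT")            # T x 3 matrix [X1, X2, X3]

    def euler_moments(theta, X, nlags=1):
        """Sample moment functions E(Z eps(X, theta)) = 0 for the
        two-equation Euler system, with instruments Z consisting of
        a constant and nlags lags of all three variables."""
        beta, alpha = theta
        X1, X2, X3 = X[:, 0], X[:, 1], X[:, 2]
        eps = np.column_stack([beta * X1**(alpha - 1) * X2 - 1.0,
                               beta * X1**(alpha - 1) * X3 - 1.0])
        T = X.shape[0]
        Z = [np.ones((T - nlags, 1))]            # constant term
        for j in range(1, nlags + 1):
            Z.append(X[nlags - j : T - j])       # j-th lag, aligned in time
        Z = np.hstack(Z)
        eps = eps[nlags:]                        # drop initial observations
        # each row stacks every instrument times every residual: Z_t (x) eps_t
        return np.einsum('ti,tj->tij', Z, eps).reshape(len(eps), -1)

    # e.g., theta_hat, Qstar, pval = iterated_gmm(
    #     lambda t: euler_moments(t, data), theta0=[0.99, 0.5], p=1)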


Copyright © Kuan-Pin Lin
Last updated: 11/12/2012