Nonlinear Methods in Econometrics


Nonlinear Model Estimation

Nonlinear least squares and maximum likelihood are the most common methods for parameter estimation of nonlinear models in econometrics. Assume that the model, linear or nonlinear, is expressed as:

F(z,β) = ε

F(z,β) defines the functional form of the model, where z = [y x] is the data matrix which includes both dependent (endogenous) y and independent (exogenous) x variables, β is the vector of unknown parameters and ε is the model error. A typical nonlinear model in econometrics takes a separable form (between y and x) like this:

ε = F(z,β) = F(y,x,β) = y - f(x,β), or

y = f(x,β) + ε

For a general nonseparable (between y and x) nonlinear model F(z,β), the asymptotic theory of nonlinear least squares does not apply. Maximum likelihood or generalized method of moments should be considered instead.

Nonlinear Least Squares

The model is estimated by minimizing the sum of squared errors:

S(β|y,x) = ε'ε = (y-f(x,β))' (y-f(x,β))

Least squares estimates of the parameters are computed from the first-order condition for minimization (zero gradient):

∂S/∂β = 2ε'(∂ε/∂β) = - 2ε'(∂f(x,β)/∂β) = 0.

Finally, the following Hessian matrix must be checked for positive definiteness:

∂2S/∂β∂β' = 2[(∂ε/∂β)'(∂ε/∂β) + ∑i=1,2,...,Nεi (∂2εi/∂β∂β')]
= 2[(∂f(x,β)/∂β)'(∂f(x,β)/∂β) - ∑i=1,2,...,Nεi (∂2f(xi,β)/∂β∂β')]

Given E(∂S/∂β) = 0 and following from a Taylor approximation of ∂S/∂β at the NLS estimator b of β, the asymptotic theory implies that

√N(b-β) →d N(0,H-1VH-1)

where V = Var(∂S/∂β) = E((∂S/∂β)'(∂S/∂β)), and H = E(∂2S/∂β∂β').

Evaluated at the NLS estimator b of β, the sample analogs of H and V are, respectively:

H = 2[(∂ε/∂β)'(∂ε/∂β)]/N
V = 4[(∂ε/∂β)'εε'(∂ε/∂β)]/N.

Therefore, b ~a N(β,[(∂ε/∂β)'(∂ε/∂β)]-1 [(∂ε/∂β)'εε'(∂ε/∂β)] [(∂ε/∂β)'(∂ε/∂β)]-1)

Under the assumption of homoscedasticity, E(εε') = σ2I, the estimated variance-covariance matrix of the parameters b is simplified as follows:

Var(b) = s2[(∂ε/∂β)'(∂ε/∂β)]-1

where s2 is the estimate of the model variance σ2. That is, s2 = e'e/N, where e = y-f(x,b) is the vector of estimated errors or residuals.
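The computation above can be sketched in a few lines of Python with NumPy/SciPy. This is a minimal sketch, not a complete implementation: the model function f(x, beta) is a hypothetical user-supplied function, and the middle matrix of the sandwich estimator is computed with the usual White-type sample analog ∑i ei2 gigi' of (∂ε/∂β)'E(εε')(∂ε/∂β).

import numpy as np
from scipy.optimize import least_squares

def nls_fit(f, y, x, beta0):
    # Minimize S(beta) = (y - f(x,beta))'(y - f(x,beta)) starting from beta0.
    res = least_squares(lambda b: y - f(x, b), beta0)
    b = res.x
    e = y - f(x, b)                       # residuals e = y - f(x,b)
    J = res.jac                           # J = d(eps)/d(beta) evaluated at b
    N = len(y)
    s2 = e @ e / N                        # s^2 = e'e/N
    JtJ_inv = np.linalg.inv(J.T @ J)
    cov_classical = s2 * JtJ_inv          # s^2 [(de/db)'(de/db)]^-1
    meat = (J * (e**2)[:, None]).T @ J    # sample analog of (de/db)' E(ee') (de/db)
    cov_robust = JtJ_inv @ meat @ JtJ_inv # heteroscedasticity-robust sandwich
    return b, cov_classical, cov_robust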

Suppose there are equality or inequality parameter constraints (e.g., non-negativity) expressed in terms of a continuous transformation β = φ(α), where α is an unconstrained parameter vector. Then from the estimator of α and Var(α), we have

β = φ(α)
Var(β) = (∂φ/∂α) [Var(α)] (∂φ/∂α)'
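A minimal sketch of this delta-method calculation in Python, assuming a hypothetical user-supplied transformation phi and a simple forward-difference approximation to its Jacobian ∂φ/∂α:

import numpy as np

def delta_method(phi, alpha_hat, var_alpha, h=1e-6):
    # beta = phi(alpha); Var(beta) = (dphi/dalpha) Var(alpha) (dphi/dalpha)'
    alpha_hat = np.asarray(alpha_hat, dtype=float)
    beta_hat = np.atleast_1d(np.asarray(phi(alpha_hat), dtype=float))
    J = np.empty((beta_hat.size, alpha_hat.size))
    for j in range(alpha_hat.size):       # forward-difference Jacobian, column by column
        step = np.zeros_like(alpha_hat)
        step[j] = h
        J[:, j] = (np.atleast_1d(phi(alpha_hat + step)) - beta_hat) / h
    return beta_hat, J @ var_alpha @ J.T

# Example: a non-negativity constraint imposed via beta = exp(alpha):
# beta_hat, var_beta = delta_method(np.exp, alpha_hat, var_alpha)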

Nonlinear Weighted Least Squares

The technique of nonlinear least squares can be generalized straightforwardly to consider the weighted model errors. Denote the weighting scheme w = w(β|y,x), a scalar or a vector, which may be linearly or nonlinearly dependent on part or all of the parameters. Define the weighted error terms as ε* = wε. The model can be estimated by minimizing the sum of weighted squared errors:

S*(β|y,x) = ε*'ε*

Since the weighting function w may depend on the unknown parameters β, the consistency condition is not satisfied in general for the weighted least squares model, and the weighted least squares estimator may therefore be inconsistent.

Maximum Normal Likelihood

Assuming the model error is normally distributed, ε ~ normal(0,σ2I), the log-likelihood function for each data observation i is

ll(β,σ2|yi,xi) = -½ [ln(2π) + ln(σ2) + εi2/σ2] + ln(Ji(β))

where εi = F(yi,xi,β), and Ji(β) = |∂εi/∂yi| is the Jacobian of transformation from εi to yi. The model is estimated by maximizing the sum of log-likelihood over a sample of N observations as follows:

ll(β,σ2|y,x) = -½N [ln(2π)+ln(σ2)] -½ (ε'ε/σ2) + ∑i=1,2,...,Nln(Ji(β))

The solution is obtained from the system of first-order condition as follows:

∂ll/∂β = - ε'/σ2 (∂ε/∂β) + ∑i=1,2,...,N[1/Ji(β)](∂Ji/∂β) = 0.
∂ll/∂σ2 = - N/(2σ2) + ε'ε/(2σ4) = 0.

Usually the maximum likelihood estimation is performed by substituting out the asymptotic variance estimate σ2. That is, σ2 = ε'ε/N. Then the following concentrated log-likelihood function is maximized to find the parameter estimates β:

ll*(β|y,x) = -½N [1+ln(2π)-ln(N)] -½N ln(ε'ε) + ∑i=1,2,...,Nln(Ji(β))

Let ε* = ε/[(J1...JN)1/N]. Then the last two terms of the above concentrated log-likelihood function can be combined and the function is re-written as

ll*(β|y,x) = -½N [1+ln(2π)-ln(N)] -½N ln(ε*'ε*)

Therefore, maximizing the concentrated log-likelihood function ll*(β|y,x) is equivalent to minimizing the sum of squared weighted errors:

S*(β|y,x) = ε*'ε*

where ε* = wε, with the weight w = 1/[(J1...JN)1/N] (the inverse of the geometric mean of the Jacobians) applied to each observation of the error terms ε.
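The equivalence can be sketched as follows in Python, assuming hypothetical user-supplied functions eps_fn(y, x, beta) and jac_fn(y, x, beta) that return the model errors εi and the Jacobian terms Ji(β):

import numpy as np
from scipy.optimize import minimize

def concentrated_ll(beta, eps_fn, jac_fn, y, x):
    # ll*(beta) = -N/2 [1 + ln(2*pi) - ln(N)] - N/2 ln(e*'e*),
    # where e* = e / (geometric mean of the Jacobians J_1,...,J_N).
    e = eps_fn(y, x, beta)
    J = jac_fn(y, x, beta)
    N = len(e)
    w = 1.0 / np.exp(np.mean(np.log(J)))   # inverse geometric mean of the Jacobians
    e_star = w * e
    return (-0.5 * N * (1 + np.log(2 * np.pi) - np.log(N))
            - 0.5 * N * np.log(e_star @ e_star))

# Maximizing ll* is the same as minimizing e*'e*:
# result = minimize(lambda b: -concentrated_ll(b, eps_fn, jac_fn, y, x), beta0)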

Solving from the first-order condition or zero-gradient condition:

∂ll*/∂β = -½N ∂ln(S*)/∂β = -½(N/S*)(∂S*/∂β) = -(N/S*)[ε*'(∂ε*/∂β)] = 0,

the solution must be checked for the negative definiteness of the Hessian matrix (the second-order condition):

∂2ll*/∂β∂β' = -½N ∂2ln(S*)/∂β∂β'
= ½(N/S*)[(1/S*)(∂S*/∂β)'(∂S*/∂β) - (∂2S*/∂β∂β')]

Since ∂S*/∂β = 0 from the first-order condition for the maximum likelihood solution, the corresponding negative definite Hessian matrix is simply

∂2ll*/∂β∂β' = -(N/S*)[½(∂2S*/∂β∂β')]
= -(N/S*)[(∂ε*/∂β)'(∂ε*/∂β) + ∑i=1,2,...,Nεi* (∂2εi*/∂β∂β')]

Given E(∂ll*/∂β) = 0 and following from a Taylor approximation of ∂ll*/∂β at the ML estimator b of β, the asymptotic theory implies that

√N(b-β) →d N(0,H-1VH-1)

where V = Var(∂ll*/∂β) = E((∂ll*/∂β)'(∂ll*/∂β)), and H = E(∂2ll*/∂β∂β').

Evaluated at the ML estimator b of β, the sample analogs of H and V are, respectively:

H = (-1/σ2*)[(∂ε*/∂β)'(∂ε*/∂β)]/N
V = (1/σ2*)2[(∂ε*/∂β)'ε*ε*'(∂ε*/∂β)]/N, where σ2* = S*/N.

For a class of models satisfying regularity assumptions, the Information Matrix Equality holds as - H = V or

- E(∂2ll*/∂β∂β') = E((∂ll*/∂β)'(∂ll*/∂β))

Therefore, √N(b-β) →d N(0,-H-1). In other words, b ~a N(β,σ2*[(∂ε*/∂β)'(∂ε*/∂β)]-1)

The estimated variance-covariance matrix of the parameters b is:

Var(b) = s2*[(∂ε*/∂β)'(∂ε*/∂β)]-1, where s2* is the sample estimate of σ2*.

Further, as in the case of nonlinear least squares, suppose there are equality or inequality parameter constraints (e.g., non-negativity) expressed in terms of a continuous transformation β = φ(α), where α is an unconstrained parameter vector. Then from the estimator of α and Var(α), we have

β = φ(α)
Var(β) = (∂φ/∂α) [Var(α)] (∂φ/∂α)'

A Special Case

If the Jacobian Ji(β) = |∂εi/∂yi| = 1 for all observations i, then the log-Jacobian terms in the above concentrated log-likelihood function vanish. Therefore,

ll*(β|y,x) = -½N [1+ln(2π)-ln(N)] -½N ln(ε'ε)

This is exactly the special case of the classical nonlinear model in which ε = F(y,x,β) = y - f(x,β). For this special case, maximizing the concentrated log-likelihood function ll*(β|y,x) is the same as minimizing the sum of squared errors S(β|y,x).

Example 1: Generalized Production Functions

First, we fit the following two classical production functions based on 30 data observations of labor L, capital K, and output Q given in the file JUDGE.TXT (the data for this example are taken from Judge, et al. [1988], Chapter 12, p. 512):

  1. Cobb-Douglas Production Function
    ln(Q) = β1 + β2ln(L) + β3ln(K) + ε

  2. CES Production Function
    ln(Q) = β1 + β4ln(β2Lβ3 + (1-β2)Kβ3) + ε

Based on the least squares and maximum likelihood criteria, estimate and compare the Cobb-Douglas and CES production functions, respectively.

According to Zellner and Revankar [1970], the classical production functions may be generalized to allow for a variable rate of returns to scale as follows:

  1. Generalized Cobb-Douglas Production Function
    ln(Q) + θ Q = β1 + β2ln(L) + β3ln(K) + ε

  2. Generalized CES Production Function
    ln(Q) + θ Q = β1 + β4ln(β2Lβ3 + (1-β2)Kβ3) + ε

Modify and estimate the generalized versions of the Cobb-Douglas and CES production functions, respectively.
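As a minimal sketch of the maximum likelihood setup for the generalized Cobb-Douglas case, assuming Python with NumPy/SciPy and that JUDGE.TXT supplies the arrays L, K, Q (the column order in the file is an assumption):

import numpy as np
from scipy.optimize import minimize

def negll_gen_cobb_douglas(p, L, K, Q):
    # Generalized Cobb-Douglas: ln(Q) + theta*Q = b1 + b2*ln(L) + b3*ln(K) + e,
    # with Jacobian d(e_i)/d(Q_i) = 1/Q_i + theta (assumed positive here).
    b1, b2, b3, theta = p
    e = np.log(Q) + theta * Q - (b1 + b2 * np.log(L) + b3 * np.log(K))
    N = len(Q)
    log_jac = np.log(1.0 / Q + theta)
    ll = (-0.5 * N * (1 + np.log(2 * np.pi) - np.log(N))
          - 0.5 * N * np.log(e @ e) + log_jac.sum())
    return -ll

# Assuming L, K, Q have been loaded, e.g. L, K, Q = np.loadtxt("JUDGE.TXT", unpack=True):
# result = minimize(negll_gen_cobb_douglas, x0=[1.0, 0.5, 0.5, 0.0], args=(L, K, Q))
# Setting theta = 0 recovers the classical Cobb-Douglas (least squares) case.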


Statistical Inferences in Nonlinear Models

The classical assumption of statistical inferences is the normal probability distribution of the model error:

ε = F(y,x,β) ~ normal(0,σ2I).

Thus the estimated least squares or maximum likelihood parameter vector b of β is normally distributed:

b ~ normal(β,Var(b))

where the estimated variance-covariance matrix is

Var(b) = [-E(∂2ll(b)/∂β∂β')]-1 = s2[½ E(∂2S(b)/∂β∂β')]-1
= s2[(∂ε(b)/∂β)'(∂ε(b)/∂β)]-1

and the estimated asymptotic variance of the model is s2 = S(b)/N.

The confidence region for the true parameter vector β is derived from the estimate b based on the following familiar F statistic:

F = [(S(β)-S(b))/J] / s2

where J is the degrees of freedom associated with the testing hypotheses.

By approximating the sum of squares function S(β) at b up to the second order and using ∂S(b)/∂β = 0,

S(β) - S(b) = ½ (β-b)' [∂2S(b)/∂β∂β'] (β-b)

Therefore, the test statistic for testing β = b:

JF = (β-b)' [Var(b)]-1 (β-b)

follows a Chi-Square distribution with J degrees of freedom.

Wald Test

Consider J active constraints of parameters, linear or nonlinear (continuous and differentiable), expressed as the equation:

c(β) = 0.

If the constraints are true, without estimating the constrained model, the unconstrained parameter estimator b is expected to satisfy the constraint equation closely. That is, c(b) ≈ 0. The test statistic

W = c(b)'[Var(c(b))]-1c(b)

has a Chi-square distribution with J degrees of freedom. With the first-order linear approximation of the constraint function c(β) at b,

W = c(b)' {(∂c(b)/∂β) [Var(b)] (∂c(b)/∂β)'}-1 c(b)

Note that this test statistic does not require the computation of the constrained parameter estimator.
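A minimal sketch of this computation in Python, assuming a hypothetical user-supplied constraint function c (with c(β) = 0 under the null) and the unconstrained estimates b and Var(b) from the estimation step; ∂c/∂β is approximated by forward differences:

import numpy as np
from scipy.stats import chi2

def wald_test(c, b, var_b, h=1e-6):
    # W = c(b)' [ (dc/db) Var(b) (dc/db)' ]^-1 c(b), Chi-square with J = dim(c) d.o.f.
    b = np.asarray(b, dtype=float)
    cb = np.atleast_1d(np.asarray(c(b), dtype=float))
    G = np.empty((cb.size, b.size))
    for j in range(b.size):               # numerical Jacobian of the constraints
        step = np.zeros_like(b)
        step[j] = h
        G[:, j] = (np.atleast_1d(c(b + step)) - cb) / h
    W = cb @ np.linalg.inv(G @ var_b @ G.T) @ cb
    return W, chi2.sf(W, df=cb.size)      # statistic and p-value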

Lagrangian Multiplier Test

Given the J-element constraint equation c(β) = 0, let b* denote the constrained maximum likelihood estimator of the parameters β. The Lagrangian multiplier test is based on the score vector ∂ll(b*)/∂β of the original parameterization. If the constraints hold, then ∂ll(b*)/∂β should be close to ∂ll(b)/∂β for the unconstrained parameter estimator b, which is of course zero.

The test statistic is written as:

LM = (∂ll(b*)/∂β) [Var(∂ll(b*)/∂β)]-1 (∂ll(b*)/∂β)'
= (∂ll(b*)/∂β) [Var(b*)] (∂ll(b*)/∂β)'

The estimated variance-covariance matrix of the constrained estimator b* is computed as follows:

Var(b*) = H-1 [I - G'(G H-1G')-1GH-1]

where H = [-∂2ll(b*)/∂β∂β'] and G = [∂c(b*)/∂β].

The LM test statistic is easily approximated with the following formula:

LM = {[ε(b*)'(∂ε(b*)/∂β)] [(∂ε(b*)/∂β)'(∂ε(b*)/∂β)]-1 [ε(b*)'(∂ε(b*)/∂β)]'}/(ε(b*)'ε(b*)/N)

Note that the maximum likelihood estimates of errors ε(b*) may be properly weighted, and this test statistic is based on the constrained parameters alone.
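A minimal sketch of this approximation in Python, assuming the constrained residuals ε(b*) (properly weighted if the errors are weighted) and the derivative matrix ∂ε(b*)/∂β are available as NumPy arrays:

import numpy as np
from scipy.stats import chi2

def lm_test(e_star, de_db, J):
    # LM = [e'G (G'G)^-1 G'e] / (e'e/N) with G = de(b*)/db, Chi-square with J d.o.f.
    N = len(e_star)
    g = de_db.T @ e_star                                  # G'e
    LM = g @ np.linalg.inv(de_db.T @ de_db) @ g / (e_star @ e_star / N)
    return LM, chi2.sf(LM, df=J)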

Likelihood Ratio Test

If both the constrained and unconstrained maximum likelihood solutions are available, then the test statistic

LR = -2(ll(b*)-ll(b))

follows a Chi-square distribution with J degrees of freedom, in which there are J constraints in the equation c(β) = 0. In terms of sum of squares, it is

LR = N ln(S(b*)/S(b)).
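In the sum-of-squares form the computation is immediate; a minimal sketch, assuming the constrained and unconstrained sums of squares S(b*) and S(b) have already been computed:

import numpy as np
from scipy.stats import chi2

def lr_test(S_restricted, S_unrestricted, N, J):
    # LR = N ln(S(b*)/S(b)), Chi-square with J degrees of freedom.
    LR = N * np.log(S_restricted / S_unrestricted)
    return LR, chi2.sf(LR, df=J)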

Figure: Three Bases for Hypothesis Tests

Example 2

Returning to the previous example of Generalized CES production function:

ln(Q) + θ Q = β1 + β4ln(β2Lβ3 + (1-β2)Kβ3) + ε

Use the Wald test, Lagrangian multiplier test, and likelihood ratio test to verify the nonlinear equality constraints (i.e., the classical CES): β4 = 1/β3 and θ = 0.


Box-Cox Variable Transformation


The Box-Cox transformation of a data variable X is defined by

X(λ) = (Xλ-1)/λ

Although the range of λ can cover the whole real line, -2 ≤ λ ≤ 2 is the range of interest in many econometric applications. If λ = 2, it is the quadratic transformation; if λ = 0.5, it is a square-root transformation. A linear model corresponds to λ = 1, and the logarithmic transformation is the limiting case as λ -> 0 (by L'Hôpital's rule, limλ->0(Xλ-1)/λ = ln(X)).

The value of power λ may not be the same for each of the variables in the model. In particular, the dependent variable and independent variables as a group may take different Box-Cox transformations. Let α = (β,θ,λ)' be the vector of unknown parameters for a regression model:

ε = F(Y,X,α) = Y(θ) - X(λ)β

Or, equivalently,

Y(θ) = X(λ)β + ε

where ε ~ normal(0,σ2I). The log-likelihood function is

ll(α,σ2|Y,X) = -½N [ln(2π)+ln(σ2)] -½ (ε'ε/σ2) + (θ-1)∑i=1,2,...,Nln(|Yi|)

Note that for each data observation i, the Jacobian term is derived as Ji(θ) = |∂εi/∂Yi| = |Yiθ-1|. By substituting out σ2 = ε'ε/N, the concentrated log-likelihood function is

ll*(α|Y,X) = -½N [1+ln(2π)-ln(N)] -½N ln(ε'ε) + (θ-1) ∑i=1,2,...,Nln(|Yi|)
= -½N [1+ln(2π)-ln(N)] -½N ln(ε*'ε*)
where ε* = ε / [(|Y1||Y2|...|YN|)(θ-1)/N]

Given the values of Box-Cox transformation parameters θ and λ, a wide range of model specifications are possible. Of course, θ and λ should be estimated simultaneously with β. The efficient estimator of α = (β,θ,λ)' is obtained by maximizing the above concentrated log-likelihood function. It is equivalent to minimizing the sum of squared weighted errors:

S*(α|Y,X) = ε*'ε*,

where ε* = wε, and w = 1/[(|Y1||Y2|...|YN|)(θ-1)/N].

Based on the estimated parameter vector α = (β,θ,λ)', a Box-Cox model is typically interpreted in terms of the elasticity. That is,

∂ln(Y)/∂ln(X) = (X/Y)(∂Y/∂X) = β(Xλ/Yθ)
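A minimal sketch of the Box-Cox estimation just described, assuming Python with NumPy/SciPy, positive data, and a hypothetical layout in which X is an N x k matrix of regressors (a constant term is added inside the routine):

import numpy as np
from scipy.optimize import minimize

def box_cox(x, lam):
    # (x^lam - 1)/lam, with the logarithmic limit at lam = 0
    return np.log(x) if abs(lam) < 1e-8 else (x**lam - 1.0) / lam

def negll_box_cox(params, Y, X):
    # params = (beta_0, beta_1, ..., beta_k, theta, lambda)
    *beta, theta, lam = params
    N = len(Y)
    Xl = np.column_stack([np.ones(N)] + [box_cox(X[:, j], lam) for j in range(X.shape[1])])
    e = box_cox(Y, theta) - Xl @ np.asarray(beta)
    ll = (-0.5 * N * (1 + np.log(2 * np.pi) - np.log(N))
          - 0.5 * N * np.log(e @ e)
          + (theta - 1) * np.log(np.abs(Y)).sum())        # log-Jacobian term
    return -ll

# With two regressors (k = 2), the starting vector has five elements:
# result = minimize(negll_box_cox, x0=[0.0, 0.0, 0.0, 1.0, 1.0], args=(Y, X))
# The elasticity with respect to regressor j at a point (X_j, Y) is beta_j * X_j**lam / Y**theta.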

Example 3

Based on the money demand data given in Greene's Table 10.1 (1997, p. 443 and 451) or MONEY.TXT, formulate and estimate the following functional forms of money demand equations (see also Greene's Example 10.11 for comparison):

M(θ) = β0 + β1 R(λ) + β2 Y(λ) + ε

As described in Greene's Example 10.5 and Table 10.3, M is the real stock of M2, R is the discount interest rate, and Y is the real GNP. Several variations of the Box-Cox transformation parameters may be estimated and tested for the most appropriate functional form of money demand equation:

  1. θ = λ, i.e.
    M(λ) = β0 + β1 R(λ) + β2 Y(λ) + ε
  2. θ -> 0, i.e.
    ln(M) = β0 + β1 R(λ) + β2 Y(λ) + ε
  3. λ -> 0, i.e.
    M(θ) = β0 + β1 ln(R) + β2 ln(Y) + ε
  4. θ -> 0 and λ -> 0, i.e.
    ln(M) = β0 + β1 ln(R) + β2 ln(Y) + ε


Heteroscedastic Regression Models


Consider a general regression model F(Y,X,β) = ε ~ normal(0,Σ). Let the covariance matrix Σ = σ2Ω(α), then the corresponding log-likelihood function is

ll(β,α,σ2|Y,X) = -½N (ln(2π)+ln(σ2)) -½ ln(|Ω|) -½ (1/σ2)(ε'Ω-1ε) + ∑i=1,2,...,N ln(|∂εi/∂Yi|)

The last term is the sum of log-Jacobians from εi to Yi over the entire sample. Since the variance term can be solved as σ2 = ε'Ω-1ε / N for a given Ω, the concentrated log-likelihood function is

ll*(β,α|Y,X) = -½N (1+ln(2π)-ln(N)) -½N ln(ε'Ω-1ε) -½ ln(|Ω|) + ∑i=1,2,...,N ln(|∂εi/∂Yi|)

It is clear that the maximum likelihood estimation is in general not equivalent to the nonlinear least squares unless Ω = I, the identity matrix, and ∂εi/∂Yi = 1 for each i=1,2,...,N. If Ω is known and the log-Jacobians vanish, it is the GLS (Generalized Least Squares) problem that minimizes ε'Ω-1ε.

Unfortunately, Ω = Ω(α) is not known and must be parameterized with a lower-dimensional parameter vector α, which in turn is estimated together with the vector of model parameters β. The models with heteroscedastic and/or autocorrelated errors are special cases of the general regression model in which Ω(α) is defined more specifically.

For simplicity, consider a regression model ε = F(Y,X,β) = Y - f(X,β). Then for each data observation i, ∂εi/∂Yi = 1. We assume further that the heteroscedastic error εi ~ normal(0,σi2). The log-likelihood function is

ll(β,σi2|Yi,Xi) = -½ [ln(2π) + ln(σi2) + εi2/σi2]

Summing over a sample of N observations, the total log-likelihood function is written as

ll(β,σ12,σ22,...,σN2|Y,X) = -½N ln(2π) -½ ∑i=1,2,...,Nln(σi2) -½ ∑i=1,2,...,Ni2/σi2)

Given the general form of heteroscedasticity, there are too many unknown parameters. For practical purposes, some hypothesis of heteroscedasticity must be assumed:

σi2 = σ2 hi(α)

where σ2 > 0 and hi(α) is indexed by i to indicate that it is a function of Zi. That is, hi(α) = h(α|Zi), where Z is a set of independent variables that may or may not coincide with X. Depending on the form of heteroscedasticity hi(α), denoted by hi for brevity, the log-likelihood function is written as

ll(β,α,σ2|Y,X) = -½N (ln(2π) + ln(σ2)) -½ ∑i=1,2,...,Nln(hi) -½(1/σ2)∑i=1,2,...,Ni2/hi)

Let εi* = εi / √hi and substitute out the maximum likelihood estimator of σ2 = ε*'ε*/N; then the concentrated log-likelihood function is

ll*(β,α|Y,X) = -½N (1+ln(2π)-ln(N)) -½ ∑i=1,2,...,Nln(hi) -½N ln(ε*'ε*)

The last two log-terms can be combined as:

ll*(β,α|Y,X) = -½N (1+ln(2π)-ln(N)) -½N ln(ε**'ε**)

where ε** = ε*√h, and h = (h1h2...hN)1/N. It becomes a weighted nonlinear least squares problem with the weighted errors defined by εi** = εi√(h/hi).
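A minimal sketch of this weighting step in Python, assuming the errors εi and the heteroscedasticity terms hi are available as NumPy arrays:

import numpy as np

def weighted_errors(e, h):
    # e_i** = e_i * sqrt(hbar / h_i), where hbar = (h_1 h_2 ... h_N)^(1/N);
    # minimizing e**'e** is then equivalent to maximizing ll*(beta, alpha) above.
    hbar = np.exp(np.mean(np.log(h)))     # geometric mean of the h_i
    return e * np.sqrt(hbar / h)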

Consider the following special cases of hi = h(α|Zi) = h(Ziα):

  1. σi2 = σ2(Ziα), Ziα > 0
  2. σi2 = σ2(Ziα)2
  3. Exponential Heteroscedasticity: σi2 = σ2exp(Ziα)

    The corresponding concentrated log-likelihood function for estimation is

    ll*(β,α|Y,X) = -½N (1+ln(2π)-ln(N)) -½ ∑i=1,2,...,NZiα -½N ln(ε*'ε*)

    where εi* = εi / [exp(Ziα)]½ for each observation i=1,2,...,N.

    Equivalently,

    ll*(β,α|Y,X) = -½N (1+ln(2π)-ln(N)) -½N ln(ε**'ε**)

    where εi** = εi√(h/hi), h = (h1h2...hN)1/N, and hi = exp(Ziα) for each observation i=1,2,...,N (see the estimation sketch following this list).

  4. Multiplicative Heteroscedasticity: σi2 = σ2Πm=1,2,...,M Zimαm, where M is the number of variables in Zi. This is equivalent to the exponential case if the variables in Z are logs. That is, σi2 = σ2exp[ln(Zi)α]. A special case, with a single variable, is

    σi2 = σ2 Ziα

    If α = 0, the model is homoscedastic; if α = 2, it is case 2 above.
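As a minimal sketch of the exponential case 3 above, assuming Python with NumPy/SciPy, a linear regression part y = Xβ + ε, and hypothetical design matrices X (regression) and Z (heteroscedasticity) with hi = exp(Ziα):

import numpy as np
from scipy.optimize import minimize

def negll_exp_hetero(params, y, X, Z):
    # Concentrated log-likelihood (negated) for y = X beta + e,
    # with exponential heteroscedasticity sigma_i^2 = sigma^2 exp(Z_i alpha).
    k = X.shape[1]
    beta, alpha = params[:k], params[k:]
    e = y - X @ beta
    h = np.exp(Z @ alpha)                 # h_i = exp(Z_i alpha)
    N = len(y)
    e_star = e / np.sqrt(h)               # e_i* = e_i / sqrt(h_i)
    ll = (-0.5 * N * (1 + np.log(2 * np.pi) - np.log(N))
          - 0.5 * np.log(h).sum()
          - 0.5 * N * np.log(e_star @ e_star))
    return -ll

# result = minimize(negll_exp_hetero, x0=np.zeros(X.shape[1] + Z.shape[1]),
#                   args=(y, X, Z))
# With Z taken as the single column Xi, this corresponds to hypothesis 3 of Example 4 below.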

Example 4

Given the data of per capita expenditure on public schools and per capita income from Greene's Table 12.1 (1997, p. 541) or GREENE.TXT, consider the following somewhat heteroscedastic relationship of public school spending (Y) and income (X):

Y = β0 + β1 X + β2 X2 + ε

Find and compare the maximum likelihood estimates based on the following hypotheses of heteroscedasticity:

  1. σi2 = σ2 Xi2
  2. σi2 = σ2 Xiα
  3. σi2 = σ2 exp(αXi)
Note that (1) is a special case of (2) in which α = 2; and (2) is equivalent to (3) if X is expressed in log form.


Copyright© Kuan-Pin Lin
Last updated: 10/03/2016