Generalized Linear Models (GLM)

Readings and References

Greene, W.H. (2012). Econometric Analysis, 7th ed. Prentice Hall.
Fair, R.C. (1978). "A Theory of Extramarital Affairs," Journal of Political Economy, 86(1), 45-61.

The generalized linear model (GLM) is a flexible generalization of ordinary least squares regression. OLS restricts the regression coefficients to have a constant effect on the dependent variable. GLM allows for this effect to vary along the range of the explanatory variables. In particular, a nonlinear function links the linear parameterization to the expected value of the random variable.

Let μ = E(Y) and η = Xβ. The basic structure of GLM is the link function g(μ) = η. Therefore, Y = g^(-1)(Xβ) + ε.

GLM is essentially a nonlinear model in which the linear parameterization enters the expected value of Y through the link function. To specify and estimate the model, one needs three components:

  1. Random component, f(ε) or f(Y), specifying the conditional distribution of the response variable given the explanatory variables X. Typically, this distribution is from the exponential family:

    Y ~               Range of Y    f(Y)                             E(Y)    Var(Y)
    Bernoulli(π)      0,1           π^Y (1-π)^(1-Y)                  π       π(1-π)
    Poisson(λ)        0,1,2,...     exp(-λ) λ^Y / Y!                 λ       λ
    Normal(μ,σ)       (-∞,∞)        1/√(2πσ²) exp[-(Y-μ)²/(2σ²)]     μ       σ²
    Gamma(λ,ρ)        [0,∞)         λ^ρ/Γ(ρ) exp(-λY) Y^(ρ-1)        ρ/λ     ρ/λ²
    Exponential(λ)    [0,∞)         λ exp(-λY)                       1/λ     1/λ²
    Inverse Normal    ...
    Inverse Gamma     ...
    ...

  2. A linear predictor which is a linear function of the regressors: η = β0 + β1X1 + ... + βkXk = Xβ
  3. A link function which transforms the expectation of the response to the linear predictor. In other words, the link function describes the relationship between the linear predictor and the mean of the distribution function. The link function must be invertible.

    The table below lists commonly used link functions and their inverses (a small code sketch follows this list):

    Link             η = g(μ)        μ = g^(-1)(η)
    Identity         μ               η
    Log              ln(μ)           exp(η)
    Inverse          μ^(-1)          η^(-1)
    Inverse-Square   μ^(-2)          η^(-1/2)
    Square Root      μ^(1/2)         η²
    Logit            ln[μ/(1-μ)]     Λ(η) = exp(η)/[1+exp(η)]
    Probit           Φ^(-1)(μ)       Φ(η)
    Log-log          -ln[-ln(μ)]     exp[-exp(-η)]
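
As a quick check (not part of the original notes), the following Python sketch encodes each link g and its inverse from the table and verifies the round trip g^(-1)(g(μ)) = μ numerically:

    # Link/inverse-link pairs from the table above; a minimal sketch.
    import numpy as np
    from scipy.stats import norm

    links = {
        "identity":       (lambda mu: mu,                    lambda eta: eta),
        "log":            (lambda mu: np.log(mu),            lambda eta: np.exp(eta)),
        "inverse":        (lambda mu: 1.0 / mu,              lambda eta: 1.0 / eta),
        "inverse-square": (lambda mu: mu ** -2.0,            lambda eta: eta ** -0.5),
        "square-root":    (lambda mu: np.sqrt(mu),           lambda eta: eta ** 2.0),
        "logit":          (lambda mu: np.log(mu / (1 - mu)), lambda eta: 1 / (1 + np.exp(-eta))),
        "probit":         (norm.ppf,                         norm.cdf),
        "log-log":        (lambda mu: -np.log(-np.log(mu)),  lambda eta: np.exp(-np.exp(-eta))),
    }

    mu = 0.37  # a value in the domain of every link above
    for name, (g, g_inv) in links.items():
        assert abs(g_inv(g(mu)) - mu) < 1e-10, name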

To estimate the coefficients of a GLM, we use the maximum likelihood method.

The model interpretation is typically based on the marginal effect defined by ∂E(Y)/∂X. From the definition of the link function in GLM, g(μ) = η or g(E(Y)) = Xβ, differentiation gives ∂g(E(Y))/∂X = g'(μ) ∂E(Y)/∂X = β, where g'(μ) = ∂g(μ)/∂μ. Therefore ∂E(Y)/∂X = β/g'(μ). For the identity link, g'(μ) = 1 and ∂E(Y)/∂X = β; for the log link, g'(μ) = 1/μ and ∂E(Y)/∂X = βE(Y).

GLM Examples

Given a sample of N observations (Yi,Xi), i=1,...,N, the log-likelihood function is defined for each GLM as follows:
Normal(μ,σ), Identity link μ = Xβ:
  ll(β,σ) = -(N/2)ln(2πσ²) - ∑i=1,...,N (Yi-Xiβ)²/(2σ²)
  θ = (β,σ). This is the linear regression model.

Normal(μ,σ), Log link ln(μ) = Xβ:
  ll(β,σ) = -(N/2)ln(2πσ²) - ∑i=1,...,N (Yi-exp(Xiβ))²/(2σ²)
  θ = (β,σ). Not a log-linear model.

Gamma(λ,ρ), Identity link ρ/λ = Xβ:
  ll(β,ρ) = N[ρ ln(ρ) - ln Γ(ρ)] + ∑i=1,...,N [(ρ-1)ln(Yi) - ρ ln(Xiβ) - ρYi/(Xiβ)]
  θ = (β,ρ).

Exponential(λ), Identity link 1/λ = Xβ:
  ll(β) = ∑i=1,...,N [-ln(Xiβ) - Yi/(Xiβ)]
  θ = β.

Exponential(λ), Inverse link 1/λ = 1/(Xβ):
  ll(β) = ∑i=1,...,N [ln(Xiβ) - YiXiβ]
  θ = β.

Poisson(λ), Identity link λ = Xβ:
  ll(β) = ∑i=1,...,N [-Xiβ + Yi ln(Xiβ) - ln(Yi!)]
  θ = β.

Poisson(λ), Log link ln(λ) = Xβ:
  ll(β) = ∑i=1,...,N [-exp(Xiβ) + Yi(Xiβ) - ln(Yi!)]
  θ = β.

Bernoulli(π), Logit link ln(π/(1-π)) = Xβ:
  ll(β) = ∑i=1,...,N [Yi ln(Λ(Xiβ)) + (1-Yi)ln(1-Λ(Xiβ))]
  θ = β. The Logit Model.

Bernoulli(π), Probit link Φ^(-1)(π) = Xβ:
  ll(β) = ∑i=1,...,N [Yi ln(Φ(Xiβ)) + (1-Yi)ln(1-Φ(Xiβ))]
  θ = β. The Probit Model.

...

Example 1: Income Earning Equation

Using 20 observations of the hypothetical data series INCOME and EDUCATION of Greene (Table FC.1) in YED20.TXT, we can estimate the generalized linear model (GLM) of the INCOME-EDUCATION relationship based on a probability distribution from the exponential family (e.g., normal, gamma, etc.) with a link function (e.g., identity, log, inverse, etc.). Derive the corresponding log-likelihood function for the model and estimate the parameters by maximizing the log-likelihood function.
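
A minimal Python sketch of this exercise follows. Since YED20.TXT is not reproduced here, the INCOME and EDUCATION arrays below are placeholders to be replaced with Greene's data; the normal family with identity link is estimated by maximizing the corresponding log-likelihood from the table above:

    # Hedged sketch: replace the placeholder arrays with the YED20.TXT series.
    import numpy as np
    from scipy.optimize import minimize

    education = np.array([6, 8, 8, 10, 12, 12, 14, 14, 16, 16], dtype=float)     # placeholder
    income    = np.array([15, 18, 17, 22, 25, 24, 28, 30, 35, 33], dtype=float)  # placeholder
    X = np.column_stack([np.ones_like(education), education])
    N = len(income)

    def negll(theta):
        beta, log_sigma = theta[:-1], theta[-1]   # estimate log(σ) to keep σ > 0
        sigma2 = np.exp(2 * log_sigma)
        resid = income - X @ beta
        return 0.5 * N * np.log(2 * np.pi * sigma2) + (resid @ resid) / (2 * sigma2)

    res = minimize(negll, x0=np.zeros(3), method="BFGS")
    print("beta:", res.x[:-1], "sigma:", np.exp(res.x[-1]))
    # With the identity link, the β estimates reproduce OLS.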

Example 2: Binary Choice Models

This example (see also Greene [2012], Example 17.3) examines the effect of a new teaching method on students' grades. We consider the following qualitative regression (or binary choice) model:

GRADE = β0 + β1GPA + β2TUCE + β3PSI + ε

The following variables are available in the data file GRADE.TXT:

GRADE = 1 if the student's grade improved, 0 otherwise.
GPA = Grade point average.
TUCE = Score on a pretest (Test of Understanding of College Economics).
PSI = 1 if exposed to the new teaching method (Personalized System of Instruction), 0 otherwise.

Use the maximum likelihood estimation method to represent and estimate the generalized linear model of the Bernoulli (binomial) distribution with logit and probit links, respectively. Explain the estimated marginal effects of the new teaching method on students' grade performance.
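
One possible implementation in Python with statsmodels is sketched below; the layout of GRADE.TXT (whitespace-delimited with named columns GRADE, GPA, TUCE, PSI) is an assumption, so adjust the read step to the actual file:

    # Logit and probit fits with marginal effects f(xbar'β)β at the sample means.
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from scipy.stats import norm

    data = pd.read_csv("GRADE.TXT", sep=r"\s+")        # assumed file layout
    X = sm.add_constant(data[["GPA", "TUCE", "PSI"]])
    y = data["GRADE"]

    logit  = sm.Logit(y, X).fit()
    probit = sm.Probit(y, X).fit()

    xbar = X.mean().values
    Lam = 1 / (1 + np.exp(-(xbar @ logit.params.values)))
    print("logit ME: ", Lam * (1 - Lam) * logit.params.values)
    print("probit ME:", norm.pdf(xbar @ probit.params.values) * probit.params.values)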

Example 3: Count Data and Poisson Regression Model

This example is taken from Greene (2012, Example 18.9), which is based on Fair (1978). This study examines the qualitative responses to a question about extramarital affairs from a sample of 601 men and women married for the first time. The dependent variable is:

Y = Number of affairs in the past year: 0, 1, 2, 3, 4-10 (coded as 7), 11 or more (coded as 12).

Here, we present only the model using five explanatory variables as follows:

Z2 = Age.
Z3 = Number of years married.
Z5 = Degree of religiousness: 1 (anti-religious), ..., 5 (very religious).
Z7 = Hollingshead scale of occupation: 1, ..., 7.
Z8 = Self-rating of marriage satisfaction: 1 (very unhappy), ..., 5 (very happy).

The regression equation is:

Y = β0 + β2Z2 + β3Z3 + β5Z5 + β7Z7 + β8Z8 + ε

Examining the data on extramarital affairs, the preponderance of zeros (no affairs) in the dependent variable suggests a Poisson distribution with a log or identity link for the study. Further, given the potential problem of heterogeneity in the count data, a modified Poisson model or the negative binomial regression model may be a better modeling framework.

Note that this model may be estimated with a probit or logit specification if the dependent variable Y is modified as:

Y = 0 if no extramarital affair
1 otherwise (e.g., 1,2,3,7,12)

Formulate, estimate, and compare the probit (or logit), Poisson, and negative binomial regression models, respectively.
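
The comparison might be set up as follows in Python; the file name AFFAIRS.TXT and the column names are illustrative assumptions, not part of the original data description:

    # Probit (on Y > 0), Poisson, and negative binomial (NB2) fits.
    import pandas as pd
    import statsmodels.api as sm

    data = pd.read_csv("AFFAIRS.TXT", sep=r"\s+")      # hypothetical file name
    X = sm.add_constant(data[["Z2", "Z3", "Z5", "Z7", "Z8"]])

    poisson = sm.Poisson(data["Y"], X).fit()
    negbin  = sm.NegativeBinomial(data["Y"], X).fit()  # NB2: Var = λ(1+αλ), α = 1/θ
    probit  = sm.Probit((data["Y"] > 0).astype(int), X).fit()

    # An estimated dispersion parameter 'alpha' near zero supports the Poisson
    # restriction 1/θ -> 0.
    print(poisson.params, negbin.params, probit.params, sep="\n\n")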


Binary Choice Models

Consider a linear regression model Y = Xβ + ε, where

Yi = 1 with probability Pi
0 with probability 1-Pi

It is clear that Xi explains the probability that Yi is 1 or 0. Let

Pi = Prob(Yi=1|Xi) = F(Xiβ)
1-Pi = Prob(Yi=0|Xi) = 1-F(Xiβ)

Since E(Yi|Xi) = (1)F(Xiβ) + (0)(1-F(Xiβ)) = F(Xiβ), the estimated model may be interpreted with the marginal effects defined by

∂E(Yi|Xi)/∂Xi = [∂F(Xiβ)/∂(Xiβ)] β

Given a sample of N independent observations, the likelihood function is

L(β) = ∏i=1,2,...,N Pi^Yi (1-Pi)^(1-Yi) = ∏i=1,2,...,N F(Xiβ)^Yi (1-F(Xiβ))^(1-Yi)

Then the log-likelihood function is

ll(β) = ln(L(β)) = ∑i=1,2,...,N (Yi lnF(Xiβ) + (1-Yi) ln(1-F(Xiβ)))

To maximize ll(β) with respect to β, we solve the first order condition:

∂ll(β)/∂β = ∑i=1,2,...,N (Yi/Fi - (1-Yi)/(1-Fi)) fiXi
= ∑i=1,2,...,N (Yi-Fi)/(Fi(1-Fi)) fiXi = 0

where Fi = F(Xiβ) and fi = f(Xiβ) = ∂Fi/∂(Xiβ). Note that fiXi = ∂Fi/∂β.

Finally, the Hessian ∂²ll(β)/∂β∂β' must be negative definite, and the estimated variance-covariance matrix of β is Var(β) = [-E(∂²ll(β)/∂β∂β')]^(-1).
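
A generic Python sketch of this setup, on synthetic data, writes the log-likelihood and the score implied by the first order condition for an arbitrary CDF F with density f (plugging in the standard normal yields the probit model discussed below):

    # Minimal sketch: generic binary choice ML for any CDF F and density f.
    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import norm

    def negll(beta, y, X, F):
        Fi = np.clip(F(X @ beta), 1e-10, 1 - 1e-10)    # guard the logs
        return -np.sum(y * np.log(Fi) + (1 - y) * np.log(1 - Fi))

    def score(beta, y, X, F, f):
        Fi, fi = F(X @ beta), f(X @ beta)              # Σ (Yi-Fi)/(Fi(1-Fi)) fi Xi
        return X.T @ ((y - Fi) / (Fi * (1 - Fi)) * fi)

    rng = np.random.default_rng(0)                     # synthetic illustration
    X = np.column_stack([np.ones(200), rng.normal(size=200)])
    y = (X @ [0.5, 1.0] + rng.normal(size=200) > 0).astype(float)
    res = minimize(negll, np.zeros(2), args=(y, X, norm.cdf), method="BFGS",
                   jac=lambda b, y, X, F: -score(b, y, X, F, norm.pdf))
    print(res.x)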

Linear Probability Model

Pi = F(Xiβ) = Xiβ

It follows immediately that E(Yi|Xi) = Xiβ. In particular,

E(εi) = (1-Xiβ)Pi + (-Xiβ)(1-Pi) = Pi - Xiβ = 0
Var(εi) = E(εi²) = Pi(1-Xiβ)² + (1-Pi)(-Xiβ)²
= Pi(1-Pi)² + (1-Pi)Pi² = Pi(1-Pi) = (Xiβ)(1-Xiβ)

The range of Var(εi) is between 0 and 0.25 and it is clearly heteroscedastic. Furthermore, since E(Yi|Xi) = F(Xiβ) = Xiβ, a linear function, there is no guarantee that the estimated probability will lie within the unit interval.

Probit Model

Pi = F(Xiβ) = ∫(-∞,Xiβ) 1/√(2π) exp(-z²/2) dz

Pi, the value of the cumulative normal distribution at Xiβ, is called the Probit for the i-th observation. The model Yi = F^(-1)(Pi) + εi is called the Probit Model, where F^(-1)(Pi) = Xiβ is the inverse of the cumulative distribution F(Xiβ). The probit model can be derived from a model involving an unobserved, or latent, variable Yi* such that Yi* = Xiβ + εi, where εi ~ Normal(0,1). Suppose the value of the observed binary variable Yi depends on the sign of Yi*:

Yi = 1 if Yi* > 0
0 if Yi* ≤ 0

Therefore,

Pi = Prob(Yi=1|Xi) = Prob(Yi*>0|Xi) = Prob(εi>-Xiβ)
= ∫(-Xiβ,∞) 1/√(2π) exp(-z²/2) dz
= ∫(-∞,Xiβ) 1/√(2π) exp(-z²/2) dz    (by symmetry of the standard normal density)

For maximum likelihood estimation, we solve the following first order condition:

∑i=1,2,...,N (Yi-Fi)/(Fi(1-Fi)) fiXi = 0

where Fi = F(Xiβ) = ∫(-∞,Xiβ) 1/√(2π) exp(-z²/2) dz, and
fi = ∂F(Xiβ)/∂(Xiβ) = 1/√(2π) exp(-(Xiβ)²/2)

These are exactly the first order conditions for weighted least squares estimation of the nonlinear regression model Yi = F(Xiβ) + εi, with weights given by 1/[F(Xiβ)(1-F(Xiβ))].

Furthermore, it can be shown that for the maximum likelihood estimate of β

E[∂²ll(β)/∂β∂β'] = -∑i=1,2,...,N (fi²XiXi')/(Fi(1-Fi))

which is negative definite. The estimated variance-covariance matrix of β is computed as

Var(β) = (-E[∂²ll(β)/∂β∂β'])^(-1)

If the normal probability model is misspecified, then Quasi-Maximum Likelihood (QML) estimation is suggested, correcting the asymptotic variance-covariance matrix with a robust ("sandwich") estimator as follows:

Var(β) = (-H)^(-1) G (-H)^(-1)

where H = E[∂²ll(β)/∂β∂β'] and G = E([∂ll(β)/∂β][∂ll(β)/∂β']).

For model interpretation, the marginal effects of Xi are defined as

∂E(Yi|Xi)/∂Xi = [∂F(Xiβ)/∂(Xiβ)] β = f(Xiβ)β = fiβ
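
The QML sandwich variance can be computed from the sample analogues of H and G above; the Python sketch below uses synthetic data for illustration:

    # Probit fit, sandwich variance (-H)^(-1) G (-H)^(-1) = H^(-1) G H^(-1),
    # and average marginal effects fi*β. A sketch, not a library routine.
    import numpy as np
    from scipy.stats import norm
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    X = sm.add_constant(rng.normal(size=(200, 2)))
    y = (X @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=200) > 0).astype(float)

    b = np.asarray(sm.Probit(y, X).fit().params)
    Fi, fi = norm.cdf(X @ b), norm.pdf(X @ b)

    H = -(X * (fi**2 / (Fi * (1 - Fi)))[:, None]).T @ X   # sample analogue of H
    g = ((y - Fi) / (Fi * (1 - Fi)) * fi)[:, None] * X    # per-observation scores
    Hinv = np.linalg.inv(H)
    sandwich = Hinv @ (g.T @ g) @ Hinv
    print("QML std. errors:", np.sqrt(np.diag(sandwich)))
    print("avg. marginal effects:", fi.mean() * b)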

Logit Model

Pi = F(Xiβ) = 1/(1+exp(-Xiβ))

Pi as defined is the logistic curve. The model Yi = F^(-1)(Pi) + εi is called the Logit Model. The logit model is most easily derived by assuming that the logarithm of the odds equals Xiβ, i.e., the log-odds model ln(Pi/(1-Pi)) = Xiβ. Solving for Pi, we find that

Pi = exp(Xiβ)/(1+exp(Xiβ)) = 1/(1+exp(-Xiβ))

For maximum likelihood estimation, we solve the following first order condition:

∑i=1,2,...,N (Yi-Fi)/(Fi(1-Fi)) fiXi = 0

Because of the logistic functional form,

Fi = F(Xiβ) = 1/(1+exp(-Xiβ)) and
fi = ∂F(Xiβ)/∂(Xiβ) = exp(-Xiβ)/[1+exp(-Xiβ)]² = Fi(1-Fi)

the first order condition reduces to the simple expression:

∑i=1,2,...,N (Yi-Fi)Xi = 0

with the negative definite Hessian:

∂²ll(β)/∂β∂β' = -∑i=1,2,...,N Fi(1-Fi)Xi'Xi

Therefore, the estimate of the variance-covariance matrix of β is

Var(β) = [-∂²ll(β)/∂β∂β']^(-1)

For model interpretation, the marginal effects of Xi are defined as

∂E(Yi|Xi)/∂Xi = [∂F(Xiβ)/∂(Xiβ)] β = fiβ = Fi(1-Fi)β
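
Because the score and Hessian take these simple forms, Newton's method is easy to code directly; a minimal Python sketch on synthetic data:

    # Newton iteration for the logit model: score Σ(Yi-Fi)Xi and
    # Hessian -Σ Fi(1-Fi) Xi Xi'. Synthetic data for illustration.
    import numpy as np

    rng = np.random.default_rng(0)
    X = np.column_stack([np.ones(500), rng.normal(size=(500, 2))])
    y = (rng.random(500) < 1 / (1 + np.exp(-(X @ [0.3, 1.0, -0.5])))).astype(float)

    beta = np.zeros(3)
    for _ in range(25):
        F = 1 / (1 + np.exp(-(X @ beta)))
        score = X.T @ (y - F)
        H = -(X * (F * (1 - F))[:, None]).T @ X
        step = np.linalg.solve(H, score)               # β ← β - H⁻¹·score
        beta -= step
        if np.max(np.abs(step)) < 1e-10:
            break

    var_beta = np.linalg.inv(-H)                       # [-∂²ll/∂β∂β']⁻¹
    print(beta, np.sqrt(np.diag(var_beta)))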


Count Data and Poisson Regression Model

If a decision variable takes nonnegative integer values, with no prior upper bound and typically some zeros, we have a model of count data.

Suppose Y ∈ {0,1,2,...} follows a Poisson distribution with parameter λ > 0:

f(Y|λ) = e^(-λ) λ^Y / Y!

It is known that E(Y) = Var(Y) = λ. If Y is to be explained by X such that E(Y|X) > 0, a natural approach is to set λ = E(Y|X), parameterized by the regression parameter β in the Poisson distribution function. For example,

λ(X,β) = E(Y|X,β) = e^(Xβ) > 0

Therefore, given a sample of independent observations {(Yi,Xi), i=1,2,...,N}, the likelihood function is written as:

L(β) = ∏i=1,2,...,N [e^(-λ(Xi,β)) λ(Xi,β)^Yi / Yi!]

The corresponding log-likelihood function is

ll(β) = ∑i=1,2,...,N Yi ln(λ(Xi,β)) - ∑i=1,2,...,N λ(Xi,β) - ∑i=1,2,...,N ln(Yi!)

The maximum likelihood estimate of β is obtained from:

∂ll/∂β = ∑i=1,2,...,N [(Yi-λ(Xi,β))/λ(Xi,β)] [∂λ(Xi,β)/∂β] = 0, and
∂²ll/∂β∂β' = ∑i=1,2,...,N [(Yi-λ(Xi,β))/λ(Xi,β)] [∂²λ(Xi,β)/∂β∂β']
+ ∑i=1,2,...,N [-Yi/λ(Xi,β)²] [∂λ(Xi,β)/∂β] [∂λ(Xi,β)/∂β']
is negative definite.

If λ(Xi,β) = E(Yi|Xi,β) = e^(Xiβ), the model is interpreted as:

∂E(Yi|Xi,β)/∂Xij = e^(Xiβ)βj = E(Yi|Xi,β)βj, or

βj = [∂E(Yi|Xi,β)/∂Xij] / E(Yi|Xi,β)
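
With λ(Xi,β) = e^(Xiβ), ∂λ(Xi,β)/∂β = λ(Xi,β)Xi, so the score and Hessian above simplify to ∑(Yi-λi)Xi and -∑λiXiXi'; a minimal Python sketch of the resulting Newton iteration, on synthetic data:

    # Newton iteration for Poisson regression with a log link.
    import numpy as np

    rng = np.random.default_rng(0)
    X = np.column_stack([np.ones(400), rng.normal(size=400)])
    y = rng.poisson(np.exp(X @ [0.2, 0.5]))            # synthetic counts

    beta = np.zeros(2)
    for _ in range(25):
        lam = np.exp(X @ beta)
        step = np.linalg.solve(-(X * lam[:, None]).T @ X, X.T @ (y - lam))
        beta -= step
        if np.max(np.abs(step)) < 1e-10:
            break

    # βj is a semi-elasticity: the proportional change in E(Y|X) per unit Xj.
    print(beta)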

Heterogeneity and Negative Binomial Regression Model

Maximum likelihood estimation of the Poisson regression model may suffer from the problem of overdispersion, because the Poisson distribution restricts Var(Y|X) = E(Y|X). We generalize the Poisson model by introducing an individual unobservable effect v > 0 into the conditional mean:

E(Y|X,β,v) = λv = e^(Xβ)v

Then Y follows a Poisson distribution with the density:

f(Y|λv) = e^(-λv)(λv)^Y / Y!

Suppose v follows a gamma distribution with E(v) = 1 and Var(v) = 1/θ. That is,

g(v|θ) = θ^θ/Γ(θ) v^(θ-1) e^(-θv)

Therefore,

f(Y|λ,θ) = ∫(0,∞) [e^(-λv)(λv)^Y/Y!] g(v|θ) dv
= (θ^θ λ^Y)/(Γ(θ)Y!) ∫(0,∞) e^(-(λ+θ)v) v^(Y+θ-1) dv
= [(θ^θ λ^Y)/(Γ(θ)Y!)] [Γ(Y+θ)/(λ+θ)^(Y+θ)]
= [Γ(Y+θ)/(Γ(θ)Y!)] [λ/(λ+θ)]^Y [1-λ/(λ+θ)]^θ

This is one form of the negative binomial distribution, with mean λ and variance λ(1+λ/θ). By construction, it is a Poisson-Gamma mixture. Typically, the parameter 1/θ is used to measure the extent of overdispersion. Given a sample of independent observations {(Yi,Xi), i=1,2,...,N}, and letting λi = λ(Xi,β) = e^(Xiβ), the log-likelihood function is written as:

ll(β,θ) = ∑i=1,2,...,N ln f(Yi|λi,θ)

The negative binomial model can be estimated by maximum likelihood without much difficulty. A test of the Poisson restriction (no overdispersion) is often carried out by testing the hypothesis 1/θ → 0.
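
The log-likelihood above can also be maximized directly; the Python sketch below (synthetic data) parameterizes θ through ln(θ) and reports the overdispersion measure 1/θ:

    # Direct ML for the negative binomial (Poisson-Gamma mixture) model.
    import numpy as np
    from scipy.special import gammaln
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    X = np.column_stack([np.ones(400), rng.normal(size=400)])
    v = rng.gamma(1.5, 1 / 1.5, 400)                   # E(v)=1, Var(v)=1/θ with θ=1.5
    y = rng.poisson(np.exp(X @ [0.2, 0.5]) * v)

    def negll(params):
        beta, theta = params[:-1], np.exp(params[-1])  # ln(θ) keeps θ > 0
        lam = np.exp(X @ beta)
        ll = (gammaln(y + theta) - gammaln(theta) - gammaln(y + 1)
              + y * np.log(lam / (lam + theta)) + theta * np.log(theta / (lam + theta)))
        return -ll.sum()

    res = minimize(negll, x0=np.zeros(3), method="BFGS")
    print("beta:", res.x[:-1], "overdispersion 1/theta:", 1 / np.exp(res.x[-1]))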


Copyright© Kuan-Pin Lin
Last updated: 10/24/2016