Topic 4f

Generalized Linear Models (GLM)

Readings and References

P. McCullagh and J.A. Nelder, Generalized Linear Models, 2nd ed., Chapman and Hall, 1989.
J.A. Nelder and R.W.M. Wedderburn, "Generalized Linear Models," Journal of the Royal Statistical Society, 1972, Series A 135: 370-384.

The generalized linear model (GLM) is a flexible generalization of ordinary least squares regression. OLS restricts the regression coefficients to have a constant effect on the dependent variable. GLM allows for this effect to vary along the range of the explanatory variables. In particular, a nonlinear function links the linear parameterization to the expected value of the random variable.

Let μ = E(Y) and η = Xβ. The basic structure of GLM is the link function g(μ) = η, so that μ = g^-1(Xβ). Therefore, Y = g^-1(Xβ) + ε, where ε = Y - μ is the deviation of Y from its mean.

GLM is essentially a non-linear model with the linear parameterization in the expected value of Y. To estimate the model, one needs three components:

Random component, f(ε) or f(Y), specifying the conditional distribution of the response variable given the explanatory variables X. Typically, this distribution is from the exponential family:

Distribution	Range of Y	f(Y)	E(Y)	Var(Y)
Bernoulli(π)	0,1	π^Y (1-π)^(1-Y)	π	π(1-π)
Poisson(λ)	0,1,2,...	exp(-λ) λ^Y/Y!	λ	λ
Normal(μ,σ)	(-∞,∞)	1/√(2πσ²) exp[-(Y-μ)²/(2σ²)]	μ	σ²
Gamma(λ,ρ)	[0,∞)	λ^ρ/Γ(ρ) exp(-λY) Y^(ρ-1)	ρ/λ	ρ/λ²
Exponential(λ)	[0,∞)	λ exp(-λY)	1/λ	1/λ²
Inverse Normal	...
Inverse Gamma	...
...

A linear predictor which is a linear function of the regressors: η = β₀ + β₁X₁ + ... + β_kX_k = Xβ

A link function which transforms the expectation of the response to the linear predictor. In other words, the link function describes the relationship between the linear predictor and the mean of the distribution function. The link function must be invertible.

The table below lists commonly used link functions and their inverse:

Link	η=g(μ)	μ=g^-1(η)
Identity	μ	η
Log	ln(μ)	exp(η)
Inverse	μ^-1	η^-1
Inverse-Square	μ^-2	η^-0.5
Square Root	μ^0.5	η²
Logit	ln[μ/(1-μ)]	Λ(η)=exp(η)/[1+exp(η)]
Probit	Φ^-1(μ)	Φ(η)
Log-log	-ln[-ln(μ)]	exp[-exp(-η)]
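The link functions in the table above can be written as pairs (g, g^-1) and checked numerically. The following is a minimal sketch using only the Python standard library; the names LINKS and round_trip_error are hypothetical helpers introduced for illustration, and the logit inverse is coded as 1/[1+exp(-η)], which is algebraically the same as exp(η)/[1+exp(η)].

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the probit link

# Each entry maps a link name to the pair (g, g_inverse) from the table.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "inverse": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),
    "inverse_square": (lambda mu: mu ** -2, lambda eta: eta ** -0.5),
    "sqrt": (math.sqrt, lambda eta: eta ** 2),
    "logit": (lambda mu: math.log(mu / (1.0 - mu)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
    "probit": (_STD_NORMAL.inv_cdf, _STD_NORMAL.cdf),
    "loglog": (lambda mu: -math.log(-math.log(mu)),
               lambda eta: math.exp(-math.exp(-eta))),
}


def round_trip_error(name, mu):
    """Return |g^-1(g(mu)) - mu|, which should be close to zero
    for any mu in the domain of the link."""
    g, g_inv = LINKS[name]
    return abs(g_inv(g(mu)) - mu)
```

For a mean such as mu = 0.3, which lies in the domain of every link listed, round_trip_error(name, 0.3) should be zero up to floating-point error. This is one way to confirm that each link is invertible on its domain.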

To estimate the coefficients of a GLM, we use the maximum likelihood method.
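As an illustration of maximum likelihood estimation, the sketch below applies Newton-Raphson iterations to the Bernoulli model with logit link. It is not necessarily the algorithm used by any particular software package. The function names (logistic, solve, fit_logit) and the six-observation toy data set are made up for this example.

```python
import math


def logistic(eta):
    """Inverse logit link: Λ(η) = 1/[1+exp(-η)]."""
    return 1.0 / (1.0 + math.exp(-eta))


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination
    with partial pivoting (adequate for a handful of coefficients)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_logit(X, y, tol=1e-10, max_iter=50):
    """Maximize the Bernoulli/logit log-likelihood by Newton-Raphson."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        p = [logistic(sum(b * x for b, x in zip(beta, row))) for row in X]
        # Score (gradient of the log-likelihood): sum_i (y_i - p_i) x_i
        score = [sum((yi - pi) * row[j] for yi, pi, row in zip(y, p, X))
                 for j in range(k)]
        # Information (negative Hessian): sum_i p_i (1 - p_i) x_i x_i'
        info = [[sum(pi * (1.0 - pi) * row[j] * row[l]
                     for pi, row in zip(p, X))
                 for l in range(k)] for j in range(k)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Toy data, made up for illustration: a constant and one regressor.
X_toy = [[1.0, float(x)] for x in range(6)]
y_toy = [0, 0, 1, 0, 1, 1]
beta_hat = fit_logit(X_toy, y_toy)
```

At the solution the score is zero, so the fitted probabilities satisfy ∑(Y_i - p_i)X_i = 0. This first-order condition is a convenient check that the iterations have converged.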

Model interpretation is typically based on the marginal effect ∂E(Y)/∂X. From the link function, g(μ) = η, or g(E(Y)) = Xβ, differentiating both sides with respect to X by the chain rule gives ∂g(E(Y))/∂X = g' ∂E(Y)/∂X = β, where g' = ∂g(μ)/∂μ. Therefore ∂E(Y)/∂X = β/g'. For the identity link, g' = 1, so ∂E(Y)/∂X = β, as in the linear model. For any other link, g' depends on μ, so the marginal effect varies with the value of X at which it is evaluated.
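As a worked illustration, the formula ∂E(Y)/∂X = β/g' can be applied to three of the links listed earlier. Here β_j denotes the coefficient on regressor X_j and φ denotes the standard normal density; both symbols are introduced for this example.

```latex
\begin{align*}
\text{Log:}    \quad & g(\mu) = \ln\mu,
  & g'(\mu) &= \frac{1}{\mu},
  & \frac{\partial E(Y)}{\partial X_j} &= \beta_j\,\mu = \beta_j \exp(X\beta) \\
\text{Logit:}  \quad & g(\mu) = \ln\frac{\mu}{1-\mu},
  & g'(\mu) &= \frac{1}{\mu(1-\mu)},
  & \frac{\partial E(Y)}{\partial X_j} &= \beta_j\,\Lambda(X\beta)\,[1-\Lambda(X\beta)] \\
\text{Probit:} \quad & g(\mu) = \Phi^{-1}(\mu),
  & g'(\mu) &= \frac{1}{\phi(\Phi^{-1}(\mu))},
  & \frac{\partial E(Y)}{\partial X_j} &= \beta_j\,\phi(X\beta)
\end{align*}
```

In each case the marginal effect has the sign of β_j, but its size changes with Xβ.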

GLM Examples

Given a sample of N observations (Y_i,X_i), i=1,...,N, the log-likelihood function is defined for each GLM as follows:

Family	Link	Log-Likelihood Function: llf(θ)	θ	Notes
Normal(μ,σ)	Identity: μ=Xβ	-(N/2)ln(2πσ²) - 1/2 ∑_i=1,...,N (Y_i-X_iβ)²/σ²	(β,σ)	This is a linear model
Normal(μ,σ)	Log: ln(μ)=Xβ	-(N/2)ln(2πσ²) - 1/2 ∑_i=1,...,N (Y_i-exp(X_iβ))²/σ²	(β,σ)	Not a log-linear model
Gamma(λ,ρ)	Identity: ρ/λ=Xβ	N[ρln(ρ)-lnΓ(ρ)] + ∑_i=1,...,N [(ρ-1)ln(Y_i) - ρln(X_iβ) - ρY_i/(X_iβ)]	(β,ρ)
Exponential(λ)	Identity: 1/λ=Xβ	∑_i=1,...,N [-ln(X_iβ) - Y_i/(X_iβ)]	β
Exponential(λ)	Inverse: 1/λ=1/(Xβ)	∑_i=1,...,N [ln(X_iβ) - Y_iX_iβ]	β
Poisson(λ)	Identity: λ=Xβ	∑_i=1,...,N [-X_iβ + Y_iln(X_iβ) - ln(Y_i!)]	β
Bernoulli(π)	Logit: ln(π/(1-π))=Xβ	∑_i=1,...,N [Y_iln(Λ(X_iβ)) + (1-Y_i)ln(1-Λ(X_iβ))]	β	Logit Model
Bernoulli(π)	Probit: Φ^-1(π)=Xβ	∑_i=1,...,N [Y_iln(Φ(X_iβ)) + (1-Y_i)ln(1-Φ(X_iβ))]	β	Probit Model
...
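Three rows of the table above (Exponential with identity link, and Bernoulli with logit and probit links) can be coded directly as functions of β. The sketch below uses only the Python standard library. The function names are hypothetical. X is assumed to be a list of rows, each including the constant term, and y is a list of responses.

```python
import math
from statistics import NormalDist


def linear_predictor(x_row, beta):
    """Compute X_i β for one observation."""
    return sum(b * x for b, x in zip(beta, x_row))


def logistic(eta):
    """Λ(η) = 1/[1+exp(-η)], the inverse logit link."""
    return 1.0 / (1.0 + math.exp(-eta))


# Φ(η), the inverse probit link (standard normal CDF).
std_normal_cdf = NormalDist().cdf


def llf_exponential_identity(beta, X, y):
    """Exponential family, identity link: sum of -ln(X_iβ) - Y_i/(X_iβ).
    Requires X_iβ > 0 for every observation."""
    total = 0.0
    for row, yi in zip(X, y):
        mu = linear_predictor(row, beta)
        total += -math.log(mu) - yi / mu
    return total


def llf_bernoulli(beta, X, y, inverse_link):
    """Bernoulli family: sum of Y_i ln(p_i) + (1-Y_i) ln(1-p_i),
    where p_i = inverse_link(X_iβ). Pass logistic for the logit model
    or std_normal_cdf for the probit model."""
    total = 0.0
    for row, yi in zip(X, y):
        p = inverse_link(linear_predictor(row, beta))
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total
```

A quick sanity check: at β = 0 both the logit and the probit models give p_i = 0.5 for every observation, so llf_bernoulli returns N ln(0.5) whatever the data.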

Example: Binary Choice Models

This example (see also, Greene [2012], Example 17.3) examines the effect of a new teaching method on students' grades. We consider the following qualitative regression (or binary choice) model:

GRADE = β₀ + β₁GPA + β₂TUCE + β₃PSI + ε

The following variables are available in the data file GRADE.TXT:

GRADE = An indicator of whether the student's grade on an examination improved after exposure to the new teaching method PSI
PSI = An indicator of whether the student was exposed to the new teaching method
TUCE = Score of a pretest that indicates entering knowledge of the material
GPA = Grade point average

Estimate the generalized linear model of the Bernoulli (or binomial) distribution with logit and probit links, respectively. Explain the estimated marginal effects of the new teaching method on students' grade performance.
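Once coefficient estimates are available, the marginal effects for the logit and probit models can be computed along the following lines. This is only a sketch: the function marginal_effects and its arguments are hypothetical, and the coefficients are supplied by the user rather than estimated here. Because PSI is a 0/1 indicator, its effect is computed as the discrete change in the predicted probability when PSI switches from 0 to 1, holding the other regressors fixed; this treatment is an assumption of the sketch.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def logit_mean(eta):
    """Λ(η): predicted probability under the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))


def logit_slope(eta):
    """dΛ(η)/dη = Λ(η)[1-Λ(η)]."""
    p = logit_mean(eta)
    return p * (1.0 - p)


probit_mean = _STD_NORMAL.cdf   # Φ(η)
probit_slope = _STD_NORMAL.pdf  # φ(η) = dΦ(η)/dη


def marginal_effects(names, beta, x, mean_fn, slope_fn, binary=()):
    """Marginal effects on E(Y) evaluated at the point x.

    names, beta, x are parallel lists; the first entry is assumed to be
    the constant (x[0] = 1), for which no effect is reported.
    Variables listed in `binary` get a discrete-change effect,
    E(Y | var = 1) - E(Y | var = 0); all others get beta_j * slope(Xβ).
    """
    eta = sum(b * v for b, v in zip(beta, x))
    effects = {}
    for j in range(1, len(names)):
        if names[j] in binary:
            eta_without = eta - beta[j] * x[j]  # linear predictor with var = 0
            effects[names[j]] = mean_fn(eta_without + beta[j]) - mean_fn(eta_without)
        else:
            effects[names[j]] = beta[j] * slope_fn(eta)
    return effects
```

A call would look like marginal_effects(["const", "GPA", "TUCE", "PSI"], beta_hat, x_bar, logit_mean, logit_slope, binary=("PSI",)), where beta_hat holds the estimated coefficients and x_bar is the point of evaluation, for example the sample means. Both must come from the actual estimation on GRADE.TXT.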