The generalized linear model (GLM) is a flexible generalization of ordinary least squares regression. OLS restricts the regression coefficients to have a constant effect on the dependent variable. GLM allows for this effect to vary along the range of the explanatory variables. In particular, a nonlinear function links the linear parameterization to the expected value of the random variable.
Let μ = E(Y) and η = Xβ. The basic structure of GLM is the link function g(μ) = η. Therefore, Y = g-1(Xβ) + ε.
GLM is essentially a non-linear model with the linear parameterization in the expected value of Y. To estimate the model, one needs three components:
Y | f(Y) | E(Y) | Var(Y) | |
---|---|---|---|---|
Bernoulli(π) | 0,1 | πY (1-π)1-Y | π | π(1-π) |
Poisson(λ) | 0,1,2,... | exp(-λ) λY/Y! | λ | λ |
Normal(μ,σ) | (-∞,∞) | 1/√(2πσ2) exp[-(Y-μ)2/(2σ2)] | μ | σ2 |
Gamma(λ,ρ) | [0,∞) | λρ/Γ(ρ) exp(-λY) Yρ-1 | ρ/λ | ρ/λ2 |
Exponential(λ) | [0,∞) | λ exp(-λY) | 1/λ | 1/λ2 |
Inverse Normal | ... | |||
Inverse Gamma | ... | |||
... |
The table below lists commonly used link functions and their inverse:
Link | η=g(μ) | μ=g-1(η) |
---|---|---|
Identity | μ | η |
Log | ln(μ) | exp(η) |
Inverse | μ-1 | η-1 |
Inverse-Square | μ-2 | η-0.5 |
Square Root | μ0.5 | η2 |
Logit | ln[μ/(1-μ)] | Λ(η)=exp(η)/[1+exp(η)] |
Probit | Φ-1(μ) | Φ(η) |
Log-log | -ln[-ln(μ)] | exp[-exp(-η)] |
To estimate the coefficients for a GLM model, we use maximum likelihood method.
The model interpretation is typically based on the marginal effect defined by ∂E(Y)/∂X. From the definition of the link function in GLM, g(μ) = η or g(E(Y)) = Xβ, we derive the differentiation ∂g(E(Y))/∂X = g' ∂E(Y)/∂X = β, where g' = ∂g(μ)/∂μ. Therefore ∂E(Y)/∂X = β/g'. For the identity link, g' = 1, or ∂E(Y)/∂X = β.
Family | Link | Log-Likelihood Function: llf(θ) | θ | Notes |
---|---|---|---|---|
Normal(μ,σ) | Identity: μ=Xβ | -Nln(2πσ2)-1/2∑i=1,...,N(Yi-Xiβ)2/σ2 | (β,σ) | This is a linear model |
Normal(μ,σ) | Log: ln(μ)=Xβ | -Nln(2πσ2)-1/2∑i=1,...,N(Yi-exp(Xiβ))2/σ2 | (β,σ) | Not a log-linear model |
Gamma(λ,ρ) | Identity: ρ/λ=Xβ | N[ρ(ln(ρ)-lnΓ(ρ)] +∑i=1,...,N[(ρ-1)ln(Yi)-ln(Xiβ)-ρYi/Xiβ] | (β,ρ) | |
Exponential(λ) | Identity: 1/λ=Xβ | ∑i=1,...,N(-ln(Xiβ)-Yi/Xiβ); | β | |
Exponential(λ) | Inverse: 1/λ=1/Xβ | ∑i=1,...,N(ln(Xiβ)-YiXiβ); | β | |
Poisson(λ) | Identity: λ=Xβ | ∑i=1,...,N(-Xiβ)+Yiln(Xiβ)-ln(Yi!) | β | |
Poisson(λ) | Log: ln(λ)=Xβ | ∑i=1,...,Nexp(-Xiβ)+Yi(Xiβ)-ln(Yi!) | β | |
Bernoulli(π) | Logit: ln(π/(1-π))=Xβ | ∑i=1,...,NYiln(Λ(Xiβ)) +(1-Yi)ln(1-Λ(Xiβ)) | β | Logit Model |
Bernoulli(π) | Probit: Φ-1(π)=Xβ | ∑i=1,...,NYiln(Φ(Xiβ)) +(1-Yi)ln(1-Φ(Xiβ)) | β | Probit Model |
... | ||||
GRADE = β0 + β1GPA + β2TUCE + β3PSI + ε
The following variables are avaialble in the data file GRADE.TXT:
Using maximum likelihood estimation method to represent and estimate the generlized linear model of Bernoulli or binomial distribution with logit and probit link, respectively. Explain the estimated marginal effects of new teaching method on students' grade performance.
Y = Number of affairs in the past year: 0, 1, 2, 3, 4-10 (coded as 7), 11 or more (coded as 12).
Here, we present only the model using five explanatory variables as follows:
Z2 | = Age. |
Z3 | = Number of years married. |
Z5 | = Degree of religiousness: 1 (anti-religious), ..., 5 (very religious). |
Z7 | = Hollingshead scale of occupation: 1, ..., 7. |
Z8 | = Self-rating of marriage satisfaction: 1 (very unhappy), ..., 5 (very happy). |
The regression equation is:
Y = β0 + β2Z2 + β3Z3 + β5Z5 + β7Z7 + β8Z8 + ε
By examing the data of extramarital affairs, the preponderance of zeros (no affairs) dependent variable may suggest a possion distribution with log or identity link for the study. Further, with the potential problem of heterogeneity in the count data, a modified Poisson or the negative binomial regression model may be a better modeling framework.
Note that this model may be estimated with a probit or logit specification if the dependent variable Y is modified as:
Y = | 0 | if no extramarital affair |
1 | otherwise (e.g., 1,2,3,7,12) |
Formulate, estimate, and compare the probit (or logit), Poisson, and negative binomial regression models, respectively.
Yi = | 1 with probability Pi |
0 with probability 1-Pi |
It is clear that Xi explains the probability of Yi to be 1 or 0. Let
Pi = Prob(Yi=1|Xi) = F(Xiβ)
1-Pi = Prob(Yi=0|Xi) = 1-F(Xiβ)
Since E(Yi|Xi) = (1)F(Xiβ) + (0)(1-F(Xiβ)) = F(Xiβ), the estimated model may be interpreted with the marginal effects defined by
∂E(Yi|Xi)/∂Xi = [∂F(Xiβ)/∂(Xiβ)] β
Given a sample of N independent observations, the likelihood function is
L(β) = ∏i=1,2,...,N PiYi (1-Pi)1-Yi = ∏i=1,2,...,N F(Xiβ)Yi (1-F(Xiβ))1-Yi
Then the log-likelihood function is
ll(β) = ln(L(β)) = ∑i=1,2,...,N (Yi lnF(Xiβ) + (1-Yi) ln(1-F(Xiβ)))
To maximize ll(β) with respect to β, we solve from the first order condition:
∂ll(β)/∂β | = ∑i=1,2,...,N (Yi/Fi-(1-Yi)/(1-Fi)) fiXi |
= ∑i=1,2,...,N (Yi-Fi)/(Fi(1-Fi)) fiXi = 0 |
where Fi = F(Xiβ) and fi = f(Xiβ) = ∂Fi/∂(Xiβ). Note that fiXi = ∂Fi/∂β.
Finally, the Hessian ∂ll2(β)/∂β∂β' must be negative definite, and the estimated variance-covariance matrix of β is Var(β) = [-E(∂ll2(β)/∂β∂β')]-1.
It is immediately that E(Yi|Xi) = Xiβ. In particular,
E(εi) | = (1-Xiβ)Pi + (-Xiβ)(1-Pi) = Pi - Xiβ |
Var(εi) | = E(εi2) = Pi(1-Xiβ)2 + (1-Pi)(-Xiβ)2 |
= Pi(1-Pi)2 + (1-Pi)(-Pi)2 = (1-Pi)Pi = (1-Xiβ)(Xiβ) |
The range of Var(εi) is between 0 and 0.25 and it is clearly heteroscedastic. Furthermore, since E(Yi|Xi) = F(Xiβ) = Xiβ, a linear function, there is no guarantee that the estimated probability will lie within the unit interval.
Pi, the cumulative normal distribution, is called Probit for the i-th observation. The model Yi = F-1(Pi) + εi is called the Probit Model, where F-1(Pi) = Xiβ is the inverse of cumulative distribution F(Xiβ). The probit model can be derived from a model involving an unobserved, or latent, variable Yi* such that Yi* = Xiβ + εi where εi ~ normal(0,1). Suppose the value of the observed binary variable Yi depends on the sign of Yi*:
Yi = | 1 if Yi* > 0 |
0 if Yi* ≤ 0 |
Therefore,
Pi = Prob(Yi=1|Xi) = Prob(Yi*>0|Xi)
= Prob(εi>-Xiβ)
= ∫∞-Xiβ
1/(2π)½ exp(-z2/2) dz
= ∫-∞Xiβ
1/(2π)½ exp(-z2/2) dz
For maximum likelihood estimation, we solve the following first order condition:
∑i=1,2,...,N (Yi-Fi)/(Fi(1-Fi)) fiXi = 0
where Fi = F(Xiβ) =
∫-∞Xiβ
1/(2π)½ exp(-z2/2) dz, and
fi =
∂F(Xiβ)/∂(Xiβ) =
1/(2π)½
exp(-(Xiβ)2/2)
This is exactly the first order conditions for weighted least squares estimation of the nonlinear regression model: Yi = F(Xiβ) + εi with weights given by [F(Xiβ)(1-F(Xiβ))]-½.
Furthermore, it can be shown that for the maximum likelihood estimates β
E([∂2ll(β)/∂β∂β']) = -∑i=1,2,...,N(fi2XiXi')/(Fi(1-Fi))
which is negative definite. The estimated variance-covariance matrix of β is computed as
Var(β) = (-E[∂2ll(β)/∂β∂β'])-1
If the normal probability model is misspecified, then Quasi-Maximum Likelihood (QML) estimation is suggested by correcting the asymptotic variance-covariance matrix with a robust ("sandwich") estimator as follows:
Var(β) = (-H)-1G(-H)-1
where H = E[∂2ll(β)/∂β∂β'], and G = E[∂ll(β)/∂β][∂ll(β)/∂β'].
For model interpretation, the marginal effects of Xi is defined as
∂E(Yi|Xi)/∂Xi = [∂F(Xiβ)/∂(Xiβ)] β = f(Xiβ)β = fiβ
Pi as defined is the logistic curve. The model Yi = F-1(Pi) + εi is called the Logit Model. The logit model is most easily derived by assuming the logarithm of the odds is equal to Xiβ, or the odd ratio model: ln(Pi/(1-Pi)) = Xiβ Solving for Pi, we find that
Pi = exp(Xiβ)/(1+exp(Xiβ)) = 1/(1+exp(-Xiβ))
For maximum likelihood estimation, we solve the following first order condition:
∑i=1,2,...,N (Yi-Fi)/(Fi(1-Fi)) fiXi = 0
Because of the logistic functional form,
Fi = F(Xiβ) =
1/(1+exp(-Xiβ)) and
fi = ∂F(Xiβ)/∂(Xiβ) =
exp(-Xiβ)/(1+exp(-Xiβ)) =
Fi(1-Fi)
it amounts to solve the following simple expression:
∑i=1,2,...,N (Yi-Fi)Xi = 0
with the negative definite Hessian:
∂2ll(β)/∂β∂β' = - ∑i=1,2,...,NFi(1-Fi)Xi'Xi
Therefore, the estimate of variance-covariance matrix of β is
Var(β) = [-∂2ll(β)/∂β∂β']-1
For model interpretation, the marginal effects of Xi is defined as
∂E(Yi|Xi)/∂Xi = [∂F(Xiβ)/∂(Xiβ)] β = fiβ = Fi(1-Fi)β
Suppose Y = {0,1,2,...} follows a Poisson distribution with a parameter λ>0:
f(Y|λ) = e-λλY / Y!
It is known that E(Y) = Var(Y) = λ. If Y is to be explained by X such that E(Y|X) > 0, a natural approach is to set λ = E(Y|X) and parameterized by the regression parameter β in the Poisson distribution function. For example,
λ(X,β) = E(Y|X,β) = eXβ > 0
Therefore, given a sample of independent observations {(Yi,Xi), i=1,2,...,N}, the likelihood function is written as:
L(β) = ∏i=1,2,...,N [e-λ(Xi,β) λ(Xi,β)Yi / Yi!]
The corresponding log-likelihood function is
ll(β) = ∑i=1,2,...,NYiln(λ(Xi,β)) - ∑i=1,2,...,Nλ(Xi,β) - ∑i=1,2,...,Nln(Yi!)
Maximum likelihood estimate of β is obtained from:
∂ll/∂β = ∑i=1,2,...,N[(Yi-λ(Xi,β))/λ(Xi,β)] [∂λ(Xi,β)/∂β] = 0, and
∂2ll/∂β∂β' | = ∑i=1,2,...,N[(Yi-λ(Xi,β))/λ(Xi,β)] [∂2λ(Xi,β)/∂β∂β'] |
+ ∑i=1,2,...,N[-Yi/λ(Xi,β)2] [∂λ(Xi,β)/∂β'] [∂λ(Xi,β)/∂β] |
If λ(Xi,β) = E(Yi|Xi,β) = eXiβ, the model is interpreted as:
∂E(Yi|Xi,β)/∂Xij = eXiββj = E(Yi|Xi,β)βj, or
βj = ∂E(Yi|Xi,β)/∂Xij / E(Yi|Xi,β)
E(Y|X,β,v) = λv = eXβv
Then Y follows a Poisson distribution with the density:
f(Y|λv) = e-λv(λv)Y / Y!
Suppose v follows a gamma distribution with E(v) = 1 and Var(v) = 1/θ. That is,
g(v|θ) = θθ/Γ(θ) vθ-1e-θv
Therefore,
f(Y|λ,θ) | = ∫0∞ e-λv(λv)Y/Y! g(v|θ) dv |
= (θθλY)/(Γ(θ)Y!) ∫0∞ e-(λ+θ)v v(Y+θ-1) dv | |
= [(θθλY)/(Γ(θ)Y!)] [Γ(Y+θ)/(λ+θ)y+θ] | |
= [Γ(Y+θ)/(Γ(θ)Y!)] [λ/(λ+θ)]Y [1-λ/(λ+θ)]θ |
This is one form of negative binomial distribution with mean λ and variance λ(1+λ/θ). By construction, it is a Poisson-Gamma mixture. Typically, the parameter 1/θ is used to measure the extent of overdispersion. Given a sample of independent observations {(Yi,Xi), i=1,2,...,N}, and let λi = λ(Xi,β) = eXiβ, the log-likelihood function is written as:
ll(β,θ) = ∑i=1,2,...N ln f(Yi|λi,θ)
The negative binomial model can be estimated by maximum likelihood without much difficulty. A test of the Poisson distribution is often carried out by testing the hypothesis 1/θ -> 0.