Topic 6

Qualitative Choice Models

Binary Choice Models
- Linear Probability Model
- Logit Model
- Probit Model
Multinomial Choice Models
Limited Dependent Variable Models
- Tobit Model
Count Data and Poisson Regression Model
Duration Data

Readings and References:

W. H. Greene, Econometric Analysis, 5th Ed., Chapter 21: Models with Discrete Dependent Variables, and Chapter 22: Limited Dependent Variable and Duration Models, Prentice-Hall, 2002.
A. C. Cameron and P. K. Trivedi, Microeconometrics: Methods and Applications, Cambridge University Press, 2005.
J. M. Wooldridge, Econometric Analysis of Cross Section and Panel Data, Part IV, Nonlinear Models and Related Topics, The MIT Press, 2002.
P. J. Dhrymes, "Limited Dependent Variables," Handbook of Econometrics, Vol. 3, Chapter 27, ed. by Z. Griliches and M. D. Intriligator, North-Holland, 1983, 1567-1631 (Paper).
J. Kennan, "The Duration of Contract Strikes in U.S. Manufacturing," Journal of Econometrics, 28, 1985, 5-28.
N. M. Kiefer, "Economic Duration Data and Hazard Functions," Journal of Economic Literature, 1988, 646-679 (Paper).

Binary Choice Models

Consider a linear regression model Y = Xβ + ε, where

Y_i = 1 with probability P_i

0 with probability 1-P_i

The regressors X_i explain the probability that Y_i takes the value 1 rather than 0. Let

P_i = Prob(Y_i=1|X_i) = F(X_iβ)
1-P_i = Prob(Y_i=0|X_i) = 1-F(X_iβ)

Since E(Y_i|X_i) = (1)F(X_iβ) + (0)(1-F(X_iβ)) = F(X_iβ), the estimated model may be interpreted with the marginal effects defined by

∂E(Y_i|X_i)/∂X_i = [∂F(X_iβ)/∂(X_iβ)] β

Given a sample of N independent observations, the likelihood function is

L(β) = ∏_i=1,2,...,N P_i^Y_i (1-P_i)^(1-Y_i) = ∏_i=1,2,...,N F(X_iβ)^Y_i (1-F(X_iβ))^(1-Y_i)

Then the log-likelihood function is

ll(β) = ln(L(β)) = ∑_i=1,2,...,N (Y_i lnF(X_iβ) + (1-Y_i) ln(1-F(X_iβ)))

To maximize ll(β) with respect to β, we solve the first-order condition:

∂ll(β)/∂β = ∑_i=1,2,...,N (Y_i/F_i-(1-Y_i)/(1-F_i)) f_iX_i

= ∑_i=1,2,...,N (Y_i-F_i)/(F_i(1-F_i)) f_iX_i = 0

where F_i = F(X_iβ) and f_i = f(X_iβ) = ∂F_i/∂(X_iβ). Note that f_iX_i = ∂F_i/∂β.

Finally, the Hessian ∂²ll(β)/∂β∂β' must be negative definite, and the estimated variance-covariance matrix of the maximum likelihood estimator of β is Var(β) = [-E(∂²ll(β)/∂β∂β')]^-1.

Linear Probability Model

P_i = F(X_iβ) = X_iβ

It follows immediately that E(Y_i|X_i) = X_iβ. In particular, since P_i = X_iβ,

E(ε_i) = (1-X_iβ)P_i + (-X_iβ)(1-P_i) = P_i - X_iβ = 0

Var(ε_i) = E(ε_i²) = P_i(1-X_iβ)² + (1-P_i)(-X_iβ)²

= P_i(1-P_i)² + (1-P_i)(-P_i)² = (1-P_i)P_i = (1-X_iβ)(X_iβ)

Var(ε_i) lies between 0 and 0.25 and varies with X_i, so the error term is heteroscedastic. Furthermore, since E(Y_i|X_i) = F(X_iβ) = X_iβ is a linear function, there is no guarantee that the estimated probability will lie within the unit interval.
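
Both drawbacks of the linear probability model can be seen in a small numerical sketch. The data below are hypothetical (they are not taken from any data set used in these notes), and ols_line is an illustrative helper name. The model is fitted by ordinary least squares, and the fitted probabilities together with the implied variance P_i(1-P_i) are printed.

```python
def ols_line(x, y):
    """Least-squares intercept and slope for a single regressor."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Hypothetical binary outcomes, for illustration only
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0, 0, 0, 1, 1, 1]

b0, b1 = ols_line(x, y)
fitted = [b0 + b1 * xi for xi in x]          # estimated P_i = X_i*beta
variance = [p * (1.0 - p) for p in fitted]   # implied Var(eps_i) = P_i(1-P_i)

for xi, p, v in zip(x, fitted, variance):
    print(f"x={xi:.0f}  P_hat={p:7.4f}  Var_hat={v:7.4f}")
```

In this toy sample the fitted probability is below 0 at the smallest x and above 1 at the largest x, and the implied variance is negative at both ends, which is why the logit and probit specifications are usually preferred.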

Probit Model

P_i = F(X_iβ) = ∫_{-∞}^{X_iβ} 1/(2π)^½ exp(-z²/2) dz

P_i is the cumulative normal distribution evaluated at X_iβ, and its inverse F^-1(P_i) = X_iβ is called the probit for the i-th observation; the resulting model is called the Probit Model. The probit model can be derived from a model involving an unobserved, or latent, variable Y_i^* such that Y_i^* = X_iβ + ε_i, where ε_i ~ normal(0,1). Suppose the value of the observed binary variable Y_i depends on the sign of Y_i^*:

Y_i = 1 if Y_i^* > 0

0 if Y_i^* ≤ 0

Therefore,

P_i = Prob(Y_i=1|X_i) = Prob(Y_i^*>0|X_i) = Prob(ε_i>-X_iβ)
= ∫_{-X_iβ}^{∞} 1/(2π)^½ exp(-z²/2) dz
= ∫_{-∞}^{X_iβ} 1/(2π)^½ exp(-z²/2) dz

where the last equality follows from the symmetry of the standard normal density.

For maximum likelihood estimation, we solve the following first order condition:

∑_i=1,2,...,N (Y_i-F_i)/(F_i(1-F_i)) f_iX_i = 0

where F_i = F(X_iβ) = ∫_{-∞}^{X_iβ} 1/(2π)^½ exp(-z²/2) dz, and
f_i = ∂F(X_iβ)/∂(X_iβ) = 1/(2π)^½ exp(-(X_iβ)²/2)

These are exactly the first-order conditions for weighted least squares estimation of the nonlinear regression model Y_i = F(X_iβ) + ε_i, with weights given by [F(X_iβ)(1-F(X_iβ))]^-½.

Furthermore, it can be shown that at the maximum likelihood estimate of β

E[∂²ll(β)/∂β∂β'] = -∑_i=1,2,...,N (f_i² X_i'X_i)/(F_i(1-F_i))

which is negative definite. The estimated variance-covariance matrix of the estimator of β is computed as

Var(β) = (-E[∂²ll(β)/∂β∂β'])^-1

If the normal probability model is misspecified, then Quasi-Maximum Likelihood (QML) estimation is suggested by correcting the asymptotic variance-covariance matrix with a robust ("sandwich") estimator as follows:

Var(β) = (-H)^-1G(-H)^-1

where H = E[∂²ll(β)/∂β∂β'], and G = E[(∂ll(β)/∂β)(∂ll(β)/∂β')].

For model interpretation, the marginal effects of X_i are defined as

∂E(Y_i|X_i)/∂X_i = [∂F(X_iβ)/∂(X_iβ)] β = f(X_iβ)β = f_iβ
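
The marginal-effect formula f_iβ can be sketched in a few lines. The coefficient values and the regressor vector below are hypothetical and chosen only for illustration; the function names are likewise illustrative. For a dummy regressor, a commonly used alternative (an addition to the formula above, not part of it) is the discrete change in F(X_iβ) when the dummy switches from 0 to 1, which is also shown.

```python
import math

def norm_pdf(z):
    """Standard normal density f(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal distribution function F(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_marginal_effects(x, beta):
    """Return f(X_i*beta)*beta for one observation x (constant included)."""
    index = sum(xk * bk for xk, bk in zip(x, beta))
    return [norm_pdf(index) * bk for bk in beta]

def probit_discrete_effect(x, beta, k):
    """Change in Prob(Y=1) when dummy regressor k switches from 0 to 1."""
    x1, x0 = list(x), list(x)
    x1[k], x0[k] = 1.0, 0.0
    idx1 = sum(xk * bk for xk, bk in zip(x1, beta))
    idx0 = sum(xk * bk for xk, bk in zip(x0, beta))
    return norm_cdf(idx1) - norm_cdf(idx0)

# Hypothetical coefficients and regressors: [constant, continuous, dummy]
beta = [-1.0, 0.5, 0.8]
x = [1.0, 1.2, 0.0]

me = probit_marginal_effects(x, beta)      # derivative-based effects
de = probit_discrete_effect(x, beta, 2)    # discrete change for the dummy
print("marginal effects:", [round(m, 4) for m in me])
print("discrete effect of dummy:", round(de, 4))
```

Because f_i depends on X_i, the effects differ across observations; evaluating them at the sample means is one common convention.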

Logit Model

P_i = F(X_iβ) = 1/(1+exp(-X_iβ))

P_i as defined is the logistic curve, and the resulting model is called the Logit Model. The logit model is most easily derived by assuming that the logarithm of the odds is equal to X_iβ, that is, the log-odds model ln(P_i/(1-P_i)) = X_iβ. Solving for P_i, we find that

P_i = exp(X_iβ)/(1+exp(X_iβ)) = 1/(1+exp(-X_iβ))

For maximum likelihood estimation, we solve the following first order condition:

∑_i=1,2,...,N (Y_i-F_i)/(F_i(1-F_i)) f_iX_i = 0

Because of the logistic functional form,

F_i = F(X_iβ) = 1/(1+exp(-X_iβ)) and
f_i = ∂F(X_iβ)/∂(X_iβ) = exp(-X_iβ)/(1+exp(-X_iβ))² = F_i(1-F_i)

the first-order condition reduces to the following simple expression:

∑_i=1,2,...,N (Y_i-F_i)X_i = 0

with the negative definite Hessian:

∂²ll(β)/∂β∂β' = -∑_i=1,2,...,N F_i(1-F_i)X_i'X_i

Therefore, the estimated variance-covariance matrix of the estimator of β is

Var(β) = [-∂²ll(β)/∂β∂β']^-1

For model interpretation, the marginal effects of X_i are defined as

∂E(Y_i|X_i)/∂X_i = [∂F(X_iβ)/∂(X_iβ)] β = f_iβ = F_i(1-F_i)β
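
As a minimal sketch of how the logit first-order condition ∑(Y_i-F_i)X_i = 0 can be solved, the code below applies Newton-Raphson iterations to a model with a constant and a single regressor, using the Hessian -∑F_i(1-F_i)X_i'X_i. The data are hypothetical, the function name fit_logit is illustrative, and the 2x2 inverse is written out by hand to keep the example free of external libraries.

```python
import math

def logistic(z):
    """Logistic distribution function F(z) = 1/(1+exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(x, y, tol=1e-10, max_iter=50):
    """Newton-Raphson for a logit model with X_i = [1, x_i]."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0                 # score: sum (Y_i - F_i) X_i
        h00 = h01 = h11 = 0.0         # negative Hessian: sum F_i(1-F_i) X_i'X_i
        for xi, yi in zip(x, y):
            F = logistic(b0 + b1 * xi)
            w = F * (1.0 - F)
            g0 += yi - F
            g1 += (yi - F) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Inverse of the negative Hessian = estimated covariance matrix
        cov = [[h11 / det, -h01 / det], [-h01 / det, h00 / det]]
        if abs(g0) + abs(g1) < tol:
            break
        b0 += cov[0][0] * g0 + cov[0][1] * g1
        b1 += cov[1][0] * g0 + cov[1][1] * g1
    return (b0, b1), cov

# Hypothetical data, for illustration only
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
y = [0, 0, 1, 0, 1, 0, 1, 1]

(b0, b1), cov = fit_logit(x, y)
xbar = sum(x) / len(x)
F_bar = logistic(b0 + b1 * xbar)
print("estimates:", b0, b1)
print("standard errors:", math.sqrt(cov[0][0]), math.sqrt(cov[1][1]))
print("marginal effect at the mean:", F_bar * (1.0 - F_bar) * b1)
```

The same loop structure applies to the probit model once F_i, f_i, and the expected Hessian are replaced by their normal counterparts.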

Example 1

This example (see also, Greene [1999], Example 19.1) examines the effect of a new teaching method on students' grades. The following variables in the data file GRADE.TXT are used:

GRADE = An indicator of whether the student's grade on an examination improved after exposure to the new teaching method PSI
PSI = An indicator of whether the student was exposed to the new teaching method
TUCE = Score of a pretest that indicates entering knowledge of the material
GPA = Grade point average

The qualitative equation is formulated as follows:

GRADE = β₀ + β₁GPA + β₂TUCE + β₃PSI + ε

Estimate and interpret the logit and probit probability model specifications of the above equation, respectively. Explain the estimated marginal effects of the new teaching method on students' grade performance.

Homework

To consider the potential problem of heteroscedasticity in the above cross section study, we assume a form of multiplicative heteroscedasticity associated with the important variable PSI in the qualitative regression equation:

GRADE = β₀ + β₁GPA + β₂TUCE + β₃PSI + ε

where for each observation i, the heteroscedastic variance is defined by

σ_i² = [exp(αPSI_i)]²

That is, when PSI_i = 0, σ_i² = 1; when PSI_i = 1, σ_i² = [exp(α)]², where α is an unknown parameter to be estimated in addition to the βs.

Formulate and estimate the Probit and Logit models with multiplicative heteroscedasticity defined above.
Interpret the models based on the marginal effects of each explanatory variable (GPA, TUCE, PSI) at the means.
Compare and test the models with and without heteroscedasticity correction (Hint: See Greene [1999], Chapter 19, Example 19.7).

Multinomial Choice Models

When a single decision is made among two or more alternatives, the outcome may be ordered (with a preference rank) or unordered.

Unordered Model

Suppose there are J+1 alternatives: j=0,1,...,J. For each observation i (e.g. an individual), the decision of selecting the alternative j is described by:

Y_i = X_iβ^j + ε_i, where

Y_i = 0 with probability P_i0

1 with probability P_i1

...

J with probability P_iJ

That is, for each i=1,2,...,N, P_ij = Prob(Y_i=j|X_i), j=0,1,...,J, and ∑_j=0,1,...,J P_ij = 1.

Notice that the estimated parameter vector β^j is alternative specific. For notational convenience, let β = [β⁰,β¹,...,β^J]'. We further assume a binary decision outcome for each individual i on selecting alternative j as follows:

d_ij = 1 if Y_i = j

0 if Y_i ≠ j

Then the log-likelihood function for the model is

ll(β) = ∑_i=1,2,...,N ∑_j=0,1,...,J d_ijln(P_ij)

Multinomial Logit Model

P_ik = exp(X_iβ^k) / ∑_j=0,1,...,J exp(X_iβ^j), k = 0,1,...,J.

The probabilities are unchanged when the same vector γ is added to every β^j, so if β = [β⁰, β¹, ..., β^J]' maximizes the log-likelihood function, so does the vector with elements β^j + γ for any constant vector γ. Normalizing the parameter vector with β⁰ = 0, a zero vector, identifies the model and gives a representation that reduces to the binary logit model when J=1:

P_i0 = 1 / (1+∑_j=1,...,J exp(X_iβ^j)),
P_ik = exp(X_iβ^k) / (1+∑_j=1,...,J exp(X_iβ^j)), k = 1,...,J.

Finally, the log of the odds between alternatives j and k is simply

ln(P_ij/P_ik) = X_i(β^j - β^k).
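
The multinomial logit probabilities, the invariance to adding a common vector γ, and the log-odds formula can be checked with a short sketch. The coefficient values and the helper name mnl_probs are hypothetical; subtracting the largest index before exponentiating is a numerical-stability device that does not change the probabilities.

```python
import math

def mnl_probs(x, betas):
    """P_ik = exp(X_i b^k) / sum_j exp(X_i b^j), k = 0, 1, ..., J."""
    v = [sum(xk * bk for xk, bk in zip(x, b)) for b in betas]
    vmax = max(v)                          # for numerical stability only
    e = [math.exp(vj - vmax) for vj in v]
    total = sum(e)
    return [ej / total for ej in e]

# Hypothetical example: constant plus one characteristic, three alternatives,
# with the normalization beta^0 = 0
x = [1.0, 2.0]
betas = [[0.0, 0.0], [0.5, -0.2], [-1.0, 0.4]]
p = mnl_probs(x, betas)

# Adding the same vector gamma to every beta^j leaves the probabilities unchanged
gamma = [3.0, -1.0]
shifted = [[bk + gk for bk, gk in zip(b, gamma)] for b in betas]
p_shifted = mnl_probs(x, shifted)

# Log-odds between alternatives 1 and 2 equal X_i(beta^1 - beta^2)
log_odds = math.log(p[1] / p[2])
index_diff = sum(xk * (b1 - b2) for xk, b1, b2 in zip(x, betas[1], betas[2]))
print(p, p_shifted, log_odds, index_diff)
```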

Conditional Logit Model

When the data consist of choice-specific attributes instead of individual-specific characteristics, the probabilities should be defined by

P_ik = exp(X_ikβ) / ∑_j=1,...,J exp(X_ijβ), k = 1,...,J.

Note that X_ij cannot contain a constant term, since any regressor that does not vary across alternatives cancels out of the probabilities. The interpretation of the model is based on the marginal effects ∂P_ij/∂X_ik.

Example

Greene (2003) applies a choice model to a 210-respondent dataset recording travel mode between Sydney and Melbourne in Australia (Data). The modes of travel are air, train, bus, and car, and the chosen mode is recorded with 0 (not chosen) and 1 (chosen). The choice-specific variables include:

GC = Generalized cost constructed from measures on the in-vehicle cost (INVC) and on time spent on traveling (INVT).
TTIME = Terminal time (e.g., zero waiting time for traveling by car).

In addition, there are two individual-specific variables:

HINC = Household income
PSIZE = Party size in the chosen mode

Let X consist of the two individual-specific variables HINC and PSIZE and a constant term. By allowing choice-specific parameters, formulate and estimate the multinomial logit model.

To compare the choice between air and the other modes, define the interaction variable as follows:

AIRINC = Air-Travel*HINC

where Air-Travel is the dummy variable indicating the air mode. The choice-specific variables GC and TTIME are used, while HINC, PSIZE, and AIRINC are the explanatory variables for the multinomial logit model.

Ordered Model

Consider the latent variable model: Y_i^* = X_iβ + ε_i, where ε_i ~ normal(0,1). Let

Y_i = 0 if Y_i^* < α₁

1 if α₁ ≤ Y_i^* < α₂

...

J if α_J ≤ Y_i^*

The parameters of this model consist of β and α = [α₁,α₂,...,α_J]' with α_J > ... > α₂ > α₁. The α_j's are thresholds that determine the value of Y_i according to the interval into which Y_i^* falls. Therefore,

P_i0 = Prob(Y_i=0|X_i) = Prob(ε_i < α₁-X_iβ)
P_ij = Prob(Y_i=j|X_i) = Prob(α_j-X_iβ ≤ ε_i < α_{j+1}-X_iβ), j = 1, 2, ..., J-1
P_iJ = Prob(Y_i=J|X_i) = Prob(ε_i ≥ α_J-X_iβ)

It is clear that P_i0 = 1 - ∑_j=1,2,...,J P_ij. The estimated parameters (β,α) and the corresponding variance-covariance matrix are obtained by maximizing the log-likelihood function:

ll(β,α) = ∑_i=1,2,...,N ∑_j=0,1,...,J d_ijln(P_ij)

where

d_ij = 1 if Y_i = j

0 if Y_i ≠ j

Ordered Probit Model

P_i0 = ∫_{-∞}^{α₁-X_iβ} 1/(2π)^½ exp(-z²/2) dz
P_i1 = ∫_{α₁-X_iβ}^{α₂-X_iβ} 1/(2π)^½ exp(-z²/2) dz
...
P_ij = ∫_{α_j-X_iβ}^{α_{j+1}-X_iβ} 1/(2π)^½ exp(-z²/2) dz, j = 1, 2, ..., J-1
...
P_iJ = ∫_{α_J-X_iβ}^{∞} 1/(2π)^½ exp(-z²/2) dz

Since the threshold parameters α = [α₁,α₂,...,α_J]' must be estimated with the regression parameters β, the explanatory variables X_i should not include a constant term for an ordered probit model.
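
The ordered probit probabilities are differences of the normal distribution function evaluated at consecutive thresholds. The sketch below computes them for one observation; the threshold values and the index X_iβ are hypothetical, and ordered_probit_probs is an illustrative name.

```python
import math

def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ordered_probit_probs(xb, alphas):
    """Return [P_i0, ..., P_iJ] for index xb = X_i*beta and thresholds alphas."""
    cuts = [-math.inf] + list(alphas) + [math.inf]
    return [norm_cdf(cuts[j + 1] - xb) - norm_cdf(cuts[j] - xb)
            for j in range(len(alphas) + 1)]

# Hypothetical thresholds alpha_1 < alpha_2 < alpha_3 (so J = 3) and index value
alphas = [-0.5, 0.4, 1.3]
xb = 0.25

probs = ordered_probit_probs(xb, alphas)
print([round(pj, 4) for pj in probs], "sum =", sum(probs))
```

The log-likelihood contribution of observation i is then ln(P_ij) for the observed category j, as in the double sum with d_ij.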

Limited Dependent Variable Models

If the random decision variable follows a mixture of discrete and continuous probability distributions, we have the limited dependent variable (truncated or censored regression) model. In a truncated sample, observations whose dependent variable falls outside a given range are missing entirely, so no data are recorded for them. In a censored sample, all observations are retained, but the dependent variable is recorded only at a limit value whenever it falls outside the range. We consider only the case of censored data.

Recall the latent variable interpretation of the probit model,

Y_i^* = X_iβ + ε_i

where ε_i ~ normal (0,σ²), and

Y_i = 1 if Y_i^* > 0

0 if Y_i^* ≤ 0

Suppose, however, that Y_i is censored; that is, we restrict the number (or kinds) of values that Y_i can take. As an example, consider the following:

Y_i = Y_i^* if Y_i^* > 0

0 if Y_i^* ≤ 0

That is, Y_i = max(Y_i^*,0) = max(X_iβ+ε_i,0).

Tobit Model

Define f_i and F_i to be the probability density function and cumulative distribution function of a standard normal random variable evaluated at X_iβ/σ. That is,

F_i = F(X_iβ/σ) = ∫_{-∞}^{X_iβ/σ} 1/(2π)^½ exp(-z²/2) dz
f_i = f(X_iβ/σ) = 1/(2π)^½ exp[-(X_iβ/σ)²/2]

For the observations such that Y_i = 0, that is, Y_i^* = X_iβ + ε_i ≤ 0, the likelihood function is

Prob(Y_i = 0) = Prob(ε_i ≤ -X_iβ) = Prob(ε_i/σ ≤ -X_iβ/σ) = F(-X_iβ/σ) = 1-F(X_iβ/σ) = 1-F_i

If Y_i > 0, on the other hand, then the likelihood function is simply the normal density function:

1/(2πσ²)^½ exp[-(Y_i-X_iβ)²/(2σ²)]

Therefore the likelihood function for the Tobit model is a mixture of the above distributions depending on the values taken by the dependent variable (i.e., zero or positive):

L(β,σ²) = ∏_{{i|Y_i=0}}(1-F(X_iβ/σ)) ∏_{{i|Y_i>0}}1/(2πσ²)^½ exp[-(Y_i-X_iβ)²/(2σ²)]

The corresponding log-likelihood function is

ll(β,σ²) = ∑_{{i|Y_i=0}}ln(1-F(X_iβ/σ)) -1/2 ∑_{{i|Y_i>0}}[ln(2π)+ln(σ²)+(Y_i-X_iβ)²/σ²]

Then, for maximum likelihood estimation, we solve the following first-order conditions:

∂ll/∂β = -(1/σ)∑_{{i|Y_i=0}}f_iX_i/(1-F_i) +(1/σ²)∑_{{i|Y_i>0}}(Y_i-X_iβ)X_i = 0
∂ll/∂σ² = (1/2)(1/σ³)∑_{{i|Y_i=0}}f_iX_iβ/(1-F_i) -(1/2)(1/σ²)∑_{{i|Y_i>0}}[1-(Y_i-X_iβ)²/σ²] = 0
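
A minimal sketch of the Tobit log-likelihood and of its derivative with respect to β follows. The score is obtained by differentiating ll(β,σ²) term by term: censored observations contribute -(1/σ)f_i/(1-F_i)X_i and uncensored observations contribute (Y_i-X_iβ)X_i/σ². The data and parameter values are hypothetical, and the function names are illustrative.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tobit_loglik(beta, sigma, X, y):
    """Log-likelihood of the Tobit model censored at zero."""
    ll = 0.0
    for xi, yi in zip(X, y):
        xb = sum(xk * bk for xk, bk in zip(xi, beta))
        if yi > 0:
            # normal density contribution for uncensored observations
            ll += -0.5 * (math.log(2.0 * math.pi) + math.log(sigma ** 2)
                          + (yi - xb) ** 2 / sigma ** 2)
        else:
            # Prob(Y_i = 0) = 1 - F(X_i*beta/sigma) for censored observations
            ll += math.log(1.0 - norm_cdf(xb / sigma))
    return ll

def tobit_score_beta(beta, sigma, X, y):
    """Derivative of the Tobit log-likelihood with respect to beta."""
    score = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        xb = sum(xk * bk for xk, bk in zip(xi, beta))
        if yi > 0:
            w = (yi - xb) / sigma ** 2
        else:
            z = xb / sigma
            w = -norm_pdf(z) / (1.0 - norm_cdf(z)) / sigma
        for k, xk in enumerate(xi):
            score[k] += w * xk
    return score

# Hypothetical data: constant plus one regressor; zeros are censored
X = [[1.0, 0.5], [1.0, 1.5], [1.0, -0.3], [1.0, 2.0], [1.0, 0.1]]
y = [0.0, 1.2, 0.0, 2.5, 0.4]
beta, sigma = [0.2, 0.8], 1.1

print("log-likelihood:", tobit_loglik(beta, sigma, X, y))
print("score wrt beta:", tobit_score_beta(beta, sigma, X, y))
```

A numerical optimizer would drive this score (and the derivative with respect to σ²) to zero; comparing the analytic score with a finite-difference derivative of tobit_loglik is a useful check on the algebra.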

If the model error does not follow a normal probability distribution, Quasi-Maximum Likelihood (QML) estimation corrects the estimated asymptotic variance-covariance matrix of θ with a robust estimator as follows:

Var(θ) = (-H)^-1G(-H)^-1

where θ = (β,σ²), H = E[∂²ll(θ)/∂θ∂θ'], and G = E[(∂ll(θ)/∂θ)(∂ll(θ)/∂θ')].

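To interpret the estimated coefficients of the model, we may use three conditional expected values:

E(Y_i^*|X_i) = X_iβ
E(Y_i|X_i,Y_i>0) = X_iβ + E(ε_i|Y_i>0) = X_iβ + E(ε_i|ε_i>-X_iβ) = X_iβ + σf_i/F_i > E(Y_i^*|X_i)
E(Y_i|X_i) = F_i E(Y_i|X_i,Y_i>0) = F_i X_iβ + σf_i

The first expected value (for the latent, "uncensored" variable) is easy to obtain. The last expected value will be of particular interest if our sample contains many censored observations. Accordingly, for the j-th explanatory variable, the corresponding marginal effects are:

∂E(Y_i^*|X_i)/∂X_ij = β_j
∂E(Y_i|X_i,Y_i>0)/∂X_ij = β_j[1-(X_iβ/σ)(f_i/F_i)-(f_i/F_i)²]
∂E(Y_i|X_i)/∂X_ij = F_i ∂E(Y_i|X_i,Y_i>0)/∂X_ij + E(Y_i|X_i,Y_i>0) ∂F_i/∂X_ij = F_iβ_j

We note that the last (censored) marginal effect differs from the first (uncensored) one by a scale factor equal to the probability of that observation not being censored. In other words, the scale factor is equal to F_i (recall that F_i is 1-Prob(Y_i=0)).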
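
The three Tobit expected values and their marginal effects can be evaluated directly from f_i and F_i. The sketch below uses E(Y_i^*|X_i) = X_iβ, E(Y_i|X_i,Y_i>0) = X_iβ + σf_i/F_i, and E(Y_i|X_i) = F_iX_iβ + σf_i; the numerical values of X_iβ, σ, and β_j are hypothetical, and the function names are illustrative.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tobit_expectations(xb, sigma):
    """Expected values of the latent, positive-only, and censored variable."""
    z = xb / sigma
    F, f = norm_cdf(z), norm_pdf(z)
    return {"latent": xb,
            "positive": xb + sigma * f / F,
            "censored": F * xb + sigma * f}

def tobit_marginal_effects(xb, sigma, beta_j):
    """Marginal effects of the j-th regressor on the three expected values."""
    z = xb / sigma
    F, f = norm_cdf(z), norm_pdf(z)
    ratio = f / F
    return {"latent": beta_j,
            "positive": beta_j * (1.0 - z * ratio - ratio ** 2),
            "censored": F * beta_j}

# Hypothetical index value, error standard deviation, and coefficient
xb, sigma, beta_j = 0.7, 1.3, 0.5
ev = tobit_expectations(xb, sigma)
me = tobit_marginal_effects(xb, sigma, beta_j)
print(ev)
print(me)
```

The censored marginal effect is always smaller in magnitude than β_j, since it is scaled by the probability F_i of being uncensored.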

The tobit model is often estimated for comparison with the alternative probit or count model specifications. The model can be easily extended to consider more than one censoring point. For instance, we could censor both tails of the distribution. This is an example of a doubly censored regression.

Example 2: Tobit Analysis of Extramarital Affairs

This example is taken from Greene (2002, 22.3.6), which is based on Fair (1978). The study examines the qualitative responses to a question about extramarital affairs from a sample of 601 men and women married for the first time. The dependent variable is:

Y = Number of affairs in the past year: 0, 1, 2, 3, 4-10 (coded as 7), 11 or more (coded as 12).

Because of the preponderance of zeros (no affairs), the tobit model may not be the best choice for this study. Here we present only the model using five explanatory variables:

Z2 = Age.

Z3 = Number of years married.

Z5 = Degree of religiousness: 1 (anti-religious), ..., 5 (very religious).

Z7 = Hollingshead scale of occupation: 1, ..., 7.

Z8 = Self-rating of marriage satisfaction: 1 (very unhappy), ..., 5 (very happy).

The regression equation is:

Y = β₀ + β₂Z₂ + β₃Z₃ + β₅Z₅ + β₇Z₇ + β₈Z₈ + ε

where ε ~ normal(0,σ²I). The estimation and interpretation of the estimated tobit model are left as exercises (Data).

Homework

The Fair Model may be estimated with a probit specification, but the interpretation is entirely different from that of the tobit model. Let the dependent variable Y be:

Y = 0 if no extramarital affair

1 otherwise (e.g., 1,2,3,7,12)

If the specification of the tobit model is correct, then probit estimators should be consistent for β/σ from the tobit model.
Since the coded values of 7 and 12 are not the actual data, these right-tail values are difficult to interpret in the tobit model. An alternative is to treat the dependent variable as censored not only at 0 but also on the right, say at 4 for "4 or more". The doubly censored tobit model is then:

Y = 0 if Y^*≤ 0

Y^* if 0 < Y^* < 4 or {1,2,3}

4 if Y^*≥ 4 or {7,12}

where Y^* = Zβ+ε and ε ~ normal(0,σ²I).
We have shown that the probability of Y = 0 is:
Prob(Y_i = 0) = Prob(Y_i^* ≤ 0) = Prob(ε_i ≤ -Z_iβ) = Prob(ε_i/σ ≤ -Z_iβ/σ) = F(-Z_iβ/σ)
It is easy to show that the probability of Y = 4 is:
Prob(Y_i = 4) = Prob(Y_i^* ≥ 4) = Prob(ε_i ≥ 4-Z_iβ) = Prob(ε_i/σ ≥ (4-Z_iβ)/σ) = 1-F((4-Z_iβ)/σ)
Finally, for Y_i = {1,2,3}, the likelihood is simply the normal density function:
1/(2πσ²)^½ exp[-(Y_i-Z_iβ)²/(2σ²)]
Then the corresponding log-likelihood function is
ll(β,σ²) = ∑_{{i|Y_i=0}}lnF(-Z_iβ/σ) -1/2 ∑_{{i|Y_i=1,2,3}}[ln(2π)+ln(σ²)+(Y_i-Z_iβ)²/σ²] +∑_{{i|Y_i=4}}ln(1-F((4-Z_iβ)/σ))
Estimate and interpret the doubly-censored tobit model.

Count Data and Poisson Regression Model

If a decision variable takes nonnegative integer values, with no prior upper bound and with some zeros in the sample, it is modeled as count data.

Suppose Y = {0,1,2,...} follows a Poisson distribution with a parameter λ>0:

f(Y|λ) = e^{-λ} λ^Y / Y!

It is known that E(Y) = Var(Y) = λ. If Y is to be explained by X such that E(Y|X) > 0, a natural approach is to set λ = E(Y|X) and parameterize it by the regression parameter β in the Poisson distribution function. For example,

λ(X,β) = E(Y|X,β) = e^{Xβ} > 0

Therefore, given a sample of independent observations {(Y_i,X_i), i=1,2,...,N}, the likelihood function is written as:

L(β) = ∏_i=1,2,...,N [e^{-λ(X_i,β)} λ(X_i,β)^Y_i / Y_i!]

The corresponding log-likelihood function is

ll(β) = ∑_i=1,2,...,N Y_i ln(λ(X_i,β)) - ∑_i=1,2,...,N λ(X_i,β) - ∑_i=1,2,...,N ln(Y_i!)

The maximum likelihood estimate of β is obtained from the first-order condition

∂ll/∂β = ∑_i=1,2,...,N [(Y_i-λ(X_i,β))/λ(X_i,β)] [∂λ(X_i,β)/∂β] = 0,

provided that the Hessian

∂²ll/∂β∂β' = ∑_i=1,2,...,N [(Y_i-λ(X_i,β))/λ(X_i,β)] [∂²λ(X_i,β)/∂β∂β']

+ ∑_i=1,2,...,N [-Y_i/λ(X_i,β)²] [∂λ(X_i,β)/∂β] [∂λ(X_i,β)/∂β']

is negative definite.

If λ(X_i,β) = E(Y_i|X_i,β) = e^{X_iβ}, the model is interpreted as:

∂E(Y_i|X_i,β)/∂X_ij = e^{X_iβ} β_j = E(Y_i|X_i,β) β_j, or

β_j = [∂E(Y_i|X_i,β)/∂X_ij] / E(Y_i|X_i,β)
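
With the exponential mean λ(X_i,β) = e^{X_iβ}, the derivative ∂λ/∂β equals λX_i, so the Poisson score simplifies to ∑(Y_i-λ_i)X_i. The sketch below codes the log-likelihood and this score and checks them on an intercept-only model, where the maximum likelihood estimate satisfies e^β = sample mean of Y. The counts are hypothetical and the function names are illustrative.

```python
import math

def poisson_loglik(beta, X, y):
    """ll = sum Y_i*X_i*beta - sum exp(X_i*beta) - sum ln(Y_i!)."""
    ll = 0.0
    for xi, yi in zip(X, y):
        xb = sum(xk * bk for xk, bk in zip(xi, beta))
        ll += yi * xb - math.exp(xb) - math.lgamma(yi + 1)
    return ll

def poisson_score(beta, X, y):
    """Score sum (Y_i - lambda_i) X_i for lambda_i = exp(X_i*beta)."""
    score = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        lam = math.exp(sum(xk * bk for xk, bk in zip(xi, beta)))
        for k, xk in enumerate(xi):
            score[k] += (yi - lam) * xk
    return score

# Hypothetical counts with an intercept-only design matrix
y = [0, 2, 1, 4, 3]
X0 = [[1.0] for _ in y]

# Intercept-only MLE: exp(beta) equals the sample mean of Y
beta_hat = [math.log(sum(y) / len(y))]
print("beta_hat:", beta_hat[0])
print("score at beta_hat:", poisson_score(beta_hat, X0, y)[0])
print("log-likelihood:", poisson_loglik(beta_hat, X0, y))
```

Under this specification each β_j is a semi-elasticity: a one-unit change in X_ij changes E(Y_i|X_i) by approximately 100β_j percent.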

Heterogeneity and Negative Binomial Regression Model

The Poisson regression model imposes Var(Y|X) = E(Y|X), so maximum likelihood estimation suffers when the data are overdispersed, that is, when the conditional variance exceeds the conditional mean. We generalize the Poisson model by introducing an individual unobservable effect v>0 into the conditional mean:

E(Y|X,β,v) = λv = e^{Xβ} v

Then, conditional on v, Y follows a Poisson distribution with the density:

f(Y|λv) = e^{-λv} (λv)^Y / Y!

Suppose v follows a gamma distribution with E(v) = 1 and Var(v) = 1/θ. That is,

g(v|θ) = [θ^θ/Γ(θ)] v^(θ-1) e^{-θv}

Therefore,

f(Y|λ,θ) = ∫₀^∞ [e^{-λv} (λv)^Y / Y!] g(v|θ) dv

= [(θ^θ λ^Y)/(Γ(θ)Y!)] ∫₀^∞ e^{-(λ+θ)v} v^(Y+θ-1) dv

= [(θ^θ λ^Y)/(Γ(θ)Y!)] [Γ(Y+θ)/(λ+θ)^(Y+θ)]

= [Γ(Y+θ)/(Γ(θ)Y!)] [λ/(λ+θ)]^Y [1-λ/(λ+θ)]^θ

This is one form of the negative binomial distribution with mean λ and variance λ(1+λ/θ). By construction, it is a Poisson-Gamma mixture. Typically, the parameter 1/θ is used to measure the extent of overdispersion. Given a sample of independent observations {(Y_i,X_i), i=1,2,...,N}, and letting λ_i = λ(X_i,β) = e^{X_iβ}, the log-likelihood function is written as:

ll(β,θ) = ∑_i=1,2,...,N ln f(Y_i|λ_i,θ)

The negative binomial model can be estimated by maximum likelihood without much difficulty. A test of the Poisson distribution is often carried out by testing the hypothesis 1/θ → 0.
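
The negative binomial density derived above is easy to evaluate on the log scale with the log-gamma function. The sketch below checks numerically that the probabilities sum to one, that the mean is λ and the variance is λ(1+λ/θ), and that the density approaches the Poisson density as 1/θ → 0. The values of λ and θ are hypothetical, and negbin_logpmf is an illustrative name.

```python
import math

def negbin_logpmf(y, lam, theta):
    """ln f(Y|lambda,theta) for the Poisson-Gamma mixture."""
    p = lam / (lam + theta)
    return (math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
            + y * math.log(p) + theta * math.log(1.0 - p))

def poisson_pmf(y, lam):
    """Poisson probability e^{-lambda} lambda^Y / Y!."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

# Hypothetical parameter values
lam, theta = 2.0, 1.5

# Truncating the support at 400 leaves a negligible tail for these values
pmf = [math.exp(negbin_logpmf(k, lam, theta)) for k in range(400)]
mean = sum(k * pk for k, pk in enumerate(pmf))
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

print("sum of probabilities:", sum(pmf))
print("mean:", mean, " variance:", var, " lambda(1+lambda/theta):", lam * (1 + lam / theta))

# With a very large theta (1/theta close to 0) the density is close to Poisson
nb_limit = math.exp(negbin_logpmf(3, lam, 1e7))
print("NB with large theta:", nb_limit, " Poisson:", poisson_pmf(3, lam))
```

Summing negbin_logpmf over the sample with λ_i = e^{X_iβ} gives the log-likelihood ll(β,θ) to be maximized.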

Homework

In the data on extramarital affairs from the previous example, the dependent variable is a count, not a continuous measurement. The Poisson or negative binomial regression model may therefore be a better modeling framework.

As discussed earlier, responses of 7 and 12 do not represent the actual data. We have re-coded 4 for both values of 7 and 12 as "4 or more" and treated it as a right censored observation. The Poisson and negative binomial model may be modified to consider the censored data.

Formulate, estimate, and compare the censored and uncensored versions of Poisson and negative binomial regression models, respectively.

Duration Data

A non-negative random variable T may be used to represent the duration of an initial state (e.g., hospital stay, unemployment). Let t ≥ 0 be a particular value of T; then the probability that the initial state has ended by time t is given by the distribution function

F(t) = Pr(T ≤ t)

Clearly, F(0) = 0, and

1-F(t) = Pr(T>t) = S(t)

is the probability of "surviving" past t, that is, of still being in the initial state at t. Therefore S(t) is called the survivor function.

Given the distribution of duration F, we can define the probability of "exiting the initial state" in the time interval (t,t+Δt] when the event has survived through t (or T>t) as follows:

Pr(t<T≤t+Δt|T>t) = (Pr(T≤t+Δt)-Pr(T≤t)) / Pr(T>t)

= (F(t+Δt)-F(t)) / (1-F(t))

Dividing by Δt and taking the limit, we have

lim_{Δt→0} Pr(t<T≤t+Δt|T>t)/Δt
= f(t)/(1-F(t)), where f(t) = dF(t)/dt is the density function of T at t,
= f(t)/S(t)
= [-dS(t)/dt]/S(t)
= -dln(S(t))/dt, the negative of the growth rate of the survivor function.

Define the hazard function as:

h(t) = -dln(S(t))/dt

That is, ln(S(t)) = -∫₀^t h(τ)dτ, or

S(t) = e^{-∫₀^t h(τ)dτ}
F(t) = 1 - S(t) = 1 - e^{-∫₀^t h(τ)dτ}
f(t) = dF(t)/dt = h(t) e^{-∫₀^t h(τ)dτ}

We need to specify the hazard function h(t) in order to study the duration data. Denote h(t|θ) the hazard function of t with unknown parameter θ.
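
The link between the hazard and the survivor function can be illustrated numerically: given any hazard h(t), the integrated hazard is approximated by the trapezoid rule and S(t) and f(t) follow from the formulas above. This is only a sketch; the two hazard functions and their parameter values are arbitrary choices for illustration, and the function names are hypothetical.

```python
import math

def survivor_from_hazard(h, t, n=1000):
    """S(t) = exp(-integral_0^t h(tau) dtau), integral by the trapezoid rule."""
    step = t / n
    total = 0.5 * (h(0.0) + h(t))
    for k in range(1, n):
        total += h(k * step)
    return math.exp(-total * step)

def density_from_hazard(h, t, n=1000):
    """f(t) = h(t) S(t)."""
    return h(t) * survivor_from_hazard(h, t, n)

# A constant hazard (illustrative value): S(t) should equal exp(-theta*t)
theta = 0.2
s_const = survivor_from_hazard(lambda tau: theta, 3.0)

# A hazard that rises linearly in t (illustrative values):
# integrated hazard over [0, 3] is 0.1*3 + 0.5*0.05*9 = 0.525
s_lin = survivor_from_hazard(lambda tau: 0.1 + 0.05 * tau, 3.0)

print(s_const, math.exp(-theta * 3.0))
print(s_lin, math.exp(-0.525))
```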

Constant Hazard

Assume h(t|θ) = θ, independent of t. Then, ∫₀^t h(τ|θ)dτ = θt. We have,

S(t|θ) = e^{-θt},
F(t|θ) = 1-e^{-θt}, and
f(t|θ) = θe^{-θt}

In this case, F is the exponential distribution. The parameter θ can be estimated by the reciprocal of the sample mean because E(T) = 1/θ for an exponentially distributed random variable T.

Linear Hazard

Assume h(t|θ) = α+βt, where θ={α,β}. Then,

S(t|θ) = e^{-(α+½βt)t}
F(t|θ) = 1-e^{-(α+½βt)t}, and
f(t|θ) = (α+βt)e^{-(α+½βt)t}

The problem is that the estimated hazard h(t|θ) may be negative.

Log-Logistic Hazard

The log duration ln(T) is assumed to have a logistic distribution with mean -ln(λ) and variance π²/(3ρ²). For ρ > 1 the hazard first increases and then decreases (for ρ ≤ 1 it decreases monotonically), where

h(t|θ) = λρ(λt)^(ρ-1)/[1+(λt)^ρ]
S(t|θ) = 1/[1+(λt)^ρ]

Therefore, f(t|θ) = λρ(λt)^(ρ-1)/[1+(λt)^ρ]²

Log-Normal Hazard

The log duration ln(T) is assumed to be normally distributed with mean -ln(λ) and standard deviation 1/ρ. The hazard first increases and then decreases, where

S(t|θ) = ∫_{-∞}^{-ρln(λt)} 1/(2π)^½ exp(-z²/2) dz
f(t|θ) = (ρ/t) 1/(2π)^½ exp(-½[ρln(λt)]²)

Duration Dependence and Weibull Distribution

Let h(t|θ) = λρ(λt)^(ρ-1), where θ = {λ,ρ}, λ > 0 is the location parameter, and ρ > 0 is the scale parameter. This is the hazard function for the Weibull distribution. The duration-dependent hazard function h(t|θ) is monotonically increasing if ρ > 1 and monotonically decreasing if ρ < 1; with ρ = 1 it reduces to the constant hazard. It is easy to show that

S(t|θ) = e^{-(λt)^ρ}
F(t|θ) = 1-e^{-(λt)^ρ}
f(t|θ) = λρ(λt)^(ρ-1) e^{-(λt)^ρ}

Model Interpretation

From the estimated survival distribution S(t|θ), we can interpret the model based on expected and median duration. By definition, the median duration is the duration in time t so that S(t|θ) = 0.5.

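Survival Distribution    Median Duration        Expected Duration

Exponential              (1/λ)ln(2)             (1/λ)Γ(2)
Weibull                  (1/λ)[ln(2)]^(1/ρ)     (1/λ)Γ(1+1/ρ)
Log-Normal               (1/λ)                  (1/λ)[exp(1/ρ²)]^½
Log-Logistic             (1/λ)                  (1/λ)(π/ρ)/sin(π/ρ), ρ > 1

Here Γ(·) denotes the gamma function, and the constant hazard rate of the exponential distribution is written as λ.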
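
The median durations can be verified against the definition S(t|θ) = 0.5 with a few lines of code. The sketch below does this for the Weibull, log-logistic, and log-normal survivor functions and also evaluates the Weibull expected duration with the gamma function. The parameter values are hypothetical, and the function names are illustrative.

```python
import math

def weibull_survivor(t, lam, rho):
    return math.exp(-((lam * t) ** rho))

def weibull_median(lam, rho):
    return (1.0 / lam) * math.log(2.0) ** (1.0 / rho)

def weibull_mean(lam, rho):
    return (1.0 / lam) * math.gamma(1.0 + 1.0 / rho)

def loglogistic_survivor(t, lam, rho):
    return 1.0 / (1.0 + (lam * t) ** rho)

def lognormal_survivor(t, lam, rho):
    # S(t) = standard normal distribution function at -rho*ln(lam*t)
    z = -rho * math.log(lam * t)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical parameter values
lam, rho = 0.05, 1.4

print("Weibull median:", weibull_median(lam, rho),
      " S(median):", weibull_survivor(weibull_median(lam, rho), lam, rho))
print("Weibull mean:", weibull_mean(lam, rho))
print("Log-logistic S(1/lam):", loglogistic_survivor(1.0 / lam, lam, rho))
print("Log-normal S(1/lam):", lognormal_survivor(1.0 / lam, lam, rho))

# The exponential case is the Weibull with rho = 1
print("Exponential median and mean:", weibull_median(lam, 1.0), weibull_mean(lam, 1.0))
```

In each case the survivor function evaluated at the tabulated median equals one half, as the definition requires.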

Likelihood Function

Given sample observations of duration data {t₁,t₂,...,t_N}, based on hazard function h(t|θ) and therefore the probability density f(t|θ) = h(t|θ)S(t|θ), the log-likelihood function for model estimation is

ll(θ) = ∑_i=1,2,...,N ln f(t_i|θ) = ∑_i=1,2,...,N [ln h(t_i|θ) + ln S(t_i|θ)]

The vector of parameters θ of these models can be estimated by maximum likelihood. Censored observations can be incorporated easily as such:

ll(θ) = ∑_{t=uncensored obs.} ln f(t|θ) + ∑_{t=censored obs.} ln S(t|θ)

= ∑_{t=uncensored obs.} ln h(t|θ) + ∑_{t=all obs.} ln S(t|θ)

For example, the log-likelihood function for Weibull distribution is:

ll(λ,ρ) = ∑_i=1,2,...,N [ln(ρ)-ln(t_i)] + ∑_i=1,2,...,N ρ[ln(λ)+ln(t_i)] - ∑_i=1,2,...,N (λt_i)^ρ
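
A minimal sketch of the Weibull log-likelihood with right-censored observations follows, built from ln h(t) for the uncensored observations plus ln S(t) for all observations. The durations and the censoring point of 80 days are hypothetical illustration values (they are not the strike data), and weibull_loglik is an illustrative name.

```python
import math

def weibull_loglik(lam, rho, durations, uncensored):
    """ll = sum over uncensored of ln h(t) + sum over all of ln S(t)."""
    ll = 0.0
    for t, d in zip(durations, uncensored):
        if d:
            # ln h(t) = ln(lam*rho) + (rho-1) ln(lam*t)
            ll += math.log(lam * rho) + (rho - 1.0) * math.log(lam * t)
        # ln S(t) = -(lam*t)^rho
        ll -= (lam * t) ** rho
    return ll

# Hypothetical durations in days; the last two are censored at 80
durations = [3.0, 7.0, 12.0, 20.0, 45.0, 80.0, 80.0]
uncensored = [1, 1, 1, 1, 1, 0, 0]

# With rho = 1 (exponential case) the MLE is (number uncensored)/(total time)
lam_hat = sum(uncensored) / sum(durations)
print("exponential MLE of lambda:", lam_hat)
print("log-likelihood at the MLE:", weibull_loglik(lam_hat, 1.0, durations, uncensored))
print("log-likelihood at rho=1.3:", weibull_loglik(lam_hat, 1.3, durations, uncensored))
```

When every observation is uncensored, this function reproduces the Weibull log-likelihood ll(λ,ρ) written above; maximizing over both λ and ρ requires a numerical optimizer.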

The introduction of explanatory variables to the duration models is fairly straightforward, although the interpretation of the estimated parameters in the model is difficult. Consider, for example, the Weibull model. Let λ_i = e^{-X_iβ}, where X_i is assumed to be constant from T=0 to T=t. The corresponding log-likelihood function is:

ll(β,ρ) = ∑_i=1,2,...,N [ln(ρ)-ln(t_i)] + ∑_i=1,2,...,N ρ(ln(t_i)-X_iβ) - ∑_i=1,2,...,N (e^{-X_iβ} t_i)^ρ

Heterogeneity

The problem of heterogeneity in duration models can be viewed essentially as the result of an incomplete specification. There are a number of ways of extending duration models to account for heterogeneity. One direct approach is to model heterogeneity in the parametric model.

To incorporate heterogeneity into the Weibull model, suppose the survival function is conditioned on the individual effect v as:

S(t|v) = e^{-v(λt)^ρ}

and v follows a gamma distribution with E(v)=1 and Var(v)=1/γ:

g(v) = [γ^γ/Γ(γ)] v^(γ-1) e^{-γv}

Then, S(t) = E_v[S(t|v)] = ∫₀^∞ S(t|v)g(v)dv = [1+(1/γ)(λt)^ρ]^(-γ)

In the limit 1/γ → 0, so that Var(v) → 0 and there is no heterogeneity, this is the Weibull survival model. The corresponding hazard function is

h(t) = λρ(λt)^(ρ-1)[S(t)]^(1/γ) = λρ(λt)^(ρ-1)/[1+(1/γ)(λt)^ρ]

Therefore,

f(t) = λρ(λt)^(ρ-1)[S(t)]^(1+1/γ) = λρ(λt)^(ρ-1)/[1+(1/γ)(λt)^ρ]^(1+γ)
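
The effect of gamma heterogeneity on the Weibull model can be checked numerically: the mixed survivor function [1+(1/γ)(λt)^ρ]^(-γ) approaches the Weibull survivor function as 1/γ → 0, and its hazard equals the negative derivative of ln S(t). The parameter values below are hypothetical, and the function names are illustrative.

```python
import math

def weibull_survivor(t, lam, rho):
    return math.exp(-((lam * t) ** rho))

def hetero_survivor(t, lam, rho, gam):
    """S(t) = [1 + (1/gamma)(lam*t)^rho]^(-gamma)."""
    return (1.0 + (lam * t) ** rho / gam) ** (-gam)

def hetero_hazard(t, lam, rho, gam):
    """h(t) = lam*rho*(lam*t)^(rho-1) / [1 + (1/gamma)(lam*t)^rho]."""
    return lam * rho * (lam * t) ** (rho - 1.0) / (1.0 + (lam * t) ** rho / gam)

# Hypothetical parameter values
t, lam, rho = 10.0, 0.05, 1.4

for gam in (0.5, 2.0, 10.0, 1e8):
    print(f"gamma={gam:g}  S(t)={hetero_survivor(t, lam, rho, gam):.6f}  "
          f"h(t)={hetero_hazard(t, lam, rho, gam):.6f}")
print("Weibull S(t):", weibull_survivor(t, lam, rho))
```

For any finite γ the mixed survivor function lies above the Weibull one and the hazard lies below the Weibull hazard, which is the usual sign of unobserved heterogeneity.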

Example 3

Using a set of strike duration data for U.S. manufacturing from 1968 to 1976, Kennan [1985] studied 62 cases of strike duration T in days. In addition, a covariate X is used to explain the duration of a strike. X is a measure of "unanticipated" aggregate industrial production net of seasonal and trend components (See Data).

Estimate and compare four hazard functions: (1) Exponential (2) Weibull (3) Log-Normal (4) Log-Logistic. Interpret the four estimated models with median and expected duration.
Estimate and test the potential problem of heterogeneity in the context of the Weibull model.
Based on the Weibull distribution, formulate and estimate the model with the "unanticipated" industrial production variable included as a covariate.
As suggested by Kiefer [1988], formulate and estimate the above model with the same duration data but censored at 80 days.

E(Y_i^*|X_i) = X_iβ
E(Y_i|X_i,Y_i>0) = X_iβ + E(ε_i|Y_i>0) = X_iβ + E(ε_i|ε_i>-X_iβ) = X_iβ + σf_i/F_i > E(Y_i^*|X_i)
E(Y_i|X_i) = F_i E(Y_i|X_i,Y_i>0) = F_i X_iβ + σf_i

∂E(Y_i^*|X_i)/∂X_ij = β_j
∂E(Y_i|X_i,Y_i>0)/∂X_ij = β_j[1-(X_iβ/σ)(f_i/F_i)-(f_i/F_i)²]
∂E(Y_i|X_i)/∂X_ij = F_i ∂E(Y_i|X_i,Y_i>0)/∂X_ij + E(Y_i|X_i,Y_i>0) ∂F_i/∂X_ij = F_iβ_j
