Qualitative Choice Models

Binary Choice Models

Consider a linear regression model Y = Xβ + ε, where the dependent variable is binary:

Y_i = 1 with probability P_i
Y_i = 0 with probability 1-P_i

The regressors X_i explain the probability that Y_i equals 1 or 0. Let

Pi = Prob(Yi=1|Xi) = F(Xiβ)
1-Pi = Prob(Yi=0|Xi) = 1-F(Xiβ)

Since E(Yi|Xi) = (1)F(Xiβ) + (0)(1-F(Xiβ)) = F(Xiβ), the estimated model may be interpreted with the marginal effects defined by

∂E(Yi|Xi)/∂Xi = [∂F(Xiβ)/∂(Xiβ)] β

Given a sample of N independent observations, the likelihood function is

L(β) = ∏_{i=1}^{N} P_i^{Y_i} (1-P_i)^{1-Y_i} = ∏_{i=1}^{N} F(X_iβ)^{Y_i} [1-F(X_iβ)]^{1-Y_i}

Then the log-likelihood function is

ll(β) = ln(L(β)) = ∑i=1,2,...,N (Yi lnF(Xiβ) + (1-Yi) ln(1-F(Xiβ)))

To maximize ll(β) with respect to β, we solve the first-order condition:

∂ll(β)/∂β = ∑_{i=1}^{N} [Y_i/F_i - (1-Y_i)/(1-F_i)] f_i X_i
= ∑_{i=1}^{N} [(Y_i-F_i)/(F_i(1-F_i))] f_i X_i = 0

where Fi = F(Xiβ) and fi = f(Xiβ) = ∂Fi/∂(Xiβ). Note that fiXi = ∂Fi/∂β.

Finally, the Hessian ∂²ll(β)/∂β∂β' must be negative definite, and the estimated variance-covariance matrix of β is Var(β) = [-E(∂²ll(β)/∂β∂β')]^{-1}.

Linear Probability Model

Pi = F(Xiβ) = Xiβ

It follows immediately that E(Y_i|X_i) = X_iβ. In particular,

E(ε_i) = (1-X_iβ)P_i + (-X_iβ)(1-P_i) = P_i - X_iβ = 0
Var(ε_i) = E(ε_i²) = P_i(1-X_iβ)² + (1-P_i)(-X_iβ)²
= P_i(1-P_i)² + (1-P_i)(-P_i)² = (1-P_i)P_i = (1-X_iβ)(X_iβ)

The range of Var(εi) is between 0 and 0.25 and it is clearly heteroscedastic. Furthermore, since E(Yi|Xi) = F(Xiβ) = Xiβ, a linear function, there is no guarantee that the estimated probability will lie within the unit interval.
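
As a rough illustration of these two drawbacks, the following Python sketch evaluates the fitted value and the error variance at a few points. The function names and the coefficient values are assumptions made for illustration only; they do not come from any estimated model.

```python
def lpm_probability(x, beta0, beta1):
    """Fitted value X_i*beta of the linear probability model; not bounded to [0, 1]."""
    return beta0 + beta1 * x


def lpm_error_variance(p):
    """Var(eps_i) = P_i * (1 - P_i), which varies with X_i through P_i."""
    return p * (1.0 - p)


# Hypothetical coefficients, chosen only to illustrate the point.
BETA0, BETA1 = 0.1, 0.3

for x in (0.0, 1.0, 2.0, 4.0):
    p = lpm_probability(x, BETA0, BETA1)
    if 0.0 <= p <= 1.0:
        print(f"x={x}: P={p:.2f}, Var(eps)={lpm_error_variance(p):.4f}")
    else:
        print(f"x={x}: fitted value {p:.2f} lies outside the unit interval")
```

The variance peaks at 0.25 when P_i = 0.5, and with these assumed coefficients the fitted value at x = 4 exceeds one.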

Probit Model

P_i = F(X_iβ) = ∫_{-∞}^{X_iβ} (2π)^{-1/2} exp(-z²/2) dz

P_i, the cumulative normal distribution evaluated at X_iβ, is called the probit for the i-th observation. The model Y_i = F^{-1}(P_i) + ε_i is called the Probit Model, where F^{-1}(P_i) = X_iβ is the inverse of the cumulative distribution F(X_iβ). The probit model can be derived from a model involving an unobserved, or latent, variable Y_i* such that Y_i* = X_iβ + ε_i where ε_i ~ normal(0,1). Suppose the value of the observed binary variable Y_i depends on the sign of Y_i*:

Y_i = 1 if Y_i* > 0
Y_i = 0 if Y_i* ≤ 0

Therefore,

P_i = Prob(Y_i=1|X_i) = Prob(Y_i*>0|X_i) = Prob(ε_i > -X_iβ)
= ∫_{-X_iβ}^{∞} (2π)^{-1/2} exp(-z²/2) dz
= ∫_{-∞}^{X_iβ} (2π)^{-1/2} exp(-z²/2) dz

For maximum likelihood estimation, we solve the following first order condition:

∑_{i=1}^{N} [(Y_i-F_i)/(F_i(1-F_i))] f_i X_i = 0

where F_i = F(X_iβ) = ∫_{-∞}^{X_iβ} (2π)^{-1/2} exp(-z²/2) dz, and
f_i = ∂F(X_iβ)/∂(X_iβ) = (2π)^{-1/2} exp(-(X_iβ)²/2)

These are exactly the first-order conditions for weighted least squares estimation of the nonlinear regression model Y_i = F(X_iβ) + ε_i, with each squared residual weighted by the inverse of its variance, [F(X_iβ)(1-F(X_iβ))]^{-1}.

Furthermore, it can be shown that for the maximum likelihood estimates β

E[∂²ll(β)/∂β∂β'] = -∑_{i=1}^{N} f_i² X_i'X_i / [F_i(1-F_i)]

which is negative definite. The estimated variance-covariance matrix of β is computed as

Var(β) = (-E[∂²ll(β)/∂β∂β'])^{-1}

If the normal probability model is misspecified, then Quasi-Maximum Likelihood (QML) estimation is suggested by correcting the asymptotic variance-covariance matrix with a robust ("sandwich") estimator as follows:

Var(β) = (-H)^{-1} G (-H)^{-1}

where H = E[∂²ll(β)/∂β∂β'] and G = E[(∂ll(β)/∂β)(∂ll(β)/∂β')].

For model interpretation, the marginal effects of X_i are defined as

∂E(Yi|Xi)/∂Xi = [∂F(Xiβ)/∂(Xiβ)] β = f(Xiβ)β = fiβ
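
The probit quantities above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a full estimator: the function names and the numerical values of X_i and β are hypothetical.

```python
import math


def norm_cdf(z):
    """Standard normal CDF F(z), written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    """Standard normal density f(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def probit_loglik(y, x, beta):
    """ll(beta) = sum_i [Y_i ln F_i + (1 - Y_i) ln(1 - F_i)], F_i = F(X_i beta)."""
    ll = 0.0
    for yi, xi in zip(y, x):
        Fi = norm_cdf(sum(a * b for a, b in zip(xi, beta)))
        ll += yi * math.log(Fi) + (1 - yi) * math.log(1.0 - Fi)
    return ll


def probit_marginal_effects(xi, beta):
    """f(X_i beta) * beta evaluated at X_i (only the slope entries are meaningful)."""
    fi = norm_pdf(sum(a * b for a, b in zip(xi, beta)))
    return [fi * b for b in beta]


# Hypothetical point: X_i = [constant, one regressor], beta = [-1.0, 0.5].
effects = probit_marginal_effects([1.0, 2.0], [-1.0, 0.5])
print(effects)
```

Because the marginal effect scales β by the density f_i, it varies across observations and is largest where X_iβ is near zero.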

Logit Model

Pi = F(Xiβ) = 1/(1+exp(-Xiβ))

P_i as defined is the logistic curve. The model Y_i = F^{-1}(P_i) + ε_i is called the Logit Model. The logit model is most easily derived by assuming that the logarithm of the odds equals X_iβ (the log-odds model): ln(P_i/(1-P_i)) = X_iβ. Solving for P_i, we find that

Pi = exp(Xiβ)/(1+exp(Xiβ)) = 1/(1+exp(-Xiβ))

For maximum likelihood estimation, we solve the following first order condition:

∑_{i=1}^{N} [(Y_i-F_i)/(F_i(1-F_i))] f_i X_i = 0

Because of the logistic functional form,

F_i = F(X_iβ) = 1/(1+exp(-X_iβ)) and
f_i = ∂F(X_iβ)/∂(X_iβ) = exp(-X_iβ)/(1+exp(-X_iβ))² = F_i(1-F_i)

this amounts to solving the following simple expression:

∑_{i=1}^{N} (Y_i-F_i)X_i = 0

with the negative definite Hessian:

∂²ll(β)/∂β∂β' = -∑_{i=1}^{N} F_i(1-F_i)X_i'X_i

Therefore, the estimated variance-covariance matrix of β is

Var(β) = [-∂²ll(β)/∂β∂β']^{-1}

For model interpretation, the marginal effects of X_i are defined as

∂E(Y_i|X_i)/∂X_i = [∂F(X_iβ)/∂(X_iβ)] β = f_iβ = F_i(1-F_i)β
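
The simplified first-order condition and Hessian make Newton-Raphson iteration straightforward. The sketch below fits a logit model with a constant and a single regressor in plain Python. The tiny dataset, the function names, and the fixed iteration count are assumptions for illustration, not the data or method of any example in these notes.

```python
import math


def logistic(z):
    """F(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit(x, y, iterations=25):
    """Newton-Raphson for a logit model with an intercept and one regressor."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # score: sum (Y_i - F_i) X_i
        a00 = a01 = a11 = 0.0    # minus Hessian: sum F_i (1 - F_i) X_i'X_i
        for xi, yi in zip(x, y):
            F = logistic(b0 + b1 * xi)
            w = F * (1.0 - F)
            g0 += yi - F
            g1 += (yi - F) * xi
            a00 += w
            a01 += w * xi
            a11 += w * xi * xi
        det = a00 * a11 - a01 * a01
        # Newton step: beta <- beta + (-H)^{-1} * score (2x2 inverse written out)
        b0 += (a11 * g0 - a01 * g1) / det
        b1 += (a00 * g1 - a01 * g0) / det
    return b0, b1


def logit_marginal_effect(x, b0, b1):
    """F_i (1 - F_i) * beta_1 evaluated at x."""
    F = logistic(b0 + b1 * x)
    return F * (1.0 - F) * b1


# Hypothetical data (not perfectly separated, so the MLE exists).
x_data = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
y_data = [0, 0, 1, 0, 1, 0, 1, 1]

b0_hat, b1_hat = fit_logit(x_data, y_data)
print(b0_hat, b1_hat, logit_marginal_effect(2.0, b0_hat, b1_hat))
```

The inverse of the final minus-Hessian gives the estimated variance-covariance matrix; it is omitted here to keep the sketch short.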

Example 1

This example (see also, Greene [1999], Example 19.1) examines the effect of a new teaching method on students' grades. The following variables in the data file GRADE.TXT are used:

The qualitative equation is formulated as follows:

GRADE = β0 + β1GPA + β2TUCE + β3PSI + ε

Estimate and interpret the logit and probit probability model specifications of the above equation, respectively. Explain the estimated marginal effects of the new teaching method on students' grade performance.

Homework

To consider the potential problem of heteroscedasticity in the above cross-section study, we assume a form of multiplicative heteroscedasticity associated with the important variable PSI in the qualitative regression equation:

GRADE = β0 + β1GPA + β2TUCE + β3PSI + ε

where for each observation i, the heteroscedastic variance is defined by

σ_i² = [exp(α PSI_i)]²

That is, when PSI_i = 0, σ_i² = 1; when PSI_i = 1, σ_i² = [exp(α)]², where α is an unknown parameter to be estimated in addition to the βs.


Multinomial Choice Models

When a single decision is made among two or more alternatives, the outcome may be ordered (with a preference rank) or unordered.

Unordered Model

Suppose there are J+1 alternatives: j=0,1,...,J. For each observation i (e.g. an individual), the decision of selecting the alternative j is described by:

Yi = Xiβj + εi, where

Yi = 0 with probability Pi0
1 with probability Pi1
...
J with probability PiJ

That is, for each i=1,2,...,N, Pij = Prob(Yi=j|Xi), j=0,1,...,J, and ∑j=0,1,...,JPij = 1.

Notice that the estimated parameter vector β_j is alternative specific. For notational convenience, let β = [β_0, β_1, ..., β_J]'. We further assume a binary decision outcome for each individual i on selecting alternative j as follows:

dij = 1 if Yi = j
0 if Yi ≠ j

Then the log-likelihood function for the model is

ll(β) = ∑_{i=1}^{N} ∑_{j=0}^{J} d_ij ln(P_ij)

Multinomial Logit Model

P_ik = exp(X_iβ_k) / ∑_{j=0}^{J} exp(X_iβ_j), k = 0,1,...,J.

It is clear that if β = [β_0, β_1, ..., β_J]' maximizes the log-likelihood function, so does the vector obtained by adding the same constant vector γ to every β_j. Normalizing the parameter vector with β_0 = 0, a zero vector, gives a representation that is consistent with the binary logit model when J=1:

P_i0 = 1 / (1+∑_{j=1}^{J} exp(X_iβ_j)),
...
P_ik = exp(X_iβ_k) / (1+∑_{j=1}^{J} exp(X_iβ_j)), k = 1,...,J.

Finally, the log of the odds between alternatives j and k is simply

ln(P_ij/P_ik) = X_i(β_j - β_k).
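
A short Python sketch of the normalized multinomial logit probabilities follows. The function name and the coefficient values are hypothetical; subtracting the maximum index before exponentiating is a numerical safeguard and not part of the model.

```python
import math


def mnl_probabilities(x, betas):
    """Return [P_i0, ..., P_iJ] with the normalization beta_0 = 0.

    x     : list of individual-specific regressors X_i
    betas : list of coefficient vectors for alternatives 1..J
    """
    index = [0.0] + [sum(a * b for a, b in zip(x, bj)) for bj in betas]
    top = max(index)                       # guard against overflow in exp
    expu = [math.exp(u - top) for u in index]
    total = sum(expu)
    return [e / total for e in expu]


# Hypothetical regressors and coefficients for J = 2 (three alternatives).
x_i = [1.0, 2.0]
beta_1 = [0.5, -0.2]
beta_2 = [-1.0, 0.3]
probs = mnl_probabilities(x_i, [beta_1, beta_2])
print(probs, math.log(probs[1] / probs[2]))
```

With J = 1 the same function reduces to the binary logit probability 1/(1+exp(-X_iβ_1)).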

Conditional Logit Model

When the data consist of choice-specific attributes instead of individual-specific characteristics, the probabilities should be defined by

P_ik = exp(X_ikβ) / ∑_{j=1}^{J} exp(X_ijβ), k = 1,...,J.

Note that X_ij cannot contain a constant term, because any term common to all alternatives cancels from the probabilities. The interpretation of the model is based on the marginal effects ∂P_ij/∂X_ik.

Example

Greene (2003) applies a choice model to a 210-respondent dataset recording travel mode between Sydney and Melbourne in Australia (Data). The modes of travel are air, train, bus, and car, and the chosen mode is recorded with 0 (not chosen) and 1 (chosen). The choice-specific variables include:

GC = Generalized cost constructed from measures on the in-vehicle cost (INVC) and on time spent on traveling (INVT).
TTIME = Terminal time (e.g., zero waiting time for traveling by car).

In addition, there are two individual-specific variables:

HINC = Household income
PSIZE = Party size in the chosen mode

Let X consist of the two individual-specific variables HINC and PSIZE and a constant term. By allowing choice-specific parameters, formulate and estimate the multinomial logit model.

To compare the choice between air and the other modes, define the interaction variable as follows:

AIRINC = Air-Travel*HINC

The choice-specific variables GC and TTIME are then used together with HINC, PSIZE, and AIRINC as the explanatory variables for the multinomial logit model.

Ordered Model

Consider the latent variable model: Yi* = Xiβ + εi, where εi ~ normal(0,1). Let

Yi = 0 if Yi* < α1
1 if α1 ≤ Yi* < α2
...
J if αJ ≤ Yi*

The parameters of this model consist of β and α = [α_1, α_2, ..., α_J]' with α_J > ... > α_2 > α_1. The α_j's are thresholds that determine the value of Y_i according to the interval into which Y_i* falls. Therefore,

Pi0 = Prob(Yi=0|Xi) = Prob(Yi* < α1|Xi) = Prob(Xiβ + εi < α1) = Prob(εi < α1 - Xiβ)
Pi1 = Prob(Yi=1|Xi) = Prob(α1 ≤ Yi* < α2|Xi) = Prob(εi < α2 - Xiβ) - Prob(εi ≤ α1 - Xiβ)
...
PiJ = Prob(Yi=J|Xi) = Prob(αJ ≤ Yi*|Xi) = Prob(εi ≥ αJ - Xiβ)

It is clear that P_i0 = 1 - ∑_{j=1}^{J} P_ij. The estimated parameters (β,α) and the corresponding variance-covariance matrix are obtained by maximizing the log-likelihood function:

ll(β,α) = ∑_{i=1}^{N} ∑_{j=0}^{J} d_ij ln(P_ij)

where

dij = 1 if Yi = j
0 if Yi ≠ j

Ordered Probit Model

P_i0 = ∫_{-∞}^{α_1-X_iβ} (2π)^{-1/2} exp(-z²/2) dz
P_i1 = ∫_{α_1-X_iβ}^{α_2-X_iβ} (2π)^{-1/2} exp(-z²/2) dz
...
P_ij = ∫_{α_j-X_iβ}^{α_{j+1}-X_iβ} (2π)^{-1/2} exp(-z²/2) dz, j = 1, 2, ..., J-1
...
P_iJ = ∫_{α_J-X_iβ}^{∞} (2π)^{-1/2} exp(-z²/2) dz

Since the threshold parameters α = [α_1, α_2, ..., α_J]' must be estimated with the regression parameters β, the explanatory variables X_i should not include a constant term for an ordered probit model.
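
The ordered probit probabilities are differences of normal CDF values at adjacent thresholds. The following Python sketch makes that explicit; the function names, the index value, and the thresholds are hypothetical.

```python
import math


def norm_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ordered_probit_probabilities(xb, alphas):
    """Return [P_i0, ..., P_iJ] given the index X_i*beta and thresholds alpha_1 < ... < alpha_J."""
    if any(a >= b for a, b in zip(alphas, alphas[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # CDF evaluated at each threshold, padded with 0 and 1 for the outer categories.
    bounds = [0.0] + [norm_cdf(a - xb) for a in alphas] + [1.0]
    return [bounds[j + 1] - bounds[j] for j in range(len(bounds) - 1)]


# Hypothetical index and thresholds (J = 3, so four ordered outcomes).
probs = ordered_probit_probabilities(0.4, [-0.5, 0.3, 1.2])
print(probs)
```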


Limited Dependent Variable Models

If the random decision variable follows a mixture of discrete and continuous probability distributions, we have the limited dependent variable (truncated or censored regression) model. If the sample is truncated, observations outside the admissible range are missing altogether, for both the dependent and the explanatory variables. If the sample is censored, the explanatory variables are observed for every observation, but the dependent variable is recorded only at a limit value when it falls outside the range. We consider only the case of censored data.

Recall the latent variable interpretation of the probit model,

Yi* = Xiβ + εi

where ε_i ~ normal(0,σ²), and

Yi = 1 if Yi* > 0
0 if Yi* ≤ 0

Suppose, however, that Y_i is censored; that is, we restrict the number (or kinds) of values that Y_i can take. As an example, consider the following:

Yi = Yi* if Yi* > 0
0 if Yi* ≤ 0

That is, Yi = max(Yi*,0) = max(Xiβ+εi,0).

Tobit Model

Define f_i and F_i to be the probability density function and cumulative distribution function of a standardized normal random variable evaluated at X_iβ/σ. That is,

F_i = F(X_iβ/σ) = ∫_{-∞}^{X_iβ/σ} (2π)^{-1/2} exp(-z²/2) dz
f_i = f(X_iβ/σ) = (2π)^{-1/2} exp[-(X_iβ/σ)²/2]

For the observations such that Yi = 0 or Yi* = Xiβ + εi ≤ 0, the likelihood function is

Prob(Yi = 0) = Prob(εi ≤ -Xiβ) = Prob(εi/σ ≤ -Xiβ/σ) = F(-Xiβ/σ) = 1-F(Xiβ/σ) = 1-Fi

If Yi > 0, on the other hand, then the likelihood function is simply the normal density function:

(2πσ²)^{-1/2} exp[-(Y_i-X_iβ)²/(2σ²)]

Therefore the likelihood function for the Tobit model is a mixture of the above distributions depending on the values taken by the dependent variable (i.e., zero or positive):

L(β,σ²) = ∏_{i|Y_i=0} [1-F(X_iβ/σ)] ∏_{i|Y_i>0} (2πσ²)^{-1/2} exp[-(Y_i-X_iβ)²/(2σ²)]

The corresponding log-likelihood function is

ll(β,σ²) = ∑_{i|Y_i=0} ln[1-F(X_iβ/σ)] - (1/2) ∑_{i|Y_i>0} [ln(2π) + ln(σ²) + (Y_i-X_iβ)²/σ²]
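
The two-part structure of the Tobit log-likelihood can be written down directly. The Python sketch below is illustrative only: the function names and argument layout are assumptions, and the censored term uses F(-X_iβ/σ), which equals 1-F(X_iβ/σ).

```python
import math


def norm_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_loglik(y, x, beta, sigma):
    """Log-likelihood of the Tobit model with censoring at zero.

    y : observed values (0 for censored observations)
    x : list of regressor rows X_i
    """
    ll = 0.0
    for yi, xi in zip(y, x):
        xb = sum(a * b for a, b in zip(xi, beta))
        if yi <= 0.0:
            # Censored observation: ln Prob(Y_i = 0) = ln[1 - F(X_i beta / sigma)]
            ll += math.log(norm_cdf(-xb / sigma))
        else:
            # Uncensored observation: log of the normal density of Y_i
            ll += -0.5 * (math.log(2.0 * math.pi) + math.log(sigma ** 2)
                          + (yi - xb) ** 2 / sigma ** 2)
    return ll


# Hypothetical data with one censored and two uncensored observations.
y_obs = [0.0, 1.5, 0.7]
x_obs = [[1.0, -0.5], [1.0, 1.0], [1.0, 0.2]]
print(tobit_loglik(y_obs, x_obs, [0.3, 0.8], 1.0))
```

Passing this function (with a sign change) to a numerical optimizer is one way to obtain the maximum likelihood estimates.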

Then, for maximum likelihood estimation, we solve the following first-order conditions:

∂ll/∂β = -(1/σ) ∑_{i|Y_i=0} f_iX_i/(1-F_i) + (1/σ²) ∑_{i|Y_i>0} (Y_i-X_iβ)X_i = 0
∂ll/∂σ² = (1/2)(1/σ³) ∑_{i|Y_i=0} f_iX_iβ/(1-F_i) - (1/2)(1/σ²) ∑_{i|Y_i>0} [1-(Y_i-X_iβ)²/σ²] = 0

If the model error does not follow a normal probability distribution, Quasi-Maximum Likelihood (QML) estimation corrects the estimated asymptotic variance-covariance matrix of the parameters with a robust estimator as follows:

Var(θ) = (-H)^{-1} G (-H)^{-1}

where θ = (β,σ²), H = E[∂²ll(θ)/∂θ∂θ'], and G = E[(∂ll(θ)/∂θ)(∂ll(θ)/∂θ')].

To interpret the estimated coefficients of the model, we may use three conditional expected values:

E(Y_i*|X_i) = X_iβ
E(Y_i|X_i,Y_i>0) = X_iβ + E(ε_i|Y_i>0)
= X_iβ + E(ε_i|ε_i > -X_iβ)
= X_iβ + σf_i/F_i > E(Y_i*|X_i)
E(Y_i|X_i) = F_i E(Y_i|X_i,Y_i>0)
= F_i X_iβ + σf_i

The first expected value (corresponding to the "uncensored" case) is easy to obtain. The last expected value will be of particular interest if our sample contains many censored observations. Accordingly, for the j-th explanatory variable, the corresponding marginal effects are:

∂E(Y_i*|X_i)/∂X_ij = β_j
∂E(Y_i|X_i,Y_i>0)/∂X_ij = β_j[1 - (X_iβ/σ)(f_i/F_i) - (f_i/F_i)²]
∂E(Y_i|X_i)/∂X_ij = F_i ∂E(Y_i|X_i,Y_i>0)/∂X_ij + E(Y_i|X_i,Y_i>0) ∂F_i/∂X_ij
= F_iβ_j

We note that the last censored marginal effect differs from the first uncensored one by a scale factor equal to the probability of that observation not being censored. In other words, the scale factor is equal to Fi (recall that Fi is 1-Prob(Yi=0)).
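
The three expected values and their marginal effects can be checked numerically. The Python sketch below is a minimal illustration with hypothetical function names and parameter values; it takes the index X_iβ, σ, and one coefficient β_j as given rather than estimating them.

```python
import math


def norm_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def tobit_expectations(xb, sigma):
    """Return (E[Y*|X], E[Y|X, Y>0], E[Y|X]) for censoring at zero."""
    z = xb / sigma
    F, f = norm_cdf(z), norm_pdf(z)
    conditional = xb + sigma * f / F
    unconditional = F * xb + sigma * f
    return xb, conditional, unconditional


def tobit_marginal_effects(xb, sigma, beta_j):
    """Marginal effects of X_ij on the three expected values, in the same order."""
    z = xb / sigma
    F, f = norm_cdf(z), norm_pdf(z)
    ratio = f / F
    return beta_j, beta_j * (1.0 - z * ratio - ratio ** 2), F * beta_j


# Hypothetical index, scale, and coefficient.
print(tobit_expectations(0.5, 1.2))
print(tobit_marginal_effects(0.5, 1.2, 0.8))
```

The last marginal effect is simply β_j scaled by F_i, the probability that the observation is not censored.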

The tobit model is often estimated for comparison with the alternative probit or count model specifications. The model can be easily extended to consider more than one censoring point. For instance, we could censor both tails of the distribution. This is an example of a doubly censored regression.

Example 2: Tobit Analysis of Extramarital Affairs

This example is taken from Greene (2002, 22.3.6), which is based on Fair (1978). The study examines the qualitative responses to a question about extramarital affairs from a sample of 601 men and women married for the first time. The dependent variable is:

Y = Number of affairs in the past year: 0, 1, 2, 3, 4-10 (coded as 7), 11 or more (coded as 12).

Given the preponderance of zeros (no affairs), the tobit model may not be the best choice for this study. Here we present only the model using five explanatory variables as follows:

Z2 = Age.
Z3 = Number of years married.
Z5 = Degree of religiousness: 1 (anti-religious), ..., 5 (very religious).
Z7 = Hollingshead scale of occupation: 1, ..., 7.
Z8 = Self-rating of marriage satisfaction: 1 (very unhappy), ..., 5 (very happy).

The regression equation is:

Y = β0 + β2Z2 + β3Z3 + β5Z5 + β7Z7 + β8Z8 + ε

where ε ~ normal(0,σ²I). The estimation and interpretation of the estimated tobit model are left as exercises (Data).

Homework

  1. The Fair Model may be estimated with a probit specification, but the interpretation is entirely different from that of the tobit model. Let the dependent variable Y be:

    Y = 0 if no extramarital affair
    1 otherwise (e.g., 1,2,3,7,12)

    If the specification of the tobit model is correct, then probit estimators should be consistent for β/σ from the tobit model.

  2. Since the coded values of 7 and 12 are not actual counts, these values of the dependent variable are difficult to interpret in the tobit model. An alternative is to treat the dependent variable as censored not only at 0 but also on the right, say at 4 for "4 or more". The doubly censored tobit model is then:

    Y = 0 if Y* ≤ 0
    Y = Y* if 0 < Y* < 4 or {1,2,3}
    Y = 4 if Y* ≥ 4 or {7,12}

    where Y* = Zβ+ε and ε ~ normal(0,σ²I).

    We have shown that the probability of Y = 0 is:
    Prob(Yi = 0) = Prob(Yi* ≤ 0) = Prob(εi ≤ -Ziβ) = Prob(εi/σ ≤ -Ziβ/σ)

    It is easy to show that the probability of Y = 4 is:
    Prob(Yi = 4) = Prob(Yi*≥4) = Prob(εi ≥ 4-Ziβ) = Prob(εi/σ ≥ (4-Ziβ)/σ)

    Finally, for Y_i = {1,2,3}, the likelihood is simply the normal density function:
    (2πσ²)^{-1/2} exp[-(Y_i-Z_iβ)²/(2σ²)]

    Then the corresponding log-likelihood function is

    ll(β,σ²) = ∑_{i|Y_i=0} ln F(-Z_iβ/σ) - (1/2) ∑_{i|Y_i=1,2,3} [ln(2π) + ln(σ²) + (Y_i-Z_iβ)²/σ²] + ∑_{i|Y_i=4} ln[1-F((4-Z_iβ)/σ)]

    Estimate and interpret the doubly-censored tobit model.


Count Data and Poisson Regression Model

If a decision variable takes nonnegative integer values, with no prior upper bound and with some zeros, we have a model of count data.

Suppose Y = {0,1,2,...} follows a Poisson distribution with a parameter λ>0:

f(Y|λ) = e^{-λ} λ^Y / Y!

It is known that E(Y) = Var(Y) = λ. If Y is to be explained by X such that E(Y|X) > 0, a natural approach is to set λ = E(Y|X), parameterized by the regression parameter β in the Poisson distribution function. For example,

λ(X,β) = E(Y|X,β) = e^{Xβ} > 0

Therefore, given a sample of independent observations {(Yi,Xi), i=1,2,...,N}, the likelihood function is written as:

L(β) = ∏_{i=1}^{N} [e^{-λ(X_i,β)} λ(X_i,β)^{Y_i} / Y_i!]

The corresponding log-likelihood function is

ll(β) = ∑i=1,2,...,NYiln(λ(Xi,β)) - ∑i=1,2,...,Nλ(Xi,β) - ∑i=1,2,...,Nln(Yi!)

The maximum likelihood estimate of β is obtained from the first-order condition

∂ll/∂β = ∑_{i=1}^{N} [(Y_i-λ(X_i,β))/λ(X_i,β)] [∂λ(X_i,β)/∂β] = 0,

provided that the Hessian

∂²ll/∂β∂β' = ∑_{i=1}^{N} [(Y_i-λ(X_i,β))/λ(X_i,β)] [∂²λ(X_i,β)/∂β∂β']
+ ∑_{i=1}^{N} [-Y_i/λ(X_i,β)²] [∂λ(X_i,β)/∂β] [∂λ(X_i,β)/∂β']

is negative definite.

If λ(X_i,β) = E(Y_i|X_i,β) = e^{X_iβ}, the model is interpreted as:

∂E(Y_i|X_i,β)/∂X_ij = e^{X_iβ}β_j = E(Y_i|X_i,β)β_j, or

β_j = [∂E(Y_i|X_i,β)/∂X_ij] / E(Y_i|X_i,β)
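
With the exponential mean function, ∂λ/∂β = λX, so the score reduces to the sum of (Y_i-λ_i)X_i. The Python sketch below evaluates the log-likelihood and this score; the function names and the small dataset are hypothetical.

```python
import math


def poisson_loglik(y, x, beta):
    """ll(beta) = sum_i [Y_i * X_i beta - exp(X_i beta) - ln(Y_i!)]."""
    ll = 0.0
    for yi, xi in zip(y, x):
        xb = sum(a * b for a, b in zip(xi, beta))
        ll += yi * xb - math.exp(xb) - math.lgamma(yi + 1)
    return ll


def poisson_score(y, x, beta):
    """Gradient of ll: sum_i (Y_i - lambda_i) X_i, with lambda_i = exp(X_i beta)."""
    score = [0.0] * len(beta)
    for yi, xi in zip(y, x):
        lam = math.exp(sum(a * b for a, b in zip(xi, beta)))
        for k, xik in enumerate(xi):
            score[k] += (yi - lam) * xik
    return score


# Hypothetical counts with a constant-only model: the MLE is ln(sample mean).
y_counts = [0, 1, 2, 5]
x_const = [[1.0] for _ in y_counts]
beta_hat = [math.log(sum(y_counts) / len(y_counts))]
print(poisson_score(y_counts, x_const, beta_hat))
```

In the constant-only case the score vanishes at β = ln(sample mean), which gives a quick sanity check on the formulas.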

Heterogeneity and Negative Binomial Regression Model

The Poisson regression model is restrictive because the Poisson distribution imposes Var(Y|X) = E(Y|X), whereas count data are often overdispersed, with Var(Y|X) > E(Y|X). We generalize the Poisson model by introducing an individual unobservable effect v>0 into the conditional mean:

E(Y|X,β,v) = λv = e^{Xβ}v

Then, conditional on v, Y follows a Poisson distribution with the density:

f(Y|λv) = e^{-λv}(λv)^Y / Y!

Suppose v follows a gamma distribution with E(v) = 1 and Var(v) = 1/θ. That is,

g(v|θ) = [θ^θ/Γ(θ)] v^{θ-1} e^{-θv}

Therefore,

f(Y|λ,θ) = ∫_0^∞ [e^{-λv}(λv)^Y/Y!] g(v|θ) dv
= [θ^θ λ^Y/(Γ(θ)Y!)] ∫_0^∞ e^{-(λ+θ)v} v^{Y+θ-1} dv
= [θ^θ λ^Y/(Γ(θ)Y!)] [Γ(Y+θ)/(λ+θ)^{Y+θ}]
= [Γ(Y+θ)/(Γ(θ)Y!)] [λ/(λ+θ)]^Y [1-λ/(λ+θ)]^θ

This is one form of the negative binomial distribution with mean λ and variance λ(1+λ/θ). By construction, it is a Poisson-Gamma mixture. Typically, the parameter 1/θ is used to measure the extent of overdispersion. Given a sample of independent observations {(Y_i,X_i), i=1,2,...,N}, and letting λ_i = λ(X_i,β) = e^{X_iβ}, the log-likelihood function is written as:

ll(β,θ) = ∑_{i=1}^{N} ln f(Y_i|λ_i,θ)

The negative binomial model can be estimated by maximum likelihood without much difficulty. A test of the Poisson distribution is often carried out by testing the hypothesis 1/θ → 0.
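
The negative binomial density is easiest to evaluate on the log scale with the log-gamma function. The following Python sketch checks the stated mean and variance numerically; the function name and the parameter values are hypothetical.

```python
import math


def negbin_logpmf(y, lam, theta):
    """ln f(Y | lambda, theta) for the Poisson-gamma mixture with Var(v) = 1/theta."""
    p = lam / (lam + theta)
    return (math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
            + y * math.log(p) + theta * math.log(1.0 - p))


# Hypothetical parameter values; the sum is truncated where the tail is negligible.
lam, theta = 2.0, 1.5
pmf = [math.exp(negbin_logpmf(y, lam, theta)) for y in range(400)]
mean = sum(y * p for y, p in enumerate(pmf))
variance = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))
print(sum(pmf), mean, variance, lam * (1.0 + lam / theta))
```

Summing `negbin_logpmf` over observations with λ_i = exp(X_iβ) gives the log-likelihood ll(β,θ); as θ grows large the density approaches the Poisson density.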

Homework

In the data on extramarital affairs from the previous example, the dependent variable is a count, not a continuous measurement. The Poisson or negative binomial regression model may be a better modeling framework.

As discussed earlier, responses of 7 and 12 do not represent the actual data. We have re-coded 4 for both values of 7 and 12 as "4 or more" and treated it as a right censored observation. The Poisson and negative binomial model may be modified to consider the censored data.

Formulate, estimate, and compare the censored and uncensored versions of Poisson and negative binomial regression models, respectively.


Duration Data

A non-negative random variable T may be used to represent the duration of an initial state (e.g., hospital stay, unemployment). Let t ≥ 0 be a particular value of T; then the probability that the initial state has ended by time t is given by the distribution function

F(t) = Pr(T ≤ t)

Clearly, F(0) = 0, and

1-F(t) = Pr(T>t) = S(t)

is the probability of "surviving" past t. Therefore S(t) is called the survivor function.

Given the distribution of duration F, we can define the probability of "exiting the initial state" in the time interval (t,t+Δt] when the event has survived through t (or T>t) as follows:

Pr(t<T≤t+Δt|T>t) = (Pr(T≤t+Δt)-Pr(T≤t)) / Pr(T>t)
= (F(t+Δt)-F(t)) / (1-F(t))

Dividing by Δt and taking the limit, we have

lim_{Δt→0} Pr(t<T≤t+Δt|T>t)/Δt
= f(t)/(1-F(t)), where f(t) = dF(t)/dt is the density function of T at t,
= f(t)/S(t)
= [-dS(t)/dt]/S(t)
= -d ln(S(t))/dt, the negative of the rate of change of log survival.

Define the hazard function as:

h(t) = -dln(S(t))/dt

That is, ln(S(t)) = -∫_0^t h(τ)dτ, or

S(t) = exp[-∫_0^t h(τ)dτ]
F(t) = 1 - S(t) = 1 - exp[-∫_0^t h(τ)dτ]
f(t) = dF(t)/dt = h(t) exp[-∫_0^t h(τ)dτ]

We need to specify the hazard function h(t) in order to study the duration data. Denote by h(t|θ) the hazard function of t with unknown parameter θ.

Constant Hazard

Assume h(t|θ) = θ, independent of t. Then ∫_0^t h(τ|θ)dτ = θt. We have

S(t|θ) = e^{-θt},
F(t|θ) = 1-e^{-θt}, and
f(t|θ) = θe^{-θt}

In this case, F is the exponential distribution. θ is the parameter, which can be estimated by the reciprocal of the sample mean because E(T) = 1/θ for an exponentially distributed random variable T.
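
As a small illustration of the constant-hazard case, the Python sketch below estimates θ as the reciprocal of the sample mean and evaluates the survivor function. The function names and the durations are hypothetical, and the estimator shown assumes no censored observations.

```python
import math


def exponential_rate_mle(durations):
    """Estimate theta by the reciprocal of the sample mean (uncensored data only)."""
    return len(durations) / sum(durations)


def exponential_survival(t, theta):
    """S(t | theta) = exp(-theta * t)."""
    return math.exp(-theta * t)


# Hypothetical completed durations (e.g., in days).
durations = [2.0, 4.0, 6.0]
theta_hat = exponential_rate_mle(durations)
median = math.log(2.0) / theta_hat   # solves S(t) = 0.5
print(theta_hat, median, exponential_survival(median, theta_hat))
```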

Linear Hazard

Assume h(t|θ) = α+βt, where θ={α,β}. Then,

S(t|θ) = exp[-(α+½βt)t]
F(t|θ) = 1-exp[-(α+½βt)t], and
f(t|θ) = (α+βt) exp[-(α+½βt)t]

The problem is that the estimated hazard h(t|θ) may be negative.

Log-Logistic Hazard

The log duration ln(T) is assumed to have a logistic distribution with mean -ln(λ) and variance π²/(3ρ²). For ρ > 1 the hazard first increases and then decreases (for ρ ≤ 1 it decreases monotonically), where

h(t|θ) = λρ(λt)^{ρ-1}/[1+(λt)^ρ]
S(t|θ) = 1/[1+(λt)^ρ]

Therefore, f(t|θ) = λρ(λt)^{ρ-1}/[1+(λt)^ρ]²

Log-Normal Hazard

The log duration ln(T) is assumed to be normally distributed with mean -ln(λ) and standard deviation 1/ρ. The hazard first increases and then decreases, where

S(t|θ) = ∫_{-∞}^{-ρ ln(λt)} (2π)^{-1/2} exp(-z²/2) dz
f(t|θ) = (ρ/t) (2π)^{-1/2} exp(-½[ρ ln(λt)]²)

Duration Dependence and Weibull Distribution

Let h(t|θ) = λρ(λt)^{ρ-1}, where θ = {λ,ρ}; λ > 0 is the location parameter and ρ > 0 is the scale parameter. This is the hazard function for the Weibull distribution. The duration-dependent hazard function h(t|θ) is monotonically increasing if ρ > 1 and monotonically decreasing if ρ < 1; it is constant when ρ = 1. It is easy to show that

S(t|θ) = exp[-(λt)^ρ]
F(t|θ) = 1-exp[-(λt)^ρ]
f(t|θ) = λρ(λt)^{ρ-1} exp[-(λt)^ρ]
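
The Weibull hazard, survivor, and density functions are simple to code, which makes the direction of duration dependence easy to inspect. In this Python sketch the function names and parameter values are hypothetical.

```python
import math


def weibull_hazard(t, lam, rho):
    """h(t) = lam * rho * (lam * t)^(rho - 1)."""
    return lam * rho * (lam * t) ** (rho - 1.0)


def weibull_survival(t, lam, rho):
    """S(t) = exp(-(lam * t)^rho)."""
    return math.exp(-((lam * t) ** rho))


def weibull_density(t, lam, rho):
    """f(t) = h(t) * S(t)."""
    return weibull_hazard(t, lam, rho) * weibull_survival(t, lam, rho)


# Hypothetical lambda; compare the hazard at t = 1 and t = 2 for three values of rho.
for rho in (0.5, 1.0, 1.5):
    print(rho, weibull_hazard(1.0, 0.5, rho), weibull_hazard(2.0, 0.5, rho))
```

The printed hazards fall with t when ρ = 0.5, stay constant at λ when ρ = 1, and rise with t when ρ = 1.5.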

Model Interpretation

From the estimated survival distribution S(t|θ), we can interpret the model based on expected and median duration. By definition, the median duration is the duration t such that S(t|θ) = 0.5.

Survival Distribution   Median Duration        Expected Duration
Exponential             (1/λ) ln(2)            (1/λ) Γ(2) = 1/λ
Weibull                 (1/λ) [ln(2)]^{1/ρ}    (1/λ) Γ(1+1/ρ)
Log-Normal              1/λ                    (1/λ) [exp(1/ρ²)]^{1/2}
Log-Logistic            1/λ                    (1/λ) (π/ρ)/sin(π/ρ), for ρ > 1

Likelihood Function

Given sample observations of duration data {t1,t2,...,tN}, based on hazard function h(t|θ) and therefore the probability density f(t|θ) = h(t|θ)S(t|θ), the log-likelihood function for model estimation is

ll(θ) = ∑i=1,2,...,N ln f(ti|θ) = ∑i=1,2,...,N [ln h(ti|θ) + ln S(ti|θ)]

The vector of parameters θ of these models can be estimated by maximum likelihood. Censored observations can be incorporated easily as follows:

ll(θ) = ∑_{uncensored obs.} ln f(t|θ) + ∑_{censored obs.} ln S(t|θ)
= ∑_{uncensored obs.} ln h(t|θ) + ∑_{all obs.} ln S(t|θ)

For example, the log-likelihood function for Weibull distribution is:

ll(λ,ρ) = ∑_{i=1}^{N} [ln(ρ)-ln(t_i)] + ∑_{i=1}^{N} ρ[ln(λ)+ln(t_i)] - ∑_{i=1}^{N} (λt_i)^ρ
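
A Python sketch of the Weibull log-likelihood that also handles right-censored observations is given below. The function name, the censoring flags, and the durations are assumptions for illustration; every observation contributes ln S(t), and only uncensored ones add ln h(t).

```python
import math


def weibull_loglik(durations, censored, lam, rho):
    """Weibull log-likelihood with right censoring.

    durations : observed durations t_i
    censored  : flags, True if observation i is right-censored at t_i
    """
    ll = 0.0
    for t, is_censored in zip(durations, censored):
        ll += -((lam * t) ** rho)                                    # ln S(t_i)
        if not is_censored:
            ll += math.log(lam * rho) + (rho - 1.0) * math.log(lam * t)  # ln h(t_i)
    return ll


# Hypothetical durations; the last spell is still in progress (censored).
t_obs = [1.0, 2.5, 4.0, 6.0]
flags = [False, False, False, True]
print(weibull_loglik(t_obs, flags, 0.3, 1.4))
```

With no censored observations this reproduces the closed-form Weibull log-likelihood stated above.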

The introduction of explanatory variables to the duration models is fairly straightforward, although the interpretation of the estimated parameters in the model is difficult. Consider, for example, the Weibull model. Let λ_i = e^{-X_iβ}, where X_i is assumed to be the same from T=0 to T=t. The corresponding log-likelihood function is:

ll(β,ρ) = ∑_{i=1}^{N} [ln(ρ)-ln(t_i)] + ∑_{i=1}^{N} ρ[ln(t_i)-X_iβ] - ∑_{i=1}^{N} (e^{-X_iβ}t_i)^ρ

Heterogeneity

The problem of heterogeneity in duration models can be viewed essentially as the result of an incomplete specification. There are a number of ways of extending duration models to account for heterogeneity. One direct approach is to model heterogeneity in the parametric model.

To incorporate heterogeneity into the Weibull model, suppose the survival function is conditioned on the individual effect v as:

S(t|v) = exp[-v(λt)^ρ]

and v follows a gamma distribution with E(v)=1 and Var(v)=1/γ:

g(v) = [γ^γ/Γ(γ)] v^{γ-1} e^{-γv}

Then, S(t) = E_v[S(t|v)] = ∫_0^∞ S(t|v)g(v)dv = [1+(1/γ)(λt)^ρ]^{-γ}

The limiting case, with 1/γ → 0 so that Var(v) → 0 (no heterogeneity), is the Weibull survival model. The corresponding hazard function is

h(t) = λρ(λt)^{ρ-1}[S(t)]^{1/γ} = λρ(λt)^{ρ-1}/[1+(1/γ)(λt)^ρ]

Therefore,

f(t) = λρ(λt)^{ρ-1}[S(t)]^{1+1/γ} = λρ(λt)^{ρ-1}/[1+(1/γ)(λt)^ρ]^{1+γ}
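
The Weibull model with gamma heterogeneity can be checked numerically against its Weibull limit. In the Python sketch below the function names and parameter values are hypothetical; the survivor function is taken to be [1+(1/γ)(λt)^ρ] raised to the power -γ, which is the form consistent with the hazard and density given above.

```python
import math


def frailty_survival(t, lam, rho, gamma):
    """S(t) = [1 + (lam*t)^rho / gamma]^(-gamma)."""
    return (1.0 + (lam * t) ** rho / gamma) ** (-gamma)


def frailty_hazard(t, lam, rho, gamma):
    """h(t) = lam*rho*(lam*t)^(rho-1) / [1 + (lam*t)^rho / gamma]."""
    return lam * rho * (lam * t) ** (rho - 1.0) / (1.0 + (lam * t) ** rho / gamma)


def frailty_density(t, lam, rho, gamma):
    """f(t) = h(t) * S(t)."""
    return frailty_hazard(t, lam, rho, gamma) * frailty_survival(t, lam, rho, gamma)


# Hypothetical parameters: compare moderate heterogeneity with the near-Weibull limit.
t, lam, rho = 2.0, 0.5, 1.5
print(frailty_survival(t, lam, rho, 2.0))
print(frailty_survival(t, lam, rho, 1e6), math.exp(-((lam * t) ** rho)))
```

A large γ (small Var(v)) brings the survivor function close to the Weibull value exp[-(λt)^ρ].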

Example 3

Using a set of strike duration data for U.S. manufacturing from 1968 to 1976, Kennan [1985] studied 62 cases of strike duration T in days. In addition, a covariate X is used to explain the duration of a strike. X is a measure of "unanticipated" aggregate industrial production net of seasonal and trend components (See Data).


Copyright © Kuan-Pin Lin
Last updated: 10/10/2012