Yi = | 1 with probability Pi |
0 with probability 1-Pi |
It is clear that Xi explains the probability of Yi to be 1 or 0. Let
Pi = Prob(Yi=1|Xi) = F(Xib)
1-Pi = Prob(Yi=0|Xi) = 1-F(Xib)
Since E(Yi|Xi) = (1)F(Xib) + (0)(1-F(Xib)) = F(Xib), the estimated model may be interpreted with the marginal effects defined by
¶E(Yi|Xi)/¶Xi = [¶F(Xib)/¶(Xib)] b
Given a sample of N independent observations, the likelihood function is
L(b) = Õi=1,2,...,N PiYi (1-Pi)1-Yi = Õi=1,2,...,N F(Xib)Yi (1-F(Xib))1-Yi
Then the log-likelihood function is
ll(b) = ln(L(b)) = åi=1,2,...,N (Yi lnF(Xib) + (1-Yi) ln(1-F(Xib)))
To maximize ll(b) with respect to b, we solve from the first order condition:
¶ll(b)/¶b | = åi=1,2,...,N (Yi/Fi-(1-Yi)/(1-Fi)) fiXi |
= åi=1,2,...,N (Yi-Fi)/(Fi(1-Fi)) fiXi = 0 |
where Fi = F(Xib) and fi = f(Xib) = ¶Fi/¶(Xib). Note that fiXi = ¶Fi/¶b.
Finally, the Hessian ¶ll2(b)/¶b¶b' must be negative definite, and the estimated variance-covariance matrix of b is Var(b) = [-E(¶ll2(b)/¶b¶b')]-1.
It is immediately that E(Yi|Xi) = Xib. In particular,
E(ei) | = (1-Xib)Pi + (-Xib)(1-Pi) = Pi - Xib |
Var(ei) | = E(ei2) = Pi(1-Xib)2 + (1-Pi)(-Xib)2 |
= Pi(1-Pi)2 + (1-Pi)(-Pi)2 = (1-Pi)Pi = (1-Xib)(Xib) |
The range of Var(ei) is between 0 and 0.25 and it is clearly heteroscedastic. Furthermore, since E(Yi|Xi) = F(Xib) = Xib, a linear function, there is no guarantee that the estimated probability will lie within the unit interval.
Pi, the cumulative normal distribution, is called Probit for the i-th observation. The model Yi = F-1(Pi) + ei is called the Probit Model, where F-1(Pi) = Xib is the inverse of cumulative distribution F(Xib). The probit model can be derived from a model involving an unobserved, or latent, variable Yi* such that Yi* = Xib + ei where ei ~ normal(0,1). Suppose the value of the observed binary variable Yi depends on the sign of Yi*:
Yi = | 1 if Yi* > 0 |
0 if Yi* £ 0 |
Therefore,
Pi = Prob(Yi=1|Xi) = Prob(Yi*>0|Xi)
= Prob(ei>-Xib)
= ò¥-Xib
1/(2p)½ exp(-z2/2) dz
= ò-¥Xib
1/(2p)½ exp(-z2/2) dz
For maximum likelihood estimation, we solve the following first order condition:
åi=1,2,...,N (Yi-Fi)/(Fi(1-Fi)) fiXi = 0
where Fi = F(Xib) =
ò-¥Xib
1/(2p)½ exp(-z2/2) dz, and
fi =
¶F(Xib)/¶(Xib) =
1/(2p)½
exp(-(Xib)2/2)
This is exactly the first order conditions for weighted least squares estimation of the nonlinear regression model: Yi = F(Xib) + ei with weights given by [F(Xib)(1-F(Xib))]-½.
Furthermore, it can be shown that for the maximum likelihood estimates b
E([¶2ll(b)/¶b¶b']) = -åi=1,2,...,N(fi2XiXi')/(Fi(1-Fi))
which is negative definite. The estimated variance-covariance matrix of b is computed as
Var(b) = (-E[¶2ll(b)/¶b¶b'])-1
If the normal probability model is misspecified, then Quasi-Maximum Likelihood (QML) estimation is suggested by correcting the asymptotic variance-covariance matrix with a robust ("sandwich") estimator as follows:
Var(b) = (-H)-1G(-H)-1
where H = E[¶2ll(b)/¶b¶b'], and G = E[¶ll(b)/¶b][¶ll(b)/¶b'].
For model interpretation, the marginal effects of Xi is defined as
¶E(Yi|Xi)/¶Xi = [¶F(Xib)/¶(Xib)] b = f(Xib)b = fib
Pi as defined is the logistic curve. The model Yi = F-1(Pi) + ei is called the Logit Model. The logit model is most easily derived by assuming the logarithm of the odds is equal to Xib, or the odd ratio model: ln(Pi/(1-Pi)) = Xib Solving for Pi, we find that
Pi = exp(Xib)/(1+exp(Xib)) = 1/(1+exp(-Xib))
For maximum likelihood estimation, we solve the following first order condition:
åi=1,2,...,N (Yi-Fi)/(Fi(1-Fi)) fiXi = 0
Because of the logistic functional form,
Fi = F(Xib) =
1/(1+exp(-Xib)) and
fi = ¶F(Xib)/¶(Xib) =
exp(-Xib)/(1+exp(-Xib)) =
Fi(1-Fi)
it amounts to solve the following simple expression:
åi=1,2,...,N (Yi-Fi)Xi = 0
with the negative definite Hessian:
¶2ll(b)/¶b¶b' = - åi=1,2,...,NFi(1-Fi)Xi'Xi
Therefore, the estimate of variance-covariance matrix of b is
Var(b) = [-¶2ll(b)/¶b¶b']-1
For model interpretation, the marginal effects of Xi is defined as
¶E(Yi|Xi)/¶Xi = [¶F(Xib)/¶(Xib)] b = fib = Fi(1-Fi)b
The qualitative equation is formulated as follows:
GRADE = b0 + b1GPA + b2TUCE + b3PSI + e
Estimate and interpret the logit (Program) and probit (Program) probablity model specifications of the above equation, respectively. Explain the estimated marginal effects of new teaching method on students' grade performance.
GRADE = b0 + b1GPA + b2TUCE + b3PSI + e
where for each observation i, the heteroscedastic variance is defined by
si2 = exp(aPSIi)2
That is, when PSIi = 0, si2 = 1; when PSIi equals 1, si2 = exp(a)2, where a is the unknown parameter in addition to bs for model estimation.
Yi = Xibj + ei, where
Yi = | 0 with probability Pi0 |
1 with probability Pi1 | |
... | |
J with probability PiJ |
That is, for each i=1,2,...,N, Pij = Prob(Yi=j|Xi), j=0,1,...,J, and åj=0,1,...,JPij = 1.
Notice that the estimated parameter vector bj is alternative specific. For notational convenience, let b = [b0,b1,...,bJ]'. We further assume a binary decision outcome for each individual i on selecting alternative j as follows:
dij = | 1 | if Yi = j |
0 | if Yi ¹ j |
Then the log-likelihood function for the model is
ll(b) = åi=1,2,...,N åj=0,1,...,J dijln(Pij)
Multinomial Logit Model
Pik = exp(Xibk) / åj=0,1,...Jexp(Xibj), k = 0,1,...,J.
It is clear that if b = [b0, b1, ..., bJ]' maximizes the log-likelihood function, so will be the vector b + g for any constant g. Normalize the parameters vector b with b0 = 0, a zero vector, gives a consistent representation of a binary logit model when J=1 as follows:
Pi0 = 1 / (1+åj=1,...Jexp(Xibj)),
...
Pik = exp(Xibk) /
(1+åj=1,...Jexp(Xibj)), k = 1,...,J.
Finally, the log of the odds between alternatives j and k is simply
ln(Pij/Pik) = Xi(bj - bk).
Conditional Logit Model
When the data consist of choice-specific attributes instead of individual-specific characteristics, the probabilities should be defined by
Pik = exp(Xikb) / åj=1,...Jexp(Xijb), k = 1,...,J.
Note that Xij can not contain the constant. The interpretation of the model is based on the marginal effects as ¶Pij/¶Xik.
GC = Generalized cost constructed from measures on the in-vehicle cost (INVC) and on
time spent on traveling (INVT).
TTIME = Terminal time (e.g., zero waiting time for traveling by car).
In addition, there are two individual-specific variables:
HINC = Household income
PSIZE = Party size in the chosen model
Let X consists of the two individual specific variables HINC, PSIZE and a constant term. By allowing choice-specific parameters, formulate and estimate the multinomial logistic model.
By comparing the choice between air and the other modes, define the dummy variable as follow:
AIRINC = Air-Travel*HINC
With the choice-specific variables GC and TTIME are used, while HINC, PSIZE, and AIRINC are the explanatory variables for the multinominal logit model. (Program)
Yi = | 0 | if Yi* < a1 |
1 | if a1 £ Yi* < a2 | |
... | ||
J | if aJ £ Yi* |
The parameters of this model consist of b and a = [a1,a2,...,aJ]' with aJ > ...> a2 > a1. The ai's are thresholds which determine the value of Yi depending on the specific interval of Yi* will map into. Therefore,
Pi0 = Prob(Yi=0|Xi) = Prob(Yi* < a1|Xi) =
Prob(Xib + ei < a1) =
Prob(ei < a1 - Xib)
Pi1 = Prob(Yi=1|Xi) = Prob(a1 £ Yi* < a2|Xi) =
Prob(ei < a2 - Xib) -
Prob(ei £ a1 - Xib)
...
PiJ = Prob(Yi=J|Xi) = Prob(aJ £ Yi*|Xi) =
Prob(ei ³ aJ - Xib)
It is clear that Pi0 = 1 - åj=1,2,...,JPij. The estimated parameters (b,a) and the corresponding variance-covariance matrix are obtainted from maximizing the log-likelihood function:
ll(b,a) = åi=1,2,...,N åj=0,1,...,J dijln(Pij)
where
dij = | 1 | if Yi = j |
0 | if Yi ¹ j |
Ordered Probit Model
Pi0 = ò-¥a1-Xib
1/(2p)½ exp(-z2/2) dz
Pi1 = òa1-Xiba2-Xib
1/(2p)½ exp(-z2/2) dz
...
Pij = òaj-Xibaj+1-Xib
1/(2p)½ exp(-z2/2) dz, j = 1, 2, ..., J-1
...
PiJ = òaJ-Xib¥
1/(2p)½ exp(-z2/2) dz
Since the threshold parameters a = [a1,a2,...,aJ]' must be estimated with the regression parameters b, the explanatory variables Xi should not include a constant term for an ordered probit model.
Recall the latent variable interpretation of the probit model,
Yi* = Xib + ei
where ei ~ normal (0,s2), and
Yi = | 1 if Yi* > 0 |
0 if Yi* £ 0 |
Suppose, however, that Yi is censored-that is, we restrict the number (or kinds) of values that Yi can take. As an example, consider the following:
Yi = | Yi* | if Yi* > 0 |
0 | if Yi* £ 0 |
That is, Yi = max(Yi*,0) = max(Xib+ei,0).
Fi = F(Xib/s) =
ò-¥Xib/s
1/(2p)½ exp(-z2/2) dz
fi = f(Xib/s) =
1/(2p)½ exp[-(Xib/s)2/2]
For the observations such that Yi = 0 or Yi* = Xib + ei £ 0, the likelihood function is
Prob(Yi = 0) = Prob(ei £ -Xib) = Prob(ei/s £ -Xib/s) = F(-Xib/s) = 1-F(Xib/s) = 1-Fi
If Yi > 0, on the other hand, then the likelihood function is simply the normal density function:
1/(2ps2)½ exp[-(Yi-Xib)2/(2s2)]
Therefore the likelihood function for the Tobit model is a mixture of the above distributions depending on the values taken by the dependent variable (i.e., zero or positive):
L(b,s2) = Õ{i|Yi=0}(1-F(Xib/s)) Õ{i|Yi>0}1/(2ps2)½ exp[-(Yi-Xib)2/(2s2)]
The corresponding log-likelihood function is
ll(b,s2) = å{i|Yi=0}ln(1-F(Xib/s)) -1/2 å{i|Yi>0}[ln(2p)+ln(s2)+(Yi-Xib)2/s2]
Then, for the maximum likelihood estimation, we solve from the following first-order conditions:
¶ll/¶b =
-(1/s)å{i|Yi=0}fiXi/(1-Fi)
+(1/s)å{i|Yi>0}(Yi-Xib)Xi = 0
¶ll/¶s2 =
(1/2)(1/s3)å{i|Yi=0}fiXib/(1-Fi)
-(1/2)(1/s2)å{i|Yi>0}[1-(Yi-Xib)2/s2] = 0
If the model error does not follow a normal probability distribution, Quasi-Maximum Likelihood (QML) estimation corrects the estimated asymptotic variance-covariance matrix of b with a robust estimator as follows:
Var(q) = (-H)-1G(-H)-1
where q = (b,s2), H = E[¶2ll(q)/¶q¶q'], and G = E[¶ll(q)/¶q][¶ll(q)/¶q'].
To interpret the estimated coefficients of the model, we may use three conditional expected values:
E(Yi*|Xi) | = Xib |
E(Yi|Xi,Yi>0) | = Xib + E(ei|Yi>0)
= Xib + E(ei|ei>-Xib) = Xib + sfi/Fi > E(Yi*|Xi) |
E(Yi|Xi) | = Fi E(Yi|Xi,Yi>0)
= Fi Xib + sfi |
The first expected value (corresponding to the "uncensored" case) is easy to obtain. The last expected value will be of particular interest if our sample contains many censored observations. Accordingly, for the j-th explanatory variable, the corresponding marginal effects are:
¶E(Yi*|Xi)/¶Xij | = bj |
¶E(Yi|Xi,Yi>0)/¶Xij | = bj[1-(Xib/s)(fi/Fi)-(fi/Fi)2] |
¶E(Yi|Xi)/¶Xij | = Fi ¶E(Yi|Xi,Yi>0)/¶Xij + E(Yi|Xi,Yi>0) ¶Fi/¶Xij
= Fibj |
We note that the last censored marginal effect differs from the first uncensored one by a scale factor equal to the probability of that observation not being censored. In other words, the scale factor is equal to Fi (recall that Fi is 1-Prob(Yi=0)).
The tobit model is often estimated for comparison with the alternative probit or count model specifications. The model can be easily extended to consider more than one censoring point. For instance, we could censor both tails of the distribution. This is an example of a doubly censored regression.
Y = Number of affairs in the past year: 0, 1, 2, 3, 4-10 (coded as 7), 11 or more (coded as 12).
The preponderance of zeros (no affairs) may not render the tobit model to be the best for the study, here we present only the model using five explanatory variables as follows:
Z2 | = Age. |
Z3 | = Number of years married. |
Z5 | = Degree of religiousness: 1 (anti-religious), ..., 5 (very religious). |
Z7 | = Hollingshead scale of occupation: 1, ..., 7. |
Z8 | = Self-rating of marriage satisfaction: 1 (very unhappy), ..., 5 (very happy). |
The regression equation is:
Y = b0 + b2Z2 + b3Z3 + b5Z5 + b7Z7 + b8Z8 + e
where e ~ normal(0,s2I). The estimation and interpretation of the estimated tobit model are left as exercises (see Program and Data).
Y = | 0 | if no extramarital affair |
1 | otherwise (e.g., 1,2,3,7,12) |
If the specification of the tobit model is correct, then probit estimators should be consistent for b/s from the tobit model.
Y = | 0 | if Y*£ 0 |
Y* | if 0 < Y* < 4 or {1,2,3} | |
4 | if Y*³ 4 or {7,12} |
where Y* = Zb+e and e ~ normal(0,s2I).
We have shown that the probability of Y = 0 is:
Prob(Yi = 0) = Prob(Yi* £ 0) =
Prob(ei £ -Zib) =
Prob(ei/s £
-Zib/s)
It is easy to show that the probability of Y = 4 is:
Prob(Yi = 4) = Prob(Yi*³4) =
Prob(ei ³ 4-Zib) =
Prob(ei/s ³
(4-Zib)/s)
Finally, for Yi = {1,2,3}, the likelihood is simply the normal density function:
1/(2ps2)½ exp[-(Yi-Zib)2/(2s2)]
Then the corresponding log-likelihood function is
ll(b,s2) = å{i|Yi=0}lnF(-Zib/s) -1/2 å{i|Yi=1,2,3}[ln(2p)+ln(s2)+(Yi-Xib)2/s2] +å{i|Yi=4}ln(1-F((4-Zib)/s))
Estimate and interpret the doubly-censored tobit model.
Suppose Y = {0,1,2,...} follows a Poisson distribution with a parameter l>0:
f(Y|l) = e-llY / Y!
It is known that E(Y) = Var(Y) = l. If Y is to be explained by X such that E(Y|X) > 0, a natural approach is to set l = E(Y|X) and parameterized by the regression parameter b in the Poisson distribution function. For example,
l(X,b) = E(Y|X,b) = eXb > 0
Therefore, given a sample of independent observations {(Yi,Xi), i=1,2,...,N}, the likelihood function is written as:
L(b) = Õi=1,2,...,N [e-l(Xi,b) l(Xi,b)Yi / Yi!]
The corresponding log-likelihood function is
ll(b) = åi=1,2,...,NYiln(l(Xi,b)) - åi=1,2,...,Nl(Xi,b) - åi=1,2,...,Nln(Yi!)
Maximum likelihood estimate of b is obtained from:
¶ll/¶b = åi=1,2,...,N[(Yi-l(Xi,b))/l(Xi,b)] [¶l(Xi,b)/¶b] = 0, and
¶2ll/¶b¶b' | = åi=1,2,...,N[(Yi-l(Xi,b))/l(Xi,b)] [¶2l(Xi,b)/¶b¶b'] |
+ åi=1,2,...,N[-Yi/l(Xi,b)2] [¶l(Xi,b)/¶b'] [¶l(Xi,b)/¶b] |
If l(Xi,b) = E(Yi|Xi,b) = eXib, the model is interpreted as:
¶E(Yi|Xi,b)/¶Xij = eXibbj = E(Yi|Xi,b)bj, or
bj = ¶E(Yi|Xi,b)/¶Xij / E(Yi|Xi,b)
E(Y|X,b,v) = lv = eXbv
Then Y follows a Poisson distribution with the density:
f(Y|lv) = e-lv(lv)Y / Y!
Suppose v follows a gamma distribution with E(v) = 1 and Var(v) = 1/q. That is,
g(v|q) = qq/G(q) vq-1e-qv
Therefore,
f(Y|l,q) | = ò0¥ e-lv(lv)Y/Y! g(v|q) dv |
= (qqlY)/(G(q)Y!) ò0¥ e-(l+q)v v(Y+q-1) dv | |
= [(qqlY)/(G(q)Y!)] [G(Y+q)/(l+q)y+q] | |
= [G(Y+q)/(G(q)Y!)] [l/(l+q)]Y [1-l/(l+q)]q |
This is one form of negative binomial distribution with mean l and variance l(1+l/q). By construction, it is a Poisson-Gamma mixture. Typically, the parameter 1/q is used to measure the extent of overdispersion. Given a sample of independent observations {(Yi,Xi), i=1,2,...,N}, and let li = l(Xi,b) = eXib, the log-likelihood function is written as:
ll(b,q) = åi=1,2,...N ln f(Yi|li,q)
The negative binomial model can be estimated by maximum likelihood without much difficulty. A test of the Poisson distribution is often carried out by testing the hypothesis q -> 0.
As discussed earlier, responses of 7 and 12 do not represent the actual data. We have re-coded 4 for both values of 7 and 12 as "4 or more" and treated it as a right censored observation. The Poisson and negative binomial model may be modified to consider the censored data.
Formulate, estimate, and compare the censored and uncensored versions of Poisson and negative binomial regression models, respectively.
F(t) = Pr(T £ t)
Clearly, F(0) = 0, and
1-F(t) = Pr(T>t) = S(t)
is the probability of "surviving" past t. Therefore S(t) is called survivor function.
Given the distribution of duration F, we can define the probability of "exiting the initial state" in the time interval (t,t+Dt] when the event has survived through t (or T>t) as follows:
Pr(t<T£t+Dt|T>t) | = (Pr(T£t+Dt)-Pr(T£t)) / Pr(T>t) |
= (F(t+Dt)-F(t)) / (1-F(t)) |
Taking the limit, we have
limDt®0Pr(t<T£t+Dt|T>t)
= f(t)/(1-F(t)) where f(t) = dF(t)/dt is the density function of T at t.
= f(t)/S(t)
= [-dS(t)/dt]/S(t)
= -dln(S(t))/dt, negative rate of survival.
Define the hazard function as:
h(t) = -dln(S(t))/dt
That is, ln(S(t)) = - ò0th(t)dt, or
S(t) = e-ò0th(t)dt
F(t) = 1 - S(t) = 1 - e-ò0th(t)dt
f(t) = dF(t)/dt = h(t) e-ò0th(t)dt
We need to specify the hazard function h(t) in order to study the duration data. Denote h(t|q) the hazard function of t with unknown parameter q.
S(t|q) = e-qt,
F(t|q) = 1-e-qt, and
f(t|q) = qe-qt
In this case, F is the exponential distribution. q is the parameter which can be estimated by the reciprocal of the sample mean because E(T) = 1/q for the exponential distributed random variable T.
S(t|q) = e-(a+½bt)t
F(t|q) = 1-e-(a+½bt)t, and
f(t|q) = (a+bt)e-(a+½bt)t
The problem is that the estimated hazard h(t|q) may be negative.
h(t|q) = lr(lt)r-1/[1+(lt)r]
S(t|q) = 1/[1+(lt)r]
Therefore, f(t|q) = lr(lt)r-1/[1+(lt)r]2
S(t|q) =
ò-¥-rln(lt)
1/(2p)½exp(-z2/2)dz
f(t|q) =
(r/t)
1/(2p)½exp(-½[rln(lt)]2)
S(t|q) = e-(lt)r
F(t|q) = 1-e-(lt)r
f(t|q) = lr(lt)(r-1)
e-(lt)r
Survival Distribution | Median Duration | Expected Duration |
Exponential | (1/l)ln(2) | (1/l)G(2) |
Weibull | (1/l)[ln(2)]1/r (1/l)G(1+1/r) | |
Log-Normal | (1/l) | (1/l)[exp(1/r2)]½ |
Log-Logistic | (1/l) | (1/l)[exp(p2/(3r2)]½ |
ll(q) = åi=1,2,...,N ln f(ti|q) = åi=1,2,...,N [ln h(ti|q) + ln S(ti|q)]
The vector of parameters q of these models can be estimated by maximum likelihood. Censored observations can be incorporated easily as such:
ll(q) | = åt=uncensored obs. ln f(t|q) + åt=censored obs. ln S(t|q) |
= åt=uncensored obs. ln h(t|q) + åt=all obs. ln S(t|q) |
For example, the log-likelihood function for Weibull distribution is:
ll(l,r) = åi=1,2,...,N [ln(r)-ln(ti)] + åi=1,2,...,N r[ln(l)+ln(ti)] - åi=1,2,...,N (lti)r
The introduction of explanatory variables to the duration models is fairly straightforward, although the interpretation of the estimated parameters in the model is difficult. Consider, for example, the Weibull model. Let li = e-Xib where Xi is assumed to be the same from T=0 to T=t. The corresponding log-likelihood function is:
ll(b,l,r)
To incorporate heterogeneity into the Weibull model, suppose the survival function is conditioned on the individual effect v as:
S(t|v) = e-(vlt)r
and v follows a gamma distribution with E(v)=1 and Var(v)=1/g:
g(v) = [gg/G(g)] vg-1e-gv
Then, S(t) = Ev[S(t|v)] = òvS(t|v)g(v)dv = [1+(1/g)(lt)r]-g
The limiting value, with 1/g->0, is the Weibull survival model, so Var(v)->0 or no heterogeneity. The corresponding hazard function is
h(t) = lr(lt)r-1[S(t)]1/g = lr(lt)r-1/[1+(1/g)(lt)r]
Therefore,
f(t) = lr(lt)r-1[S(t)]1+1/g = lr(lt)r-1/[1+(1/g)(lt)r]1+g