Y_it = X_it β_it + ε_it

Let β_it = β and assume ε_it = u_i + v_t + e_it, where u_i represents the individual (cross-section) difference in intercept and v_t the time difference in intercept. A two-way analysis includes both time and individual effects. For simplicity, we further assume v_t = 0; that is, only the one-way individual effects will be analyzed:

ε_it = u_i + e_it
The component e_it is a classical error term with zero mean, homogeneous variance, no serial correlation, and no contemporaneous correlation. Also, e_it is uncorrelated with the regressors X_it. That is, E(e_it) = 0, Var(e_it) = σ_e², E(e_it e_js) = 0 for (i,t) ≠ (j,s), and E(X_it' e_it) = 0.
Fixed Effects Model

Assume that the error component u_i, the individual difference, is fixed (nonstochastic), although it varies across individuals. Thus, the model error is simply ε_it = e_it. The model is expressed as:
Y_it = (X_it β + u_i) + e_it

where u_i is interpreted as the change in the intercept. Therefore the individual effect is defined as u_i plus the intercept; if X_it does not include a constant term, u_i itself is the individual effect.
Random Effects Model
Assume that the error component u_i, the individual difference, is random and satisfies E(u_i) = 0, Var(u_i) = σ_u², and Cov(u_i, e_it) = 0. Then the variance-covariance matrix of the composite errors ε_i = [ε_i1, ε_i2, ..., ε_iT]' is

Σ = E(ε_i ε_i') =

[ σ_u²+σ_e²    σ_u²     ...      σ_u²    ]
[   σ_u²    σ_u²+σ_e²   ...      σ_u²    ]
[    ...       ...      ...       ...    ]
[   σ_u²      σ_u²      ...   σ_u²+σ_e²  ]

= σ_e² I_{T×T} + σ_u² 1_{T×T}

where i is a vector of T ones, 1 = ii' is the T×T matrix of ones, and I is the T×T identity matrix. Let ε be the NT-element vector of the stacked errors ε_1, ε_2, ..., ε_N, ε = [ε_1', ε_2', ..., ε_N']'; then E(ε) = 0 and E(εε') = I_{N×N} ⊗ Σ_{T×T}, where Σ is the T×T variance-covariance matrix defined above.
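This covariance structure is easy to check numerically. A minimal sketch in numpy (T, N, and the variance components are illustrative), confirming that Σ has σ_u²+σ_e² on the diagonal, σ_u² off the diagonal, and that the stacked covariance is the block-diagonal I⊗Σ:

```python
import numpy as np

T, s2_e, s2_u = 4, 2.0, 0.5              # illustrative T, sigma_e^2, sigma_u^2

# Sigma = sigma_e^2 * I + sigma_u^2 * 1, where 1 = ii' is the TxT matrix of ones
i_vec = np.ones((T, 1))
ones_TT = i_vec @ i_vec.T
Sigma = s2_e * np.eye(T) + s2_u * ones_TT

# Variances sigma_u^2 + sigma_e^2 on the diagonal, covariances sigma_u^2 off it
assert np.allclose(np.diag(Sigma), s2_u + s2_e)
assert np.isclose(Sigma[0, 1], s2_u)

# Stacked covariance of the NT-vector of errors: block-diagonal I_N kron Sigma
N = 3
V = np.kron(np.eye(N), Sigma)
```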
In summary, with matrix notation, the random effects model is defined by

Y_{NT×1} = X_{NT×K} β_{K×1} + u_{N×1} ⊗ i_{T×1} + e_{NT×1}
The assumptions are: E(u) = 0, Var(u) = σ_u² I_{N×N}, E(e) = 0, Var(e) = σ_e² I_{NT×NT}, and u is uncorrelated with both e and X. Stacking the N individual blocks, the model is

[ Y_1 ]   [ X_1 ]       [ ε_1 ]
[ Y_2 ] = [ X_2 ] β  +  [ ε_2 ]
[ ... ]   [ ... ]       [ ... ]
[ Y_N ]   [ X_N ]       [ ε_N ]

or, Y = Xβ + ε
To estimate the fixed effects model, write

Y_it = (X_it β + u_i) + e_it   (i=1,2,...,N; t=1,2,...,T)

or, in stacked form,

[ Y_1 ]   [ X_1 ]       [ u_1 i ]   [ e_1 ]
[ Y_2 ] = [ X_2 ] β  +  [ u_2 i ] + [ e_2 ]
[ ... ]   [ ... ]       [  ...  ]   [ ... ]
[ Y_N ]   [ X_N ]       [ u_N i ]   [ e_N ]

or, Y = Xβ + ε = Xβ + u⊗i + e
For each i, define the NT×1 dummy vector D_i with elements:

D_ij = 1 if (i-1)T+1 ≤ j ≤ iT
       0 otherwise
Then D = [D_1, D_2, ..., D_{N-1}] is the NT×(N-1) matrix of N-1 dummy variables. Ordinary least squares can be used to estimate the model with dummy variables as follows:

Y = Xβ + u⊗i + e = Xβ + Dδ + e

Since X includes a constant term, one fewer dummy variable is included for estimation, and the estimated δ measures each individual's deviation from the intercept.
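A minimal numerical sketch of the dummy variable (LSDV) regression on simulated data (all names and parameter values are illustrative, not from the text); D is built exactly as defined above, with one fewer dummy than individuals since X carries a constant:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, K = 4, 10, 2                       # individuals, periods, slope regressors
u = np.array([0.0, 1.0, -0.5, 2.0])      # fixed individual effects

X = np.column_stack([np.ones(N * T), rng.normal(size=(N * T, K))])
beta = np.array([1.0, 0.5, -0.3])        # intercept and slopes
e = rng.normal(scale=0.1, size=N * T)
Y = X @ beta + np.repeat(u, T) + e       # observations stacked by individual

# D_i has a 1 in rows (i-1)*T+1 .. i*T; with a constant in X, keep N-1 dummies
D = np.kron(np.eye(N), np.ones((T, 1)))[:, : N - 1]

Z = np.hstack([X, D])
coef = np.linalg.lstsq(Z, Y, rcond=None)[0]
delta = coef[K + 1:]                     # individual shifts relative to the intercept
```

With the last dummy dropped, the fitted intercept absorbs u_N, so each δ_i estimates u_i - u_N.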
Let Ym_i = (Σ_{t=1,...,T} Y_it)/T, Xm_i = (Σ_{t=1,...,T} X_it)/T, and em_i = (Σ_{t=1,...,T} e_it)/T. Then the within estimates of the model can be obtained by estimating the mean deviation model:

(Y_it - Ym_i) = (X_it - Xm_i) β + (e_it - em_i)

Or, equivalently,

Y_it = X_it β + (Ym_i - Xm_i β) + (e_it - em_i)
Note that the constant term drops out due to the mean deviation transformation. The degrees of freedom for estimating the above mean deviation model are NT-K-1 (K is the number of explanatory variables including the constant term). Therefore, the estimated individual effects of the model are u_i = Ym_i - Xm_i β. The variance-covariance matrix of the individual effects is estimated as follows:

Var(u_i) = v/T + Xm_i [Var(β)] Xm_i'

where v is the estimated variance of the mean deviation regression corrected for the degrees of freedom NT-N-K (instead of NT-K-1). That is,

v = Σ_{i=1,...,N} Σ_{t=1,...,T} (e_it - em_i)² / (NT-N-K).
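The within estimation steps above can be sketched on simulated data as follows (parameter values illustrative): the slope comes from the demeaned regression, the individual effects from Ym_i - Xm_i b, and v uses the NT-N-K degrees-of-freedom correction:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 5, 20
x = rng.normal(size=(N, T))              # one regressor besides the constant
u = rng.normal(size=N)                   # fixed individual effects
beta = 0.8
y = 2.0 + beta * x + u[:, None] + rng.normal(scale=0.1, size=(N, T))

# Within transformation: subtract each individual's time mean
yd = y - y.mean(axis=1, keepdims=True)
xd = x - x.mean(axis=1, keepdims=True)
b_within = (xd * yd).sum() / (xd ** 2).sum()

# Individual effects u_i = Ym_i - Xm_i * b (here including the common intercept)
effects = y.mean(axis=1) - x.mean(axis=1) * b_within

# Residual variance with degrees of freedom NT - N - K (K = 1 slope here)
resid = yd - xd * b_within
v = (resid ** 2).sum() / (N * T - N - 1)
```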
It may be of interest to estimate the between parameters of the model by estimating
Ym_i = Xm_i β + u_i + em_i
which is related to the estimated individual effects from the within estimates.
Based on the dummy variable approach, a Wald F-test for the joint significance of the parameters associated with the dummy variables can be used to test for individual effects. If the null hypothesis δ = 0 cannot be rejected, then there are no fixed effects in the model.
Based on the deviation approach, the equivalent test statistic is computed from the restricted (pooled model) and unrestricted (mean deviation model) sums of squared residuals, RSSR and RSSUR, respectively. That is,

F = [(RSSR - RSSUR)/(N-1)] / [RSSUR/(NT-N-K)] ~ F(N-1, NT-N-K)
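A sketch of this F-test on simulated data with sizable individual effects (all values illustrative), comparing the pooled and mean deviation residual sums of squares:

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, K = 6, 15, 2                        # K counts the constant and one slope
u = np.array([0.0, 2.0, -1.0, 1.0, 3.0, -2.0])   # sizable individual effects
x = rng.normal(size=(N, T))
y = 1.0 + 0.5 * x + u[:, None] + rng.normal(size=(N, T))

# Restricted (pooled) model: one common intercept
Xp = np.column_stack([np.ones(N * T), x.ravel()])
bp = np.linalg.lstsq(Xp, y.ravel(), rcond=None)[0]
RSSR = ((y.ravel() - Xp @ bp) ** 2).sum()

# Unrestricted (mean deviation) model
yd = (y - y.mean(axis=1, keepdims=True)).ravel()
xd = (x - x.mean(axis=1, keepdims=True)).ravel()
bw = (xd @ yd) / (xd @ xd)
RSSUR = ((yd - xd * bw) ** 2).sum()

F = ((RSSR - RSSUR) / (N - 1)) / (RSSUR / (N * T - N - K))
```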
To estimate the random effects model, consider

Y = Xβ + ε

where ε = [ε_1', ε_2', ..., ε_N']', ε_i = [ε_i1, ε_i2, ..., ε_iT]', and the random error components are ε_it = u_i + e_it. By the assumptions, E(ε) = 0 and E(εε') = I⊗Σ. The generalized least squares (GLS) estimator of β is
β = [X'(I⊗Σ⁻¹)X]⁻¹ X'(I⊗Σ⁻¹)Y
  = [Σ_{i=1,...,N} X_i' Σ⁻¹ X_i]⁻¹ [Σ_{i=1,...,N} X_i' Σ⁻¹ Y_i]
where

    [ σ_u²+σ_e²    σ_u²     ...      σ_u²    ]
Σ = [   σ_u²    σ_u²+σ_e²   ...      σ_u²    ] = σ_e² I + σ_u² 1
    [    ...       ...      ...       ...    ]
    [   σ_u²      σ_u²      ...   σ_u²+σ_e²  ]
and
Σ⁻¹ = (σ_e² I + σ_u² 1)⁻¹ = (1/σ_e²)[I + (σ_u²/σ_e²) 1]⁻¹ = (1/σ_e²){I - [σ_u²/(Tσ_u² + σ_e²)] 1}
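The closed-form inverse can be verified directly; a short numerical check with arbitrary values of T, σ_e², and σ_u²:

```python
import numpy as np

T, s2_e, s2_u = 5, 1.5, 0.7
J = np.ones((T, T))                      # the TxT matrix of ones, 1 = ii'
Sigma = s2_e * np.eye(T) + s2_u * J

# Closed form: (1/s2_e) * { I - [s2_u / (T*s2_u + s2_e)] * 1 }
Sigma_inv = (1.0 / s2_e) * (np.eye(T) - (s2_u / (T * s2_u + s2_e)) * J)

assert np.allclose(Sigma @ Sigma_inv, np.eye(T))
```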
Since Σ⁻¹ can be derived from the estimated variance components σ_e² and σ_u², in practice the model is estimated using the following partial deviation approach.
Step 1. Estimate the mean deviation (within) model as in the fixed effects case, and let v be the resulting estimate of σ_e².

Step 2. Estimate the group mean (between) model:

Ym_i = Xm_i β + (u_i + em_i)

where the error structure of u_i + em_i satisfies:

E(u_i + em_i) = 0
E((u_i + em_i)²) = σ_u² + σ_e²/T
E((u_i + em_i)(u_j + em_j)) = 0, for i ≠ j

Let v1 = T σ_u² + σ_e² = T σ_u² + v, estimated by T times the residual variance of this between regression.

Step 3. If v1 > v, define w = 1 - (v/v1)^(1/2).
In case v1 ≤ v, the estimate of σ_u² becomes negative. The alternative is to use (v0 - v) for σ_u², where v0 is the estimated variance s² obtained from the pooled model:

Y_it = X_it β + ε_it

v0 is a consistent estimator of σ_u² + σ_e², where the estimator of σ_e² is v (obtained from the estimated fixed effects model; see Step 1). Then the consistent estimator of σ_u² is (v0 - v). If v0 ≤ v, we need to use the large-sample variances to construct the estimator of σ_u²:

v0 = [(NT-K-1)/NT] s²
v = [(NT-N-K)/NT] s_e²

Let v1 = T(v0 - v) + v, and define w = 1 - (v/v1)^(1/2).
Finally, transform the data by partial deviation:

Y*_it = Y_it - w Ym_i
X*_it = X_it - w Xm_i

Then the model for estimation is:

Y*_it = X*_it β + ε*_it

where ε*_it = (1-w) u_i + e_it - w em_i. Or, equivalently,

Y_it = X_it β + w (Ym_i - Xm_i β) + ε*_it
It is easy to validate that

E(ε*_it) = 0
E(ε*_it²) = σ_e²
E(ε*_it ε*_is) = 0, for t ≠ s
E(ε*_it ε*_js) = 0, for i ≠ j
The least squares estimate of w (Ym_i - Xm_i β) is interpreted as the change of the individual effects.
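The partial deviation (feasible GLS) steps can be sketched on simulated data as follows (all values illustrative); the between regression residual variance, scaled by T, estimates v1:

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 50, 8
s_u, s_e = 1.0, 0.5
x = rng.normal(size=(N, T))
u = rng.normal(scale=s_u, size=N)        # random individual effects
y = 1.0 + 0.5 * x + u[:, None] + rng.normal(scale=s_e, size=(N, T))

# Step 1: within regression gives v, the estimate of sigma_e^2
yd = y - y.mean(axis=1, keepdims=True)
xd = x - x.mean(axis=1, keepdims=True)
bw = (xd * yd).sum() / (xd ** 2).sum()
v = ((yd - xd * bw) ** 2).sum() / (N * T - N - 1)

# Step 2: between regression on group means; v1 estimates T*sigma_u^2 + sigma_e^2
Xm = np.column_stack([np.ones(N), x.mean(axis=1)])
ym = y.mean(axis=1)
bb = np.linalg.lstsq(Xm, ym, rcond=None)[0]
v1 = T * ((ym - Xm @ bb) ** 2).sum() / (N - 2)

# Step 3: partial deviation with weight w = 1 - sqrt(v/v1);
# the constant column transforms to (1 - w)
w = 1.0 - np.sqrt(v / v1)
ys = (y - w * y.mean(axis=1, keepdims=True)).ravel()
Xs = np.column_stack([np.full(N * T, 1.0 - w),
                      (x - w * x.mean(axis=1, keepdims=True)).ravel()])
b_re = np.linalg.lstsq(Xs, ys, rcond=None)[0]
```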
Alternatively, the random effects model may be estimated by maximum likelihood. Assume the panel data model Y = Xβ + ε with normally distributed errors, ε ~ Normal(0, I⊗Σ), or

ε_i ~ Normal(0, Σ), i = 1,2,...,N

where Σ = σ_e² I + σ_u² 1.
Write the probability density of ε_i as

f(ε_i) = (2π)^(-T/2) |Σ|^(-1/2) exp(-(1/2) ε_i' Σ⁻¹ ε_i)

The corresponding log-likelihood function is

ll_i(β, σ_e², σ_u²) = -(T/2) ln(2π) - (1/2) ln|Σ| - (1/2) (Y_i - X_i β)' Σ⁻¹ (Y_i - X_i β)

Because Σ⁻¹ = (1/σ_e²){I - [σ_u²/(Tσ_u² + σ_e²)] 1} (see above) and

|Σ| = |σ_e² I + σ_u² 1| = (σ_e²)^T |I + (σ_u²/σ_e²) 1| = (σ_e²)^T (Tσ_u²/σ_e² + 1)

(Note: |I + (σ_u²/σ_e²) 1| is the product of the eigenvalues (Tσ_u²/σ_e²)+1, 1, 1, ..., 1.)
Therefore,

ll_i(β, σ_e², σ_u²) = -(T/2) ln(2πσ_e²) - (1/2) ln(Tσ_u²/σ_e² + 1)
  - (1/2)(1/σ_e²) {[Σ_{t=1,...,T} (Y_it - X_it β)²] - [σ_u²/(Tσ_u² + σ_e²)] [Σ_{t=1,...,T} (Y_it - X_it β)]²}
The maximum likelihood estimator of the parameter vector (β, σ_e², σ_u²) is obtained by maximizing the total log-likelihood

ll(β, σ_e², σ_u²) = Σ_{i=1,...,N} ll_i(β, σ_e², σ_u²)
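The closed-form group log-likelihood above can be checked against a direct evaluation using |Σ| and Σ⁻¹; a minimal sketch (values illustrative, with eps standing for the residual vector Y_i - X_i β):

```python
import numpy as np

def ll_i(eps, s2_e, s2_u):
    """Group log-likelihood in the closed form above; eps = Y_i - X_i @ beta."""
    T = eps.size
    quad = (eps @ eps) - (s2_u / (T * s2_u + s2_e)) * eps.sum() ** 2
    return (-T / 2 * np.log(2 * np.pi * s2_e)
            - 0.5 * np.log(T * s2_u / s2_e + 1)
            - 0.5 * quad / s2_e)

# Direct evaluation: -T/2 ln(2 pi) - 1/2 ln|Sigma| - 1/2 eps' Sigma^-1 eps
T, s2_e, s2_u = 6, 1.2, 0.8
eps = np.arange(1.0, T + 1)
Sigma = s2_e * np.eye(T) + s2_u * np.ones((T, T))
direct = (-T / 2 * np.log(2 * np.pi)
          - 0.5 * np.linalg.slogdet(Sigma)[1]
          - 0.5 * eps @ np.linalg.solve(Sigma, eps))
assert np.isclose(ll_i(eps, s2_e, s2_u), direct)
```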
To test for zero correlation among the error components ε_it = u_i + e_it and ε_is = u_i + e_is, that is, σ_u² = 0, the following Breusch-Pagan LM test statistic, based on the estimated residuals e_it (i=1,2,...,N; t=1,2,...,T) of the restricted (pooled) model, is distributed as chi-square with one degree of freedom:

LM = [NT/(2(T-1))] {[Σ_{i=1,...,N} (Σ_{t=1,...,T} e_it)²] / [Σ_{i=1,...,N} Σ_{t=1,...,T} e_it²] - 1}² ~ χ²(1)
Note that em_i = (Σ_{t=1,...,T} e_it)/T.
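A sketch of the LM statistic; for brevity the pooled-model residuals are simulated directly, once under the null (σ_u² = 0) and once with individual effects added, rather than taken from an actual pooled regression:

```python
import numpy as np

rng = np.random.default_rng(4)
N, T = 100, 6

def bp_lm(e):
    """Breusch-Pagan LM statistic from pooled-model residuals e (N x T)."""
    em = e.mean(axis=1)                          # em_i = (sum_t e_it) / T
    ratio = ((T * em) ** 2).sum() / (e ** 2).sum()
    return N * T / (2 * (T - 1)) * (ratio - 1) ** 2

e_null = rng.normal(size=(N, T))                 # no individual effects
e_alt = e_null + rng.normal(size=(N, 1))         # adds u_i to every period of i

LM0, LM1 = bp_lm(e_null), bp_lm(e_alt)
```

Under the null, LM0 is a draw from (approximately) χ²(1) and should be small; LM1 should be large, rejecting σ_u² = 0.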
The Hausman specification test compares the random effects and fixed effects estimates:

H = (b_random - b_fixed)' [Var(b_fixed) - Var(b_random)]⁻¹ (b_random - b_fixed)

which is asymptotically distributed as chi-square with K degrees of freedom under the null hypothesis E(X'u) = 0. Rejection of the null hypothesis favors the fixed effects model. However, not rejecting E(X'u) = 0 could suggest either a random effects or a fixed effects model.
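A sketch of the Hausman test for a single slope coefficient on simulated data (values illustrative); Var(b_fixed) - Var(b_random) is the positive quantity in the denominator, since the random effects estimator is the more efficient one under the null:

```python
import numpy as np

rng = np.random.default_rng(5)
N, T = 200, 5
x = rng.normal(size=(N, T))
u = rng.normal(size=N)                   # individual effects uncorrelated with x
y = 0.5 * x + u[:, None] + rng.normal(size=(N, T))

# Fixed effects (within) slope and its variance
yd = y - y.mean(axis=1, keepdims=True)
xd = x - x.mean(axis=1, keepdims=True)
b_fe = (xd * yd).sum() / (xd ** 2).sum()
s2_e = ((yd - xd * b_fe) ** 2).sum() / (N * T - N - 1)
var_fe = s2_e / (xd ** 2).sum()

# Random effects slope via the partial deviation weight w
v1 = T * np.var(y.mean(axis=1) - b_fe * x.mean(axis=1), ddof=1)
w = 1.0 - np.sqrt(s2_e / v1)
ys = y - w * y.mean(axis=1, keepdims=True)
xs = x - w * x.mean(axis=1, keepdims=True)
b_re = (xs * ys).sum() / (xs ** 2).sum()
var_re = s2_e / (xs ** 2).sum()

# Hausman statistic: chi-square(1) under the null E(X'u) = 0
H = (b_re - b_fe) ** 2 / (var_fe - var_re)
```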
Firm | Within Estimates, Fixed Effects | Within Estimates, Random Effects | ML Estimates, Random Effects
---|---|---|---
1 | 9.7059 (0.19323) | 9.6378 (0.18313) | 9.6319
2 | 9.6647 (0.19908) | 9.5979 (0.18716) | 9.5860
3 | 9.4970 (0.22505) | 9.4408 (0.20686) | 9.4055
4 | 9.8905 (0.24185) | 9.7780 (0.21918) | 9.7892
5 | 9.7300 (0.26102) | 9.6299 (0.23371) | 9.6194
6 | 9.7930 (0.26374) | 9.6831 (0.23544) | 9.6798

Standard errors are in parentheses.
Random Coefficients Model

Y_i = X_i β_i + e_i
β_i = β + u_i

where Y_i = [Y_i1, Y_i2, ..., Y_iT]', X_i = [X_i1, X_i2, ..., X_iT]', and e_i = [e_i1, e_i2, ..., e_iT]'. We note that not only the intercept but also the slope parameters are random across individuals. The assumptions of the model are: E(u_i) = 0, Var(u_i) = E(u_i u_i') = G, E(e_i) = 0, Var(e_i) = σ_i² I, and u_i and e_i are uncorrelated with each other and across individuals.
The model for estimation is

Y_i = X_i β + (X_i u_i + e_i), or
Y_i = X_i β + w_i

where w_i = X_i u_i + e_i, E(w_i) = 0, and Var(w_i) = X_i G X_i' + σ_i² I.
The stacked (pooled) model is

Y = Xβ + w

where w = [w_1', ..., w_N']', and

E(w) = 0_{NT×1}

                      [ X_1 G X_1' + σ_1² I          0            ...           0           ]
Var(w) = E(ww') = V = [          0           X_2 G X_2' + σ_2² I  ...           0           ]
                      [         ...                 ...           ...          ...          ]
                      [          0                   0            ...  X_N G X_N' + σ_N² I  ]
GLS is used to estimate the model. That is,

b* = (X'V⁻¹X)⁻¹ X'V⁻¹Y
Var(b*) = (X'V⁻¹X)⁻¹
The computation is based on the following steps (Swamy, 1971):

Step 1. For each individual, obtain the OLS estimates b_i = (X_i'X_i)⁻¹X_i'Y_i, s_i² = (Y_i - X_i b_i)'(Y_i - X_i b_i)/(T-K), and V_i = s_i² (X_i'X_i)⁻¹.

Step 2. Estimate G by G = [Σ_{i=1,...,N} (b_i - bm)(b_i - bm)'/(N-1)] - (1/N) Σ_{i=1,...,N} V_i, where bm = (Σ_{i=1,...,N} b_i)/N.

Step 3. Compute the GLS estimate as the matrix-weighted average b* = Σ_{i=1,...,N} W_i b_i, with W_i = [Σ_{j=1,...,N} (G + V_j)⁻¹]⁻¹ (G + V_i)⁻¹, and Var(b*) = [Σ_{i=1,...,N} (G + V_i)⁻¹]⁻¹.
The individual parameter vectors may be predicted as follows:

b_i* = (G⁻¹ + V_i⁻¹)⁻¹ [G⁻¹ b* + V_i⁻¹ b_i]
     = A_i b* + (I - A_i) b_i,

where A_i = (G⁻¹ + V_i⁻¹)⁻¹ G⁻¹.
The corresponding variance-covariance matrix is

Var(b_i*) = [A_i  I-A_i] [ Var(b*)   0  ] [A_i  I-A_i]'
                         [    0     V_i ]

= A_i Var(b*) A_i' + (I - A_i) V_i (I - A_i)'
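The matrix-weighted GLS estimate and the shrinkage predictor can be sketched as follows; for brevity G, the covariance of the random coefficients, is treated as known rather than estimated, and all numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(6)
N, T, K = 30, 20, 2
G = np.diag([0.5, 0.2])                  # Var(u_i); treated as known here
beta = np.array([1.0, -0.5])

b_i, V_i = [], []
for _ in range(N):
    Xi = np.column_stack([np.ones(T), rng.normal(size=T)])
    bi_true = beta + rng.multivariate_normal(np.zeros(K), G)
    Yi = Xi @ bi_true + rng.normal(scale=0.5, size=T)
    bi = np.linalg.lstsq(Xi, Yi, rcond=None)[0]
    s2 = ((Yi - Xi @ bi) ** 2).sum() / (T - K)
    b_i.append(bi)
    V_i.append(s2 * np.linalg.inv(Xi.T @ Xi))

# GLS estimate of the mean parameter vector: matrix-weighted average of the b_i
W = [np.linalg.inv(G + Vi) for Vi in V_i]
b_star = np.linalg.solve(sum(W), sum(Wi @ bi for Wi, bi in zip(W, b_i)))

# Shrinkage predictor for individual 0: b0* = A b* + (I - A) b_0
G_inv = np.linalg.inv(G)
V0_inv = np.linalg.inv(V_i[0])
A = np.linalg.solve(G_inv + V0_inv, G_inv)   # A_i = (G^-1 + V_i^-1)^-1 G^-1
b0_star = A @ b_star + (np.eye(K) - A) @ b_i[0]
```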
Seemingly Unrelated Regressions

Y_it = X_it β_i + e_it   (i=1,2,...,N; t=1,2,...,T)

Let Y_i = [Y_i1, Y_i2, ..., Y_iT]', X_i = [X_i1, X_i2, ..., X_iT]', and e_i = [e_i1, e_i2, ..., e_iT]'. The stacked system of N equations (T observations each) is Y = Xβ + e, or
[ Y_1 ]   [ X_1   0   ...   0  ] [ β_1 ]   [ e_1 ]
[ Y_2 ] = [  0   X_2  ...   0  ] [ β_2 ] + [ e_2 ]
[ ... ]   [ ...  ...  ...  ... ] [ ... ]   [ ... ]
[ Y_N ]   [  0    0   ...  X_N ] [ β_N ]   [ e_N ]
Notice that not only the intercept but also the slope terms of the estimated parameters are different across individuals. The error structure of the model is summarized as follows: E(e_i) = 0 and E(e_i e_j') = σ_ij I_{T×T} for all i and j; that is, E(ee') = Σ⊗I, where Σ = [σ_ij] is the N×N cross-equation covariance matrix.
Parameter restrictions can be built into the matrix X and the corresponding parameter vector b. The model is estimated using techniques for systems of regression equations.
System estimation techniques such as 3SLS and FIML can be used for parameter estimation; in the current context this is called Seemingly Unrelated Regression Estimation (SURE). Denote by b and S the estimated β and Σ, respectively. Then,

b = [X'(S⁻¹⊗I)X]⁻¹ X'(S⁻¹⊗I)Y
Var(b) = [X'(S⁻¹⊗I)X]⁻¹, and
S_ij = e_i' e_j / T, where e_i = Y_i - X_i b_i is the estimated error vector of equation i.
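A sketch of SUR estimation on a simulated three-equation system (all values illustrative); S is estimated from equation-by-equation OLS residuals, and the FGLS step applies S⁻¹⊗I:

```python
import numpy as np

rng = np.random.default_rng(7)
N, T = 3, 200                            # equations, observations per equation

# Cross-equation (contemporaneous) error covariance, N x N
S_true = np.array([[1.0, 0.5, 0.2],
                   [0.5, 1.0, 0.3],
                   [0.2, 0.3, 1.0]])
E = rng.multivariate_normal(np.zeros(N), S_true, size=T)     # T x N errors

beta_i = [np.array([1.0, 0.5]), np.array([0.0, -1.0]), np.array([2.0, 0.3])]
X_i = [np.column_stack([np.ones(T), rng.normal(size=T)]) for _ in range(N)]
Y_i = [X_i[i] @ beta_i[i] + E[:, i] for i in range(N)]

# Stacked block-diagonal system Y = X b + e
Y = np.concatenate(Y_i)
X = np.zeros((N * T, 2 * N))
for i in range(N):
    X[i * T:(i + 1) * T, 2 * i:2 * (i + 1)] = X_i[i]

# Step 1: equation-by-equation OLS residuals give S_hat with S_ij = e_i'e_j / T
b_ols = np.linalg.lstsq(X, Y, rcond=None)[0]
e = (Y - X @ b_ols).reshape(N, T)
S_hat = e @ e.T / T

# Step 2: FGLS, b = [X'(S^-1 kron I)X]^-1 X'(S^-1 kron I)Y
Om_inv = np.kron(np.linalg.inv(S_hat), np.eye(T))
XtO = X.T @ Om_inv
b_sur = np.linalg.solve(XtO @ X, XtO @ Y)
```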