Yit = Xitβit + εit
Let βit = β and assume εit = ui + vt + eit, where ui represents the individual (cross-section) difference in intercept and vt the time difference in intercept. A two-way analysis includes both time and individual effects. For simplicity, we further assume vt = 0; that is, there is no time effect, and only the one-way individual effects are analyzed in the following.
The component eit is a classical error term: zero mean, homogeneous variance, and no serial or contemporaneous correlation. Also, eit is uncorrelated with the regressors Xit. That is,
E(eit) = 0, Var(eit) = σ2e,
Cov(eit, ejτ) = 0 for i≠j or t≠τ, and
Cov(eit, Xit) = 0.
Fixed Effects Model
Assume that the error component ui, the individual difference, is fixed or nonstochastic (but it varies across individuals). Thus, the model error is simply εit = eit. The model is expressed as:
Yit = (Xitβ + ui) + eit
where ui is interpreted as the change in the intercept. Therefore the individual effect is defined as ui plus the intercept.
Random Effects Model
Assume that the error component ui, the individual difference, is random and satisfies the following assumptions:
E(ui) = 0, Var(ui) = σ2u,
Cov(ui, uj) = 0 for i≠j, and
Cov(ui, eit) = 0.
Then the T×T variance-covariance matrix of the error vector εi = [εi1,εi2,...,εiT]' is
Σ = | σ2u+σ2e   σ2u       ...   σ2u     |
    | σ2u       σ2u+σ2e   ...   σ2u     |
    | ...       ...       ...   ...     |
    | σ2u       σ2u       ...   σ2u+σ2e |
  = σ2eI + σ2u1
where 1 is a T×T matrix of ones and I is a T×T identity matrix. Let ε be a NT-element vector of the stacked errors ε1, ε2, ..., εN, ε = [ε1,ε2, ..., εN]'; then E(ε) = 0 and E(εε') = I⊗Σ, where this I is an N×N identity matrix and Σ is the T×T variance-covariance matrix defined above.
| Y1 |   | X1 |       | ε1 |
| Y2 | = | X2 | β  +  | ε2 |
| ...|   | ...|       | ...|
| YN |   | XN |       | εN |
or, Y = Xβ + ε
Yit = (Xitβ + ui) + eit (i=1,2,...,N; t=1,2,...,T).
| Y1 |   | X1 |       | u1 |   | e1 |
| Y2 | = | X2 | β  +  | u2 | + | e2 |
| ...|   | ...|       | ...|   | ...|
| YN |   | XN |       | uN |   | eN |
or, Y = Xβ + u + e
For each i, define NT×1 vector Di with the element:
Dij = 1 if (i-1)×T+1 ≤ j ≤ i×T
    = 0 otherwise
Then D = [D1, D2, ..., DN-1] is an NT×(N-1) matrix of N-1 dummy variables. Ordinary least squares can be used to estimate the model with dummy variables as follows:
Y = Xβ + u + e = Xβ + Dδ + e
Since X includes a constant term, one fewer dummy variable (N-1 instead of N) is included for estimation, and the estimated δ measures each individual's change from the intercept.
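The dummy variable (least squares dummy variable) regression above can be sketched in numpy with simulated data; the sample sizes, true parameter values, and seed below are hypothetical choices for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, K = 4, 10, 2                 # 4 individuals, 10 periods, constant + 1 regressor

beta_true = np.array([1.0, 0.5])   # intercept and slope (hypothetical values)
u = np.array([0.8, -0.4, 0.3, 0.0])  # individual shifts; uN = 0 is the base group
x = rng.normal(size=(N * T, 1))
X = np.hstack([np.ones((N * T, 1)), x])
Y = X @ beta_true + np.repeat(u, T) + 0.1 * rng.normal(size=N * T)

# Build the NT x (N-1) dummy matrix D; individual N is absorbed by the constant
D = np.zeros((N * T, N - 1))
for i in range(N - 1):             # Di flags the T observations of individual i+1
    D[i * T:(i + 1) * T, i] = 1.0

# OLS on Y = X beta + D delta + e
Z = np.hstack([X, D])
coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
b, delta = coef[:K], coef[K:]      # delta estimates the shifts u1, u2, u3
```

Here delta recovers the individual changes from the intercept, as described above.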
Let Ymi = (∑t=1,2,...,TYit)/T, Xmi = (∑t=1,2,...,TXit)/T, and emi = (∑t=1,2,...,Teit)/T. Then the within estimates of the model can be obtained by estimating the mean deviation model:
(Yit - Ymi) = (Xit - Xmi)β + (eit - emi)
Or, equivalently
Yit = Xitβ + (Ymi - Xmiβ) + (eit - emi)
Note that the constant term drops out due to the mean deviation transformation. The degrees of freedom for estimating the above mean deviation model are NT-K-1 (K is the number of explanatory variables including the constant term). Therefore, the estimated individual effects of the model are ui = Ymi - Xmiβ. The variance-covariance matrix of the individual effects is estimated as follows:
Var(ui) = v/T + Xmi [Var(β)] Xmi'
where v is the estimated variance of the mean deviation regression corrected for the degrees of freedom NT-N-K (instead of NT-K-1). That is,
v = ∑i=1,2,...,N∑t=1,2,...,T (eit - emi)2 / (NT-N-K).
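The within (mean deviation) estimation, the recovered individual effects ui = Ymi - Xmiβ, and the variance v can be sketched as follows; all numeric values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, K = 5, 8, 2                 # K counts the constant plus one regressor

beta_true = 0.7
u = rng.normal(size=N)            # fixed individual effects (hypothetical)
x = rng.normal(size=(N, T))
y = 1.0 + beta_true * x + u[:, None] + 0.1 * rng.normal(size=(N, T))

# Mean deviation transformation: subtract each individual's time average
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)

# OLS on the demeaned data; the constant drops out
b_within = (xd.ravel() @ yd.ravel()) / (xd.ravel() @ xd.ravel())

# Individual effects: ui = Ymi - Xmi * b (here they estimate 1.0 + u_i)
u_hat = y.mean(axis=1) - x.mean(axis=1) * b_within

# Residual variance corrected for degrees of freedom NT - N - K
e = yd - b_within * xd
v = (e ** 2).sum() / (N * T - N - K)
```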
It may be of interest to estimate the between parameters of the model by estimating
Ymi = Xmiβ + ui + emi
which is related to the estimated individual effects from the within estimates.
Based on the dummy variable approach, this is a Wald F-test for the joint significance of the parameters associated with the dummy variables representing the individual effects. If the null hypothesis δ = 0 cannot be rejected, then there are no fixed effects in the model.
Based on the deviation approach, the equivalent test statistic is computed from the restricted (pooled model) and unrestricted (mean deviation model) sum of squared residuals. That is,
F = [(RSSR - RSSU)/(N-1)] / [RSSU/(NT-N-K)] ~ F(N-1, NT-N-K)
where RSSR and RSSU denote the restricted and unrestricted sums of squared residuals, respectively.
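The test statistic computed from the restricted and unrestricted sums of squared residuals can be sketched as follows; the data are simulated with genuine individual effects, so the statistic should be large (all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, K = 6, 12, 2                # K counts the constant plus one regressor

u = np.array([0.0, 1.0, -1.0, 0.5, -0.5, 1.5])   # genuine individual effects
x = rng.normal(size=N * T)
y = 1.0 + 0.5 * x + np.repeat(u, T) + 0.2 * rng.normal(size=N * T)

# Restricted (pooled) model: one common intercept
Xp = np.column_stack([np.ones(N * T), x])
ep = y - Xp @ np.linalg.lstsq(Xp, y, rcond=None)[0]
rss_r = ep @ ep

# Unrestricted (mean deviation) model
xm, ym = x.reshape(N, T), y.reshape(N, T)
xd = (xm - xm.mean(axis=1, keepdims=True)).ravel()
yd = (ym - ym.mean(axis=1, keepdims=True)).ravel()
b = (xd @ yd) / (xd @ xd)
eu = yd - b * xd
rss_u = eu @ eu

# F-statistic, distributed F(N-1, NT-N-K) under the null of no fixed effects
F = ((rss_r - rss_u) / (N - 1)) / (rss_u / (N * T - N - K))
```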
Y = Xβ + ε
where ε = [ε1,ε2,...,εN]', εi = [εi1,εi2,...,εiT]', and the random error components are εit = ui + eit. By assumption, E(ε) = 0 and E(εε') = I⊗Σ. The Generalized Least Squares estimate of β is
β = [X'(I⊗Σ-1)X]-1X'(I⊗Σ-1)Y
Σ = | σ2u+σ2e   σ2u       ...   σ2u     |
    | σ2u       σ2u+σ2e   ...   σ2u     |
    | ...       ...       ...   ...     |
    | σ2u       σ2u       ...   σ2u+σ2e |
  = σ2eI + σ2u1
Since Σ-1 = (1/σ2e)[I - (σ2u/(σ2e+Tσ2u))1] can be derived from the estimated variance components σ2e and σ2u, in practice the model is estimated using the following partial deviation approach.
Let v = σ2e, the estimated error variance of the mean deviation (fixed effects) model obtained above. Then consider the group mean (between) model:
Ymi = Xmiβ + (ui + emi)
where the error structure of ui + emi satisfies:
E(ui + emi) = 0
E((ui + emi)2) = σ2u + σ2e/T
E((ui + emi)(uj + emj)) = 0, for i≠j
Let v1 = T σ2u + σ2e = T σ2u + v.
If v1 > v, then define w = 1 - (v/v1)½.
In case of v1 ≤ v, the estimate of σ2u becomes negative. The alternative is to use (v0-v) for σ2u, where v0 is the estimated variance σ2 obtained from the pooled model:
Yit = Xitβ + εit
v0 is a consistent estimator of σ2u + σ2e, where the estimator of σ2e is v (obtained from the estimated fixed effect model, see Step 1). Then the consistent estimator of σ2u is (v0-v). If v0 ≤ v, we need to use large sample variances to construct the estimator of σ2u:
v0 = [(NT-K-1)/NT] σ2
v = [(NT-N-K)/NT] σ2e
Let v1 = T (v0-v) + v, and define w = 1 - (v/v1)½.
Y*it = Yit - w Ymi
X*it = Xit - w Xmi
Then the model for estimation is:
Y*it = X*itβ + ε*it
where ε*it = (1-w) ui + eit - w emi.
Or, equivalently
Yit = Xitβ + w (Ymi - Xmiβ) + ε*it
It is easy to validate that
E(ε*it) = 0
E(ε*it2) = σ2e
E(ε*itε*iτ) = 0 for t≠τ
E(ε*itε*jt) = 0 for i≠j
The least squares estimate of [w (Ymi - Xmiβ)] is interpreted as the change of individual effects.
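The partial deviation procedure above (Step 1: within regression for v; then the between regression for v1; then OLS on the transformed data) can be sketched with simulated data; the variance components and sample sizes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, K = 50, 5, 2
sigma_u, sigma_e = 1.0, 0.5        # hypothetical variance components

u = sigma_u * rng.normal(size=N)
x = rng.normal(size=(N, T))
y = 1.0 + 0.5 * x + u[:, None] + sigma_e * rng.normal(size=(N, T))

# Step 1: within (fixed effects) regression gives v, the estimate of sigma_e^2
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
b_w = (xd.ravel() @ yd.ravel()) / (xd.ravel() @ xd.ravel())
e_w = yd - b_w * xd
v = (e_w ** 2).sum() / (N * T - N - K)

# Step 2: between regression of Ymi on Xmi; v1 estimates T*sigma_u^2 + sigma_e^2
Xb = np.column_stack([np.ones(N), x.mean(axis=1)])
eb = y.mean(axis=1) - Xb @ np.linalg.lstsq(Xb, y.mean(axis=1), rcond=None)[0]
v1 = T * (eb @ eb) / (N - K)

# Step 3: partial deviation with w = 1 - sqrt(v/v1), then OLS
w = 1.0 - np.sqrt(v / v1)
ys = (y - w * y.mean(axis=1, keepdims=True)).ravel()
xs = (x - w * x.mean(axis=1, keepdims=True)).ravel()
Xs = np.column_stack([np.full(N * T, 1.0 - w), xs])  # the constant is transformed too
b_re, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
```

Note that the constant column is also transformed, becoming (1-w); its coefficient estimates the overall intercept.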
To test for zero correlation between the error terms ui + eit and ui + eiτ (that is, for σ2u = 0), the following Breusch-Pagan LM test statistic, based on the estimated residuals εit (i=1,2,...,N; t=1,2,...,T) of the restricted (pooled) model, is distributed as Chi-square with one degree of freedom:
LM = [NT/(2(T-1))] [∑i=1,2,...,N(∑t=1,2,...,Tεit)2 / ∑i=1,2,...,N∑t=1,2,...,Tεit2 - 1]2
Note that εmi = ∑t=1,2,...,Tεit/T.
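The LM statistic from the pooled residuals can be sketched as follows; the data are simulated with a nonzero σ2u, so the statistic should fall far in the right tail of Chi-square(1) (all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
N, T = 40, 6
sigma_u = 1.0                      # nonzero, so the test should reject

u = sigma_u * rng.normal(size=N)
x = rng.normal(size=(N, T))
y = 1.0 + 0.5 * x + u[:, None] + 0.5 * rng.normal(size=(N, T))

# Residuals of the restricted (pooled) model
Xp = np.column_stack([np.ones(N * T), x.ravel()])
e = y.ravel() - Xp @ np.linalg.lstsq(Xp, y.ravel(), rcond=None)[0]
e = e.reshape(N, T)

# LM = [NT/(2(T-1))] * [sum_i (sum_t e_it)^2 / sum_it e_it^2 - 1]^2 ~ Chi-square(1)
num = (e.sum(axis=1) ** 2).sum()
den = (e ** 2).sum()
LM = (N * T / (2.0 * (T - 1))) * (num / den - 1.0) ** 2
```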
To decide between the fixed effects and random effects specifications, the Hausman specification test compares the two estimates of β:
H = (bfixed-brandom)'[Var(bfixed)-Var(brandom)]-1(bfixed-brandom)
Under the null hypothesis that the random effects model is correctly specified (ui uncorrelated with the regressors), H is distributed as Chi-square with degrees of freedom equal to the number of compared coefficients.
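A sketch of the Hausman statistic for a single slope coefficient, comparing the within estimate with the quasi-demeaned random effects estimate on simulated data that satisfy the random effects assumptions (all numeric choices are illustrative; in finite samples the estimated variance difference can turn negative, a known practical caveat of this test):

```python
import numpy as np

rng = np.random.default_rng(5)
N, T = 200, 3
sigma_u, sigma_e = 0.2, 0.5
u = sigma_u * rng.normal(size=N)          # uncorrelated with x: H0 holds
x = rng.normal(size=(N, T))
y = 1.0 + 0.5 * x + u[:, None] + sigma_e * rng.normal(size=(N, T))

# Fixed effects (within) slope and its variance
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
sxx = (xd ** 2).sum()
b_fixed = (xd * yd).sum() / sxx
e_w = yd - b_fixed * xd
v = (e_w ** 2).sum() / (N * T - N - 2)    # estimate of sigma_e^2
var_fixed = v / sxx

# Random effects slope via partial deviation (w from the variance components)
xm, ym = x.mean(axis=1), y.mean(axis=1)
Xb = np.column_stack([np.ones(N), xm])
eb = ym - Xb @ np.linalg.lstsq(Xb, ym, rcond=None)[0]
v1 = T * (eb @ eb) / (N - 2)
w = 1.0 - np.sqrt(v / v1)
Xs = np.column_stack([np.full(N * T, 1 - w), (x - w * xm[:, None]).ravel()])
ys = (y - w * ym[:, None]).ravel()
XtXinv = np.linalg.inv(Xs.T @ Xs)
b_all = XtXinv @ Xs.T @ ys
b_random = b_all[1]
e_re = ys - Xs @ b_all
var_random = (e_re @ e_re) / (N * T - 2) * XtXinv[1, 1]

# Hausman statistic for the slope; Chi-square(1) under H0
H = (b_random - b_fixed) ** 2 / (var_fixed - var_random)
```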
Random Coefficients Model
Yi = Xiβi + εi
βi = β + υi
where Yi = [Yi1,Yi2,...,YiT]', Xi = [Xi1,Xi2,...,XiT]', and εi = [εi1,εi2,...,εiT]'. We note that not only the intercept but also the slope parameters are random across individuals. The assumptions of the model are:
E(υi) = 0, E(υiυi') = Γ, E(υiυj') = 0 for i≠j;
E(εi) = 0, E(εiεi') = σi2I, E(εiεj') = 0 for i≠j;
E(υiεj') = 0.
The model for estimation is
Yi = Xiβ + (Xiυi + εi), or
Yi = Xiβ + ωi
where ωi = Xiυi + εi, and E(ωi) = 0, Var(ωi) = E(ωiωi') = σi2I + XiΓXi'.
The stacked (pooled) model is
Y = Xβ + ω
where ω = [ω1,...,ωN]', and
E(ω) = 0NTx1
Var(ω) = E(ωω') = V = | σ12I+X1ΓX1'   0             ...   0           |
                      | 0             σ22I+X2ΓX2'   ...   0           |
                      | ...           ...           ...   ...         |
                      | 0             0             ...   σN2I+XNΓXN' |
GLS is used to estimate the model. That is,
b* = (X'V-1X)-1X'V-1Y
Var(b*) = (X'V-1X)-1
The computation is based on the following steps (Swamy, 1971):
1. Estimate each individual equation by ordinary least squares: bi = (Xi'Xi)-1Xi'Yi, with si2 = (Yi-Xibi)'(Yi-Xibi)/(T-K) and Vi = Var(bi) = si2(Xi'Xi)-1.
2. The estimate of Γ is [∑i=1,2,...,N(bi-bm)(bi-bm)']/(N-1) - (∑i=1,2,...,NVi)/N, where bm = (∑i=1,2,...,Nbi)/N. This estimate need not be positive definite in small samples.
3. Compute the GLS estimator as the matrix-weighted average b* = [∑i=1,2,...,N(Γ+Vi)-1]-1∑i=1,2,...,N(Γ+Vi)-1bi, with Var(b*) = [∑i=1,2,...,N(Γ+Vi)-1]-1.
The individual parameter vectors may be predicted as follows:
bi* = (Γ-1+Vi-1)-1[Γ-1b* + Vi-1bi]
= Aib* + (I-Ai)bi,
where Ai = (Γ-1+Vi-1)-1Γ-1.
Var(bi*) = [Ai  I-Ai] | Var(b*)      Cov(b*,bi) | | Ai'     |
                      | Cov(bi,b*)   Var(bi)    | | (I-Ai)' |
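Swamy's procedure can be sketched end to end with simulated random coefficients; Γ, the noise level, and the sample sizes below are illustrative assumptions, and the Γ estimate is not guaranteed positive definite in small samples:

```python
import numpy as np

rng = np.random.default_rng(6)
N, T, K = 30, 20, 2
beta_mean = np.array([1.0, 0.5])                 # hypothetical mean coefficients
Gamma = np.diag([0.3 ** 2, 0.2 ** 2])            # hypothetical Var(beta_i)
betas = beta_mean + rng.multivariate_normal(np.zeros(K), Gamma, size=N)

b_i = np.zeros((N, K))      # equation-by-equation OLS estimates
V_i = np.zeros((N, K, K))   # their estimated variances Var(b_i)
for i in range(N):
    Xi = np.column_stack([np.ones(T), rng.normal(size=T)])
    Yi = Xi @ betas[i] + 0.3 * rng.normal(size=T)
    XtXinv = np.linalg.inv(Xi.T @ Xi)
    b_i[i] = XtXinv @ Xi.T @ Yi
    s2 = ((Yi - Xi @ b_i[i]) ** 2).sum() / (T - K)
    V_i[i] = s2 * XtXinv

# Swamy's Gamma estimate: dispersion of the b_i minus the average sampling variance
bbar = b_i.mean(axis=0)
G = (b_i - bbar).T @ (b_i - bbar) / (N - 1) - V_i.mean(axis=0)

# GLS mean coefficients: matrix-weighted average with weights (Gamma + V_i)^-1
Winv = np.array([np.linalg.inv(G + V_i[i]) for i in range(N)])
b_star = np.linalg.solve(Winv.sum(axis=0), (Winv @ b_i[:, :, None]).sum(axis=0)).ravel()

# Predicted parameter vector for individual 0: shrink b_0 toward b*
A0 = np.linalg.inv(np.linalg.inv(G) + np.linalg.inv(V_i[0])) @ np.linalg.inv(G)
b0_star = A0 @ b_star + (np.eye(K) - A0) @ b_i[0]
```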
Seemingly Unrelated Regressions
Yit = Xitβi + εit (i=1,2,...,N; t=1,2,...,T).
Let Yi = [Yi1,Yi2,...,YiT]', Xi = [Xi1,Xi2,...,XiT]', and εi = [εi1,εi2,...,εiT]', the stacked N equations (T observations each) system is Y = Xβ + ε, or
| Y1 |   | X1   0    ...   0  | | β1 |   | ε1 |
| Y2 | = | 0    X2   ...   0  | | β2 | + | ε2 |
| ...|   | ...  ...  ...  ... | | ...|   | ...|
| YN |   | 0    0    ...   XN | | βN |   | εN |
Notice that not only the intercept but also the slope terms of the estimated parameters differ across individuals. The error structure of the model is summarized as follows:
E(εi) = 0, E(εiεj') = σijI (contemporaneous correlation across equations), and
E(εε') = Σ⊗I, where Σ = [σij] is the N×N variance-covariance matrix of the equation errors and I is a T×T identity matrix.
Parameter restrictions can be built into the matrix X and the corresponding parameter vector β. The model is estimated using techniques for systems of regression equations.
System estimation techniques such as 3SLS and FIML may be used for parameter estimation. In the current context this is called Seemingly Unrelated Regression Estimation (SURE). Denote by b and S the estimates of β and Σ, respectively. Then,
b = [X'(S-1⊗I)X]-1X'(S-1⊗I)Y
Var(b) = [X'(S-1⊗I)X]-1, and
S = [sij] with sij = ei'ej/T, where ei = Yi-Xibi is the T×1 vector of estimated errors for equation i.
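The SURE computation can be sketched with three simulated equations whose errors are contemporaneously correlated; the coefficients and Σ below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)
N, T = 3, 200                       # 3 equations, T observations each
beta_true = [np.array([1.0, 0.5]), np.array([0.0, -0.3]), np.array([2.0, 1.0])]

# Contemporaneously correlated errors across equations (hypothetical Sigma)
Sigma = np.array([[1.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.0]])
E = rng.multivariate_normal(np.zeros(N), Sigma, size=T)   # T x N

Xi = [np.column_stack([np.ones(T), rng.normal(size=T)]) for _ in range(N)]
Yi = [Xi[i] @ beta_true[i] + E[:, i] for i in range(N)]

# Stack: block diagonal X, stacked Y
X = np.zeros((N * T, N * 2))
for i in range(N):
    X[i * T:(i + 1) * T, i * 2:(i + 1) * 2] = Xi[i]
Y = np.concatenate(Yi)

# Step 1: equation-by-equation OLS residuals give S with S[i,j] = ei'ej / T
b_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
e = (Y - X @ b_ols).reshape(N, T)
S = e @ e.T / T

# Step 2: feasible GLS with weight matrix (S^-1 kron I_T)
W = np.kron(np.linalg.inv(S), np.eye(T))
b = np.linalg.solve(X.T @ W @ X, X.T @ W @ Y)
var_b = np.linalg.inv(X.T @ W @ X)
```

With the block diagonal X, the stacked OLS in Step 1 is identical to running OLS equation by equation.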