Classical Linear Model

Dummy Variables and Structural Change

Dummy Variable

Dummy Variable for Single Category
For each data observation i=1,2,...,N, define the dummy variable:

D_i = 0 if the qualitative attribute describing the category is absent (base case);

D_i = 1 if the qualitative attribute describing the category is present.

Using the dummy variable D in a simple regression model
Y_i = α + βX_i + ε_i, i=1,2,...,N:
1. Additive Model
  Y_i = α + βX_i + γD_i + ε_i
  That is,
  
  Y_i = α + βX_i + ε_i if D_i = 0 (base case)
  
  Y_i = (α+γ) + βX_i + ε_i if D_i = 1
2. Multiplicative (Interaction) Model
  Y_i = α + βX_i + δD_iX_i + ε_i
  That is,
  
  Y_i = α + βX_i + ε_i if D_i = 0 (base case)
  
  Y_i = α + (β+δ)X_i + ε_i if D_i = 1
3. Combined (Interaction) Model
  Y_i = α + βX_i + γD_i + δD_iX_i + ε_i
  That is,
  
  Y_i = α + βX_i + ε_i if D_i = 0 (base case)
  
  Y_i = (α+γ) + (β+δ)X_i + ε_i if D_i = 1
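The combined model can be checked numerically. The following is a minimal sketch in Python with NumPy, assuming hypothetical noise-free data (the parameter values and the alternating group assignment are illustrative only, not from the text). It regresses Y on [1, X, D, D*X] and recovers the intercept and slope of each group.

```python
import numpy as np

# Hypothetical noise-free data (illustrative values, not from the text):
# base case (D = 0):  Y = 1 + 2X
# D = 1 case:         Y = (1 + 3) + (2 + 0.5)X
alpha, beta, gamma, delta = 1.0, 2.0, 3.0, 0.5
X = np.arange(10, dtype=float)
D = (np.arange(10) % 2).astype(float)   # alternate group membership
Y = alpha + beta * X + gamma * D + delta * D * X

# Design matrix of the combined model: constant, X, D, and D*X
Z = np.column_stack([np.ones_like(X), X, D, D * X])
coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
a, b, g, d = coef

# Implied (intercept, slope) for each group
base_line = (a, b)            # D = 0
other_line = (a + g, b + d)   # D = 1
```

Dropping the D*X column gives the additive model, and dropping the D column gives the multiplicative model.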
Dummy Variables for Multiple Categories
A regression model may include more than one category. In general, there are M categories (such as gender, race, occupation status, etc.). The model may be written as:
Y_i = ∑_kβ_kX_ik + ∑_m∑_kδ_mkD_imX_ik + ε_i
i = 1,2,...,N (Number of observations)
k = 1,2,...,K (Number of quantitative explanatory variables)
m = 1,2,...,M (Number of qualitative attributes describing M categories)
Or, in a matrix form representation:
Y = Xβ + (DX)δ + ε, where
(DX)_NxMK = [D₁X₁,...,D₁X_K,D₂X₁,...,D₂X_K,...,D_MX₁,...,D_MX_K]
δ_MKx1 = [δ₁₁,...,δ_1K,δ₂₁,...,δ_2K,...,δ_M1,...,δ_MK]', and

D_mX_k = [D_1mX_1k, D_2mX_2k, ..., D_NmX_Nk]', m=1,2,...,M; k=1,2,...,K

The above model has (M+1)K parameters, so its degrees of freedom are N-(M+1)K.
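As an illustration of the matrix form, the sketch below builds the N×MK matrix (DX) in Python with NumPy. The sizes N, K, M and the contents of X and D are hypothetical, chosen only to show the column ordering. One group is left without a dummy so that it serves as the base case.

```python
import numpy as np

N, K, M = 12, 2, 3   # hypothetical sizes
X = np.arange(N * K, dtype=float).reshape(N, K)   # N x K regressors

# N x M dummy matrix: observation i has attribute m if i mod (M+1) == m;
# observations with i mod (M+1) == 0 form the base case (all dummies zero).
D = (np.arange(N)[:, None] % (M + 1) == np.arange(1, M + 1)).astype(float)

# Column order follows the text: D_1X_1,...,D_1X_K, D_2X_1,...,D_MX_K
DX = np.hstack([D[:, [m]] * X for m in range(M)])   # N x MK

n_params = K + DX.shape[1]   # coefficients in beta and delta together
dof = N - n_params           # N - (M+1)K
```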
If multiple dummy variables are included in a regression equation, the Dummy Variable Trap may arise from perfect collinearity between the dummy variables and the other explanatory variables (including the constant term). In general, include one fewer dummy variable than the number of qualitative attributes describing the categories.
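The trap can be seen from the rank of the design matrix. This is a small sketch in Python with NumPy, assuming a hypothetical attribute with three levels observed over six units. With a constant and all three dummies the columns are linearly dependent; dropping one dummy (the base case) restores full column rank.

```python
import numpy as np

# Hypothetical qualitative attribute with three levels, six observations
category = np.array([0, 1, 2, 0, 1, 2])
dummies = (category[:, None] == np.arange(3)).astype(float)   # N x 3
const = np.ones((6, 1))

# Constant plus all three dummies: the dummies sum to the constant column
trap = np.hstack([const, dummies])
# Constant plus two dummies: the first level is the base case
ok = np.hstack([const, dummies[:, 1:]])

rank_trap = np.linalg.matrix_rank(trap)   # less than the 4 columns
rank_ok = np.linalg.matrix_rank(ok)       # equal to the 3 columns
```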
Hypothesis Testing of the Significance of Dummy Variables
Consider a regression model with K explanatory variables, of which J are dummy variables with the parameter vector δ. To test the statistical significance of the J dummy variables (J < K):

H₀: δ = 0

H₁: δ ≠ 0

The following test statistics may be used:
W = d'[Var(d)]^-1d ~ χ²(J)
F = W/J = ((RSS^*-RSS)/J) / (RSS/(N-K)) = ((R²-R^2*)/J) / ((1-R²)/(N-K)) ~ F(J,N-K)
Here d is the least squares estimator of δ, a subvector of the K estimated parameters. RSS and RSS^* are the residual sums of squares of the unrestricted and restricted models, respectively. Similarly, R² and R^2* are the R-squared values of the unrestricted and restricted models, respectively.
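The three forms of the F statistic can be compared numerically. The sketch below uses Python with NumPy on simulated data; the sample size, coefficients, and seed are assumptions made for illustration. It takes Var(d) to be the corresponding block of s²(Z'Z)^-1 with s² = RSS/(N-K), and both models include a constant so that the R² form applies.

```python
import numpy as np

rng = np.random.default_rng(0)           # hypothetical simulated sample
N = 40
X = rng.normal(size=N)
D = (np.arange(N) < 20).astype(float)
Y = 1.0 + 2.0 * X + 0.5 * D + 0.3 * D * X + rng.normal(size=N)

def ols(Z, y):
    """Return least squares coefficients and residual sum of squares."""
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    e = y - Z @ b
    return b, float(e @ e)

ones = np.ones(N)
Z_u = np.column_stack([ones, X, D, D * X])   # unrestricted model, K = 4
Z_r = np.column_stack([ones, X])             # restricted model, delta = 0
K, J = Z_u.shape[1], 2

b_u, rss = ols(Z_u, Y)
_, rss_star = ols(Z_r, Y)

# F from residual sums of squares
F_rss = ((rss_star - rss) / J) / (rss / (N - K))

# F from R-squared values
tss = float(((Y - Y.mean()) ** 2).sum())
r2, r2_star = 1 - rss / tss, 1 - rss_star / tss
F_r2 = ((r2 - r2_star) / J) / ((1 - r2) / (N - K))

# Wald form W = d'[Var(d)]^-1 d, using the estimated variance of d
s2 = rss / (N - K)
V = s2 * np.linalg.inv(Z_u.T @ Z_u)
d = b_u[2:]                                  # estimates of the J dummy terms
W = float(d @ np.linalg.solve(V[2:, 2:], d))
```

Under these assumptions F_rss, F_r2, and W/J agree up to floating-point error.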

An Example: Seasonality

Quarterly Time Series

Define D = [Q1 Q2 Q3] (Q4 is assumed to be the base case), where

Q1_t = 1 if t is the first calendar quarter (Spring), 0 otherwise

Q2_t = 1 if t is the second calendar quarter (Summer), 0 otherwise

Q3_t = 1 if t is the third calendar quarter (Autumn), 0 otherwise

Q4_t = 1 if t is the fourth calendar quarter (Winter), 0 otherwise

Monthly Time Series

Define D = [M2 ... M12] (M1 is assumed to be the base case), where

M1_t = 1 if t is the first calendar month (January), 0 otherwise

M2_t = 1 if t is the second calendar month (February), 0 otherwise

:

M12_t = 1 if t is the twelfth calendar month (December), 0 otherwise
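The seasonal dummy matrices defined above can be constructed as follows. This is a sketch in Python with NumPy, assuming a hypothetical sample of two full years that starts in the first quarter (or in January for the monthly case). The base-case season gets no column, so its rows are all zeros.

```python
import numpy as np

# Hypothetical quarterly sample: 8 consecutive quarters starting in Q1
quarter = np.tile([1, 2, 3, 4], 2)
# D = [Q1 Q2 Q3]; Q4 is the base case
D_q = np.column_stack([(quarter == q).astype(int) for q in (1, 2, 3)])

# Hypothetical monthly sample: 24 consecutive months starting in January
month = np.tile(np.arange(1, 13), 2)
# D = [M2 ... M12]; M1 (January) is the base case
D_m = np.column_stack([(month == m).astype(int) for m in range(2, 13)])
```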

Structural Change

A regression model may have different values of estimated parameters over several different subsamples. Suppose there are two non-overlapping subsamples with N₁ and N₂ observations respectively. Let N = N₁+N₂, and consider the following two separate regression equations:

Y_i = X_iβ⁽¹⁾ + ε_i, i=1,2,...,N₁

Y_j = X_jβ⁽²⁾ + ε_j, j=N₁+1,N₁+2,...,N₁+N₂(=N)

If β⁽¹⁾≠β⁽²⁾, the model is said to have a different structure in the two subsamples.

Hypothesis Testing of the Structural Change

H₀:	β⁽¹⁾ = β⁽²⁾
H₁:	β⁽¹⁾ ≠ β⁽²⁾

Dummy Variable Approach
1. Assume β⁽¹⁾ = β⁽²⁾ = β and estimate the regression model Y = Xβ + ε.
  Compute the residual sum of squares RSS^* with degrees of freedom DF^*=N-K.
2. Define the dummy variable:
  
  D_i = 1 if 1 ≤ i ≤ N₁
  
  0 otherwise
  
  Formulate the unrestricted model with the K interaction variables DX = D*X (each column of X multiplied element by element by D) as follows:
  Y = Xβ + (DX)δ + ε
  Estimate the model and compute RSS with DF=N-2K.
3. Test for the significance of the K dummy variables DX:
  
  H₀: δ = 0
  
  H₁: δ ≠ 0
  
  Based on d, the estimated parameter vector of δ, the F-test statistic is defined by
  F = d'[Var(d)]^-1d/K = ((RSS^*-RSS)/K) / (RSS/(N-2K)) ~ F(K,N-2K).
Chow F-Test
1. From the restricted model:
  Y_i= X_iβ + ε_i, i=1,2,...,N,
  compute RSS^* with DF^*=N-K.
2. From the unrestricted model:
  
  Y_i = X_iβ⁽¹⁾ + ε_i, i=1,2,...,N₁
  
  Y_j = X_jβ⁽²⁾ + ε_j, j=N₁+1,N₁+2,...,N₁+N₂
  
  compute RSS₁ with DF₁=N₁-K, and RSS₂ with DF₂=N₂-K.
3. Define RSS = RSS₁+RSS₂ with DF = DF₁+DF₂ = N-2K.
  Then the Chow F-test statistic is defined by:
  F = ((RSS^*-RSS)/K) / (RSS/(N-2K)) ~ F(K,N-2K).
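Both procedures can be run side by side. The sketch below uses Python with NumPy on simulated data; the subsample sizes, coefficients, and seed are assumptions made for illustration. It computes the Chow F statistic from the two subsample regressions and the dummy variable F statistic from the regression on [X, DX]. Because the interacted model fits each subsample separately, the two unrestricted RSS values coincide, and so do the F statistics.

```python
import numpy as np

rng = np.random.default_rng(1)            # hypothetical simulated sample
N1, N2 = 30, 25
N = N1 + N2
x = rng.normal(size=N)
X = np.column_stack([np.ones(N), x])      # constant and one regressor
K = X.shape[1]
beta1, beta2 = np.array([1.0, 2.0]), np.array([2.0, 1.0])
first = np.arange(N) < N1                 # True for the first subsample
Y = np.where(first, X @ beta1, X @ beta2) + rng.normal(size=N)

def rss_of(Z, y):
    """Residual sum of squares from the least squares fit of y on Z."""
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    e = y - Z @ b
    return float(e @ e)

# Restricted model: one parameter vector for all N observations
rss_star = rss_of(X, Y)

# Chow test: separate regressions, RSS = RSS1 + RSS2
rss1 = rss_of(X[:N1], Y[:N1])
rss2 = rss_of(X[N1:], Y[N1:])
rss_chow = rss1 + rss2
F_chow = ((rss_star - rss_chow) / K) / (rss_chow / (N - 2 * K))

# Dummy variable approach: D_i = 1 for the first N1 observations
D = first.astype(float)
DX = D[:, None] * X
rss_dummy = rss_of(np.column_stack([X, DX]), Y)
F_dummy = ((rss_star - rss_dummy) / K) / (rss_dummy / (N - 2 * K))
```

The statistic is then compared with the critical value of F(K,N-2K) at the chosen significance level.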
Copyright © Kuan-Pin Lin
Last updated: November 10, 2009
