For each data observation $i = 1, 2, \dots, N$, define the dummy variable:
$$
D_i = \begin{cases}
0 & \text{if the qualitative attribute describing the category is absent (base case);} \\
1 & \text{if the qualitative attribute describing the category is present.}
\end{cases}
$$
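Using the dummy variable $D$ in a simple regression model $Y_i = \alpha + \beta X_i + \varepsilon_i$, $i = 1, 2, \dots, N$, three specifications are possible.

Intercept shift, $Y_i = \alpha + \gamma D_i + \beta X_i + \varepsilon_i$:
$$
Y_i = \begin{cases}
\alpha + \beta X_i + \varepsilon_i & \text{if } D_i = 0 \text{ (base case)} \\
(\alpha + \gamma) + \beta X_i + \varepsilon_i & \text{if } D_i = 1
\end{cases}
$$
Slope shift, $Y_i = \alpha + \beta X_i + \delta D_i X_i + \varepsilon_i$:
$$
Y_i = \begin{cases}
\alpha + \beta X_i + \varepsilon_i & \text{if } D_i = 0 \text{ (base case)} \\
\alpha + (\beta + \delta) X_i + \varepsilon_i & \text{if } D_i = 1
\end{cases}
$$
Intercept and slope shift, $Y_i = \alpha + \gamma D_i + \beta X_i + \delta D_i X_i + \varepsilon_i$:
$$
Y_i = \begin{cases}
\alpha + \beta X_i + \varepsilon_i & \text{if } D_i = 0 \text{ (base case)} \\
(\alpha + \gamma) + (\beta + \delta) X_i + \varepsilon_i & \text{if } D_i = 1
\end{cases}
$$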
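The intercept-and-slope-shift specification can be illustrated with a small simulation. The following is only a sketch: the data are simulated, the "true" parameter values and the rule assigning the attribute are assumptions made for the example, and ordinary least squares is computed with NumPy's `lstsq`.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200
X = rng.normal(size=N)
D = (np.arange(N) % 2).astype(float)  # hypothetical attribute: 1 for odd i

# Assumed "true" parameters, chosen only for this illustration.
alpha, beta, gamma, delta = 1.0, 2.0, 0.5, -1.0
eps = rng.normal(scale=0.1, size=N)
Y = alpha + gamma * D + beta * X + delta * D * X + eps

# Regressors: constant, D (intercept shift), X, D*X (slope shift).
Z = np.column_stack([np.ones(N), D, X, D * X])
coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
a_hat, g_hat, b_hat, d_hat = coef

print("base case (D=0): intercept", a_hat, "slope", b_hat)
print("D=1:             intercept", a_hat + g_hat, "slope", b_hat + d_hat)
```

Dropping the `D` column gives the slope-shift model alone, and dropping the `D * X` column gives the intercept-shift model alone.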
A regression model may include more than one category. In general, there are M categories (such as gender, race, occupation status, etc.). The model may be written as:
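$$
Y_i = \sum_{k=1}^{K} \beta_k X_{ik} + \sum_{m=1}^{M} \sum_{k=1}^{K} \delta_{mk} D_{im} X_{ik} + \varepsilon_i
$$
where $i = 1, 2, \dots, N$ indexes the observations, $k = 1, 2, \dots, K$ indexes the quantitative explanatory variables, and $m = 1, 2, \dots, M$ indexes the qualitative attributes describing the $M$ categories.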
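Or, in matrix form:
$$
Y = X\beta + (DX)\delta + \varepsilon,
$$
where
$$
(DX)_{N \times MK} = [D_1X_1, \dots, D_1X_K, D_2X_1, \dots, D_2X_K, \dots, D_MX_1, \dots, D_MX_K],
$$
$$
\delta_{MK \times 1} = [\delta_{11}, \dots, \delta_{1K}, \delta_{21}, \dots, \delta_{2K}, \dots, \delta_{M1}, \dots, \delta_{MK}]',
$$
and each column of $DX$ is the elementwise product of a dummy variable and an explanatory variable:
$$
D_mX_k = \begin{bmatrix} D_{1m}X_{1k} \\ D_{2m}X_{2k} \\ \vdots \\ D_{Nm}X_{Nk} \end{bmatrix}, \quad m = 1, 2, \dots, M; \; k = 1, 2, \dots, K.
$$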
The above model has $N - (M+1)K$ degrees of freedom, since it estimates $K$ coefficients in β and $MK$ coefficients in δ.
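As a sketch of how the interaction matrix in the matrix form might be assembled, the function below builds the N x MK block from an N x M matrix of dummies and an N x K matrix of explanatory variables. The function names `interaction_matrix` and `degrees_of_freedom` are hypothetical, introduced only for this illustration.

```python
import numpy as np

def interaction_matrix(D, X):
    """Return the N x (M*K) matrix [D1X1,...,D1XK, ..., DMX1,...,DMXK]."""
    D = np.asarray(D, dtype=float)
    X = np.asarray(X, dtype=float)
    N, M = D.shape
    K = X.shape[1]
    # D[:, :, None] is N x M x 1 and X[:, None, :] is N x 1 x K, so the
    # product is N x M x K; reshaping keeps k varying fastest within each m.
    return (D[:, :, None] * X[:, None, :]).reshape(N, M * K)

def degrees_of_freedom(N, M, K):
    """N minus the K + M*K estimated coefficients."""
    return N - (M + 1) * K

# Tiny example with N=3 observations, M=2 dummies, K=2 regressors.
D = np.array([[1, 0], [0, 1], [1, 1]])
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
DX = interaction_matrix(D, X)
print(DX)
```

Each row of `DX` repeats that observation's regressors once per dummy that equals 1 and is zero elsewhere.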
If multiple dummy variables are included in a regression equation, the dummy variable trap may arise from perfect collinearity between the dummy variables and the other explanatory variables (including the constant term). In general, include one fewer dummy variable than the number of qualitative attributes describing the categories.
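The trap can be seen numerically by checking the rank of the regressor matrix. This is a minimal sketch with a made-up two-valued attribute; the variable names are illustrative only.

```python
import numpy as np

N = 10
present = (np.arange(N) < 4).astype(float)  # hypothetical attribute dummy
absent = 1.0 - present                      # dummy for the complementary case
const = np.ones(N)

# Including a dummy for every attribute plus the constant: const = present + absent.
trapped = np.column_stack([const, present, absent])
# Dropping one dummy (the base case) removes the exact linear dependence.
safe = np.column_stack([const, present])

rank_trapped = np.linalg.matrix_rank(trapped)  # 2, although there are 3 columns
rank_safe = np.linalg.matrix_rank(safe)        # 2, equal to the number of columns
print(trapped.shape[1], rank_trapped)
print(safe.shape[1], rank_safe)
```

When the rank is lower than the number of columns, the least squares coefficients are not uniquely determined.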
Consider a regression model with K explanatory variables in which J of them are dummy variables with the parameter vector δ. To test the statistical significance of the J dummy variables (J < K):
$$
H_0: \delta = 0 \qquad H_1: \delta \neq 0
$$
The following test statistics may be used:
$$
W = d'[\operatorname{Var}(d)]^{-1} d \sim \chi^2(J)
$$
$$
F = \frac{W}{J} = \frac{(RSS^* - RSS)/J}{RSS/(N-K)} = \frac{(R^2 - R^{2*})/J}{(1 - R^2)/(N-K)} \sim F(J, N-K)
$$
Here d is the least squares estimator of δ, a subvector of the K estimated parameters. RSS and RSS* are the sums of squared residuals of the unrestricted and restricted models, respectively. Similarly, $R^2$ and $R^{2*}$ are the R-squares of the unrestricted and restricted models, respectively.
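The F statistic can be computed directly from the two residual sums of squares. The sketch below assumes simulated data and hypothetical helper names (`rss`, `dummy_f_statistic`); it returns the statistic and its degrees of freedom but does not compute a p-value.

```python
import numpy as np

def rss(Z, y):
    """Sum of squared residuals from a least squares fit of y on Z."""
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    return float(resid @ resid)

def dummy_f_statistic(Z_restricted, Z_unrestricted, y):
    """F = ((RSS* - RSS)/J) / (RSS/(N-K)), with J the number of dropped columns."""
    N, K = Z_unrestricted.shape
    J = K - Z_restricted.shape[1]
    rss_r = rss(Z_restricted, y)    # RSS*: restricted model (dummies excluded)
    rss_u = rss(Z_unrestricted, y)  # RSS: unrestricted model
    F = ((rss_r - rss_u) / J) / (rss_u / (N - K))
    return F, J, N - K

# Simulated data, for illustration only.
rng = np.random.default_rng(1)
N = 120
x = rng.normal(size=N)
D = (np.arange(N) % 3 == 0).astype(float)  # hypothetical attribute
y = 1.0 + 2.0 * x + 0.8 * D + rng.normal(size=N)

Z_r = np.column_stack([np.ones(N), x])            # restricted: no dummy terms
Z_u = np.column_stack([np.ones(N), x, D, D * x])  # unrestricted: J = 2 dummy terms
F, J, df = dummy_f_statistic(Z_r, Z_u, y)
print(F, J, df)
```

The value of `F` would then be compared with a critical value from the F(J, N-K) distribution.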
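Define $D = [Q_1 \; Q_2 \; Q_3]$ ($Q_4$ is assumed to be the base case), where
$$
Q_{1t} = \begin{cases} 1 & \text{if } t \text{ is in the first calendar quarter (Spring)} \\ 0 & \text{otherwise} \end{cases}
$$
$$
Q_{2t} = \begin{cases} 1 & \text{if } t \text{ is in the second calendar quarter (Summer)} \\ 0 & \text{otherwise} \end{cases}
$$
$$
Q_{3t} = \begin{cases} 1 & \text{if } t \text{ is in the third calendar quarter (Autumn)} \\ 0 & \text{otherwise} \end{cases}
$$
$$
Q_{4t} = \begin{cases} 1 & \text{if } t \text{ is in the fourth calendar quarter (Winter)} \\ 0 & \text{otherwise} \end{cases}
$$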
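Define $D = [M_2 \; \dots \; M_{12}]$ ($M_1$ is assumed to be the base case), where
$$
M_{1t} = \begin{cases} 1 & \text{if } t \text{ is in the first calendar month (January)} \\ 0 & \text{otherwise} \end{cases}
$$
$$
M_{2t} = \begin{cases} 1 & \text{if } t \text{ is in the second calendar month (February)} \\ 0 & \text{otherwise} \end{cases}
$$
$$
\vdots
$$
$$
M_{12,t} = \begin{cases} 1 & \text{if } t \text{ is in the twelfth calendar month (December)} \\ 0 & \text{otherwise} \end{cases}
$$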
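The quarterly and monthly dummy matrices above can be built from a vector of period labels. The helper `seasonal_dummies` below is a hypothetical name used only for this sketch; it assumes periods are coded as integers 1, 2, ..., and omits the chosen base case.

```python
import numpy as np

def seasonal_dummies(period, n_periods, base):
    """Return (D, kept): D has one 0/1 column per period except the base case."""
    period = np.asarray(period)
    kept = [p for p in range(1, n_periods + 1) if p != base]
    # Broadcasting compares each observation's period with each kept period.
    D = (period[:, None] == np.array(kept)).astype(float)
    return D, kept

# Quarterly data with Q4 as the base case: D = [Q1 Q2 Q3].
quarters = np.array([1, 2, 3, 4, 1, 2])
DQ, kept_q = seasonal_dummies(quarters, n_periods=4, base=4)

# Monthly data with M1 (January) as the base case: D = [M2 ... M12].
months = np.arange(1, 13)
DM, kept_m = seasonal_dummies(months, n_periods=12, base=1)

print(DQ.shape, kept_q)
print(DM.shape, kept_m)
```

Rows belonging to the base period are all zeros, so the constant term of the regression carries the base-case effect.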
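Consider a regression model over two subsamples:
$$
Y_i = X_i\beta^{(1)} + \varepsilon_i, \quad i = 1, 2, \dots, N_1
$$
$$
Y_j = X_j\beta^{(2)} + \varepsilon_j, \quad j = N_1+1, N_1+2, \dots, N_1+N_2 \; (= N)
$$
If $\beta^{(1)} \neq \beta^{(2)}$, the model is said to have a different structure in the two subsamples. The hypotheses are:
$$
H_0: \beta^{(1)} = \beta^{(2)} \qquad H_1: \beta^{(1)} \neq \beta^{(2)}
$$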
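Equivalently, define the dummy variable
$$
D_i = \begin{cases} 1 & \text{if } 1 \le i \le N_1 \\ 0 & \text{otherwise} \end{cases}
$$
and write the pooled model as $Y = X\beta + (DX)\delta + \varepsilon$, so that the coefficient vector is $\beta + \delta$ in the first subsample and $\beta$ in the second. The test for structural change then becomes:
$$
H_0: \delta = 0 \qquad H_1: \delta \neq 0
$$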
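The dummy-variable form of the structural change test can be sketched as an F test with J = K restrictions and N - 2K residual degrees of freedom. The data, the break point, and the function names below are assumptions for illustration; no p-value is computed.

```python
import numpy as np

def rss(Z, y):
    """Sum of squared residuals from a least squares fit of y on Z."""
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    return float(resid @ resid)

def structural_change_f(X, y, N1):
    """F statistic for H0: delta = 0 in Y = X beta + (DX) delta + eps."""
    N, K = X.shape
    D = (np.arange(N) < N1).astype(float)  # D_i = 1 for the first N1 observations
    Z = np.hstack([X, D[:, None] * X])     # unrestricted regressors [X, DX]
    rss_r = rss(X, y)                      # restricted: one beta for all observations
    rss_u = rss(Z, y)                      # unrestricted: separate beta per subsample
    F = ((rss_r - rss_u) / K) / (rss_u / (N - 2 * K))
    return F, rss_r, rss_u

# Simulated data with an assumed break after N1 observations.
rng = np.random.default_rng(2)
N1, N2 = 60, 40
N = N1 + N2
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta1 = np.array([1.0, 2.0])  # assumed coefficients, first subsample
beta2 = np.array([0.0, 3.0])  # assumed coefficients, second subsample
y = np.where(np.arange(N) < N1, X @ beta1, X @ beta2) + rng.normal(scale=0.5, size=N)

F, rss_r, rss_u = structural_change_f(X, y, N1)
print(F)
```

Because the unrestricted model lets every coefficient differ between the subsamples, its residual sum of squares equals the sum of the residual sums of squares from fitting each subsample separately.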
$$
Y_i = X_i\beta^{(1)} + \varepsilon_i, \quad i = 1, 2, \dots, N_1
$$
$$
Y_j = X_j\beta^{(2)} + \varepsilon_j, \quad j = N_1+1, N_1+2, \dots, N_1+N_2
$$