Simultaneous Linear Equations System

Table of Contents

Introduction

The Model

Identification

Estimation

Example: Klein's Model I


Introduction

Variables

Equations


The Model

Notations

N Number of observations (i=1,2,...,N)
G Number of equations (endogenous variables) (j=1,2,...,G)
K Number of predetermined variables (k=1,2,...,K)
Gj Number of RHS endogenous variables in the equation j;
Gj+1 is the number of endogenous variables in the equation j
Kj Number of predetermined variables in the equation j
Gj* Number of endogenous variables not in the equation j
Gj+Gj*+1 = G
Kj* Number of predetermined variables not in the equation j
Kj+Kj* = K
Y NxG Data matrix of endogenous variables
X NxK Data matrix of predetermined variables
Z Z=[Y X], Nx(G+K) Data matrix of all variables
B GxG parameter (sparse) matrix associated with Y
Note: Bjj = -1 (normalization)
Γ KxG parameter (sparse) matrix associated with X
Δ Δ=[B' Γ']', (G+K)xG parameter (sparse) matrix associated with Z
U, V NxG error matrices

Model Representations

Model Assumptions


Identification

Consider the j-th stochastic equation of a linear system model. Its reduced form representation yj = Y.j = XΠ.j + V.j can be estimated consistently using ordinary least squares. That is,

Π.j = (X'X)-1X'yj

Given the parameter estimator of Π.j for each equation j, can we derive or solve the corresponding structural parameters B.j and Γ.j through the non-linear relationship Π = -ΓB-1?

The j-th stochastic equation is identified if the structural parameters B.j and Γ.j are derivable from the reduced form parameters in Π. An identity equation is automatically identified. A linear system model is identified if all the stochastic equations are identified.
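
Each column of Π can be estimated by OLS on the same regressor matrix X. As a minimal illustration, here is a numpy sketch with synthetic placeholder data (all names and dimensions are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
N, G, K = 100, 2, 3          # observations, endogenous, predetermined
X = rng.normal(size=(N, K))  # data matrix of predetermined variables
Y = rng.normal(size=(N, G))  # data matrix of endogenous variables

# Reduced form OLS: Pi.j = (X'X)^(-1) X'y_j, computed for all j at once
Pi_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(Pi_hat.shape)          # (K, G)
```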

Order Condition

From the relationship between the structural and reduced form parameters for the j-th equation, ΠB.j = -Γ.j, or ΠB.j + Γ.j = 0:

\[
\begin{bmatrix} \Pi & I \end{bmatrix}
\begin{bmatrix} B_{.j} \\ \Gamma_{.j} \end{bmatrix} = 0
\]

Where Π (KxG parameter matrix)
I (KxK identity matrix)
B.j (Gx1 parameter vector)
Γ.j (Kx1 parameter vector)
Since one element of B.j is -1 (normalization) and many elements of B.j and Γ.j are 0 (zero restrictions), there are Gj+Kj unknowns to be solved from the K rows of [Π I]. In other words, there must be at least Gj+Kj equations to find a solution for the unknown elements of B.j and Γ.j. That is,

K ≥ Gj+Kj or Kj* ≥ Gj.

Equivalently, Kj* + Gj* ≥ G-1, since G = Gj + Gj* + 1.
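
Because the order condition is only a count of excluded variables, it can be checked before any estimation. A small sketch of the test Kj* ≥ Gj (the per-equation counts are hypothetical):

```python
# Order condition: the number of excluded predetermined variables
# K_j* = K - K_j must be at least the number of RHS endogenous
# variables G_j in each stochastic equation.
K = 8                               # predetermined variables in the system
equations = {"equation 1": (2, 3),  # hypothetical (G_j, K_j) per equation
             "equation 2": (1, 4),
             "equation 3": (1, 5)}

for name, (Gj, Kj) in equations.items():
    Kj_star = K - Kj
    verdict = "order condition holds" if Kj_star >= Gj else "under-identified"
    print(f"{name}: K_j* = {Kj_star}, G_j = {Gj} -> {verdict}")
```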

Rank Condition

In more detail, for the j-th equation, the parameter relationship ΠB.j = -Γ.j can be re-arranged as follows:

\[
\begin{bmatrix}
\Pi_1 & \Pi_2 & \Pi_3 \\
\Pi_1^* & \Pi_2^* & \Pi_3^*
\end{bmatrix}
\begin{bmatrix} -1 \\ \beta_j \\ 0 \end{bmatrix}
= -
\begin{bmatrix} \gamma_j \\ 0 \end{bmatrix}
\]

Remember that yj = Yjβj + Xjγj + εj.
Where Π1 (Kjx1 vector)
Π2 (KjxGj matrix)
Π3 (KjxGj* matrix)
Π1* (Kj*x1 vector)
Π2* (Kj*xGj matrix)
Π3* (Kj*xGj* matrix)
Solving for βj and γj from the reduced form parameters in Π can be accomplished by solving the following two sets of equations:

  1. Π1 - Π2βj = γj
    Gj+Kj unknowns (βj and γj) in Kj equations.
  2. Π1* - Π2*βj = 0
    Gj unknowns (βj) in Kj* equations.
From (2), Π2*βj = Π1*. Solving for βj by least squares, βj = (Π2*'Π2*)-1Π2*'Π1*, which requires the full rank condition. That is,

rank([Π1* Π2*]) = rank(Π2*) = Gj.

Once βj is solved, γj is obtained from (1).

In practice, the rank condition as derived is difficult to check because the dense matrix Π2* is not known prior to estimation. The alternative is to check the structural parameters in B and Γ against the zero restrictions for each equation. That is, for each equation j, the matrix formed from the coefficients that the other equations place on the variables excluded from the j-th equation must have rank G-1.
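
A sketch of this structural-form check, assuming the stacked coefficient matrix Δ = [B' Γ']' is available with zeros on the excluded variables (the example Δ is made up):

```python
import numpy as np

# Delta = [B; Gamma], a (G+K)xG matrix; zeros mark excluded variables.
# Hypothetical 2-equation system (G=2, K=2).
Delta = np.array([[-1.0,  0.5],
                  [ 0.8, -1.0],
                  [ 0.3,  0.0],   # x1 excluded from equation 2
                  [ 0.0,  0.6]])  # x2 excluded from equation 1
G = 2

for j in range(G):
    excluded = np.isclose(Delta[:, j], 0.0)  # variables not in equation j
    others = np.delete(np.arange(G), j)      # columns of the other equations
    sub = Delta[np.ix_(excluded, others)]    # other equations' coefficients
    ok = np.linalg.matrix_rank(sub) == G - 1
    print(f"equation {j+1}: rank condition {'holds' if ok else 'fails'}")
```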


Estimation

Limited Information Estimation

Consider the structural equation j, YB.j + XΓ.j = U.j, or yj = Yjβj + Xjγj + εj. Let's write the j-th equation for estimation as follows:

yj = Zjδj + εj

where Zj = [Yj Xj] and δj = [βj γj]. Denote dj as the estimator of δj.

  1. Ordinary Least Squares
    dj = (Z'jZj)-1Z'jyj
    Var(dj) = s2j(Z'jZj)-1
    s2j = e'jej/N is the estimate of σ2j;
    ej = yj - Zjdj is the estimated residuals.

    Note: the OLS estimator of δj (that is, dj) is biased and inconsistent due to the random regressors problem (in general there are RHS endogenous variables in the equation). The method of instrumental variables is recommended instead. The appropriate instrumental variables for the RHS endogenous variables can be constructed from the least squares estimator of Π.j (that is, (X'X)-1X'yj) for the reduced form equation yj = Y.j = XΠ.j + V.j.

  2. Two Stage Least Squares
    For the j-th equation, substitute the RHS endogenous variables Yj with the instrumental variables X(X'X)-1X'Yj (that is, the fitted values of Yj), and write the j-th equation for estimation as:

    yj = Wjδj + εj

    where Wj = [X(X'X)-1X'Yj   Xj]. Recall that Zj = [Yj Xj]; since the projection X(X'X)-1X' is idempotent and reproduces Xj, we have Wj = X(X'X)-1X'Zj and thus W'jZj = W'jWj. Then the 2SLS estimator of δj is the following:

    dj = (W'jZj)-1W'jyj = (Z'j[X(X'X)-1X']Zj)-1Z'j[X(X'X)-1X']yj
    Var(dj) = s2j(W'jZj)-1 = s2j(Z'j[X(X'X)-1X']Zj)-1
    s2j = e'jej/N, and
    ej = yj - Zjdj

    Note: the 2SLS estimator of δj (that is, dj) does not take cross-equation correlation into account, although the instrumental variables are obtained from all the predetermined variables in the model. A numerical sketch of 2SLS follows this list.

  3. Limited Information Maximum Likelihood
    Consider the reduced form relevant to the j-th structural equation: Y0j = XΠ0j + V0j, where Y0j = [yj Yj] and Π0j is the corresponding reduced form parameter matrix.

    By assuming a normal distribution for the reduced form error matrix V0j with zero mean and variance-covariance matrix Ω0j, the LIML estimator of δj is obtained by maximizing the log-likelihood function:

    L(Π0j) = -½ N ((Gj+1)log(2π) + log(|Ω0j|)) - ½ trace[Ω0j-1(Y0j - XΠ0j)'(Y0j - XΠ0j)]

    subject to the identification constraint:

    ΠB.j = -Γ.j

    The LIML estimator is the same as the least variance ratio estimator, which is a special case of the k-class estimator.
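
To make the limited information estimators concrete, here is a minimal single-equation 2SLS sketch following item 2 (the data generating process and the split into Yj and Xj are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 200, 4
X = rng.normal(size=(N, K))   # all predetermined variables (instruments)
Xj = X[:, :2]                 # predetermined variables included in eq. j
Yj = X @ rng.normal(size=(K, 1)) + rng.normal(size=(N, 1))  # RHS endogenous
yj = (Yj @ np.array([[0.5]]) + Xj @ np.array([[1.0], [-0.3]])
      + rng.normal(size=(N, 1)))

# Stage 1: fitted values of the RHS endogenous variables
P = X @ np.linalg.solve(X.T @ X, X.T)       # projection X(X'X)^(-1)X'
Wj = np.hstack([P @ Yj, Xj])                # [fitted Y_j, X_j]
Zj = np.hstack([Yj, Xj])                    # original regressors

# Stage 2: d_j = (W_j'Z_j)^(-1) W_j'y_j
dj = np.linalg.solve(Wj.T @ Zj, Wj.T @ yj)
ej = yj - Zj @ dj                           # residuals use the original Y_j
s2 = float(ej.T @ ej) / N
var_dj = s2 * np.linalg.inv(Wj.T @ Zj)      # Var(d_j) = s_j^2 (W_j'Z_j)^(-1)
print(dj.ravel(), np.sqrt(np.diag(var_dj)))
```

Replacing Wj with Zj in the stage-2 solve reproduces the (inconsistent) OLS estimator of item 1.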

Variance-Covariance Matrix

  1. Variance-Covariance Matrix Across Equations
    Let e = [e1, e2, ..., eG], where the estimated residual is ej = yj - Zjdj (j=1,2,...,G). Then the estimate of the GxG variance-covariance matrix Σ is defined as (see the sketch following this list):

    S = e'e/N.

  2. Variance-Covariance Matrix of Parameters
    Extending from the estimated variance-covariance matrix of parameters Var(dj) for j=1,2,...,G, the variance-covariance matrix of all parameters is defined as:

    Var(d) = [ smn(W'mZm)-1(W'mZn)(W'nZn)-1, m,n=1,2,...,G ]

    Where smn is the (m,n)-th element of S, and smm=s2m, snn=s2n.
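
As referenced in item 1 above, S and its elements smn are immediate once the per-equation residuals are stacked (a small sketch with placeholder residuals):

```python
import numpy as np

rng = np.random.default_rng(0)
N, G = 200, 3
e = rng.normal(size=(N, G))   # stand-in for [e_1, e_2, ..., e_G]

S = e.T @ e / N               # GxG estimate of Sigma
print(np.diag(S))             # s^2_1, ..., s^2_G on the diagonal
print(S[0, 1])                # s_12, a cross-equation term
```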

Full Information Estimation

Limited information estimation techniques such as 2SLS and LIML do not take into account the cross-equation correlations embedded in the system as a whole.
  1. Three Stage Least Squares
    From the 2SLS estimation for the j-th equation: yj = Wjδj + εj, where
    Wj = [X(X'X)-1X'Yj   Xj] and Zj = [Yj Xj] and δj = [βj γj].

    By stacking all the stochastic equations yj = Wjδj + εj (j=1,2,...,G) as follows:

    \[
    \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_G \end{bmatrix}
    =
    \begin{bmatrix}
    W_1 & 0 & \cdots & 0 \\
    0 & W_2 & \cdots & 0 \\
    \vdots & \vdots & \ddots & \vdots \\
    0 & 0 & \cdots & W_G
    \end{bmatrix}
    \begin{bmatrix} \delta_1 \\ \delta_2 \\ \vdots \\ \delta_G \end{bmatrix}
    +
    \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_G \end{bmatrix}
    \]

    Write the above stacked-equation system as y = Wδ + ε, where
    y NGx1 data vector
    W NGx(Σj=1,...,G(Gj+Kj)) block-diagonal data matrix
    δ (Σj=1,...,G(Gj+Kj))x1 parameter vector
    ε NGx1 error vector

    The error structure ε satisfies:

    E(ε) = 0 and
    Var(ε) = E(εε') = Σ⊗I = [σijI   (i,j=1,2,...,G)]

    ε is clearly heteroscedastic and correlated across equations. Denote d as the Generalized Least Squares (GLS) estimator of δ. Then

    d = [W'(S-1⊗I)Z]-1W'(S-1⊗I)y

    \[
    = \begin{bmatrix}
    s^{11}W_1'Z_1 & s^{12}W_1'Z_2 & \cdots & s^{1G}W_1'Z_G \\
    s^{21}W_2'Z_1 & s^{22}W_2'Z_2 & \cdots & s^{2G}W_2'Z_G \\
    \vdots & \vdots & \ddots & \vdots \\
    s^{G1}W_G'Z_1 & s^{G2}W_G'Z_2 & \cdots & s^{GG}W_G'Z_G
    \end{bmatrix}^{-1}
    \begin{bmatrix}
    \sum_{j=1}^{G} s^{1j}W_1'y_j \\
    \sum_{j=1}^{G} s^{2j}W_2'y_j \\
    \vdots \\
    \sum_{j=1}^{G} s^{Gj}W_G'y_j
    \end{bmatrix}
    \]

    Var(d) = [W'(S-1⊗I)Z]-1

    S = e'e/N is the estimate of the variance-covariance matrix Σ, where e = [e1, e2, ..., eG] and the estimated residual is ej = yj - Zjdj (j=1,2,...,G). Furthermore, S-1 denotes the inverse of S with elements sjk (j,k=1,2,...,G).

    Note: Since S-1 depends on d, iterations of 3SLS may be performed until convergence. A numerical sketch of this computation follows the list.

  2. Full Information Maximum Likelihood
    Assuming normally distributed, serially independent residuals with zero mean and positive definite variance-covariance matrix Σ, the concentrated log-likelihood function for the system model YB + XΓ = U is

    L*(B,Γ) = -½ NG(1 + log(2π)) + N log(|B|) - ½ N log(|(YB+XΓ)'(YB+XΓ)|/N)

    Since log(|B|) = ½ log(|B'Y'YB|) - ½ log(|Y'Y|) (because |B'Y'YB| = |B|2|Y'Y|), we can also write

    L*(B,Γ) = -½ NG(1 + log(2π)) - ½ N log(|Y'Y|)
    + ½ N log(|B'Y'YB|/N) - ½ N log(|(YB+XΓ)'(YB+XΓ)|/N)

    Instrumental Variables Method
    The FIML estimator using the IV method is obtained by maximizing
    L*1(B,Γ) = N log(|B|) - ½ N log(|(YB+XΓ)'(YB+XΓ)/N|)

    The first derivatives of L*1(B,Γ) are used to set up the normal equations, similar to the iterative 3SLS estimation. Let S = (YB+XΓ)'(YB+XΓ)/N; the normal equations for maximizing L*1(B,Γ) are:

    ∂L*1/∂B = NB'-1 - Y'(YB+XΓ)S-1 = 0
    ∂L*1/∂Γ = -X'(YB+XΓ)S-1 = 0

    By substituting out N, combining terms, and using the parameter restrictions Π = -ΓB-1 in the first equation, it can be re-written as follows:

    ∂L*1/∂B = - Π'X'(YB+XΓ)S-1 = 0

    Together with the second equation, the normal equations in matrix form look like this:

    [XΓB-1   X]'(YB+XΓ)S-1 = 0

    We need to re-arrange the equations and parameters. Define Wj* = [(-XΓB-1)j   Xj] and write the typical j-th equation as yj = Wj*δj + εj (j=1,2,...,G). The corresponding stacked-equation system is y = W*δ + ε:

    \[
    \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_G \end{bmatrix}
    =
    \begin{bmatrix}
    W_1^* & 0 & \cdots & 0 \\
    0 & W_2^* & \cdots & 0 \\
    \vdots & \vdots & \ddots & \vdots \\
    0 & 0 & \cdots & W_G^*
    \end{bmatrix}
    \begin{bmatrix} \delta_1 \\ \delta_2 \\ \vdots \\ \delta_G \end{bmatrix}
    +
    \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_G \end{bmatrix}
    \]

    As in the 3SLS, the FIML estimator for δ is:

    d = [W*'(S-1⊗I)Z]-1W*'(S-1⊗I)y

    \[
    = \begin{bmatrix}
    s^{11}{W_1^*}'Z_1 & s^{12}{W_1^*}'Z_2 & \cdots & s^{1G}{W_1^*}'Z_G \\
    s^{21}{W_2^*}'Z_1 & s^{22}{W_2^*}'Z_2 & \cdots & s^{2G}{W_2^*}'Z_G \\
    \vdots & \vdots & \ddots & \vdots \\
    s^{G1}{W_G^*}'Z_1 & s^{G2}{W_G^*}'Z_2 & \cdots & s^{GG}{W_G^*}'Z_G
    \end{bmatrix}^{-1}
    \begin{bmatrix}
    \sum_{j=1}^{G} s^{1j}{W_1^*}'y_j \\
    \sum_{j=1}^{G} s^{2j}{W_2^*}'y_j \\
    \vdots \\
    \sum_{j=1}^{G} s^{Gj}{W_G^*}'y_j
    \end{bmatrix}
    \]

    Var(d) = [W*'(S-1⊗I)Z]-1

    S = e'e/N and e = [e1, e2, ..., eG] with ej = yj - Zjdj (j=1,2,...,G).

    Linearized ML Method
    The FIML estimator using the linearized ML method is obtained by maximizing
    L*2(B,Γ) = log(|B'Y'YB|/N) - log(|(YB+XΓ)'(YB+XΓ)|/N)

    Let Q = B'Y'YB/N and S = (YB+XΓ)'(YB+XΓ)/N; then the normal equations for maximizing L*2(B,Γ) are:

    ∂L*2/∂B = Y'YBQ-1 - Y'(YB+XΓ)S-1 = 0
    ∂L*2/∂Γ = -X'(YB+XΓ)S-1 = 0

    Re-arrange the equations and parameters, and let Zj = [Yj   Xj] and Z0j = [Yj   0].

    Define
    \[
    Z = \begin{bmatrix}
    Z_1 & 0 & \cdots & 0 \\
    0 & Z_2 & \cdots & 0 \\
    \vdots & \vdots & \ddots & \vdots \\
    0 & 0 & \cdots & Z_G
    \end{bmatrix}
    \quad \text{and} \quad
    Z_0 = \begin{bmatrix}
    Z_{01} & 0 & \cdots & 0 \\
    0 & Z_{02} & \cdots & 0 \\
    \vdots & \vdots & \ddots & \vdots \\
    0 & 0 & \cdots & Z_{0G}
    \end{bmatrix}
    \]

    Then the FIML estimator of δ is derived from the following

    d = [Z'(S-1⊗I)Z - Z'0(Q-1⊗I)Z0]-1 [Z'(S-1⊗I)y - Z'0(Q-1⊗I)y]

    Where S = e'e/N, e = [e1, e2, ...,eG], and ej = yj - Zjdj (j=1,2,...G). Similarly, Q = e0'e0/N, e0 = [e01, e02, ...,e0G], and e0j = yj - Z0jdj (j=1,2,...,G).

    Newton Method
    Both the first derivatives (gradient) and second derivatives (Hessian) of L*2(B,Γ) are used in the iterative estimation.
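
As noted in the 3SLS discussion above, the stacked GLS solve can be coded directly. Here is a sketch of one 3SLS pass under these formulas, assuming per-equation arrays yj (length N), Zj = [Yj Xj], and Wj with fitted endogenous columns are already built (for example, as in the 2SLS sketch earlier); the helper name three_sls is made up:

```python
import numpy as np

def three_sls(y_list, Z_list, W_list, N):
    """One 3SLS pass: 2SLS residuals -> S, then the stacked GLS solve.

    y_list holds 1-D arrays y_j; Z_list holds Z_j = [Y_j X_j];
    W_list holds W_j with the fitted endogenous columns."""
    G = len(y_list)
    # Step 1: 2SLS per equation, residuals e_j = y_j - Z_j d_j
    e = np.column_stack([
        y - Z @ np.linalg.solve(W.T @ Z, W.T @ y)
        for y, Z, W in zip(y_list, Z_list, W_list)])
    S_inv = np.linalg.inv(e.T @ e / N)          # elements s^(mn)
    # Step 2: assemble the block system A d = b with weights s^(mn)
    p = [Z.shape[1] for Z in Z_list]            # G_j + K_j per equation
    offs = np.concatenate(([0], np.cumsum(p)))
    A = np.zeros((offs[-1], offs[-1]))
    b = np.zeros(offs[-1])
    for m in range(G):
        for n in range(G):
            A[offs[m]:offs[m+1], offs[n]:offs[n+1]] = (
                S_inv[m, n] * (W_list[m].T @ Z_list[n]))
            b[offs[m]:offs[m+1]] += S_inv[m, n] * (W_list[m].T @ y_list[n])
    return np.linalg.solve(A, b)                # stacked estimator d
```

Re-estimating S from the new residuals and repeating the solve gives the iterative 3SLS mentioned in the note.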

Applications

Dynamic Model Simulation

Deriving from the structural form YB + XΓ = U, the reduced form is Y = XΠ + V, where Π = -ΓB-1 and V = UB-1. Since the predetermined variables X may include lagged endogenous variables as well as current and lagged exogenous variables, we can write:

Y = Y-1Π1 + XΠ2 + V

From now on, X denotes the data matrix of current and lagged exogenous variables and Y-1 includes lagged endogenous variables. Then,

\[
\Pi = \begin{bmatrix} \Pi_1 \\ \Pi_2 \end{bmatrix}
\]

The stability of the model requires that the characteristic roots of Π1 lie inside the unit circle. A plot of the period (dynamic) multipliers against the lag length is called the Impulse Response Function.
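
The stability condition reduces to an eigenvalue computation on Π1. A sketch with a made-up 2x2 lag-coefficient matrix:

```python
import numpy as np

Pi1 = np.array([[0.6, 0.2],   # hypothetical lag-coefficient matrix
                [0.1, 0.5]])

roots = np.linalg.eigvals(Pi1)
stable = bool(np.all(np.abs(roots) < 1.0))
print(f"characteristic roots: {roots}, stable: {stable}")
```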

Example: Klein's Model I


Copyright © Kuan-Pin Lin
Last updated: 1/24/2012