EC 571 Advanced Econometrics

Supplemental Notes


Appendix 1: Stability of a Dynamic Model

The stability of a dynamic model hinges on the characteristic equation for the autoregressive part of the model. The roots of the characteristic equation:

1 - ρ1B - ρ2B^2 - ... - ρpB^p = 0

must be greater than 1 in absolute value for the model to be stable.

For example, consider the AR(1) model. The characteristic equation is 1 - ρ1B = 0. The single root of this equation is B = 1/ρ1, which is greater than 1 in absolute value if and only if |ρ1| < 1. Similarly, for an AR(2) model, the characteristic equation is 1 - ρ1B - ρ2B^2 = 0; its roots are the reciprocals of the solutions λ1, λ2 = [ρ1 ± √(ρ1^2 + 4ρ2)]/2 of λ^2 - ρ1λ - ρ2 = 0, and both λ's must be less than 1 in absolute value. Therefore, the stability conditions are: ρ1 + ρ2 < 1, ρ2 - ρ1 < 1, and |ρ2| < 1.
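For instance, with ρ1 = 0.5 and ρ2 = 0.3, λ1, λ2 = [0.5 ± √1.45]/2 ≈ 0.852 and -0.352, both less than 1 in absolute value; equivalently ρ1 + ρ2 = 0.8 < 1, ρ2 - ρ1 = -0.2 < 1, and |ρ2| = 0.3 < 1, so the model is stable.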

A more general AR(p) model may be represented as a VAR(1):

[ Yt     ]   [ a ]   [ ρ1 ρ2 .. ρp ] [ Yt-1 ]   [ εt ]
[ Yt-1   ] = [ 0 ] + [ 1  0  .. 0  ] [ Yt-2 ] + [ 0  ]
[ :      ]   [ : ]   [ :  :  :  :  ] [ :    ]   [ :  ]
[ Yt-p+1 ]   [ 0 ]   [ 0  .. 1  0  ] [ Yt-p ]   [ 0  ]

That is, Yt = a + ρ Yt-1 + εt

By successive substitution, we obtain Yt = a + ρa + ρ^2 a + ... + Σi=0,1,...,∞ ρ^i εt-i (so that the equilibrium is Y∞ = (I - ρ)^(-1) a).

The characteristic roots of the matrix ρ, which is not symmetric, may be complex, of the form a ± bi, where i = √(-1). Stability requires that all the roots of ρ be less than 1 in absolute value. That is, |a + bi| = √(a^2 + b^2) < 1.

The unit circle refers to the two-dimensional set of values (a, b) defined by a^2 + b^2 = 1, which defines a circle centered at the origin with radius 1. Therefore, for a stable dynamic model, the roots of the characteristic equation

1 - ρ1B - ρ2B^2 - ... - ρpB^p = 0

which are the reciprocals of the characteristic roots of the matrix ρ, must lie outside the unit circle.
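To make the condition operational, here is a minimal numpy sketch (the function name and the AR(2) coefficients are illustrative): it builds the companion matrix for given AR coefficients, checks that its eigenvalues lie inside the unit circle, and confirms that the roots of the characteristic equation are their reciprocals.

```python
import numpy as np

def companion(rho):
    """Companion (VAR(1)) matrix of an AR(p) with coefficients rho = [rho1, ..., rhop]."""
    p = len(rho)
    C = np.zeros((p, p))
    C[0, :] = rho               # first row: rho1, rho2, ..., rhop
    C[1:, :-1] = np.eye(p - 1)  # subdiagonal identity block
    return C

rho = [0.5, 0.3]                            # the AR(2) example above
lam = np.linalg.eigvals(companion(rho))
print(lam)                                  # approx 0.852 and -0.352
print(np.all(np.abs(lam) < 1))              # True: the model is stable

# Equivalently, the roots of 1 - rho1*B - rho2*B^2 = 0 must lie outside
# the unit circle; they are the reciprocals of the eigenvalues above.
print(np.roots([-rho[1], -rho[0], 1.0]))    # approx 1.174 and -2.840, i.e. 1/lam
```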


Appendix 2: Frequency Domain Representation of Time Series

For many disaggregated microeconomic data series, usually observed at higher frequency, the frequency domain representation of a time series process (or spectral analysis) is useful. In this framework, we view an observed time series as a weighted sum of underlying series that have different cyclical patterns (e.g., seasonality, business cycles). In the time domain, we study autocovariances, that is, variation as a function of time. In the frequency domain, the variance of a time series is studied as a function of the frequency or wavelength of the variation.

Let Y = {Yt}, t = -∞,...,∞, be a covariance stationary process with mean μ = E(Yt) and j-th autocovariance γj = E[(Yt-μ)(Yt-j-μ)], j = 0,1,2,.... We assume γj = γ-j, and that γj is absolutely summable, or Σj=0,...,∞ |γj| < ∞.

Autocovariance Generating Function

The autocovariance generating function for the time series process Y is

γY(z) = Σj=-∞,...,∞ γj z^j

where z denotes a complex scalar.

We note that a complex number can be represented in the two-dimensional (x, y)-plane as

z = x + yi,     where i = √(-1)

or in the equivalent polar coordinates c (radius) and ω (angle):

c = (x^2 + y^2)^(1/2)
x = c cos(ω)
y = c sin(ω)
z = c [cos(ω) + i sin(ω)] = c e^(iω)

Examples
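For instance, z = 1 + i has radius c = √2 and angle ω = π/4, so z = √2 [cos(π/4) + i sin(π/4)] = √2 e^(iπ/4). Any point on the unit circle has c = 1 and can be written as z = e^(iω).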

Spectral Density Function (Spectrum)

We now evaluate the autocovariance generating function γY(z) at the complex value z = e^(-iω) (on the unit circle, at angle -ω) and divide it by 2π:

sY(ω) = γY(e^(-iω))/(2π) = (1/(2π)) Σj=-∞,...,∞ γj e^(-iωj)

where ω is a real number.

sY(ω) is the spectrum or spectral density function for the time series process Y. In other words, for a time series process Y that has the set of autocovariances γj, the spectral density can be computed at any particular value of ω. The spectrum contains no new information beyond that in the autocovariances.

Examples
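Two standard cases: (i) white noise with variance σ^2 has γ0 = σ^2 and γj = 0 for j ≠ 0, so sY(ω) = σ^2/(2π) is flat, with every frequency contributing equally to the variance; (ii) an AR(1) process Yt = ρYt-1 + εt with var(εt) = σ^2 has γj = σ^2 ρ^|j|/(1 - ρ^2), and the defining sum collapses to sY(ω) = σ^2/[2π(1 - 2ρ cos(ω) + ρ^2)], which peaks at ω = 0 when ρ > 0 (low frequencies dominate) and at ω = π when ρ < 0.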

Consider the following facts:

  1. γj = γ-j (symmetry of the autocovariances)
  2. exp(±iωj) = cos(ωj) ± i sin(ωj) (Euler's formula)
    Therefore, exp(iωj) + exp(-iωj) = 2 cos(ωj), which is always real.
  3. cos(0) = 1, cos(π) = -1, sin(0) = 0, sin(π) = 0
  4. cos(-ω) = cos(ω), sin(-ω) = -sin(ω)

The spectral density function can therefore be simplified as:

sY(ω) = (1/(2π)) [γ0 + 2 Σj=1,...,∞ γj cos(ωj)],     for ω ∈ [0, π]

This is a strictly real-valued, continuous function of ω. We have sY(ω) = sY(-ω) and sY(ω) = sY(ω + 2Mπ) for any integer M. That is, sY(ω) is fully defined for ω ∈ [0, π].

There is also a correspondence between the spectrum and the autocovariances:

γj = ∫[-π,π] sY(ω) e^(iωj) dω = ∫[-π,π] sY(ω) cos(ωj) dω

In particular, γ0 = ∫[-π,π] sY(ω) dω = 2 ∫[0,π] sY(ω) dω.

Therefore, spectral analysis can be used to decompose the variance of a time series, which can be viewed as the integral of the spectral density over all possible frequencies. For example, consider integrating over only some of the frequencies:

τ(ωk) = (2/γ0) ∫[0,ωk] sY(ω) dω,     where 0 < ωk ≤ π.

Thus, 0 < τ(ωk) ≤ 1 is interpreted as the proportion of the total variance of the time series that is associated with frequencies less than or equal to ωk.
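The decomposition can be checked numerically. The sketch below computes τ(ωk) for the AR(1) spectrum from the examples above; the parameter values are illustrative assumptions.

```python
import numpy as np

# AR(1): Y_t = rho*Y_{t-1} + e_t with var(e_t) = sigma2 (illustrative values).
rho, sigma2 = 0.8, 1.0

def s(w):
    """Spectral density of the AR(1) process."""
    return sigma2 / (2 * np.pi * (1 - 2 * rho * np.cos(w) + rho**2))

gamma0 = sigma2 / (1 - rho**2)   # variance of Y_t

def tau(wk, npts=10001):
    """Share of the variance at frequencies <= wk, by numerical integration."""
    w = np.linspace(0, wk, npts)
    return (2 / gamma0) * np.trapz(s(w), w)

print(tau(np.pi))       # ~1.0: integrating over [0, pi] recovers all of gamma0
print(tau(np.pi / 8))   # well above 1/8: low frequencies dominate when rho = 0.8
```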

Spectral Representation Theorem

Any covariance stationary time series process can be expressed in the form:

Yt = μ + ∫[0,π] [α(ω) cos(ωt) + δ(ω) sin(ωt)] dω

where α(ω) and δ(ω) are random variables, for any fixed frequency ω in [0, π], with the following (standard) properties: they have mean zero, contributions at distinct frequencies are mutually uncorrelated, and α(ω) is uncorrelated with δ(λ) for all frequencies ω and λ.

Sample Periodogram

For any given ω, we can construct the sample analog of the population spectrum, which is known as the sample periodogram.

Given an observed sample of N observations {Y1, Y2, ..., YN}, and using the same notation μ for the sample mean:

μ = Σt=1,...,N Yt/N

we can calculate the sample autocovariances γj (j = 0, 1, 2, ..., N-1) as follows:

γj = Σt=j+1,...,N (Yt - μ)(Yt-j - μ)/N

We set γj = γ-j. The sample periodogram is defined by

sY(ω) = (1/(2π)) Σj=-N+1,...,N-1 γj e^(-iωj)
      = (1/(2π)) [γ0 + 2 Σj=1,...,N-1 γj cos(ωj)]

The area under the periodogram is the sample variance of Yt:

γ0 = Σt=1,...,N (Yt - μ)^2/N
   = ∫[-π,π] sY(ω) dω
   = 2 ∫[0,π] sY(ω) dω

For a time series Yt, we observe N periods, {Y1, Y2, ..., YN}. Since a full wave cycle is completed in 2π radians, each period corresponds to a frequency of 2π/N radians. We let

ω1 = 2π/N
ω2 = 4π/N
...
ωM = 2Mπ/N

The highest frequency is obtained at M = (N-1)/2, so that ωM = (N-1)π/N < π.

Sample Spectral Representation Theorem

Given any N observations on a time series process {Y1, Y2, ..., YN}, there exist frequencies {ω1, ω2, ..., ωM} and coefficients μ, {α1, α2, ..., αM}, {δ1, δ2, ..., δM} such that

Yt = μ + Σk=1,...,M {αk cos[ωk(t-1)] + δk sin[ωk(t-1)]}

where αk cos[ωk(t-1)] is orthogonal to αj cos[ωj(t-1)] for k ≠ j, δk sin[ωk(t-1)] is orthogonal to δj sin[ωj(t-1)] for k ≠ j, and αk cos[ωk(t-1)] is orthogonal to δj sin[ωj(t-1)] for all k and j. Furthermore,

μ = Σt=1,...,N Yt/N
αk = (2/N) Σt=1,...,N Yt cos[ωk(t-1)],  k = 1, 2, ..., M
δk = (2/N) Σt=1,...,N Yt sin[ωk(t-1)],  k = 1, 2, ..., M

The sample variance of Yt can be expressed as

γ0 = Σt=1,...,N (Yt - μ)^2/N = (1/2) Σk=1,...,M (αk^2 + δk^2)

The portion of the sample variance of Yt that can be attributed to cycles of frequency ωk is given by:

(1/2)(αk^2 + δk^2) = (4π/N) sY(ωk)

where sY(ωk) is the sample periodogram at frequency ωk.

Equivalently,

sY(ωk) = [N/(8π)] (αk^2 + δk^2)
       = [1/(2πN)] ( {Σt=1,...,N Yt cos[ωk(t-1)]}^2 + {Σt=1,...,N Yt sin[ωk(t-1)]}^2 )
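A minimal numpy sketch of these formulas, using simulated white noise as illustrative input: it computes the sample autocovariances and the periodogram on the Fourier frequencies, then verifies the identity (1/2)(αk^2 + δk^2) = (4π/N) sY(ωk).

```python
import numpy as np

def sample_periodogram(Y):
    """Sample periodogram on the grid w_k = 2*pi*k/N, k = 1, ..., M = (N-1)//2."""
    Y = np.asarray(Y, dtype=float)
    N = len(Y)
    mu = Y.mean()
    # sample autocovariances g_j, j = 0, ..., N-1 (divisor N, as above)
    g = np.array([((Y[j:] - mu) * (Y[:N - j] - mu)).sum() / N for j in range(N)])
    M = (N - 1) // 2
    w = 2 * np.pi * np.arange(1, M + 1) / N
    s = np.array([(g[0] + 2 * (g[1:] * np.cos(wk * np.arange(1, N))).sum()) / (2 * np.pi)
                  for wk in w])
    return w, s

rng = np.random.default_rng(0)
Y = rng.standard_normal(201)        # white noise: the periodogram should be roughly flat
w, s = sample_periodogram(Y)

N, t = len(Y), np.arange(len(Y))    # t - 1 = 0, 1, ..., N-1
for k in range(3):                  # check the identity at the first few frequencies
    a = (2 / N) * (Y * np.cos(w[k] * t)).sum()
    d = (2 / N) * (Y * np.sin(w[k] * t)).sum()
    print(0.5 * (a**2 + d**2), (4 * np.pi / N) * s[k])   # the two columns agree
```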


Appendix 3: Vector Autoregressive Model

VAR(1) Model

Generalizing from the univariate time series AR(1) model:

Yt = δ + ρYt-1 + εt

the multivariate system of G variables can be written as follows:

Yit = δi + Σj=1,2,...,G ρij Yj,t-1 + εit   (i = 1, 2, ..., G)

This is called a vector autoregressive model of order 1, or VAR(1). The matrix representation of the model as a simultaneous linear equations system looks like this:

[Y1t, Y2t, ..., YGt] = [δ1, δ2, ..., δG]

    + [Y1,t-1, Y2,t-1, ..., YG,t-1]
[ ρ11 ρ21 .. ρG1 ]
[ ρ12 ρ22 .. ρG2 ]
[ :   :   :  :   ]
[ ρ1G ρ2G .. ρGG ]
    + [ε1t, ε2t, ..., εGt]
The alternative is the stacked form suitable for estimation as a system of regression equations:

[ Y1t ]   [ δ1 ]   [ ρ11 ρ12 .. ρ1G ] [ Y1,t-1 ]   [ ε1t ]
[ Y2t ] = [ δ2 ] + [ ρ21 ρ22 .. ρ2G ] [ Y2,t-1 ] + [ ε2t ]
[ :   ]   [ :  ]   [ :   :   :  :   ] [ :      ]   [ :   ]
[ YGt ]   [ δG ]   [ ρG1 ρG2 .. ρGG ] [ YG,t-1 ]   [ εGt ]

In shorthand notation,

Yt = δ + ρ Yt-1 + εt
(G×1)  (G×1)  (G×G)(G×1)  (G×1)

with the following assumptions: E(εt) = 0, E(εt εt') = Σ (a G×G positive definite covariance matrix), and E(εt εs') = 0 for t ≠ s.
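A minimal simulation sketch of such a VAR(1) under these assumptions (all coefficient values are illustrative):

```python
import numpy as np

G, N = 2, 500
d = np.array([1.0, 0.5])                       # delta
rho = np.array([[0.5, 0.1],
                [0.2, 0.4]])
Sigma = np.array([[1.0, 0.3],                  # contemporaneous covariance of e_t
                  [0.3, 1.0]])

assert np.all(np.abs(np.linalg.eigvals(rho)) < 1)   # stability check

rng = np.random.default_rng(42)
e = rng.multivariate_normal(np.zeros(G), Sigma, size=N)
Y = np.zeros((N, G))
for t in range(1, N):
    Y[t] = d + rho @ Y[t - 1] + e[t]

# The sample mean approaches the equilibrium (I - rho)^(-1) d
print(Y[100:].mean(axis=0))
print(np.linalg.solve(np.eye(G) - rho, d))
```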

Example: Representing AR(p) as VAR(1)

First, we can write the univariate AR(p) model as the system:

Yt = δ + ρ1Yt-1 + ρ2Yt-2 + ... + ρpYt-p + εt
Yt-1 = Yt-1
Yt-2 = Yt-2
:
Yt-p+1 = Yt-p+1

Or,

[ Yt     ]   [ δ ]   [ ρ1 ρ2 .. ρp ] [ Yt-1 ]   [ εt ]
[ Yt-1   ] = [ 0 ] + [ 1  0  .. 0  ] [ Yt-2 ] + [ 0  ]
[ :      ]   [ : ]   [ :  :  :  :  ] [ :    ]   [ :  ]
[ Yt-p+1 ]   [ 0 ]   [ 0  .. 1  0  ] [ Yt-p ]   [ 0  ]

That is,

Yt = δ + ρ Yt-1 + εt
(p×1)  (p×1)  (p×p)(p×1)  (p×1),  t = p+1, ..., N

This is a system of p equations with a restricted parameter matrix ρ (p×p). The first equation is stochastic, while the others are identities. The usable time series observations run from p+1 to N (N-p in total).

VAR(p) Model

For the multivariate VAR(p) system, the model is expressed in terms of the stacked G endogenous variables Yt = [Y1t, Y2t, ..., YGt]':

Yt = δ + ρ1 Yt-1 + ... + ρp Yt-p + εt

where Yt, Yt-1, ..., Yt-p are G×1 vectors and ρ1, ..., ρp are G×G coefficient matrices. We can represent this model by a big VAR(1) system in the same way we represent AR(p) as a VAR(1):

Yt = δ + ρ Yt-1 + εt
(Gp×1)  (Gp×1)  (Gp×Gp)(Gp×1)  (Gp×1),  t = p+1, ..., N

The stacked system consists of Gp equations over the N-p usable observations, of which only the first G equations are stochastic. The parameter matrix ρ on the lagged variable Yt-1 is

ρ =
[ ρ1 ρ2 .. ρp-1 ρp ]
[ I  0  .. 0    0  ]
[ 0  I  .. 0    0  ]
[ :  :  :  :    :  ]
[ 0  0  .. I    0  ]

where, for each k = 1, 2, ..., p, ρk = [ρij,k (i, j = 1, 2, ..., G)]. Furthermore, I is the G×G identity matrix and 0 is the G×G zero matrix.

It is of particular interest to examine the first set of G (stochastic) equations:

Yt = δ + ρ Yt-1 + εt
(G×1)  (G×1)  (G×Gp)(Gp×1)  (G×1),  t = p+1, ..., N

where ρ = [ρ1, ρ2, ..., ρp]. The model can be easily generalized to include K exogenous variables Xt as follows:

Yt = δ + ρ Yt-1 + βXt + εt
(G×1)  (G×1)  (G×Gp)(Gp×1)  (G×K)(K×1)  (G×1),  t = p+1, ..., N

with the same assumptions on the error terms as before: E(εt) = 0, E(εt εt') = Σ, and E(εt εs') = 0 for t ≠ s.

Example: Granger Causality

A two-variable VAR model can be used to test causality in the Granger sense for a pair of variables. The question is: does X Granger-cause Y, or does Y Granger-cause X?

Let X and Y be expressed in deviation form, and denote:

X → Y   X Granger-causes Y
Y → X   Y Granger-causes X
X ↔ Y   X Granger-causes Y and Y Granger-causes X (feedback)

Does X Granger-cause Y?
  Hypothesis: H0: X does not cause Y vs. H1: X causes Y
  Unrestricted model: Yt = Σj=1,2,...,m aj Yt-j + Σk=1,2,...,n bk Xt-k + εt
  Restricted model: Yt = Σj=1,2,...,m aj Yt-j + εt
  Test statistic: F = [(RSSR - RSSUR)/n] / [RSSUR/(N-m-n)]
  Test: if F ≥ Fc(n, N-n-m), reject H0; that is, X does cause Y. Otherwise, X does not cause Y.

Does Y Granger-cause X?
  Hypothesis: H0: Y does not cause X vs. H1: Y causes X
  Unrestricted model: Xt = Σj=1,2,...,m aj Yt-j + Σk=1,2,...,n bk Xt-k + εt
  Restricted model: Xt = Σk=1,2,...,n bk Xt-k + εt
  Test statistic: F = [(RSSR - RSSUR)/m] / [RSSUR/(N-m-n)]
  Test: if F ≥ Fc(m, N-n-m), reject H0; that is, Y does cause X. Otherwise, Y does not cause X.

Conclusion:
  X → Y: reject H0 in the first test, do not reject H0 in the second.
  Y → X: do not reject H0 in the first test, reject H0 in the second.
  X ↔ Y: reject H0 in both tests.

A better approach is to estimate the two equations jointly as a system:

Yt = Σj=1,2,...,m aj Yt-j + Σk=1,2,...,n bk Xt-k + ut
Xt = Σj=1,2,...,m cj Yt-j + Σk=1,2,...,n dk Xt-k + vt

Then X → Y corresponds to rejecting H0: b1 = ... = bn = 0 in the first equation, and Y → X corresponds to rejecting H0: c1 = ... = cm = 0 in the second.
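A minimal OLS sketch of the single-equation F-test from the table above (the data-generating process and all numbers are illustrative assumptions):

```python
import numpy as np

def granger_F(y, x, m, n):
    """F statistic for H0: x does not Granger-cause y,
    with m lags of y and n lags of x (variables in deviation form)."""
    p = max(m, n)
    T = len(y) - p                                    # usable observations
    dep = y[p:]
    ylags = np.column_stack([y[p - j:-j] for j in range(1, m + 1)])
    xlags = np.column_stack([x[p - k:-k] for k in range(1, n + 1)])

    def rss(Z):
        b, *_ = np.linalg.lstsq(Z, dep, rcond=None)
        u = dep - Z @ b
        return u @ u

    rss_r = rss(ylags)                                # restricted: own lags only
    rss_ur = rss(np.hstack([ylags, xlags]))           # unrestricted: add x lags
    return ((rss_r - rss_ur) / n) / (rss_ur / (T - m - n))

rng = np.random.default_rng(1)
x = rng.standard_normal(400)
y = np.zeros(400)
y[1:] = 0.5 * x[:-1] + 0.2 * rng.standard_normal(399)   # x leads y by one period

print(granger_F(y, x, m=2, n=2))   # large: reject "x does not cause y"
print(granger_F(x, y, m=2, n=2))   # typically small: cannot reject "y does not cause x"
```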

Impulse Response Functions

Starting from a general VAR(1) system, Yt = δ + ρ Yt-1 + εt, we write:

[I - ρB] Yt = δ + εt

where B is the backshift operator. Then,

Yt = [I - ρ]^(-1) δ + Σi=0,1,...,∞ ρ^i εt-i

   = Y* + (εt + ρ εt-1 + ρ^2 εt-2 + ...)

For the stacked system with G stochastic equations, the parameters of interest are the upper-left G×G blocks ρ^i[1:G,1:G] = [ρ^i_kj, k, j = 1, 2, ..., G], i = 1, 2, ...

Y* is the equilibrium and εt is the innovation. By shocking one element of εt, say εjt, Yt moves away from the equilibrium Y*. Note that the effect of a change in εjt is not confined to the j-th variable alone; it propagates to the other variables in the system. The path along which the variables return to equilibrium is called the impulse response of a stable VAR system. The impulse response function traces the effect of a one-time innovation εjt on the k-th variable over time (i = 0, 1, 2, ...), as measured by ρ^i_kj (k, j = 1, 2, ..., G).
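Under illustrative coefficient values, the impulse responses of a stable VAR(1) are just the powers of ρ:

```python
import numpy as np

rho = np.array([[0.5, 0.1],     # illustrative stable VAR(1) coefficients
                [0.2, 0.4]])

# irf[i][k, j]: response of Y_k after i periods to a one-time unit innovation e_j
irf = [np.linalg.matrix_power(rho, i) for i in range(8)]
print(irf[0])   # identity matrix: the impact response
print(irf[3])   # responses die out geometrically when rho is stable
```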

Orthogonal Impulse Response Functions

Because of the assumption E(εt εt') = Σ, the elements of εt are contemporaneously correlated. A shock to one variable is likely to be accompanied by shocks to some of the other variables, so we cannot assume that everything else is held constant when one element of εt is changed. Therefore, the impulse response functions above do not have a causal interpretation.

Let Σ = PP' (a Cholesky decomposition, for example), and define ut = P^(-1) εt. Then E(ut ut') = I. We can use P^(-1) to orthogonalize the innovations so that

Yt = [I - ρ]^(-1) δ + Σi=0,1,...,∞ ρ*^i ut-i

   = Y* + (P ut + ρ*^1 ut-1 + ρ*^2 ut-2 + ...)

where ρ*^i = ρ^i P (i = 0, 1, 2, ...).

Choosing a P is very similar to placing identification restrictions on a system of dynamic simultaneous equations. Given the chosen matrix P, the elements of ut are mutually orthogonal, implying that ρ*^i has the usual causal interpretation: the element ρ*^i_kj of ρ*^i[1:G,1:G] gives the effect of a one-time unit shock in the j-th element of ut on the k-th element of Yt after i periods, holding everything else constant.
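Continuing the sketch above, the orthogonalized responses replace ρ^i with ρ^i P, where P is a Cholesky factor of Σ (values again illustrative):

```python
import numpy as np

rho = np.array([[0.5, 0.1],
                [0.2, 0.4]])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])

P = np.linalg.cholesky(Sigma)   # lower triangular, Sigma = P P'
# oirf[i][k, j]: response of Y_k after i periods to a unit orthogonal shock u_j
oirf = [np.linalg.matrix_power(rho, i) @ P for i in range(8)]
print(oirf[0])                  # = P: impact responses to orthogonalized shocks
```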

Structural VAR(1) Model

Consider the general representation of the model,

Yt = δ + ρ Yt-1 + εt

A short-run structural VAR model can be written as

A(Yt - δ - ρ Yt-1) = A εt = B ut

or, Yt - δ - ρ Yt-1 = εt = A^(-1) B ut

where A and B are G×G nonsingular matrices of parameters to be estimated, ut is a G×1 vector of orthogonalized disturbances with ut ~ N(0, I), and E(ut us') = 0 for t ≠ s. Sufficient constraints must be placed on A and B so that P = A^(-1)B is identified.

For example, to define a recursive system for G = 3, let

A =
[ 1 0 0 ]
[ . 1 0 ]
[ . . 1 ]

and B =
[ . 0 0 ]
[ 0 . 0 ]
[ 0 0 . ]

where each "." denotes a free parameter to be estimated.

(To Be Continued)


Appendix 4: Vector Error Correction Model

Any VAR(p) model can be re-written as a Vector Error Correction (VEC) model. Consider a general VAR(p) with G equations (variables):

Yt = δ + ρ1 Yt-1 + ... + ρp Yt-p + εt

Then,

ΔYt = δ + πYt-1 + γ1ΔYt-1 + ... + γp-1ΔYt-p+1 + εt

where π = Σj=1,...,p ρj - I and γi = -Σj=i+1,...,p ρj, i = 1, 2, ..., p-1.

The G×G matrix π has rank r (0 ≤ r ≤ G). Assume that π has reduced rank (that is, 0 < r < G, so the variables cointegrate); then it can be expressed as π = αβ', where α and β are both G×r matrices of rank r.
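The mapping from VAR(p) coefficients to the VEC parameters is mechanical; a small numpy sketch with illustrative matrices:

```python
import numpy as np

def var_to_vec(rhos):
    """pi = rho1 + ... + rhop - I;  gamma_i = -(rho_{i+1} + ... + rho_p)."""
    G = rhos[0].shape[0]
    pi = sum(rhos) - np.eye(G)
    gammas = [-sum(rhos[i:]) for i in range(1, len(rhos))]
    return pi, gammas

rho1 = np.array([[0.4, 0.1], [0.0, 0.3]])   # illustrative bivariate VAR(2)
rho2 = np.array([[0.2, 0.0], [0.1, 0.2]])
pi, gammas = var_to_vec([rho1, rho2])

print(pi)                           # error-correction matrix
print(np.linalg.matrix_rank(pi))    # its rank is the cointegrating rank r
print(gammas[0])                    # gamma_1
```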

Allowing for a constant and a linear trend and assuming that there are r cointegrating relations, the VEC model is written as:

ΔYt = δ + τt + αβ'Yt-1 + γ1ΔYt-1 + ... + γp-1ΔYt-p+1 + εt

Because the VEC model describes the differences of the data, the constant δ implies a linear time trend in the levels, and the time trend τt implies a quadratic trend in the levels of the data. By the reparameterization δ = αδ0 + δ1 and τ = ατ0 + τ1, the model can be re-written as:

ΔYt = δ1 + τ1t + α(β'Yt-1 + δ0 + τ0t) + γ1ΔYt-1 + ... + γp-1ΔYt-p+1 + εt

Note that α is a G×r matrix of rank r, δ0 and τ0 are r×1 vectors of parameters, and δ1 and τ1 are G×1 vectors of parameters. Also, δ1'αδ0 = 0 and τ1'ατ0 = 0.

Placing restrictions on the trend terms in the above VEC model yields five cases:

  1. Unrestricted trend model
  2. Restricted trend model: τ1 = 0
  3. Unrestricted constant model: τ1 = 0, τ0 = 0
  4. Restricted constant model: τ1 = 0, τ0 = 0, δ1 = 0
  5. No trend and no constant model: τ1 = 0, τ0 = 0, δ1 = 0, δ0 = 0

Therefore, for model estimation, we have to identify the type of model (1-5), the number of lags (p), and the cointegrating rank (r).

The parameters of interest are the cointegrating vectors β, the adjustment coefficients α, and the short-run coefficient matrices γ1, ..., γp-1.

Parameter restrictions may be necessary for identification and stability of the model.

(To Be Continued)


Copyright © Kuan-Pin Lin
Last updated: March 14, 2007