Regime Switching Regression Models

Readings and References:

Switching regression techniques are useful for studying random variables drawn from a mixture of probability distributions, structural change, and disequilibrium. The more recent development of Markov switching can be applied to dynamic time series models.

Consider the M-state (regime or structure) linear regression equation:

Yt = Xtb(i) + et(i)
et(i) ~ normal(0,s2(i))

where [Yt,Xt] is the observed sample data, t=1,2,...,N, and (i) is the state (regime) indicator, i=1,2,...,M. In particular, assuming a normal distribution, the conditional probability density function of Yt occurring in regime i is written as:

f(Yt;b(i),s2(i)) = 1/√(2πs2(i)) exp(-½(Yt-Xtb(i))²/s2(i))

In general, the regression parameters could change several times (that is, switching back and forth) within the M possible regimes. Typically, M=2.
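
As a concrete check, the regime-i density above can be evaluated directly. A minimal Python sketch (the function name and the vector arguments are illustrative, not part of the original notes):

```python
import math

def regime_density(y, x, b, s2):
    # Normal density of Yt in regime i: the mean is the regression part Xt b(i),
    # the variance is s2(i); x and b are sequences of the same length.
    mean = sum(xk * bk for xk, bk in zip(x, b))
    return math.exp(-0.5 * (y - mean) ** 2 / s2) / math.sqrt(2.0 * math.pi * s2)
```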

Deterministic Switching

If the sample separation is known at a given break point, then the dummy variable approach and the classical Chow test may be used to study the structural switch in the model. For example, well-known break points in economic time series are World War II (ended in 1945), the OPEC Oil Embargo (began in October 1973), and Black Monday (the stock market crash of Monday, October 19, 1987).

Dummy Variable Approach

Suppose the entire sample can be divided into M non-overlapping sub-samples N(i), i=1,2,...,M, with ∑i=1,2,...,M N(i) = N. Let Dit be the dummy variable defined by:

Dit = 1, if t in N(i)
0 otherwise

Since Dit is observed, given the conditional probability density function f(Yt;b(i),s2(i)) of Yt, the log-likelihood function for model estimation is:

ll(q) = ∑t=1,2,...,N ln ∑i=1,2,...,M Dit f(Yt;b(i),s2(i))

where q = (b(i),s2(i), i=1,2,...,M)' is the parameter vector of the model.
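
Since Dit selects exactly one regime for each observation, the inner sum collapses and the log-likelihood can be computed directly when the sample separation is known. A minimal Python sketch (the function names and the scalar-regressor setup are assumptions for illustration):

```python
import math

def normal_pdf(e, s2):
    # Normal density of a residual e with variance s2
    return math.exp(-0.5 * e * e / s2) / math.sqrt(2.0 * math.pi * s2)

def loglik_dummy(y, x, regime, b, s2):
    # regime[t] gives the (0-based) regime index i of observation t, so the
    # inner sum over i keeps only the single term with Dit = 1.
    ll = 0.0
    for t in range(len(y)):
        i = regime[t]
        ll += math.log(normal_pdf(y[t] - x[t] * b[i], s2[i]))
    return ll
```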

Stochastic Switching

If there is no a priori information on which to base the structure or regime classification, it is completely up to the state of nature to decide the regression structure. Let St be the discrete state variable which takes an index for the regime at time t. That is, St = 1,2,...,M. Using the dummy variable notation:

Dit = 1, if St = i (the regime is i at time t)
0 otherwise

Dit is a random discrete variable taking the value 1 or 0, which is not directly observed. Let Dit* be the latent variable for Dit such that Dit* > 0 if Dit = 1, and Dit* ≤ 0 otherwise (Dit = 0). We write:

Dit* = a(i) + ut

Or, in a more general framework, Dit* = Zta(i) + ut where Zt is a set of exogenous or predetermined variables determining the regime structure.

When Dit* > 0, ut > -a(i). Assuming ut ~ normal(0,1), the probability of Dit = 1 is represented by the standard normal density integrated from -∞ to a(i). The probability that St=i is then expressed as:

Pr(St=i;a(i)) = Pr(Dit=1) = Pr(Dit*>0) = Pr(ut>-a(i))
= ∫_{-∞}^{a(i)} 1/√(2π) exp(-½z²)dz
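
This cumulative normal probability can be computed from the error function available in most math libraries. A Python sketch (the function name is illustrative):

```python
import math

def std_normal_cdf(a):
    # Pr(St=i; a(i)) = integral of the standard normal density from -inf to a,
    # via the identity Phi(a) = (1 + erf(a/sqrt(2)))/2
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))
```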

Of course, ∑i=1,2,...,M Pr(St=i;a(i)) = 1. Then the conditional joint probability density function of Yt and St=i is

f(Yt,St=i;b(i),s2(i),a(i))
= Pr(St=i;a(i)) f(Yt|St=i;b(i),s2(i))
= ∫_{-∞}^{a(i)} 1/√(2π) exp(-½z²)dz [1/√(2πs2(i)) exp(-½(Yt-Xtb(i))²/s2(i))]

Let q(i) = (b(i),s2(i),a(i))' and q =(q(i), i=1,2,...,M)'. Then, the unconditional probability density function of Yt is

f(Yt;q) = ∑i=1,2,...,M f(Yt,St=i;q(i))

The resulting log-likelihood function is:

ll(q) = ∑t=1,2,...,N ln f(Yt;q)
= ∑t=1,2,...,N ln ∑i=1,2,...,M f(Yt,St=i;q(i))

As a by-product of the maximum likelihood estimation of the parameter vector q, the estimated conditional probability that observation t occurred in regime i is computed as follows:

Pr(St=i|Yt;q) = f(Yt,St=i;q(i)) / f(Yt;q)
= f(Yt,St=i;q(i)) / ∑j=1,2,...,M f(Yt,St=j;q(j))

Mixture of Probability Distributions Revisited

Recall the previous study of the INCOME variable drawn from a mixture of two normal probability distributions (Program, Data). That is, each observation Yt of the INCOME variable comes from a mixture of two normal distributions:

f(Yt|St=i;m(i),s2(i)) = 1/√(2πs2(i)) exp(-½(Yt-m(i))²/s2(i)) with probability Pr(St=i), i=1,2.

Let Pr(St=1)=p; then Pr(St=2) = 1-p. Here 0<p<1 must be estimated together with the parameters m(i) and s2(i) for i=1,2. The log-likelihood function is:

ll(p,m(1),m(2),s2(1),s2(2)) = ∑t=1,2,...,N ln [p f(Yt|St=1;m(1),s2(1)) + (1-p) f(Yt|St=2;m(2),s2(2))]

Finally, the conditional probability of INCOME variable Yt drawn from regime 1 is

p f(Yt|St=1;m(1),s2(1)) / [p f(Yt|St=1;m(1),s2(1)) + (1-p) f(Yt|St=2;m(2),s2(2))]
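
This likelihood can be maximized by the EM algorithm, whose E-step weight is exactly the conditional probability above. A minimal pure-Python sketch (the function names, starting values, and synthetic data are illustrative assumptions, not from the original notes):

```python
import math
import random

def normal_pdf(y, m, s2):
    # Normal density with mean m and variance s2
    return math.exp(-0.5 * (y - m) ** 2 / s2) / math.sqrt(2.0 * math.pi * s2)

def em_mixture(y, p, m1, m2, v1, v2, iters=200):
    # EM for the two-component normal mixture likelihood ll(p,m(1),m(2),s2(1),s2(2))
    for _ in range(iters):
        # E-step: conditional probability that each y_t was drawn from regime 1
        w = [p * normal_pdf(yt, m1, v1) /
             (p * normal_pdf(yt, m1, v1) + (1.0 - p) * normal_pdf(yt, m2, v2))
             for yt in y]
        # M-step: reweighted mixing probability, means, and variances
        sw = sum(w)
        p = sw / len(y)
        m1 = sum(wt * yt for wt, yt in zip(w, y)) / sw
        m2 = sum((1.0 - wt) * yt for wt, yt in zip(w, y)) / (len(y) - sw)
        v1 = sum(wt * (yt - m1) ** 2 for wt, yt in zip(w, y)) / sw
        v2 = sum((1.0 - wt) * (yt - m2) ** 2 for wt, yt in zip(w, y)) / (len(y) - sw)
    return p, m1, m2, v1, v2
```

Starting from rough guesses, the iterations settle on the component means, variances, and mixing probability; the final E-step weights are the conditional regime-1 probabilities for each observation.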

Markov Switching

If the switching is state-dependent (as in most cases of dynamic models), we assume the simplest case of first-order Markov process for the state indicator St with the following transition probability:

Pr(St=i|St-1=j) = pij > 0
∑i=1,2,...,M pij = 1 for each j

Then the joint probability of St=i and St-1=j is defined by

Pr(St=i,St-1=j) = Pr(St=i|St-1=j)Pr(St-1=j) = pij Pr(St-1=j)
and Pr(St=i) = ∑j=1,2,...,M Pr(St=i,St-1=j)
= ∑j=1,2,...,M pij Pr(St-1=j)

For a two-state Markov switching process (M=2), we write:

p = p11 = Pr(St=1|St-1=1)     1-p = p21 = Pr(St=2|St-1=1)
q = p22 = Pr(St=2|St-1=2)     1-q = p12 = Pr(St=1|St-1=2)

Then,

Pr(St=1) = p Pr(St-1=1) + (1-q) Pr(St-1=2)
Pr(St=2) = (1-p) Pr(St-1=1) + q Pr(St-1=2)

The probabilities of a steady state (that is, S = St = St-1) solve Pr(S=1) = p Pr(S=1) + (1-q) Pr(S=2) together with Pr(S=1) + Pr(S=2) = 1:

Pr(S=1) = (1-q)/(2-p-q)
Pr(S=2) = (1-p)/(2-p-q)
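
The steady state can be checked numerically by iterating the transition equation for Pr(St=1) to its fixed point (a small Python sketch; the fixed point is independent of the starting value when 0 < p,q < 1):

```python
def steady_state(p, q, iters=500):
    # Iterate Pr(St=1) = p Pr(St-1=1) + (1-q) Pr(St-1=2) to convergence
    pr1 = 0.5
    for _ in range(iters):
        pr1 = p * pr1 + (1.0 - q) * (1.0 - pr1)
    return pr1, 1.0 - pr1

# e.g., steady_state(0.9, 0.8) converges to (2/3, 1/3)
```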

Let Pr(St=1) = Pr(St*≤0) and Pr(St=2) = Pr(St*>0), where St* is a latent variable for the discrete state indicator St defined by:

St* = a0 + a1 (St-1-1) + ut, ut ~ normal(0,1)

Therefore,

p = Pr(St=1|St-1=1) = Pr(St*≤0|St-1=1) = Pr(ut≤-a0)
= ∫_{-∞}^{-a0} 1/√(2π) exp(-½z²)dz
q = Pr(St=2|St-1=2) = Pr(St*>0|St-1=2) = Pr(ut>-a0-a1)
= ∫_{-a0-a1}^{∞} 1/√(2π) exp(-½z²)dz
= 1 - ∫_{-∞}^{-a0-a1} 1/√(2π) exp(-½z²)dz

It is clear that through this probability transformation, 0 < p < 1 and 0 < q < 1. An alternative approach to ensure that 0 < p < 1 and 0 < q < 1 is the logistic transformation:

p = exp(p0)/(1+exp(p0))
q = exp(q0)/(1+exp(q0))

where p0 and q0 are unrestricted real numbers.

Dynamic Model

We are interested in a dynamic model of Yt resulting from a combination of M states and L lags. Then there are r cases or events (r = M^(L+1)) that must be considered: (St,St-1,...,St-L). For simplicity, we consider a 2-state 1-lag dynamic model. There are four cases: (St=1,St-1=1), (St=2,St-1=1), (St=1,St-1=2), (St=2,St-1=2).

Let Ht-1 = {Yt-1,Yt-2,...} be the historical information up to time t-1. Suppose the conditional probability density function of Yt (conditional on St=i, St-1=j, and Ht-1, ignoring the parameter vector q for the moment) is f(Yt|St=i,St-1=j,Ht-1). Then the conditional joint probability density function of Yt, St, and St-1 (conditional on Ht-1) is:

f(Yt,St=i,St-1=j|Ht-1) = Pr(St=i,St-1=j|Ht-1) f(Yt|St=i,St-1=j,Ht-1)

and

f(Yt|Ht-1;q) = ∑i=1,2 ∑j=1,2 f(Yt,St=i,St-1=j|Ht-1;q)

where q is the parameter vector. Finally, the conditional log-likelihood function is:

ll(q) = åt=1,2,...,N ln f(Yt|Ht-1;q)

Hamilton's Algorithm

To evaluate the log-likelihood function ll(q), first we need to compute:

(1)     Pr(St=i,St-1=j|Ht-1) = Pr(St=i|St-1=j) Pr(St-1=j|Ht-1)

Starting at t=1, let Pr(S0=1|H0) = (1-q)/(2-p-q) and Pr(S0=2|H0) = (1-p)/(2-p-q). These are the steady-state or unconditional probabilities of St=1 and St=2, respectively.

From the computed Pr(St=i,St-1=j|Ht-1) and the given f(Yt|St=i,St-1=j,Ht-1), compute the joint pdfs f(Yt,St=i,St-1=j|Ht-1) for all combinations of i,j=1,2; the conditional likelihood at t is then:

(2)     f(Yt|Ht-1) = ∑i=1,2 ∑j=1,2 f(Yt,St=i,St-1=j|Ht-1)
= ∑i=1,2 ∑j=1,2 Pr(St=i,St-1=j|Ht-1) f(Yt|St=i,St-1=j,Ht-1)

By definition, Pr(St=i,St-1=j|Ht) = Pr(St=i,St-1=j|Yt,Ht-1)
= f(Yt,St=i,St-1=j|Ht-1) / f(Yt|Ht-1)
= f(Yt|St=i,St-1=j,Ht-1)Pr(St=i,St-1=j|Ht-1) / f(Yt|Ht-1)
and Pr(St=i|Ht) = ∑j=1,2 Pr(St=i,St-1=j|Ht)

To iterate from t=1 to N, use Pr(St=i|Ht) to compute Pr(St+1=k,St=i|Ht) as in (1) and the conditional likelihood f(Yt+1|Ht) as in (2). Finally, we sum the logs of the component likelihoods and maximize with respect to the parameter vector q.
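
In the simplest case with no lagged states in the density, only Pr(St=i|Ht-1) need be tracked and the filter reduces to the two steps above. A minimal Python sketch of this special case (mean-switching with a common variance; all names are illustrative, and the full AR version must carry the (St,St-1,...) combinations instead):

```python
import math

def normal_pdf(e, s2):
    # Normal density of a residual e with variance s2
    return math.exp(-0.5 * e * e / s2) / math.sqrt(2.0 * math.pi * s2)

def hamilton_filter(y, p, q, m1, m2, s2):
    # p = Pr(St=1|St-1=1), q = Pr(St=2|St-1=2); returns the log-likelihood
    # and the filtered probabilities Pr(St=1|Ht).
    pr1 = (1.0 - q) / (2.0 - p - q)   # start from the steady-state probability
    ll = 0.0
    filtered = []
    for yt in y:
        # step (1): prior probabilities Pr(St=i|Ht-1)
        prior1 = p * pr1 + (1.0 - q) * (1.0 - pr1)
        prior2 = 1.0 - prior1
        # step (2): conditional likelihood f(yt|Ht-1)
        f1 = prior1 * normal_pdf(yt - m1, s2)
        f2 = prior2 * normal_pdf(yt - m2, s2)
        f = f1 + f2
        ll += math.log(f)
        # update: filtered probability Pr(St=1|Ht)
        pr1 = f1 / f
        filtered.append(pr1)
    return ll, filtered
```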

The above algorithm may be generalized straightforwardly to consider M states (M>2) and L lags (L>1), in which case the probabilities and pdfs are evaluated over a large number of M^(L+1) cases.

For parameter estimation, we apply standard numerical maximization of the log-likelihood function for the entire sample. The alternative method is the EM (Expectation-Maximization) Algorithm.

To make inference about the state variable St, we can use the filtered probability Pr(St=i|Ht) or the smoothed probability Pr(St=i|HN). The latter refers to the probability of St=i conditional on all the information in the sample. The smoothed probability is obtained by updating the filtered probability using the full-sample information.
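
One standard way to update the filtered probabilities is Kim's backward recursion, which reweights each filtered probability by the transition probabilities and divides out the one-step-ahead prediction. A sketch for the 2-state no-lag case (the function name and inputs are illustrative; this is the textbook smoother, not code from these notes):

```python
def kim_smoother(filtered, p, q):
    # filtered[t] = Pr(St=1|Ht); p = Pr(1|1), q = Pr(2|2)
    n = len(filtered)
    sm = [0.0] * n
    sm[-1] = filtered[-1]   # Pr(SN=1|HN) is the last filtered probability
    for t in range(n - 2, -1, -1):
        f1 = filtered[t]
        # one-step-ahead (predicted) probabilities Pr(St+1=i|Ht)
        pred1 = p * f1 + (1.0 - q) * (1.0 - f1)
        pred2 = 1.0 - pred1
        # Pr(St=1|HN) = f1 * sum_i Pr(St+1=i|St=1) Pr(St+1=i|HN) / Pr(St+1=i|Ht)
        sm[t] = f1 * (p * sm[t + 1] / pred1
                      + (1.0 - p) * (1.0 - sm[t + 1]) / pred2)
    return sm
```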

Example: Hamilton's Model of Business Fluctuations

Define the growth in real GDP (Data) as follows:

Yt = 100*(ln(GDPt)-ln(GDPt-1))

Then the 2-state ("boom" and "bust", or expansion and recession) growth in real GDP is modeled as an AR(4) process:

Yt = m(i) + et, i=1,2
et = r1et-1 + r2et-2 + r3et-3 + r4et-4 + ut

where ut ~ normal(0,s2). By assuming the first-order Markov switching process, the transition probabilities between two consecutive states St and St-1 are Pr(St=1|St-1=1)=p and Pr(St=2|St-1=2)=q. By definition, Pr(St=1|St-1=2)=1-q and Pr(St=2|St-1=1)=1-p.

The model can be represented equivalently as:

(Yt-m(i)) = r1(Yt-1-m(j)) + r2(Yt-2-m(k)) + r3(Yt-3-m(l)) + r4(Yt-4-m(m)) + ut

with i,j,k,l,m = 1,2. In total, there are 32 (=2^(4+1)) cases of probabilities and pdfs forming the likelihood function for maximization with respect to the parameter vector. In addition to the model estimation, both filtered and smoothed probabilities are computed for inference about the state variable St: Pr(St|Ht) and Pr(St|HN) for each observation t. (See Program; requires the GPE2 application module MARKOV1.GPE [draft version].)

Example: Three-State Markov Switching Model of the Real Interest Rate

The Garcia and Perron [1996] switching regression model of the real interest rate is a 3-state AR(2) process defined as follows:

(Yt-m(i)) = r1(Yt-1-m(j)) + r2(Yt-2-m(k)) + ut
ut ~ normal(0,s2(i))

with i,j,k = 1,2,3, where Yt is the ex-post real interest rate calculated by subtracting the CPI-based inflation rate from the nominal interest rate (three-month treasury bill rate). See Data.

The transition probability for the 3-state first-order Markov process is defined by pij = Pr(St=i|St-1=j) > 0, where ∑i=1,2,3 pij = 1 for j=1,2,3.

In total, there are 27 (=3^(2+1)) cases of probabilities and density functions forming the likelihood function for maximization with respect to the 14 parameters of the model:

m(1), m(2), m(3)
r1, r2
s2(1), s2(2), s2(3)
p11, p12, p13, p21, p22, p23

Formulate and estimate the log-likelihood function for Garcia-Perron Model of real interest rate. (Program)


Copyright © Kuan-Pin Lin
Last updated: April 14, 2005