Regime Switching Regression Models

Readings and References:

Switching regression techniques are useful for studying random variables drawn from a mixture of probability distributions, structural change, and disequilibrium. The more recent development of Markov switching can be applied to dynamic time series models.

Consider the M-state (regime or structure) linear regression equation:

Yt = Xtb(i) + et(i)
et(i) ~ normal(0,s2(i))

where [Yt,Xt] is the observed sample data, t=1,2,...,N, and (i) is the state (regime) indicator, i=1,2,...,M. In particular, assuming a normal distribution, the conditional probability density function of Yt occurring in regime i is written as:

f(Yt;b(i),s2(i)) = 1/√(2πs2(i)) exp(-½(Yt-Xtb(i))²/s2(i))

In general, the regression parameters could change several times (that is, switching back and forth) within the M possible regimes. Typically, M=2.
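
As a concrete check, the regime-i density above can be evaluated directly. A minimal Python sketch (the function name and the vector arguments are illustrative, not part of the original notes):

```python
import math

def regime_density(y, x, b, s2):
    # Normal density of Yt in regime i: the mean is the regression part Xt b(i),
    # the variance is s2(i); x and b are sequences of the same length.
    mean = sum(xk * bk for xk, bk in zip(x, b))
    return math.exp(-0.5 * (y - mean) ** 2 / s2) / math.sqrt(2.0 * math.pi * s2)
```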

Deterministic Switching

If the sample separation is known at a given break point, then the dummy variable approach and the classical Chow test may be used to study the structural switch in the model. For example, well-known break points in economic time series are World War II (ended in 1945), the OPEC Oil Embargo (began in October 1973), and Black Monday (the stock market crash of Monday, October 19, 1987).

Dummy Variable Approach

Suppose the entire sample can be divided into M non-overlapping sub-samples N(i), i=1,2,...,M, with ∑i=1,2,...,M N(i) = N. Let Dit be the dummy variable defined by:

Dit = 1, if t in N(i)
0 otherwise

Since Dit is observed, given the conditional probability density function f(Yt;b(i),s2(i)) of Yt, the log-likelihood function for model estimation is:

ll(q) = ∑t=1,2,...,N ln ∑i=1,2,...,M Dit f(Yt;b(i),s2(i))

where q = (b(i),s2(i), i=1,2,...,M)' is the parameter vector of the model.
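
Since Dit selects exactly one regime for each observation, the inner sum collapses and the log-likelihood can be computed directly when the sample separation is known. A minimal Python sketch (the function names and the scalar-regressor setup are assumptions for illustration):

```python
import math

def normal_pdf(e, s2):
    # Normal density of a residual e with variance s2
    return math.exp(-0.5 * e * e / s2) / math.sqrt(2.0 * math.pi * s2)

def loglik_dummy(y, x, regime, b, s2):
    # regime[t] gives the (0-based) regime index i of observation t, so the
    # inner sum over i keeps only the single term with Dit = 1.
    ll = 0.0
    for t in range(len(y)):
        i = regime[t]
        ll += math.log(normal_pdf(y[t] - x[t] * b[i], s2[i]))
    return ll
```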

Stochastic Switching

If there is no a priori information on which to base the structure or regime classification, it is completely up to the state of nature to decide the regression structure. Let St be the discrete state variable which takes an index for the regime at time t. That is, St = 1,2,...,M. Using the dummy variable notation:

Dit = 1, if St = i (the regime is i at time t)
0 otherwise

Dit is a random discrete variable taking the value 1 or 0, which is not directly observed. Let Dit* be the latent variable for Dit such that Dit* > 0 if Dit = 1, and Dit* ≤ 0 otherwise (Dit = 0). We write:

Dit* = a(i) + ut

Or, in a more general framework, Dit* = Zta(i) + ut where Zt is a set of exogenous or predetermined variables determining the regime structure.

When Dit* > 0, ut > -a(i). Assuming ut ~ normal(0,1), the probability of Dit = 1 is represented by the standard normal density integrated from -∞ to a(i). The probability that St=i is then expressed as:

Pr(St=i;a(i)) = Pr(Dit=1) = Pr(Dit*>0) = Pr(ut>-a(i))
= ∫_{-∞}^{a(i)} 1/√(2π) exp(-½z²)dz
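
This cumulative normal probability can be computed from the error function available in most math libraries. A Python sketch (the function name is illustrative):

```python
import math

def std_normal_cdf(a):
    # Pr(St=i; a(i)) = integral of the standard normal density from -inf to a,
    # via the identity Phi(a) = (1 + erf(a/sqrt(2)))/2
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))
```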

Of course, ∑i=1,2,...,M Pr(St=i;a(i)) = 1. Then the conditional joint probability density function of Yt and St=i is

f(Yt,St=i;b(i),s2(i),a(i))
= Pr(St=i;a(i)) f(Yt|St=i;b(i),s2(i))
= ∫_{-∞}^{a(i)} 1/√(2π) exp(-½z²)dz [1/√(2πs2(i)) exp(-½(Yt-Xtb(i))²/s2(i))]

Let q(i) = (b(i),s2(i),a(i))' and q =(q(i), i=1,2,...,M)'. Then, the unconditional probability density function of Yt is

f(Yt;q) = ∑i=1,2,...,M f(Yt,St=i;q(i))

The resulting log-likelihood function is:

ll(q) = ∑t=1,2,...,N ln f(Yt;q)
= ∑t=1,2,...,N ln ∑i=1,2,...,M f(Yt,St=i;q(i))

As a by-product of the maximum likelihood estimation of the parameter vector q, the estimated conditional probability that observation t occurred in regime i is computed as follows:

Pr(St=i|Yt;q) = f(Yt,St=i;q(i)) / f(Yt;q)
= f(Yt,St=i;q(i)) / ∑j=1,2,...,M f(Yt,St=j;q(j))

Mixture of Probability Distributions Revisited

Recall the previous study of the INCOME variable drawn from a mixture of two normal probability distributions (Program, Data). That is, each observation Yt of the INCOME variable comes from a mixture of two normal distributions:

f(Yt|St=i;m(i),s2(i)) = 1/√(2πs2(i)) exp(-½(Yt-m(i))²/s2(i)) with probability Pr(St=i), i=1,2.

Let Pr(St=1)=p; then Pr(St=2) = 1-p. Here 0<p<1 must be estimated together with the parameters m(i) and s2(i) for i=1,2. The log-likelihood function is:

ll(p,m(1),m(2),s2(1),s2(2)) = ∑t=1,2,...,N ln [p f(Yt|St=1;m(1),s2(1)) + (1-p) f(Yt|St=2;m(2),s2(2))]

Finally, the conditional probability of INCOME variable Yt drawn from regime 1 is

p f(Yt|St=1;m(1),s2(1)) / [p f(Yt|St=1;m(1),s2(1)) + (1-p) f(Yt|St=2;m(2),s2(2))]
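
This likelihood can be maximized by the EM algorithm, whose E-step weight is exactly the conditional probability above. A minimal pure-Python sketch (the function names, starting values, and synthetic data are illustrative assumptions, not from the original notes):

```python
import math
import random

def normal_pdf(y, m, s2):
    # Normal density with mean m and variance s2
    return math.exp(-0.5 * (y - m) ** 2 / s2) / math.sqrt(2.0 * math.pi * s2)

def em_mixture(y, p, m1, m2, v1, v2, iters=200):
    # EM for the two-component normal mixture likelihood ll(p,m(1),m(2),s2(1),s2(2))
    for _ in range(iters):
        # E-step: conditional probability that each y_t was drawn from regime 1
        w = [p * normal_pdf(yt, m1, v1) /
             (p * normal_pdf(yt, m1, v1) + (1.0 - p) * normal_pdf(yt, m2, v2))
             for yt in y]
        # M-step: reweighted mixing probability, means, and variances
        sw = sum(w)
        p = sw / len(y)
        m1 = sum(wt * yt for wt, yt in zip(w, y)) / sw
        m2 = sum((1.0 - wt) * yt for wt, yt in zip(w, y)) / (len(y) - sw)
        v1 = sum(wt * (yt - m1) ** 2 for wt, yt in zip(w, y)) / sw
        v2 = sum((1.0 - wt) * (yt - m2) ** 2 for wt, yt in zip(w, y)) / (len(y) - sw)
    return p, m1, m2, v1, v2
```

Starting from rough guesses, the iterations settle on the component means, variances, and mixing probability; the final E-step weights are the conditional regime-1 probabilities for each observation.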

Markov Switching

If the switching is state-dependent (as in most cases of dynamic models), we assume the simplest case of first-order Markov process for the state indicator St with the following transition probability:

Pr(St=i|St-1=j) = pij > 0
∑i=1,2,...,M pij = 1 for each j

Then the joint probability of St=i and St-1=j is defined by

Pr(St=i,St-1=j) = Pr(St=i|St-1=j)Pr(St-1=j) = pij Pr(St-1=j)
and Pr(St=i) = ∑j=1,2,...,M Pr(St=i,St-1=j)
= ∑j=1,2,...,M pij Pr(St-1=j)

For a two-state Markov switching process (M=2), we write:

p = p11 = Pr(St=1|St-1=1)     1-p = p21 = Pr(St=2|St-1=1)
q = p22 = Pr(St=2|St-1=2)     1-q = p12 = Pr(St=1|St-1=2)

Then,

Pr(St=1) = p Pr(St-1=1) + (1-q) Pr(St-1=2)
Pr(St=2) = (1-p) Pr(St-1=1) + q Pr(St-1=2)

The probabilities of a steady state (that is, S = St = St-1) solve Pr(S=1) = p Pr(S=1) + (1-q) Pr(S=2) together with Pr(S=1) + Pr(S=2) = 1:

Pr(S=1) = (1-q)/(2-p-q)
Pr(S=2) = (1-p)/(2-p-q)
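
The steady state can be checked numerically by iterating the transition equation for Pr(St=1) to its fixed point (a small Python sketch; the fixed point is independent of the starting value when 0 < p,q < 1):

```python
def steady_state(p, q, iters=500):
    # Iterate Pr(St=1) = p Pr(St-1=1) + (1-q) Pr(St-1=2) to convergence
    pr1 = 0.5
    for _ in range(iters):
        pr1 = p * pr1 + (1.0 - q) * (1.0 - pr1)
    return pr1, 1.0 - pr1

# e.g., steady_state(0.9, 0.8) converges to (2/3, 1/3)
```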

Let Pr(St=1) = Pr(St*≤0) and Pr(St=2) = Pr(St*>0), where St* is a latent variable for the discrete state indicator St defined by:

St* = a0 + a1 (St-1-1) + ut, ut ~ normal(0,1)

Therefore,

p = Pr(St=1|St-1=1) = Pr(St*≤0|St-1=1) = Pr(ut≤-a0)
= ∫_{-∞}^{-a0} 1/√(2π) exp(-½z²)dz
q = Pr(St=2|St-1=2) = Pr(St*>0|St-1=2) = Pr(ut>-a0-a1)
= ∫_{-a0-a1}^{∞} 1/√(2π) exp(-½z²)dz
= 1 - ∫_{-∞}^{-a0-a1} 1/√(2π) exp(-½z²)dz

It is clear that through this probability transformation, 0 < p < 1 and 0 < q < 1. An alternative approach to ensure that 0 < p < 1 and 0 < q < 1 is the logistic transformation:

p = exp(p0)/(1+exp(p0))
q = exp(q0)/(1+exp(q0))

where p0 and q0 are unrestricted real numbers.

Dynamic Model

We are interested in a dynamic model of Yt resulting from a combination of M states and L lags. Then there are r cases or events (r = M^(L+1)) that must be considered: (St,St-1,...,St-L). For simplicity, we consider a 2-state 1-lag dynamic model. There are four cases: (St=1,St-1=1), (St=2,St-1=1), (St=1,St-1=2), (St=2,St-1=2).

Let Ht-1 = {Yt-1,Yt-2,...} be the historical information up to time t-1. Suppose the conditional probability density function of Yt (conditional on St=i, St-1=j, and Ht-1, ignoring the parameter vector q for the moment) is f(Yt|St=i,St-1=j,Ht-1). Then the conditional joint probability density function of Yt, St, and St-1 (conditional on Ht-1) is:

f(Yt,St=i,St-1=j|Ht-1) = Pr(St=i,St-1=j|Ht-1) f(Yt|St=i,St-1=j,Ht-1)

and

f(Yt|Ht-1;q) = ∑i=1,2 ∑j=1,2 f(Yt,St=i,St-1=j|Ht-1;q)

where q is the parameter vector. Finally, the conditional log-likelihood function is:

ll(q) = åt=1,2,...,N ln f(Yt|Ht-1;q)

Hamilton's Algorithm

To evaluate the log-likelihood function ll(q), first we need to compute:

(1)     Pr(St=i,St-1=j|Ht-1) = Pr(St=i|St-1=j) Pr(St-1=j|Ht-1)

Starting at t=1, let Pr(S0=1|H0) = (1-q)/(2-p-q) and Pr(S0=2|H0) = (1-p)/(2-p-q). These are the steady-state or unconditional probabilities of St=1 and St=2, respectively.

From the computed Pr(St=i,St-1=j|Ht-1) and the given f(Yt|St=i,St-1=j,Ht-1), compute the joint pdfs f(Yt,St=i,St-1=j|Ht-1) for all combinations of i,j=1,2; the conditional likelihood at t is then:

(2)     f(Yt|Ht-1) = ∑i=1,2 ∑j=1,2 f(Yt,St=i,St-1=j|Ht-1)
= ∑i=1,2 ∑j=1,2 Pr(St=i,St-1=j|Ht-1) f(Yt|St=i,St-1=j,Ht-1)

By definition, Pr(St=i,St-1=j|Ht) = Pr(St=i,St-1=j|Yt,Ht-1)
= f(Yt,St=i,St-1=j|Ht-1) / f(Yt|Ht-1)
= f(Yt|St=i,St-1=j,Ht-1)Pr(St=i,St-1=j|Ht-1) / f(Yt|Ht-1)
and Pr(St=i|Ht) = ∑j=1,2 Pr(St=i,St-1=j|Ht)

To iterate from t=1 to N, use Pr(St=i|Ht) to compute Pr(St+1=k,St=i|Ht) as in (1) and the conditional likelihood f(Yt+1|Ht) as in (2). Finally, we sum the logs of the component likelihoods and maximize with respect to the parameter vector q.
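
In the simplest case with no lagged states in the density, only Pr(St=i|Ht-1) need be tracked and the filter reduces to the two steps above. A minimal Python sketch of this special case (mean-switching with a common variance; all names are illustrative, and the full AR version must carry the (St,St-1,...) combinations instead):

```python
import math

def normal_pdf(e, s2):
    # Normal density of a residual e with variance s2
    return math.exp(-0.5 * e * e / s2) / math.sqrt(2.0 * math.pi * s2)

def hamilton_filter(y, p, q, m1, m2, s2):
    # p = Pr(St=1|St-1=1), q = Pr(St=2|St-1=2); returns the log-likelihood
    # and the filtered probabilities Pr(St=1|Ht).
    pr1 = (1.0 - q) / (2.0 - p - q)   # start from the steady-state probability
    ll = 0.0
    filtered = []
    for yt in y:
        # step (1): prior probabilities Pr(St=i|Ht-1)
        prior1 = p * pr1 + (1.0 - q) * (1.0 - pr1)
        prior2 = 1.0 - prior1
        # step (2): conditional likelihood f(yt|Ht-1)
        f1 = prior1 * normal_pdf(yt - m1, s2)
        f2 = prior2 * normal_pdf(yt - m2, s2)
        f = f1 + f2
        ll += math.log(f)
        # update: filtered probability Pr(St=1|Ht)
        pr1 = f1 / f
        filtered.append(pr1)
    return ll, filtered
```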

The above algorithm may be generalized straightforwardly to consider M states (M>2) and L lags (L>1), in which case the probabilities and pdfs are evaluated over a large number of M^(L+1) cases.

For parameter estimation, we apply standard numerical maximization of the log-likelihood function for the entire sample. The alternative method is the EM (Expectation-Maximization) Algorithm.

To make inference about the state variable St, we can use the filtered probability Pr(St=i|Ht) or the smoothed probability Pr(St=i|HN). The latter refers to the probability of St=i conditional on all the information in the sample. The smoothed probability is obtained by updating the filtered probability using the full-sample information.
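
One standard way to update the filtered probabilities is Kim's backward recursion, which reweights each filtered probability by the transition probabilities and divides out the one-step-ahead prediction. A sketch for the 2-state no-lag case (the function name and inputs are illustrative; this is the textbook smoother, not code from these notes):

```python
def kim_smoother(filtered, p, q):
    # filtered[t] = Pr(St=1|Ht); p = Pr(1|1), q = Pr(2|2)
    n = len(filtered)
    sm = [0.0] * n
    sm[-1] = filtered[-1]   # Pr(SN=1|HN) is the last filtered probability
    for t in range(n - 2, -1, -1):
        f1 = filtered[t]
        # one-step-ahead (predicted) probabilities Pr(St+1=i|Ht)
        pred1 = p * f1 + (1.0 - q) * (1.0 - f1)
        pred2 = 1.0 - pred1
        # Pr(St=1|HN) = f1 * sum_i Pr(St+1=i|St=1) Pr(St+1=i|HN) / Pr(St+1=i|Ht)
        sm[t] = f1 * (p * sm[t + 1] / pred1
                      + (1.0 - p) * (1.0 - sm[t + 1]) / pred2)
    return sm
```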

Example: Hamilton's Model of Business Fluctuations

Define the growth in real GDP (Data) as follows:

Yt = 100*(ln(GDPt)-ln(GDPt-1))

Then the 2-state ("boom" and "bust", or expansion and recession) growth in real GDP is modeled as an AR(4) process:

Yt = m(i) + et, i=1,2
et = r1et-1 + r2et-2 + r3et-3 + r4et-4 + ut

where ut ~ normal(0,s2). By assuming the first-order Markov switching process, the transition probabilities between two consecutive states St and St-1 are Pr(St=1|St-1=1)=p and Pr(St=2|St-1=2)=q. By definition, Pr(St=1|St-1=2)=1-q and Pr(St=2|St-1=1)=1-p.

The model can be represented equivalently as:

(Yt-m(i)) = r1(Yt-1-m(j)) + r2(Yt-2-m(k)) + r3(Yt-3-m(l)) + r4(Yt-4-m(m)) + ut

with i,j,k,l,m = 1,2. In total, there are 32 (=2^(4+1)) cases of probabilities and pdfs forming the likelihood function for maximization with respect to the parameter vector. In addition to the model estimation, both filtered and smoothed probabilities are computed for inference about the state variable St: Pr(St|Ht) and Pr(St|HN) for each observation t. (See Program; requires the GPE2 application module MARKOV1.GPE [draft version].)

Example: Three-State Markov Switching Model of the Real Interest Rate

The Garcia and Perron [1996] switching regression model of the real interest rate is a 3-state AR(2) process defined as follows:

(Yt-m(i)) = r1(Yt-1-m(j)) + r2(Yt-2-m(k)) + ut
ut ~ normal(0,s2(i))

with i,j,k = 1,2,3, where Yt is the ex-post real interest rate calculated by subtracting the CPI-based inflation rate from the nominal interest rate (three-month treasury bill rate). See Data.

The transition probability for the 3-state first-order Markov process is defined by pij = Pr(St=i|St-1=j) > 0, where ∑i=1,2,3 pij = 1 for j=1,2,3.

In total, there are 27 (=3^(2+1)) cases of probabilities and density functions forming the likelihood function for maximization with respect to the 14 parameters of the model:

m(1), m(2), m(3)
r1, r2
s2(1), s2(2), s2(3)
p11, p12, p13, p21, p22, p23

Formulate and estimate the log-likelihood function for Garcia-Perron Model of real interest rate. (Program)


Copyright © Kuan-Pin Lin
Last updated: April 14, 2005