Least Squares Prediction

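The Classical Regression Model

$Y = X\beta + \varepsilon$, with $\varepsilon|X \sim \mathrm{Normal}(0, \sigma^2 I)$, where Y is N×1 and X is N×K (i = 1, 2, ..., N).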

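The Estimated Model

$Y = Xb + e$, where $b = (X'X)^{-1}X'Y$ is the least squares estimator of β and $s^2 = e'e/(N-K)$ is the unbiased estimator of σ².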

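The Predicted Model

$Y_{pi} = X_i b$ (i = N+1, N+2, ...), where $X_i$ is the 1×K row of regressors for observation i, so that $E(Y_{pi}) = X_i E(b) = X_i\beta$ and $\mathrm{Var}(Y_{pi}) = X_i \mathrm{Var}(b) X_i'$.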

That is, $Y_{pi}$ is an unbiased predictor of $E(Y_i) = X_i\beta$, with a variance that differs across observations: $\mathrm{Var}(Y_{pi}) = \sigma^2 X_i(X'X)^{-1}X_i'$.
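For a concrete sense of these formulas, here is a minimal pure-Python sketch for the special case K = 2 (a constant and one regressor x). It computes b, s², the point prediction $Y_{pi} = X_i b$ and the estimated variance $s^2 X_i(X'X)^{-1}X_i'$, with σ² replaced by its estimate s². The function names and the data are hypothetical and chosen only for illustration; a general K would call for a linear-algebra library.

```python
def ols_simple(x, y):
    """OLS fit of Y = b0 + b1*x + e, i.e. K = 2 columns in X (constant, x).

    Returns b = (b0, b1), s2 = e'e / (N - K), and (X'X)^{-1} as a 2x2 nested list.
    """
    n, k = len(x), 2
    sx = sum(x)
    sxx = sum(v * v for v in x)
    sy = sum(y)
    sxy = sum(a * c for a, c in zip(x, y))
    det = n * sxx - sx * sx  # determinant of X'X = [[n, sx], [sx, sxx]]
    xtx_inv = [[sxx / det, -sx / det],
               [-sx / det, n / det]]
    # b = (X'X)^{-1} X'Y, where X'Y = (sy, sxy)
    b0 = xtx_inv[0][0] * sy + xtx_inv[0][1] * sxy
    b1 = xtx_inv[1][0] * sy + xtx_inv[1][1] * sxy
    residuals = [c - (b0 + b1 * a) for a, c in zip(x, y)]
    s2 = sum(e * e for e in residuals) / (n - k)  # unbiased estimator of sigma^2
    return (b0, b1), s2, xtx_inv


def quad_form(xtx_inv, x0):
    """X_i (X'X)^{-1} X_i' for the row vector X_i = (1, x0)."""
    row = (1.0, x0)
    return sum(row[r] * xtx_inv[r][c] * row[c] for r in range(2) for c in range(2))


def predict(b, s2, xtx_inv, x0):
    """Return Y_pi = X_i b and the estimated Var(Y_pi) = s^2 X_i (X'X)^{-1} X_i'."""
    y_p = b[0] + b[1] * x0
    return y_p, s2 * quad_form(xtx_inv, x0)


# Hypothetical illustrative data (N = 8) and a new point x0 = 10
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 14.8, 17.1]
b, s2, xtx_inv = ols_simple(x, y)
y_p, var_yp = predict(b, s2, xtx_inv, 10.0)
print(b, s2, y_p, var_yp)
```

With one regressor the quadratic form reduces to $1/N + (x_0 - \bar{x})^2/\sum(x - \bar{x})^2$, so the prediction variance is smallest at the sample mean of x and grows as $x_0$ moves away from it.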

If Xi is known for i=N+1,N+2,..., then Ypi = Xib is called the unconditional forecast of Yi. If Xi is not known and must be forecasted first, then Ypi = Xib is called the conditional forecast of Yi.

If Yi is known for i=N+1,N+2,..., then Ypi = Xib is called the ex-post forecast of Yi. If Yi is not known, then Ypi = Xib is called the ex-ante forecast of Yi.

For ex-post forecasts Ypi, i=N+1,N+2,..., the difference Yi-Ypi is called the forecast error. That is:

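$$
\begin{aligned}
e_i &= Y_i - Y_{pi} \qquad (i = N+1, N+2, \ldots) \\
    &= [Y_i - E(Y_i)] + [E(Y_i) - Y_{pi}] \\
    &= [Y_i - E(Y_i)] + [E(Y_{pi}) - Y_{pi}] \\
    &= \underbrace{\varepsilon_i}_{(1)} + \underbrace{X_i(\beta - b)}_{(2)}
\end{aligned}
$$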

Sources of Forecast Error

  1. Forecast error due to population error: the disturbance εi, term (1)
  2. Forecast error due to sampling error: Xi(β-b), term (2)
    This may also include model misspecification and, in a conditional forecast, the forecast error of Xi.

Probability Distribution of Forecast Error

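For i=N+1,N+2,...:
$E(e_i) = 0$
$\mathrm{Var}(e_i) = \mathrm{Var}(\varepsilon_i) + \mathrm{Var}(Y_{pi}) = \sigma^2[1 + X_i(X'X)^{-1}X_i']$

The two variances add because the out-of-sample disturbance $\varepsilon_i$ is independent of b, which depends only on the sample disturbances $\varepsilon_1, \ldots, \varepsilon_N$.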
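The mean and variance results can be checked by simulation. The sketch below assumes hypothetical true values β = (2, 3), σ = 2, the design x = 1, ..., 8 and a new point x0 = 10 (K = 2 again). It redraws the estimation sample many times, draws an independent out-of-sample disturbance each time, and compares the sample mean and variance of the forecast errors with 0 and $\sigma^2[1 + X_i(X'X)^{-1}X_i']$. It is an illustrative check, not a proof.

```python
import random
import statistics


def simulate_forecast_errors(x, x0, beta, sigma, reps, seed=0):
    """Draw `reps` samples from Y = beta0 + beta1*x + eps (hypothetical true model),
    fit OLS on each, and return the forecast errors e_i = Y_i - X_i b at x0."""
    rng = random.Random(seed)
    n = len(x)
    sx = sum(x)
    sxx = sum(v * v for v in x)
    det = n * sxx - sx * sx
    errors = []
    for _ in range(reps):
        y = [beta[0] + beta[1] * v + rng.gauss(0.0, sigma) for v in x]
        sy = sum(y)
        sxy = sum(a * c for a, c in zip(x, y))
        b0 = (sxx * sy - sx * sxy) / det
        b1 = (n * sxy - sx * sy) / det
        # The out-of-sample disturbance is drawn independently of the estimation sample.
        y_new = beta[0] + beta[1] * x0 + rng.gauss(0.0, sigma)
        errors.append(y_new - (b0 + b1 * x0))
    return errors


def theoretical_forecast_variance(x, x0, sigma):
    """sigma^2 [1 + X_i (X'X)^{-1} X_i'] for the row X_i = (1, x0)."""
    n = len(x)
    sx = sum(x)
    sxx = sum(v * v for v in x)
    det = n * sxx - sx * sx
    quad = (sxx - 2.0 * x0 * sx + n * x0 * x0) / det
    return sigma ** 2 * (1.0 + quad)


x = [1, 2, 3, 4, 5, 6, 7, 8]
errs = simulate_forecast_errors(x, 10.0, beta=(2.0, 3.0), sigma=2.0, reps=20000)
print(statistics.fmean(errs), statistics.pvariance(errs),
      theoretical_forecast_variance(x, 10.0, 2.0))
```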

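By the normality assumption $\varepsilon|X \sim \mathrm{Normal}(0, \sigma^2 I)$, we have $e_i \sim \mathrm{Normal}(0, \sigma_i^2)$, where $\sigma_i^2 = \sigma^2[1 + X_i(X'X)^{-1}X_i']$, for i=N+1,N+2,...

Therefore, $e_i/\sigma_i = (Y_i - Y_{pi})/(\sigma^2[1 + X_i(X'X)^{-1}X_i'])^{1/2} \sim \mathrm{Normal}(0,1)$.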

Substituting the unbiased sample estimator $s^2$ for the unknown $\sigma^2$, we have
$e_i/s_i = (Y_i - Y_{pi})/(s^2[1 + X_i(X'X)^{-1}X_i'])^{1/2} \sim t(N-K)$.

Confidence Interval of Forecasts

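For i=N+1,N+2,..., given a level of significance α > 0,
$\Pr[-t_{\alpha/2} \le (Y_i - Y_{pi})/s_i \le t_{\alpha/2}] = 1-\alpha$, or
$\Pr[Y_{pi} - t_{\alpha/2}s_i \le Y_i \le Y_{pi} + t_{\alpha/2}s_i] = 1-\alpha$,

where
$t_{\alpha/2}$ = upper α/2 critical value of the t(N-K) distribution
$s_i = (s^2[1 + X_i(X'X)^{-1}X_i'])^{1/2}$
$Y_{pi} - t_{\alpha/2}s_i$ = lower bound of forecast
$Y_{pi} + t_{\alpha/2}s_i$ = upper bound of forecast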
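A minimal sketch of the interval and of the standardized ex-post forecast error $(Y_i - Y_{pi})/s_i$ follows. The critical value is passed in rather than computed, since Python's standard library has no t quantile function; the inputs in the usage line are hypothetical.

```python
import math


def forecast_interval(y_p, s2, quad, t_crit):
    """Interval Y_pi -/+ t_{alpha/2} s_i with s_i = (s^2 [1 + X_i (X'X)^{-1} X_i'])^{1/2}.

    `quad` is X_i (X'X)^{-1} X_i'; `t_crit` is the upper alpha/2 critical value
    of t(N - K), supplied by the caller (for example from a t table).
    """
    s_i = math.sqrt(s2 * (1.0 + quad))
    return y_p - t_crit * s_i, y_p + t_crit * s_i


def t_ratio(y_actual, y_p, s2, quad):
    """Standardized ex-post forecast error (Y_i - Y_pi) / s_i, distributed t(N - K)."""
    return (y_actual - y_p) / math.sqrt(s2 * (1.0 + quad))


# Hypothetical inputs: N - K = 6 degrees of freedom, alpha = 0.05, t_{0.025}(6) ≈ 2.447
lower, upper = forecast_interval(y_p=32.0, s2=0.04, quad=0.845, t_crit=2.447)
print(lower, upper)
```

If SciPy is available, `scipy.stats.t.ppf(1 - alpha / 2, N - K)` returns the same critical value. An ex-post actual $Y_i$ lies outside the interval exactly when the absolute t ratio exceeds `t_crit`.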

Forecasting with a Log-Normal Model

If the classical regression model is represented in the log form as:

ln(Y) = Xβ + ε

The normality assumption of ε implies that Y is log-normal.

Let $Z = \ln(Y)$, or $Y = e^Z = \exp(Z)$. Then:
$E(Y) = E(\exp(Z)) = \exp(X\beta + \sigma^2/2)$
$\mathrm{Var}(Y) = \mathrm{Var}(\exp(Z)) = \exp(2X\beta + \sigma^2)[\exp(\sigma^2) - 1]$

As N → ∞, Y ~ Normal(E(Y),Var(Y)).
Note: Median(Y) = exp(Xβ) < E(Y).
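The moment formulas can be checked numerically. This sketch, with a hypothetical μ = 1 standing in for Xβ and σ = 0.5, compares the closed-form mean, median and variance of Y = exp(Z) with simulated draws; the mean exceeds the median, as noted above.

```python
import math
import random
import statistics


def lognormal_moments(mu, sigma):
    """Mean, median and variance of Y = exp(Z) with Z ~ Normal(mu, sigma^2)."""
    mean = math.exp(mu + sigma ** 2 / 2)
    median = math.exp(mu)
    var = math.exp(2 * mu + sigma ** 2) * (math.exp(sigma ** 2) - 1)
    return mean, median, var


def simulate_lognormal(mu, sigma, n, seed=1):
    """Draw n values of exp(Z), Z ~ Normal(mu, sigma^2)."""
    rng = random.Random(seed)
    return [math.exp(rng.gauss(mu, sigma)) for _ in range(n)]


# mu plays the role of X*beta for one observation (hypothetical values)
mean, median, var = lognormal_moments(mu=1.0, sigma=0.5)
draws = simulate_lognormal(1.0, 0.5, 200_000)
print(mean, statistics.fmean(draws))
print(median, statistics.median(draws))
print(var, statistics.pvariance(draws))
```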

The Estimated Model

ln(Yi) = Xib + ei (i=1,2,...,N)

The Predicted Model

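The forecast error is $e_i = \ln(Y_i) - \ln(Y_{pi})$ (i=N+1,N+2,...), where $\ln(Y_{pi}) = X_i b$. Given a level of significance α > 0, the confidence interval of forecast of $\ln(Y_i)$ is defined by:

$\Pr[\ln(Y_{pi}) - t_{\alpha/2}s_i \le \ln(Y_i) \le \ln(Y_{pi}) + t_{\alpha/2}s_i] = 1-\alpha$

with $s_i = (s^2[1 + X_i(X'X)^{-1}X_i'])^{1/2}$ computed from the log regression.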

Unlogged Forecast

If the forecast of Yi (i=N+1,N+2,...) is obtained from the exponential (anti-log) transformation:

Ypi = exp(ln(Ypi)) = exp(Xib)

$Y_{pi}$ is a median-unbiased predictor of the median of $Y_i$, $\exp(X_i\beta)$, which is always below the mean $E(Y_i) = \exp(X_i\beta + \sigma^2/2)$. Therefore, a variance-corrected predictor of $E(Y_i)$ is:

$Y_{pi} = \exp(X_i b + s_i^2/2)$, with the variance
$\mathrm{Var}(Y_{pi}) = \exp(2X_i b + s_i^2)[\exp(s_i^2) - 1]$

As N → ∞, Ypi ~ Normal(E(Ypi),Var(Ypi)).
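The sketch below applies the back-transformation to a hypothetical log-scale forecast $X_i b = 1$ with $s_i^2 = 0.25$, returning the naive forecast $\exp(X_i b)$, the variance-corrected forecast $\exp(X_i b + s_i^2/2)$ and its variance as given above. Function and argument names are illustrative.

```python
import math


def unlogged_forecast(log_pred, s2_i):
    """Back-transform a log-scale forecast ln(Y_pi) = X_i b with log-scale variance s_i^2.

    Returns (naive, corrected, variance):
      naive     = exp(X_i b)              -> targets the median of Y_i
      corrected = exp(X_i b + s_i^2 / 2)  -> variance-corrected predictor of E(Y_i)
      variance  = exp(2 X_i b + s_i^2) [exp(s_i^2) - 1]
    """
    naive = math.exp(log_pred)
    corrected = math.exp(log_pred + s2_i / 2)
    variance = math.exp(2 * log_pred + s2_i) * (math.exp(s2_i) - 1)
    return naive, corrected, variance


# Hypothetical log-scale forecast X_i b = 1.0 with s_i^2 = 0.25
naive, corrected, variance = unlogged_forecast(log_pred=1.0, s2_i=0.25)
print(naive, corrected, variance)
```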

Given a level of significance α > 0, the confidence interval of forecast of Yi is defined by:

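$\Pr[Y_{pi} - z_{\alpha/2}s_i \le Y_i \le Y_{pi} + z_{\alpha/2}s_i] = 1-\alpha$

where $z_{\alpha/2}$ is the upper α/2 critical value of the standard normal distribution and $s_i = \mathrm{Var}(Y_{pi})^{1/2}$. Here $s_i$ is the standard error of the unlogged forecast, not the log-scale $s_i$ used in the log interval.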
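A minimal sketch of this normal-approximation interval, using `statistics.NormalDist().inv_cdf` from the Python standard library for $z_{\alpha/2}$. The inputs are hypothetical: the corrected forecast and its variance for a log-scale forecast $X_i b = 1$ with $s_i^2 = 0.25$.

```python
import math
from statistics import NormalDist


def unlogged_interval(y_p, var_yp, alpha=0.05):
    """Normal-approximation interval Y_pi -/+ z_{alpha/2} s_i with s_i = Var(Y_pi)^{1/2}."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 critical value
    s_i = math.sqrt(var_yp)
    return y_p - z * s_i, y_p + z * s_i


# Hypothetical inputs from X_i b = 1.0, s_i^2 = 0.25
y_p = math.exp(1.125)
var_yp = math.exp(2.25) * (math.exp(0.25) - 1)
print(unlogged_interval(y_p, var_yp))
```

Because the interval is symmetric around $Y_{pi}$, its lower bound can fall below zero even though $Y_i > 0$; with these particular inputs it does (about -0.14).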


Model Evaluation: Forecast Error Statistics

Model Estimation: N Estimation Periods, 1,2,...,N
Ex-Post Forecasting: F Forecasting Periods, N+1,N+2,...,N+F

Forecast performance is evaluated by comparing the actual values with the predicted values:
$Y_{N+1}, Y_{N+2}, \ldots, Y_{N+F}$
$Y_{p,N+1}, Y_{p,N+2}, \ldots, Y_{p,N+F}$

Let
$Y_m = \sum_{i=N+1}^{N+F} Y_i/F$
$Y_{pm} = \sum_{i=N+1}^{N+F} Y_{pi}/F$
$\sigma_Y^2 = \sum_{i=N+1}^{N+F} (Y_i - Y_m)^2/F$
$\sigma_{Y_p}^2 = \sum_{i=N+1}^{N+F} (Y_{pi} - Y_{pm})^2/F$
$\sigma_{Y,Y_p} = \sum_{i=N+1}^{N+F} (Y_i - Y_m)(Y_{pi} - Y_{pm})/F$
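These quantities are straightforward to compute. The sketch below divides by F, as in the definitions above (not F - 1), and uses a hypothetical set of F = 4 actuals and predictions.

```python
def forecast_moments(actual, predicted):
    """Means, variances and covariance of actuals and predictions, all divided by F."""
    f = len(actual)
    ym = sum(actual) / f
    ypm = sum(predicted) / f
    var_y = sum((a - ym) ** 2 for a in actual) / f
    var_yp = sum((p - ypm) ** 2 for p in predicted) / f
    cov = sum((a - ym) * (p - ypm) for a, p in zip(actual, predicted)) / f
    return ym, ypm, var_y, var_yp, cov


# Hypothetical ex-post comparison over F = 4 forecasting periods
actual = [10.0, 12.0, 11.0, 13.0]
predicted = [9.5, 12.5, 11.5, 12.0]
print(forecast_moments(actual, predicted))
```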

Forecast Error Statistics

Components of MSE

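$$
\begin{aligned}
\mathrm{MSE} &= \sum_{i=N+1}^{N+F} (Y_i - Y_{pi})^2/F \\
&= \sum_{i=N+1}^{N+F} [(Y_i - Y_m) + (Y_m - Y_{pm}) - (Y_{pi} - Y_{pm})]^2/F \\
&= (Y_m - Y_{pm})^2 + \sum_{i=N+1}^{N+F} (Y_i - Y_m)^2/F + \sum_{i=N+1}^{N+F} (Y_{pi} - Y_{pm})^2/F - 2\sum_{i=N+1}^{N+F} (Y_i - Y_m)(Y_{pi} - Y_{pm})/F \\
&= (Y_m - Y_{pm})^2 + \sigma_Y^2 + \sigma_{Y_p}^2 - 2r\sigma_Y\sigma_{Y_p} \\
&= \underbrace{(Y_m - Y_{pm})^2}_{\text{Bias}} + \underbrace{(\sigma_Y - \sigma_{Y_p})^2}_{\text{Variance}} + \underbrace{2(1-r)\sigma_Y\sigma_{Y_p}}_{\text{Covariance}} \\
&= \underbrace{(Y_m - Y_{pm})^2}_{\text{Bias}} + \underbrace{(\sigma_{Y_p} - r\sigma_Y)^2}_{\text{Regression}} + \underbrace{(1-r^2)\sigma_Y^2}_{\text{Disturbance}}
\end{aligned}
$$

The cross-product terms involving $(Y_m - Y_{pm})$ vanish because deviations from a mean sum to zero, and $\sigma_{Y,Y_p} = r\sigma_Y\sigma_{Y_p}$, where r is the correlation coefficient between $Y_i$ and $Y_{pi}$.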

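Dividing each component by MSE gives the proportions $U^M = (Y_m - Y_{pm})^2/\mathrm{MSE}$, $U^S = (\sigma_Y - \sigma_{Y_p})^2/\mathrm{MSE}$, $U^C = 2(1-r)\sigma_Y\sigma_{Y_p}/\mathrm{MSE}$, $U^R = (\sigma_{Y_p} - r\sigma_Y)^2/\mathrm{MSE}$ and $U^D = (1-r^2)\sigma_Y^2/\mathrm{MSE}$. Then $U^M + U^S + U^C = 1$ and $U^M + U^R + U^D = 1$. Ideally, $U^M \to 0$ and $U^S$ and $U^R$ are small for a good forecast.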
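The decomposition can be sketched as follows, again on hypothetical F = 4 data. Dividing each component by MSE gives two sets of proportions (bias/variance/covariance and bias/regression/disturbance), each of which should sum to one; the dictionary keys simply mirror the UM, US, UC, UR, UD labels.

```python
import math


def mse_decomposition(actual, predicted):
    """Theil's decomposition of MSE; all moments are divided by F."""
    f = len(actual)
    ym = sum(actual) / f
    ypm = sum(predicted) / f
    sd_y = math.sqrt(sum((a - ym) ** 2 for a in actual) / f)
    sd_yp = math.sqrt(sum((p - ypm) ** 2 for p in predicted) / f)
    cov = sum((a - ym) * (p - ypm) for a, p in zip(actual, predicted)) / f
    r = cov / (sd_y * sd_yp)  # correlation between actuals and predictions
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / f
    bias = (ym - ypm) ** 2
    return {
        "MSE": mse,
        "UM": bias / mse,                                # bias proportion
        "US": (sd_y - sd_yp) ** 2 / mse,                 # variance proportion
        "UC": 2 * (1 - r) * sd_y * sd_yp / mse,          # covariance proportion
        "UR": (sd_yp - r * sd_y) ** 2 / mse,             # regression proportion
        "UD": (1 - r ** 2) * sd_y ** 2 / mse,            # disturbance proportion
    }


# Hypothetical data, F = 4
u = mse_decomposition([10.0, 12.0, 11.0, 13.0], [9.5, 12.5, 11.5, 12.0])
print(u)
```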

Theil Inequality Coefficient

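Theil U Statistic is defined by

$$
U = \frac{\left(\sum_{i=N+1}^{N+F} (Y_i - Y_{pi})^2/F\right)^{1/2}}{\left(\sum_{i=N+1}^{N+F} Y_i^2/F\right)^{1/2} + \left(\sum_{i=N+1}^{N+F} Y_{pi}^2/F\right)^{1/2}}
  = \frac{\mathrm{RMSE}}{\left(\sum_{i=N+1}^{N+F} Y_i^2/F\right)^{1/2} + \left(\sum_{i=N+1}^{N+F} Y_{pi}^2/F\right)^{1/2}}
$$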
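A direct sketch of the Theil U computation follows. U equals 0 for a perfect forecast, and by the triangle inequality it cannot exceed 1; the usage line applies it to hypothetical F = 4 data.

```python
import math


def theil_u(actual, predicted):
    """Theil inequality coefficient: RMSE / (RMS of actuals + RMS of predictions)."""
    f = len(actual)
    rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / f)
    rms_y = math.sqrt(sum(a * a for a in actual) / f)
    rms_yp = math.sqrt(sum(p * p for p in predicted) / f)
    return rmse / (rms_y + rms_yp)


# Hypothetical data, F = 4
print(theil_u([10.0, 12.0, 11.0, 13.0], [9.5, 12.5, 11.5, 12.0]))
```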

Examples: U. S. GDP


Copyright © Kuan-Pin Lin
Last Updated: 01/01/2016