That is, Ypi is an unbiased predictor of E(Yi) = Xiβ, with heterogeneous variance Var(Ypi) = σ²Xi(X'X)⁻¹Xi'.
If Xi is known for i=N+1,N+2,..., then Ypi = Xib is called the unconditional forecast of Yi. If Xi is not known and must be forecasted first, then Ypi = Xib is called the conditional forecast of Yi.
If Yi is known for i=N+1,N+2,..., then Ypi = Xib is called the ex-post forecast of Yi. If Yi is not known, then Ypi = Xib is called the ex-ante forecast of Yi.
For ex-post forecasts Ypi, i=N+1,N+2,..., the difference Yi-Ypi is called the forecast error. That is:

ei = Yi-Ypi (i=N+1,N+2,...)
   = [Yi-E(Yi)] + [E(Yi)-Ypi]
   = [Yi-E(Yi)] + [E(Ypi)-Ypi]
   = εi + Xi(β-b)

The error has two components: (1) the disturbance εi and (2) the sampling error Xi(β-b) from estimating β by b. Then:

E(ei) = 0
Var(ei) = Var(εi) + Var(Ypi)
        = σ²[1+Xi(X'X)⁻¹Xi']
By the normality assumption ε|X ~ Normal(0,σ²I), we have ei ~ Normal(0,σi²), where σi² = σ²[1+Xi(X'X)⁻¹Xi'], for i=N+1,N+2,...
Therefore, ei/σi = (Yi-Ypi)/(σ²[1+Xi(X'X)⁻¹Xi'])½ ~ Normal(0,1).
Substituting the unknown σ² by the unbiased sample estimator s², we have

ei/si = (Yi-Ypi)/(s²[1+Xi(X'X)⁻¹Xi'])½ ~ t(N-K)

where

si = (s²[1+Xi(X'X)⁻¹Xi'])½
Ypi-tα/2si = lower bound of forecast
Ypi+tα/2si = upper bound of forecast
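As a sketch, the forecast interval above can be computed with NumPy. The data-generating process, sample sizes, and the hardcoded critical value (an approximation to t0.025(48), used to avoid a SciPy dependency) are all illustrative assumptions, not part of the text:

```python
import numpy as np

# Illustrative data-generating process (assumed, not from the text)
rng = np.random.default_rng(0)
N, K = 50, 2
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta = np.array([1.0, 0.5])
y = X @ beta + rng.normal(scale=0.3, size=N)

b, *_ = np.linalg.lstsq(X, y, rcond=None)        # OLS estimate b
resid = y - X @ b
s2 = resid @ resid / (N - K)                     # unbiased estimator of sigma^2
XtX_inv = np.linalg.inv(X.T @ X)

x_new = np.array([1.0, 0.8])                     # X_i for the forecast period
y_pred = x_new @ b                               # Yp_i = X_i b
s_i = np.sqrt(s2 * (1.0 + x_new @ XtX_inv @ x_new))  # forecast std. error

t_crit = 2.011                                   # approx. t_{0.025}(48)
lower, upper = y_pred - t_crit * s_i, y_pred + t_crit * s_i
print(y_pred, lower, upper)
```

Note that s_i always exceeds s, since the forecast error adds the sampling variability of b on top of the disturbance variance.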
ln(Y) = Xβ + ε
The normality assumption of ε implies that Y is log-normal.
Let Z = ln(Y), or Y = eZ = exp(Z). Then:
E(Y) = E(exp(Z)) = exp(Xβ+σ²/2)
Var(Y) = Var(exp(Z)) = exp(2Xβ+σ²)[exp(σ²)-1]
As N → ∞, Y ~ Normal(E(Y),Var(Y)).
Note: Median(Y) = exp(Xβ) < E(Y).
Pr[ln(Ypi)-tα/2si ≤ ln(Yi) ≤ ln(Ypi)+tα/2si] = 1-α
Ypi = exp(ln(Ypi)) = exp(Xib)
Ypi = exp(Xib) is the unbiased predictor of the median of Yi, exp(Xiβ), which always under-predicts the mean E(Yi) = exp(Xiβ+σ²/2). Therefore, the predictor of E(Yi) should be:
Ypi = exp(Xib+si²/2), with the variance
Var(Ypi) = exp(2Xib+si²)[exp(si²)-1]
As N → ∞, Ypi ~ Normal(E(Ypi),Var(Ypi)).
Given a level of significance α > 0, the confidence interval of forecast of Yi is defined by:
Pr[Ypi-zα/2si ≤ Yi ≤ Ypi+zα/2si] = 1-α
where zα/2 is the critical value of the standard normal distribution and si = [Var(Ypi)]½.
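A minimal numeric sketch of the median-versus-mean correction for the log-linear model. The fitted value xb and residual variance s2 below are illustrative assumptions:

```python
import numpy as np

# Illustrative fitted value X_i b and residual variance s^2 (assumed values)
xb, s2 = 2.0, 0.25

median_pred = np.exp(xb)            # predictor of the median of Y
mean_pred = np.exp(xb + s2 / 2.0)   # corrected predictor of E(Y)
var_pred = np.exp(2 * xb + s2) * (np.exp(s2) - 1.0)

print(median_pred, mean_pred, var_pred)
```

The mean predictor always exceeds the median predictor by the factor exp(s2/2), which illustrates why the naive exp(Xib) under-predicts E(Yi).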
Evaluating the forecast performance by comparing the actuals and the predicted:

Actuals: YN+1, YN+2, ..., YN+F
Predicted: YpN+1, YpN+2, ..., YpN+F
Let

Ym = ∑i=N+1,...,N+F Yi/F
Ypm = ∑i=N+1,...,N+F Ypi/F
σ²Y = ∑i=N+1,...,N+F (Yi-Ym)²/F
σ²Yp = ∑i=N+1,...,N+F (Ypi-Ypm)²/F
σY,Yp = ∑i=N+1,...,N+F (Yi-Ym)(Ypi-Ypm)/F

r² = [∑i=N+1,...,N+F (Yi-Ym)(Ypi-Ypm)]² / [∑i=N+1,...,N+F (Yi-Ym)² ∑i=N+1,...,N+F (Ypi-Ypm)²]

That is, r² = [σY,Yp/(σYσYp)]².
MSE = ∑i=N+1,...,N+F (Yi-Ypi)²/F
    = ∑i=N+1,...,N+F [(Yi-Ym)+(Ym-Ypm)-(Ypi-Ypm)]²/F
    = (Ym-Ypm)² + ∑i=N+1,...,N+F (Yi-Ym)²/F + ∑i=N+1,...,N+F (Ypi-Ypm)²/F - 2∑i=N+1,...,N+F (Yi-Ym)(Ypi-Ypm)/F
    = (Ym-Ypm)² + σ²Y + σ²Yp - 2rσYσYp
    = (Ym-Ypm)² + (σY-σYp)² + 2(1-r)σYσYp
    = Bias Component + Variance Component + Covariance Component
    = (Ym-Ypm)² + (σYp-rσY)² + (1-r²)σ²Y
    = Bias Component + Regression Component + Disturbance Component
Dividing each component by MSE gives the proportions UM, US, UC (bias, variance, covariance) and UM, UR, UD (bias, regression, disturbance). Then UM+US+UC = 1 and UM+UR+UD = 1. Ideally UM → 0, and US and UR should be small for a good forecast.
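The MSE decomposition can be checked numerically. The actual and predicted series below are illustrative; note the text's moments divide by F, which matches NumPy's default (ddof=0):

```python
import numpy as np

# Illustrative actual and predicted series (assumed values)
y = np.array([10.0, 12.0, 11.5, 13.0, 12.5])
yp = np.array([10.5, 11.5, 12.0, 12.5, 13.0])
F = len(y)

ym, ypm = y.mean(), yp.mean()
sy, syp = y.std(), yp.std()                  # divide by F, as in the text
r = ((y - ym) @ (yp - ypm) / F) / (sy * syp)
mse = ((y - yp) ** 2).mean()

UM = (ym - ypm) ** 2 / mse                   # bias proportion
US = (sy - syp) ** 2 / mse                   # variance proportion
UC = 2 * (1 - r) * sy * syp / mse            # covariance proportion
UR = (syp - r * sy) ** 2 / mse               # regression proportion
UD = (1 - r ** 2) * sy ** 2 / mse            # disturbance proportion
print(UM + US + UC, UM + UR + UD)            # both sum to 1
```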
U = (∑i=N+1,...,N+F (Yi-Ypi)²/F)½ / [(∑i=N+1,...,N+F Yi²/F)½ + (∑i=N+1,...,N+F Ypi²/F)½]
  = RMSE / [(∑i=N+1,...,N+F Yi²/F)½ + (∑i=N+1,...,N+F Ypi²/F)½]
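A sketch of Theil's inequality coefficient U, which is bounded between 0 (perfect forecast) and 1; the actual and predicted series below are illustrative:

```python
import numpy as np

# Illustrative actual and predicted series (assumed values)
y = np.array([10.0, 12.0, 11.5, 13.0, 12.5])
yp = np.array([10.5, 11.5, 12.0, 12.5, 13.0])

rmse = np.sqrt(((y - yp) ** 2).mean())
denom = np.sqrt((y ** 2).mean()) + np.sqrt((yp ** 2).mean())
U = rmse / denom
print(U)
```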
Dik = 1 if observation i is in season k (k=1,2,...,K), and 0 otherwise.
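A sketch of constructing the seasonal dummies Dik; the quarterly season assignments below are illustrative:

```python
import numpy as np

# Illustrative season index (1..K) for each observation, quarterly data
K = 4
season = np.array([1, 2, 3, 4, 1, 2, 3, 4])

# D[i, k-1] = 1 if observation i falls in season k, 0 otherwise
D = (season[:, None] == np.arange(1, K + 1)).astype(int)
print(D.sum(axis=1))   # each row has exactly one 1
```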