or
Yt = μ + (1+ψ1B+ψ2B2+...)εt = μ + ψ(B)εt
where
ψ(B) = 1 + ψ1B + ψ2B2 + ..., and
εt ~ ii(0,σ2),
t = 1,2,...,N.
Stationarity requirement for the process:
Mean: μ = E(Yt) < ∞
Variance: γ0 = σ2 ∑i=0,...,∞ ψi2 < ∞
Autocovariance: γj = σ2 ∑i=0,...,∞ ψiψj+i < ∞
Autocorrelation: φj = γj / γ0
The plot of autocorrelation coefficients at each lag j = 0,1,2,... is called the autocorrelation function. In general, the autocorrelation function of a stationary time series falls to zero quickly, while that of a non-stationary time series does not converge to zero. However, a non-stationary series may become stationary after differencing.
A linear stochastic process is called an integrated process of order d, I(d), if the d-th difference of the series is stationary, where d is the lowest number of differences required for the resulting series to be stationary. That is, Yt ~ I(d) if ΔdYt is stationary, where
ΔYt = Yt-Yt-1
Δ2Yt = ΔYt-ΔYt-1
...
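To illustrate, here is a minimal sketch (assuming Python with numpy; the helper name sample_acf and the simulated data are ours, not from the text) comparing the slowly decaying correlogram of a random walk, an I(1) series, with the quickly dying correlogram of its first difference.

```python
import numpy as np

def sample_acf(y, max_lag=20):
    """Sample autocorrelations phi_j = gamma_j / gamma_0 for j = 0, ..., max_lag."""
    y = np.asarray(y, dtype=float)
    n, mu = len(y), y.mean()
    gamma0 = np.sum((y - mu) ** 2) / n
    return np.array([np.sum((y[j:] - mu) * (y[:n - j] - mu)) / n / gamma0
                     for j in range(max_lag + 1)])

# A random walk is I(1): its first difference is stationary white noise.
rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=500))                  # nonstationary level series
print(sample_acf(y, 10).round(2))                    # decays very slowly
print(sample_acf(np.diff(y, n=1), 10).round(2))      # near zero after lag 0
```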
Moving Average Process MA(q)
If the ψ weights are truncated at lag q, that is,
ψj = -θj, j = 1,2,...,q
ψj = 0, j > q
then
Yt = μ - θ1εt-1 - θ2εt-2 - ... - θqεt-q + εt
   = μ + (1 - θ1B - θ2B2 - ... - θqBq)εt
   = μ + θ(B)εt
where
θ(B) = 1 - θ1B - θ2B2 - ... - θqBq, and
εt ~ ii(0,σ2),
t = 1,2,...,N.
The variance and autocovariances of the MA(q) process are:
γ0 = (1 + θ12 + θ22 + ... + θq2)σ2
γj = (-θj + θ1θj+1 + θ2θj+2 + ... + θq-jθq)σ2, j = 1,2,...,q
γj = 0 otherwise
and the autocorrelations are:
φj = (-θj + θ1θj+1 + θ2θj+2 + ... + θq-jθq) / (1 + θ12 + θ22 + ... + θq2), j = 1,2,...,q
φj = 0 otherwise
The plot of autocorrelation coefficients of a stationary MA(q) process indicates a finite memory pattern up to the q-th lag; beyond that, the autocorrelation function is zero.
Examples
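As an illustrative example (a sketch assuming Python with numpy; the name ma_acf and the coefficients 0.6, -0.3 are our own choices, not from the text), the MA(q) autocovariance and autocorrelation formulas above can be evaluated directly, showing the cutoff beyond lag q:

```python
import numpy as np

def ma_acf(theta, max_lag=10):
    """Theoretical autocorrelations of Yt = mu + theta(B) eps_t with
    theta(B) = 1 - theta1*B - ... - thetaq*B^q (psi_0 = 1, psi_j = -theta_j)."""
    theta = np.asarray(theta, dtype=float)
    q = len(theta)
    psi = np.concatenate(([1.0], -theta))
    gamma0 = np.sum(psi ** 2)                       # gamma_0 in units of sigma^2
    acf = [1.0]
    for j in range(1, max_lag + 1):
        gj = np.sum(psi[: q + 1 - j] * psi[j:]) if j <= q else 0.0
        acf.append(gj / gamma0)
    return np.array(acf)

# MA(2) example: autocorrelations cut off to zero beyond lag q = 2.
print(ma_acf([0.6, -0.3], max_lag=6).round(3))
```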
Autoregressive Process AR(p)
Yt = δ + ρ1Yt-1 + ρ2Yt-2 + ρ3Yt-3 + ... + εt
Consider only the case with a finite number of autoregressive lags. That is,
Yt = δ + ρ1Yt-1 + ρ2Yt-2 + ... + ρpYt-p + εt
or
ρ(B)Yt = δ + εt,
where
ρ(B) = 1 - ρ1B - ρ2B2 - ... - ρpBp, and
εt ~ ii(0,σ2),
t = 1,2,...,N.
In deviations from the mean, yt = Yt - μ, the variance is
γ0 = E(yt2) = E[(ρ1yt-1 + ρ2yt-2 + ... + ρpyt-p + εt)2]
            = ρ1γ1 + ρ2γ2 + ... + ρpγp + σ2
Similarly, the autocorrelations of the AR(p) process satisfy the Yule-Walker equations:
φ1 = ρ1 + ρ2φ1 + ρ3φ2 + ... + ρpφp-1
φ2 = ρ1φ1 + ρ2 + ρ3φ1 + ... + ρpφp-2
...
φp = ρ1φp-1 + ρ2φp-2 + ρ3φp-3 + ... + ρp
...
φj = ρ1φj-1 + ρ2φj-2 + ρ3φj-3 + ... + ρpφj-p, for j > p
The plot of autocorrelation coefficients φj (j=1,2,...) of a stationary AR(p) process indicates an infinite but decaying memory pattern.
Examples
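Example (a numpy sketch under our own parameter choices; ar_acf is an illustrative name): solve the first p Yule-Walker equations above for φ1,...,φp and extend the recursion for j > p, showing the decaying but never-vanishing pattern.

```python
import numpy as np

def ar_acf(rho, max_lag=12):
    """Autocorrelations phi_1, phi_2, ... of a stationary AR(p) process
    from the Yule-Walker equations phi_j = sum_i rho_i * phi_|j-i|."""
    rho = np.asarray(rho, dtype=float)
    p = len(rho)
    M = np.eye(p)
    b = np.zeros(p)
    for j in range(1, p + 1):              # j-th Yule-Walker equation
        for i in range(1, p + 1):
            k = abs(j - i)
            if k == 0:
                b[j - 1] += rho[i - 1]     # rho_i * phi_0, with phi_0 = 1
            else:
                M[j - 1, k - 1] -= rho[i - 1]
    phi = list(np.linalg.solve(M, b))      # phi_1, ..., phi_p
    for j in range(p + 1, max_lag + 1):    # recursion for j > p
        phi.append(sum(rho[i - 1] * phi[j - i - 1] for i in range(1, p + 1)))
    return np.array(phi)

# AR(2) example: the autocorrelations decay but never cut off.
print(ar_acf([0.6, 0.3], max_lag=8).round(3))
```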
Mixed Process ARMA(p,q)
The mixed ARMA(p,q) process combines the autoregressive and moving average terms:
ρ(B)Yt = δ + θ(B)εt,
where
ρ(B) = 1 - ρ1B - ρ2B2 - ... - ρpBp,
θ(B) = 1 - θ1B - θ2B2 - ... - θqBq, and
εt ~ ii(0,σ2),
t = 1,2,...,N.
Therefore,
γ0 = E(yt2) = ∑i=1,...,p ρiγi + [1 - ∑i=1,...,q θi(ρi-θi)]σ2.
That is,
σ2 = (γ0 - ∑i=1,...,p ρiγi) / [1 - ∑i=1,...,q θi(ρi-θi)]
Similarly, for j = 1,2,...,q,
γj = ∑i=1,...,p ρiγ|i-j| - [θj + ∑i=1,...,q-j θi+j(ρi-θi)]σ2
and, for j > q,
γj = ∑i=1,...,p ρiγ|i-j|
Dividing by γ0, the autocorrelations for j = 1,2,...,q are
φj = ∑i=1,...,p ρiφ|i-j| - [θj + ∑i=1,...,q-j θi+j(ρi-θi)]σ2/γ0
The plot of autocorrelation coefficients φj (j=1,2,...) of a stationary mixed ARMA(p,q) process reflects the combination of the infinite-memory AR and the finite-memory MA processes; after q lags, only the AR pattern continues.
Examples
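Example (an illustrative numpy sketch; the parameter values are our own): compute the ψ-weights of an ARMA model from the standard relation ρ(B)ψ(B) = θ(B), and then the autocorrelations from γj = σ2 ∑i ψiψi+j, showing the MA-type adjustment at the first q lags followed by pure AR decay.

```python
import numpy as np

def arma_psi_weights(rho, theta, n=200):
    """psi-weights from rho(B) psi(B) = theta(B):
    psi_0 = 1, psi_j = -theta_j + sum_i rho_i * psi_{j-i} (theta_j = 0 for j > q)."""
    psi = np.zeros(n + 1)
    psi[0] = 1.0
    for j in range(1, n + 1):
        tj = theta[j - 1] if j <= len(theta) else 0.0
        psi[j] = -tj + sum(rho[i - 1] * psi[j - i]
                           for i in range(1, min(j, len(rho)) + 1))
    return psi

def arma_acf(rho, theta, max_lag=10):
    """Autocorrelations via gamma_j = sigma^2 * sum_i psi_i psi_{i+j} (truncated)."""
    psi = arma_psi_weights(rho, theta)
    gamma = [np.sum(psi[: len(psi) - j] * psi[j:]) for j in range(max_lag + 1)]
    return np.array(gamma) / gamma[0]

# ARMA(1,1): one MA-type adjustment at lag 1, then AR(1)-style geometric decay.
print(arma_acf([0.7], [0.4], max_lag=6).round(3))
```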
Partial Autocorrelation Function
The partial autocorrelation coefficients φjj are:
φ11 = φ1 (this is ρ1 of AR(1))
φ22 = (φ2 - φ12) / (1 - φ12) (this is ρ2 of AR(2)),
and for additional lags j = 3,4,..., φjj is the j-th (last) autoregressive coefficient obtained from the AR(j) model, as defined below.
The plot of partial autocorrelations can reveal the correct order of an AR(p) process. For an AR(p) process, φjj = 0 for j > p. For an MA(q) process, φjj is nonzero for all j and exhibits a geometrically decaying pattern. For an ARMA(p,q) process, the decay of the partial autocorrelations φjj starts after the p-th lag.
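As a sketch (numpy; pacf_from_acf is an illustrative name), φjj can be computed by solving the Yule-Walker system of an AR(j) model built from the autocorrelations, a population analogue of the AR(j) regression defined later in the text, and taking the last coefficient. For an AR(2) process the partial autocorrelations cut off after lag 2:

```python
import numpy as np

def pacf_from_acf(acf, max_lag=10):
    """Partial autocorrelations phi_jj: the last coefficient of the AR(j)
    Yule-Walker system built from autocorrelations acf = [phi_1, phi_2, ...]."""
    acf = np.asarray(acf, dtype=float)
    pacf = []
    for j in range(1, max_lag + 1):
        # Toeplitz matrix of autocorrelations phi_|i-k| with phi_0 = 1 on the diagonal
        R = np.array([[1.0 if i == k else acf[abs(i - k) - 1] for k in range(j)]
                      for i in range(j)])
        coef = np.linalg.solve(R, acf[:j])      # AR(j) coefficients rho_1, ..., rho_j
        pacf.append(coef[-1])                   # phi_jj = rho_j
    return np.array(pacf)

# AR(2) with rho1 = 0.6, rho2 = 0.3: build its ACF from the Yule-Walker recursion,
# then check that the PACF cuts off after lag 2.
rho1, rho2 = 0.6, 0.3
acf = [rho1 / (1 - rho2)]                       # phi_1
acf.append(rho1 * acf[0] + rho2)                # phi_2
for j in range(2, 10):
    acf.append(rho1 * acf[j - 1] + rho2 * acf[j - 2])
print(pacf_from_acf(acf, max_lag=5).round(3))
```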
The structural identification of a time series includes (1) the minimum order d of differencing required on the sample to achieve stationarity; (2) the appropriate order q of a moving-average process; and (3) the appropriate order p of an autoregressive process for the stationary time series. First of all, a rapidly declining pattern of the sample autocorrelation plot, or correlogram, is needed to ensure a stationary time series for further identification and analysis.
Bartlett Test
Testing H0: φj = 0 for each j > q (no autocorrelation at each lag j longer than q) is based on the Bartlett distribution of the estimated φj.
That is, the estimated φj ~ normal(0, √Var(φj)) approximately, where
Var(φj) = (1/N)(1 + 2 ∑i=1,...,q φi2).
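A small sketch of the Bartlett standard error above (numpy; bartlett_se and the sample numbers are illustrative, not from the text): compare a sample autocorrelation at a lag j > q with its approximate two-standard-error bound.

```python
import numpy as np

def bartlett_se(acf, q, N):
    """Bartlett standard error for a sample autocorrelation at any lag j > q,
    under H0: phi_j = 0 for j > q.  acf[0] is the lag-1 autocorrelation."""
    var = (1.0 + 2.0 * np.sum(np.asarray(acf[:q], dtype=float) ** 2)) / N
    return np.sqrt(var)

# E.g. test the lag-4 sample autocorrelation of a series with N = 200 observations,
# assuming the process is at most MA(3): reject H0 if |phi_4| > 1.96 * se.
acf_hat = [0.45, 0.20, 0.10, 0.08]      # illustrative sample autocorrelations
se = bartlett_se(acf_hat, q=3, N=200)
print(round(float(se), 4), abs(acf_hat[3]) > 1.96 * se)
```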
Box-Pierce Test and Ljung-Box Test
Testing H0: φ1 = ... = φk = 0 (zero autocorrelation coefficients up to some k lags) based on the Box-Pierce Q or Ljung-Box Q' statistic defined as follows:
Q = N ∑j=1,...,kφj2
Q' = N(N+2) ∑j=1,...,kφj2/(N-j)
Both Q and Q' ~ Chi-Square(k).
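A sketch of both statistics (assuming Python with numpy and scipy.stats for the chi-square p-values; q_tests and the sample numbers are illustrative):

```python
import numpy as np
from scipy import stats

def q_tests(acf, N):
    """Box-Pierce Q and Ljung-Box Q' statistics for the first k sample
    autocorrelations acf = [phi_1, ..., phi_k], with Chi-Square(k) p-values."""
    acf = np.asarray(acf, dtype=float)
    k = len(acf)
    lags = np.arange(1, k + 1)
    Q = N * np.sum(acf ** 2)
    Qp = N * (N + 2) * np.sum(acf ** 2 / (N - lags))
    return (Q, stats.chi2.sf(Q, k)), (Qp, stats.chi2.sf(Qp, k))

# Illustrative: k = 4 sample autocorrelations from N = 200 observations.
print(q_tests([0.15, -0.08, 0.05, 0.02], N=200))
```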
Let φjj be the partial autocorrelation coefficient at the j-th lag. That is, φjj = ρj obtained from the autoregressive regression of the AR(j) model:
Yt = δ + ρ1Yt-1 + ρ2Yt-2 + ... + ρjYt-j + εt
If the sample series follows an AR(p) process, then the autoregressive coefficient at each lag j longer than p must be zero.
Testing H0: φjj = 0 for each j > p is based on the approximate distribution of the estimated φjj ~ normal(0, 1/√N). Alternatively, the standard error of φjj can be obtained from the corresponding estimated autoregressive regression equation for each lag.
In order to use all N data observations, initialization may be needed for the pre-sample values (Y0, ..., Y-p+1) and (ε0, ..., ε-q+1).
The model may be written in the "inverted" form as
θ(B)-1(-δ+ρ(B)Yt) = εt
where εt ~ ii(0,σ2). Conditional on the historical information (YN, ..., Y1) and the data initialization (Y0, ..., Y-p+1), (ε0, ..., ε-q+1), the sum-of-squares is defined by
S = ∑t=1,2,...,Nεt2
Assuming εt ~ nii(0,σ2) for each observation t, the concentrated log-likelihood function is
ll = -N/2 (1+ln(2π)-ln(N)+ln(S))
The conditional maximum likelihood estimators of the ρs, θs, and δ are obtained by maximizing the nonlinear function ll. A set of initial values for the parameters ρs and θs is needed to start the iteration of nonlinear model estimation.
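As an illustrative sketch of this conditional estimation (not the text's actual program; assuming numpy and scipy.optimize, with all names, parameter values, and starting values our own): for an ARMA(1,1) model, compute the residuals recursively with zero pre-sample values, form S, and maximize the concentrated log-likelihood above numerically.

```python
import numpy as np
from scipy.optimize import minimize

def concentrated_ll(params, y):
    """Concentrated log-likelihood of an ARMA(1,1) model rho(B)Yt = delta + theta(B)eps_t,
    conditional on zero pre-sample values Y_0 = eps_0 = 0."""
    delta, rho1, theta1 = params
    N = len(y)
    eps = np.zeros(N)
    for t in range(N):
        y_lag = y[t - 1] if t > 0 else 0.0
        e_lag = eps[t - 1] if t > 0 else 0.0
        # eps_t = Yt - delta - rho1*Y_{t-1} + theta1*eps_{t-1}
        eps[t] = y[t] - delta - rho1 * y_lag + theta1 * e_lag
    S = np.sum(eps ** 2)
    return -N / 2.0 * (1.0 + np.log(2 * np.pi) - np.log(N) + np.log(S))

# Simulate an ARMA(1,1) series and maximize ll (i.e. minimize -ll).
rng = np.random.default_rng(1)
e = rng.normal(size=301)
y = np.zeros(301)
for t in range(1, 301):
    y[t] = 1.0 + 0.6 * y[t - 1] + e[t] - 0.4 * e[t - 1]
res = minimize(lambda p: -concentrated_ll(p, y[1:]), x0=[0.0, 0.1, 0.1],
               method="Nelder-Mead")
print(res.x)      # estimates of (delta, rho1, theta1)
```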
Further identification should then be carried out on the estimated residuals, based on N-(p+q+1) effective observations since the estimated model is an ARMA(p,q) process.
Select the best model according to the Information Criteria:
where K = p+q+1 and N is the sample size used for model estimation.
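As a sketch using the standard Akaike (AIC) and Schwarz (BIC) forms based on the maximized log-likelihood ll, with K = p+q+1 parameters and sample size N (the function name and the numbers are illustrative, and these are the textbook definitions rather than necessarily the exact forms used here):

```python
import numpy as np

def info_criteria(ll, K, N):
    """Standard information criteria from the maximized log-likelihood ll,
    number of parameters K = p + q + 1, and sample size N."""
    aic = -2.0 * ll + 2.0 * K            # Akaike information criterion
    bic = -2.0 * ll + K * np.log(N)      # Schwarz (Bayesian) information criterion
    return aic, bic

# Smaller values are preferred; BIC penalizes extra ARMA terms more heavily
# than AIC whenever ln(N) > 2 (roughly N > 8).
print(info_criteria(ll=-420.5, K=3, N=300))
```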
ρ(B)Yt = δ + θ(B)εt, t=1,2,...N
where
ρ(B) = 1-ρ1B-ρ2B2-...-ρpBp,
θ(B) = 1-θ1B-θ2B2-...-θqBq.
Because μ = ρ(B)-1δ and ψ(B) = ρ(B)-1θ(B), the forecasting model can be represented as:
Yt = μ + ψ1εt-1 + ψ2εt-2 + ... + εt
Given the historical information available at the end of estimation period N,
HN = (Y-p+1,...,Y-1,Y0,Y1,...,YN; ε-q+1,...,ε-1,ε0,ε1,...,εN)
One-step ahead forecast is the conditional expectation of YN+1:
YN(1) = E(YN+1|HN) = μ + ψ1εN + ψ2εN-1 + ...
Compared with the observed YN+1 which is:
YN+1 = μ + εN+1 + ψ1εN + ψ2εN-1 + ...
One-step ahead forecast error is defined by:
εN(1) = YN+1-YN(1) = εN+1
E(εN(1)) = 0
σN2(1) = Var(εN(1)) = Var(εN+1) = σ2
Similarly, two-step ahead forecast is the conditional expectation of YN+2:
YN(2) = E(YN+2|HN) = μ + ψ2εN + ψ3εN-1 + ...
Compared with the observed YN+2 which is:
YN+2 = μ + εN+2 + ψ1εN+1 + ψ2εN + ...
Two-step ahead forecast error is defined by:
εN(2) = YN+2-YN(2) = εN+2 + ψ1εN+1 = (1 + ψ1B)εN+2
E(εN(2)) = 0
σN2(2) = Var(εN(2)) = Var((1+ψ1B)εN+2) = (1+ψ12)σ2 > σN2(1)
f-step ahead forecast is the conditional expectation of YN+f:
YN(f) = E(YN+f|HN) = μ + ψfεN + ψf+1εN-1 + ...
→ μ, as f → ∞
Compared with the observed YN+f which is:
YN+f = μ + εN+f + ψ1εN+f-1 + ψ2εN+f-2 + ...
f-step ahead forecast error is defined by:
εN(f) = YN+f-YN(f) = (1+ψ1B+...+ψf-1Bf-1)εN+f
E(εN(f)) = 0
σN2(f) = Var(εN(f)) = (1+ψ12+...+ψf-12)σ2 > ... > σN2(1)
In general, the (f+1)-step ahead forecast is written as
YN(f+1) = E(YN+f+1|HN) = μ + ψf+1εN + ψf+2εN-1 + ...
Compared with the f-step ahead forecast at N+1 (with the historical information HN+1 = (HN,YN+1,εN+1)):
YN+1(f) = E(YN+f+1|HN+1) = μ + ψfεN+1 + ψf+1εN + ...
Then the forecast revision for YN+f+1 is defined by
YN+1(f) - YN(f+1) = ψfεN+1 = ψfεN(1)
Therefore, with the additional information available at N+1, the f-step ahead forecast is just the (f+1)-step ahead forecast made at the previous date N, adjusted for the one-step forecast error εN(1) weighted by the error-learning coefficient ψf as follows:
YN+1(f) = YN(f+1) + ψfεN(1)
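A sketch of these formulas (numpy; the AR(1) ψ-weights ψj = ρ1j, the function names, and all numbers are illustrative): compute the f-step forecast from the ψ-weights and innovations, its error variance, and verify the error-learning update YN+1(f) = YN(f+1) + ψf εN(1).

```python
import numpy as np

def f_step_forecast(mu, psi, eps, f):
    """Y_N(f) = mu + psi_f*eps_N + psi_{f+1}*eps_{N-1} + ...;
    eps holds the in-sample innovations eps_1, ..., eps_N (eps[-1] = eps_N)."""
    weights = psi[f : f + len(eps)]                 # psi_f, psi_{f+1}, ...
    return mu + np.sum(weights * eps[::-1][: len(weights)])

def forecast_error_variance(psi, sigma2, f):
    """Var(eps_N(f)) = (1 + psi_1^2 + ... + psi_{f-1}^2) * sigma^2."""
    return sigma2 * (1.0 + np.sum(psi[1:f] ** 2))

# Illustrative AR(1) case: psi_j = rho1^j, so forecasts decay toward mu.
rho1, sigma2, mu = 0.7, 1.0, 5.0
psi = rho1 ** np.arange(200)
rng = np.random.default_rng(2)
eps = rng.normal(size=100)                          # in-sample innovations
eps_new = rng.normal()                              # one-step forecast error eps_N(1)
# Error learning: Y_{N+1}(2) equals Y_N(3) plus psi_2 * eps_N(1).
lhs = f_step_forecast(mu, psi, np.append(eps, eps_new), f=2)
rhs = f_step_forecast(mu, psi, eps, f=3) + psi[2] * eps_new
print(np.isclose(lhs, rhs), forecast_error_variance(psi, sigma2, f=2))
```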
An intervention model augments the ARMA model with a deterministic fixed variable Zt (e.g., a trend, dummy, or step variable). The interpretation of the intervention variable Zt may be of interest.
For seasonal data, denote by s the seasonal span (s = 4 for quarterly data, s = 12 for monthly data). Then the seasonal difference of a series Zt is
Yt = (1-Bs)Zt
a purely seasonal ARMA model is
ρs(Bs)Yt = δ + θs(Bs)εt
and a multiplicative seasonal ARMA model is
ρs(Bs)ρ(B)Yt = δ + θs(Bs)θ(B)εt
Yt = (1-B)δXt, -1 < δ < 1
where B is the backshift operator, and
(1-B)δ = ∑j=0,1,...,∞ C(δ,j) (-B)j
where C(δ,j) = δ(δ-1)...(δ-j+1)/j! is the generalized binomial coefficient.
We write Xt = (1-B)-δYt. A value δ > 0 is indicative of long memory, δ < 0 of a short-memory process, and δ = 0 of a memoryless process. It can be shown that Xt is stationary if -1/2 < δ < 1/2; for δ ≥ 1/2, Xt is nonstationary.
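A sketch of the fractional-difference filter (numpy; frac_diff_weights and frac_diff are illustrative names, and we write d for the δ in the text): the weights follow from the binomial expansion above via the recursion wj = wj-1(j-1-d)/j.

```python
import numpy as np

def frac_diff_weights(d, n_weights=20):
    """Coefficients w_j of (1-B)^d = sum_j w_j B^j, i.e. w_j = (-1)^j * C(d, j),
    via the recursion w_j = w_{j-1} * (j - 1 - d) / j."""
    w = np.empty(n_weights)
    w[0] = 1.0
    for j in range(1, n_weights):
        w[j] = w[j - 1] * (j - 1 - d) / j
    return w

def frac_diff(x, d):
    """Apply (1-B)^d to a series x, conditioning on zero pre-sample values."""
    x = np.asarray(x, dtype=float)
    w = frac_diff_weights(d, n_weights=len(x))
    return np.array([np.dot(w[: t + 1], x[t::-1]) for t in range(len(x))])

# d = 0.3 (long memory): the weights die out slowly; d = 1 reproduces ordinary
# first differencing with the single weight -1.
print(frac_diff_weights(0.3, 8).round(4))
print(frac_diff_weights(1.0, 8).round(4))
print(frac_diff(np.arange(5.0), 1.0))          # equals [0, 1, 1, 1, 1]
```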
Let Y = {Yt}t=-∞,...,∞ be a covariance stationary process with mean μ = E(Yt) and j-th autocovariance γj = E[(Yt-μ)(Yt-j-μ)], j=0,1,2,.... We assume γj = γ-j, and that γj is absolutely summable, i.e., ∑j=0,...,∞ |γj| < ∞.
The autocovariance-generating function of Y is defined by
gY(z) = ∑j=-∞,...,∞ γjzj
where z denotes a complex scalar.
We note that a complex number can be represented in a two-dimensional (x,y)-space such as
z = x + y i, where i = √(-1)
or in the equivalent polar coordinates c (radius) and ω (angle):
c = (x2+y2)½
x = c cos(ω)
y = c sin(ω)
z = c [cos(ω) + i sin(ω)] = c eiω
Examples
For an AR(p) process:
gY(z) = σ2 / [ρ(z)ρ(z-1)]
For an ARMA(p,q) process:
gY(z) = σ2 θ(z)θ(z-1) / [ρ(z)ρ(z-1)]
sY(ω) = gY(e-iω)/(2π) = 1/(2π) ∑j=-∞,...,∞ γje-iωj
where ω is a real number.
sY(ω) is the spectrum or spectral density function for the time series process Y. In other words, for a time series process Y that has the set of autocovariances γj, the spectral density can be computed at any particular value of ω. The spectrum contains no new information beyond that in the autocovariances.
Consider the following facts: e-iωj = cos(ωj) - i sin(ωj), cos(-x) = cos(x), sin(-x) = -sin(x), and γj = γ-j. The spectral density function can then be simplified as:
sY(ω) = 1/(2π) [γ0+2∑j=1,...,∞ γjcos(ωj)], for ω∈[0,π]
This is a strictly real-valued, continuous function of ω. We have sY(ω) = sY(-ω) and sY(ω) = sY(ω+M2π) for any integer M. That is, sY(ω) is fully defined for ω∈[0,π].
Examples
For an MA(q) process:
sY(ω) = σ2 θ(e-iω) θ(eiω) / (2π)
For an AR(p) process:
sY(ω) = σ2 / [2π ρ(e-iω) ρ(eiω)]
For an ARMA(p,q) process:
sY(ω) = σ2 θ(e-iω) θ(eiω) / [2π ρ(e-iω) ρ(eiω)]
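A sketch evaluating the ARMA spectral density above on a frequency grid (numpy; arma_spectrum and the parameter values are illustrative). For an AR(1) process with a positive coefficient, most of the variance is concentrated at low frequencies (long swings):

```python
import numpy as np

def arma_spectrum(rho, theta, sigma2=1.0, n_freq=5):
    """Theoretical spectral density of an ARMA(p,q) process on [0, pi]:
    s(w) = sigma^2 * |theta(e^{-iw})|^2 / (2*pi*|rho(e^{-iw})|^2)."""
    omega = np.linspace(0.0, np.pi, n_freq)
    z = np.exp(-1j * omega)
    rho_poly = 1.0 - sum(r * z ** (i + 1) for i, r in enumerate(rho))
    theta_poly = 1.0 - sum(t * z ** (i + 1) for i, t in enumerate(theta))
    return omega, sigma2 * np.abs(theta_poly) ** 2 / (2 * np.pi * np.abs(rho_poly) ** 2)

# AR(1) with rho1 = 0.7: the spectrum is largest at low frequencies.
print(arma_spectrum([0.7], [], n_freq=5))
```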
There is also a correspondence between the spectrum and the autocovariances:
γj = ∫-ππ sY(ω)eiωjdω = ∫-ππ sY(ω)cos(ωj)dω
In particular, γ0 = ∫-ππ sY(ω)dω = 2 ∫0π sY(ω)dω.
Therefore, spectral analysis can be used to decompose the variance of a time series, which can be viewed as the sum of the spectral densities over all possible frequencies. For example, consider integration over only some of the frequencies:
τ(ωk) = (2/γ0) ∫0ωk sY(ω)dω, where 0 < ωk ≤ π.
Thus, 0 < τ(ωk) ≤ 1 is interpreted as the proportion of the total variance of the time series that is associated with frequencies less than or equal to ωk.
Spectral Representation Theorem
Yt = μ + ∫0π [α(ω) cos(ωt) + δ(ω) sin(ωt)] dω
where α(ω) and δ(ω) are zero-mean random variables for any fixed frequency ω in [0, π], mutually uncorrelated and uncorrelated across distinct frequencies.
Given an observed sample of N observations {Y1,Y2,...,YN}, using the same notation μ for the sample mean:
μ = ∑t=1,...,NYt/N
we can calculate the sample autocovariances γj (j = 0,1,2,...,N-1) as follows:
γj = ∑t=j+1,...,N (Yt-μ)(Yt-j-μ)/N
We set γj = γ-j. The sample periodogram is defined by
sY(ω) = 1/(2π) ∑j=-N+1,...,N-1 γje-iωj
      = 1/(2π) [γ0 + 2∑j=1,...,N-1 γjcos(ωj)]
The area under the periodogram is the sample variance of Yt:
γ0 = ∑t=1,...,N (Yt-μ)2/N
   = ∫-ππ sY(ω)dω
   = 2 ∫0π sY(ω)dω
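A sketch checking this identity numerically (numpy; sample_periodogram is an illustrative name): build the periodogram from the sample autocovariances and confirm that twice its numerical integral over [0, π] is close to the sample variance γ0.

```python
import numpy as np

def sample_periodogram(y, omegas):
    """Sample periodogram s(w) = (1/2pi)[g_0 + 2 sum_j g_j cos(w j)], with the
    sample autocovariances g_j divided by N as in the text."""
    y = np.asarray(y, dtype=float)
    N, mu = len(y), y.mean()
    g = np.array([np.sum((y[j:] - mu) * (y[: N - j] - mu)) / N for j in range(N)])
    js = np.arange(1, N)
    cosines = np.cos(np.outer(js, np.asarray(omegas, dtype=float)))
    return (g[0] + 2.0 * (g[1:, None] * cosines).sum(axis=0)) / (2.0 * np.pi)

rng = np.random.default_rng(3)
y = rng.normal(size=200)
omegas = np.linspace(0.0, np.pi, 4001)
s = sample_periodogram(y, omegas)
area = 2.0 * np.pi * s.mean()                 # approximates 2 * integral over [0, pi]
print(round(area, 3), round(np.var(y), 3))    # the two should be close
```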
For a time series Yt, we observe N periods, that is, {Y1,Y2,...,YN}. Since a wave cycle is completed in 2π radians, each period corresponds to 2π/N radians (frequency). We let
ω1 = 2π/N
ω2 = 4π/N
...
ωM = 2Mπ/N
The highest frequency is obtained at M = (N-1)/2. That is, (N-1)π/N < π.
Sample Spectral Representation Theorem
Given any N observations on a time series process {Y1,Y2,...,YN}, there exist frequencies {ω1,ω2,...,ωM} and coefficients μ, {α1,α2,...,αM}, {δ1,δ2,...,δM} such that
Yt = μ + ∑k=1,...,M {αkcos[ωk(t-1)] + δksin[ωk(t-1)]}
where αkcos[ωk(t-1)] is orthogonal to αjcos[ωj(t-1)] for k≠j, δksin[ωk(t-1)] is orthogonal to δjsin[ωj(t-1)] for k≠j, and αkcos[ωk(t-1)] is orthogonal to δjsin[ωj(t-1)] for all k and j. Furthermore,
μ = ∑t=1,...,NYt/N
αk = (2/N) ∑t=1,...,N Yt cos[ωk(t-1)], k=1,2,...,M
δk = (2/N) ∑t=1,...,N Yt sin[ωk(t-1)], k=1,2,...,M
The sample variance of Yt can be expressed as
γ0 = ∑t=1,...,N (Yt-μ)2/N = (1/2) ∑k=1,...,M (αk2+δk2)
The portion of the sample variance of Yt that can be attributed to cycles of frequency ωk is given by:
(1/2) (αk2+δk2) = (4π/N) sY(ωk)
where sY(ωk) is the sample periodogram at frequency ωk.
Equivalently,
sY(ωk) = [N/(8π)] (αk2 + δk2)
       = [1/(2πN)] { {∑t=1,...,N Yt cos[ωk(t-1)]}2 + {∑t=1,...,N Yt sin[ωk(t-1)]}2 }
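A sketch of the sample spectral representation (numpy; spectral_coefficients is an illustrative name; an odd N is used so that the M = (N-1)/2 frequencies plus the mean account for all N observations): compute αk and δk as above and verify the variance decomposition γ0 = (1/2) ∑k (αk2 + δk2).

```python
import numpy as np

def spectral_coefficients(y):
    """Coefficients of the sample spectral representation at the frequencies
    w_k = 2*pi*k/N, k = 1, ..., M = (N-1)//2."""
    y = np.asarray(y, dtype=float)
    N = len(y)
    M = (N - 1) // 2
    t = np.arange(1, N + 1)
    mu = y.mean()
    alpha, delta = np.empty(M), np.empty(M)
    for k in range(1, M + 1):
        wk = 2.0 * np.pi * k / N
        alpha[k - 1] = 2.0 / N * np.sum(y * np.cos(wk * (t - 1)))
        delta[k - 1] = 2.0 / N * np.sum(y * np.sin(wk * (t - 1)))
    return mu, alpha, delta

# Check the variance decomposition: g0 = (1/2) * sum_k (alpha_k^2 + delta_k^2).
rng = np.random.default_rng(4)
y = rng.normal(size=201)                       # odd N so that 2M + 1 = N
mu, alpha, delta = spectral_coefficients(y)
print(round(np.var(y), 4), round(0.5 * np.sum(alpha ** 2 + delta ** 2), 4))
```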