3  Exponential Smoothing Forecasting

David Gerbing
The School of Business
Portland State University

3.1 Definition

Of the many time series forecasting methods, exponential smoothing is perhaps the most widely used. Exponential smoothing provides a set of weights for computing a forecast as a weighted average of what occurred at previous times. The exponential smoothing algorithm computes the weights to decrease the forecasting error.

Self-adjusting forecast. Adjust the next forecast at Time t+1 to compensate for error in the current forecast at Time t.

If the current forecast \(\hat Y_t\) is larger than the actual obtained value of \(Y_t\), a negative forecasting error, adjust the next forecast \(\hat Y_{t+1}\) downward from the current forecast. Conversely, if the current forecast \(\hat Y_{t}\) is too small, a positive forecasting error, adjust the next forecast \(\hat Y_{t+1}\) upward.

The definition of forecasting error is the same as before, the difference between what is and what was forecasted to be.

Forecasting error: Difference between the actual value of Y and the forecasted value at Time t, \(\; Y_t - \hat Y_t\).

The error in any specific forecast consists of two components. One component is systematic error inherent in the forecast, consistently under-forecasting or over-forecasting the next value of \(Y\). Exponential smoothing is designed to adjust for this type of forecasting error.

The second type of error inherent in any forecast is purely random. Flip a fair coin 10 times and get six heads. Flip the same fair coin another 10 times and get four heads. Why? Random, unpredictable fluctuation. There is no effective adjustment to such variation. Indeed, trying to adjust to random errors leads to worse forecasting accuracy than doing no adjustments.

As such, there needs to be some way to moderate the adjustment of the forecasting error from the current time period to the next forecast. Specify the extent of self-adjustment from the current forecast to the next forecast with a parameter named \(\alpha\) (alpha).

Smoothing parameter \(\alpha\): Specifies a proportion of the forecasting error according to \(\alpha(Y_t - \hat Y_t), \quad 0 \leq \alpha \leq 1\)

The adjustment made for the next forecast is some proportion of this forecasting error, a value from 0 to 1. Choose a value less than 1, usually considerably less than 1. Adjusting the next forecast by the entire amount of the random error results in the model overreacting in a futile attempt to model the random error component. In practice, \(\alpha\) typically ranges from .1 to .3 or so.

The exponential smoothing forecast for the next time period, \(\hat Y_{t+1}\), is the current forecast, \(\hat Y_t\), plus the adjusted forecasting error, \(\alpha (Y_t - \hat Y_{t})\).

Exponential smoothing forecast: \(\quad \hat Y_{t+1} = \hat Y_t + \alpha (Y_t - \hat Y_{t}), \quad 0 \leq \alpha \leq 1\)

To illustrate, suppose that the current forecast at Time t, \(\hat Y_{t} = 128\), and the actual obtained value is larger, \(Y_t = 133\). Compute the forecast for the next value at Time t+1, with \(\alpha = .3\):

\[\begin{align*} \hat Y_{t+1} &= \hat Y_t + \alpha (Y_t - \hat Y_{t})\\ &= 128 + 0.3(133-128)\\ &= 128 + 0.3(5)\\ &= 128 + 1.5\\ &= 129.50 \end{align*}\]

The current forecast of 128 is 5 below the actual value of 133. Accordingly, partially compensate for this difference between the forecasted and actual values. Raise the new forecast from 128 by .3(133-128) = (.3)(5) = 1.50 to 129.50.
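
The following minimal sketch verifies this arithmetic with the error-correction form of the forecast. The function name and variable names are only illustrative; the values are those of the example above.

```python
# Error-correction form of the exponential smoothing forecast:
# next forecast = current forecast + alpha * (actual - current forecast)
def next_forecast(y_actual, y_hat, alpha):
    return y_hat + alpha * (y_actual - y_hat)

# values from the example above
print(next_forecast(y_actual=133, y_hat=128, alpha=0.3))   # 129.5
```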

A little algebraic rearrangement of the above definition yields a computationally simpler expression. This expression generates the next forecast at Time t+1 as a weighted average of the current value and the current forecast.

Exponential smoothing forecast computation: \(\quad \hat Y_{t+1} = (\alpha) Y_t + (1 - \alpha) \hat Y_{t}, \quad 0 \leq \alpha \leq 1\)

For a given value of smoothing parameter \(\alpha\), all that is needed to make the next forecast is the current value of Y and the current forecasted value of Y.

To illustrate, return to the previous example with \(\alpha = .3\), the current forecast at Time t, \(\hat Y_{t} = 128\), and the actual obtained value is larger, \(Y_t = 133\). Compute the forecast for the next value at time t+1 as:

\[\begin{align*} \hat Y_{t+1} &= (\alpha) Y_t + (1 - \alpha) \hat Y_t\\ &= (.30)133 + (.70)128\\ &= 39.90 + 89.60\\ &= 129.50 \end{align*}\]

Again, raise the new forecast by .3(133-128) = (.3)(5) = 1.50 to 129.50 to partially compensate for the difference between the forecasted and actual values.
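
A companion sketch of the weighted-average computational form shows that it reproduces the same forecast for the same example values. Again, the function name is only illustrative.

```python
# Weighted-average form of the exponential smoothing forecast:
# next forecast = alpha * actual + (1 - alpha) * current forecast
def next_forecast_wavg(y_actual, y_hat, alpha):
    return alpha * y_actual + (1 - alpha) * y_hat

print(next_forecast_wavg(133, 128, 0.3))   # 129.5, same as the error-correction form
```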

3.2 Smoothing the Past

Why is this model called a smoothing model? The definition of the exponential smoothing model for a given value of \(\alpha\) is expressed only in terms of the current time period t.

The definition of an exponential smoothing forecast in terms of the values at the current time implies a set of weights across the previous time periods, an example of a weighted moving average.

To identify these weights, consider the model for the next forecast, based on the current time period, t.

\[\hat Y_{t+1} = (\alpha) Y_t + (1-\alpha) \hat Y_t\]

Now, shift the equation down one time period. Replace t+1 with t, and replace t with t-1.

\[\hat Y_{t} = (\alpha) Y_{t-1} + (1-\alpha) \hat Y_{t-1}\]

We can substitute that expression for \(\hat Y_t\) back into the previous equation. Applying some algebra, as shown in the appendix, results in the following weights going back one time period, to Time t-1.

\[\hat Y_{t+1}= (\alpha) Y_t + \alpha (1-\alpha) Y_{t-1} + (1-\alpha)^2 \, \hat Y_{t-1}\]

And, going back two time periods,

\[\hat Y_{t+1} = (\alpha) Y_t + \alpha (1-\alpha) Y_{t-1} + \alpha (1-\alpha)^2 \, Y_{t-2} + (1-\alpha)^3 \, \hat Y_{t-2}\]

In each of the above expressions for the forecast \(\hat Y_{t+1}\), the forecast is a weighted sum of some past time periods plus the forecast for the last time period considered. This pattern generalizes to all existing previous time periods. The following table in Figure 3.1 shows the specific values of the weights over the current and 10 previous time periods for four different values of \(\alpha\). More than 10 previous time periods are necessary for the weights for lower values of \(\alpha\), \(\alpha = .1\) and \(\alpha = .3\), to sum to 1.00.

Figure 3.1: The weights from exponential smoothing models for alpha = .1, .3, .5, .7 for the present value of Y and the previous ten values of Y.
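
A short sketch computes these implicit weights directly from the formula \(\alpha (1-\alpha)^k\) for lag \(k\). The printed sums show that, for the smaller values of \(\alpha\), the weights over the current and ten previous time periods do not yet sum to 1.00, consistent with the table.

```python
# Implicit exponential smoothing weights: alpha * (1 - alpha)**k for lag k
for alpha in [0.1, 0.3, 0.5, 0.7]:
    weights = [alpha * (1 - alpha) ** k for k in range(11)]   # current + 10 previous
    line = ", ".join(f"{w:.3f}" for w in weights)
    print(f"alpha = {alpha}:  {line}   sum = {sum(weights):.3f}")
```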

The reason for the word exponential in the name of this smoothing method is shown by Figure 3.2, Figure 3.3, and Figure 3.4 of the smoothing weights for three different values of \(\alpha\). Each set of weights in the following three figures exponentially decreases from the current time period back into previous time periods.


Figure 3.2: Smoothing weights with alpha = .5 for the forecast of the next time period.



Figure 3.3: Smoothing weights with alpha = .3 for the forecast of the next time period.



Figure 3.4: Smoothing weights with alpha = .1 for the forecast of the next time period.

Note that the actual forecast is made according to \((\alpha) Y_t + (1 - \alpha) \hat Y_{t}\), so these weights across the previous time periods are not actually computed to make the forecast, but are implicit in the forecast.

3.3 Select the Value of \(\alpha\)

What value of \(\alpha\) should be chosen for a particular model in a particular setting? Base the choice of \(\alpha\) on some combination of empirical and theoretical considerations.

3.3.1 Theoretical Basis

The theoretical reason for the choice of the value of \(\alpha\) follows from the previous table and graphs that illustrate the smoothing weights for different values of \(\alpha\).

The larger the value of \(\alpha\), the more relative emphasis placed on the current and immediate time periods.

If the time series is relatively free from random error, then a larger value of \(\alpha\) permits the series to more quickly adjust to any underlying changes. For time series that contain a substantial random error component, however, smaller values of \(\alpha\) should be used so as not to “overreact” to the random sampling fluctuations inherent in the data.

3.3.2 Empirical Basis

How to compare the utility of one value of \(\alpha\) with another? The answer is based on the same technique used in regression analysis. For each time period there is a forecasted value and an actual value. The difference between the two is the forecasting error. Summarize the extent of this error with the mean squared error, more specifically, its square root: \(\sqrt{MSE}\).

Choose the value of \(\alpha\) that minimizes \(\sqrt{MSE}\).

The value of \(\alpha\) that provides this minimization is, of course, generally found by the exponential smoothing software.

One problem in computing these error indices for an exponential smoothing model is that the current forecast is needed to compute the next forecast. What value should be used for the forecast of the first time period in which no previous forecasts exist? One technique is to set the first forecast equal to the first data value. Another technique is to set the first forecast equal to the average of the first four or five data values.
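
The following sketch illustrates this empirical selection with a simple grid search over candidate values of \(\alpha\), minimizing \(\sqrt{MSE}\). The data values are hypothetical, and the first forecast is set equal to the first data value, one of the initialization options noted above.

```python
import math

# root mean squared one-step forecasting error for a given alpha
def rmse_for_alpha(y, alpha):
    forecast = y[0]                       # initialize: first forecast = first value
    squared_errors = []
    for actual in y[1:]:
        squared_errors.append((actual - forecast) ** 2)
        forecast = alpha * actual + (1 - alpha) * forecast
    return math.sqrt(sum(squared_errors) / len(squared_errors))

y = [128, 133, 130, 136, 129, 134, 131, 135]       # hypothetical series
alphas = [a / 100 for a in range(1, 100)]          # grid of candidate values
best = min(alphas, key=lambda a: rmse_for_alpha(y, a))
print(best, round(rmse_for_alpha(y, best), 3))
```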

3.4 More General Models

Refer to the exponential smoothing model described in the previous sections as simple exponential smoothing. Applying the smoothing to the data results in a self-correcting model that adjusts as forecasts are made from the starting time point through the latest time period of the series. However, as shown by the weighted-average form of the model and illustrated in Figure 3.5, the next forecast depends only on the current forecast and the current value.

Figure 3.5: Forecasts from simple exponential smoothing.

3.4.1 Simple Exponential Smoothing

The application of these relatively simple models to true forecasting of future events, for which the value of Y is not yet known, does not provide a means for the model to self-correct. The simple exponential smoothing model accounts only for the level of the data, the forecast from the last time point for which a value of Y exists.

As this weighted-average version of the exponential smoothing model demonstrates, the model does not provide a means by which to forecast a different value more than a single time point into the future.

Regardless of the form of the time series data, simple exponential smoothing provides a “flat” forecast function: all forecasts take the same value as the last fitted value of the time series at Time t.

This flatness is fine for data without trend or seasonality, as illustrated in Figure 3.5 for the stable process data from HW #3. The problem arises when the data do possess trend and/or seasonality. Unfortunately, the simple exponential smoothing forecasts are “blind” to the trend and seasonality in the data.
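
A brief sketch, with an illustrative trending series, demonstrates this flat forecast function: every forecast beyond the end of the series repeats the last smoothed value.

```python
# Simple exponential smoothing: h-step-ahead forecasts are all the same value
def simple_smoothing_forecasts(y, alpha, h):
    forecast = y[0]                                # initialize with the first value
    for actual in y:
        forecast = alpha * actual + (1 - alpha) * forecast
    return [forecast] * h                          # the flat forecast function

y = [10, 12, 13, 15, 16, 18, 19, 21]               # hypothetical series with trend
print(simple_smoothing_forecasts(y, alpha=0.3, h=4))   # four identical forecasts
```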

Figure 3.6 illustrates the unresponsive flatness of the forecast of a simple exponential smoothing model, here applied to time series data with a positive trend and quarterly seasonality.

Figure 3.6: Forecasts of trend data with simple exponential smoothing.

3.4.2 Trend

To account for these deficiencies, the simple exponential smoothing models have been further refined.

Holt’s linear trend adaptation of exponential smoothing allows for the forecasting of data with a trend.

This method provides two smoothing equations, one for the level of the time series and one for the trend. As with the simple exponential smoothing model, the level equation forecasts with a weighted average of the current value, with weight \(\alpha\), and the current forecasted value, with weight \(1-\alpha\). Now, however, the forecasted value is the level plus the trend.

Similarly, the trend gets its own smoothing parameter, \(\beta\), which follows the same logic as the \(\alpha\) smoothing parameter. The trend estimate is a weighted average of the estimated trend at Time t and the previous estimate of the trend. The result is that the forecast now accounts for trend, as shown in Figure 3.7 for the trend and seasonal data.

Figure 3.7: Forecasts from Holt’s method.
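
A minimal sketch of Holt’s linear trend method follows, with one smoothing equation for the level and one for the trend. The data, parameter values, and simple initialization are illustrative assumptions, not the values behind Figure 3.7.

```python
# Holt's linear trend method: smooth the level and the trend separately
def holt_forecast(y, alpha, beta, h):
    level, trend = y[0], y[1] - y[0]               # simple initialization
    for actual in y[1:]:
        last_level = level
        level = alpha * actual + (1 - alpha) * (level + trend)
        trend = beta * (level - last_level) + (1 - beta) * trend
    return [level + k * trend for k in range(1, h + 1)]   # trend continues forward

y = [10, 12, 13, 15, 16, 18, 19, 21]               # hypothetical trending series
print(holt_forecast(y, alpha=0.3, beta=0.1, h=4))  # forecasts now follow the trend
```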

3.4.3 Trend and Seasonality

Holt’s method accounts for trend, an appropriate method for data without seasonality. Fortunately, the exponential smoothing method has also been extended to account for seasonality.

The Holt and Winters adaptation of exponential smoothing adds a smoothing parameter, gamma, to account for seasonality.

This adaption to exponential smoothing is referred to as the Holt-Winters seasonal method. This method is based on three smoothing parameters and corresponding equations — one for the level, \(\alpha\) (alpha), one for the trend, \(\beta\) (beta), and one for the seasonality, \(\gamma\) (gamma).

Apply the model to trend and seasonal data in Figure 3.8.

Figure 3.8: Forecasts from the Holt-Winters additive method.

This more general model accounts for the trend and the seasonality. Because the time series is so well structured, that is, a regular pattern with relatively small random error, the forecasts show relatively small prediction intervals.

This Holt-Winters exponential smoothing forecast of this trend and seasonal data contrasts favorably with the linear regression forecast of the same data. The advantage of the Holt-Winters technique is that it does not rely upon the assumption of linearity, and so is more general. As such, it is one of the most widely used forecasting techniques in business forecasting.
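
The sketch below outlines one common formulation of the additive Holt-Winters method, with smoothing equations for the level, the trend, and the seasonality. The quarterly data, parameter values, and simple initialization are illustrative assumptions only.

```python
# Additive Holt-Winters: smooth the level, the trend, and the seasonal indexes
def holt_winters_additive(y, m, alpha, beta, gamma, h):
    # initialize from the first two seasons of m periods each
    level = sum(y[:m]) / m
    trend = (sum(y[m:2 * m]) - sum(y[:m])) / m ** 2
    seasonal = [y[i] - level for i in range(m)]

    for t, actual in enumerate(y):
        last_level = level
        level = alpha * (actual - seasonal[t % m]) + (1 - alpha) * (level + trend)
        trend = beta * (level - last_level) + (1 - beta) * trend
        seasonal[t % m] = gamma * (actual - level) + (1 - gamma) * seasonal[t % m]

    n = len(y)
    return [level + k * trend + seasonal[(n + k - 1) % m] for k in range(1, h + 1)]

# hypothetical quarterly data with upward trend and seasonality
y = [10, 14, 8, 12, 13, 17, 11, 15, 16, 20, 14, 18]
print(holt_winters_additive(y, m=4, alpha=0.3, beta=0.1, gamma=0.2, h=8))
```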

3.5 Appendix

The exponential smoothing model for the forecast of the next time period, t+1, is defined only in terms of the current time period t:

\[\hat Y_{t+1} = (\alpha) Y_t + (1-\alpha) \hat Y_t\]

Now, project the model back one time period to obtain the expression for the current forecast \(\hat Y_t\),

\[\hat Y_t = (\alpha) Y_{t-1} + (1-\alpha) \hat Y_{t-1}\]

Now, substitute this expression for \(\hat Y_t\) back into the model for the next forecast,

\[\hat Y_{t+1} = (\alpha) Y_t + (1-\alpha) \, \left[(\alpha) Y_{t-1} + (1-\alpha) \hat Y_{t-1}\right]\]

A little algebra reveals that the next forecast can be expressed in terms of the current and previous time period as,

\[\hat Y_{t+1}= (\alpha) Y_t + \alpha (1-\alpha) Y_{t-1} + (1-\alpha)^2 \, \hat Y_{t-1}\]

Moreover, this process can be repeated for each previous time period. Moving back two time periods from t+1, the model is expressed as,

\[\hat Y_{t-1} = (\alpha) Y_{t-2} + (1-\alpha) \hat Y_{t-2}\]

Substituting in the value of \(\hat Y_{t-1}\) into the previous expression for \(\hat Y_{t+1}\) yields,

\[\hat Y_{t+1} = (\alpha) Y_t + \alpha (1-\alpha) Y_{t-1} + (1-\alpha)^2 \, \left[(\alpha) Y_{t-2} + (1-\alpha) \hat Y_{t-2}\right]\]

Working through the algebra results in an expression for the next forecast in terms of the current time period and the two immediately past time periods,

\[\hat Y_{t+1} = (\alpha) Y_t + \alpha (1-\alpha) Y_{t-1} + \alpha (1-\alpha)^2 \, Y_{t-2} + (1-\alpha)^3 \, \hat Y_{t-2}\]
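
A short numerical sketch confirms this expansion: applying the one-period recursion repeatedly and applying the expanded weights to the past values produce the same forecast, up to floating-point rounding. The series and the starting forecast are illustrative.

```python
alpha = 0.3
y = [128, 133, 131, 136, 134, 139]                 # hypothetical series
initial_forecast = y[0]                            # first forecast = first value

# recursive form: y_hat(t+1) = alpha * y(t) + (1 - alpha) * y_hat(t)
forecast = initial_forecast
for actual in y:
    forecast = alpha * actual + (1 - alpha) * forecast

# expanded form: weights alpha * (1 - alpha)**k on past values,
# plus (1 - alpha)**n on the initial forecast
n = len(y)
expanded = sum(alpha * (1 - alpha) ** k * y[n - 1 - k] for k in range(n))
expanded += (1 - alpha) ** n * initial_forecast

print(forecast, expanded)                          # the two agree
```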