Exponential Smoothing Forecasting

Author

David Gerbing

Published

Apr 22, 2026, 02:42 pm

❝ Prediction is difficult, especially when dealing with the future. ❞

Danish Proverb

Concept

Exponential smoothing is one of the most widely applied of the many available time series forecasting methods. One of its strengths is that it can directly address both additive and multiplicative trends and additive and multiplicative seasonality, without assuming a linear relationship between time and the variable being forecasted. The result is a general, widely applicable forecasting technique.

What is “smoothing” and why is it “exponential”? These questions are answered below, but first, a review of basic vocabulary that applies to all predictive model-building methods.

Data

From lessR version 4.5.4.

Choose a type of predictive model, such as an exponential smoothing model, and then estimate the specific details of that model from the data analysis. Assess aspects of its effectiveness by inputting the data into the estimated model to evaluate how well it reconstructs the data. These procedures have been discussed in more detail in linear regression, especially Section 2.4 on Residuals. These same concepts are briefly reviewed here but applied to time series data.

When developing a model with a single predictor variable, we require two columns in our data table: values of the variable from which to forecast and current values of the variable for which we wish to predict future, unknown values. With time series data, the variable from which we seek to forecast, the predictor variable, is Time. Table 1 depicts the general form of the data table.

Table 1: General form of a data table from which to estimate a predictive model from time series data.
Time y
1 \(y_1\)
2 \(y_2\)
3 \(y_3\)
… …
n \(y_n\)

In practice, the time values are usually entered not as numbers but as dates, such as 08/18/2024. These dates can represent data collected daily, weekly, monthly, quarterly, or annually. The variable with values to be predicted is generically referred to as \(y\), a variable such as Sales or Production output. Generically, refer to a specific data value as \(y_t\), the value of \(y\) at Time \(t\).

Note: Training data

Existing data values from which the forecasting model has been estimated.

Error

Submit the data organized as in Table 1 to a computer program that can perform the exponential smoothing forecasting analysis. The analysis results in a model into which a date can be entered to obtain the corresponding value of \(y\) consistent with the estimated model. For each data value, \(y_t\), there is a corresponding value fitted by the model, \(\hat y_t\).

This is a review of concepts we have already discussed in several past weeks.

Note: Fitted values

A fitted value, \(\hat y_t\), is computed from the model for a given value of the predictor variable, here Time, that specifies what data should occur at the given time.

Given the fitted value obtained from applying the model to any row of the data we have already collected, we can see how closely the fitted value matches the actual data value, a fundamental concept in constructing predictive models. How well does the model recover the data from which it was estimated?

Note: Residual or error term

The discrepancy between the actual data value that occurred at Time \(t\) and the corresponding value computed, that is, fitted, by the model, \(e_t = y_t - \hat y_t\).

Table 2 shows the organization of the data table for the \(n\) data values across time and, from the model, the subsequently computed fitted values and error terms for each row of the data table. Also illustrated are the predicted values of the value of \(y\) projected two time periods into the future, \(\hat y_{n+1}\) and \(\hat y_{n+2}\).

Table 2: Data, variables Time and \(y\), and information obtained from the analysis, variables \(\hat y\) and \(e\), with two future predicted values of \(y\) beginning with Time n+1.
Time y \(\hat y\) e
1 \(y_1\) \(\hat y_1\) \(e_1\)
2 \(y_2\) \(\hat y_2\) \(e_2\)
3 \(y_3\) \(\hat y_3\) \(e_3\)
… … … …
n \(y_n\) \(\hat y_n\) \(e_n\)
---- ---- ---- ----
n+1 \(\hat y_{n+1}\)
n+2 \(\hat y_{n+2}\)

This concept of a residual or error term is fundamental to the development of predictive models, whether through regression analysis, exponential smoothing, or other forecasting techniques. We build predictive models to minimize the errors across the rows of data defined, for each row of data, by the difference of what is true from what the model indicates is true.

Tip: Error minimization

Choose the estimated weights (coefficients) of a predictive equation estimated from the data that minimize some function of the error, \(y_t -\hat y_t\), across the data values.

Our goal is to develop a model that explains our data at a given Time, \(t\), by computing a fitted value, \(\hat y_t\), that is reasonably close to the actual data value that has occurred or will occur, \(y_t\). The explanation of existing data is an initial step to evaluate the effectiveness of a model to obtain predictive accuracy on new, currently unknown data. If the model cannot account for what has already occurred, then it certainly cannot account for what is not yet known. Still, when the model is applied to data that have already occurred, there is no forecast because we already know the value of \(y\). A forecast applies to events with unknown values, such as for future events.

Note: Forecasted values

A forecasted value is a fitted value, \(\hat y_t\), computed from the model to estimate an unknown value; when applied to time series data, a future value of the time series.

We need precise terminology to describe statistical models. For prediction from a time series, we need data from past events to estimate the model, but our focus is on the future. To avoid confusion, reserve the term “forecast” for predicting unknown values of \(y\), usually future values, from the model. For example, what are the forecasted sales for the next four quarters, \(\hat y_{t+1}\) through \(\hat y_{t+4}\)?

Evaluation

Our model is useful only to the extent that it accurately predicts future values. However, the future has not yet arrived, so we cannot yet directly assess the extent of our model’s predictive accuracy against those unknown values. One option is to evaluate how well our model recovers the data we already have. To do this, calculate the error terms that gauge how well the model fits the existing data: The larger the errors, the worse the model’s performance.

One of the more revealing statistics that assesses the model’s fit to the data is the root mean squared error, RMSE, which directly addresses the size of the errors. This concept is explained in more detail with a worked example in linear regression, especially Section 2.5 on Model Fit. RMSE is computed as follows.

Table 3: Conceptual definition of the root mean squared error (RMSE).
Description Formula
1. Calculate the error term for each row of the data table (each time period) \(e_t = y_t - \hat y_t\)
2. Square the error term for each row of the data table (each time period) \(e_t^2 = (y_t - \hat y_t)^2\)
3. Sum the squared errors over all the rows of the data \(SSE = \sum_{t=1}^{n_m} (y_t - \hat{y}_t)^2\)
4. Compute the mean of the squared errors \(MSE = SSE / n_m\)
5. “Undo” the squaring to return to the original measurement units by taking the square root of MSE \(RMSE = \sqrt{MSE}\)

As a technical note, when computing the mean of the sum of squared errors, we do not divide by \(n\), the total number of data values. Instead, we divide by \(n_m\), the number of fitted values: the total number of data values minus the number of parameters estimated. For example, if the data are collected monthly, then there are 12 separate seasonal parameters to estimate, one for each season, so the fitted values would start no earlier than the 13th data value, and other parameters are estimated as well.
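To make the arithmetic concrete, here is a minimal Python sketch of the five steps in Table 3, including division by \(n_m\). The data and fitted values are hypothetical, chosen only for illustration.

```python
import math

def rmse(y, y_hat, n_params=0):
    """RMSE following the steps of Table 3; n_m is the number of
    fitted values minus the number of estimated parameters."""
    errors = [yt - ft for yt, ft in zip(y, y_hat)]   # step 1: error terms
    squared = [e ** 2 for e in errors]               # step 2: squared errors
    sse = sum(squared)                               # step 3: sum of squares
    n_m = len(squared) - n_params
    mse = sse / n_m                                  # step 4: mean squared error
    return math.sqrt(mse)                            # step 5: square root

# hypothetical data and fitted values, for illustration only
y     = [49.3, 49.4, 51.4, 50.5, 49.9]
y_hat = [50.2, 50.1, 50.0, 50.3, 50.2]
print(round(rmse(y, y_hat), 4))   # 0.8234
```

Note that subtracting estimated parameters shrinks the divisor, so for the same errors, more parameters means a larger RMSE.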

Evaluating a model’s predictive accuracy on future data is the only true source of assessing the amount of prediction error, but at least RMSE on the data we already have provides some information for model evaluation. The smaller the RMSE, the better the model fits the data.

Tip: Our best guess of predictive accuracy for our forecasting model

This RMSE fit index, among others, suggests how well the model fits the time series data of \(y\).

Fitting the data is a necessary condition for accurate prediction of future data, but more is required. Only after the future values are known can we directly evaluate the accuracy of our predictive model in predicting the unknown. Once we have this new data, we can calculate a more useful version of RMSE to assess the discrepancy between the predicted values of \(y\) and the observed values.

The Smoothing Parameter

Exponential smoothing is a method that calculates a set of weights to forecast the value of a variable as a weighted average of past values. It places the most emphasis on recent values, with more distant values receiving diminishing influence. What happened two time periods ago has less impact than what happened in the previous time period.

The model is estimated by minimizing error, moving through the data from the value corresponding to the first fitted value through the last. Each exponential smoothing fitted value adjusts for the error in the previous fitted value.

Note: Self-adjusting forecast

Adjust the next fitted value in the time series at Time t+1 to compensate for the error in the current fitted value at Time t.

If the current fitted value \(\hat y_t\) is larger than the actual obtained value of \(y_t\), a negative error, adjust the next fitted value downward. Conversely, if the current fitted value \(\hat y_{t}\) is too small, a positive error, then adjust the next fitted value \(\hat y_{t+1}\) upward.

How much should the next fitted value be adjusted? The error in any specific fitted value consists of two components. One component is any systematic error inherent in the forecast, systematically under-estimating or over-estimating the next value of y. Exponential smoothing is directed to adjust to this type of error to compensate for systematic under- or over-estimation.

The second type of error inherent in any forecast is purely random. Flip a fair coin 10 times and get six heads. Flip the same fair coin another 10 times and get four heads. Why? The answer is random, unpredictable fluctuation. There is no effective adjustment to such variation.

Tip: Random error is not predictable

Adjusting a forecast by reacting to random errors results in worse forecasting accuracy than making no adjustments.

To account for random error, there needs to be a way to moderate how much of the discrepancy between what occurred and what the model predicted carries forward into the next fitted value. Specify the extent of this self-adjustment from the current fitted value to the next with a parameter named \(\alpha\) (alpha).

Note: Smoothing parameter \(\alpha\)

Specifies a proportion of the error that should be adjusted for the next fitted value according to \(\alpha(y_t - \hat y_t), \quad 0 \leq \alpha \leq 1\).

The adjustment made for the following fitted value is some proportion of this error, a value from 0 to 1. What value of \(\alpha\) should be chosen for a particular model for a specific setting? Base the choice of \(\alpha\) on some combination of empirical and theoretical considerations.

If the time series is relatively free from random error, then a larger value of \(\alpha\) permits the series to more quickly adjust to any systematic underlying changes. For a time series containing a substantial random error component, however, smaller values of \(\alpha\) should be used to avoid “overreacting” to the larger random sampling fluctuations inherent in the data.

The conceptual basis for choosing the value of \(\alpha\) follows from the table and graphs below that illustrate the smoothing weights for different values of \(\alpha\).

Tip: Choose the value of \(\alpha\)

Choose the value of \(\alpha\) that minimizes RMSE \(= \sqrt{MSE}\).
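This criterion can be illustrated with a short Python sketch (not lessR's estimation code, which optimizes \(\alpha\) internally): compute one-step-ahead SES fitted values for a grid of candidate \(\alpha\) values and keep the value with the smallest RMSE. The data values are hypothetical.

```python
import math

def ses_fitted(y, alpha):
    """One-step-ahead SES fitted values, initialized with the first data value."""
    fit = [y[0]]
    for t in range(1, len(y)):
        # next fitted value: weighted average of previous value and previous fit
        fit.append(alpha * y[t - 1] + (1 - alpha) * fit[t - 1])
    return fit

def rmse(y, fit):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, fit)) / len(y))

# hypothetical stable-process data, for illustration only
y = [49.3, 49.4, 51.4, 50.5, 49.9, 49.8, 50.6, 50.1, 49.7, 50.4]

# evaluate a grid of candidate alpha values and keep the best
grid = [a / 100 for a in range(1, 100)]
best = min(grid, key=lambda a: rmse(y, ses_fitted(y, a)))
print("alpha with smallest RMSE:", best)
```

Forecasting software performs this minimization with a numerical optimizer rather than a grid, but the principle is the same.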

How does the value of \(\alpha\) affect the estimated model?

Tip: Influence of the value of \(\alpha\)

The larger the value of \(\alpha\), the more relative emphasis is placed on the current and immediate time periods.

Usually, choose a value of \(\alpha\) considerably less than 1. Adjusting the next fitted value by the entire amount of the random error results in the model overreacting in a futile attempt to model the random error component. In practice, \(\alpha\) typically ranges from about 0.1 to 0.3.

The exponential smoothing fitted value for the next time period, \(\hat y_{t+1}\), is the current fitted value, \(\hat y_t\), plus the adjusted error, \(\alpha (y_t - \hat y_{t})\).

Note: Exponential smoothing forecast

\(\quad \hat y_{t+1} = \hat y_t + \alpha (y_t - \hat y_{t}), \quad 0 \leq \alpha \leq 1\).

To illustrate, suppose that the current forecast at Time t, \(\hat y_{t}\) = 128, and the actual obtained value is larger, \(y_t\) = 133. Compute the forecast for the next value at Time t+1, with \(\alpha\) = .3:

\[\begin{align*} \hat y_{t+1} &= \hat y_t + \alpha (y_t - \hat y_{t})\\ &= 128 + 0.3(133-128)\\ &= 128 + 0.3(5)\\ &= 128 + 1.5\\ &= 129.50 \end{align*}\]

The current forecast of 128 is 5 below the actual value of 133. Partially compensate for this difference between the forecasted and actual values: raise the new forecast from 128 by .3(133-128) = (.3)(5) = 1.50. So the new forecasted value is 128 + 1.5 = 129.50.
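The same arithmetic can be verified with a few lines of Python:

```python
alpha = 0.3
y_t = 133        # actual value at Time t
y_hat_t = 128    # current fitted value at Time t

# adjust the current fitted value by the proportion alpha of the error
y_hat_next = y_hat_t + alpha * (y_t - y_hat_t)
print(y_hat_next)   # 129.5
```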

A little algebraic rearrangement of the above definition yields a computationally simpler expression. In practice, this expression generates the next forecast at time t+1 as a weighted average of the current data value and the current forecast.

Note: Exponential smoothing forecast computation

\(\quad \hat y_{t+1} = (\alpha) y_t + (1 - \alpha) \hat y_{t}, \quad 0 \leq \alpha \leq 1\).

For a given value of smoothing parameter \(\alpha\), all that is needed to make the next forecast is the current value of \(y\) and the current fitted value of \(y\).

To illustrate, return to the previous example with \(\alpha = .3\), the current fitted value at Time t, \(\hat y_{t}\) = 128, and the actual obtained value is larger, \(y_t\) = 133. Fit the next value at time t+1 as:

\[\begin{align*} \hat y_{t+1} &= (\alpha) y_t + (1 - \alpha) \hat y_t\\ &= (.30)133 + (.70)128\\ &= 39.90 + 89.60\\ &= 129.50 \end{align*}\]

Again, raise the new fitted value by .3(133-128) = (.3)(5) = 1.50 to 129.50 to partially compensate for the difference between the forecasted and actual values.

Smoothing the Past

The exponential smoothing model smooths the random errors inherent in each data value. As shown above, an exponential smoothing model expresses the fitted value of \(y\) for the next time period, t+1, for a given value of \(\alpha\), only in terms of the current time period. However, some algebraic manipulation reveals that this definition implicitly includes a set of weights for all previous time values.

Note: Moving average

An exponential smoothing fitted value for the next time period implies a set of weights for each previous time period, a moving average.

To identify these weights, consider the model for the next forecast, based on the current time period, t.

\[\hat y_{t+1} = (\alpha) y_t + (1-\alpha) \hat y_t\]

Now, shift the equation down one time period. Replace t+1 with t, and replace t with t-1.

\[\hat y_{t} = (\alpha) y_{t-1} + (1-\alpha) \hat y_{t-1}\]

Substitute that expression for \(\hat y_t\) back into the previous equation. Applying some algebra to this definition, as shown in the appendix, results in the following expression for \(\hat y_{t+1}\), with weights going back one time period.

\[\hat y_{t+1}= (\alpha) y_t + \alpha (1-\alpha) y_{t-1} + (1-\alpha)^2 \, \hat y_{t-1}\]

And, going back two time periods,

\[\hat y_{t+1} = (\alpha) y_t + \alpha (1-\alpha) y_{t-1} + \alpha (1-\alpha)^2 \, y_{t-2} + (1-\alpha)^3 \, \hat y_{t-2}\]

In each of the above expressions, the fitted value, and ultimately the forecast, \(\hat y_{t+1}\), is a weighted sum of some past time periods plus the fitted value for the last time period considered. This pattern generalizes to all existing previous time periods. The following table in Figure 1 shows the specific values of the weights over the current and 10 previous time periods for four different values of \(\alpha\). More than 10 previous time periods are necessary for the weights for lower values of \(\alpha\), \(\alpha\) = .1, and \(\alpha\) = .3, to sum to 1.00.

Figure 1: The weights from exponential smoothing models for alpha = .1, .3, .5, .7 for the present value of y and the previous ten values of y.
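A few lines of Python reproduce the pattern of weights shown in Figure 1, and also show why the lower values of \(\alpha\) require more than ten periods for the weights to approach a sum of 1:

```python
# implied weights for the current and ten previous time periods
weight_sums = {}
for alpha in (0.1, 0.3, 0.5, 0.7):
    # weight on the value k periods back: alpha * (1 - alpha)^k
    weights = [alpha * (1 - alpha) ** k for k in range(11)]
    weight_sums[alpha] = sum(weights)
    print(alpha, [round(w, 3) for w in weights], "sum:", round(sum(weights), 3))
```

For \(\alpha = .1\) the eleven weights sum to only about 0.69, while for \(\alpha = .7\) they sum to essentially 1, which is the geometric-series identity \(\sum_{k=0}^{10}\alpha(1-\alpha)^k = 1-(1-\alpha)^{11}\).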

Figure 2 shows the pattern of weights for three different values of \(\alpha\). The reason for the word exponential in the name of this smoothing method is demonstrated by comparing Figure 2 (a), Figure 2 (b), and Figure 2 (c) according to their respective smoothing weights. Each set of weights in the following three figures exponentially decreases from the current time period back into previous time periods, but at different rates.

(a) Smoothing weights with alpha = .5 for the forecast of the next time period.
(b) Smoothing weights with alpha = .3 for the forecast of the next time period.
(c) Smoothing weights with alpha = .1 for the forecast of the next time period.
Figure 2: Three different rates of exponential decay for three different values of \(\alpha\).

As stated, compute the forecast of the value of \(y\) at the next time period only according to the value of the current time period:

\[(\alpha) y_{t} + (1 - \alpha) \hat y_{t}\]

These weights across the previous time periods shown in Figure 2 are not explicitly computed for the forecast, but are implicit in it. Their use would provide the same result if the forecast were computed from all of these previous time periods.
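A quick numeric check in Python, with hypothetical data, confirms that the recursive computation and the explicit exponentially weighted sum produce the same fitted value:

```python
alpha = 0.3
y = [50.1, 49.6, 51.2, 50.8, 49.9, 50.5]   # hypothetical values

# recursive form: each new fit uses only the current value and current fit
fit = y[0]                      # initialize with the first data value
for yt in y:
    fit = alpha * yt + (1 - alpha) * fit

# explicit form: exponentially decaying weights over all past values,
# plus the remaining weight on the initial fitted value
n = len(y)
explicit = sum(alpha * (1 - alpha) ** k * y[n - 1 - k] for k in range(n))
explicit += (1 - alpha) ** n * y[0]

print(abs(fit - explicit) < 1e-9)   # True
```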

Implementation

To build a forecasting model, understand the data and freely explore different forecasting methods based on trend and seasonality, whether present or not, and whether additive or multiplicative.

Forms of Exponential Smoothing

To adapt to structures other than that of a stable process, consider three primary components for modeling time series data: error, trend, and seasonality. There are two primary types of expressions for each of the three components: additive and multiplicative. Table 4 describes the general characteristics of the resulting six different types of models.

Note: Generality of exponential smoothing

Exponential smoothing is the most general forecasting method we have yet considered. It allows for all possible combinations of additive and multiplicative trend and seasonality, as well as additive and multiplicative error terms, and all without assuming linearity.

These types of models and data have been discussed in Week #2 with the accompanying reading/video. They were also discussed last week, Week #7, in the context of linear regression models.

Table 4: Classification of additive and multiplicative exponential smoothing models.
Additive Multiplicative
Error The average difference between the observed value and the predicted value is constant across different levels of the time series. The error does not depend on the magnitude of the forecasted value. The average difference between the observed and predicted values is proportional to the level of the forecasted value. As the forecasted value increases or decreases, the error also increases or decreases proportionally.
Trend The linear trend is upwards or downwards, growing or decreasing at a constant rate, which plots as a line. The trend component increases or decreases at a proportional rate over time. The result is an upward or downward sloping curve at an accelerating rate.
Seasonal The intensity of each seasonal effect remains the same throughout the time series, adding or subtracting the same amount from the trend component along the time series. The intensity of each seasonal effect consistently magnifies or diminishes, adding or subtracting an increasingly larger or smaller amount from the trend component along the time series.
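To make the additive-trend row of Table 4 concrete, here is a sketch of Holt's linear trend method, the classical additive-trend extension of simple exponential smoothing. This is a textbook formulation with hypothetical data, not lessR's internal ETS estimation code.

```python
def holt_additive(y, alpha=0.3, beta=0.1):
    """Holt's linear (additive) trend method, textbook form:
       level:  l_t = alpha*y_t + (1 - alpha)*(l_{t-1} + b_{t-1})
       trend:  b_t = beta*(l_t - l_{t-1}) + (1 - beta)*b_{t-1}
    Returns the one-step-ahead forecast, l_n + b_n."""
    level = y[0]
    trend = y[1] - y[0]          # simple initialization of the slope
    for yt in y[1:]:
        prev_level = level
        level = alpha * yt + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + trend

# hypothetical series rising by roughly 2 per period
y = [10.2, 12.1, 13.9, 16.2, 18.0, 20.1, 21.9, 24.2]
print(round(holt_additive(y), 2))
```

Unlike SES, whose forecast is flat, this forecast extends the series beyond its last value in the direction of the estimated trend.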

What type of model does your data support? As always, plot your data to uncover any underlying patterns. The problem, as always, is the random noise that influences each data point, which obscures its true structure. Visualizing your time series helps you to discern beyond the noise of any single data point and perceive the underlying structure as a whole. A deeper understanding of this structure facilitates better adjustment of the analytical forecasting technique to align with it.

Specify the Forecasting Model

Identify the specification of trend and seasonality, and the type of error term for a forecasting model, here an exponential smoothing model. It is straightforward to specify various combinations of these models for a given time series data set. Experiment, explore, then choose the most appropriate model.

The exponential smoothing model adopted by lessR is a modern version of exponential smoothing provided by the R package fable. Here is an optional online book with much more detail by its primary author.

Tip: lessR forecasting parameters, the control knobs

Use the XY() function because we are plotting points, which are, by default, for a time series, connected with line segments. A time series is identified by having the x-axis variable be a date variable, specifically of R type Date, implicitly and automatically created when data values that represent dates are entered into XY().

The following parameters are the control knobs for creating a forecast. The system can choose these values for you, but you are always free to specify the values that you wish:

  • “N” for none (not available for the error component)
  • “A” for additive
  • “M” for multiplicative

However, note that the model components that you specify are only suggestions for this version of exponential smoothing. For example, if you specify no seasonality and there is strong seasonality in the data, the algorithm will override your suggestion to provide a better-fitting model.

ts_ahead: Parameter indicates an exponential smoothing forecast by specifying the number of time units for which to forecast beyond the time of the last data value. By default, the forecast is based on an additive model.

ts_trend: Trend parameter for the forecasting model. Set to "N" to not allow for trend. The default is to allow for trend.

ts_season: Seasonality parameter for the forecasting model. Set to "N" to not allow for seasonality. The default is to allow for seasonality.

ts_error: Error parameter. Can be "A" or "M" (though personal experience is to usually set to additive).

The default type of the XY() forecasting model is exponential smoothing. That is, the default value of the parameter ts_method is "es".

Simple Exponential Smoothing

Refer to the previously described exponential smoothing model as simple exponential smoothing, or SES. Applying the smoothing to the data yields a self-correcting model that adjusts each fitted value for the error in the previous one, from the beginning of the series through the latest time period.

Unfortunately, the simple exponential smoothing model, SES, with its smoothing parameter, \(\alpha\), has limited applicability. The procedure only correctly applies to stable processes, that is, models without trend or seasonality, a model with a stable mean and a stable variability over time.

Tip: Simple exponential smoothing forecast

Regardless of the form of the time series data, simple exponential smoothing provides a “flat” forecast function: all forecasted values are equal.

This first example appropriately applies the SES model to data characterized by a stable mean and stable variability over time.

Data

To illustrate, first read some stable process data into the d data frame. The data are available on the web at:

http://web.pdx.edu/~gerbing/data/StableData.xlsx

Tip: Read data into R

d <- Read("http://web.pdx.edu/~gerbing/data/StableData.xlsx")

The lessR function Read() reads data from files in any one of many different formats. In this example, read the data from an Excel data file into the local R data frame (table) named d. The data are then available to lessR analysis functions in that data frame, which is the default data name for the lessR analysis functions. That means that when doing data analysis, the data=d parameter and value are optional.


The data represent monthly measurements of Variable Y3. Here are the first six rows of data.

head(d)
       Month      Y3
1 2019-07-01 49.3042
2 2019-08-01 49.3760
3 2019-09-01 51.3605
4 2019-10-01 50.5292
5 2019-11-01 49.8766
6 2019-12-01 49.7658

Before submitting a forecasting model for analysis, first view the data to understand its general structure, particularly regarding possible trend and seasonality.

Tip: lessR visualization of the stable process data for Variable Y3

XY(Month, Y3)

Use the XY() function because we are plotting points, which are, by default, for a time series, connected with line segments.

Figure 3: Stable process data.

The visualization in Figure 3 suggests a stable system. There’s no discernible trend, the fluctuations around the center are irregular and lack apparent seasonality, and the overall variability of the system appears to remain constant over time.

Decomposition

To better understand the characteristics of the time series for variable Y3 before specifying and estimating a time series model, conduct a formal seasonal and trend decomposition analysis. The objective is to separate the trend component and the seasonal component of the data, and then identify the remaining error that cannot be explained by either trend or seasonality. Figure 4 presents the resulting visualization.

Decomposition of this time series was also presented in last week’s material, Week #7.

Tip: lessR decomposition of Variable Y3

STL(Month, Y3)

STL() is the lessR version of the Base R function stl(), enhanced with some color and provided statistics for assessing the strength of the trend, seasonal, and error components of the time series.

When calling the function, specify the x-axis variable first followed by the y-axis variable.

Figure 4: Seasonal and trend decomposition of the time series for variable Y3.

The visual output of the decomposition in Figure 4 consists of four separate plots stacked over each other. The first plot is the data. The seasonal and trend plots follow, plus the extent of the unaccounted-for error called the remainder.

Regardless of the data analyzed, the seasonal and trend decomposition always identifies seasonality and trend. However, the question remains: are the identified effects substantial enough to be meaningful? To answer this, we have both visual and statistical information.

The visual information used to assess the impact of a component is the gold bar at the extreme right of each of the four plots.

Note: Magnification bar

Each gold bar at the right side of each plot in the trend and seasonal decomposition visualization indicates the amount of magnification required to expand that plot as large as possible to fill the allotted space. The larger the bar, the smaller the effect.

For example, the seasonal effect is virtually nonexistent because its corresponding bar is as large as possible given the size of the corresponding narrow plot. Without that large magnification, the plot of the seasonal effect would be tiny.

The statistical output provides values that represent the size of the gold bars, expressed as the proportion of variance in the variable explained by each component. In the output below, the seasonality component accounts for 19.9% of the data’s overall variability and the trend component for only 4.1%, but the dominant component is the error, the remainder, at 71.6%.


Total variance of Y3: 0.5498205
Proportion of variance for components:
  seasonality --- 0.199 
  trend --------- 0.041 
  remainder ----- 0.716 

Although the trend shows a small effect in these data, it is likely a sampling fluke in this relatively small data set, particularly when compared to the random error effect. Notice also that the trend is composed of random-appearing ups and downs, with little consistent movement up or down across annual time periods.

We conclude that the process likely generates random variation about the center over time. The data further support this, demonstrating a constant level of variability.

Visualize the Forecast

Assuming a stable system, apply the stable process forecasting model to the data. We always seek to match the structure of the data to the type of model we submit for analysis. For these data, we do not want our model to attempt to account for either trend or seasonality.

Tip: lessR simple exponential smoothing of a stable process

XY(Month, Y3, ts_ahead=8, ts_trend="N", ts_season="N")

Use the XY() function because we are plotting points, which are, by default, for a time series, connected with line segments. A time series is identified by having the x-axis variable be a date variable, specifically of R type Date, implicitly and automatically created when data values that represent dates are entered into XY().

ts_ahead: Parameter indicates an exponential smoothing forecast by specifying the number of time units for which to forecast beyond the time of the last data value.

ts_trend: Trend parameter for the forecasting model. Set to "N" to not allow for trend. The default is to allow for trend.

ts_season: Seasonality parameter for the forecasting model. Set to "N" to not allow for seasonality. The default is to allow for seasonality.

ts_error: Error parameter. Can be "A" or "M".

To specify a simple exponential smoothing model, SES, override the defaults and set both ts_trend and ts_season to "N".

The analysis output in Figure 5 consists of four separate visualizations:

  • data: [black line] from which the model is estimated
  • model fit: [light red line] the model’s fitted values to the data
  • forecast: [dark red line] the model’s forecasted future data values
  • 95% prediction interval [light red band about the forecasted values]
Figure 5: Simple exponential smoothing forecast appropriately applied to a stable process.

All the model’s forecasted future values of Y3 are the same: 50.319. The large discrepancy between the data and the model’s fitted values indicates that the model fails to adequately explain the variability of the data. If the inherent variability truly is random, then the model is not incorrect. Instead, given the available information, there is not much structure to account for, making an accurate prediction impossible. The model can isolate underlying structure, but it does not add structure to the data that is not there.

The time series of the fitted values in Figure 5 shows the smoothing effect of the exponential smoothing model applied to the training data. The model is applied to forecast the next data value in the time series from the previous value, even though both values have already occurred.

The first two data values are well below the center, so the fitted model starts low and increases over time as the data values rise. After this initial recovery, the fitted values show no trend, but the model reacts to the random fluctuations in the data. After each particularly large observation relative to the rest, the fitted model increases, then decreases in value following a decrease in the data. Without any regular pattern of increasing and decreasing data, the ups and downs of the fitted model are also irregular.

Figure 5 also visualizes the 95% prediction interval for each forecast value.

Note: 95% prediction interval

The estimated range of values that contains 95% of all the future values of forecasted variable \(y\) for a given future value of time.

For this SES model, the 95% prediction interval spans much of the range of the data. In general, the prediction interval grows wider for forecasted values further into the future, though here, with \(\alpha\) near zero, the widths are essentially constant across the forecast horizon.

To be more confident that the prediction interval will contain the future value of \(y\) when it occurs requires a wider prediction interval. At the extreme, for data in the range of this example, we would be approximately 99.9999999% confident that the data value will fall within the range of -10,000 to 10,000.
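The reported interval width can be reproduced from the model's error variance. For an ETS(A,N,N) model, the standard result (Hyndman and Athanasopoulos) gives the forecast variance at horizon \(h\) as \(\sigma^2[1 + (h-1)\alpha^2]\). A minimal sketch in Python, for illustration only (the function name is ours, not part of lessR), using the \(\sigma^2\) = 0.5594 and \(\alpha \approx\) 0.0001 reported in the text output below:

```python
import math

def ses_interval_width(sigma2, alpha, h, z=1.96):
    """Full width of the 95% prediction interval for an ETS(A,N,N)
    forecast at horizon h: 2 * z * sigma * sqrt(1 + (h-1)*alpha^2)."""
    return 2 * z * math.sqrt(sigma2) * math.sqrt(1 + (h - 1) * alpha**2)

# Values from the model output: sigma^2 = 0.5594, alpha ~ 0.0001
width = ses_interval_width(0.5594, 0.0001, h=1)
print(round(width, 3))  # close to the reported width of 2.932
```

Because \(\alpha\) is essentially zero here, the width is nearly identical at every horizon, which is why all eight forecasts below share the same interval.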

Text Output

In addition to the visualization, the precise forecasted values are also available with their corresponding 95% prediction intervals, along with other information. The analysis’s text output follows.

[Interactive chart from the Plotly R package (Sievert, 2020)] 
[with functions from Ryan, Ulrich, Bennett, and Joy's xts package] 
[with functions from Hyndman and Athanasopoulos's, fpp3 packages] 
   -- standard reference: https://otexts.com/fpp3/

Specified model
---------------
   Y3  [with no specifications] 
The specified model is only suggested.
It may differ from the estimated model.

Model to be estimated
---------------------
Y3 ~ error("A") 


Model analysis
--------------
Series: Y3 
Model: ETS(A,N,N) 
  Smoothing parameters:
    alpha = 0.000100004 

  Initial states:
     l[0]
 50.22444

  sigma^2:  0.5594

     AIC     AICc      BIC 
214.7685 215.1970 221.0515 

Mean squared error of fit to data: 0.540711 

Forecast
--------
     Month predicted    lower   upper    width
1 Jul 2024  50.22444 48.75858 51.6903 2.931719
2 Aug 2024  50.22444 48.75858 51.6903 2.931719
3 Sep 2024  50.22444 48.75858 51.6903 2.931719
4 Oct 2024  50.22444 48.75858 51.6903 2.931719
5 Nov 2024  50.22444 48.75858 51.6903 2.931719
6 Dec 2024  50.22444 48.75858 51.6903 2.931719
7 Jan 2025  50.22444 48.75858 51.6903 2.931719
8 Feb 2025  50.22444 48.75858 51.6903 2.931719

The exponential smoothing software provides the value of \(\alpha\) for this minimization, which, for this analysis, is \(\alpha\) = 0.0001, effectively zero. Usually, the software allows customization of the \(\alpha\) value, but the computed value is generally the recommended one. This value of \(\alpha\) results in the smallest mean squared error possible for that version of the exponential smoothing model for these data. For example, setting \(\alpha\) to 0.2 increases the mean squared error to 0.665. Increasing \(\alpha\) to 0.5 further increases the error to 0.771.
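This minimization can be mimicked with a simple grid search over candidate \(\alpha\) values. A sketch in Python with invented data, for illustration only (the lessR/fable estimator uses a proper numerical optimizer, not this grid):

```python
def ses_fitted_mse(y, alpha):
    """One-step-ahead SES fit: y_hat[t+1] = alpha*y[t] + (1-alpha)*y_hat[t].
    Returns the mean squared error of the fitted values."""
    y_hat = y[0]              # initialize the level at the first observation
    errors = []
    for obs in y[1:]:
        errors.append((obs - y_hat) ** 2)
        y_hat = alpha * obs + (1 - alpha) * y_hat
    return sum(errors) / len(errors)

# Synthetic stable-process data around a constant level (invented values)
y = [50.1, 49.6, 50.8, 50.2, 49.9, 50.5, 50.0, 49.7, 50.4, 50.3]

# Grid search: the (MSE, alpha) pair with the smallest fitted MSE
best = min((ses_fitted_mse(y, a / 100), a / 100) for a in range(1, 100))
print(best)
```

The same grid applied to the Y3 data would show the MSE rising as \(\alpha\) moves away from its optimized value.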

Forecasted Values

Find the forecasted values under the predicted column. For the SES model, the forecasted values are all equal to one another.

\[\hat y_{\text{Jul 2024}} = \hat y_{\text{Aug 2024}} = \; ... \; = \hat y_{\text{Feb 2025}} = 50.224\]

As indicated, the simple exponential smoothing model accounts only for the level of the data, and is applicable only to data without trend or seasonality. To account for these components, we need to move beyond simple exponential smoothing to more sophisticated models.

More General Forecasting Models

Data

Consider the following data with both trend and seasonality.

http://web.pdx.edu/~gerbing/data/SalesData.xlsx

Tip: lessR read data into R

d <- Read("http://web.pdx.edu/~gerbing/data/SalesData.xlsx")

The lessR function Read() reads data from files in one of many different formats. In this example, the data are read from an Excel file into the local R data frame (table) named d, from which they are available to the lessR and R analysis functions. Because d is the default data frame name for the lessR analysis functions, the data=d expression is optional when doing data analysis.

The data represent quarterly measurements of the variable Sales. The dates are listed as individual days, with each date representing the first day of the corresponding quarter. The 16 lines of the data table follow, reported quarterly from the first quarter of 2016 through the last quarter of 2019.

d
          Qtr Sales
1  2016-01-01  0.41
2  2016-04-01  0.65
3  2016-07-01  0.96
4  2016-10-01  0.57
5  2017-01-01  0.59
6  2017-04-01  1.20
7  2017-07-01  1.53
8  2017-10-01  0.97
9  2018-01-01  0.93
10 2018-04-01  1.71
11 2018-07-01  1.74
12 2018-10-01  1.42
13 2019-01-01  1.36
14 2019-04-01  2.11
15 2019-07-01  2.25
16 2019-10-01  1.74
Tip: lessR decomposition of variable Sales

STL(Qtr, Sales)

STL() is based on the Base R function stl(), adds a color enhancement, and provides statistics for assessing the strength of the trend, seasonal, and error components of the time series.

Figure 6: Seasonal and trend decomposition of the time series for variable Sales.

The data exhibit a strong trend and seasonality.


Total variance of Sales: 0.318705
Proportion of variance for components:
  seasonality --- 0.241 
  trend --------- 0.691 
  remainder ----- 0.023 

The data and the relative proportion of trend, seasonality, and error all indicate a strong trend component, a reasonably strong seasonal component, and a much smaller error component.

Trend and Seasonality Model

The data exhibit both trend and seasonality, so now analyze a more appropriate model that explicitly accounts for both characteristics.

Note: Adapt exponential smoothing to trend and seasonality

Add a trend parameter and a seasonality smoothing parameter to the model to account for trend and seasonality in the data and the subsequent forecast.

This method is based on three smoothing parameters and corresponding equations. The smoothing parameter for the level, \(\alpha\) (alpha), has already been introduced. Additional parameters are also needed for a more general model: one for trend, \(\beta\) (beta), and one for seasonality, \(\gamma\) (gamma).
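The three update equations of the additive (Holt-Winters) model can be sketched directly. This is an illustrative Python implementation of the standard additive recursions, not the lessR/fable code; the initialization here is deliberately crude, whereas real software estimates the initial states by optimization:

```python
def holt_winters_additive(y, m, alpha, beta, gamma, h=1):
    """Additive Holt-Winters recursions with seasonal period m.
    level:  l[t] = alpha*(y[t] - s[t-m]) + (1-alpha)*(l[t-1] + b[t-1])
    trend:  b[t] = beta*(l[t] - l[t-1]) + (1-beta)*b[t-1]
    season: s[t] = gamma*(y[t] - l[t]) + (1-gamma)*s[t-m]
    Returns the h-step-ahead forecast: level + h*trend + seasonal term."""
    # Crude initialization from the first two seasons of data
    level = sum(y[:m]) / m
    trend = (sum(y[m:2*m]) - sum(y[:m])) / m**2
    season = [y[i] - (level + (i - (m - 1) / 2) * trend) for i in range(m)]
    for t in range(m, len(y)):
        prev_level = level
        level = alpha * (y[t] - season[t - m]) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season.append(gamma * (y[t] - level) + (1 - gamma) * season[t - m])
    # Seasonal term for the forecast horizon cycles through the last period
    return level + h * trend + season[len(y) - m + (h - 1) % m]
```

Each equation blends new information (weighted by its smoothing parameter) with the previous estimate, exactly as the single-parameter SES recursion does for the level alone.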

Apply the model to trend and seasonal data in Figure 7.

Tip: lessR exponential smoothing with trend and seasonality

XY(Qtr, Sales, ts_ahead=4, ts_error="A", ts_trend="A", ts_seasons="A")

ts_ahead: Indicate an exponential smoothing forecast by specifying the number of time units for which to forecast beyond the last data value. By default, the forecast is based on an additive model.

ts_error, ts_trend, ts_seasons: Set to "A" to explicitly specify an additive model for the error, trend, and seasonality components.

Figure 7: The appropriate exponential smoothing forecasts from additive trend and seasonality applied to data with additive trend and seasonality.

With this more sophisticated model, both trend and seasonality extend into the forecasted future values. Accordingly, the fourth quarter tends to be lower in value than the preceding quarters. Although the trend is increasing, the forecast for Quarter 4 is less than the forecast for Quarter 3: \(\hat y_{t+3}=\) 2.636 and \(\hat y_{t+4}=\) 2.190.

This more general model accounts for the trend and the seasonality. Because the time series displays a regular pattern with relatively small random error, the forecasts show relatively small prediction intervals.
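The additive forecast equation, \(\hat y_{T+h} = l_T + h\,b_T + s\), shows how a seasonal term can outweigh the trend. A small arithmetic illustration with invented numbers (not taken from the model output):

```python
# End-of-data level and per-quarter trend (invented for illustration)
level, trend = 2.0, 0.05
# Additive seasonal terms for each quarter (invented)
season = {"Q1": -0.45, "Q2": 0.15, "Q3": 0.30, "Q4": -0.30}

# Additive forecast: y_hat(T+h) = level + h*trend + seasonal term
for h, q in enumerate(["Q1", "Q2", "Q3", "Q4"], start=1):
    print(q, round(level + h * trend + season[q], 2))
```

One quarter of trend growth (0.05) is smaller than the drop from Q3's seasonal term to Q4's (0.60), so the Q4 forecast falls below Q3 even though the series is growing overall.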

The precise fitted values and their corresponding prediction interval follow.


Specified model
---------------
   Sales ~ error("A") + trend("A") + season("A") 
The specified model is only suggested.
It may differ from the estimated model.

Model to be estimated
---------------------
Sales ~ error("A") + trend("A") + season("A") 


Model analysis
--------------
Series: Sales 
Model: ETS(A,A,A) 
  Smoothing parameters:
    alpha = 0.1035522 
    beta  = 0.0001489641 
    gamma = 0.0001199852 

  Initial states:
      l[0]      b[0]       s[0]     s[-1]     s[-2]      s[-3]
 0.4001447 0.1015875 -0.2368338 0.3105102 0.2099856 -0.2836619

  sigma^2:  0.0167

       AIC       AICc        BIC 
-14.199847  15.800153  -7.246549 

Mean squared error of fit to data: 0.008353462 

Forecast
--------
      Qtr predicted    lower    upper     width
1 2020 Q1  1.838430 1.585094 2.091765 0.5066713
2 2020 Q2  2.433658 2.178963 2.688352 0.5093884
3 2020 Q3  2.635763 2.379713 2.891812 0.5120988
4 2020 Q4  2.190000 1.932598 2.447401 0.5148027

The mean squared error for the fit of the trend and seasonality model to the data is the lowest of all the analyzed models for these data: MSE = 0.008. This improvement in fit is apparent in Figure 7, where the light red line for the model fit now more closely matches the data. In particular, the high peaks of each season match the corresponding peaks of the fitted model.

Multiplicative Seasonality

As stated, exponential smoothing models can be additive or multiplicative. Here, pursue a multiplicative model.

Data

To illustrate data with multiplicative effects, first read the data into the d data frame. The data are available on the web at:

http://web.pdx.edu/~gerbing/data/MultSeasonsData.xlsx

Tip: Read data into R

d <- Read("http://web.pdx.edu/~gerbing/data/MultSeasonsData.xlsx")

The lessR function Read() reads data from files in any one of many different formats. In this example, the data are read from an Excel file into the local R data frame (table) named d, which is the default data frame name for the lessR analysis functions. That means that when doing data analysis, the data=d parameter and value are optional.


The data represent monthly measurements of Variable Y. Here are the first six rows of data.

head(d)
       Month     Y
1 2020-01-01 1.194
2 2020-02-01 1.044
3 2020-03-01 1.184
4 2020-04-01 1.220
5 2020-05-01 1.166
6 2020-06-01 1.059

Before submitting a forecasting model for analysis, first view the data to understand its general structure, shown in Figure 8 for Variable Y.

Tip: lessR visualization of the multiplicative data for variable Y

XY(Month, Y)

Use the XY() function because we are plotting in the coordinate system with an \(x\)-axis and \(y\)-axis. By default, the points in the scatterplot for a time series are connected with line segments.

Figure 8: Data with trend and multiplicative seasonality.

These data values unequivocally indicate multiplicative seasonality.

Note: Multiplicative seasonality

The seasonal fluctuations are proportional to the level of the time series, so that as the overall level of the series increases or decreases, the magnitude of the seasonal variations correspondingly increases or decreases.

The data indicate a regular pattern of seasonality but with a multiplicative effect. As time increases, the seasonal ups and downs increase as well.
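Multiplicative seasonality is easy to simulate: multiply a rising trend line by a fixed set of seasonal factors, and the absolute size of the seasonal swings grows with the level. A sketch with invented numbers, for illustration only:

```python
factors = [1.2, 0.8]          # two-season multiplicative pattern (invented)
level0, growth = 10.0, 1.0    # starting level and per-period trend (invented)

# y[t] = (trend line at t) * (seasonal factor for t)
series = [(level0 + growth * t) * factors[t % 2] for t in range(8)]

# The peak-to-trough swing widens as the level rises
early_swing = abs(series[0] - series[1])
late_swing = abs(series[6] - series[7])
print(early_swing, late_swing)
```

With an additive pattern the swing would stay constant; here it grows in proportion to the level, the defining signature of multiplicative seasonality.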

Decomposition

The Base R function stl(), on which the lessR function STL() is based, does not accommodate multiplicative seasonality, so it is not applied here.

Visualize the Forecast

The need for an exponential smoothing model that accounts for the multiplicative seasonality in the data is evident. A default additive model, which assumes a constant seasonal term for each season, is unsuitable for these data. The seasonal influence clearly grows over time, making a multiplicative model the most appropriate option. Analyzing multiplicative data with an additive model yields less accurate results than the proper multiplicative model, and the estimated seasonal coefficients do not apply because they are assumed constant for each season.

Analyze the data with the multiplicative model. Without any specification of the model components, the estimation algorithm produces a default model. That approach is reasonable, but it is usually better to experiment with different components, so here specify all three components explicitly: error, trend, and seasonality. Usually, specify the error as additive. The visualization appears in Figure 9.

Tip: lessR multiplicative exponential smoothing of multiplicative data for variable Y

XY(Month, Y, ts_ahead=8, ts_error="A", ts_trend="A", ts_seasons="M")

Use the XY() function because we are plotting points, which are, by default, for a time series connected with line segments.

ts_ahead: Indicate an exponential smoothing forecast by specifying the number of time units for which to forecast beyond the last data value. By default, the forecast is based on an additive model.

ts_seasons: Model specification, seasonality. Set to "M" to specify a multiplicative seasonal model in place of the default additive model.

Figure 9: Forecast multiplicative data with a multiplicative model.

The statistical output follows.


Specified model
---------------
   Y ~ error("A") + trend("A") + season("M") 
The specified model is only suggested.
It may differ from the estimated model.

Model to be estimated
---------------------
Y ~ error("A") + trend("A") + season("M") 


Model analysis
--------------
Series: Y 
Model: ETS(A,A,M) 
  Smoothing parameters:
    alpha = 0.001852024 
    beta  = 0.0001005077 
    gamma = 0.0001000043 

  Initial states:
     l[0]       b[0]     s[0]     s[-1]     s[-2]     s[-3]     s[-4]     s[-5]     s[-6]    s[-7]   s[-8]    s[-9]
 1.020573 0.01219271 1.087577 0.7948478 0.7258738 0.6859703 0.7618737 0.8878493 0.9867942 1.207149 1.27789 1.233154
   s[-10]   s[-11]
 1.208397 1.142624

  sigma^2:  0.0351

      AIC      AICc       BIC 
 82.57352  93.90685 121.27684 

Mean squared error of fit to data: 0.0272682 

Forecast
--------
     Month predicted    lower    upper     width
1 Jan 2026  2.183907 1.813570 2.554243 0.7406725
2 Feb 2026  2.324193 1.959415 2.688971 0.7295565
3 Mar 2026  2.383227 2.021309 2.745145 0.7238357
4 Apr 2026  2.485509 2.126770 2.844248 0.7174772
5 May 2026  2.363599 1.994768 2.732429 0.7376610
6 Jun 2026  1.944135 1.576952 2.311319 0.7343667
7 Jul 2026  1.762348 1.392833 2.131864 0.7390311
8 Aug 2026  1.517902 1.148957 1.886846 0.7378891

In the multiplicative model with monthly data, the 12 seasonal coefficients are seasonal factors: the forecast is obtained by multiplying the combined level and trend components by the factor for the corresponding month. These coefficients are multiplicative instead of additive. Each coefficient represents the relative effect of the corresponding month, interpreted relative to the baseline of 1, which indicates no change from the underlying trend.

  • Coefficient > 1: Season tends to be above the trend
  • Coefficient < 1: Season tends to be below the trend
  • Coefficient = 1: Season follows the trend exactly
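These interpretations follow directly from the multiplicative forecast equation, \(\hat y = (l + h\,b) \times s\). A sketch with invented numbers (the function name is ours, for illustration only):

```python
def mult_forecast(level, trend, h, s):
    """Multiplicative-seasonality forecast: scale the trend line by s."""
    return (level + h * trend) * s

# A flat trend line of 100 units (invented values)
trend_line = mult_forecast(100.0, 0.0, 1, 1.0)   # s = 1: follows the trend
above = mult_forecast(100.0, 0.0, 1, 1.09)       # s > 1: 9% above the trend
below = mult_forecast(100.0, 0.0, 1, 0.73)       # s < 1: 27% below the trend
print(trend_line, above, below)
```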

In the output, the seasonal states are listed backward from the start of the series: s[0] is the coefficient for the month just before the first observation (December), s[-1] for November, and so on back to s[-11] for January. January's coefficient, 1.14, is above 1: January typically has values higher than the corresponding trend level. If the trend forecasts 100 units for January, the seasonal forecast would be about 114 units.

The coefficient for the seventh month, July, is below 1: s[-5] = 0.89. That is, July typically has values lower than the corresponding trend. If the trend predicts 100 units, the forecast of the value of Y for July would be about 89 units.

Appendix

Examining here only the \(\alpha\) smoothing parameter, the exponential smoothing model for the forecast of the next time period, t+1, is defined only in terms of the current time period t:

\[\hat y_{t+1} = (\alpha) y_t + (1-\alpha) \hat y_t\]

Project the model back one time period to obtain the expression for the current forecast \(\hat y_t\),

\[\hat y_t = (\alpha) y_{t-1} + (1-\alpha) \hat y_{t-1}\]

Substitute this expression for \(\hat y_t\) back into the model for the next forecast,

\[\hat y_{t+1} = (\alpha) y_t + (1-\alpha) \, \left[(\alpha) y_{t-1} + (1-\alpha) \hat y_{t-1}\right]\]

A little algebra reveals that the next forecast can be expressed in terms of the current and previous time period as,

\[\hat y_{t+1}= (\alpha) y_t + \alpha (1-\alpha) y_{t-1} + (1-\alpha)^2 \, \hat y_{t-1}\]

Moreover, this process can be repeated for each previous time period. Moving back two time periods from t+1, the model is expressed as,

\[\hat y_{t-1} = (\alpha) y_{t-2} + (1-\alpha) \hat y_{t-2}\]

Substituting in the value of \(\hat y_{t-1}\) into the previous expression for \(\hat y_{t+1}\) yields,

\[\hat y_{t+1} = (\alpha) y_t + \alpha (1-\alpha) y_{t-1} + (1-\alpha)^2 \, \left[(\alpha) y_{t-2} + (1-\alpha) \hat y_{t-2}\right]\]

Working through the algebra results in an expression for the next forecast in terms of the current time period and the two immediately past time periods,

\[\hat y_{t+1} = (\alpha) y_t + \alpha (1-\alpha) y_{t-1} + \alpha (1-\alpha)^2 \, y_{t-2} + (1-\alpha)^3 \, \hat y_{t-2}\]
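Repeating the substitution indefinitely shows why the method is called exponential: the weight on the observation k periods in the past is \(\alpha(1-\alpha)^k\), a geometrically (exponentially) decaying sequence. The recursive update and the explicit weighted sum yield the same forecast, as this Python sketch (illustrative, with invented data) verifies:

```python
def ses_recursive(y, alpha, y_hat0):
    """Forecast of the next value via the recursion
    y_hat[t+1] = alpha*y[t] + (1-alpha)*y_hat[t]."""
    y_hat = y_hat0
    for obs in y:
        y_hat = alpha * obs + (1 - alpha) * y_hat
    return y_hat

def ses_weighted_sum(y, alpha, y_hat0):
    """Same forecast written as an explicit exponentially weighted sum:
    alpha*(1-alpha)^k on the value k periods back, plus the initial
    forecast weighted by (1-alpha)^n."""
    n = len(y)
    total = (1 - alpha) ** n * y_hat0
    for k, obs in enumerate(reversed(y)):   # k = 0 is the most recent value
        total += alpha * (1 - alpha) ** k * obs
    return total

y = [50.1, 49.6, 50.8, 50.2, 49.9]           # invented data
a = ses_recursive(y, 0.3, 50.0)
b = ses_weighted_sum(y, 0.3, 50.0)
print(round(a, 6), round(b, 6))               # the two forms agree
```

The smoothing constant \(\alpha\) thus controls how quickly the influence of older observations fades: a large \(\alpha\) weights recent data heavily, while a small \(\alpha\) averages over a long history.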