Visualize Time Series including Forecasts

Author

David Gerbing

Published

May 5, 2026, 06:18 pm

❝ Prediction is difficult, especially when dealing with the future. ❞

Danish Proverb

Patterns of Time Series Data

Time series data consist of measured values of a variable observed over time, usually at regular intervals, such as every month or every quarter. The variable of interest might be monthly sales revenue, quarterly profit, weekly inventory level, or daily stock price. To forecast the future from the historical time series data, we identify patterns in that data. Understand what happened in the past, and then extend that same structure into the future.

Note: Time series structures

Within the context of random variation, time series data may remain stable, or may exhibit a trend of steady increase or decrease, and/or a seasonal pattern.

Visualizing time series data, we look for trend and seasonality. If neither is present, the underlying structure is likely stable. However, any observed data value, including a time series data value, reflects two sources: an underlying structure and random variation. The probability of Heads on a coin flip may be 0.5, but if you flip the coin 10 times, you may obtain 6 Heads. Flip the same coin another 10 times, and you may obtain 4 Heads. The underlying probability of a Head has not changed, only the observed outcomes.

Random variation can obscure the underlying time series structure by adding random ups and downs to the observed data. Do the ups and downs form a seasonal pattern, or are they simply random variation? Is a slight upward tilt in the data evidence of a positive trend, or is it due only to chance? These are the questions we need to answer to build accurate forecasting models, to disentangle the signal from the noise.

The following examples illustrate the different types of structure we seek to identify in time series data. Each pattern, however, resembles actual data because it is partially obscured by random variation.

Examples

Stable Model

Figure 1: Stable process with much random error.

The process in Figure 1 is centered over the median, with a fair amount of random error, but no discernible trend and no pattern to the ups and downs about the center line. The fluctuations appear random. A stable process is also characterized by a constant level of variation. There are random fluctuations, but the overall range of fluctuation remains the same over time.

Positive Linear Trend

Figure 2: Data recorded from a positive linear trend.

Trend does not have to be linear. If the trend is linear, it plots as a straight line with either a positive or a negative slope. The fluctuations about the linear trend in Figure 2 appear random without any apparent pattern, so no seasonality is detected.

Additive Seasonality without Trend

Figure 3: Seasonal process with some random error.

The process illustrated in Figure 3 exhibits much random error that tends to obscure the underlying signal, the additive seasonality.

Multiplicative Seasonality with Slight Trend

Figure 4: Four geometric growth seasons with slight upward trend.

The oscillations about the slight trend line in Figure 4 follow a regular pattern of
    moderate_up - bigger_down - moderate_up - small_down
so the process is seasonal. The seasonality, however, becomes more pronounced as time moves forward, which indicates multiplicative rather than additive seasonality.

To build a forecasting model, first understand the data and then explore different forecasting methods based on the presence or absence of trend and seasonality. Also consider if any existing component is additive or multiplicative.
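The four structures illustrated in the figures can be imitated with a few lines of base R. The following is only a sketch under stated assumptions: the level of 50, the slope of 0.5, the sine-shaped 12-period cycle, and the noise standard deviation are all hypothetical values chosen for illustration, not the values used to generate the figures.

```r
set.seed(1)                                # arbitrary seed so the sketch is reproducible
n        <- 48                             # four years of monthly values (assumed)
time_idx <- seq_len(n)
cycle    <- sin(2 * pi * time_idx / 12)    # assumed 12-period seasonal shape
noise    <- rnorm(n, sd = 2)               # random variation laid over each structure

stable    <- 50 + noise                                      # level only
trend     <- 50 + 0.5 * time_idx + noise                     # additive linear trend
add_seas  <- 50 + 5 * cycle + noise                          # seasonal swing of fixed size
mult_seas <- (50 + 0.5 * time_idx) * (1 + 0.2 * cycle) + noise  # swing grows with the level

# Plot the four simulated series in a 2 x 2 grid
op <- par(mfrow = c(2, 2))
plot(time_idx, stable,    type = "l", main = "Stable")
plot(time_idx, trend,     type = "l", main = "Linear trend")
plot(time_idx, add_seas,  type = "l", main = "Additive seasonality")
plot(time_idx, mult_seas, type = "l", main = "Multiplicative seasonality")
par(op)
```

Increasing the `sd` argument of `rnorm()` shows how quickly random variation can hide the underlying structure.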

The Patterns

To adapt to structures beyond a stable process, consider the three primary components for modeling time series data: error, trend, and seasonality. Each component can be expressed in one of two primary forms: additive or multiplicative. Table 1 summarizes the general characteristics of six common exponential smoothing models formed from different combinations of these components.

Note: Generality of exponential smoothing

Exponential smoothing is the most general forecasting method considered so far. It accommodates a wide range of model forms, including additive and multiplicative trend and seasonality, as well as additive and multiplicative error, all without assuming linearity.

Table 1: Classification of additive and multiplicative exponential smoothing models.
| Component | Additive | Multiplicative |
|---|---|---|
| Error | The typical size of the error remains about the same across different levels of the time series. | The error is expressed as a proportion of the predicted value. As the forecasted value increases or decreases, the typical size of the error increases or decreases proportionally. |
| Trend | The trend increases or decreases by a constant amount each time period, which plots as a line. | The trend increases or decreases by a constant proportion each time period. The result is an upward or downward sloping curve with a changing rate of growth or decline. |
| Seasonal | The seasonal effect for each season adds or subtracts about the same amount from the trend component throughout the time series. | The seasonal effect multiplies the trend component by a seasonal factor, so the amount added or subtracted becomes larger or smaller as the level of the time series changes. |

What type of model does your data support? Begin by plotting the data to reveal its underlying pattern.

Use the XY() function to plot the time series because the data are plotted as points along the \(x\) and \(y\) axes, which, for a time series, are connected by line segments by default. A time series analysis is identified when the x-axis variable is a date variable, specifically of R type Date.

This variable type is implicitly and automatically created when a variable with date values such as 4/18/2026 is entered as the first argument to XY(). See the Appendix for the permissible forms in which dates can be entered beyond the common forms of digital dates.

Where \(x\) is the date variable, and \(y\) is the variable of interest to visualize over time, plot the time series simply by listing the two variables: XY(x,y)

For example, if Sales are recorded quarterly: XY(Qtr, Sales) plots the time series.
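To make the role of the Date type concrete, here is a minimal base R sketch of the kind of conversion that the text says happens automatically. It is not how lessR performs the conversion internally. The data values are hypothetical, and the variable names Qtr and Sales simply follow the example above.

```r
# Hypothetical quarterly data with dates entered as month/day/year text
d <- data.frame(
  Qtr   = c("1/1/2025", "4/1/2025", "7/1/2025", "10/1/2025"),
  Sales = c(102.4, 98.7, 110.3, 121.9)
)

# Convert the character dates to R type Date
d$Qtr <- as.Date(d$Qtr, format = "%m/%d/%Y")
class(d$Qtr)

# Base R analogue of a time series plot: points connected by line segments
plot(d$Qtr, d$Sales, type = "b", xlab = "Qtr", ylab = "Sales")
```

Once the x-axis variable is of type Date, the values are ordered and spaced by calendar time rather than treated as text labels.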

Random influences affect each data point, obscuring the true structure of the time series. A graph presents the overall picture, helping you look beyond the variation in individual observations and perceive the pattern as a whole. Understanding this structure supports a better choice of forecasting model by aligning the analytical technique with the behavior of the data.

Build the Forecasting Model

A forecast is usually calculated by obtaining a weight, a numerical constant, for each past time value. The forecast is a weighted sum, computed by multiplying each past time value by its estimated weight and then adding the weighted values together. As an example, suppose the estimated weights for the last three data values are 4.9, 3.2, and 2.7. Where \(y_t\) is the last time value in the series, compute a forecast from the last three time values as:

\[ \text{forecasted next value} = 4.9(y_t) + 3.2(y_{t-1}) + 2.7(y_{t-2})\]

Choose a general type of forecasting model, such as an exponential smoothing model for time series data, and then estimate the specific weights of that model from the data.

Data

From lessR version 4.5.5.

To construct a forecasting model with a single predictor variable, specify one column in the data table for the predictor and another column for the variable to be predicted. For time series data, the variable to be forecast, such as Sales, Stock Price, or Inventory Level, is the variable of interest, generically referred to as \(y\). The predictor variable is Time, so the observed data describe the recent history of the variable of interest. Table 2 displays the general form of the data table.

Table 2: General form of a data table used to estimate a forecasting model from past time series data.
| Time | y |
|---|---|
| 1 | \(y_1\) |
| 2 | \(y_2\) |
| 3 | \(y_3\) |
| n | \(y_n\) |

In practice, the time values are usually entered not as simple numbers but as dates, such as 08/18/2024. These dates can represent data collected daily, weekly, monthly, quarterly, or annually. The variable whose values are to be predicted is generically denoted by \(y\), with names in actual applications such as Sales or Production Output. A specific observed historical value is denoted by \(y_t\), the value of \(y\) at Time \(t\).

Note: Training data

Existing data values from which the forecasting model is estimated (trained).

The data from which the model is estimated are the training data. For time series data, the data describe the past.

Residuals and Forecasting Error

Submit the data organized as in Table 2 to a computer program that performs exponential smoothing analysis.

  1. Estimate the model from the data, which includes assigning numerical values to the weights of previous time values. Then assess how well the model recovers the observed values of \(y\) on which it was trained.
  2. Use the estimated model to predict future values that have not yet occurred. Later, when those values of \(y\) become known, assess the accuracy of the forecasts.

The analysis results in an estimated model from which an estimated value of \(y\) can be computed, labeled as \(\hat y\). For each observed data value, \(y_t\), there is a corresponding fitted value from the model, \(\hat y_t\). We can also apply the model to a future time period to calculate a forecasted value \(\hat y\).

Following is a brief review of concepts for building a forecasting model.

Note: Fitted values

A fitted value, \(\hat y_t\), is computed from the model for a given value of the predictor variable, here Time. It is the value of \(y\) that the model indicates for that time.

Given the fitted value obtained by applying the model to any row of the data already collected, we can compare that fitted value to the corresponding observed data value. This comparison is a fundamental concept in building predictive models. How well does the model recover the data from which it was estimated?

Note: Residual

For data used to estimate the model, the residual is the discrepancy between the observed value at Time \(t\) and the corresponding fitted value from the model: \(e_t = y_t - \hat y_t\).

Table 3 shows the organization of the data table for the \(n\) observed data values across time, along with the corresponding fitted values and residuals computed from the model for each row. Also shown are the forecasted values of \(y\) for two future time periods, \(\hat y_{n+1}\) and \(\hat y_{n+2}\).

Table 3: Data, variables Time and \(y\), and information obtained from the analysis, variables \(\hat y\) and \(e\), with two future forecasted values of \(y\) beginning at Time \(n+1\).
| Time | y | \(\hat y\) | e |
|---|---|---|---|
| 1 | \(y_1\) | \(\hat y_1\) | \(e_1\) |
| 2 | \(y_2\) | \(\hat y_2\) | \(e_2\) |
| 3 | \(y_3\) | \(\hat y_3\) | \(e_3\) |
| n | \(y_n\) | \(\hat y_n\) | \(e_n\) |
| n+1 | | \(\hat y_{n+1}\) | |
| n+2 | | \(\hat y_{n+2}\) | |

For data used to estimate the model, the discrepancy between an observed value and its fitted value is a residual. For future values, once those values are observed, the discrepancy between the observed value and its earlier forecast is a forecast error. A forecast applies to an unknown value, typically a future value. Generically, both discrepancies are errors, one applied to existing data and the other to future data.

Note: Forecast error

The discrepancy between the observed value at future Time \(t+m\), \(m\) time periods into the future, once it is known, and the corresponding earlier forecasted value from the model: \(e_{t+m} = y_{t+m} - \hat y_{t+m}\).
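The distinction between a residual and a forecast error can be sketched in a few lines of R. All numbers here are hypothetical. The fitted and forecasted values are simply typed in rather than estimated from a model, because the point is only the bookkeeping of which discrepancy is which.

```r
# Training data: observed values and the model's fitted values (hypothetical)
y     <- c(50.1, 49.4, 51.2, 50.6)
y_hat <- c(50.0, 50.0, 49.8, 50.2)

# Residuals: discrepancies on the data used to estimate the model
e <- y - y_hat

# Forecasts made at Time n for Times n+1 and n+2 (hypothetical)
y_hat_future <- c(50.3, 50.3)

# Later, the future values are observed (hypothetical)
y_future <- c(49.9, 51.0)

# Forecast errors: discrepancies on values the model never saw
e_future <- y_future - y_hat_future

e
e_future
```

The arithmetic is identical in both cases. What differs is whether the observed value was available when the model was estimated.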

The concepts of residuals and forecast errors are fundamental to the development of predictive models, whether from regression analysis, exponential smoothing, or other forecasting techniques. We build predictive models to minimize some function of these discrepancies. We want to know the future, not some set of values far from what actually occurs in the future.

Our interest is the future but our data are from the past. We necessarily estimate the model from past data values.

Tip: Error minimization

Choose the estimated weights, or coefficients, of a predictive equation so as to minimize some function of the residuals, \(y_t - \hat y_t\), across the observed data values.

Explaining existing data with a good fitting model is an initial step in evaluating the effectiveness of a model for predicting new, currently unknown data. If the model cannot account reasonably well for what has already occurred, then it is unlikely to predict well what is not yet known.

We need precise terminology to describe statistical models. For time series prediction, historical data are used to estimate the model, but the primary focus is the future. To avoid confusion, reserve the term forecast for predicted unknown values of \(y\), usually future values, obtained from the model. For example, for \(n\) historical data values, the forecasted sales for the next four quarters are \(\hat y_{n+1}\) through \(\hat y_{n+4}\).

Some confusion exists because the same notation, \(\hat y\), describes both fitted and forecasted values. The only notational distinction is the subscript. Use \(\hat y_t\) for the fitted value that corresponds to one of the historical data values, from \(y_1\) at Time 1 through the last data value, \(y_n\). Use \(\hat y_{n+m}\) for a forecasted value \(m\) time periods into the future.

Evaluation

Our estimated model is useful only to the extent that it accurately predicts future values. However, the future has not yet arrived, so we cannot yet directly assess the extent of our model’s predictive accuracy against those unknown values. Instead, evaluate how well our model matches the data we already have. To do this, begin with the value the model calculates for each past time value. The discrepancy between what actually occurred and what the model specifies indicates how well the model fits the existing data: the larger these residuals, the worse the model’s performance in recovering the data.

A fundamental statistic for assessing a model’s fit to the data is the root mean squared error, RMSE, which directly reflects the typical size of the errors. The word “root” refers to the “square root”. Compute RMSE as follows.

See RMSE in Section 2.5 for more detail with a worked example.

Table 4: Conceptual definition of the root mean squared residual (or error) (RMSE), the square root of the average squared residual.
| Step | Description | Formula |
|---|---|---|
| 1 | Calculate the residual for each row of the data table (each time period) | \(e_t = y_t - \hat y_t\) |
| 2 | Square the residual for each row of the data table (each time period) | \(e_t^2 = (y_t - \hat y_t)^2\) |
| 3 | Sum the squared residuals over all the rows of the data | \(SSE = \sum_{t=1}^{n_m} (y_t - \hat{y}_t)^2\) |
| 4 | Compute the mean of the squared residuals | \(MSE = SSE / n_m\) |
| 5 | “Undo” the squaring to return to the original measurement units by taking the square root of MSE | \(RMSE = \sqrt{MSE}\) |

As a technical note, when computing the mean of the squared errors, we do not divide by \(n\), the total number of data values. Instead, we divide by \(n_m\), defined as the number of fitted values, the total number of data values minus the number of parameters estimated. For example, if the data are collected monthly, then there are 12 separate seasonal parameters to estimate, one for each month. The fitted values would start no earlier than the 13th data value, and other parameters are estimated as well.
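The five steps of Table 4 translate directly into R. The sketch below uses hypothetical observed and fitted values, and it assumes, purely for illustration, that the first time period has no fitted value, so that \(n_m\) is simply the number of rows for which a residual exists.

```r
# Hypothetical observed values and fitted values; NA marks a row with no fitted value
y     <- c(50.1, 49.4, 51.2, 50.6, 49.8)
y_hat <- c(NA,   50.0, 49.8, 50.2, 50.3)

e    <- y - y_hat          # Step 1: residual for each row
keep <- !is.na(e)          # rows that have a fitted value
n_m  <- sum(keep)          # number of fitted values
sse  <- sum(e[keep]^2)     # Steps 2 and 3: square, then sum
mse  <- sse / n_m          # Step 4: mean of the squared residuals
rmse <- sqrt(mse)          # Step 5: back to the original units

c(n_m = n_m, SSE = sse, MSE = mse, RMSE = rmse)
```

Because RMSE is in the units of \(y\), it can be read as a typical size of the discrepancy between the data and the fitted values.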

The smaller the RMSE, the better the model fits the observed data. Evaluating predictive accuracy on future data is the only direct way to assess forecast error, but those future values are not yet known, which is why forecasting is needed. At least RMSE computed from the data already available provides some information for evaluating the model.

If the model cannot recover the data on which it was trained, it will not be able to accurately forecast the future. However, even if the estimated model fits the observed data well, its forecasts will likely contain additional error. One reason is that the model was estimated from that particular data set and is therefore partly tailored to it. As a result, the model will usually not perform as well on entirely new data for which it has not been optimized, including the future values it has not yet observed.

Tip: Our best guess of predictive accuracy for our forecasting model

This RMSE fit index suggests how well the model fits the time series data of \(y\).

Only after future values become known can we directly evaluate the accuracy of the predictive model for forecasting the unknown. Once these new data are available, we can calculate a more useful version of RMSE to assess the discrepancy between the forecasted values of \(y\) and the corresponding observed values.

Exponential Smoothing

There are many forecasting methods, but one of the most widely applied is exponential smoothing. Some methods are more sophisticated, but exponential smoothing models generally perform well. Moreover, the exponential smoothing method adopted by lessR is a more modern, more robust approach than the classic smoothing methods available in much software.

The modern exponential smoothing model adopted by lessR is implemented in the R package fable. An optional online forecasting book is available from its primary author, Rob Hyndman.

Exponential smoothing does not assume a linear relationship between time and the variable being forecast. One of its strengths is its ability to directly model additive and multiplicative trend as well as additive and multiplicative seasonality. The result is a general and widely applicable forecasting technique.

Self-adjusting Weights

What is “smoothing” and why is it “exponential”? Exponential smoothing calculates a set of weights to forecast the value of a variable as a weighted average of past values. More recent past values receive greater weight, and more distant past values receive progressively less weight. What happened two time periods ago has less influence than what happened in the previous time period.

The model is estimated by minimizing error as it moves through the data from the first fitted value to the last. The basic idea of exponential smoothing is that it is self-correcting: the fitted value for the next time period is updated in response to the error from the previous fitted value.

Note: Self-adjusting forecast

Adjust the next fitted value in the time series at Time t+1 to compensate for the error in the current fitted value at Time t.

If the current fitted value \(\hat y_t\) is larger than the corresponding observed value \(y_t\), then the model has overestimated, and the next fitted value should be adjusted downward. Conversely, if the current fitted value \(\hat y_t\) is smaller than the observed value \(y_t\), then the model has underestimated, and the next fitted value \(\hat y_{t+1}\) should be adjusted upward.

How much should the next fitted value be adjusted? The error in any specific fitted value can be viewed as having two components. One component is systematic error inherent in the forecast, such as a tendency to under-estimate or over-estimate the next value of \(y\). Exponential smoothing is designed to adjust to this type of error and thereby compensate for systematic under- or over-estimation.

The second type of error inherent in any forecast is purely random. Flip a fair coin 10 times and obtain six Heads. Flip the same fair coin another 10 times and obtain four Heads. Why? The answer is random, unpredictable fluctuation. There is no effective adjustment for this random variation.

Tip: Random error is not predictable

Adjusting a forecast by reacting to random errors results in worse forecasting accuracy than making no adjustments.

To account for random variation, the adjustment from the current fitted value to the next fitted value should be moderated rather than made in full. The extent of this self-adjustment is specified by a parameter named \(\alpha\) (alpha).

Note: Smoothing parameter \(\alpha\)

Specifies the proportion of the residual to be used to adjust the next fitted value according to \(\alpha(y_t - \hat y_t)\), where \(0 \leq \alpha \leq 1\).

The adjustment made to the next fitted value is some proportion of the current residual, a value between 0 and 1. What value of \(\alpha\) should be chosen for a particular model in a specific setting? Base the choice of \(\alpha\) on a combination of empirical and theoretical considerations.

If the time series is relatively free from random variation, then a larger value of \(\alpha\) allows the series to adjust more quickly to systematic underlying changes. For a time series containing a substantial random variation component, however, smaller values of \(\alpha\) should be used to avoid overreacting to the random fluctuations inherent in the data. Any one random fluctuation is unrelated to the previous random fluctuation or to the next.

Impact of the value of \(\alpha\)

The conceptual basis for choosing the value of \(\alpha\) is clarified by the following table and graphs, which illustrate the smoothing weights for different values of \(\alpha\).

Tip: Choose the value of \(\alpha\)

Choose the value of \(\alpha\) that minimizes RMSE \(= \sqrt{MSE}\).

How does the value of \(\alpha\) affect the estimated model?

Tip: Influence of the value of \(\alpha\)

The larger the value of \(\alpha\), the more relative emphasis is placed on the current and most recent time periods.

Usually, choose a value of \(\alpha\) considerably less than 1. Adjusting the next fitted value by the entire amount of the residual would cause the model to overreact in a futile attempt to track the random variation component. In practice, \(\alpha\) often ranges from about 0.1 to 0.3.

For the observed time series, exponential smoothing updates the fitted value at the next time point, \(\hat y_{t+1}\), from the current fitted value, \(\hat y_t\), plus the adjusted residual, \(\alpha (y_t - \hat y_t)\).

Note: Exponential smoothing update equation

\(\quad \hat y_{t+1} = \hat y_t + \alpha (y_t - \hat y_t), \quad 0 \leq \alpha \leq 1\).

To illustrate, suppose that the current fitted value at Time \(t\) is \(\hat y_t = 128\), and the corresponding observed value is larger, \(y_t = 133\). Compute the next fitted value, \(\hat y_{t+1}\), with \(\alpha = 0.3\):

\[\begin{align*} \hat y_{t+1} &= \hat y_t + \alpha (y_t - \hat y_t) \\ &= 128 + 0.3(133 - 128) \\ &= 128 + 0.3(5) \\ &= 128 + 1.5 \\ &= 129.50 \end{align*}\]

The current fitted value of 128 is 5 below the observed value of 133. Partially compensate for this discrepancy by increasing the next fitted value by \(0.3(133 - 128) = 0.3(5) = 1.50\). The next fitted value is therefore \(128 + 1.50 = 129.50\).
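The same update can be written as a small R function. The function name `es_update` is hypothetical, introduced only for this sketch. The numbers are the ones from the worked example.

```r
# One step of the exponential smoothing update:
# next fitted value = current fitted value + alpha * current residual
es_update <- function(y_t, y_hat_t, alpha) {
  stopifnot(alpha >= 0, alpha <= 1)
  y_hat_t + alpha * (y_t - y_hat_t)
}

es_update(y_t = 133, y_hat_t = 128, alpha = 0.3)   # 129.5

# The two extremes of alpha
es_update(y_t = 133, y_hat_t = 128, alpha = 0)     # 128: the residual is ignored
es_update(y_t = 133, y_hat_t = 128, alpha = 1)     # 133: the full residual is applied
```

With \(\alpha = 0\) the fitted value never changes, and with \(\alpha = 1\) the next fitted value is just the current observed value, random variation included.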

After the fitted values have been computed through the last observed time point, \(n\), the same fitted-value notation extends naturally to forecasts beyond the observed data. Thus, \(\hat y_{n+1}\) is the forecast for the next time period, \(\hat y_{n+2}\) is the forecast for two time periods into the future, and more generally \(\hat y_{n+m}\) is the forecast for \(m\) time periods beyond the last observed value \(y_n\).

Note: Exponential smoothing forecast computation

\(\quad \hat y_{t+1} = \alpha y_t + (1 - \alpha)\hat y_t, \quad 0 \leq \alpha \leq 1\).

For a given value of the smoothing parameter \(\alpha\), only two quantities are needed to compute the next fitted value: the current observed value, \(y_t\), and the current fitted value, \(\hat y_t\).
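The weighted-average form is the update equation rearranged. Expanding the product and collecting the terms in \(\hat y_t\) gives:

\[\begin{align*} \hat y_{t+1} &= \hat y_t + \alpha (y_t - \hat y_t) \\ &= \hat y_t + \alpha y_t - \alpha \hat y_t \\ &= \alpha y_t + (1 - \alpha)\hat y_t \end{align*}\]

The two forms therefore always yield the same next fitted value.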

Patterns of Exponential Decay

To illustrate, return to the previous example with \(\alpha = 0.3\), current fitted value \(\hat y_t = 128\), and corresponding observed value \(y_t = 133\). Compute the next fitted value, \(\hat y_{t+1}\), as:

\[\begin{align*} \hat y_{t+1} &= \alpha y_t + (1 - \alpha)\hat y_t \\ &= (0.30)(133) + (0.70)(128) \\ &= 39.90 + 89.60 \\ &= 129.50 \end{align*}\]

The next fitted value, and ultimately the forecast beyond the observed series, is a weighted combination of the current observed value and the current fitted value. This pattern generalizes recursively to all previous time periods, so that exponential smoothing can be expressed as a weighted average of current and past observed values, with weights that decline exponentially as the observations become more distant in time.
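To see where the exponentially declining weights come from, substitute the same equation for \(\hat y_t\), and then again for \(\hat y_{t-1}\):

\[\begin{align*} \hat y_{t+1} &= \alpha y_t + (1 - \alpha)\hat y_t \\ &= \alpha y_t + (1 - \alpha)\left[\alpha y_{t-1} + (1 - \alpha)\hat y_{t-1}\right] \\ &= \alpha y_t + \alpha(1 - \alpha) y_{t-1} + (1 - \alpha)^2 \hat y_{t-1} \\ &= \alpha y_t + \alpha(1 - \alpha) y_{t-1} + \alpha(1 - \alpha)^2 y_{t-2} + (1 - \alpha)^3 \hat y_{t-2} \end{align*}\]

Continuing the substitution, the observed value \(k\) time periods back receives the weight \(\alpha(1 - \alpha)^k\), which shrinks by the constant factor \(1 - \alpha\) with each additional period.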

The following table in Figure 5 shows the specific values of these weights for the current and 10 previous time periods for four different values of \(\alpha\). For smaller values of \(\alpha\), such as \(\alpha = 0.1\) and \(\alpha = 0.3\), more than 10 previous time periods are needed for the weights to sum to 1.00.

Figure 5: The weights from exponential smoothing models for alpha = .1, .3, .5, .7 for the present value of y and the previous ten values of y.
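Weights of this kind can be recomputed in base R, assuming the standard result that the observed value \(k\) periods back receives the weight \(\alpha(1-\alpha)^k\). This is a sketch for checking the pattern, not a reproduction of how the figure was generated. The row and column labels are hypothetical.

```r
alphas <- c(0.1, 0.3, 0.5, 0.7)     # the four values of alpha in Figure 5
k      <- 0:10                      # current period (k = 0) and ten previous periods

# One column of weights per value of alpha
w <- sapply(alphas, function(a) a * (1 - a)^k)
dimnames(w) <- list(paste0("lag_", k), paste0("alpha_", alphas))

round(w, 4)

# Sum of the eleven weights for each alpha, which equals 1 - (1 - alpha)^11
round(colSums(w), 4)
```

For the smaller values of \(\alpha\) the column sums fall noticeably short of 1, which is why more than ten previous periods are needed before the weights account for the whole forecast.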

Figure 6 shows the pattern of weights for three different values of \(\alpha\). The word exponential in the name of this smoothing method refers to how the smoothing weights decline across previous time periods. Comparing Figure 6 (a), Figure 6 (b), and Figure 6 (c) shows that, in each case, the weights decrease exponentially as the time periods move farther into the past, but at different rates for different values of \(\alpha\).

This exponential decay indicates how much emphasis is placed on the immediate past relative to the distant past. A larger value of \(\alpha\) assigns relatively greater weight to the most recent observations, whereas a smaller value distributes more weight across older observations as well. Accordingly, larger values of \(\alpha\) lead to quicker adaptation to recent values of the time series, and smaller values lead to greater smoothing across all the values.

(a) Smoothing weights with alpha = .5 for the forecast of the next time period.
(b) Smoothing weights with alpha = .3 for the forecast of the next time period.
(c) Smoothing weights with alpha = .1 for the forecast of the next time period.
Figure 6: Three different rates of exponential decay for three different values of \(\alpha\).

As stated, compute the forecast for the next time period from only two quantities from the current time period:

\[ \hat y_{n+1} = \alpha y_n + (1 - \alpha)\hat y_n \]

The weights across earlier time periods shown in Figure 6 are not computed explicitly in this forecast equation, but they are implicit in it. If the forecast were instead written directly in terms of all previous observed values, these same exponentially declining weights would appear and would yield the same result.
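The equivalence of the two-quantity recursion and the explicit weighted sum can be checked numerically. The following sketch makes several assumptions: the function name `ses_fit` is hypothetical, the data values are made up, and the starting level is set to the first observation only for convenience, whereas a fitted model would estimate it.

```r
# Fitted values from the recursion; element t + 1 is the fitted value for Time t + 1
ses_fit <- function(y, alpha, l0) {
  n <- length(y)
  y_hat <- numeric(n + 1)
  y_hat[1] <- l0                                   # starting level (assumed, not estimated)
  for (t in seq_len(n)) {
    y_hat[t + 1] <- alpha * y[t] + (1 - alpha) * y_hat[t]
  }
  y_hat
}

y     <- c(50.2, 49.1, 51.4, 50.7, 49.6, 50.3)    # hypothetical data
alpha <- 0.3
l0    <- y[1]
n     <- length(y)

# Forecast for Time n + 1 from the recursion
recursive <- ses_fit(y, alpha, l0)[n + 1]

# The same forecast written as an explicit weighted sum of all observed values,
# plus the remaining weight placed on the starting level
k        <- 0:(n - 1)
explicit <- sum(alpha * (1 - alpha)^k * rev(y)) + (1 - alpha)^n * l0

c(recursive = recursive, explicit = explicit)
```

The two numbers agree, which is the sense in which the exponentially declining weights are implicit in the simple recursion.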

The parameter \(\alpha\) controls the level component of the time series. However, many time series do not remain at a constant level, but instead exhibit trend and/or seasonality. Accordingly, the full exponential smoothing model includes corresponding smoothing parameters for these additional components: for trend, \(\beta\) (beta), and for seasonality, \(\gamma\) (gamma).

Implementation

Specify the Forecasting Model

Using the lessR interface to modern exponential smoothing, it is straightforward to specify various combinations of models with or without trend and/or seasonality for a given time series. Experiment, explore, then choose the most appropriate model.

The following parameters apply to XY().

ts_ahead: Number of time units to forecast beyond the time of the last data value.

The following parameters are the control knobs for specifying a forecasting model. They allow the user to match the type of specified model to the form of the data. The system can choose these values for you, or you can specify the values that you wish:

  • "N" none, except for error
  • "A" additive
  • "M" multiplicative

However, note that the model components that you specify are only suggestions for this version of exponential smoothing. For example, if you specify no seasonality and there is strong seasonality in the data, the algorithm may override your suggestion to provide a better-fitting model.

ts_trend: Trend parameter for the forecast.

ts_season: Seasonality parameter for an exponential smoothing forecast.

ts_error: Error parameter. Can be "A" or "M", though additive error is usually a good default.

The default type of the XY() forecasting model is exponential smoothing. That is, the default value of the parameter ts_method is "es".

Stable Process

If the process is stable, there is no trend or seasonality.

Tip: Simple exponential smoothing forecast

For a stable process, exponential smoothing produces a flat forecast function: all forecasted values are equal.
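The flat forecast function can be seen with base R alone. The sketch below swaps in the classic `HoltWinters()` function from the stats package with trend and seasonality turned off. This is not the modern ETS estimation that the text attributes to lessR, and the simulated data are hypothetical.

```r
set.seed(42)                                      # arbitrary seed for reproducibility
# Hypothetical stable monthly series: constant level plus random variation
y <- ts(50 + rnorm(60, sd = 0.75), start = c(2019, 7), frequency = 12)

# Classic simple exponential smoothing: no trend (beta) and no seasonality (gamma)
fit <- HoltWinters(y, beta = FALSE, gamma = FALSE)
fit$alpha                                         # estimated smoothing parameter

# Forecast eight months ahead: every forecasted value is the same
fc <- predict(fit, n.ahead = 8)
as.numeric(fc)
```

With only a level component, the model has nothing to project forward except its last estimate of the level, so the forecast is the same at every horizon.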

Consider the following example.

Data

To illustrate, first read stable process data into the d data frame. The data are available on the web at:

http://web.pdx.edu/~gerbing/data/StableData.xlsx

d <- Read("http://web.pdx.edu/~gerbing/data/StableData.xlsx")

The lessR function Read() reads data from many different file formats. In this example, it reads data from an Excel file on the web into the local R data frame named d. Because d is the default data frame for lessR analysis functions, the data=d parameter is optional.


The data represent monthly measurements of Variable Y3. Here are the first six rows of data.

head(d)
       Month      Y3
1 2019-07-01 49.3042
2 2019-08-01 49.3760
3 2019-09-01 51.3605
4 2019-10-01 50.5292
5 2019-11-01 49.8766
6 2019-12-01 49.7658

Before submitting a forecasting model for analysis, first view the data to understand its general structure, particularly regarding possible trend and seasonality.

XY(Month, Y3)

Use the XY() function because we are plotting points in the two-dimensional \(x-y\) coordinate space. When X is a variable of type Date, by default, a time series with connected line segments is displayed.

Figure 7: Stable process data.

The visualization in Figure 7 suggests a stable process. There is no discernible trend, the fluctuations around the center are irregular and lack apparent seasonality, and the overall variability of the system appears to remain constant over time.

Visualize the Forecast

Assuming a stable process, apply the corresponding forecasting model to the data. We always seek to match the structure of the data to the type of model submitted for analysis. For these data, we do not want the model to attempt to account for either trend or seasonality.

XY(Month, Y3, ts_ahead=8)

Let the algorithm detect the type of model.

The analysis output in Figure 8 consists of four separate visualizations:

  • data: [black line] from which the model is estimated
  • model fit: [light red line] the model’s fitted values to the data
  • forecast: [dark red line] the model’s forecasted future data values
  • 95% prediction interval [light red band about the forecasted values]
Figure 8: Simple exponential smoothing forecast appropriately applied to a stable process.

The model correctly detected the stability, without trend and without seasonality. All of the model’s forecasted future values of Y3 are the same, 50.224 in the text output that follows. The large discrepancy between the observed data values and the model’s fitted values indicates that the model does not explain much of the variability in the data. If the inherent variability truly is random, then the model is not incorrect. Instead, given the available information, there is little structure to model, making accurate prediction difficult. The model can isolate underlying structure, but it cannot add structure that is not present in the data.

The first two data values are well below the center of the series, so the fitted values start low and then increase as subsequent data values rise. After this initial recovery, the fitted values show no trend, consistent with the specified stable-process model. Each unusually large observation pulls the fitted values upward, followed by a decrease when subsequent observations are lower. Because the data show no regular pattern of increases and decreases, the corresponding ups and downs of the fitted values are also irregular.

Figure 8 also visualizes the 95% prediction interval for each forecast value.

Note95% Prediction interval

The estimated range of values that contains 95% of all the future values of the forecasted variable \(y\) for a given future value of time.

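For this stable-process model, the 95% prediction interval spans most of the range of the observed data. In general, the prediction interval becomes wider for forecasted values further into the future. Here, because the estimated \(\alpha\) is nearly zero, the widening is too small to see: the text output reports the same interval width for all eight forecasts.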

Greater confidence that the prediction interval will contain a future value of \(y\) requires a wider interval. At the extreme, for data values in the range of this example, we would be nearly certain that the future value would fall within the interval from -10,000 to 10,000. However, such an interval would be too wide to provide a useful forecast.
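As a rough check, the reported interval can be approximated from the forecast and the error variance in the text output. The sketch assumes normally distributed forecast errors and uses the standard forecast-variance expression for the ETS(A,N,N) model, \(\sigma_h^2 = \sigma^2[1 + \alpha^2(h-1)]\). The variable names are illustrative, and small differences from the printed output are expected because \(\sigma^2\) is rounded there.

```r
# Values copied from the text output of the stable-process model
forecast_value <- 50.22444
sigma2         <- 0.5594
alpha          <- 0.000100004

h    <- 1:8                                       # forecast horizons
sd_h <- sqrt(sigma2 * (1 + (h - 1) * alpha^2))    # forecast standard deviation

z95   <- qnorm(0.975)                             # about 1.96
lower <- forecast_value - z95 * sd_h
upper <- forecast_value + z95 * sd_h
width95 <- upper - lower

# A 99% interval requires a wider band than a 95% interval
width99 <- 2 * qnorm(0.995) * sd_h

round(cbind(h, lower, upper, width95, width99), 4)
```

With \(\alpha\) this close to zero, the width barely changes across the eight horizons. A larger \(\alpha\) in the same expression would widen the interval visibly as the horizon increases.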

Text Output

In addition to the visualization, the precise forecasted values are available with their corresponding 95% prediction intervals, along with other model information. The text output from the analysis follows.

[Interactive chart from the Plotly R package (Sievert, 2020)] 
[with functions from Ryan, Ulrich, Bennett, and Joy's xts package] 
[with functions from Hyndman and Athanasopoulos's, fpp3 packages] 
   -- standard reference: https://otexts.com/fpp3/

Specified model
---------------
   Y3  [with no specifications] 
The specified model is only suggested.
It may differ from the estimated model.

Model to be estimated
---------------------
Y3 ~ error("A") 


Model analysis
--------------
Series: Y3 
Model: ETS(A,N,N) 
  Smoothing parameters:
    alpha = 0.000100004 

  Initial states:
     l[0]
 50.22444

  sigma^2:  0.5594

     AIC     AICc      BIC 
214.7685 215.1970 221.0515 

Mean squared error of fit to data:  0.540711
Root mean squared error (RMSE) fit: 0.7353305 

Forecast
--------
     Month predicted    lower   upper    width
1 Jul 2024  50.22444 48.75858 51.6903 2.931719
2 Aug 2024  50.22444 48.75858 51.6903 2.931719
3 Sep 2024  50.22444 48.75858 51.6903 2.931719
4 Oct 2024  50.22444 48.75858 51.6903 2.931719
5 Nov 2024  50.22444 48.75858 51.6903 2.931719
6 Dec 2024  50.22444 48.75858 51.6903 2.931719
7 Jan 2025  50.22444 48.75858 51.6903 2.931719
8 Feb 2025  50.22444 48.75858 51.6903 2.931719

The exponential smoothing software estimates the value of \(\alpha\) that minimizes the forecast error for the specified model and data. For this analysis, \(\alpha = 0.0001\), an extremely small value, which means that all data values were given approximately the same weight.
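To see why a tiny \(\alpha\) weights the data values almost equally, recall that simple exponential smoothing gives the observation \(j\) periods back a weight of \(\alpha(1-\alpha)^j\). The following base R sketch compares the estimated \(\alpha\) with a hypothetical larger value of 0.5; the function name `ses_weights` is illustrative.

```r
# Weight on the observation j periods in the past: alpha * (1 - alpha)^j
ses_weights <- function(alpha, n) alpha * (1 - alpha)^(0:(n - 1))

w_small <- ses_weights(0.0001, 5)   # alpha as estimated for Y3
w_large <- ses_weights(0.5, 5)      # hypothetical larger alpha

# Ratio of the weight four periods back to the most recent weight
ratio_small <- w_small[5] / w_small[1]   # close to 1: nearly equal weights
ratio_large <- w_large[5] / w_large[1]   # 0.5^4: weights decay quickly

c(ratio_small = ratio_small, ratio_large = ratio_large)
```

With \(\alpha\) near zero the individual weights are also tiny, so most of the weight stays on the estimated initial level. That is consistent with the output, where every forecast equals the initial state l[0].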

Under the output heading, “Model to be estimated”, find only an error specification with no trend or seasonality, formally confirming the estimated model is of a stable process.

Forecasted Values

Find the forecasted values under the predicted column. For the stable process model, all forecasted values are equal.

\[\hat y_{\text{Jul 2024}} = \hat y_{\text{Aug 2024}} = \; \cdots \; = \hat y_{\text{Feb 2025}} = 50.224\]

As indicated, the simple exponential smoothing model accounts only for the level of the data and is applicable to data without trend or seasonality. To account for either of these components, we move to more sophisticated exponential smoothing models.

Beware of Overfitting

The visualized data in Figure 8 are simulated as a stable process. In this situation, we know the true underlying process, and we see that the exponential smoothing algorithm correctly identified the process as stable. Now suppose, as in an actual data analysis, that you do not know the underlying process and want to see if you can obtain a better fit by specifying a different model. Consider a model that imposes both trend and seasonality on these stable data.

XY(Month, Y3, ts_ahead=8, ts_trend="A", ts_seasons="A")

Figure 9: Exponential smoothing forecast with inappropriate trend and seasonality applied to a stable process.

The obtained fit from this model is noticeably improved over the stable process model with no trend and seasonality. The RMSE decreases from 0.735 for the stable model to 0.671 for the more sophisticated model with trend and seasonality. But wait! The correct model is the stable model. How do we get better fit by imposing the wrong model?

The answer is random sampling error: the model fits that random sampling error, taking advantage of chance. The model imposed the conditions that we specified: trend and seasonality. However, since those patterns are known not to exist in the data, whatever trend and seasonality the model detected are due only to spurious, non-reproducible aspects of this particular sample. The results will not generalize.

NoteOverfitting a model to the data

A forecasting model follows the random ups and downs of the observed time series too closely, treating noise as if it were meaningful structure.

The problem of overfitting means that you must be aware of the general pattern in your data before fitting a forecasting model. Naively pursuing a better fit can produce a model that fits this particular data sample more closely, but does not generalize to future data. In this context, the future data are the observations you wish to forecast.

For example, a model with trend and seasonality may fit the observed data better than a stable-process model with no trend and no seasonality. However, if the apparent trend and seasonality reflect random variation rather than true structure, then the more complex model is overfit.

WarningAvoid overfitting

An overfit model can fit the observed data more closely while producing worse forecasts, increasing forecasting error for future observations.

Blindly pursuing statistical fit without considering the structure of the data can lead to worse forecasts than using a simpler model that fits the observed data less closely. Better fit may lead to more forecasting error.
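The same effect can be reproduced with simulated data. The sketch below deliberately swaps in ordinary linear regression (`lm()`) for exponential smoothing to keep the example in base R, so it illustrates the principle rather than the lessR analysis. The simulated series, the seed, and the 48/12 split into fitting and holdout months are all assumptions made for the illustration.

```r
set.seed(123)                              # arbitrary seed, for reproducibility

n_train  <- 48                             # four years of monthly data to fit
n_test   <- 12                             # one year held out as "the future"
n        <- n_train + n_test
time_idx <- 1:n
month    <- factor((time_idx - 1) %% 12 + 1)

# Stable process: constant level of 50 plus random variation, no trend, no seasons
y     <- 50 + rnorm(n, sd = 0.75)
d_sim <- data.frame(time_idx, month, y)
train <- d_sim[1:n_train, ]
test  <- d_sim[(n_train + 1):n, ]

fit_stable  <- lm(y ~ 1, data = train)                 # level only
fit_complex <- lm(y ~ time_idx + month, data = train)  # imposed trend and seasons

rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

rmse_in_stable   <- rmse(train$y, fitted(fit_stable))
rmse_in_complex  <- rmse(train$y, fitted(fit_complex))
rmse_out_stable  <- rmse(test$y, predict(fit_stable,  newdata = test))
rmse_out_complex <- rmse(test$y, predict(fit_complex, newdata = test))

round(c(in_stable = rmse_in_stable, in_complex = rmse_in_complex,
        out_stable = rmse_out_stable, out_complex = rmse_out_complex), 3)
```

Because the stable model is nested within the complex one, the complex model's RMSE on the fitted data can never be larger. On the held-out months, the complex model will often do worse, since the trend and seasonal estimates it carries forward are only noise. The holdout comparison varies from one simulated sample to the next.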

More General Forecasting Models

Data

Consider the following data with both trend and seasonality.

http://web.pdx.edu/~gerbing/data/SalesData.xlsx

d <- Read("http://web.pdx.edu/~gerbing/data/SalesData.xlsx")

In this example, read the data from an Excel data file into the local R data frame (table) named d.

The data represent quarterly measurements of the variable Sales. The dates are listed as individual days, with each date representing the first day of the corresponding quarter. The 16 lines of the data table follow, reported quarterly from the first quarter of 2016 through the last quarter of 2019.

d
          Qtr Sales
1  2016-01-01  0.41
2  2016-04-01  0.65
3  2016-07-01  0.96
4  2016-10-01  0.57
5  2017-01-01  0.59
6  2017-04-01  1.20
7  2017-07-01  1.53
8  2017-10-01  0.97
9  2018-01-01  0.93
10 2018-04-01  1.71
11 2018-07-01  1.74
12 2018-10-01  1.42
13 2019-01-01  1.36
14 2019-04-01  2.11
15 2019-07-01  2.25
16 2019-10-01  1.74

Trend and Seasonality Model

The data exhibit both trend and seasonality, so now analyze a model that appropriately and explicitly accounts for both characteristics.
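A quick numerical check of that claim, as a sketch in base R with the 16 Sales values typed in from the table above: the yearly means show the trend, and the means by quarter show the seasonal pattern. The variable names are illustrative.

```r
# Quarterly Sales, 2016 Q1 through 2019 Q4, copied from the data table
Sales <- c(0.41, 0.65, 0.96, 0.57,
           0.59, 1.20, 1.53, 0.97,
           0.93, 1.71, 1.74, 1.42,
           1.36, 2.11, 2.25, 1.74)

year    <- rep(2016:2019, each = 4)
quarter <- rep(paste0("Q", 1:4), times = 4)

year_means    <- tapply(Sales, year, mean)     # rising means indicate trend
quarter_means <- tapply(Sales, quarter, mean)  # unequal means indicate seasonality

year_means
quarter_means
```

The yearly means increase every year, and the third quarter has the highest average while the first quarter has the lowest, which agrees with the visual impression of trend plus seasonality.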

NoteAdapt exponential smoothing to trend and seasonality

Add a trend component and a seasonal component to the model to account for trend and seasonality in both the fitted values and the subsequent forecast.

Apply the model to trend-and-seasonal data. First, run the model with no explicit model specifications.

XY(Qtr, Sales, ts_ahead=4)

ts_ahead: Parameter that indicates an exponential smoothing forecast by specifying the number of time units to forecast beyond the last data value.

Figure 10: The appropriate exponential smoothing forecasts letting the algorithm choose the specification.

As Figure 10 shows, the forecasted values are lower than they would be if the trend were properly accounted for. The data are characterized by a relatively strong positive linear trend. The text output verifies this conclusion: the algorithm detected additive seasonality but missed the trend component.

Model to be estimated
---------------------
Sales ~ error("A") + season("A")

Model fit is indicated by an RMSE of 0.156.

XY(Qtr, Sales, ts_ahead=4, ts_error="A", ts_trend="A", ts_seasons="A")

Instead, specify a fully additive model with explicit additive trend and seasonal components, forecasting ahead four quarters.

Figure 11: The appropriate exponential smoothing forecasts from additive trend and seasonality applied to data with additive trend and seasonality.

With this more sophisticated model, both trend and seasonality extend into the forecasted future values. Accordingly, the fourth quarter tends to have a lower value than the second and third quarters. Although the trend is increasing, the forecasted value for Quarter 4 is less than the forecasted value for Quarter 3: \(\hat y_{t+3} = 2.636\) and \(\hat y_{t+4} = 2.190\).

This more general model accounts for both trend and seasonality. Because the time series displays a regular pattern with relatively small random error, the forecasts have relatively narrow prediction intervals.

The precise forecasted values and their corresponding prediction intervals follow.

[Interactive chart from the Plotly R package (Sievert, 2020)] 
[with functions from Ryan, Ulrich, Bennett, and Joy's xts package] 
[with functions from Hyndman and Athanasopoulos's, fpp3 packages] 
   -- standard reference: https://otexts.com/fpp3/

Specified model
---------------
   Sales ~ error("A") + trend("A") + season("A") 
The specified model is only suggested.
It may differ from the estimated model.

Model to be estimated
---------------------
Sales ~ error("A") + trend("A") + season("A") 


Model analysis
--------------
Series: Sales 
Model: ETS(A,A,A) 
  Smoothing parameters:
    alpha = 0.1035522 
    beta  = 0.0001489641 
    gamma = 0.0001199852 

  Initial states:
      l[0]      b[0]       s[0]     s[-1]     s[-2]      s[-3]
 0.4001447 0.1015875 -0.2368338 0.3105102 0.2099856 -0.2836619

  sigma^2:  0.0167

       AIC       AICc        BIC 
-14.199847  15.800153  -7.246549 

Mean squared error of fit to data:  0.008353462
Root mean squared error (RMSE) fit: 0.09139728 

Forecast
--------
      Qtr predicted    lower    upper     width
1 2020 Q1  1.838430 1.585094 2.091765 0.5066713
2 2020 Q2  2.433658 2.178963 2.688352 0.5093884
3 2020 Q3  2.635763 2.379713 2.891812 0.5120988
4 2020 Q4  2.190000 1.932598 2.447401 0.5148027

The root mean squared error (RMSE) for the fit of the trend-and-seasonality model improves to 0.091. This improvement in fit is apparent in Figure 11: the light red fitted line now more closely follows the observed data. In particular, the high peaks in each seasonal cycle align closely with the corresponding peaks of the fitted model.
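The additive structure of these forecasts can be approximated by hand from the text output. Because the estimated smoothing parameters are all close to zero, the level, slope, and seasonal effects barely change from their initial states, so a forecast is roughly the initial level, plus the slope times the number of elapsed quarters, plus the seasonal effect. The sketch assumes that the seasonal states are printed in reverse time order, so that s[-3] belongs to the first quarter. That reading is an assumption, checked here only by how well the approximation matches the reported forecasts.

```r
# Initial states copied from the text output of the ETS(A,A,A) model
l0 <- 0.4001447                          # initial level
b0 <- 0.1015875                          # initial slope per quarter
s_printed <- c(-0.2368338, 0.3105102, 0.2099856, -0.2836619)  # s[0], s[-1], s[-2], s[-3]

s_by_quarter <- rev(s_printed)           # assumed order: Q1, Q2, Q3, Q4
n_obs <- 16                              # quarters of observed data
h     <- 1:4                             # forecast horizons, 2020 Q1 to Q4

# Additive forecast: level + trend + seasonal effect
approx_forecast <- l0 + (n_obs + h) * b0 + s_by_quarter

reported <- c(1.838430, 2.433658, 2.635763, 2.190000)
round(cbind(h, approx_forecast, reported, diff = approx_forecast - reported), 4)
```

The approximation differs from each reported forecast by a small, nearly constant amount, which reflects the slight updating of the level that this sketch ignores.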

In this situation, because we understand that our data exhibit both trend and seasonality, specifying a more complex model improves the model’s fit to the data and, if the same process continues into the future, will also provide more accurate forecasts. Distinguishing when additional model complexity overfits the data, worsening forecasting accuracy, from when it improves forecasting accuracy depends on our understanding of the structure of our time series data.

Multiplicative Seasonality

As stated, exponential smoothing models can be additive or multiplicative. Here, the data are best fit with a multiplicative model.

Data

To illustrate data with multiplicative effects, first read the data into the d data frame. The data are available on the web at:

http://web.pdx.edu/~gerbing/data/MultSeasonsData.xlsx

d <- Read("http://web.pdx.edu/~gerbing/data/MultSeasonsData.xlsx")

In this example, read the data from an Excel data file into the local R data frame (table) named d.


The data represent monthly measurements of Variable Y. Here are the first six rows of data.

head(d)
       Month     Y
1 2020-01-01 1.194
2 2020-02-01 1.044
3 2020-03-01 1.184
4 2020-04-01 1.220
5 2020-05-01 1.166
6 2020-06-01 1.059

Before submitting a forecasting model for analysis, as always, first view the data to understand its general structure, shown in Figure 12 for Variable Y.

XY(Month, Y)

Figure 12: Data with multiplicative seasonality.

These data values unequivocally indicate multiplicative seasonality with linear trend. As time increases, the seasonal ups and downs increase as well.

NoteMultiplicative seasonality

The seasonal fluctuations are proportional to the level of the time series, so that as the overall level of the series increases or decreases, the magnitude of the seasonal variations correspondingly increases or decreases.
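The difference between additive and multiplicative seasonality can be sketched with two noise-free simulated series in base R. All values here (the trend, the size of the seasonal effect, six years of monthly data) are hypothetical and chosen only to make the contrast visible.

```r
month_idx <- 1:72                                   # six years of monthly data
level     <- 1 + 0.012 * month_idx                  # linear upward trend
wave      <- 0.25 * sin(2 * pi * month_idx / 12)    # repeating 12-month pattern

y_add  <- level + wave                              # additive: constant swing
y_mult <- level * (1 + wave)                        # multiplicative: swing scales with level

year <- rep(1:6, each = 12)

# Size of the seasonal swing within each year, after removing the trend
swing <- function(y) as.numeric(tapply(y - level, year, function(v) diff(range(v))))

swing_add  <- swing(y_add)
swing_mult <- swing(y_mult)

round(rbind(swing_add, swing_mult), 3)
```

The additive series keeps the same seasonal swing every year, while the swing of the multiplicative series grows as the level rises, which is the pattern visible in Figure 12.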

Visualize the Forecast

The seasonal influence clearly grows over time, making a multiplicative model the most appropriate option. Attempting to analyze multiplicative data with an additive model will not yield as accurate results as with the proper multiplicative model. Additionally, the estimated seasonal coefficients will not be applicable as they are assumed to be constant for each season.

First, run the model with no specification to see what model the algorithm detects.

XY(Month, Y, ts_ahead=8)

Figure 13: Forecast multiplicative data with a multiplicative model.

The algorithm correctly chose additive trend and multiplicative seasonality, but it also chose a multiplicative error term. From the output:

Model to be estimated
---------------------
Y ~ error("M") + trend("A") + season("M") 

The fit index RMSE is 0.179 for that model. Lack of optimal fit is indicated by the model’s poor fit to the last two seasons, with fitted values lower than they should be. Extending this inadequate fit into the future as a forecast does not properly account for the positive trend inherent in the data, producing a lower forecasted maximum than the last season of data.

The data do indicate linear positive trend with multiplicative seasonality, so the only remaining parameter to change is the error term. Again, fully specify the model, this time with additive error. The visualization from the fully specified model appears in Figure 14.

XY(Month, Y, ts_ahead=8, ts_error="A", ts_trend="A", ts_seasons="M")

ts_ahead: Indicate an exponential smoothing forecast by specifying the number of time units for which to forecast beyond the last data value. By default, the forecast is based on an additive model.

ts_error: Specify additive error.

ts_trend: Specify additive trend.

ts_seasons: Set to "M" to specify a multiplicative seasonal model in place of the default additive model.

Figure 14: Forecast multiplicative data with a multiplicative model.

The statistical output follows.

[Interactive chart from the Plotly R package (Sievert, 2020)] 
[with functions from Ryan, Ulrich, Bennett, and Joy's xts package] 
[with functions from Hyndman and Athanasopoulos's, fpp3 packages] 
   -- standard reference: https://otexts.com/fpp3/

Specified model
---------------
   Y ~ error("A") + trend("A") + season("M") 
The specified model is only suggested.
It may differ from the estimated model.

Model to be estimated
---------------------
Y ~ error("A") + trend("A") + season("M") 


Model analysis
--------------
Series: Y 
Model: ETS(A,A,M) 
  Smoothing parameters:
    alpha = 0.001852024 
    beta  = 0.0001005077 
    gamma = 0.0001000043 

  Initial states:
     l[0]       b[0]     s[0]     s[-1]     s[-2]     s[-3]     s[-4]     s[-5]     s[-6]    s[-7]   s[-8]    s[-9]
 1.020573 0.01219271 1.087577 0.7948478 0.7258738 0.6859703 0.7618737 0.8878493 0.9867942 1.207149 1.27789 1.233154
   s[-10]   s[-11]
 1.208397 1.142624

  sigma^2:  0.0351

      AIC      AICc       BIC 
 82.57352  93.90685 121.27684 

Mean squared error of fit to data:  0.0272682
Root mean squared error (RMSE) fit: 0.1651309 

Forecast
--------
     Month predicted    lower    upper     width
1 Jan 2026  2.181914 1.810778 2.553051 0.7422738
2 Feb 2026  2.326258 1.953885 2.698630 0.7447442
3 Mar 2026  2.383534 2.019548 2.747519 0.7279710
4 Apr 2026  2.484788 2.118977 2.850599 0.7316220
5 May 2026  2.366205 1.999690 2.732719 0.7330285
6 Jun 2026  1.941396 1.573962 2.308830 0.7348684
7 Jul 2026  1.759112 1.393425 2.124799 0.7313742
8 Aug 2026  1.525364 1.155389 1.895339 0.7399501

Now the fit has improved compared to the initial unspecified model, with RMSE down to 0.165. Visually, the forecasted values appear to better extend the structure of the historical data. The fitted model still underestimates the last two seasons of data, but the underestimation is not as severe.

Again, we need to be familiar with the structure of our data and visually verify that its structure projects into the future. Data visualization becomes a central aspect of time series forecasting. The estimation algorithm can help us specify a model, but the ultimate responsibility for model specification remains with us.

Seasonal Coefficients

In the multiplicative model with monthly data, the 12 seasonal coefficients represent seasonal factors that are used to obtain the forecast by multiplying the level and trend components. These coefficients are multiplicative instead of additive. Each coefficient represents the relative effect of the corresponding month, interpreted relative to the baseline of 1, which indicates no change from the underlying trend.

  • Coefficient > 1: Season tends to be above the trend
  • Coefficient < 1: Season tends to be below the trend
  • Coefficient = 1: Season follows the trend exactly

For example, the seasonal coefficient for January is \(s_1\) = 1.14. The output lists the initial seasonal states in reverse time order, so with data that begin in January, the January coefficient is the last value listed, s[-11], and s[0] is the December coefficient. January typically has values higher than the corresponding trend level. If the trend for January forecasts 100 units, the forecast for January would be 114 units.

The coefficient for the seventh month, July, is below 1, \(s_7\) = 0.89, listed in the output as s[-5]. That is, July typically has values lower than the corresponding trend. If the trend predicts 100 units, the forecast of the value of Y for July would be 89 units.
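One way to check which printed coefficient belongs to which month is to divide each reported forecast by an approximate trend value and compare the ratios with the coefficients in the output. The sketch below assumes 72 observed months (January 2020 through December 2025, inferred from the first data row and the first forecast date) and ignores the slight updating of the level and slope, which is reasonable only because the smoothing parameters are near zero.

```r
# Values copied from the text output of the ETS(A,A,M) model
l0    <- 1.020573                 # initial level
b0    <- 0.01219271               # initial slope per month
n_obs <- 72                       # inferred number of observed months

reported <- c(2.181914, 2.326258, 2.383534, 2.484788,
              2.366205, 1.941396, 1.759112, 1.525364)   # Jan to Aug 2026
h <- 1:8

trend_value <- l0 + (n_obs + h) * b0      # approximate level plus trend
implied     <- reported / trend_value     # implied multiplicative factor

# Printed coefficients read from the end of the list: s[-11], s[-10], ..., s[-4]
printed <- c(1.142624, 1.208397, 1.233154, 1.27789,
             1.207149, 0.9867942, 0.8878493, 0.7618737)

round(cbind(h, implied, printed), 3)
```

The implied factors for January through August line up closely with s[-11] through s[-4], read in that order, which supports reading the printed seasonal states from last to first when matching them to calendar months.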

Time Units

This section contains useful information about plotting a time series:

  • aggregating over time units to display a time series in a larger time unit than the entered date information
  • formats by which dates can be entered to be converted to a variable of type Date

Aggregate Over Time

Suppose your data are recorded daily, but you wish to analyze quarterly sales. Visualizing the data as they exist shows the time series of daily sales. To visualize the time series of sales by quarter, sum the sales for all the days in each quarter.

NoteAggregation

Compute a statistic such as a sum or a mean over a range of data, which, for time series data, is over a time unit of days, weeks, months, quarters, or years.
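A minimal base R sketch of aggregation by sums, independent of lessR. The small data frame is made up for the illustration, except that its first three sales reuse the January 4 amounts discussed in the next section. The name `sales_demo` and the added `Quarter` column are hypothetical.

```r
# Individual sales, several per day (mostly hypothetical values)
sales_demo <- data.frame(
  OrderDate = as.Date(c("2021-01-04", "2021-01-04", "2021-01-04",
                        "2021-01-05", "2021-04-10")),
  Sales     = c(3.54, 11.78, 272.74, 20.00, 15.50)
)

# Aggregate to days: one row per date, summing all sales on that date
daily <- aggregate(Sales ~ OrderDate, data = sales_demo, FUN = sum)

# Aggregate to quarters: build a "year quarter" label, then sum within it
sales_demo$Quarter <- paste(format(sales_demo$OrderDate, "%Y"),
                            quarters(sales_demo$OrderDate))
quarterly <- aggregate(Sales ~ Quarter, data = sales_demo, FUN = sum)

daily
quarterly
```

The ts_unit and ts_agg parameters of XY(), described below, request this kind of aggregation in a single function call.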

Aggregate by Sums

Consider three variables in the Sales data table: OrderDate, Sales, and Profit.

d <- Read("https://dgerbing.github.io/data/Sales.csv")
head(d)

   OrderDate Sales Profit
1 2022-01-01  0.61   0.42
2 2022-01-02  0.91   0.51
3 2022-01-02  1.00   0.17
4 2022-01-02  1.49  -2.30
5 2022-01-02  1.02  -3.40
6 2022-01-02  0.83  -0.50

The resulting d data frame is reasonably large, with 4,315 rows of data reporting 4,315 individual sales. As the first six rows show for these three variables, sales are reported daily, with multiple sales per day. For example, on January 4, 2021, there were three sales for $3.54, $11.78, and $272.74. Their sum represents the total sales for that day.

There are multiple orders per day, so the time series plot of the original data is not what would typically be desired. At a minimum, the sales data need to be collapsed to a daily basis by summing all sales revenue for each day. For example, the sum of the three sales for January 4, 2021 is $288.06, the value plotted for that date. The aggregated daily sales data provide the data needed to plot the daily time series shown in Figure 15.

XY(OrderDate, Sales, ts_unit="days")

To aggregate with the lessR function XY(), access the first and possibly the second of the following parameters.

ts_unit: Specify a value that is longer than the natural time intervals in the data. Possible values are days, weeks, months, quarters, and years. For example, if each sale were recorded with its date, then a value of days would aggregate sales by day, yielding a daily time series of sales.

ts_agg: Aggregate by sums or means over the time units. The default value is "sum".


Best guess for the date format: %Y-%m-%d
If this format is wrong, specify with parameter: ts_format
To see all possible formats, enter: ?strptime
Examples:  "08/18/2024" format is "%m/%d/%Y"
           "18-08-24"   format is "%d-%m-%y"
           "August 18, 2024" format is "%B %d, %Y"
Figure 15: Sales data aggregated by day.

However, even after sales are aggregated by day, the data remains too detailed for reporting a daily time series over three years. For example, seasonality is difficult to discern from the visualization of the daily time series. Instead, aggregate the data further by some larger time unit, such as quarters.

XY(OrderDate, Sales, ts_unit="quarters")

ts_unit: Parameter to specify the time unit for the aggregation. Based on functions from the xts package, currently implemented valid values include "days", "weeks", "months", "quarters", and "years".

ts_agg: Parameter that specifies the arithmetic operation of the aggregation. The default value is "sum", so no need to specify in this function call.

When aggregating sales by summing over consecutive quarters, the overall upward trend in sales is evident, as are the consistent seasonal fluctuations, with maximum sales in Q4 each year, as shown in Figure 16.


Figure 16: Sales data aggregated by quarters.

Aggregate by Means

Consider the StockPrice monthly data table with Apple stock prices.

d <- Read("StockPrice")
head(d)
         Month Company      Price    Volume
1   1985-01-01   Apple 0.09530602 175302400
23  1985-02-01   Apple 0.09787019 137737600
42  1985-03-01   Apple 0.08504876 247430400
63  1985-04-01   Apple 0.07393682 114060800
84  1985-05-01   Apple 0.07137269  57344000
106 1985-06-01   Apple 0.05470512 576016000

In this example, we wish to aggregate by mean rather than by sum. In the Sales data, each original data point, a recorded sale, is a part of the overall whole, such as a part of daily sales or monthly sales. To get the full daily or monthly sales, we sum the sales over that period.

However, for stock price, each monthly price indicates a value for that time unit. To aggregate, we want the average stock price over the given time period to represent the stock’s value during that period. In this example, focus on Apple’s average quarterly stock price as in Figure 17.

XY(Month, Price, filter=(Company=="Apple"),
            ts_unit="quarters", ts_agg="mean")

filter: Parameter to specify the logical condition for selecting rows of data for the analysis.

ts_agg: Parameter to specify the arithmetic operation for which to aggregate over time. The default value is "sum", so explicitly specify the "mean" aggregation.

pt_size: Optional parameter to specify the size of the plotted points. By default, when plotting a time series with lessR, the point size is 0. Set a positive number to visualize the plotted points, which are connected by line segments by default.

Figure 17: Time series of Apple stock price aggregated by quarters.

Or, step back further and visualize Apple’s stock growth on an annual basis, shown in Figure 18.

XY(Month, Price, filter=(Company=="Apple"),
            ts_unit="years", ts_agg="mean")

Figure 18: Time series of Apple stock price aggregated by years.
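To make the sum-versus-mean distinction concrete, the following base R sketch aggregates the six Apple prices shown in the head(d) listing above into quarters both ways. It is an illustration outside of lessR; the data frame name `apple` and the `Quarter` column are hypothetical.

```r
# First six monthly Apple prices, copied from the head(d) listing
apple <- data.frame(
  Month = as.Date(c("1985-01-01", "1985-02-01", "1985-03-01",
                    "1985-04-01", "1985-05-01", "1985-06-01")),
  Price = c(0.09530602, 0.09787019, 0.08504876,
            0.07393682, 0.07137269, 0.05470512)
)

apple$Quarter <- paste(format(apple$Month, "%Y"), quarters(apple$Month))

q_mean <- aggregate(Price ~ Quarter, data = apple, FUN = mean)  # typical price
q_sum  <- aggregate(Price ~ Quarter, data = apple, FUN = sum)   # not meaningful here

q_mean
q_sum
```

The quarterly mean stays on the scale of the monthly prices, so it can represent the stock's value during the quarter. The quarterly sum is roughly three times too large to be read as a price.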

Date Formats

lessR does automatic date conversion for the following five numeric date formats. When the numerical date values are read as character strings, such as from a ‘.csv’ file, XY() will implicitly convert the characters to an R variable of type Date. Expressing the year with all four digits is recommended, though not always necessary. The following examples use the hyphen, -, delimiter. The slash, /, and period, ., can also be used as delimiters to specify dates.

  • 2024-08-18: Four-digit year, one- or two-digit month, one- or two-digit day
  • 08-18-2024: One- or two-digit month, one- or two-digit day, four-digit year
  • 08-18-24: One- or two-digit month, one- or two-digit day, two-digit year
  • 18-08-2024: One- or two-digit day, one- or two-digit month, four-digit year
  • 18-08-24: One- or two-digit day, one- or two-digit month, two-digit year

Described below are additional possibilities for entering dates as the \(x\)-variable into XY() that will be automatically converted to a Date variable.

Daily Data. Enter the dates for daily data values in one of the above five numerical formats. Or, use the ts_format parameter to manually specify a format for non-numerical date values that can include the name of the corresponding month. (Enter ?strptime to view the possible manual date formats.)
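The format codes accepted by ts_format are those documented under ?strptime, and they can be tried out directly with base R's as.Date(). The sketch below uses the two numeric examples from the lessR date-format message shown earlier and then shows why an explicit format matters for an ambiguous string. The variable names are illustrative.

```r
# The same calendar date written three ways
d_iso <- as.Date("2024-08-18")                          # default ISO format
d_mdy <- as.Date("08/18/2024", format = "%m/%d/%Y")     # month/day/four-digit year
d_dmy <- as.Date("18-08-24",   format = "%d-%m-%y")     # day-month-two-digit year

c(d_iso, d_mdy, d_dmy)

# An ambiguous string: the format decides which date it is
as_day_first   <- as.Date("04/03/2024", format = "%d/%m/%Y")   # 4 March 2024
as_month_first <- as.Date("04/03/2024", format = "%m/%d/%Y")   # 3 April 2024

c(as_day_first, as_month_first)
```

Formats that include month names, such as "%B %d, %Y", depend on the language settings of the R session, so they are left out of this sketch.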

Weekly Data. Enter the dates for weekly data values as with daily data values, except that consecutive dates are one week apart. For example, each date represents the first day of the corresponding week, such as "04/03/2024" for March 4, 2024, which begins the first full week in March 2024, followed by "11/03/2024" for the 11th day of the same month.

Monthly Data. Two possibilities exist for entering monthly data.

  • Date: Consecutive dates one month apart. For example, each date represents the first day of the corresponding month, such as "01/03/2024" for the first day of March 2024, followed by "1/04/2024" for the first day of April 2024.
  • Year Month: Four-digit year followed by the three-letter month abbreviation as a single data value. For example, "2024 Jan" followed by "2024 Feb".

Quarterly Data. Two possibilities exist for entering quarterly data.

  • Date: Enter consecutive dates that are one quarter, or three months, apart. For example, represent a quarter with the first day of the month for the corresponding quarter, such as "01/01/2024" for the first day of the first quarter, followed by "01/04/2024" for the first day of the second quarter.
  • Year Quarter: Enter a four-digit year followed by either Q1, Q2, Q3, or Q4, all as a single data value. For example, "2024 Q1" followed by "2024 Q2".

Annual Data. Two possibilities exist for entering annual data.

  • Date: Enter consecutive dates one year apart. For example, each date represents the first day of the year, such as "01/01/2024" for the first day of 2024, followed by "01/01/2025" for the first day of the following year.
  • Year: Enter a four-digit year. For example, "2024" followed by "2025".
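When building a data file by hand, evenly spaced dates for each of these time units can be generated with base R's seq(). This is only a convenience sketch for creating the date column and is not part of lessR; the start dates follow the examples above.

```r
# Evenly spaced dates, one per time unit
weekly    <- seq(as.Date("2024-03-04"), by = "week",    length.out = 3)
monthly   <- seq(as.Date("2024-03-01"), by = "month",   length.out = 3)
quarterly <- seq(as.Date("2024-01-01"), by = "quarter", length.out = 4)
annual    <- seq(as.Date("2024-01-01"), by = "year",    length.out = 2)

weekly
monthly
quarterly
annual

# wday is 0 for Sunday through 6 for Saturday, so 1 means Monday
as.POSIXlt(weekly[1])$wday
```

Each vector could serve as the date variable of a data frame, paired with the measured values for the corresponding weeks, months, quarters, or years.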