Deseasonalized Regression Forecasting

Author

David Gerbing

Published

Feb 16, 2025, 10:59 am

❝ Prophecy is a good line of business, but it is full of risks. ❞

Mark Twain

Concept

Linear regression and exponential smoothing have traditionally been the most applied methodologies of what is now called machine learning. The values of a set of predictor variables are used to predict the values of a response or target variable. These predictor variables can be other variables related to the target but not the target variable itself. The predictor variables can also be values of the target variable collected at previous times, the focus of this presentation.

Running lessR version 4.4.2.

Time series data

An ordered sequence of data values observed at successive, evenly spaced points in time with each data value associated with a specific date or time.

Data

Choose a type of predictive model, such as a regression model, and then estimate the specific details of that model from the data analysis. Evaluate some aspects of its effectiveness by inputting the data into the estimated model to see how well the model reconstructs the data. These procedures are discussed in more detail in linear regression, especially Section 2.4 on Residuals. The same concepts are briefly reviewed here but applied to time series data.

When developing a model with a single predictor variable, we require two columns in our data table: values of the variable from which to forecast, and existing values of the variable for which we wish to forecast future, unknown values. With time series data, the variable from which we seek to forecast, the predictor variable, is Time. Table 1 depicts the general form of the data table.

Table 1: General form of a data table from which to estimate a predictive model from time series data.
Time   y
1      \(y_1\)
2      \(y_2\)
3      \(y_3\)
⋮      ⋮
n      \(y_n\)

In practice, the time values are usually entered not as numbers but as dates, such as 08/18/2024. These dates can represent data collected daily, weekly, monthly, quarterly, or annually. The variable with values to be predicted is generically referred to as \(y\), a variable such as Sales or Production output. Generically, refer to a specific data value as \(y_t\), the value of \(y\) at Time \(t\).
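In R, such dates have type Date. As a brief sketch with an illustrative starting date, a monthly sequence of dates can be generated directly:

dates <- seq(as.Date("2019-07-01"), by="month", length.out=60)  # 5 years, monthly
head(dates)     # first six monthly dates
class(dates)    # "Date"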

Training data

Existing data values for the predictor variable and variable to predict from which the forecasting model has been estimated.

Error

Submit the data organized as in Table 1 to a computer program that can perform the regression forecasting analysis. The analysis results in a model into which a date can be entered, and the corresponding value of \(y\) consistent with the model calculated. For each data value, \(y_t\), there is a corresponding value fitted by the model, \(\hat y_t\).

Fitted values

A fitted value, \(\hat y_t\), is computed from the model for a given value of the predictor variable, here Time, and specifies the data value the model implies should occur at that time.

Given the fitted value obtained from applying the model to any row of the data we have already collected, we can see how closely the fitted value matches the actual data value, a fundamental concept in constructing predictive models. How well does the model recover the data from which it was estimated?

Residual or error term

The discrepancy between the actual data value that occurred at Time \(t\) and the corresponding value computed, that is, fitted, by the model, \(e_t = y_t - \hat y_t\).

Table 2 shows the organization of the data table for the \(n\) data values across time and, from the model, the subsequently computed fitted values and error terms for each row of the data table. Also illustrated are the values of \(y\) predicted two time periods into the future, \(\hat y_{n+1}\) and \(\hat y_{n+2}\).

Table 2: Data, variables Time and \(y\), and information obtained from the analysis, variables \(\hat y\) and \(e\), with two future predicted values of \(y\) beginning with Time \(n+1\).
Time   y          \(\hat y\)           \(e\)
1      \(y_1\)    \(\hat y_1\)         \(e_1\)
2      \(y_2\)    \(\hat y_2\)         \(e_2\)
3      \(y_3\)    \(\hat y_3\)         \(e_3\)
⋮      ⋮          ⋮                    ⋮
n      \(y_n\)    \(\hat y_n\)         \(e_n\)
————   ————       —————————            ————
n+1               \(\hat y_{n+1}\)
n+2               \(\hat y_{n+2}\)

This concept of residual or error term is fundamental to the development of predictive models, whether regression analysis, exponential smoothing, or any other forecasting technique. We want to build predictive models that minimize the errors in the rows of data.

Error minimization

Developing the specific predictive equation from the data follows from minimizing some function of the error, \(y_t -\hat y_t\), across the data values.
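For instance, Base R's lm() function estimates the coefficients of a linear model by minimizing the sum of squared errors. A minimal sketch with made-up data, where the names t, y, fit, and y_hat are illustrative only:

t <- 1:8                                                 # time index
y <- c(49.3, 49.4, 51.4, 50.5, 49.9, 49.8, 50.6, 50.1)   # made-up data values
fit   <- lm(y ~ t)          # estimate a linear model of y regressed on Time
y_hat <- fitted(fit)        # fitted values, one for each row of data
e     <- y - y_hat          # error terms
all.equal(as.numeric(e), as.numeric(residuals(fit)))     # TRUE: the residuals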

Our goal is to develop a model that explains our data at a given Time, \(t\), by computing a fitted value, \(\hat y_t\), that is reasonably close to the actual data value that has occurred or will occur, \(y_t\). The explanation of existing data is an initial step in evaluating the effectiveness of a model to obtain predictive accuracy on new, currently unknown data. If the model cannot account for what has already occurred, then it certainly cannot account for what is not yet known. Still, when the model is applied to data that have already occurred, there is no forecast because we already know the value of \(y\). A forecast applies to events with unknown values, such as future events.

Forecasted values

A forecasted value is a fitted value, \(\hat y_t\), computed by the model that estimates an unknown value, which for time series data is a future value of the time series.

We need precise terminology to describe statistical models. We need the data from past events from which to estimate the model, but our focus is the future. To avoid confusion, it is better to reserve the term “forecast” for predicting unknown values of \(y\), usually future values, from the model. For example, what are the forecasted sales for the next four quarters, \(\hat y_{t+1}\) through \(\hat y_{t+4}\)?

Evaluation

Our model is useful only to the extent that it accurately predicts future values. However, the future has not yet arrived, so we cannot yet directly assess the extent of our model’s predictive accuracy against those unknown values. One option is to evaluate how well our model recovers the data we already have. To do this, calculate the error terms that gauge how well the model fits the existing data: The larger the errors, the worse the model’s performance.

A typical fit statistic that assesses the model’s fit to the data is the root mean squared error, RMSE. This concept is explained in more detail with a worked example in linear regression, especially Section 2.5 on Model Fit. RMSE is computed as follows.

Table 3: Conceptual definition of the root mean squared error (RMSE).
Description                                                        Formula
1. Calculate the error term for each row of the data table         \(e_t = y_t - \hat y_t\)
2. Square the error term for each row of the data table            \(e_t^2 = (y_t - \hat y_t)^2\)
3. Sum the squared errors over all the rows of the data            \(SSE = \sum_{t=1}^{n_m} (y_t - \hat{y}_t)^2\)
4. Compute the mean of the squared errors                          \(MSE = SSE / n_m\)
5. “Undo” the squaring by taking the square root of MSE,
   returning to the original measurement units                     \(RMSE = \sqrt{MSE}\)

As a technical note, when computing the mean of the sum of squared errors, we do not divide by \(n\), the total number of data values. Instead, we divide by \(n_m\): the total number of data values minus the number of parameters estimated. For example, if the data are collected monthly, there are 12 separate seasonal parameters to estimate, one for each season, so the fitted values begin no earlier than the 13th data value, and other parameters are estimated as well.
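Continuing the sketch above, the steps of Table 3 translate directly into R. Here \(n_m\) subtracts the two estimated regression parameters, b0 and b1, an illustrative choice; the count of parameters depends on the model:

e    <- y - y_hat          # step 1: error term for each row
SSE  <- sum(e^2)           # steps 2 and 3: square the errors and sum them
n_m  <- length(y) - 2      # data values minus number of estimated parameters
MSE  <- SSE / n_m          # step 4: mean of the squared errors
RMSE <- sqrt(MSE)          # step 5: return to the original measurement units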

Without a means of directly evaluating the predictive accuracy of a model on future data, at least we now have some information for model evaluation. The smaller the RMSE, the better the fit of the model to the data.

Our best guide from the data regarding predictive accuracy

We use this RMSE fit index, among others, to suggest how well the model will perform when forecasting unknown future values of \(y\).

Overfitting

A fundamental problem with evaluating the fit of a model on the data on which the model trained is that the resulting fit indices are inflated. The model is biased toward its own data and will not perform as well on new data, that is, actual prediction, the primary reason for building the model. The issue is not whether model evaluation on its own training data will be more favorable than evaluation on new, future data, but by how much the value of the fit index will drop when applied to new data.

Given that abundant data are now common, we can sometimes randomly split a large data set into two or more subsets. Estimate the model on one data subset, then fit the estimated model to the remaining data. The data values of variable \(y\) are known to the analyst but not to the model, so the model does true prediction, and we can evaluate predictive accuracy without having to wait for future values. Unfortunately, time series data are sequentially ordered, usually without replication at each time point, so randomly splitting the data does not apply.
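Although a random split does not apply, a time-ordered split can serve a similar purpose: estimate the model on the earlier observations and evaluate its forecasts against the held-out, most recent observations. A minimal sketch, assuming a data frame d organized as in Table 1, with the names n_hold, train, and test illustrative only:

n_hold <- 4                               # hold out the last 4 time periods
n      <- nrow(d)
train  <- d[1:(n - n_hold), ]             # earlier data: estimate the model here
test   <- d[(n - n_hold + 1):n, ]         # recent data: evaluate forecasts here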

Overfitting

The estimated model tends to reflect too much random error from the training data that does not generalize to the new data for which the forecasts are made.

An overfitted model, which detects excessive random noise in the data, may mislead. The most important aspect of model evaluation is its performance on new data. In this context, the key question shifts from the model’s fit to the training data to its fit to forecasted data. The forecasting errors are likely to be larger than those from the training data. The decrement in fit from training data to forecasted data depends on the extent of overfitting the model to the training data.

Only after the future values become known can we directly evaluate the accuracy of our predictive model against what was previously unknown. Once we have these new data, we can calculate a more useful version of RMSE to assess the discrepancy between what was predicted and the actual values of \(y\).

Implementation

Analysis of a Stable Process

This first example applies regression analysis to a time series from a stable process: data characterized by a stable mean and stable variability over time.

Data

To illustrate, first read some stable process data into the d data frame. The data are available on the web at:

http://web.pdx.edu/~gerbing/data/StableData.xlsx

d <- Read("http://web.pdx.edu/~gerbing/data/StableData.xlsx")

The lessR function Read() reads data from files in any one of many different formats. In this example, read the data from an Excel data file into the local R data frame (table) named d. The data are then available to lessR analysis functions in that data frame, which is the default data name for the lessR analysis functions. That means that when doing data analysis, the data=d parameter and value are optional.


The data represent monthly measurements of the variable Y3. Here are the first six rows of data.

head(d)
       Month      Y3
1 2019-07-01 49.3042
2 2019-08-01 49.3760
3 2019-09-01 51.3605
4 2019-10-01 50.5292
5 2019-11-01 49.8766
6 2019-12-01 49.7658

Before submitting a forecasting model for analysis, first view the data to understand its general structure.

Plot(Month, Y3)

Use the Plot() function because we are plotting points, which, by default for a time series, are connected with line segments.

Figure 1: Stable process data.

The visualization in Figure 1 suggests a stable system. There’s no discernible trend, the fluctuations around the center are irregular and lack apparent seasonality, and the overall variability of the system appears to remain constant over time.

Decomposition

To better understand the characteristics of the time series for variable Y3 before specifying and estimating a time series model, conduct a formal seasonal and trend decomposition analysis. The objective is to separate the trend component and the seasonal component of the data, and then identify the remaining error that cannot be explained by either trend or seasonality. Figure 2 presents the resulting visualization.

STL(Month, Y3)

STL() is the lessR version of the Base R function stl(), enhanced with color and with statistics for assessing the strength of the trend, seasonal, and error components of the time series.

When calling the function, specify the x-axis variable first followed by the y-axis variable.

Figure 2: Seasonal and trend decomposition of the time series for variable Y3.

The visual output of the decomposition in Figure 2 consists of four separate plots stacked one above the other. The first plot is the data. The seasonal and trend plots follow, and the final plot shows the extent of the unaccounted-for variation, the error, called the remainder.

Regardless of the data analyzed, the seasonal and trend decomposition always identifies seasonality and trend. However, the question remains: are the identified effects substantial enough to be meaningful? To answer this, we have both visual and statistical information.

The visual information used to assess the impact of a component is the gold bar at the extreme right of each of the four plots.

Magnification bar

Each gold bar at the right side of each plot in the trend and seasonal decomposition visualization indicates the amount of magnification required to expand that plot to fill the allotted space. The larger the bar, the smaller the effect.

For example, the seasonal effect is virtually nonexistent because its corresponding bar is as large as possible given the size of the corresponding, narrow plot. Without that large magnification, the plot of the seasonal effect would be tiny.

The statistical output provides the values that represent the size of the gold bars in terms of the proportion of variance of the variable explained by each component. The seasonality component accounts for only 3.0% of the overall variability of the data. The trend component accounts for more, 19.9%, but the dominant component is the error, 63.6%.


Total variance of Y3: 0.5498205
Proportion of variance for components:
  seasonality --- 0.030 
  trend --------- 0.199 
  remainder ----- 0.636 

Although the trend shows a small effect in these data, it is likely a sampling fluke in this relatively small data set, particularly when compared to the random error effect. Notice also that the trend is composed of random-appearing ups and downs, with little consistent movement up or down across annual time periods.
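For reference, a comparable decomposition is available from Base R's stl() function, from which such variance proportions can be approximated by comparing the variance of each extracted component to the total variance. A sketch only; lessR's exact computation may differ:

y_ts  <- ts(d$Y3, start=c(2019, 7), frequency=12)   # monthly time series
fit   <- stl(y_ts, s.window="periodic")             # seasonal-trend decomposition
comps <- fit$time.series                            # seasonal, trend, remainder
round(apply(comps, 2, var) / var(y_ts), 3)          # proportion of variance each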

We conclude that the process likely generates random variation about the center over time. The data further support this, demonstrating a constant level of variability.

Visualize the Forecast

Assuming a stable system, apply the stable process forecasting model to the data. We always seek to match the structure of the data to the type of model we submit for analysis. For these data, we do not want our model to attempt to account for either trend or seasonality.

Plot(Month, Y3, ts_ahead=8, ts_method="lm", ts_trend=FALSE, ts_season=FALSE)

Use the Plot() function because we are plotting points, which, by default for a time series, are connected with line segments. A time series is identified by having the x-axis variable be a date variable, specifically of R type Date, implicitly and automatically created when data values that represent dates are entered into Plot().

ts_ahead: This parameter specifies the number of time units to forecast ahead beyond the last data value.

ts_method: This parameter specifies the type of forecasting analysis, here "lm" for linear model, that is, linear regression analysis. (The default method is exponential smoothing analysis.)

ts_trend: Trend parameter for a regression analysis forecast. Set to FALSE to not allow for trend. The default is to allow for trend.

ts_season: Seasonality parameter for a regression analysis forecast. Set to FALSE to not allow for seasonality. The default is to allow for seasonality.

To specify a linear regression model of a stable process, that is, no trend and no seasonality, override the defaults and set both ts_trend and ts_season to FALSE.
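For intuition, with both components turned off, the regression reduces to an intercept-only model, sketched here with Base R as an illustration of the underlying idea, not the lessR implementation, so the intervals will differ somewhat from the output below:

fit <- lm(Y3 ~ 1, data=d)     # intercept only: every fitted value is the mean
coef(fit)                     # b0, the estimated level of the process
# 95% prediction intervals for eight new observations
predict(fit, newdata=data.frame(row=1:8), interval="prediction")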

The analysis output in Figure 3 consists of four separate visualizations:

  • data: [black line] from which the model is estimated
  • model fit: [light red line] the model’s fitted values to the data
  • forecast: [dark red line] the model’s forecasted future data values
  • 95% prediction interval [light red band about the forecasted values]
Figure 3: Simple regression analysis forecast appropriately applied to a stable process.

All the model’s forecasted values for the future values of Y3 are the same, 50.224. The large discrepancy between the data and the model’s fitted values indicates that the model fails to adequately explain the variability of the data. If the inherent variability truly is random, then the model is not incorrect. Instead, given the available information, there is not much structure to account for, making accurate prediction with any model impossible. The model can isolate underlying structure, but it does not add non-existent structure to the data.

The time series of the fitted values in Figure 3 shows the impact of the regression model applied to the training data. With neither trend nor seasonality in the model, every fitted value equals the estimated level of the process, 50.224, so the fitted values form a flat line through the center of the data. The irregular ups and downs of the data about that line are treated as random error, variation the model does not attempt to explain.

Figure 3 also visualizes the 95% prediction interval for each forecast value.

95% Prediction interval

Estimated range of values that contains 95% of all future values of forecasted variable \(y\) for a given future value of time.

For this stable process model, the 95% prediction interval spans the range of the data. The prediction interval grows increasingly wider for forecasted values further into the future.

To be more confident that the prediction interval will contain the future value of \(y\) when it occurs requires a wider prediction interval. At the extreme, for a data value in the range of this example, we would be approximately 99.9999999% confident that the value will fall within the range of -10,000 to 10,000.

Text Output

In addition to the visualization, the precise forecasted values are also available with their corresponding 95% prediction intervals along with other information. The text output of the analysis follows.


 
Mean squared error of fit to data: 0.55930 

Coefficients for Linear Trend 
 b0: 50.22405  b1: 0.0000   
 
     Month predicted    lower    upper    width
1 Jul 2024  50.22405 48.67671 51.77138 3.094670
2 Aug 2024  50.22405 48.67422 51.77388 3.099656
3 Sep 2024  50.22405 48.67165 51.77644 3.104795
4 Oct 2024  50.22405 48.66900 51.77909 3.110085
5 Nov 2024  50.22405 48.66628 51.78181 3.115526
6 Dec 2024  50.22405 48.66349 51.78461 3.121117
7 Jan 2025  50.22405 48.66062 51.78748 3.126858
8 Feb 2025  50.22405 48.65767 51.79042 3.132747

The poor fit of the stable process model applied to these data is indicated by:

  • visualization: the discrepancy between the plot of the data values and the values fitted by the model
  • statistic: the high value of MSE of 0.559.

Visually, the lack of fit is indicated by the discrepancy between the data, the black line in Figure 3, and the fit of the model to the data, the light red line. The flat line of fitted values cannot track the inherent ups and downs of the data, so most data values fall well above or below their fitted values, leading to the poor fit.

Find the forecasted values under the predicted column. For the stable process model, the forecasted values all equal one another.

\[\hat y_{\mathrm{Jul\,2024}} = \hat y_{\mathrm{Aug\,2024}} = \; \dots \; = \hat y_{\mathrm{Feb\,2025}} = 50.224\]

As indicated, the simple regression model accounts only for the level of the data, applicable only to data without trend or seasonality.

Analysis of Trend and Seasonality

Example Data

The data are available on the web.

http://web.pdx.edu/~gerbing/data/SalesData.xlsx

d <- Read("http://web.pdx.edu/~gerbing/data/SalesData.xlsx")

As before, the lessR function Read() reads the data from the Excel file into the local R data frame named d, the default data frame for the lessR analysis functions, so the data=d parameter remains optional.

The data represent quarterly measurements of the variable Sales. The dates are listed as individual days, with each date representing the first day of the corresponding quarter. The 16 lines of the data table for variables Qtr and Sales follow, reported quarterly from the first quarter of 2016 through the last quarter of 2019.

d
          Qtr Sales
1  2016-01-01  0.41
2  2016-04-01  0.65
3  2016-07-01  0.96
4  2016-10-01  0.57
5  2017-01-01  0.59
6  2017-04-01  1.20
7  2017-07-01  1.53
8  2017-10-01  0.97
9  2018-01-01  0.93
10 2018-04-01  1.71
11 2018-07-01  1.74
12 2018-10-01  1.42
13 2019-01-01  1.36
14 2019-04-01  2.11
15 2019-07-01  2.25
16 2019-10-01  1.74

STL(Qtr, Sales)

STL() is the lessR enhancement of the Base R function stl(), described previously, with statistics for assessing the strength of the trend, seasonal, and error components of the time series.

Figure 4: Seasonal and trend decomposition of the time series for variable Sales.

The data exhibit strong trend and seasonality.


Total variance of Sales: 0.318705
Proportion of variance for components:
  seasonality --- 0.241 
  trend --------- 0.691 
  remainder ----- 0.023 

What type of model does your data support? As always, plot your data to uncover any underlying patterns. The random noise that influences each data point obscures the true structure, but visualizing the time series as a whole enables you to discern the underlying structure beyond the noise of any single data point. A deeper understanding of this structure allows the forecasting technique to be better matched to the data.

Trend

For example, to specify a model with no seasonality but allow for trend, set ts_season to FALSE and omit the ts_trend parameter from the Plot() function call to rely upon its default of accounting for trend.

Plot(Qtr, Sales, ts_ahead=6, ts_method="lm", ts_season=FALSE)

ts_ahead: This parameter specifies the number of time units to forecast ahead beyond the last data value.

ts_method: This parameter specifies the type of forecasting analysis, here "lm" for linear model, that is, linear regression analysis. (The default method is exponential smoothing analysis.)

ts_season: Seasonality parameter for a regression analysis forecast. Set to FALSE to not allow for seasonality.

Figure 5: Inappropriate forecasts from trend and seasonal data accounting for trend but not seasonality.

Although not listed here, the fit statistic is relatively substantial, with a mean squared error, MSE, of 0.088 (see Table 4). To visually illustrate the poor fit, compare the black line representing the data to the light red line that plots the fitted values. Enabling the trend option allows the fitted line to approximate the actual trend, but this model fails to account for the inherent seasonality. Nevertheless, the forecasted values would likely be more accurate than those obtained from the stable process model because they do extend the overall trend of the data.
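A comparable trend-only fit can be sketched in Base R with an integer time index as the predictor, where the names t, fit, and new are illustrative only:

d$t <- 1:nrow(d)                          # integer time index, 1 through 16
fit <- lm(Sales ~ t, data=d)              # trend only, no seasonality
new <- data.frame(t=17:22)                # six quarters beyond the data
predict(fit, newdata=new, interval="prediction")   # forecasts extend the trend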

Seasonality

Next, consider a model that explicitly accounts for seasonality but not trend with these data.

Plot(Qtr, Sales, ts_ahead=4, ts_trend=FALSE, ts_method="lm")

ts_ahead: This parameter specifies the number of time units to forecast ahead beyond the last data value.

ts_trend: Trend parameter for a regression analysis forecast. Here, set to FALSE so as to not allow trend.

To specify a model with no trend but allow for seasonality, only set ts_trend to FALSE.


 
Mean squared error of fit to data: 2.06117 

Coefficients for Linear Trend and Seasonality 
 b0: -0.00761  b1: 0.00067   
 s1: -0.32562  s2: 0.19186  s3: 0.3280  s4: -0.19425   
 
      Qtr  predicted     lower    upper    width
1 2020 Q1 -0.3217524 -3.798684 3.155179 6.953862
2 2020 Q2  0.1964020 -3.351980 3.744784 7.096765
3 2020 Q3  0.3332107 -3.292911 3.959332 7.252244
4 2020 Q4 -0.1883579 -3.898112 3.521396 7.419508
Figure 6: Inappropriate forecasts from trend and seasonal data accounting for seasonality but not trend.

The mean squared error (MSE) for the model that does not account for trend but does account for seasonality is comparatively huge, 2.061. Examining Figure 6 reveals that the fitted values contain no trend and so always fall below the data values. Worse, the forecasted values must account for trend as the values are projected into the future. Without specifying trend, the forecast merely repeats the seasonal pattern without any trend.

The primary conclusion of this analysis is that any structural component observed in the data, trend or seasonality, should be correspondingly accounted for by the model.

Trend and Seasonality

The data exhibit both trend and seasonality, so now analyze a more appropriate model that explicitly accounts for both characteristics. Apply the model to trend and seasonal data in Figure 7.

Plot(Qtr, Sales, ts_ahead=4, ts_method="lm")

ts_ahead: This parameter specifies the number of time units to forecast ahead beyond the last data value.

ts_method: This parameter specifies the type of forecasting analysis, here "lm" for linear model, that is, linear regression analysis. (The default method is exponential smoothing analysis.)

Allow for trend and seasonality in the regression analysis forecast by not specifying the corresponding parameters ts_trend and ts_season. Both remain at their default values of TRUE, so the corresponding trend and seasonal coefficients are estimated.

Figure 7: Appropriate forecasts from the regression model of trend and seasonality applied to data with trend and seasonality.

With this more complete model, both trend and seasonality extend into the future forecasted values. Accordingly, the fourth quarter forecast is lower in value than the previous quarters. Although there is an increasing trend, the forecasted units for Quarter 4 are less than those forecasted for Quarter 3: \(\hat y_{t+3}=\) 2.627 and \(\hat y_{t+4}=\) 2.204.

This more general model accounts for the trend and the seasonality. Because the time series displays a regular pattern with relatively small random error, the forecasts show relatively small prediction intervals. The precise fitted values and their corresponding prediction intervals follow.


 
Mean squared error of fit to data: 0.009635 

Coefficients for Linear Trend and Seasonality 
 b0: 0.41626  b1: 0.09912   
 s1: -0.32562  s2: 0.19186  s3: 0.3280  s4: -0.19425   
 
      Qtr predicted    lower    upper     width
1 2020 Q1  1.775622 1.537904 2.013340 0.4754360
2 2020 Q2  2.392218 2.149615 2.634821 0.4852063
3 2020 Q3  2.627468 2.379550 2.875386 0.4958363
4 2020 Q4  2.204341 1.950705 2.457977 0.5072722

The mean squared error for the fit of the trend and seasonality model to the data is the lowest of all the analyzed models for these data: MSE=0.010. This improvement in fit is apparent in Figure 7, where the light red line for the model fit now closely matches the data. In particular, the high peaks of each season match the corresponding peaks of the fitted model.

We can compare the fit of the different models, each to the same data with trend and seasonality, as in Table 4.

Table 4: MSE across four models of the same data that exhibit both trend and seasonality.
       Stable   Trend   Seasonality   Trend/Seasonality
MSE    0.341    0.088   2.061         0.010

The more closely the model’s form aligns with the data’s structure, the better the fit to the training data and the more accurate the forecasted values, assuming the same underlying dynamics continue to generate the data. While we do not have an estimate of the actual forecasting error in this case, the same logic applies: the closer the match between the data structure and the model specification, the more accurate the forecast for new data.

These regression forecasting analyses also provide coefficients for the linear trend and seasonality when applicable. From the coefficients b0 and b1 in the output, the trend line for these data is estimated as follows:

\[\hat y_{trend} = b_0 + b_1 t = 0.416 + 0.099 \, t\]

Each seasonality coefficient indicates the impact of a specific season on the trend. A positive seasonality coefficient signifies an increase above the trend, while a negative seasonality coefficient indicates a decrease below the trend. For example, the first coefficient is \(s_1\) = -0.326, so the first season tends to lower sales 0.326 units below the trend. The second seasonality coefficient is \(s_2\) = 0.192, so the impact of the second season tends to increase the value from trend by 0.192 units. The effect of each seasonality coefficient is additive, above or below the trend for a given season.
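These coefficients correspond to a regression on the time index plus seasonal indicator variables, with sum-to-zero contrasts so that each seasonal effect is a deviation above or below the trend, as described. A Base R sketch of the idea, not the lessR implementation:

d$t   <- 1:nrow(d)                     # integer time index
d$qtr <- factor(rep(1:4, times=4))     # quarter of each of the 16 observations
# sum-to-zero contrasts: seasonal effects are deviations from the trend
fit <- lm(Sales ~ t + qtr, data=d, contrasts=list(qtr="contr.sum"))
coef(fit)   # b0, b1, and the first three seasonal effects
# the fourth seasonal effect is the negative of the sum of the other three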

Overfitting Example

Why not specify every model with both trend and seasonality? Unfortunately, a model can capture apparent structure that is not really there, and assessing a model’s fit on the data on which it trained is a kind of cheating.


For instance, consider time series data. Are the fluctuations regular, indicating seasonality, or irregular, suggesting random sampling fluctuations? Especially with smaller data sets, both people and analytic algorithms may see patterns over time where none exist. An example is an estimation algorithm that detects seasonality in data from a stable process. If this spurious seasonality is projected into the future as a forecast, the forecasting errors will be larger because there is no seasonality in the underlying structure. A reasonable fit to the training data is not necessarily advantageous in reducing forecasting errors.

Figure 8: Overfitting.

Those seasonal fluctuations in the forecasted data values are an artifact of the sampling error inherent in the data on which the model trained. As shown in Figure 1, there is no seasonality inherent in these data.
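A small simulation illustrates the penalty. Fit a trend-and-seasonal model to stable, purely random monthly data, then compare the fitting error on the training data to the forecasting error on a held-out final year. All names here are illustrative:

set.seed(42)
dd    <- data.frame(t=1:48, mo=factor(rep(1:12, 4)), y=50 + rnorm(48, sd=0.75))
train <- dd[1:36, ]                              # first three years: training
test  <- dd[37:48, ]                             # last year: held-out "future"
fit   <- lm(y ~ t + mo, data=train)              # trend + seasonal model
rmse  <- function(e) sqrt(mean(e^2))
rmse(train$y - fitted(fit))                      # fit to the training data
rmse(test$y - predict(fit, newdata=test))        # typically larger: overfitting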

Match the specified model to the data

Understand the properties of the time series data before constructing and analyzing a regression analysis forecasting model.

This is why it is important to understand the characteristics of the time series data before constructing a model and estimating its values from the data.