We spent three weeks on regression analysis, arguably the most important statistical analysis that exists. While regression analysis can be applied to time series forecasting, it is usually thought of as a form of supervised machine learning for building models that forecast the value of one variable from the values of other variables, what are called explanatory models. If you can complete my regression analysis template, you have a reasonable level of expertise to build these models.
However, I teach a full course in regression analysis, BTA 516, that lasts the entire term, and I still have more material to cover. Be aware that the biggest remaining limitation in your ability to do a regression analysis is handling categorical variables. If you have one or more categorical predictor variables, R automatically makes an adjustment, converting each categorical predictor into a set of indicator (0/1) variables, but you need to know how to interpret that adjustment. Also, the adjustment R makes automatically is just one of many possibilities and not always the one best suited for your analysis.
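As a minimal sketch of what that automatic adjustment looks like, the base R function model.matrix() displays the columns a regression actually receives for a categorical predictor. The variable region and its levels are hypothetical. By default R uses treatment coding: the first level (alphabetically) becomes the baseline and each remaining level gets its own 0/1 indicator, so each estimated slope compares that level to the baseline. Sum-to-zero coding, shown second, is one of the alternatives.

```r
# Hypothetical data: one categorical predictor with three levels
d <- data.frame(region = factor(c("East", "North", "West", "East")))

# Default (treatment) coding: East is the baseline level, and North and
# West each get a 0/1 indicator column
X_treat <- model.matrix(~ region, data = d)
print(X_treat)

# One alternative: sum-to-zero (effect) coding, where each coefficient
# compares a level to the unweighted average of the level means instead
# of to a baseline level; the last level (West) is coded -1 in every column
X_sum <- model.matrix(~ region, data = d,
                      contrasts.arg = list(region = "contr.sum"))
print(X_sum)
```

The same categorical variable can therefore lead to different coefficients, and different interpretations, depending only on the coding chosen.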
For the case of a categorical response or target variable, standard least-squares regression analysis is not even appropriate. Instead, logistic regression is needed, accessible with my lessR function Logit(). Its syntax is exactly like my Regression() function, but the response variable before the tilde has only two values, such as whether a shipment will be late or not. So you already know how to use that function, though the interpretation of its results differs from standard regression. The primary assessment of fit is how well the model classifies outcomes into the proper category.
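The classification idea can be sketched with base R glm() and family = binomial, the standard way to fit a logistic regression in base R. This sketch does not use lessR Logit(); it is only assumed, not confirmed, that Logit() reports a comparable classification summary. The shipment data are simulated, the predictor distance is hypothetical, and the 0.5 probability cutoff is an assumed default.

```r
set.seed(1)
n <- 200
# Hypothetical predictor: distance shipped, in hundreds of miles
distance <- runif(n, 0, 10)
# Simulated outcome: 1 = late, 0 = on time; chance of late rises with distance
p_late <- plogis(-3 + 0.6 * distance)
late <- rbinom(n, size = 1, prob = p_late)
d <- data.frame(late, distance)

# Logistic regression: the response before the tilde has only two values
fit <- glm(late ~ distance, family = binomial, data = d)

# Classify a shipment as late when its fitted probability exceeds 0.5
pred_late <- as.integer(fitted(fit) > 0.5)

# Confusion table and proportion of shipments classified correctly
confusion <- table(actual = d$late, predicted = pred_late)
print(confusion)
accuracy <- mean(pred_late == d$late)
print(accuracy)
```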
Both the situation of categorical predictor variables and a categorical target variable are covered in my online materials linked to in my regression content, but are not assigned reading for this class. We simply do not have enough time. Instead of covering more material on regression analysis, we need to return to the other major forecasting paradigm, time series analysis, for which we developed some intuition during Weeks 2 and 3. You may wish to review that material before proceeding.
In Week 3 we generated a forecast for a stable process, one without trend or seasonality. Weeks 4 and 6 built up to Week 7 where we forecasted from a linear model with multiple predictor variables. The resulting linear model provided a prediction accompanied by a prediction interval, and it indicated the importance of each predictor through its estimated slope coefficient and the corresponding confidence interval.
This week extends forecasting from regression-based explanatory models to time series models of a single variable with a linear trend over time and seasonality. To do a time series forecast, this week introduces the forecast package, which easily produces sophisticated forecasts with accompanying visualizations.
lessR has a great regression function, one of the best available in terms of its comprehensiveness, but when applied to a time series, it accounts only for trend. It does not account for seasonality. That is where the forecast package enters the scene with its tslm() function. When the model includes the season term, this function in effect removes the seasonality when estimating the trend, by adding an indicator variable for each season to the regression, and then adds the seasonality back to the estimated regression line (surface). To accomplish this task with lessR Regression() would involve manually deseasonalizing the data, then manually adding back the seasonality after the model is estimated. The tslm() function does this extra seasonality work for us.
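To see the mechanics that tslm() automates, here is a minimal sketch using base R lm() on simulated quarterly data; every number is made up for illustration. The forecast package documentation describes tslm() as largely a wrapper around lm() that creates the trend and season variables from the time series, so this model should correspond to tslm(y ~ trend + season) on a quarterly ts object. Treat that correspondence as an assumption to check against your own output.

```r
set.seed(42)
# Simulated quarterly series: 6 years of a linear trend plus additive seasonality
n <- 24
trend <- 1:n                                    # time index 1, 2, ..., 24
season <- factor(rep(1:4, length.out = n))      # quarter of each observation
seasonal_effect <- rep(c(5, -2, -6, 3), length.out = n)
y <- 50 + 1.5 * trend + seasonal_effect + rnorm(n, sd = 1)

# Regression on the time index plus an indicator for quarters 2 to 4
# (quarter 1 is the baseline). The indicators absorb the seasonality, so
# the trend slope is estimated net of it, and the fitted values add the
# seasonal effects back in.
# Assumed forecast-package equivalent, with y_ts a ts object of frequency 4:
#   tslm(y_ts ~ trend + season)
fit <- lm(y ~ trend + season)
print(coef(fit))
```

The trend coefficient should land close to the simulated slope of 1.5, and the season coefficients estimate each quarter's difference from quarter 1.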
The forecast should consist of a specific prediction surrounded by its 95% prediction interval, along with an accompanying visualization. For a time series, the forecast typically extends several or more time periods into the future. The forecast for each time period should include its own prediction interval, which usually grows wider the further the forecast extends beyond the current time period.
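Using the same simulated quarterly setup (regenerated here so the block runs on its own), base R predict() with interval = "prediction" gives a separate 95% prediction interval for each of eight future quarters. With the forecast package, calling forecast() on a tslm() fit, with arguments such as h = 8 and level = 95, plays this role and adds the visualization; this base R version only shows what those intervals are.

```r
set.seed(42)
n <- 24
trend <- 1:n
season <- factor(rep(1:4, length.out = n))
y <- 50 + 1.5 * trend + rep(c(5, -2, -6, 3), length.out = n) + rnorm(n, sd = 1)
fit <- lm(y ~ trend + season)

# Eight future quarters: continue the time index and the quarterly cycle
h <- 8
future <- data.frame(trend = (n + 1):(n + h),
                     season = factor(rep(1:4, length.out = h),
                                     levels = levels(season)))

# Point forecast (fit) with lower and upper 95% prediction limits per quarter
fc <- predict(fit, newdata = future, interval = "prediction", level = 0.95)
fc <- cbind(future, fc)
fc$width <- fc$upr - fc$lwr
print(fc)
```

For this particular model the intervals widen in steps, one step per year: each future point is further in time from the center of the observed data for its quarter, which adds uncertainty about where the trend line will be.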
Several R functions are applied this week.
These functions are from four different packages: stats from Base R, lessR, forecast, and ggplot2. However, only the lessR and forecast packages need to be manually installed. The R installation consists primarily of the Base R functions, organized into six different packages. The visualization package ggplot2 automatically downloads and loads with the forecast package.
As you know, do not memorize any of this computer stuff, nor is any of it on any test. Using these forecasting functions is not a memory game. Always have access to the directions, the specific function calls. Relying upon memory and guessing is a losing proposition that will waste your time. Better to copy my relevant example for a homework problem, paste it into R, and then modify it as needed. Always refer to the directions as you do the homework. You have an immense amount of computing power and analysis ability on your computer, and the following reading and videos will show you how to harness that power.
Make sure to access my online summary at the end of Section 2.
Distinction between Additive and Multiplicative Seasonality [quick read with an illustration]
Forecast from Additive Seasonality and Trend Section 2
Video [6:39] 2.1: Time series data with trend and seasonality, R time series object, Plot
Video [4:42] 2.2: Decompose Time Series into trend and seasonality, R class() function
Video [7:29] 2.3: Do the Forecast, visualize the forecast with error bands
Video [7:48] 2.4: Isolate and plot the trend and seasonality, de-seasonalize the data
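For the Section 2 topics of creating a time series object, decomposing it into trend and seasonality, and de-seasonalizing the data, base R offers ts() and decompose(). Whether the videos use these exact functions is not confirmed here, so treat this as a parallel sketch on simulated quarterly data. It also illustrates the additive versus multiplicative distinction: additive seasonal effects are subtracted out, while multiplicative seasonal factors are divided out.

```r
set.seed(7)
t_idx <- 1:24
trend_part <- 100 + 2 * t_idx

# Simulated quarterly ts objects, 6 years starting in the first quarter of 2015
# Additive: the seasonal swing is a fixed number of units around the trend
y_add <- ts(trend_part + rep(c(8, -3, -10, 5), 6) + rnorm(24, sd = 1),
            start = c(2015, 1), frequency = 4)
# Multiplicative: the seasonal swing is a percentage of the trend level
y_mult <- ts(trend_part * rep(c(1.08, 0.97, 0.90, 1.05), 6) * exp(rnorm(24, sd = 0.01)),
             start = c(2015, 1), frequency = 4)

class(y_add)   # "ts"

# Split each series into trend, seasonal, and random components
dec_add <- decompose(y_add, type = "additive")
dec_mult <- decompose(y_mult, type = "multiplicative")

# De-seasonalize: subtract additive effects, divide out multiplicative factors
y_add_sa <- y_add - dec_add$seasonal
y_mult_sa <- y_mult / dec_mult$seasonal

# Plot the observed series with its trend, seasonal, and random components
plot(dec_add)
```

decompose() centers the additive seasonal effects to sum to zero and scales the multiplicative factors to average one, which is why subtracting or dividing them out leaves the level of the series unchanged.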