Week-by-Week Course Summary

Week 1: R

Separate Data from Code
Unlike a worksheet app such as Excel, R and related analysis systems separate the data from the code that manipulates the data. Excel is extraordinarily useful, but also extraordinarily overused. Countless complex, linked worksheets exist that require full-time staff just to understand and debug, and some are probably never fully debugged. Businesses that rely solely upon Excel for complex operations are behind the times, using 20th-century practices when more modern, more efficient, more powerful, and easier-to-use alternatives exist. R with lessR tops that list of better alternatives. Welcome to the world of real data science!
Read and Analyze
Read the data, usually from an Excel worksheet or a csv text file, then analyze as needed. The analysis includes visualizations as well as statistical analysis such as forecasting.
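A minimal base-R sketch of the read step follows (lessR provides Read() for the same purpose); the file and its columns, month and sales, are made up for illustration.

```r
# Base-R stand-in for reading a csv file: write a small csv to a
# temporary file, then read it back into a data frame.
# The columns (month, sales) are hypothetical.
path <- tempfile(fileext = ".csv")
write.csv(data.frame(month = 1:3, sales = c(120, 135, 150)),
          path, row.names = FALSE)
d <- read.csv(path)
str(d)   # a data frame with 3 rows and 2 columns
```

Once the data are in a data frame, all subsequent visualization and analysis operate on that object, never on the file itself.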

Week 2: Plot a Time Series

Randomness
All data values are influenced by a random component. Understanding the nature of randomness and the associated sampling error is essential to understanding data analysis.
Visualize
The first step of forecasting visualizes time-oriented data to reveal its underlying structure in terms of any existing components: stability, trend, and seasonality. From this structure, apply an appropriate analytic forecasting technique. One simple technique intuits the underlying structure from the visualization of the time series and then draws the extension into the future. This approach is obviously not precise, and provides no error bands, but it can be useful.
Date Type
Visualize time-oriented data with dates on the horizontal axis; the dates can be specified in one of two ways.

Week 3: Variability

Variability
Data analysis is the analysis of variability. Assess the variability of a numerical variable according to the squared deviations about the mean. Take the average (dividing by n-1 for a sample) and then un-square with the square root, which yields the standard deviation.
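The computation can be checked by hand in R; a minimal sketch with made-up values:

```r
x <- c(4, 8, 6, 5, 7)                   # made-up sample values
dev2 <- (x - mean(x))^2                 # squared deviations about the mean
s <- sqrt(sum(dev2) / (length(x) - 1))  # average (n-1 for a sample), then un-square
s
sd(x)                                   # the built-in function gives the same value
```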
Normally Distributed Variability
Many natural phenomena, including random error distributions, are normally distributed. The key fact here is that almost 95% of all normally distributed values lie within two standard deviations of the population mean. This key concept allows the construction of error bands around a forecasted value from the mathematically derived standard deviation of the sample estimate over hypothetical repeated samples, the standard error of the statistic.
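The 95% figure can be verified directly from the normal distribution, and the resulting error band is just the estimate plus or minus two standard errors; the forecast and standard error below are hypothetical numbers.

```r
pnorm(2) - pnorm(-2)   # proportion within 2 standard deviations: about 0.954
fc <- 250              # hypothetical forecasted value
se_fc <- 12            # hypothetical standard error of the forecast
c(lower = fc - 2 * se_fc, upper = fc + 2 * se_fc)   # ~95% error band
```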

Week 4: Correlation and Regression

Correlation
The extent of a linear relationship between two variables can be expressed numerically with the correlation coefficient and displayed geometrically with a scatterplot. Correlation varies from -1 for a perfect inverse relationship, through 0 for no relationship, to 1 for a perfect direct relationship.
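A quick base-R sketch with made-up, roughly linear data:

```r
x <- c(1, 2, 3, 4, 5)               # made-up data, roughly linear
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
cor(x, y)    # near 1: strong direct linear relationship
plot(x, y)   # the scatterplot shows the same pattern geometrically
```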
Regression
With one predictor variable, the regression model is a linear function of that single variable, defined by a y-intercept and a slope.
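A minimal sketch with base R's lm(), using data constructed to lie exactly on a line so the estimated intercept and slope are obvious:

```r
x <- c(1, 2, 3, 4, 5)
y <- c(3, 5, 7, 9, 11)   # constructed to lie exactly on y = 1 + 2x
fit <- lm(y ~ x)
coef(fit)                # y-intercept 1 and slope 2
```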
Explanatory vs Time Series Models
An explanatory model generates a forecast from the values of variables other than the variable forecasted. A time series model generates a forecast from values of the same variable at earlier time periods.

Week 6: One-Predictor Least-Squares Regression, Inference, Prediction Intervals

Model Fit
A line can be put through any scatterplot. But does the model (line) effectively summarize the relationship between the variables, that is, does the model fit the data? Evaluate fit with se, the standard deviation of the residuals, and R2, the proportion of variance explained. More importantly, apply those fit indices to new data to best evaluate forecasting fit.
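Both fit indices are available from a fitted model's summary; the data below are simulated for illustration:

```r
set.seed(1)                          # simulated data for illustration
x <- 1:20
y <- 5 + 2 * x + rnorm(20, sd = 3)
fit <- lm(y ~ x)
summary(fit)$sigma       # se: standard deviation of the residuals
summary(fit)$r.squared   # R2: proportion of variance explained
```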
Inference
Does target y change on average as predictor or feature x changes? If yes, then the population slope is nonzero. But the sample slope is essentially always nonzero, even when there is no relationship in the population. To evaluate the population slope, the value of interest, apply statistical inference in the form of a hypothesis test and a confidence interval.
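Both forms of inference come directly from the fitted model; simulated data again:

```r
set.seed(2)                          # simulated data for illustration
x <- 1:30
y <- 10 + 0.5 * x + rnorm(30, sd = 2)
fit <- lm(y ~ x)
summary(fit)$coefficients["x", ]   # slope estimate, std error, t value, p-value
confint(fit, "x", level = 0.95)    # 95% confidence interval for the population slope
```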
Prediction Intervals
The forecasted value is almost always wrong. A proper forecast is not a point, a single value, but a range of values that likely contains the actual value when it occurs. That range is the prediction interval.
Forecasting Error
The standard deviation of the residuals, se, is too optimistic for evaluating forecasting error. This se is a descriptive statistic that only describes fit to the data from which the model was estimated, the training data. Forecasts, however, apply to new data, and so must also account for sampling error; accordingly, the standard error of prediction, spred, is always larger than the se from the training data. Also, spred takes a different value for each set of values of the predictor variables.
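The widening of the interval away from the mean of the predictor can be seen directly with base R's predict(); the data are simulated:

```r
set.seed(3)                          # simulated data for illustration
x <- 1:25
y <- 4 + 1.5 * x + rnorm(25, sd = 2)
fit <- lm(y ~ x)
new <- data.frame(x = c(13, 30))     # near mean(x), and well beyond it
predict(fit, new, interval = "prediction")
# the interval at x = 30 is wider: spred grows with distance from mean(x)
```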

Week 7: Multiple Least-Squares Regression

Multiple Predictors
Explanatory regression models typically have multiple predictor variables, usually about three to nine, though any number is possible. New predictor variables that are relevant (correlate with y) and unique (do not correlate with the other x's) contribute to better fit and forecasting accuracy.
Ceteris Paribus
An extraordinarily useful property of multiple regression is that the impact of a predictor variable on the forecasted response is how much the response changes, on average, for a unit increase in the value of the predictor, with the values of all other variables held constant.
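A simulated sketch of this interpretation, where the true coefficient on x1 is 3:

```r
set.seed(4)                          # simulated data for illustration
n <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 2 + 3 * x1 - 1 * x2 + rnorm(n, sd = 0.5)
fit <- lm(y ~ x1 + x2)
coef(fit)["x1"]   # average change in y per unit increase in x1,
                  # holding x2 constant; close to the true value 3
```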
Collinearity
When predictor variables correlate too highly, or more generally when they are linearly related, they inflate their respective standard errors of the slope coefficients. Predictive accuracy is not hurt, but understanding of the relationships of the predictors to the response variable y is diminished. Nor, in general, do collinear predictors add much predictive accuracy.
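The standard diagnostic is the variance inflation factor, computed here by hand in base R from the R2 of regressing one predictor on the other; the data are simulated so that x2 is nearly a copy of x1:

```r
set.seed(5)                           # simulated data for illustration
n <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)         # nearly a copy of x1: collinear
r2 <- summary(lm(x2 ~ x1))$r.squared  # x2 explained by the other predictor
vif <- 1 / (1 - r2)                   # variance inflation factor
vif                                   # large: slope std errors are inflated
```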
Feature Selection
When constructing a multiple regression model, the predictor variables, the features, must be selected. The goal is a parsimonious model: nearly the maximum available fit with the smallest number of features. One helpful technique is best-subsets regression, in which all, or most, possible combinations of predictor variables are analyzed for fit.
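A tiny all-subsets search can be written directly in base R (packages such as leaps automate this on real problems); the data are simulated so that x3 is irrelevant by construction:

```r
# Enumerate every subset of three candidate predictors and report
# adjusted R2 for each fitted model.
set.seed(6)
n <- 60
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 2 * d$x1 + d$x2 + rnorm(n)    # x3 is irrelevant by construction
for (k in 1:3) {
  for (cmb in combn(c("x1", "x2", "x3"), k, simplify = FALSE)) {
    f <- reformulate(cmb, response = "y")
    cat(paste(cmb, collapse = " + "), ":",
        round(summary(lm(f, data = d))$adj.r.squared, 3), "\n")
  }
}
```

The subset x1 + x2 attains nearly the fit of the full model with one fewer feature, the parsimonious choice.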

Week 8: Trend and Seasonality

Trend
Often linear or approximated by linearity, trend is the general movement of the time series, either increasing or decreasing.
Seasonality
Time series data always fluctuates, but seasonal fluctuations are regular patterns of fluctuation that vary according to the relevant time period, such as four quarters over the year.
Decomposition
To decompose the time series is to isolate the trend and seasonal components from the random error fluctuations. The components can influence the data values with either an additive or a multiplicative relationship.
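Base R's decompose() performs this classical decomposition; the built-in AirPassengers series, whose seasonal swings grow with the level, calls for the multiplicative form:

```r
# Classical decomposition of the built-in AirPassengers series.
dc <- decompose(AirPassengers, type = "multiplicative")
dc$figure   # 12 monthly seasonal indices, centered around 1
plot(dc)    # observed, trend, seasonal, and random components
```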
Forecast
Many forecasting techniques exist for a time series with trend and seasonality. One basic technique is to de-seasonalize the data, run the linear regression, project into the future, then add the seasonality back. This can be done manually, or the tslm() function from the forecast package can automate the entire procedure.
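The manual steps can be sketched in base R alone (tslm() wraps similar steps): divide out the multiplicative seasonal indices, fit a linear trend, project forward, then multiply the seasonality back in.

```r
dc <- decompose(AirPassengers, type = "multiplicative")
deseas <- as.numeric(AirPassengers) / dc$figure  # indices recycle by month
t <- seq_along(deseas)
fit <- lm(deseas ~ t)                            # linear trend on deseasonalized data
ahead <- max(t) + 1:12                           # the next 12 months
trend_fc <- predict(fit, data.frame(t = ahead))
fc <- trend_fc * dc$figure                       # re-seasonalize the projection
round(fc)
```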

Week 9: Exponential Smoothing

Self Adjustment
An exponential smoothing forecast self-adjusts according to the error in the current forecast. The amount of self-adjustment follows from the smoothing parameter, alpha. A forecasting method based only on this smoothing parameter is simple exponential smoothing (ses), which yields only a flat forecast.
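Base R's HoltWinters() performs simple exponential smoothing when the trend (beta) and seasonal (gamma) terms are switched off; the data below are made up:

```r
x <- ts(c(52, 49, 55, 51, 50, 53, 54, 48, 52, 51))  # made-up series
fit <- HoltWinters(x, beta = FALSE, gamma = FALSE)
fit$alpha                   # smoothing parameter estimated from the data
predict(fit, n.ahead = 4)   # flat forecast: the same value repeated
```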
Exponential Decay of Past Times
The basic definition of the exponential smoothing model implies that the impact of past time periods is less than the impact of more recent time periods. The decrease in the weights is exponential. The larger the smoothing parameter, the faster the decay.
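The weight on the observation k periods in the past is alpha * (1 - alpha)^k, so the weights shrink geometrically:

```r
alpha <- 0.4
k <- 0:5
w <- alpha * (1 - alpha)^k
round(w, 4)   # each weight is (1 - alpha) times the previous one
```

A larger alpha makes (1 - alpha) smaller, so the weights decay faster and recent observations dominate.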
Holt's Adaptation
The ses forecasting method does not account for trend or seasonality. Holt's adaptation adds a second smoothing parameter, beta, that accounts for linear trend.
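In base R, Holt's method is HoltWinters() with gamma (seasonality) off, so alpha and beta are both estimated; the series is simulated with an upward trend:

```r
set.seed(7)                       # simulated trending series
x <- ts(5 + 2 * (1:20) + rnorm(20))
fit <- HoltWinters(x, gamma = FALSE)
predict(fit, n.ahead = 3)         # forecasts now follow the upward trend
```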
Holt-Winters Adaptation
To use exponential smoothing to forecast a time series with trend and seasonality, apply the Holt-Winters adaptation, which adds a third smoothing parameter, gamma, for the seasonal component.
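With all three components left on, base R's HoltWinters() estimates alpha, beta, and gamma together; the built-in AirPassengers series, with its trend and multiplicative seasonality, is the classic example:

```r
fit <- HoltWinters(AirPassengers, seasonal = "multiplicative")
c(alpha = fit$alpha, beta = fit$beta, gamma = fit$gamma)
predict(fit, n.ahead = 12)   # one year ahead, seasonal pattern retained
```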