Multiple Regression

This Week’s Overview

Multiple regression, the analysis of a model with multiple predictor variables, is one of the premier data analysis techniques, and a portal into supervised machine learning for forecasting. As with regression with a single predictor variable, we have two goals for the analysis: accurate prediction and understanding of the relationships among the variables in the model. Expanding the model to multiple predictor variables provides the possibility of better prediction and better understanding.

Regarding understanding, the slope coefficient from a multiple regression extends the interpretation we previously learned for the slope coefficient from a regression with only one predictor variable. The extension is statistical control: the direct contributions of all other predictor variables in the model are held constant. Always add that qualifying phrase to the interpretation of slope coefficients from a multiple regression model. For that reason, slope coefficients from a multiple regression model are called partial slope coefficients.
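As a small illustration of statistical control, here is a sketch with base R's lm() function and the built-in mtcars data (rather than the lessR functions used later in this page). The slope for wt changes once hp enters the model, because the multiple-regression estimate holds hp constant:

```r
# Simple regression: the slope of wt ignores all other variables
fit1 <- lm(mpg ~ wt, data=mtcars)
coef(fit1)["wt"]

# Multiple regression: the slope of wt now holds hp constant,
# so the estimate changes -- a partial slope coefficient
fit2 <- lm(mpg ~ wt + hp, data=mtcars)
coef(fit2)["wt"]
```

Comparing the two coefficients for wt shows why the qualifying phrase, with the other predictors held constant, belongs in the interpretation.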

Learning Objectives for This Week

With multiple regression, we usually start with a larger set of predictor variables, then whittle the set down to a more functional subset that successfully accounts for and forecasts the response variable. In other words, our final model usually has fewer variables than our initial model.

Each selected variable in a multiple regression model should satisfy two criteria. A predictor variable selected to be added to an existing model should provide:

1. a unique contribution, not redundant with the predictor variables already in the model
2. a relevant contribution, related to the response variable

We prefer a set of predictor variables such that each is uncorrelated with the others (unique), and yet all of the predictor variables correlate strongly with the response variable (relevant). Such an ideal rarely happens at the extreme, but that goal guides our selection of the predictor variables in our final model. In practice, it is good, but not necessary, to have the correlations among the predictor variables below 0.3, and the correlation of each predictor variable with the response above 0.5.
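One way to check these guidelines before building the model is the base R cor() function. This sketch assumes a hypothetical data frame d that contains the response Y and candidate predictors X1 and X2:

```r
# Correlation matrix of the response and candidate predictors
# (d, Y, X1, and X2 are hypothetical names for illustration)
cor(d[, c("Y", "X1", "X2")])

# Guidelines from above:
#   correlations among predictors (X1 with X2) ideally below 0.3
#   correlation of each predictor with the response Y ideally above 0.5
```

The off-diagonal entries in the row for Y address relevance; the entries among the predictors themselves address uniqueness.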

Add each predictor variable to the R model specification with a + sign followed by the variable name. Do multiple regression with

         reg_brief(Y ~ X1 + X2)    or    reg(Y ~ X1 + X2)

which regresses response variable Y on predictor variables X1 and X2. Extend the pattern to include as many predictors as needed.

And now, for a forecasting class, time to make some forecasts! As always, a prediction (forecast) includes both the predicted value and the prediction interval about that value. Last week we generated specific predictions and associated prediction intervals by adding the parameter X1.new to the regression call for a single predictor variable. This week's homework presents two predictor variables. To generate a prediction interval for two predictor variables, specify two parameters: X1.new and X2.new. Assign numerical values to each parameter to generate the corresponding predictions and prediction intervals. As with last week, when specifying multiple values for a single parameter, enclose the values within the c() function. For example, for the second predictor's values of 48, 59, and 63, specify X2.new=c(48,59,63).
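Putting the pieces together, a call of the following form both fits the two-predictor model and produces the predictions with their prediction intervals. The X1.new values here are hypothetical, chosen only to illustrate the syntax; the X2.new values are the ones from the example above:

```r
# Multiple regression of Y on X1 and X2, with predictions and
# prediction intervals at the specified new predictor values
# (X1.new values are hypothetical; data assumed in data frame d)
reg(Y ~ X1 + X2, X1.new=c(2.1, 3.5, 4.0), X2.new=c(48, 59, 63))
```

Use reg_brief() in place of reg() for the abbreviated output, as with a single predictor.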

Content

Multiple Regression Online Textbook Section 5

The following material generalizes the regression model to multiple regression. Once the basics for the single predictor model are understood, the generalization to multiple predictors is straightforward. Remember that with multiple regression the interpretation of each slope coefficient has the qualifier, with the values of the other variables held constant.

Videos of Section 5 Material

Sec 5.1 Multiple Regression Model [12:13] → The model contains two or more predictor variables

Sec 5.2 Feature Selection [9:35] → Choose the predictor variables

Sec 5.3 Collinearity [9:52] → Predictor variables should not correlate too highly

Sec 5.4 Best Subset Regressions [7:26] → Select the most parsimonious set of predictor variables

The following material adds no new concepts beyond the online textbook, but does provide another example, including the hypothesis test and confidence interval for the partial slope coefficients. These videos, recorded in the studio in 2016, are based on slides I developed that later morphed into the online textbook.

9.1 Multiple Regression [pdf]
9.1a Multiple Regression Model [10:02]
9.1c-I Input Into The Regression Analysis [9:02]
9.1c-II Interpret Model Estimates [10:07] (corresponds to Secs 3.2 and 3.3, online textbook)
9.1c-III Assess Model Fit [8:15] (corresponds to Secs 2.5, and 3.3, online textbook)