mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
4 Original Model
Are we better off with the Lasso technology? Is there information gained that we could not obtain with standard OLS (ordinary least squares) regression? Here we pursue a topic not typically presented in explanations of regularization: a comparison to standard OLS regression analysis. Two topics can be compared.
- model selection
- model fit, such as \(R^2\)
Center mpg to be equivalent to the data for the Lasso regression.
mtcars$mpg <- scale(mtcars$mpg, center=TRUE, scale=FALSE)
Run the OLS regression analysis with standardized predictor variables for equivalence to the Lasso regression. Standardize with the Regression() parameter new_scale. Save the output into the r data structure so that the relevant pieces can be examined individually.
r <- reg(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb,
new_scale="z", subset=75, data=mtcars, graphics=FALSE)
Rescaled Data, First Six Rows
mpg cyl disp hp drat wt qsec vs am gear
Mazda RX4 0.909375 -0.1049878 -0.5706198 -0.5350928 0.5675137 -0.6103996 -0.7771651 0 1 0.4235542
Mazda RX4 Wag 0.909375 -0.1049878 -0.5706198 -0.5350928 0.5675137 -0.3497853 -0.4637808 0 1 0.4235542
Datsun 710 2.709375 -1.2248578 -0.9901821 -0.7830405 0.4739996 -0.9170046 0.4260068 1 1 0.4235542
Hornet 4 Drive 1.309375 -0.1049878 0.2200937 -0.5350928 -0.9661175 -0.0022995 0.8904872 1 0 -0.9318192
Hornet Sportabout -1.390625 1.0148821 1.0430812 0.4129422 -0.8351978 0.2276543 -0.4637808 0 0 -0.9318192
Valiant -1.990625 -0.1049878 -0.0461670 -0.6080186 -1.5646078 0.2480946 1.3269868 1 0 -0.9318192
carb
Mazda RX4 0.7352031
Mazda RX4 Wag 0.7352031
Datsun 710 -1.1221521
Hornet 4 Drive -1.1221521
Hornet Sportabout -0.5030337
Valiant -1.1221521
Show the estimated model.
r$out_estimates
Estimate Std Err t-value p-value Lower 95% Upper 95%
(Intercept) -1.1628634 1.4449858 -0.805 0.430 -4.1678758 1.8421490
cyl -0.1990239 1.8663298 -0.107 0.916 -4.0802693 3.6822215
disp 1.6527521 2.2132353 0.747 0.463 -2.9499227 6.2554270
hp -1.4728758 1.4925163 -0.987 0.335 -4.5767334 1.6309818
drat 0.4208515 0.8743992 0.481 0.635 -1.3975612 2.2392642
wt -3.6352666 1.8536038 -1.961 0.063 -7.4900467 0.2195135
qsec 1.4671530 1.3059782 1.123 0.274 -1.2487772 4.1830833
vs 0.3177630 2.1045086 0.151 0.881 -4.0588022 4.6943283
am 2.5202267 2.0566505 1.225 0.234 -1.7568122 6.7972656
gear 0.4835664 1.1017333 0.439 0.665 -1.8076133 2.7747462
carb -0.3221021 1.3386010 -0.241 0.812 -3.1058753 2.4616712
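These estimates can also be reproduced without lessR. A sketch with base R: center mpg, z-score the non-binary predictors (vs and am stay 0/1, as in the rescaled data shown above), and fit with lm().

```r
# Sketch: reproduce the standardized estimates with base R lm().
d <- mtcars
d$mpg <- d$mpg - mean(d$mpg)                     # center the response
for (v in c("cyl", "disp", "hp", "drat", "wt", "qsec", "gear", "carb"))
  d[[v]] <- as.numeric(scale(d[[v]]))            # z-score each predictor
fit <- lm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb,
          data=d)
round(coef(summary(fit)), 4)   # slopes match the table above, e.g. wt -3.6353
```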
No predictors are significant! What kind of model is this? We can dub it a “kitchen sink” model: it includes everything but the kitchen sink. Instead of evaluating the potential predictor variables on their individual utility, relevance, and uniqueness, just throw everything into the model and see what sticks. Not necessarily wrong, but certainly not thoughtful. All those predictors and not even one is statistically significant, with not one \(p\)-value \(< \alpha=0.05\). [See Section 12.2.1 for a discussion of relevance and uniqueness.]
With no significant predictors (features), perhaps the model has poor fit. Next, check model fit.
r$out_fit
Standard deviation of mpg: 6.0269481
Standard deviation of residuals: 2.6501970 for 21 degrees of freedom
95% range of residual variation: 11.0227729 = 2 * (2.080 * 2.6501970)
R-squared: 0.869 Adjusted R-squared: 0.807 PRESS R-squared: 0.654
Null hypothesis of all 0 population slope coefficients:
F-statistic: 13.932 df: 10 and 21 p-value: 0.000
\(R^2\) = 0.869, a high value indeed, though only a descriptive statistic on the training data. There is some overfitting because \(R^2_{PRESS}\) drops to 0.654, but the fit remains excellent when applied to testing data. [See Section 11.6.3 for a discussion of \(R^2_{PRESS}\).]
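A sketch of how a PRESS \(R^2\) can be computed, assuming the standard definition: leave-one-out (deleted) residuals obtained exactly from the hat values of the fitted OLS model.

```r
# Leave-one-out residuals for lm() follow from the hat values:
# e_(i) = e_i / (1 - h_ii), an exact identity for OLS.
fit <- lm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb,
          data=mtcars)
loo_resid <- residuals(fit) / (1 - hatvalues(fit))
PRESS <- sum(loo_resid^2)                          # prediction sum of squares
SST   <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
1 - PRESS / SST    # about 0.654, the PRESS R-squared reported above
```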
A model with excellent fit but no significant slope coefficients? Check for collinearity, which could explain the lack of significance: inflated standard errors for the estimated slope coefficients, yet excellent model fit overall. [See Section 12.6.1 for a discussion of collinearity.]
r$out_collinear
Tolerance VIF
cyl 0.065 15.374
disp 0.046 21.620
hp 0.102 9.832
drat 0.296 3.375
wt 0.066 15.165
qsec 0.133 7.528
vs 0.201 4.966
am 0.215 4.648
gear 0.187 5.357
carb 0.126 7.909
There is a huge collinearity problem: the highest Tolerance, 0.296, is barely above the somewhat arbitrary threshold of 0.2. As evidence of this Tolerance problem, \(R^2=0.869\) is very high, yet no slope coefficients are significant. So do a Best Subsets analysis to see which predictors survive. [See Section 12.6.2 for a discussion of Best Subsets.]
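These diagnostics can be verified by hand: Tolerance for a predictor is \(1 - R^2\) from regressing that predictor on all the other predictors, and VIF is its reciprocal. A sketch with base R:

```r
# Tolerance = 1 - R^2 from regressing each predictor on the others;
# VIF = 1 / Tolerance
preds <- c("cyl","disp","hp","drat","wt","qsec","vs","am","gear","carb")
tol <- sapply(preds, function(v) {
  fit_v <- lm(reformulate(setdiff(preds, v), response=v), data=mtcars)
  1 - summary(fit_v)$r.squared
})
round(cbind(Tolerance=tol, VIF=1/tol), 3)   # matches r$out_collinear above
```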
r$out_subsets
cyl disp hp drat wt qsec vs am gear carb R2adj X's
0 1 1 0 1 1 0 1 0 0 0.838 5
0 0 1 0 1 1 0 1 0 0 0.837 4
0 0 0 0 1 1 0 1 0 1 0.836 4
0 1 1 1 1 1 0 1 0 0 0.835 6
0 1 1 0 1 1 0 1 1 0 0.834 6
0 0 0 1 1 1 0 1 0 1 0.834 5
0 0 1 0 1 1 0 1 0 1 0.834 5
1 1 1 0 1 1 0 1 0 0 0.834 6
0 0 0 0 1 1 0 1 0 0 0.834 3
0 0 0 0 1 1 0 1 1 1 0.832 5
0 1 1 0 1 1 1 1 0 0 0.832 6
0 0 1 1 1 1 0 1 0 0 0.832 5
1 0 0 0 1 1 0 1 0 1 0.832 5
0 0 1 1 1 1 0 1 0 1 0.831 6
0 1 1 0 1 1 0 1 0 1 0.831 6
1 0 1 0 1 1 0 1 0 0 0.831 5
0 0 1 0 1 1 1 1 0 0 0.831 5
0 1 0 0 1 1 0 1 0 0 0.831 4
0 0 1 0 1 1 0 1 1 0 0.831 5
0 0 1 0 1 1 0 1 1 1 0.831 6
0 1 1 1 1 1 0 1 1 0 0.830 7
0 0 0 0 1 1 1 1 0 1 0.829 5
0 0 0 1 1 1 0 1 1 1 0.829 6
1 1 1 1 1 1 0 1 0 0 0.829 7
1 0 0 0 1 1 0 1 0 0 0.829 4
0 0 0 1 1 1 0 1 0 0 0.829 4
0 1 1 1 1 1 1 1 0 0 0.829 7
1 1 1 0 1 1 0 1 1 0 0.829 7
1 0 1 0 1 1 0 1 0 1 0.828 6
1 0 0 1 1 1 0 1 0 1 0.828 6
0 1 1 0 1 1 1 1 1 0 0.828 7
1 0 0 0 1 0 0 1 0 1 0.828 4
0 1 1 1 1 1 0 1 0 1 0.828 7
0 1 1 0 1 1 0 1 1 1 0.828 7
0 0 1 0 1 0 1 1 0 0 0.828 4
0 0 0 0 1 1 0 1 1 0 0.828 4
0 0 0 0 1 1 1 1 0 0 0.827 4
1 1 1 0 1 1 1 1 0 0 0.827 7
1 1 1 0 1 1 0 1 0 1 0.827 7
1 0 1 0 1 0 0 1 0 0 0.827 4
1 0 1 0 1 0 0 0 0 0 0.826 3
0 0 1 1 1 1 0 1 1 1 0.826 7
1 0 0 0 1 0 0 0 0 1 0.826 3
0 1 1 1 1 1 0 1 1 1 0.823 8
0 1 1 1 1 1 1 1 1 0 0.823 8
0 0 1 0 1 0 0 1 0 0 0.823 3
1 1 1 1 1 1 0 1 1 0 0.823 8
1 0 0 0 1 1 0 0 0 0 0.822 3
1 1 1 1 1 1 1 1 0 0 0.822 8
1 1 1 1 1 1 0 1 0 1 0.822 8
1 1 1 0 1 1 1 1 1 0 0.821 8
1 1 1 0 1 1 0 1 1 1 0.821 8
0 1 1 1 1 1 1 1 0 1 0.821 8
0 1 1 0 1 1 1 1 1 1 0.821 8
1 1 1 0 1 1 1 1 0 1 0.820 8
0 0 0 1 1 1 0 0 0 0 0.820 3
0 0 1 1 1 0 0 0 0 0 0.819 3
1 0 0 0 1 0 0 0 0 0 0.819 2
0 0 1 0 1 0 0 0 1 0 0.818 3
0 0 1 0 1 1 0 0 0 0 0.817 3
0 1 1 1 1 1 1 1 1 1 0.815 9
1 1 1 1 1 1 0 1 1 1 0.815 9
0 0 0 0 1 1 0 0 1 0 0.815 3
1 1 1 1 1 1 1 1 1 0 0.815 9
0 0 1 0 1 0 0 0 0 0 0.815 2
0 0 0 0 1 1 0 0 0 0 0.814 2
1 1 1 1 1 1 1 1 0 1 0.814 9
1 1 1 0 1 1 1 1 1 1 0.813 9
1 0 1 1 1 1 1 1 1 1 0.811 9
1 1 0 1 1 1 1 1 1 1 0.807 9
1 1 1 1 1 1 1 1 1 1 0.807 10
1 1 1 1 1 0 1 1 1 1 0.804 9
1 1 1 1 1 1 1 0 1 1 0.802 9
0 0 0 0 1 0 1 0 0 0 0.787 2
1 1 1 1 0 1 1 1 1 1 0.782 9
>>> Only first 75 of 91 rows printed
To indicate more, add subset=n, where n is the number of lines
[based on Thomas Lumley's leaps function from the leaps package]
The model with the highest \(R^2_{adj}=0.838\) has 5 predictors. However, pursuing parsimony, we find that a model with only two predictors, cyl and wt, drops \(R^2_{adj}\) by only 0.019, to 0.819.
Both Lasso regression and Best Subsets do the variable selection just fine.
There should also be a check for potential outliers. That issue does not go away just because one is running Lasso regression. And while Lasso regression addresses the collinearity problem, it is still good to view the actual collinearity diagnostics. [See Section 11.6.1 for a discussion of outliers.]
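The subsets listing above can also be generated directly with the leaps package that produces it. A sketch, assuming leaps is installed; by default regsubsets() reports the single best model of each size.

```r
library(leaps)   # package behind the out_subsets listing
bs <- regsubsets(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am +
                 gear + carb, data=mtcars, nvmax=10)
s <- summary(bs)
# one row per model size: 0/1 predictor membership plus adjusted R-squared
cbind(s$which[, -1] * 1, R2adj=round(s$adjr2, 3))
```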
4.1 Reduced Model
The Lasso regression retained three predictor variables: cyl, hp, and wt. However, the hp coefficient was barely larger than 0, so hp perhaps will not survive additional analysis.
r3 <- reg(mpg ~ cyl + hp + wt, data=mtcars, new_scale="z", graphics=FALSE)
Rescaled Data, First Six Rows
mpg cyl hp wt
Mazda RX4 0.909375 -0.1049878 -0.5350928 -0.6103996
Mazda RX4 Wag 0.909375 -0.1049878 -0.5350928 -0.3497853
Datsun 710 2.709375 -1.2248578 -0.7830405 -0.9170046
Hornet 4 Drive 1.309375 -0.1049878 -0.5350928 -0.0022995
Hornet Sportabout -1.390625 1.0148821 0.4129422 0.2276543
Valiant -1.990625 -0.1049878 -0.6080186 0.2480946
r3$out_estimates
Estimate Std Err t-value p-value Lower 95% Upper 95%
(Intercept) -0.0000000 0.4439832 -0.000 1.000 -0.9094585 0.9094584
cyl -1.6816539 0.9838935 -1.709 0.098 -3.6970683 0.3337606
hp -1.2367441 0.8142698 -1.519 0.140 -2.9047001 0.4312119
wt -3.0987483 0.7246220 -4.276 0.000 -4.5830692 -1.6144275
r3$out_fit
Standard deviation of mpg: 6.0269481
Standard deviation of residuals: 2.5115485 for 28 degrees of freedom
95% range of residual variation: 10.2893476 = 2 * (2.048 * 2.5115485)
R-squared: 0.843 Adjusted R-squared: 0.826 PRESS R-squared: 0.796
Null hypothesis of all 0 population slope coefficients:
F-statistic: 50.171 df: 3 and 28 p-value: 0.000
Horsepower does drop out of the reduced model, with a \(p\)-value of 0.140.
This conclusion is also verified by the Best Subsets analysis, repeated here for this smaller model.
r3$out_subsets
cyl hp wt R2adj X's
1 1 1 0.826 3
1 0 1 0.819 2
0 1 1 0.815 2
0 0 1 0.745 1
1 1 0 0.723 2
1 0 0 0.717 1
0 1 0 0.589 1
[based on Thomas Lumley's leaps function from the leaps package]
Dropping hp does not much reduce \(R^2_{adj}\).
4.2 Final Model
r2 <- reg(mpg ~ cyl + wt, data=mtcars, new_scale="z", graphics=FALSE)
Rescaled Data, First Six Rows
mpg cyl wt
Mazda RX4 0.909375 -0.1049878 -0.6103996
Mazda RX4 Wag 0.909375 -0.1049878 -0.3497853
Datsun 710 2.709375 -1.2248578 -0.9170046
Hornet 4 Drive 1.309375 -0.1049878 -0.0022995
Hornet Sportabout -1.390625 1.0148821 0.2276543
Valiant -1.990625 -0.1049878 0.2480946
r2$out_estimates
Estimate Std Err t-value p-value Lower 95% Upper 95%
(Intercept) -0.0000001 0.4538769 -0.000 1.000 -0.9282826 0.9282825
cyl -2.6928038 0.7406008 -3.636 0.001 -4.2075025 -1.1781051
wt -3.1222303 0.7406008 -4.216 0.000 -4.6369290 -1.6075317
r2$out_fit
Standard deviation of mpg: 6.0269481
Standard deviation of residuals: 2.5675157 for 29 degrees of freedom
95% range of residual variation: 10.5023184 = 2 * (2.045 * 2.5675157)
R-squared: 0.830 Adjusted R-squared: 0.819 PRESS R-squared: 0.790
Null hypothesis of all 0 population slope coefficients:
F-statistic: 70.908 df: 2 and 29 p-value: 0.000
Dropping hp from the model reduces \(R^2_{PRESS}\) by only 0.006, further indicating that little if anything is gained by including the variable in the model.
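The choice between the two candidate models can also be framed as a formal nested-model comparison. A sketch with base R; because the fits are on the raw scale, the \(p\)-value is unaffected by the standardization used above.

```r
# Nested-model F-test: does adding hp improve the cyl + wt model?
fit2 <- lm(mpg ~ cyl + wt, data=mtcars)
fit3 <- lm(mpg ~ cyl + wt + hp, data=mtcars)
anova(fit2, fit3)   # Pr(>F) about 0.14, consistent with hp's t-test above
```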