mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
4 Original Model
Are we better off with the Lasso technology? Is there information gained that we could not obtain with standard OLS (ordinary least squares) regression? Here we pursue a topic not typically presented in explanations of regularization: a comparison to standard OLS regression analysis. Two topics can be compared.
- model selection
- model fit, such as \(R^2\)
Center mpg to be equivalent to the data for the Lasso regression.
mtcars$mpg <- scale(mtcars$mpg, center=TRUE, scale=FALSE)
Run the OLS regression analysis with standardized predictor variables for equivalence to the Lasso regression. Standardize with the Regression() parameter new_scale. Save the output into the r data structure so that the relevant pieces can be examined individually.
r <- reg(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb,
new_scale="z", subset=75, data=mtcars, graphics=FALSE)
Rescaled Data, First Six Rows
mpg cyl disp hp drat wt qsec vs am gear
Mazda RX4 0.909375 -0.1049878 -0.5706198 -0.5350928 0.5675137 -0.6103996 -0.7771651 0 1 0.4235542
Mazda RX4 Wag 0.909375 -0.1049878 -0.5706198 -0.5350928 0.5675137 -0.3497853 -0.4637808 0 1 0.4235542
Datsun 710 2.709375 -1.2248578 -0.9901821 -0.7830405 0.4739996 -0.9170046 0.4260068 1 1 0.4235542
Hornet 4 Drive 1.309375 -0.1049878 0.2200937 -0.5350928 -0.9661175 -0.0022995 0.8904872 1 0 -0.9318192
Hornet Sportabout -1.390625 1.0148821 1.0430812 0.4129422 -0.8351978 0.2276543 -0.4637808 0 0 -0.9318192
Valiant -1.990625 -0.1049878 -0.0461670 -0.6080186 -1.5646078 0.2480946 1.3269868 1 0 -0.9318192
carb
Mazda RX4 0.7352031
Mazda RX4 Wag 0.7352031
Datsun 710 -1.1221521
Hornet 4 Drive -1.1221521
Hornet Sportabout -0.5030337
Valiant -1.1221521
Show the estimated model.
r$out_estimates
Estimate Std Err t-value p-value Lower 95% Upper 95%
(Intercept) -1.1628634 1.4449858 -0.805 0.430 -4.1678758 1.8421490
cyl -0.1990239 1.8663298 -0.107 0.916 -4.0802693 3.6822215
disp 1.6527521 2.2132353 0.747 0.463 -2.9499227 6.2554270
hp -1.4728758 1.4925163 -0.987 0.335 -4.5767334 1.6309818
drat 0.4208515 0.8743992 0.481 0.635 -1.3975612 2.2392642
wt -3.6352666 1.8536038 -1.961 0.063 -7.4900467 0.2195135
qsec 1.4671530 1.3059782 1.123 0.274 -1.2487772 4.1830833
vs 0.3177630 2.1045086 0.151 0.881 -4.0588022 4.6943283
am 2.5202267 2.0566505 1.225 0.234 -1.7568122 6.7972656
gear 0.4835664 1.1017333 0.439 0.665 -1.8076133 2.7747462
carb -0.3221021 1.3386010 -0.241 0.812 -3.1058753 2.4616712
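These estimates can also be reproduced without lessR. A sketch with base R: center mpg, z-score the non-binary predictors (vs and am stay 0/1, as in the rescaled data shown above), and fit with lm().

```r
# Sketch: reproduce the standardized estimates with base R lm().
d <- mtcars
d$mpg <- d$mpg - mean(d$mpg)                     # center the response
for (v in c("cyl", "disp", "hp", "drat", "wt", "qsec", "gear", "carb"))
  d[[v]] <- as.numeric(scale(d[[v]]))            # z-score each predictor
fit <- lm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb,
          data=d)
round(coef(summary(fit)), 4)   # slopes match the table above, e.g. wt -3.6353
```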
No predictors are significant! What kind of model is this? We can dub it a “kitchen sink” model: it includes everything but the kitchen sink. Instead of evaluating the potential predictor variables on their individual utility, relevance, and uniqueness, just throw everything into the model and see what sticks. Not necessarily wrong, but certainly not thoughtful. All those predictors and not even one is statistically significant, with not one \(p\)-value \(< \alpha=0.05\). [See Section 12.2.1 for a discussion of relevance and uniqueness.]
With no significant predictors (features), perhaps the model has poor fit. Next, check model fit.
r$out_fit
Standard deviation of mpg: 6.0269481
Standard deviation of residuals: 2.6501970 for 21 degrees of freedom
95% range of residual variation: 11.0227729 = 2 * (2.080 * 2.6501970)
R-squared: 0.869 Adjusted R-squared: 0.807 PRESS R-squared: 0.654
Null hypothesis of all 0 population slope coefficients:
F-statistic: 13.932 df: 10 and 21 p-value: 0.000
\(R^2\) = 0.869, a high value indeed, though only a descriptive statistic on the training data. There is some overfitting because \(R^2_{PRESS}\) drops to 0.654, but the fit remains excellent when applied to testing data. [See Section 11.6.3 for a discussion of \(R^2_{PRESS}\).]
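A sketch of how a PRESS \(R^2\) can be computed, assuming the standard definition: leave-one-out (deleted) residuals obtained exactly from the hat values of the fitted OLS model.

```r
# Leave-one-out residuals for lm() follow from the hat values:
# e_(i) = e_i / (1 - h_ii), an exact identity for OLS.
fit <- lm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb,
          data=mtcars)
loo_resid <- residuals(fit) / (1 - hatvalues(fit))
PRESS <- sum(loo_resid^2)                          # prediction sum of squares
SST   <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
1 - PRESS / SST    # about 0.654, the PRESS R-squared reported above
```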
A model with excellent fit but no significant slope coefficients? Check for collinearity, which could explain the lack of significance: inflated standard errors for the estimated slope coefficients, yet excellent model fit overall. [See Section 12.6.1 for a discussion of collinearity.]
r$out_collinear
Tolerance VIF
cyl 0.065 15.374
disp 0.046 21.620
hp 0.102 9.832
drat 0.296 3.375
wt 0.066 15.165
qsec 0.133 7.528
vs 0.201 4.966
am 0.215 4.648
gear 0.187 5.357
carb 0.126 7.909
There is a huge collinearity problem: the highest Tolerance, 0.296, is barely above the somewhat arbitrary threshold of 0.2. As evidence of this Tolerance problem, \(R^2=0.869\) is very high, yet no slope coefficients are significant. So do a Best Subsets analysis to see which predictors survive. [See Section 12.6.2 for a discussion of Best Subsets.]
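These diagnostics can be verified by hand: Tolerance for a predictor is \(1 - R^2\) from regressing that predictor on all the other predictors, and VIF is its reciprocal. A sketch with base R:

```r
# Tolerance = 1 - R^2 from regressing each predictor on the others;
# VIF = 1 / Tolerance
preds <- c("cyl","disp","hp","drat","wt","qsec","vs","am","gear","carb")
tol <- sapply(preds, function(v) {
  fit_v <- lm(reformulate(setdiff(preds, v), response=v), data=mtcars)
  1 - summary(fit_v)$r.squared
})
round(cbind(Tolerance=tol, VIF=1/tol), 3)   # matches r$out_collinear above
```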
r$out_subsets
cyl disp hp drat wt qsec vs am gear carb R2adj X's
0 1 1 0 1 1 0 1 0 0 0.838 5
0 0 1 0 1 1 0 1 0 0 0.837 4
0 0 0 0 1 1 0 1 0 1 0.836 4
0 1 1 1 1 1 0 1 0 0 0.835 6
0 1 1 0 1 1 0 1 1 0 0.834 6
0 0 0 1 1 1 0 1 0 1 0.834 5
0 0 1 0 1 1 0 1 0 1 0.834 5
1 1 1 0 1 1 0 1 0 0 0.834 6
0 0 0 0 1 1 0 1 0 0 0.834 3
0 0 0 0 1 1 0 1 1 1 0.832 5
0 1 1 0 1 1 1 1 0 0 0.832 6
0 0 1 1 1 1 0 1 0 0 0.832 5
1 0 0 0 1 1 0 1 0 1 0.832 5
0 0 1 1 1 1 0 1 0 1 0.831 6
0 1 1 0 1 1 0 1 0 1 0.831 6
1 0 1 0 1 1 0 1 0 0 0.831 5
0 0 1 0 1 1 1 1 0 0 0.831 5
0 1 0 0 1 1 0 1 0 0 0.831 4
0 0 1 0 1 1 0 1 1 0 0.831 5
0 0 1 0 1 1 0 1 1 1 0.831 6
0 1 1 1 1 1 0 1 1 0 0.830 7
0 0 0 0 1 1 1 1 0 1 0.829 5
0 0 0 1 1 1 0 1 1 1 0.829 6
1 1 1 1 1 1 0 1 0 0 0.829 7
1 0 0 0 1 1 0 1 0 0 0.829 4
0 0 0 1 1 1 0 1 0 0 0.829 4
0 1 1 1 1 1 1 1 0 0 0.829 7
1 1 1 0 1 1 0 1 1 0 0.829 7
1 0 1 0 1 1 0 1 0 1 0.828 6
1 0 0 1 1 1 0 1 0 1 0.828 6
0 1 1 0 1 1 1 1 1 0 0.828 7
1 0 0 0 1 0 0 1 0 1 0.828 4
0 1 1 1 1 1 0 1 0 1 0.828 7
0 1 1 0 1 1 0 1 1 1 0.828 7
0 0 1 0 1 0 1 1 0 0 0.828 4
0 0 0 0 1 1 0 1 1 0 0.828 4
0 0 0 0 1 1 1 1 0 0 0.827 4
1 1 1 0 1 1 1 1 0 0 0.827 7
1 1 1 0 1 1 0 1 0 1 0.827 7
1 0 1 0 1 0 0 1 0 0 0.827 4
1 0 1 0 1 0 0 0 0 0 0.826 3
0 0 1 1 1 1 0 1 1 1 0.826 7
1 0 0 0 1 0 0 0 0 1 0.826 3
0 1 1 1 1 1 0 1 1 1 0.823 8
0 1 1 1 1 1 1 1 1 0 0.823 8
0 0 1 0 1 0 0 1 0 0 0.823 3
1 1 1 1 1 1 0 1 1 0 0.823 8
1 0 0 0 1 1 0 0 0 0 0.822 3
1 1 1 1 1 1 1 1 0 0 0.822 8
1 1 1 1 1 1 0 1 0 1 0.822 8
1 1 1 0 1 1 1 1 1 0 0.821 8
1 1 1 0 1 1 0 1 1 1 0.821 8
0 1 1 1 1 1 1 1 0 1 0.821 8
0 1 1 0 1 1 1 1 1 1 0.821 8
1 1 1 0 1 1 1 1 0 1 0.820 8
0 0 0 1 1 1 0 0 0 0 0.820 3
0 0 1 1 1 0 0 0 0 0 0.819 3
1 0 0 0 1 0 0 0 0 0 0.819 2
0 0 1 0 1 0 0 0 1 0 0.818 3
0 0 1 0 1 1 0 0 0 0 0.817 3
0 1 1 1 1 1 1 1 1 1 0.815 9
1 1 1 1 1 1 0 1 1 1 0.815 9
0 0 0 0 1 1 0 0 1 0 0.815 3
1 1 1 1 1 1 1 1 1 0 0.815 9
0 0 1 0 1 0 0 0 0 0 0.815 2
0 0 0 0 1 1 0 0 0 0 0.814 2
1 1 1 1 1 1 1 1 0 1 0.814 9
1 1 1 0 1 1 1 1 1 1 0.813 9
1 0 1 1 1 1 1 1 1 1 0.811 9
1 1 0 1 1 1 1 1 1 1 0.807 9
1 1 1 1 1 1 1 1 1 1 0.807 10
1 1 1 1 1 0 1 1 1 1 0.804 9
1 1 1 1 1 1 1 0 1 1 0.802 9
0 0 0 0 1 0 1 0 0 0 0.787 2
1 1 1 1 0 1 1 1 1 1 0.782 9
>>> Only first 75 of 91 rows printed
To indicate more, add subset=n, where n is the number of lines
[based on Thomas Lumley's leaps function from the leaps package]
The model with the highest \(R^2_{adj}=0.838\) has 5 predictors. However, pursuing parsimony, we find that a model with only two predictors, cyl and wt, drops \(R^2_{adj}\) by only 0.019, to 0.819.
Both Lasso regression and Best Subsets do the variable selection just fine.
There should also be a check for potential outliers. That issue does not go away just because one is running Lasso regression. And while Lasso regression addresses the collinearity problem, it is still good to view the actual collinearity diagnostics. [See Section 11.6.1 for a discussion of outliers.]
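The subsets listing above can also be generated directly with the leaps package that produces it. A sketch, assuming leaps is installed; by default regsubsets() reports the single best model of each size.

```r
library(leaps)   # package behind the out_subsets listing
bs <- regsubsets(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am +
                 gear + carb, data=mtcars, nvmax=10)
s <- summary(bs)
# one row per model size: 0/1 predictor membership plus adjusted R-squared
cbind(s$which[, -1] * 1, R2adj=round(s$adjr2, 3))
```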
4.1 Reduced Model
The Lasso regression retained three predictor variables: cyl, hp, and wt. However, the hp coefficient was barely larger than 0, so hp perhaps will not survive additional analysis.
r3 <- reg(mpg ~ cyl + hp + wt, data=mtcars, new_scale="z", graphics=FALSE)
Rescaled Data, First Six Rows
mpg cyl hp wt
Mazda RX4 0.909375 -0.1049878 -0.5350928 -0.6103996
Mazda RX4 Wag 0.909375 -0.1049878 -0.5350928 -0.3497853
Datsun 710 2.709375 -1.2248578 -0.7830405 -0.9170046
Hornet 4 Drive 1.309375 -0.1049878 -0.5350928 -0.0022995
Hornet Sportabout -1.390625 1.0148821 0.4129422 0.2276543
Valiant -1.990625 -0.1049878 -0.6080186 0.2480946
r3$out_estimates
Estimate Std Err t-value p-value Lower 95% Upper 95%
(Intercept) -0.0000000 0.4439832 -0.000 1.000 -0.9094585 0.9094584
cyl -1.6816539 0.9838935 -1.709 0.098 -3.6970683 0.3337606
hp -1.2367441 0.8142698 -1.519 0.140 -2.9047001 0.4312119
wt -3.0987483 0.7246220 -4.276 0.000 -4.5830692 -1.6144275
r3$out_fit
Standard deviation of mpg: 6.0269481
Standard deviation of residuals: 2.5115485 for 28 degrees of freedom
95% range of residual variation: 10.2893476 = 2 * (2.048 * 2.5115485)
R-squared: 0.843 Adjusted R-squared: 0.826 PRESS R-squared: 0.796
Null hypothesis of all 0 population slope coefficients:
F-statistic: 50.171 df: 3 and 28 p-value: 0.000
Horsepower does drop out of the reduced model, with a \(p\)-value of 0.140.
This conclusion is also verified by the Best Subsets analysis, repeated here for this smaller model.
r3$out_subsets
cyl hp wt R2adj X's
1 1 1 0.826 3
1 0 1 0.819 2
0 1 1 0.815 2
0 0 1 0.745 1
1 1 0 0.723 2
1 0 0 0.717 1
0 1 0 0.589 1
[based on Thomas Lumley's leaps function from the leaps package]
Dropping hp does not much reduce \(R^2_{adj}\).
4.2 Final Model
r2 <- reg(mpg ~ cyl + wt, data=mtcars, new_scale="z", graphics=FALSE)
Rescaled Data, First Six Rows
mpg cyl wt
Mazda RX4 0.909375 -0.1049878 -0.6103996
Mazda RX4 Wag 0.909375 -0.1049878 -0.3497853
Datsun 710 2.709375 -1.2248578 -0.9170046
Hornet 4 Drive 1.309375 -0.1049878 -0.0022995
Hornet Sportabout -1.390625 1.0148821 0.2276543
Valiant -1.990625 -0.1049878 0.2480946
r2$out_estimates
Estimate Std Err t-value p-value Lower 95% Upper 95%
(Intercept) -0.0000001 0.4538769 -0.000 1.000 -0.9282826 0.9282825
cyl -2.6928038 0.7406008 -3.636 0.001 -4.2075025 -1.1781051
wt -3.1222303 0.7406008 -4.216 0.000 -4.6369290 -1.6075317
r2$out_fit
Standard deviation of mpg: 6.0269481
Standard deviation of residuals: 2.5675157 for 29 degrees of freedom
95% range of residual variation: 10.5023184 = 2 * (2.045 * 2.5675157)
R-squared: 0.830 Adjusted R-squared: 0.819 PRESS R-squared: 0.790
Null hypothesis of all 0 population slope coefficients:
F-statistic: 70.908 df: 2 and 29 p-value: 0.000
Dropping hp from the model reduces \(R^2_{PRESS}\) by only 0.006, further indicating that little if anything is gained by including the variable in the model.
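The choice between the two candidate models can also be framed as a formal nested-model comparison. A sketch with base R; because the fits are on the raw scale, the \(p\)-value is unaffected by the standardization used above.

```r
# Nested-model F-test: does adding hp improve the cyl + wt model?
fit2 <- lm(mpg ~ cyl + wt, data=mtcars)
fit3 <- lm(mpg ~ cyl + wt + hp, data=mtcars)
anova(fit2, fit3)   # Pr(>F) about 0.14, consistent with hp's t-test above
```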