Note: No R instructions on the test. R output is generated for you.

Note: Some of the following solutions are presented verbally and then with a formula. Either the verbal answer or the formula answer works as an answer. You do not need both.

library(lessR)
## 
## lessR 3.9.9  feedback: gerbing@pdx.edu  web: lessRstats.com/new
## ---------------------------------------------------------------
## > d <- Read("")   Read text, Excel, SPSS, SAS, or R data file
##   d is default data frame, data= in analysis routines optional
## 
## Access many vignettes to show by example how to use lessR.
## To learn about reading, writing, & manipulating data, graphics,
##   means & models, factor analysis, & customization:
## enter,  browseVignettes("lessR")  or
## visit,  https://CRAN.R-project.org/package=lessR

Data: http://web.pdx.edu/~gerbing/data/SFGsfg.csv

Customers at a restaurant, SFG, respond to the individual items with a 7-point Likert format, from 1 to 7. Assess the customers' perception of the outcome variable, Satisfaction (x22), measured with the following item:

How satisfied are you with the SFG?

Not Satisfied                    Very
   At All                      Satisfied
     1    2    3    4    5    6    7

Management wishes to understand the reasons that contribute to customer satisfaction. For the outcome variable of Satisfaction (x22), consider the following three potential contributors:

x13: Fun place to eat
x17: Has an attractive interior
x18: Tasty food

Each customer evaluates each of these items on the following Likert scale.

Strongly                      Strongly
Disagree                       Agree
   1    2    3    4    5    6    7

Analysis Question: To what extent do perceived Fun, Attractiveness and Tastiness account for Overall Satisfaction of the restaurant dining experience?

First read the data.

d <- Read("http://web.pdx.edu/~gerbing/data/SFGsfg.csv")
## Data Types
## ------------------------------------------------------------
## character: Non-numeric data values
## integer: Numeric data values, integers only
## double: Numeric data values with decimal digits
## ------------------------------------------------------------
## 
##     Variable                  Missing  Unique 
##         Name     Type  Values  Values  Values   First and last values
## ------------------------------------------------------------------------------------------
##  1        id   integer    253       0     253   2  3  4 ... 403  404  405
##  2      x_s1   integer    253       0       1   1  1  1 ... 1  1  1
##  3      x_s2   integer    253       0       1   1  1  1 ... 1  1  1
##  4      x_s3   integer    253       0       1   1  1  1 ... 1  1  1
##  5      x_s4 character    253       0       1   SFG  SFG  SFG ... SFG  SFG  SFG
##  6        x1   integer    253       0       4   3  2  7 ... 3  4  4
##  7        x2   integer    253       0       4   4  4  5 ... 4  4  3
##  8        x3   integer    253       0       6   5  7  7 ... 7  3  3
##  9        x4   integer    253       0       4   3  4  5 ... 3  5  6
## 10        x5   integer    253       0       4   4  4  4 ... 4  4  3
## 11        x6   integer    253       0       5   4  7  7 ... 7  3  3
## 12        x7   integer    253       0       6   5  7  7 ... 3  3  2
## 13        x8   integer    253       0       6   3  6  5 ... 4  5  6
## 14        x9   integer    253       0       6   6  1  7 ... 4  4  4
## 15       x10   integer    253       0       6   3  6  4 ... 3  5  6
## 16       x11   integer    253       0       3   3  3  7 ... 3  4  4
## 17       x12   integer    253       0       4   2  1  4 ... 2  3  3
## 18       x13   integer    253       0       5   5  4  5 ... 6  5  4
## 19       x14   integer    253       0       5   3  6  6 ... 6  2  2
## 20       x15   integer    253       0       4   7  5  5 ... 7  5  4
## 21       x16   integer    253       0       5   3  6  6 ... 6  2  2
## 22       x17   integer    253       0       5   5  4  5 ... 4  5  3
## 23       x18   integer    253       0       5   6  4  5 ... 6  5  4
## 24       x19   integer    253       0       6   3  1  3 ... 3  3  4
## 25       x20   integer    253       0       6   6  3  5 ... 6  4  3
## 26       x21   integer    253       0       6   3  3  7 ... 3  3  4
## 27       x22   integer    253       0       5   4  4  5 ... 6  4  4
## 28       x23   integer    253       0       5   4  4  4 ... 5  4  3
## 29       x24   integer    253       0       5   4  3  4 ... 5  3  3
## 30       x25   integer    253       0       5   4  1  3 ... 5  4  2
## 31       x26   integer    253       0       4   1  4  1 ... 4  1  4
## 32       x27   integer    253       0       3   2  2  2 ... 2  2  1
## 33       x28   integer    253       0       4   4  3  3 ... 3  3  3
## 34       x29   integer    253       0       4   3  1  4 ... 1  4  2
## 35       x30   integer    253       0       3   2  3  2 ... 1  3  3
## 36       x31 character    253       0       2   No  No  No ... Yes  No  No
## 37       x32 character    253       0       2   Male  Male  Male ... Male  Male  Female
## 38       x33   integer    253       0       3   2  1  3 ... 3  1  1
## 39       x34 character    253       0       5   18 - 25  50 - 59 ... 50 - 59  50 - 59
## 40       x35   integer    253       0      50   25000  60000 ... 15000  80000
## 41       x36    double    253       0      13   4  2  7 ... 3.333333333  4  4
## 42       x37    double    253       0      12   3  5.333333333 ... 5  6
## 43       x38    double    253       0      14   4.666666667  7 ... 3  2.666666667
## 44       x39    double    253       0       6   4  4  4.5 ... 4  4  3
## 45       x40    double    253       0      16   2.666666667 ... 3.666666667
## 46       x41    double    253       0      11   6.333333333  4 ... 4.666666667  3.666666667
## 47       x42    double    253       0       7   3  6  6 ... 6  2  2
## 48       x43    double    253       0       9   5  4  5 ... 5  5  3.5
## ------------------------------------------------------------------------------------------

Now the regression analysis. The outcome variable, x22 for Satisfaction, is the response variable or target, \(y\), for this analysis. The product attributes are the predictor variables or features: x13, x17, and x18.

reg(x22 ~ x13 + x17 + x18)

Scatterplot Matrix

  1. Show the scatterplot matrix of the relationships of the variables in the model with each other. Look for the two desirable types of relationships: (i) the relation of the response variable with the predictor variables and (ii) the relations of the predictor variables with each other, and then consider (iii) the implications for the final model.
Scatterplot matrix of all variables in the regression model.

  1. Ideal: Each predictor variable is relevant, that is, correlates highly with the variable that we are trying to predict, in this situation, Satisfaction coded as x22, the response variable.

This analysis of the response variable, Satisfaction or x22, appears in the first column of the scatterplots, or the first row of the corresponding correlation coefficients. There are no strong relationships, as the highest correlation is only 0.39, the correlation of x22 with x18, but that relationship is at least somewhat moderate. The second predictor variable, x17, barely correlates with x22; that is, an attractive interior is not (linearly) related to satisfaction. Note that the best-fit line for x17 with x22 is essentially flat. That is, as perceived attractiveness of the restaurant's interior (x17) increases, there is little or no corresponding change in Satisfaction (x22).

  1. Ideal: Each predictor variable brings new information to the model, that is, correlates little with the remaining predictor variables. Otherwise the predictors are to some extent redundant with each other.

The remaining scatterplots and correlations show how the predictors correlate with each other. Two of the predictors, x17 and x13, correlate more highly with each other (0.46) than any predictor does with the response variable, x22. Because x17 barely correlates with the response, Satisfaction, and is somewhat redundant with x13, x17 is likely not needed in the model to understand Satisfaction.

  1. x18 is the most relevant variable, with the highest correlation with the target x22, 0.39, so it is the predictor variable most likely to remain in the final model. Predictors x13 and x17 correlate 0.46 with each other, higher than either variable correlates with x22. Given this collinearity, and with x17 correlating with the target x22 at only 0.15, x17 will very likely not appear in the final model. The status of x13 is not clear from the scatterplot matrix; the statistical analysis may or may not confirm x13 as a predictor in the final model.
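
The correlations summarized in the scatterplot matrix can also be computed directly from the data, as a check on the values cited above. A minimal sketch, assuming the data frame d read in earlier:

# Correlation matrix of the response (x22) and the three predictors;
# the discussion above cites, for example, 0.39 for x22 with x18,
# 0.15 for x22 with x17, and 0.46 for x13 with x17
cor(d[, c("x22", "x13", "x17", "x18")])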

The Model

Model Coefficients

             Estimate    Std Err  t-value  p-value   Lower 95%   Upper 95%
(Intercept)     1.380      0.459    3.006    0.003       0.476       2.285 
        x13     0.274      0.094    2.902    0.004       0.088       0.460 
        x17     0.012      0.069    0.180    0.858      -0.124       0.149 
        x18     0.371      0.069    5.396    0.000       0.236       0.506 

Apply the Model

  1. Write the estimated regression model from this analysis.

\[\hat y_{Sat} = 1.380 + 0.274x_{Fun} + 0.012x_{Interior} + 0.371x_{Tasty}\]

Note: Many applications of regression analysis in the real world involve prediction, such as a financial application in which a bank predicts the probability that a loan will be repaid. In this situation we are mainly interested in the relationships among the variables in the model, not so much prediction per se. To understand the model, however, you should be able to predict specific levels of the response or \(y\) variable, here Satisfaction, from the values of the predictor variables.

Suppose someone responds with a 4 on all three predictor variables.

  1. Manually calculate the fitted/forecasted Satisfaction score.

\[\hat y_{Sat} = 1.380 + 0.274(4) + 0.012(4) + 0.371(4) = 4.01\]

  1. Suppose that person actually obtained a Satisfaction score of 5. What is the associated residual?

\[e = y - \hat y = 5 - 4.01 = 0.99\]
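
The same fitted value and residual can be verified in R from the estimated coefficients. A minimal sketch, using the rounded coefficients reported in the output above:

# Fitted Satisfaction for a customer who responds 4 on all three predictors
b0 <- 1.380; b_fun <- 0.274; b_int <- 0.012; b_tasty <- 0.371
y_hat <- b0 + b_fun*4 + b_int*4 + b_tasty*4
y_hat        # about 4.01
# Residual for an observed Satisfaction score of 5
5 - y_hat    # about 0.99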

The previous analysis is of the descriptive statistics, the estimated model as it describes this specific data set. As always, go beyond the analysis of the specific data set to the population as a whole. That is, do the inferential statistics: the hypothesis test and the confidence interval. We see (i.e., compute) the values of the descriptive statistics, such as the estimated regression coefficients, the b's, but what happens in general in the population? What are the values of the corresponding population model, the \(\beta\)'s?

Analysis of the first predictor variable: Fun Place to Eat

Hypothesis Test

  1. Specify the null hypothesis and its alternative for the hypothesis test of the slope coefficient of no relation.

Use the phrase population slope coefficient, or more simply, the Greek letter beta, as in \(\beta_{Fun}\).

\[\textrm{Null Hypothesis},\; H_0: \beta_{Fun}=0\] \[\textrm{Alternative to the Null},\; H_1: \beta_{Fun} \ne 0\]

  1. Specify the computation of the test statistic (i.e., t-statistic) by applying the relevant numbers from this specific analysis.

How many standard errors is the sample result, \(b_{Fun}\), from the hypothesized value, \(\beta = 0\)?

\[t_b = \dfrac{b_{Fun} - 0}{s_b} = \dfrac{0.274}{0.094} = 2.902\]

  1. Specify the definition of the \(p\)-value as applied to this specific analysis.

Assuming the null hypothesis is correct, the \(p\)-value of 0.004 is the probability of obtaining a sample slope coefficient as far or farther from 0 as the observed \(b_{Fun} = 0.274\), that is, a \(t\)-value as extreme as the obtained 2.902.
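
Both the \(t\)-value and the \(p\)-value can be checked directly from the table values. A brief sketch, using the 249 residual degrees of freedom reported in the Model Fit output further below:

# t-statistic: number of standard errors the sample slope lies from 0
t_fun <- 0.274 / 0.094           # about 2.91 (2.902 from unrounded values)
# Two-sided p-value from the t-distribution with 249 residual df
2 * pt(-abs(t_fun), df = 249)    # about 0.004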

  1. Specify the basis for the statistical decision and the resulting statistical conclusion.

Invoke a probability threshold of \(\alpha = 0.05\) to describe an unusual or “weird” result.

This is a low probability (i.e., weird) event if the null is true. That result makes the null hypothesis appear a bit ridiculous.

\[p\textrm{-value} = 0.004 < \alpha=0.05\]

Assuming the null hypothesis of no relation, an unlikely event occurred, so reject the null hypothesis of no relationship between Satisfaction and Fun with the values of Attractive Interior and Tastiness held constant.

  1. HT: Interpretation, as you would report to management.

There is a relationship between Satisfaction and Fun with the values of Attractive Interior and Tastiness held constant. As the perceived Fun increases, Satisfaction, on average, also increases, for customers with the same levels of perceived Attractive Interior and Tastiness.

Confidence Interval

  1. Specify the value that the confidence interval estimates.

The confidence interval of the slope coefficient estimates the corresponding population value, \(\beta_{Fun}\) in terms of its relationship to Satisfaction for customers with the values of Attractive Interior and Tastiness held constant.

  1. Specify the computation of the margin of error by applying the relevant numbers of this specific analysis. Use 1.97 (or 2) for the \(t\)-cutoff.

Margin of Error is 1.97 standard errors, or \(E= t_{.025}(s_b) = (1.97)(0.094) = 0.185\)

  1. Show the computations of the confidence interval illustrated with the specific numbers from this analysis.

Lower Bound of CI:   \(b_{Fun} - E = 0.274 - 0.185 = 0.088\)
Upper Bound of CI:   \(b_{Fun} + E = 0.274 + 0.185 = 0.460\)
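
The same interval follows from the exact \(t\)-cutoff in R. A brief sketch, again using the 249 residual degrees of freedom:

b_fun  <- 0.274                  # estimated slope for x13 (Fun)
se_fun <- 0.094                  # its standard error
t_cut  <- qt(0.975, df = 249)    # about 1.97
b_fun + c(-1, 1) * t_cut * se_fun   # about 0.089 to 0.459, matching the
                                    # reported 0.088 to 0.460 up to rounding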

  1. CI: Interpretation, as you would report to management.

With 95% confidence, for each 1-unit increase of the Likert response on a 7-point response format from low to high perceived Fun, there is, on average, anywhere from about 0.088 to 0.460 increase in Satisfaction, also measured on a 7-point response format.

Consistency of HT and CI

  1. Demonstrate the consistency of the confidence interval and hypothesis test using the specific numbers for this analysis for both results.

Yes, the results of both analyses are consistent: There is a relationship between perceived Fun and Customer Satisfaction at the restaurant. For the confidence interval, all values are positive, yielding the conclusion that as perceived Fun increases, so does Satisfaction, on average, anywhere from 0.088 to 0.460, with the values of Attractive Interior and Tastiness held constant. For the hypothesis test, \(p\)-value \(= 0.004 < \alpha = 0.05\), so the null hypothesis of a 0 slope coefficient is rejected; 0 is not a plausible value.

Analysis of the second predictor variable: Attractive Interior

Hypothesis Test

  1. Specify the null hypothesis and its alternative for the hypothesis test of the slope coefficient of no relation.

Use the phrase population slope coefficient, or more simply, the Greek letter beta, as in \(\beta_{Interior}\).

\[\textrm{Null Hypothesis},\; H_0: \beta_{Interior}=0\] \[\textrm{Alternative to the Null},\; H_1: \beta_{Interior} \ne 0\]

  1. Specify the computation of the test statistic (i.e., t-statistic) by applying the relevant numbers from this specific analysis.

How many standard errors is the sample result, \(b_{Interior}\), from the hypothesized value, \(\beta = 0\)?

\[t_b = \dfrac{b_{Interior} - 0}{s_b} = \dfrac{0.012}{0.069} = 0.180\]

  1. Specify the definition of the \(p\)-value as applied to this specific analysis.

Assuming the truth of the null hypothesis, the probability of obtaining a sample slope coefficient as far or farther from 0 as the observed value of 0.012, that is, a \(t\)-value as extreme as the obtained 0.180, is 0.858.

  1. Specify the basis for the statistical decision and the resulting statistical conclusion.

Invoke a probability threshold of \(\alpha = 0.05\) to describe an unusual or “weird” result.

\[p\textrm{-value} = 0.858 > \alpha=0.05\]

The data are consistent with the null hypothesis of no relation; the observed sample slope coefficient is not far from zero. Do not reject the null hypothesis of no relationship between Satisfaction and Attractive Interior with the values of Fun and Tastiness held constant.

  1. HT: Interpretation, as you would report to management.

No relationship was detected between Satisfaction and perceived Attractiveness of the Interior for customers with the same perception of Fun and Tastiness.

Confidence Interval

  1. Specify the value that the confidence interval estimates.

The confidence interval of the slope coefficient estimates the corresponding population value, \(\beta_{Interior}\), for customers with the same perception of Fun and Tastiness.

  1. Specify the computation of the margin of error by applying the relevant numbers of this specific analysis. Use the \(t\)-cutoff of \(t_{.025} = 1.97\).

Margin of Error is 1.97 standard errors, or \(E= t_{.025}(s_b) = (1.97)(0.069) = 0.1359\)

  1. Show the computations of the confidence interval illustrated with the specific numbers from this analysis.

Lower Bound of CI:   \(b_{Int} - E = 0.012 - 0.1359 = -0.124\)
Upper Bound of CI:   \(b_{Int} + E = 0.012 + 0.1359 = 0.149\)
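
As with the Fun coefficient, the test statistic, \(p\)-value, and confidence interval for Attractive Interior can be checked from the table values. A brief sketch:

b_int  <- 0.012                  # estimated slope for x17 (Interior)
se_int <- 0.069                  # its standard error
b_int / se_int                               # about 0.17 (0.180 from unrounded values)
2 * pt(-abs(b_int / se_int), df = 249)       # about 0.86, consistent with 0.858
b_int + c(-1, 1) * qt(0.975, 249) * se_int   # about -0.124 to 0.148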

  1. CI: Interpretation, as you would report to management.

With 95% confidence, for each 1-unit increase of the Likert response for perceived Attractive Interior on a 7-point response format, there is, on average, a change in Satisfaction anywhere from about \(-0.124\) to \(0.149\), also measured on a 7-point response format. Zero is within this range of plausible values, so no relation is detected between Satisfaction and the extent of an Attractive Interior, with the values of Fun and Tastiness held constant.

Consistency of HT and CI

  1. Demonstrate the consistency of the confidence interval and hypothesis test using the specific numbers for this analysis for both results.

Yes, the results of both analyses are consistent: No relationship is detected between the extent of the Attractive Interior and Customer Satisfaction at the restaurant. For the confidence interval, the range of plausible values includes zero as well as both negative and positive values, so as the perceived Attractiveness of the Interior increases there is no information regarding whether Satisfaction, on average, slightly increases, slightly decreases, or stays the same. For the hypothesis test, \(p\)-value \(= 0.858 > \alpha = 0.05\), so no relationship is detected.

Consider All the Predictor Variables and Model Fit

Model Selection

  1. Model Selection: Consider all the predictor variables simultaneously. In the context of the multiple regression model, are any of these much less useful in terms of predicting the response variable than the other predictors? Why or why not?

Note: The answer to this question is part of the model selection problem. To build models, we often start with many potential predictor variables. Which variables to retain in the model is a mixture of analytics and intuition. Many better and more sophisticated strategies exist, but one basic strategy is to initially retain predictor variables that are related to the response variable, and drop those that are not related.

Examine collinearity.

Collinearity 
 
      Tolerance       VIF 
  x13     0.720     1.389 
  x17     0.792     1.263 
  x18     0.890     1.124 

Although one of the feature correlations is relatively high (x13 with x17 at 0.46), collinearity is not an issue here because each Tolerance is well above 0.2 (and each VIF is well below 5).
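
Tolerance and VIF follow from regressing each predictor on the other predictors: Tolerance is \(1 - R^2\) from that regression, and VIF is its reciprocal. A minimal sketch for x13, assuming the data frame d:

# Regress x13 on the other two predictors to obtain its Tolerance and VIF
r2_x13 <- summary(lm(x13 ~ x17 + x18, data = d))$r.squared
c(Tolerance = 1 - r2_x13, VIF = 1 / (1 - r2_x13))   # about 0.72 and 1.39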

Do a best subset regression to help select a parsimonious model.

Best Subset Regression Models 
 
 x13 x17 x18    R2adj    X's 
   1   0   1    0.183      2 
   1   1   1    0.180      3 
   0   1   1    0.156      2 
   0   0   1    0.151      1 
   1   0   0    0.092      1 
   1   1   0    0.088      2 
   0   1   0    0.018      1

The subset regression analysis indicates that x17 is not needed in the model; in fact, adding it lowers \(R^2_{adj}\) by 0.003. The highest \(R^2_{adj}\) for Satisfaction, 0.183, is obtained with x13 and x18 as predictors. Dropping x13, leaving x18 as the single predictor, lowers \(R^2_{adj}\) to 0.151.
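
The adjusted \(R^2\) values in the subset table can be reproduced by fitting each candidate subset of predictors with base R lm(). A rough sketch, assuming the data frame d:

# Adjusted R^2 for each candidate subset of predictors
subsets <- c("x13 + x17 + x18", "x13 + x18", "x17 + x18", "x13 + x17",
             "x13", "x17", "x18")
sapply(subsets, function(rhs)
  summary(lm(as.formula(paste("x22 ~", rhs)), data = d))$adj.r.squared)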

Of the three predictor variables, only one is not significantly related to Satisfaction when holding the values of the remaining two variables constant: x17, Has an attractive interior. Controlling for the values of the other two variables, customers apparently are not as interested in the attractiveness of the interior as they are in experiencing an enjoyable atmosphere and tasty food.

Model Fit

  1. Evaluate fit with the standard deviation of residuals.

How much variation is there in the residuals about the regression surface?

Model Fit 
 
Standard deviation of residuals:  0.907 for 249 degrees of freedom 
 
R-squared:  0.190    Adjusted R-squared:  0.180    PRESS R-squared:  0.159 

Given the usual normal distribution of the residuals, the 95% range of variation of the residuals is about \(\pm 2(0.907) \approx \pm 1.8\), a range of almost four points on a seven-point scale. There is little confidence that any one particular value of the response variable for a given customer is close to the value consistent with the model, the fitted value. That is, the model does not appear to accurately predict the response variable.

  1. Evaluate fit with \(R^2\).

The value of \(R^2 = 0.190\) (adjusted \(R^2 = 0.180\)) is low, not far from zero. The model does not provide much predictive information from the three variables. The model does not do much better at accounting for the value of Satisfaction than the null model, the mean of Satisfaction by itself. Still, Fun and Taste of the food appear to exert some influence on Satisfaction.
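
These fit statistics can be reproduced with a base R fit of the same model. A brief sketch:

# Fit the same model with lm(); fit statistics should match the output above
fit <- lm(x22 ~ x13 + x17 + x18, data = d)
summary(fit)$sigma           # standard deviation of residuals, about 0.91
summary(fit)$r.squared       # R-squared, about 0.19
summary(fit)$adj.r.squared   # adjusted R-squared, about 0.18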

Conclusion

  1. What is the managerial relevance of these findings?

Consider the three predictor variables used to predict Satisfaction: Fun place to eat (x13), Attractive interior (x17), and Tasty food (x18).

The analysis did not result in a strong understanding of the relationship of fun, attractiveness of the interior, and taste to customer satisfaction. However, there does appear to be some positive relationship of both a fun atmosphere and the taste of the food with Satisfaction, each with the other predictors held constant, so those aspects of the customer experience should be a focus for providing an experience that results in a satisfied customer who provides repeat business. Still, other factors should also be investigated, given that the obtained model does not have much predictive power. It is probably not worth investing much money in upgrading the attractiveness of the interior, but it is good for management to be aware of the relationship.