Note: No R instructions on the test. R output is generated for you.
Note: Some of the following solutions are presented verbally and then with a formula. Either the verbal answer or the formula answer works as an answer. You do not need both.
library(lessR)
##
## lessR 3.9.9 feedback: gerbing@pdx.edu web: lessRstats.com/new
## ---------------------------------------------------------------
## > d <- Read("") Read text, Excel, SPSS, SAS, or R data file
## d is default data frame, data= in analysis routines optional
##
## Access many vignettes to show by example how to use lessR.
## To learn about reading, writing, & manipulating data, graphics,
## means & models, factor analysis, & customization:
## enter, browseVignettes("lessR") or
## visit, https://CRAN.R-project.org/package=lessR
Data: http://web.pdx.edu/~gerbing/data/SFGsfg.csv
Customers at a restaurant, SFG, respond to the individual items with a 7-pt Likert format, from 1 to 7. Assess the customer’s perception of the outcome variable Satisfaction (x22) with the following item:
How satisfied are you with the SFG?
1 = Not Satisfied At All, ..., 7 = Very Satisfied
Management wishes to understand the reasons that contribute to customer satisfaction. For the outcome variable of Satisfaction (x22), consider the following three potential contributors: perceived Fun (x13), Attractiveness of the Interior (x17), and Tastiness of the food (x18).
Each customer evaluates each of these items on the following Likert scale.
1 = Strongly Disagree, ..., 7 = Strongly Agree
Analysis Question: To what extent do perceived Fun, Attractiveness and Tastiness account for Overall Satisfaction of the restaurant dining experience?
First read the data.
d <- Read("http://web.pdx.edu/~gerbing/data/SFGsfg.csv")
## Data Types
## ------------------------------------------------------------
## character: Non-numeric data values
## integer: Numeric data values, integers only
## double: Numeric data values with decimal digits
## ------------------------------------------------------------
##
## Variable Missing Unique
## Name Type Values Values Values First and last values
## ------------------------------------------------------------------------------------------
## 1 id integer 253 0 253 2 3 4 ... 403 404 405
## 2 x_s1 integer 253 0 1 1 1 1 ... 1 1 1
## 3 x_s2 integer 253 0 1 1 1 1 ... 1 1 1
## 4 x_s3 integer 253 0 1 1 1 1 ... 1 1 1
## 5 x_s4 character 253 0 1 SFG SFG SFG ... SFG SFG SFG
## 6 x1 integer 253 0 4 3 2 7 ... 3 4 4
## 7 x2 integer 253 0 4 4 4 5 ... 4 4 3
## 8 x3 integer 253 0 6 5 7 7 ... 7 3 3
## 9 x4 integer 253 0 4 3 4 5 ... 3 5 6
## 10 x5 integer 253 0 4 4 4 4 ... 4 4 3
## 11 x6 integer 253 0 5 4 7 7 ... 7 3 3
## 12 x7 integer 253 0 6 5 7 7 ... 3 3 2
## 13 x8 integer 253 0 6 3 6 5 ... 4 5 6
## 14 x9 integer 253 0 6 6 1 7 ... 4 4 4
## 15 x10 integer 253 0 6 3 6 4 ... 3 5 6
## 16 x11 integer 253 0 3 3 3 7 ... 3 4 4
## 17 x12 integer 253 0 4 2 1 4 ... 2 3 3
## 18 x13 integer 253 0 5 5 4 5 ... 6 5 4
## 19 x14 integer 253 0 5 3 6 6 ... 6 2 2
## 20 x15 integer 253 0 4 7 5 5 ... 7 5 4
## 21 x16 integer 253 0 5 3 6 6 ... 6 2 2
## 22 x17 integer 253 0 5 5 4 5 ... 4 5 3
## 23 x18 integer 253 0 5 6 4 5 ... 6 5 4
## 24 x19 integer 253 0 6 3 1 3 ... 3 3 4
## 25 x20 integer 253 0 6 6 3 5 ... 6 4 3
## 26 x21 integer 253 0 6 3 3 7 ... 3 3 4
## 27 x22 integer 253 0 5 4 4 5 ... 6 4 4
## 28 x23 integer 253 0 5 4 4 4 ... 5 4 3
## 29 x24 integer 253 0 5 4 3 4 ... 5 3 3
## 30 x25 integer 253 0 5 4 1 3 ... 5 4 2
## 31 x26 integer 253 0 4 1 4 1 ... 4 1 4
## 32 x27 integer 253 0 3 2 2 2 ... 2 2 1
## 33 x28 integer 253 0 4 4 3 3 ... 3 3 3
## 34 x29 integer 253 0 4 3 1 4 ... 1 4 2
## 35 x30 integer 253 0 3 2 3 2 ... 1 3 3
## 36 x31 character 253 0 2 No No No ... Yes No No
## 37 x32 character 253 0 2 Male Male Male ... Male Male Female
## 38 x33 integer 253 0 3 2 1 3 ... 3 1 1
## 39 x34 character 253 0 5 18 - 25 50 - 59 ... 50 - 59 50 - 59
## 40 x35 integer 253 0 50 25000 60000 ... 15000 80000
## 41 x36 double 253 0 13 4 2 7 ... 3.333333333 4 4
## 42 x37 double 253 0 12 3 5.333333333 ... 5 6
## 43 x38 double 253 0 14 4.666666667 7 ... 3 2.666666667
## 44 x39 double 253 0 6 4 4 4.5 ... 4 4 3
## 45 x40 double 253 0 16 2.666666667 ... 3.666666667
## 46 x41 double 253 0 11 6.333333333 4 ... 4.666666667 3.666666667
## 47 x42 double 253 0 7 3 6 6 ... 6 2 2
## 48 x43 double 253 0 9 5 4 5 ... 5 5 3.5
## ------------------------------------------------------------------------------------------
Now the regression analysis. The outcome variable, x22 for Satisfaction, is the response variable or target, \(y\), for this analysis. The product attributes are the predictor variables or features: x13, x17, and x18.
reg(x22 ~ x13 + x17 + x18)
Scatterplot matrix of all variables in the regression model.
This analysis of the response variable, Satisfaction or x22, is read from the first column of scatterplots, or the first row of the corresponding correlation coefficients. None of the relationships is particularly strong: the highest correlation is only 0.39, between x22 and x18, though that is at least somewhat moderate. The second predictor variable, x17, barely correlates with x22; that is, an attractive interior is not (linearly) related to satisfaction. Note that the best-fit line for x17 with x22 is essentially flat: as the perceived attractiveness of the restaurant’s interior (x17) increases, Satisfaction (x22) shows little or no change.
The remaining scatterplots and correlations show how the predictors correlate with each other. Two of the predictors, x17 and x13, correlate more highly with each other than any predictor does with the response variable, x22. Because x17 barely correlates with the response, Satisfaction, and is somewhat redundant with x13, x17 likely is not needed in the model to understand Satisfaction.
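To reproduce just the correlations among the model variables, a base R call such as the following works with the variable names from the data listing above:
cor(d[, c("x22", "x13", "x17", "x18")])   # correlation matrix of response and predictors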
Model Coefficients
Estimate Std Err t-value p-value Lower 95% Upper 95%
(Intercept) 1.380 0.459 3.006 0.003 0.476 2.285
x13 0.274 0.094 2.902 0.004 0.088 0.460
x17 0.012 0.069 0.180 0.858 -0.124 0.149
x18 0.371 0.069 5.396 0.000 0.236 0.506
\[\hat y_{Sat} = 1.380 + 0.274x_{Fun} + 0.012x_{Interior} + 0.371x_{Tasty}\]
Note: Many applications of regression analysis in the real world involve prediction, such as a financial application in which a bank predicts the probability that a loan will be repaid. In this situation we are mainly interested in the relationships among the variables in the model, not so much prediction per se. To understand the model, however, you should be able to predict specific levels of the response or \(y\) variable, here Satisfaction, from the values of the predictor variables.
Suppose someone responds with a 4 on all three predictor variables.
\[\hat y_{Sat} = 1.380 + 0.274(4) + 0.012(4) + 0.371(4) = 4.01\]
\[e = y - \hat y = 5 - 4.01 = 0.99\]
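A minimal R sketch of this arithmetic, with the estimated coefficients from the Model Coefficients table hard-coded (values rounded as above):
b0 <- 1.380; b_fun <- 0.274; b_int <- 0.012; b_tasty <- 0.371
y_hat <- b0 + b_fun*4 + b_int*4 + b_tasty*4   # all three predictors at 4
y_hat                                         # about 4.01
e <- 5 - y_hat                                # residual for an observed Satisfaction of 5
e                                             # about 0.99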
The previous analysis is of the descriptive statistics, the estimated model as it describes this specific data set. As always, go beyond the analysis of the specific data set to the population as a whole. That is, do inferential statistics: the hypothesis test and the confidence interval. We see (i.e., compute) the values of the descriptive statistics, such as the estimated regression coefficients, the \(b\)’s, but what happens in general, in the population? What are the values of the coefficients of the corresponding population model, the \(\beta\)’s?
Use the phrase population slope coefficient, or more simply, the Greek letter beta, as in \(\beta_{Fun}\).
\[\textrm{Null Hypothesis},\; H_0: \beta_{Fun}=0\] \[\textrm{Alternative to the Null},\; H_1: \beta_{Fun} \ne 0\]
How many standard errors is the sample result, \(b_{Fun}\), from the hypothesized value, \(\beta = 0\)?
\[t_b = \dfrac{b_{Fun} - 0}{s_b} = \dfrac{0.274}{0.094} = 2.902\]
Assuming the null hypothesis is correct, the \(p\)-value of 0.004 is the probability of obtaining a sample slope coefficient as far or farther from 0 as the observed value of 0.274, that is, a \(t\)-value at least as extreme as 2.902.
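As a quick check, the reported \(p\)-value can be recovered in base R from the \(t\)-value and the 249 residual degrees of freedom (reported later in the Model Fit output); the result is approximately 0.004:
2 * pt(-abs(2.902), df = 249)   # two-tailed p-value for b_Fun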
Invoke a probability threshold of \(\alpha = 0.05\) to describe an unusual or “weird” result.
This is a low probability (i.e., weird) event if the null is true. That result makes the null hypothesis appear a bit ridiculous.
\[p\textrm{-value} = 0.004 < \alpha=0.05\]
Assuming the null hypothesis of no relation, an unlikely event occurred, so reject the null hypothesis of no relationship between Satisfaction and Fun with the values of Attractive Interior and Tastiness held constant.
There is a relationship between Satisfaction and Fun with the values of Attractive Interior and Tastiness held constant. As the perceived Fun increases, Satisfaction, on average, also increases, for customers with the same levels of perceived Attractive Interior and Tastiness.
The confidence interval of the slope coefficient estimates the corresponding population value, \(\beta_{Fun}\), in terms of its relationship to Satisfaction for customers with the values of Attractive Interior and Tastiness held constant.
Margin of Error is 1.97 standard errors, or \(E= t_{.025}(s_b) = (1.97)(0.094) = 0.185\)
Lower Bound of CI: \(b_{Fun} - E = 0.274 - 0.185 = 0.088\)
Upper Bound of CI: \(b_{Fun} + E = 0.274 + 0.185 = 0.460\)
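As a check, a minimal base R sketch recomputes this interval from the rounded table values; the residual degrees of freedom, 249, come from the Model Fit output below, and small differences from the hand arithmetic are due to rounding of \(b\) and \(s_b\):
t_cut <- qt(0.975, df = 249)   # two-tailed t cutoff, about 1.97
E <- t_cut * 0.094             # margin of error for b_Fun
c(lower = 0.274 - E, upper = 0.274 + E)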
With 95% confidence, for each 1-unit increase of the Likert response on a 7-point response format from low to high perceived Fun, there is, on average, anywhere from about 0.088 to 0.460 increase in Satisfaction, also measured on a 7-point response format.
Yes, the results of both analyses are consistent: There is a relationship between perceived Fun and Customer Satisfaction at the restaurant. For the confidence interval, all values are positive, yielding the conclusion that as perceived Fun increases, so does Satisfaction, on average, anywhere from 0.088 to 0.460, with the values of Attractive Interior and Tastiness held constant. For the hypothesis test, \(p\)-value \(= 0.004 < \alpha = 0.05\), so the null hypothesis of a 0 slope coefficient is rejected, not a plausible value.
Use the phrase population slope coefficient, or more simply, the Greek letter beta, as in \(\beta_{Interior}\).
\[\textrm{Null Hypothesis},\; H_0: \beta_{Interior}=0\] \[\textrm{Alternative to the Null},\; H_1: \beta_{Interior} \ne 0\]
How many standard errors is the sample result, \(b_{Interior}\), from the hypothesized value, \(\beta = 0\)?
\[t_b = \dfrac{b_{Interior} - 0}{s_b} = \dfrac{0.012}{0.069} = 0.180\]
Assuming the truth of the null hypothesis, the \(p\)-value of 0.858 is the probability of obtaining a sample slope coefficient as far or farther from 0 as the observed value of 0.012, that is, a \(t\)-value at least as extreme as 0.180.
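The same base R check for this coefficient returns approximately 0.86, matching the reported 0.858 within rounding of the \(t\)-value:
2 * pt(-abs(0.180), df = 249)   # two-tailed p-value for b_Interior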
Invoke a probability threshold of \(\alpha = 0.05\) to describe an unusual or “weird” result.
\[p\textrm{-value} = 0.858 > \alpha=0.05\]
The data are consistent with the null hypothesis of no relation; the observed sample slope coefficient is not far from zero. Do not reject the null hypothesis of no relationship between Satisfaction and Attractive Interior with the values of Fun and Tastiness held constant.
No relationship was detected between Satisfaction and perceived Attractiveness of the Interior for customers with the same perception of Fun and Tastiness.
The confidence interval of the slope coefficient estimates the corresponding population value, \(\beta_{Interior}\), for customers with the same perception of Fun and Tastiness.
Margin of Error is 1.97 standard errors, or \(E= t_{.025}(s_b) = (1.97)(0.069) = 0.1359\)
Lower Bound of CI: \(b_{Int} - E = 0.012 - 0.1359 = -0.124\)
Upper Bound of CI: \(b_{Int} + E = 0.012 + 0.1359 = 0.149\)
With 95% confidence, for each 1-unit increase of the Likert response for perceived Attractive Interior on a 7-point response format, there is, on average, anywhere from about a \(0.124\) decrease to a \(0.149\) increase in Satisfaction, also measured on a 7-point response format. Zero is within this range of plausible values, so no relation is detected between Satisfaction and the extent of an Attractive Interior, with the values of Fun and Tastiness held constant.
Yes, the results of both analyses are consistent: No relationship between the attractiveness of the Interior and Customer Satisfaction at the restaurant is detected. The confidence interval contains both negative and positive values, so zero is plausible: as the perceived Attractiveness of the Interior increases, there is no information regarding whether Satisfaction, on average, slightly increases, slightly decreases, or stays the same. For the hypothesis test, \(p\)-value \(= 0.858 > \alpha = 0.05\), so no relationship is detected.
Note: The answer to this question is part of the model selection problem. To build models, we often start with many potential predictor variables. Which variables to retain in the model is a mixture of analytics and intuition. Many better and more sophisticated strategies exist, but one basic strategy is to initially retain predictor variables that are related to the response variable, and drop those that are not related.
Examine collinearity.
Collinearity
Tolerance VIF
x13 0.720 1.389
x17 0.792 1.263
x18 0.890 1.124
Although two of the predictors correlate more highly with each other than with the response, collinearity is not an issue here: each Tolerance is well above 0.2 (and each VIF well below 5).
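As an arithmetic check, VIF is the reciprocal of Tolerance; for example, for x13:
\[\textrm{VIF}_{x13} = \dfrac{1}{\textrm{Tolerance}_{x13}} = \dfrac{1}{0.720} \approx 1.389\]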
Do a best subset regression to help select a parsimonious model.
Best Subset Regression Models
x13 x17 x18 R2adj X's
1 0 1 0.183 2
1 1 1 0.180 3
0 1 1 0.156 2
0 0 1 0.151 1
1 0 0 0.092 1
1 1 0 0.088 2
0 1 0 0.018 1
The subset regression analysis indicates that x17 is not needed in the model. In fact, adding it reduces \(R^2_{adj}\) by 0.003. The highest \(R^2_{adj}\) for Satisfaction, 0.183, is obtained with x13 and x18 as the predictors. Dropping x13 as well, leaving x18 as the single predictor, lowers \(R^2_{adj}\) to 0.151.
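A natural follow-up, not shown in the output above, is to refit the reduced model without x17, again with lessR:
reg(x22 ~ x13 + x18)   # two-predictor model suggested by the best subset analysis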
Of the three predictor variables, only one is not significantly related to Satisfaction when holding the values of the remaining two variables constant: x17, or Has an attractive interior. Controlling for the values of the other two variables, customers are apparently not as interested in the attractiveness of the interior as they are in experiencing an enjoyable atmosphere and tasty food.
How much variation is there in the residuals about the regression surface?
Model Fit
Standard deviation of residuals: 0.907 for 249 degrees of freedom
R-squared: 0.190 Adjusted R-squared: 0.180 PRESS R-squared: 0.159
Given the usual normal distribution of the residuals, the 95% range of variation of the residuals about the fitted values is almost four points on a seven-point scale (see the computation below). There is little confidence that any one particular value of the response variable for a given customer is close to the value consistent with the model, the fitted value. That is, the model does not accurately predict the response variable for an individual customer.
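As a check of the “almost four” figure, using the standard deviation of the residuals from Model Fit and the same \(t_{.025} \approx 1.97\) cutoff used for the confidence intervals:
\[\textrm{95% range} \approx 2 \times t_{.025} \times s_e = 2(1.97)(0.907) \approx 3.6\]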
The value of \(R^2_{adj}=0.180\) is low, not far from zero. The model does not provide much predictive information from the three variables. That is, the model does not do much better at accounting for the value of Satisfaction than does the null model, the mean of Satisfaction by itself. Still, Fun and Taste of the food appear to exert some influence on Satisfaction.
Consider these three predictor variables to predict satisfaction:
The analysis did not result in a strong understanding of the relationship of fun, attractiveness of the interior, and taste to customer satisfaction. However, there does appear to be some positive relationship of both a fun atmosphere and the taste of the food with satisfaction, with the other predictors held constant, so those aspects of the customer experience should be a focus for providing an experience that results in a satisfied customer who provides repeat business. Still, other factors should also be investigated given that the obtained model does not have much predictive power. It is probably not worth investing much money in upgrading the attractiveness of the interior, but it is good for management to be aware of the relationship.