Short-Answer Problems

These concepts can appear on the short-answer part of the tests. As part of this homework, answer the following questions. An answer is usually just a few sentences that include the definition.

  1. Identify and briefly explain the two types of supervised machine learning regarding the nature of the target variable.
    Supervised machine learning trains a model to predict a target. The two types differ by the nature of the target: regression forecasts the value of a continuous variable, as in linear regression, and classification forecasts membership in a category, as in logistic regression.

  2. Why is binary prediction the process of classification?
    Binary prediction assigns each sample to one of only two categories, and assignment to a category is classification. Classification into a category is distinct from measurement of a quantitative variable. For example, classify someone as having a Male or Female body type, but measure their height.

  3. In machine learning, the variable to be forecasted or predicted is called the target or the label. When is the term label most appropriate?
    Label is most appropriate when forecasting the level or category of a categorical variable. The level is described by a label in the usual English definition of the word.

  4. When predicting a binary outcome, what are the two ways to be correct? Define your terms.
    There are two groups, one called the positive group and the other the negative group. The two ways to be correct are a true positive, in which a sample in the positive group is classified as positive, and a true negative, in which a sample in the negative group is classified as negative.

  5. When predicting a binary outcome, what are the two ways to be wrong? Define your terms.
    A false negative occurs when the model predicts that a sample in the positive group is not in that group. A false positive occurs when the model predicts that a sample is in the positive group when it is not.
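
    The four outcomes from questions 4 and 5 can be tallied directly from the actual and predicted labels. The following is a minimal sketch, not a definitive implementation. The function name confusion_counts and the example labels are made up for illustration, with 1 standing for the positive group.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, tn, fp, fn) for two equal-length sequences of labels."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, tn, fp, fn


# Hypothetical labels for eight samples
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)
```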

  6. What is the accuracy of a binary prediction? When is it of the most interest?
    Accuracy is the proportion of correct classifications: the number of true positives plus true negatives, divided by the total number of samples, including the false classifications. It is of the most interest when the costs of the two types of misclassification are the same.
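
    As a worked example of the definition, the sketch below computes accuracy from the four counts. The counts are hypothetical, and the function name accuracy is an assumption for illustration.

```python
def accuracy(tp, tn, fp, fn):
    """Proportion of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts from 100 samples
acc = accuracy(tp=40, tn=45, fp=5, fn=10)

# Hypothetical imbalanced data: 95 negative samples, 5 positive samples,
# and a model that classifies every sample as negative
all_negative = accuracy(tp=0, tn=95, fp=0, fn=5)

print(acc, all_negative)
```

    The second call shows why accuracy alone can mislead. A model that classifies every sample as negative scores high on accuracy while missing every sample in the positive group.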

  7. What is the purpose of the sensitivity (recall) metric?
    Sensitivity assesses what proportion of the samples in the positive group are correctly classified as positive. It is applicable to situations where the concern is missing something that exists, such as cancer in a medical diagnosis, a terrorist as a passenger on an airplane, or a poor-quality part in a manufacturing scenario.
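
    A small sketch of the computation, assuming hypothetical counts of true positives and false negatives. The function name sensitivity is chosen for illustration.

```python
def sensitivity(tp, fn):
    """Proportion of the truly positive samples classified as positive."""
    if tp + fn == 0:
        raise ValueError("no positive samples, so sensitivity is undefined")
    return tp / (tp + fn)


# Hypothetical: 50 patients have the disease, and the model detects 40 of them
recall = sensitivity(tp=40, fn=10)
print(recall)
```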

  8. What is the purpose of the precision metric?
    Precision assesses what proportion of the samples classified as positive are truly in the positive group. It is applicable when the concern is false alarms, which are samples in the negative group incorrectly classified into the positive group. How many airline passengers were incorrectly identified initially as terrorists? How many patients were told they have cancer when they did not? How many good parts were incorrectly classified as bad?
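
    A small sketch of the standard computation, true positives divided by all samples classified as positive. The counts are hypothetical, and the function name precision is chosen for illustration.

```python
def precision(tp, fp):
    """Proportion of the samples classified as positive that are truly positive."""
    if tp + fp == 0:
        raise ValueError("no samples classified as positive, so precision is undefined")
    return tp / (tp + fp)


# Hypothetical: the model flags 45 parts as bad, and 40 of them really are bad
prec = precision(tp=40, fp=5)
print(round(prec, 3))
```

    The more false positives, the larger the denominator and the lower the precision.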

  9. What metric balances sensitivity and precision? How does it accomplish the balance?
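    The F1 metric balances sensitivity and precision as their harmonic mean, so its value lies between the two. The harmonic mean is pulled toward the smaller of the two values, so F1 is high only when both sensitivity and precision are high.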
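
    To illustrate how the harmonic mean differs from the ordinary average, the sketch below compares the two for hypothetical values of precision and sensitivity. The function name f1_score_from is made up for illustration.

```python
def f1_score_from(precision, sensitivity):
    """Harmonic mean of precision and sensitivity (recall)."""
    if precision + sensitivity == 0:
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)


# Hypothetical, lopsided model: high precision but very low sensitivity
lopsided_f1 = f1_score_from(precision=0.9, sensitivity=0.1)
lopsided_average = (0.9 + 0.1) / 2

# Hypothetical, balanced model
balanced_f1 = f1_score_from(precision=0.8, sensitivity=0.8)

print(round(lopsided_f1, 2), lopsided_average, round(balanced_f1, 2))
```

    For the lopsided model the ordinary average is 0.5, but F1 falls much closer to the smaller value of 0.1.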

  10. Why is the logit transformation of best fit more appropriate for binary classification than a straight line of best fit?
    A straight line cannot effectively summarize a scatter plot of a target variable that has only two values. Instead of a cloud of points, there are two horizontal lines of points across the values of the x-variable, and a straight line extends below 0 and above 1. The S-shaped curve from the logit model stays between 0 and 1, so it is a more suitable curve for summarizing the relation between a continuous x and a binary y.
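
    The contrast can be seen numerically. The sketch below evaluates a straight line and an S-shaped logistic curve over the same x values. All coefficients are hypothetical values chosen only for illustration, not estimates from any data.

```python
import math


def straight_line(x, b0, b1):
    """Fitted value from a straight line."""
    return b0 + b1 * x


def logistic_curve(x, b0, b1):
    """S-shaped curve: the straight line passed through the logistic function."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


x_values = [0, 2.5, 5, 7.5, 10]

# Hypothetical coefficients for each curve
line_fit = [straight_line(x, b0=-0.25, b1=0.15) for x in x_values]
curve_fit = [logistic_curve(x, b0=-5.0, b1=1.0) for x in x_values]

for x, line_y, curve_y in zip(x_values, line_fit, curve_fit):
    print(x, round(line_y, 3), round(curve_y, 3))
```

    The straight line yields values below 0 and above 1 at the ends of the range, which cannot be interpreted as probabilities, while every value on the S-shaped curve lies between 0 and 1.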

  11. What is the target variable in a logit regression analysis? Why?
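    The target variable is the logit, the logarithm of the odds, where the odds are the probability of membership in the positive group divided by the probability of not being in that group. The reason is to have a target that reflects probability but varies from negative infinity to positive infinity, so that a linear function of the features can summarize the relationship between the features and a binary target.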
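
    A minimal sketch of the transformation and its inverse, with the function names logit and inverse_logit chosen for illustration. The probabilities are arbitrary examples.

```python
import math


def logit(p):
    """Logarithm of the odds p / (1 - p), defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inverse_logit(z):
    """Convert a logit back into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


for p in [0.01, 0.1, 0.5, 0.9, 0.99]:
    print(p, round(logit(p), 3))
```

    A probability of 0.5 maps to a logit of 0, probabilities near 0 map to large negative values, and probabilities near 1 map to large positive values.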

  12. What is the process of an iterative solution for model coefficients?
    When a direct algebraic solution is not possible, the estimated parameter values are computed by iteration, such as the method of gradient descent. Start with an initial guess of the parameter values, which may be completely random. Then, using calculus, the algorithm moves the parameter values in the direction that reduces the error. The process continues until no further reduction in error is obtained, or until the maximum number of iterations is exceeded, because some estimates never converge. One problem is that the solution may be only a local minimum. Another set of starting values may lead to a better minimum.
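
    The iteration can be sketched for a logistic regression with a single feature. This is an illustrative sketch under stated assumptions, not how any particular library estimates the model: the data are made up, the starting values are zeros instead of random values, and the learning rate, tolerance, and maximum number of iterations are arbitrary choices.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(b0, b1, xs, ys):
    """Average error (negative log-likelihood) for the current coefficients."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(xs)


def fit_logistic(xs, ys, learning_rate=0.1, max_iter=5000, tol=1e-8):
    """Gradient descent; returns (b0, b1, final_loss, iterations_used)."""
    b0, b1 = 0.0, 0.0  # starting guess (assumed zeros, not random)
    n = len(xs)
    previous = log_loss(b0, b1, xs, ys)
    for iteration in range(1, max_iter + 1):
        # Gradient of the average loss with respect to b0 and b1
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(b0 + b1 * x) - y
            g0 += error
            g1 += error * x
        # Move the coefficients in the direction that reduces the loss
        b0 -= learning_rate * g0 / n
        b1 -= learning_rate * g1 / n
        current = log_loss(b0, b1, xs, ys)
        if previous - current < tol:  # no further meaningful reduction
            return b0, b1, current, iteration
        previous = current
    return b0, b1, previous, max_iter  # stopped at the iteration limit


# Hypothetical data: x is a continuous feature, y is the binary target
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1, final_loss, iterations = fit_logistic(xs, ys)
print(round(b0, 3), round(b1, 3), round(final_loss, 4), iterations)
```

    For logistic regression the error function is convex, so different starting values affect the number of iterations rather than the minimum that is reached. The concern about a local minimum applies to models with a non-convex error function.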

  13. When in the sklearn Python machine learning environment, how similar is the code for doing k-fold validation for least-squares regression vs. logistic regression? What is the distinction?
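    The code is nearly identical, which is a major strength of the sklearn machine learning analysis environment. The main distinction is the estimator that is instantiated, such as LogisticRegression in place of LinearRegression, along with fit metrics suited to classification, such as accuracy, recall, and precision, in place of metrics suited to a continuous target. Because such a simple code change invokes an entirely different estimation algorithm, the analyst can easily test multiple algorithms and choose the best for the data set.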
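
    The similarity can be seen by placing the two analyses side by side. The sketch below is one way to write it, not the only way. The data are synthetic, generated by sklearn's make_regression and make_classification only so that the example runs, and the number of folds and the scoring metrics are arbitrary choices.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import KFold, StratifiedKFold, cross_validate

# Synthetic data, generated only for illustration
X_reg, y_reg = make_regression(n_samples=200, n_features=3, noise=10.0,
                               random_state=0)
X_clf, y_clf = make_classification(n_samples=200, n_features=4,
                                   random_state=0)

# Least-squares regression: continuous target
reg_scores = cross_validate(
    LinearRegression(), X_reg, y_reg,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring=("r2", "neg_mean_squared_error"),
)

# Logistic regression: binary target, same function call with a
# different estimator, stratified folds, and classification metrics
clf_scores = cross_validate(
    LogisticRegression(), X_clf, y_clf,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring=("accuracy", "recall", "precision"),
)

print(reg_scores["test_r2"].mean())
print(clf_scores["test_accuracy"].mean())
```

    Only three arguments differ between the two calls: the estimator, the fold generator, and the list of scoring metrics.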

  14. When breaking data into training/test subsets to forecast a categorical variable, why do we want the same proportion of people in each group in each subset as in the full data set?
    To evaluate how good a model is, compare its forecasts to forecasts made without the model. That baseline is the null model, which, for logistic regression, forecasts every sample into the group with the most members. If the proportion of members in each group changes with each random assignment of samples to training and test data, then so does the performance of the null model, and with it the baseline for evaluating the model.
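
    In sklearn, one way to keep the group proportions the same in each subset is the stratify parameter of train_test_split. The sketch below uses made-up data with 30 samples in the positive group and 70 in the negative group. The helper null_model_accuracy is a hypothetical function written for this example, not part of sklearn.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Hypothetical data: 30 samples in group 1 and 70 in group 0
y = [1] * 30 + [0] * 70
X = [[i] for i in range(100)]  # placeholder feature values

# stratify=y preserves the 30/70 proportions in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)


def null_model_accuracy(labels):
    """Accuracy from forecasting every sample into the largest group."""
    counts = Counter(labels)
    largest_group_size = counts.most_common(1)[0][1]
    return largest_group_size / len(labels)


print(null_model_accuracy(y), null_model_accuracy(y_train),
      null_model_accuracy(y_test))
```

    With stratification, the null model has the same accuracy in the full data, the training data, and the test data, so the baseline for comparison does not shift from one random split to the next.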