# Decision Tree with the _sklearn_ ML Framework

<div id="author">
David Gerbing<br>
The School of Business<br>
Portland State University<br>
gerbing@pdx.edu
</div>

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Preliminaries" data-toc-modified-id="Preliminaries-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Preliminaries</a></span></li><li><span><a href="#Get-and-Structure-Data" data-toc-modified-id="Get-and-Structure-Data-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Get and Structure Data</a></span></li><li><span><a href="#Grid-search:-Hyperparameter-tuning-with-cross-validation" data-toc-modified-id="Grid-search:-Hyperparameter-tuning-with-cross-validation-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Grid search: Hyperparameter tuning with cross-validation</a></span></li><li><span><a href="#Illustrate-the-Model" data-toc-modified-id="Illustrate-the-Model-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Illustrate the Model</a></span></li><li><span><a href="#Apply-the-Model" data-toc-modified-id="Apply-the-Model-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Apply the Model</a></span></li></ul></div>

## Preliminaries

A classic application of supervised machine learning classification is predicting customer churn. Successfully forecasting which customers are about to stop using a company's products and services allows the company to commit resources to salvaging those relationships. 

The following data file contains information on over 7000 customers of a telecom service, including former customers who left the service plan within the 30 days before the data was collected. 

Data: http://web.pdx.edu/~gerbing/data/churn_clean.csv

The data has already been cleaned following last week's analysis, so there is no need to repeat the cleaning process or the data exploration. 


## Get and Structure Data

*a. Read the cleaned data into a data frame, and display its dimensions.*

*b. Display the variable names and the first six rows of data.*
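A minimal sketch of steps a and b with pandas. The helper `load_churn` is hypothetical and simply wraps `pd.read_csv`, which accepts either the URL or any file-like object. So that the example runs offline, a tiny made-up CSV stands in for the real file; its column names are placeholders, not the confirmed layout of `churn_clean.csv`.

```python
import io

import pandas as pd

URL = "http://web.pdx.edu/~gerbing/data/churn_clean.csv"


def load_churn(source=URL):
    """Read the cleaned churn data into a data frame (the URL needs network access)."""
    return pd.read_csv(source)


# Hypothetical three-row stand-in so the sketch runs without the real file
toy_csv = io.StringIO(
    "Charges,TotalCharges,tenure,Churn\n"
    "45,45,0,Yes\n"
    "70,840,12,No\n"
    "20,480,24,No\n"
)
d = load_churn(toy_csv)

print(d.shape)          # a. dimensions: (rows, columns)
print(list(d.columns))  # b. variable names
print(d.head(6))        # b. first six rows (fewer if the frame is shorter)
```

With the real data, `d = load_churn()` reads the file directly from the course URL.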

*c. Score Churn so that a customer who churned is 1 (and 0 otherwise). Create the lists of the names of _classes_ and _features_ for the graph later on, then create the data structures from these names.*
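One way to carry out step c, sketched under two assumptions: the raw `Churn` variable is coded `"Yes"`/`"No"` (skip the mapping if it is already 0/1), and the feature names are the nine variables listed in the final step of this notebook. The few rows below are made up for illustration only.

```python
import pandas as pd

# Hypothetical rows; the real data frame comes from the CSV read in step a
d = pd.DataFrame({
    "Charges": [45.0, 80.0, 20.0], "TotalCharges": [45.0, 900.0, 400.0],
    "MtoM": [1, 0, 0], "Paperless": [1, 0, 1], "Check": [1, 0, 0],
    "Phone": [0, 1, 1], "tenure": [0, 12, 20], "Dependents": [1, 0, 1],
    "Internet": [0, 1, 1], "Churn": ["Yes", "No", "No"],
})

# Score 1 for Churn, assuming the raw coding is "Yes"/"No"
d["Churn"] = d["Churn"].map({"No": 0, "Yes": 1})

# Class labels in the order of the 0/1 codes, used to label the tree later
classes = ["No", "Yes"]
# Assumed feature names, taken from the scores listed in the last step
features = ["Charges", "TotalCharges", "MtoM", "Paperless", "Check",
            "Phone", "tenure", "Dependents", "Internet"]

# Data structures built from the names: feature matrix X and target y
X = d[features]
y = d["Churn"]
```

Keeping `classes` in the order of the numeric codes matters because the tree-drawing functions label classes in ascending order of their values.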

*d. Apply a MinMax transformation to rescale all the data to the 0 to 1 range. Verify the result.*
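A sketch of the MinMax step with scikit-learn's `MinMaxScaler`, which rescales each column as $(x - \min)/(\max - \min)$. Random numbers stand in for the feature data frame `X`; the verification checks that every column now runs from 0 to 1.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for the feature data frame X from step c
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(50, 20, size=(100, 3)),
                 columns=["Charges", "TotalCharges", "tenure"])

# Each column becomes (x - min) / (max - min), computed from that column
scaler = MinMaxScaler()
X_s = pd.DataFrame(scaler.fit_transform(X), columns=X.columns)

# Verify: the minimum of every column should be 0 and the maximum 1
print(X_s.describe().loc[["min", "max"]])
```

Keep the fitted `scaler`: any new case scored later must be transformed with the same minimums and maximums.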

## Grid search: Hyperparameter tuning with cross-validation

*e. Do a grid search with 3-fold cross-validation. Search on the following parameters and values: maximum depth of 3 and 4, and maximum features of 4, 6, and 8.*
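A sketch of the grid search with `GridSearchCV`. The random stand-in data has nine columns so that `max_features=8` is valid. The scoring list and `refit="precision"` are choices made here, not requirements from the assignment; `random_state=1` is arbitrary and only fixes which random feature subsets `max_features` draws, so results are reproducible.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in data with nine features and a 0/1 target
rng = np.random.default_rng(0)
features = ["Charges", "TotalCharges", "MtoM", "Paperless", "Check",
            "Phone", "tenure", "Dependents", "Internet"]
X = pd.DataFrame(rng.random((300, 9)), columns=features)
y = (X["MtoM"] + 0.5 * rng.random(300) > 0.9).astype(int)

# 2 depths x 3 feature counts = 6 candidate trees
param_grid = {"max_depth": [3, 4], "max_features": [4, 6, 8]}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=1),
    param_grid,
    cv=3,                                         # 3-fold cross-validation
    scoring=["accuracy", "recall", "precision"],  # several fit indices per model
    refit="precision",                            # best_params_ chosen by mean precision
    return_train_score=True,
)
search.fit(X, y)
print(search.best_params_)
```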

*f. Display all the results of the cross-validation grid search.*

*g. Display the most relevant results, the means.*
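Steps f and g can be sketched by converting `cv_results_` into a data frame: the full frame holds every stored statistic for every fold, and the `mean_test_*` columns hold the averages across the three folds. The data are again a random stand-in, and the scoring choices repeat the assumptions of the grid-search sketch.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in data and the same 3-fold grid search
rng = np.random.default_rng(0)
X = rng.random((300, 9))
y = (X[:, 2] + 0.5 * rng.random(300) > 0.9).astype(int)
search = GridSearchCV(DecisionTreeClassifier(random_state=1),
                      {"max_depth": [3, 4], "max_features": [4, 6, 8]},
                      cv=3, scoring=["accuracy", "recall", "precision"],
                      refit="precision").fit(X, y)

# f. All results: one row per parameter combination, all fold-level statistics
results = pd.DataFrame(search.cv_results_)
print(results)

# g. Only the means across folds, sorted with the most precise model first
mean_cols = ["param_max_depth", "param_max_features"] + [
    c for c in results.columns if c.startswith("mean_test_")]
print(results[mean_cols]
      .sort_values("mean_test_precision", ascending=False)
      .round(3))
```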

*h. The main management goal is to detect churners before they churn, with the focus on avoiding false positives, that is, on precision. Why is the model with a maximum depth of 3 and a maximum of 6 features a good model to choose?*
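For reference, the two fit indices most relevant here, with churn as the positive class: $TP$ counts churners predicted to churn, $FP$ counts non-churners predicted to churn, and $FN$ counts churners the model missed. Precision is the index penalized by false positives.

$$
\text{precision} = \frac{TP}{TP + FP}, \qquad \text{recall} = \frac{TP}{TP + FN}
$$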

*i. Fit is evaluated on testing data, obtained by splitting the data into training and testing subsets. However, estimates are generally best when based on the most data. Given sufficient fit of the selected model, estimate that model, that is, construct the tree, on the full data set.*
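A sketch of refitting the selected model, a maximum depth of 3 and a maximum of 6 features, on all rows. Random numbers stand in for the scaled features, and `random_state=1` is an arbitrary seed so the random feature subsets are reproducible.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for the full scaled feature matrix and 0/1 target
rng = np.random.default_rng(0)
X = rng.random((300, 9))
y = (X[:, 2] + 0.5 * rng.random(300) > 0.9).astype(int)

# The selected hyperparameters, estimated on the full data set
model = DecisionTreeClassifier(max_depth=3, max_features=6, random_state=1)
model.fit(X, y)
print("depth:", model.get_depth(), " leaves:", model.get_n_leaves())
```

If the grid search was run on all the rows and its refit criterion selected these same values, `search.best_estimator_` is already this model refit on the full data.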

*j. Calculate $\hat y$.*
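A sketch of computing $\hat y$, the predicted class for every customer, followed by a cross-tabulation of actual against predicted churn and the resulting precision. The data and model repeat the random stand-in assumptions of the previous sketch.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import precision_score
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in data and the selected model
rng = np.random.default_rng(0)
X = rng.random((300, 9))
y = (X[:, 2] + 0.5 * rng.random(300) > 0.9).astype(int)
model = DecisionTreeClassifier(max_depth=3, max_features=6,
                               random_state=1).fit(X, y)

y_hat = model.predict(X)  # predicted class, 0 or 1, for each customer

# Rows: actual class, columns: predicted class (0 = No, 1 = Yes)
print(pd.crosstab(y, y_hat, rownames=["actual"], colnames=["predicted"]))
print("precision:", round(precision_score(y, y_hat, zero_division=0), 3))
```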

## Illustrate the Model

*k. Draw the tree diagram.*
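A sketch of drawing the tree with scikit-learn's `plot_tree`, labeling the splits with the `features` list and the leaves with the `classes` list from step c. The stand-in data and model are hypothetical, and the `Agg` backend is set only so the sketch runs without a display. `export_text` prints the same decision rules as indented text, which can be easier to read off for the interpretation questions.

```python
import matplotlib
matplotlib.use("Agg")  # no display needed; in a notebook, omit this and the figure shows inline
import matplotlib.pyplot as plt
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text, plot_tree

features = ["Charges", "TotalCharges", "MtoM", "Paperless", "Check",
            "Phone", "tenure", "Dependents", "Internet"]
classes = ["No", "Yes"]  # in the order of the 0/1 codes

# Hypothetical stand-in data and the selected model
rng = np.random.default_rng(0)
X = rng.random((300, 9))
y = (X[:, 2] + 0.5 * rng.random(300) > 0.9).astype(int)
model = DecisionTreeClassifier(max_depth=3, max_features=6,
                               random_state=1).fit(X, y)

# Tree diagram: each node shows its split, impurity, sample count, and class counts
fig, ax = plt.subplots(figsize=(16, 8))
annotations = plot_tree(model, feature_names=features, class_names=classes,
                        filled=True, fontsize=9, ax=ax)

# The same decision rules as text
text = export_text(model, feature_names=features)
print(text)
```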

*l. Managerial Interpretation.*

1. What information does management desire to gain from this analysis?

1. Predicted to churn.
    + i. Specify the decision rules that best detect churners. 
    + ii. How many were predicted to churn? 
    + iii. How many people in that leaf churned and how many did not churn? 
    + iv. How well does the model do in terms of identifying churners and distinguishing them from non-churners?
1. Business Implications
    + i. Identify the problem of false positives in the one or more leaves that predict churn. 
    + ii. Why is this issue central to the business reason why this analysis is done? 
    + iii. What general advice do you have for improving this model?
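The leaf-level counts asked about above can also be tabulated directly, sketched here with `model.apply`, which returns the leaf each customer lands in. Every customer in a leaf receives the same prediction, so in a leaf that predicts churn, the customers who did not churn are that leaf's false positives. The data and model are the same hypothetical stand-ins as before.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in data and the selected model
rng = np.random.default_rng(0)
X = rng.random((300, 9))
y = (X[:, 2] + 0.5 * rng.random(300) > 0.9).astype(int)
model = DecisionTreeClassifier(max_depth=3, max_features=6,
                               random_state=1).fit(X, y)

leaf = model.apply(X)    # leaf id for each customer
y_hat = model.predict(X)  # constant within a leaf

summary = (pd.DataFrame({"leaf": leaf, "churn": y, "predicted": y_hat})
           .groupby("leaf")
           .agg(n=("churn", "size"),
                churned=("churn", "sum"),
                predicted=("predicted", "first")))
summary["not_churned"] = summary["n"] - summary["churned"]  # false positives if predicted == 1
summary["churn_rate"] = summary["churned"] / summary["n"]

# Leaves that predict churn
print(summary[summary["predicted"] == 1])
```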

## Apply the Model

*m. Apply the model to a person who has the following scores.*

- Charges = 45
- TotalCharges = 45
- MtoM = 29
- Paperless = 1
- Check = 1
- Phone = 0
- tenure = 0 
- Dependents = 1
- Internet = 0
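A sketch of scoring this person. The new scores must pass through the same fitted `MinMaxScaler` as the training data, with the columns in the same order as `features`. The training data below are random stand-ins, so the printed prediction illustrates the mechanics only. A score outside the training range, as `MtoM = 29` may be, rescales to a value outside 0 to 1; the tree still routes it to a leaf.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

features = ["Charges", "TotalCharges", "MtoM", "Paperless", "Check",
            "Phone", "tenure", "Dependents", "Internet"]

# Hypothetical raw training data and target
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.random((300, 9)) * 100, columns=features)
y = (X["MtoM"] > 60).astype(int)

scaler = MinMaxScaler().fit(X)
model = DecisionTreeClassifier(max_depth=3, max_features=6,
                               random_state=1).fit(scaler.transform(X), y)

# The person's scores, in the same column order as the training data
person = pd.DataFrame([{"Charges": 45, "TotalCharges": 45, "MtoM": 29,
                        "Paperless": 1, "Check": 1, "Phone": 0, "tenure": 0,
                        "Dependents": 1, "Internet": 0}], columns=features)
person_s = scaler.transform(person)  # same min and max as the training data

print("predicted class:", model.predict(person_s)[0])
print("class probabilities [No, Yes]:", model.predict_proba(person_s)[0])
```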