Tableau: Continuity vs Categories

Author

David Gerbing

Published

Apr 2, 2024, 02:03 pm

This document applies the more general, conceptual discussion of continuous versus categorical variables to Tableau.

Data Storage Types

Tableau refers to a variable by the corresponding name from relational databases, a field. The data table read into Tableau is called a table.

Continuous Variables

Tableau calls numerical variables measures. The term is reasonably appropriate given that the data values of continuous variables are measured. The more generally recognized terms for these variables are continuous or quantitative, but in Tableau the term is measure.

Categorical Variables

Tableau refers to categorical variables with what must be one of the more misleading and inappropriate terms in all of data science: dimensions. According to everyone else, from middle school algebra students to Ph.D. data scientists who use any analysis software other than Tableau, the meaning of dimensions is clear. That meaning has nothing to do with defining a variable as categorical. As discussed in the reading on visual aesthetics, a dimension is an axis of the space in which data values are plotted. Each axis corresponds to a variable, and the data values of that variable are usually plotted together with those of one or two other variables.

Figure 1: Initial Tableau classification of read variables into storage types: text and measures.

When reading data, Tableau classifies variables with data values that consist of alphanumeric characters as storage type text, equivalent to the R type character. Tableau indicates a text variable by the icon Abc as shown in Figure 1. Tableau indicates measures with the icon #.

The text variables are automatically and properly classified as categorical variables, the Tableau dimensions. However, Tableau initially misclassifies integer categorical variables as measures. For example, the categorical variable Plan has three unique integer values, so Tableau initially and incorrectly classifies it as a measure. Fortunately, as shown shortly, you can manually correct this misclassification.

Figure 1 also shows that after reading the data, Tableau automatically creates three additional variables: Measure Names, Measure Values, and d(Count).

Analyze Categorical Variables

Tableau is generally an elegant, straightforward data visualization system. However, its handling of categorical variables is, in my opinion, not ideal. Three situations require manual adjustment in the analysis of categorical variables, and each requires a different response, without the consistency and efficiency of R’s factor variable.

Accept Existing Levels and Order

As shown in Figure 1, the Data pane on the left side of the screen lists the variables that Tableau has pre-categorized into dimensions and measures. The first step after reading the data is to correctly classify the categorical variables. Locate the integer-valued categorical variable in the Measures section. Then either:

  • Drag the field from the Measures section to the Dimensions section.
  • Right-click on the variable name and select Convert to Dimension from the context menu.

Order Character String Levels

Consider a categorical variable with text data values “low”, “med”, and “high”. With Tableau, to specify an order of the levels other than alphabetical, manually sort the levels. To sort, do the following.

  1. Under the Data tab on the left side of the Worksheet window, right-click on the categorical variable’s name. Then select Default Properties, followed by Sort.
  2. In the Sort dialog, select the Manual sort option.
  3. Drag and arrange the values “low”, “med”, and “high” in the desired order.
  4. Click OK to apply the sort.

Figure 2 illustrates this process of sorting the categories for the JobSat variable.

(a) Right-click on the variable name in the Columns shelf and select the Sort option.
(b) Choose Manual sort.
(c) Drag the category names into the desired order.
Figure 2: Ordering the levels of a categorical variable in Tableau.

After sorting the JobSat levels, all subsequent visualizations show the levels in the desired order.
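
Manual sorting is the approach described above. As a hedged alternative sketch, not part of the steps shown in Figure 2, a calculated field can assign each level a numeric rank, and the levels can then be sorted by that field. The field name JobSat Order is hypothetical, and the sketch assumes JobSat holds exactly the text values “low”, “med”, and “high”.

```
// Hypothetical calculated field: JobSat Order
// Assumes JobSat contains only the text values "low", "med", and "high"
CASE [JobSat]
    WHEN "low" THEN 1
    WHEN "med" THEN 2
    WHEN "high" THEN 3
END
```

Sorting JobSat by this field in ascending order would place the levels as low, med, high. Any value not listed in the expression returns Null.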

Label Integer Values

In Tableau, attaching value labels to integer values, especially for categorical variables, involves creating a calculated field to replace these integers with the corresponding text values.

Calculated field

The Tableau term for what is typically called a transformed variable in data analytics: a variable with values computed from the data values of other variables.

Create a new categorical variable, or field, with text values in place of the integers. In this example, attach value labels such as GoodHealth, GetWell, and BestCare to the integers 1, 2, and 3, respectively. In this scenario, the text descriptors are not value labels per se; they are the actual data values of the newly created variable. Unlike R’s approach with the factor variable type, this approach wastes memory by storing the longer text data values across all rows of data.

This transformation of data values requires some SQL-like code, written in Tableau's calculation language, which is easily adapted to other situations. After reading your data into Tableau, do the following.

  1. Create a Calculated Field (Variable):
    • Navigate to the Data pane on the left side.
    • Right-click on the variable name Plan and select
      Create -> Calculated Field to open the calculated field editor.
  2. Define the transformation to obtain the Calculated Field:
    • Name your calculated field, e.g., “Plan Labels”, shown in Figure 3.
    • Enter the SQL-like code shown in Figure 3, which maps each integer to its corresponding text value, then click OK.

Figure 3: SQL-like code to define a new variable, Plan Labels, from the given variable Plan.

  3. Use the new variable. There are now two variables in place of one: the original integer-valued categorical variable Plan and the new categorical variable Plan Labels with the corresponding descriptive text values. Use this new variable (field) in any subsequent visualizations.
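
The mapping entered in step 2 can be sketched as a CASE expression in Tableau's calculation language, which resembles SQL. This is a minimal sketch built from the mapping named in the text (1, 2, and 3 to GoodHealth, GetWell, and BestCare). Treat the exact expression as an assumption about what Figure 3 shows, not a transcription of it.

```
// Calculated field: Plan Labels
// Assumed mapping, taken from the text: 1 -> GoodHealth, 2 -> GetWell, 3 -> BestCare
CASE [Plan]
    WHEN 1 THEN "GoodHealth"
    WHEN 2 THEN "GetWell"
    WHEN 3 THEN "BestCare"
END
```

Any value of Plan not listed in the expression returns Null. An ELSE clause placed before END could supply a fallback label. To adapt the sketch to another variable, change the field name in brackets and the WHEN ... THEN pairs.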

Add Levels Beyond the Data

Unlike R, Tableau provides no direct method for indicating that a potential response did not occur in the original data table, which contains the individual data values for each person. The example here is Gender, where, in this small Employee data table, the response of O for Other did not occur.

A workaround to display the missing level requires three steps.

  1. Count the existing data values for each category, such as with a bar chart.
  2. Manually create the summary table of counts for the categories, such as in Excel or similar. Here, enter the counts for the existing values M and W, then add a row for O with a count of 0.
  3. Create the visualization directly from the summary table, read into Tableau as the data.

Figure 4 shows the manually constructed summary table.

Figure 4: Manually created summary table of counts.

From this summary table, construct the visualization, such as with a bar chart, as shown in the corresponding reading.