This document is the Tableau interpretation of the more general, conceptual discussion regarding continuous vs categorical variables.
Tableau refers to a variable by the corresponding relational database term, field. The data table read into Tableau is called a table.
Tableau calls numerical variables measures. The term is reasonably appropriate given that the data values of continuous variables are measured. The more generally recognized terms for such variables are continuous or quantitative, but measure it is for Tableau.
Tableau refers to categorical variables with what must be one of the more misleading and inappropriate terms in all of data science: dimensions. To everyone else, from middle school algebra students to Ph.D. data scientists who use any analysis software other than Tableau, the meaning of dimension is clear, and that meaning has nothing to do with defining a variable as categorical. As discussed in the reading on visual aesthetics, a dimension is an axis, which corresponds to a variable, in a space in which the data values of that variable, and usually one or two other variables, are plotted.
text and measures.
When reading data, Tableau classifies variables with data values that consist of alphanumeric characters as storage type text, equivalent to the R type character. Tableau indicates a text variable by the icon Abc as shown in Figure 1. Tableau indicates measures with the icon #.
Text variables are automatically and properly classified as categorical variables, the Tableau dimensions. However, Tableau initially misclassifies integer-valued categorical variables as measures. The categorical variable Plan, for example, has three unique integer values, and so is initially, and incorrectly, classified as a measure. Fortunately, as shown shortly, you can manually correct this misclassification.
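To see why treating Plan as a measure is a misclassification, consider the following minimal sketch. It is written in Python rather than Tableau, and the eight Plan codes are made-up values for illustration only. Averaging the codes runs without error but produces a number with no meaning, whereas counting the codes is the sensible summary for a categorical variable.

```python
from collections import Counter
from statistics import mean

# Hypothetical Plan codes for eight employees (made-up data, not from the reading).
plan = [1, 3, 2, 1, 1, 2, 3, 1]

# Treated as a measure: the arithmetic runs, but the result is not meaningful,
# because 1, 2, and 3 identify plans rather than quantities.
plan_mean = mean(plan)

# Treated as categorical: count how often each level occurs.
plan_counts = Counter(plan)

print(plan_mean)
print(sorted(plan_counts.items()))
```

The count by level is the kind of summary that a bar chart of a categorical variable displays.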
Figure 1 also shows that after reading the data, Tableau automatically creates three additional variables: Measure Names, Measure Values, and d(Count).
Tableau is generally an elegant, straightforward data visualization system. However, its handling of categorical variables is, in my opinion, not ideal. Three situations require manual adjustment in the analysis of categorical variables, and each requires a different response, without the consistency and efficiency of R’s factor variable.
As shown in Figure 1, the Data pane on the left side of the screen lists the variables that Tableau has pre-categorized into dimensions and measures. The first step after reading the data is to correctly classify the categorical variables. Locate the integer-valued categorical variable in the Measures section. Then either:
Drag the variable from the Measures section to the Dimensions section.
Or right-click the variable's name and select Convert to Dimension from the context menu.
Consider a categorical variable with text data values “low”, “med”, and “high”. With Tableau, to specify an order of the levels other than alphabetical, manually sort the levels. To sort, do the following.
In the Data tab on the left side of the Worksheet window, right-click the categorical variable’s name. Then select Default Properties followed by Sort.
In the Sort dialog, select the Manual sort option.
Figure 2 illustrates this process of sorting the categories for the JobSat variable.
Alternatively, right-click the variable on the Columns shelf and select the Sort option.
After this sort of the JobSat levels, all subsequent visualizations will show the levels in the desired order.
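The difference between the default alphabetical order and a manually specified order can be sketched outside Tableau. The Python example below is only an illustration of the idea, not of Tableau's internals. The observed values are hypothetical, and `level_order` is an illustrative name for the desired ordering of the levels “low”, “med”, and “high”.

```python
# Hypothetical observed values of a satisfaction-type variable.
values = ["med", "high", "low", "med", "low"]

# Default behavior: the distinct levels in alphabetical order.
alphabetical = sorted(set(values))

# Manual sort: state the desired order once, then rank each level by its position.
level_order = ["low", "med", "high"]
rank = {level: position for position, level in enumerate(level_order)}
manual = sorted(set(values), key=rank.__getitem__)

print(alphabetical)
print(manual)
```

Alphabetical order places “high” before “low”, which is why the manual step is needed whenever the levels have a natural order.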
In Tableau, attaching value labels to integer values, especially for categorical variables, involves creating a calculated field to replace these integers with the corresponding text values.
A calculated field is the Tableau term for what data analytics typically calls a transformed variable, a variable with values computed from the data values of other variables.
Create a new categorical variable, or field, with text values in place of the integers. In this example, attach value labels such as GoodHealth, GetWell, and BestCare to the integers 1, 2, and 3, respectively. In this scenario, the text descriptors are not value labels per se. Instead, they are the actual data values of the newly created variable. Unlike R’s approach with the factor variable type, this approach wastes memory by storing all those longer text data values across all rows of data.
This transformation of data values requires a short expression, written in Tableau's SQL-like calculation language and easily adapted to other situations. After reading your data into Tableau, do the following.
Right-click the variable's name in the Data pane on the left side.
Select Create -> Calculated Field to open the calculated field editor.
Enter the expression and click OK.
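The logic of such a calculated field amounts to a lookup from each integer code to its text value. In Tableau's calculation language a CASE expression is one way to write that lookup. The sketch below shows the same logic in Python, not in Tableau, using the three labels from this example. The four Plan codes and the names `plan_labels` and `plan_name` are assumptions made for illustration.

```python
# Lookup from integer code to text value, using the labels from the example.
plan_labels = {1: "GoodHealth", 2: "GetWell", 3: "BestCare"}

# Hypothetical Plan codes for four employees (made-up data).
plan = [1, 3, 2, 1]

# The new variable stores the text value on every row.
# A code missing from the lookup yields None instead of raising an error.
plan_name = [plan_labels.get(code) for code in plan]

print(plan_name)
```

Note that the new variable holds a full text value in every row, which is the memory cost described above.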
Unlike R, Tableau provides no direct way to indicate that a potential response never occurred in the original data table of individual responses. The example here is Gender, where, in this small Employee data table, the response O for Other did not occur.
A workaround to display the missing level requires three steps.
Figure 4 shows the manually constructed summary table.
From this summary table, construct the visualization, such as a bar chart, as shown in the corresponding reading.
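The content of such a summary table can be sketched in Python, outside Tableau, as a count for every possible level, including a level that never occurred. In the sketch below the Gender responses are made-up data. The level O comes from the example above, while the codes F and M are assumed for illustration.

```python
from collections import Counter

# Hypothetical Gender responses in which "O" never occurs (made-up data).
gender = ["F", "M", "M", "F", "M"]

# All possible responses, whether or not they appear in the data.
# "F" and "M" are assumed codes; "O" is the unobserved level from the example.
levels = ["F", "M", "O"]

# A Counter returns 0 for a key it has never seen, so the unobserved
# level appears in the summary with a count of zero.
observed = Counter(gender)
summary = [(level, observed[level]) for level in levels]

print(summary)
```

A bar chart built from this summary shows an empty position for O instead of omitting the level entirely.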