Tableau: Visualize Relationships

Author

David Gerbing

Published

Apr 24, 2024, 08:42 am

This document shows how to implement in Tableau the visualizations introduced in the more general, conceptual discussion of visualizing relationships.

Categorical Variables

Define

Data analysis, regardless of the data analysis system, always begins with the identification and definition of categorical variables.

Define categorical variables

Before analysis begins, first define all categorical variables, what Tableau calls “dimensions”.

Tableau automatically classifies only categorical variables with text data values as categorical. Properly define the remaining categorical variables as needed: Drag all categorical variables for analysis to the list of dimensions, order the levels, and attach meaningful labels to the levels. The visualization attaches a numerical value to each group, where a group is a combination of levels of the categorical variables.

Stacked Bar Chart

For more specific guidance, assume a vertical bar chart. Switch the column and row orientation for a horizontal bar chart.

  1. Select x-axis variable: Drag a categorical variable (dimension) to the Columns shelf.
  2. Select y-axis variable: Drag a continuous variable (measure) to the Rows shelf, which results in a one categorical variable bar chart.
    • If the y-axis variable is the pre-defined Count variable, then the aggregation is CNT.
    • If the y-axis is another variable, then Tableau defaults to aggregating the SUM of that variable for each level of the categorical variable on the x-axis. You may want to change that aggregation to the AVG.
  3. Add the second categorical variable: Drag another categorical variable to the Color mark.
  4. Add labels to the bars [optional]: Select the numerical variable and drag to the Label mark.
    • Tableau again defaults to the SUM aggregation for the labels, even if the aggregation on the bar chart y-axis is AVG.
    • For consistency, usually change the aggregation for the labels to match the aggregation on the y-axis.
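
To make the difference between the aggregations in the steps above concrete, the following minimal Python sketch computes a sum, a mean, and a count of Salary for each combination of Dept and Gender, the quantities that correspond to SUM, AVG, and a count in Tableau. The rows and the helper name aggregate are hypothetical, chosen only for illustration. The sketch illustrates the arithmetic, not how Tableau computes it internally.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (Dept, Gender, Salary). Illustrative values only.
rows = [
    ("ACCT", "F", 60000),
    ("ACCT", "F", 70000),
    ("ACCT", "M", 80000),
    ("SALE", "F", 50000),
    ("SALE", "M", 90000),
    ("SALE", "M", 70000),
]

def aggregate(rows, func):
    """Apply func to the Salary values of each (Dept, Gender) group."""
    groups = defaultdict(list)
    for dept, gender, salary in rows:
        groups[(dept, gender)].append(salary)
    return {key: func(values) for key, values in groups.items()}

sums = aggregate(rows, sum)    # analogous to the SUM aggregation
avgs = aggregate(rows, mean)   # analogous to the AVG aggregation
counts = aggregate(rows, len)  # analogous to counting the rows in each group

for key in sorted(sums):
    print(key, sums[key], avgs[key], counts[key])
```

Each (Dept, Gender) key corresponds to one colored segment of one bar. The segment heights differ depending on which of the three aggregations is chosen, which is why the label aggregation should match the y-axis aggregation.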

Figure 1 shows the resulting stacked bar chart with the specified Marks parameters.

Figure 1: Stacked bar chart of mean Salary by Dept sub-divided by Gender.

Unstacked Bar Chart

First, create the stacked bar chart.

To unstack the bars, drag the second categorical variable from the Color mark over to the Columns shelf (or whatever shelf contains the first categorical variable).

Figure 2 shows the resulting unstacked bar chart with the specified Marks parameters.

Figure 2: Unstacked bar chart of mean Salary by Dept sub-divided by Gender.

100% Stacked Bar Chart

First create the regular stacked bar chart, except leave the aggregation of the numerical variable as SUM.

To convert to a 100% stacked bar chart:

  1. Right-click on the numerical variable name in the respective shelf
  2. Select Quick Table Calculation
  3. In the drop-down menu, select Percent of Total
  4. Again, right-click on the numerical variable name
  5. Select Compute Using
  6. In the drop-down menu, select Cell
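
The calculation behind these steps can be sketched in Python, assuming the goal is each segment's share of its own bar's total so that every bar sums to 100%. The segment sums and the function name percent_of_bar are hypothetical, used only for illustration.

```python
from collections import defaultdict

# Hypothetical SUM(Salary) for each (Dept, Gender) segment. Illustrative values only.
segment_sums = {
    ("ACCT", "F"): 130000,
    ("ACCT", "M"): 70000,
    ("SALE", "F"): 50000,
    ("SALE", "M"): 150000,
}

def percent_of_bar(segment_sums):
    """Return each segment's share of its bar (Dept) total, as a percentage."""
    bar_totals = defaultdict(float)
    for (dept, _gender), value in segment_sums.items():
        bar_totals[dept] += value
    return {
        (dept, gender): 100.0 * value / bar_totals[dept]
        for (dept, gender), value in segment_sums.items()
    }

shares = percent_of_bar(segment_sums)
for key in sorted(shares):
    print(key, round(shares[key], 1))
```

Because the shares are computed from the segment sums, the numerical variable stays aggregated as SUM before the percent-of-total step is applied.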

Figure 3 shows the resulting stacked bar chart with the specified Marks parameters.

Figure 3: 100% stacked bar chart of Salary by Dept sub-divided by Gender.

Treemap

One way to proceed is to first create the regular stacked bar chart.

Then, select the Treemap icon on the Show Me panel.

The boxes are shaded according to the value of the numerical variable for each box.

Figure 4 shows the resulting treemap with the specified Marks parameters. In this example, aggregate on the count of the number of occurrences in each group.

Figure 4: Treemap of counts for two categorical variables, Dept and Gender, with the size of the rectangles determined by mean Salary.

Bubble Plot

One way to proceed is to first create the regular stacked bar chart.

Then select the Bubble plot icon on the Show Me panel. Usually, change the aggregation of the numerical variable to AVG.

Figure 5 shows the resulting bubble plot with the specified Marks parameters.

Figure 5: Bubble scatterplot of two categorical variables, Dept and Gender, with the size of bubbles determined by mean Salary.

The link to the video of examples of these processes follows.

Video: Two categorical variables. [4:25]

Continuous Variables

Two-Variable Scatterplot

  1. Select x-axis variable: Drag one continuous variable (measure) to the Columns shelf.
  2. Select y-axis variable: Drag the other continuous variable (measure) to the Rows shelf.
  3. Disaggregate: On the Main Menu at the top of the screen, select Analysis, then select Aggregate Measures, which will uncheck the menu option, turning off aggregation.
  4. Fit line: Select the Analytics tab, next to the Data tab at the top of the list of variables. Under Model, select the Trend Line option (misnamed, because “trend” as generally understood applies to an orientation over time). Then, if a linear fit is desired, choose the displayed Linear option.
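
As a rough illustration of what the linear fit line in the last step represents, the following Python sketch computes an ordinary least-squares line from disaggregated data, one point per row. It assumes the linear option corresponds to a standard least-squares regression of the y-axis variable on the x-axis variable. The data values and the function name least_squares_line are hypothetical.

```python
# Hypothetical disaggregated rows: one (Years, Salary) pair per employee.
data = [(1, 42000), (3, 50000), (5, 58000), (7, 61000), (9, 74000)]

def least_squares_line(points):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sum of cross-products and sum of squared deviations about the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = least_squares_line(data)
print(f"Salary = {intercept:.1f} + {slope:.1f} * Years")
```

The fit uses every individual row. With aggregation left on, each variable collapses to a single aggregated value, leaving a single point and no line to fit, which is why the disaggregation step comes first.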

Figure 6 shows the resulting scatterplot with the specified Marks parameters.

Figure 6: Scatterplot of Years and Salary with the best-fit regression line.

Stratification

Trellis plot: Drag the categorical variable (dimension) label over to one of the shelves, Columns or Rows.

Figure 7 shows the resulting Trellis (facet) scatterplot with the specified Marks parameters.

Figure 7: Trellis scatterplot of Years and Salary with the best-fit regression line.

Same panel plot: Drag the categorical variable (dimension) label over to Color mark.

Figure 8 shows the resulting scatterplot with the specified Marks parameters.

Figure 8: Same-panel scatterplot of Years and Salary, stratified by color, with the best-fit regression line.
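
Stratifying a scatterplot amounts to analyzing each level of the categorical variable separately. The sketch below assumes a separate least-squares line is wanted for each level, whether the levels appear in separate panels or as colors in the same panel. The rows, the choice of Gender as the stratifying variable, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (Gender, Years, Salary). Illustrative values only.
rows = [
    ("F", 2, 45000), ("F", 4, 53000), ("F", 6, 61000),
    ("M", 1, 40000), ("M", 5, 50000), ("M", 9, 60000),
]

def fit_line(points):
    """Return (slope, intercept) of the least-squares line through (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def fit_by_group(rows):
    """Split the rows by level of the categorical variable, then fit each subset."""
    groups = defaultdict(list)
    for level, x, y in rows:
        groups[level].append((x, y))
    return {level: fit_line(points) for level, points in groups.items()}

fits = fit_by_group(rows)
for level in sorted(fits):
    slope, intercept = fits[level]
    print(level, f"Salary = {intercept:.1f} + {slope:.1f} * Years")
```

Comparing the slopes across levels shows whether the relationship between the two continuous variables differs by group.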

Bubble Scatterplot

Drag the continuous variable (measure) over to the Size mark. Perhaps adjust the sizes, though there is apparently no way to better differentiate among the different sizes other than to make them all larger or smaller at the same scale.

Figure 9 shows the resulting scatterplot with the specified Marks parameters.

Figure 9: Bubble scatterplot for cars of MidPrice and HP (horsepower), with bubble size determined by city MPG.

The link to the video of examples of these processes follows.

Video: Continuous variables. [3:31]