Histogram
Data Values into Bins
In contrast to the relatively few unique values of a categorical variable, a continuous variable has many potential values. How many potential values? Generally, there are too many unique data values to plot individually on a single graph. Consider annual salary, where every single value to the nearest penny must be considered from about $20,000 to $500,000 and more. Because there are so many potential data values, in most data sets, too many possible values never occur. For example, a specific annual salary of $83,924.79 would likely not occur unless the sample size was extremely large and even then not likely to the nearest penny.
What to do with all the unique data values that cannot be individually plotted?
A sequence of adjacent intervals, each generally the same size, which forms groups of values of a continuous variable.
From lessR version 4.5.4.
Partition the range of values into bins, sometimes called classes. Figure 1 presents an example of a bin that contains values from $50,000 to $60,000 for annual salaries.
Each bin contains similar data values, defined by a lower and an upper boundary that specify the bin’s width. In Figure 1, bin width is $10,000, and the midpoint is $55,000.
To evaluate the distribution of a continuous variable, first define the bins and then place each data value into its respective bin, as illustrated in Figure 2. Assign an annual salary of $57,358 to the bin $50,000 to $60,000.
Consistently assign values precisely equal to a bin boundary to either the adjacent lower bin or the adjacent higher bin. Each computer application that generates histograms defaults to one of those choices.
Example
The most typical graphical display of the variation of a continuous variable is the histogram.
The visualization of a continuous variable with values distributed across a set of adjacent bins, usually of the same width, plotted on the horizontal, x-axis, with the count of the number of data values in each bin plotted on the vertical, y-axis.
Like the bar graph, the histogram consists of a set of bars. However, instead of associating a single data value with a number, the histogram associates each bin with a frequency or count. Another distinction is that the adjacent bars of a histogram share a common side to indicate the underlying continuity of a continuous variable.
For example, consider the histogram shown in Figure 3 of the annual salaries of 37 different employees from the Employee data table included with lessR.
X(Salary)
type: Parameter to indicate the type of x-axis distribution plot. The default value is "histogram".
Unless you specify quiet=TRUE in the function call, the following text output also results.
The resulting frequency table lists each bin with the corresponding Count, Proportion, Cumulative Count, and Cumulative Proportion.
Bin Width: 10000
Number of Bins: 10
Bin Midpnt Count Prop Cumul.c Cumul.p
---------------------------------------------------------
50000 > 60000 55000 4 0.11 4 0.11
60000 > 70000 65000 8 0.22 12 0.32
70000 > 80000 75000 8 0.22 20 0.54
80000 > 90000 85000 5 0.14 25 0.68
90000 > 100000 95000 3 0.08 28 0.76
100000 > 110000 105000 5 0.14 33 0.89
110000 > 120000 115000 1 0.03 34 0.92
120000 > 130000 125000 1 0.03 35 0.95
130000 > 140000 135000 1 0.03 36 0.97
140000 > 150000 145000 1 0.03 37 1.00
Also provided are the summary statistics of the distribution.
--- Salary ---
n miss mean sd min mdn max
37 0 83795.557 21799.533 56124.970 79547.600 144419.230
An outlier analysis should always be done for each variable in an analysis. Since it should always be done, the function does that analysis by default. The box plot provides the same outlier analysis but without that visualization.
--- Outliers --- from the box plot: 1
Small Large
----- -----
144419.2
The width of the bins for this histogram is 10000. Here we see, for example, that there are four people who receive a salary between $50,000 and $60,000. Because the data are organized into bins, from the histogram alone we do not know the precise salary of that person, only that it is somewhere between $50,000 and $60,000.
Frequency Polygon
A visualization closely related to the histogram is the frequency polygon.
Replace the bars of a histogram by plotting points at the midpoints of the bins (at the tops of the bars) and connecting adjacent points with line segments to form a continuous outline of the distribution.
Figure 4 shows the frequency polygon that corresponds to the histogram in Figure 3.
X(Salary, type="freq_poly")
type: Parameter to indicate the type of x-axis distribution plot. Set to "freq_poly" to plot a frequency polygon.
Note: Can also control the bin width as shown shortly. The following frequency polygon is based on a bin width of 10,000.
A histogram is usually preferred when you want to emphasize actual bin counts and the contribution of bin width to the overall visualization. A frequency polygon is usually preferred when you want to focus on the distribution’s overall shape.
Artifacts and Issues
Bin Width
For both the histogram and the frequency polygon, the choice of bin width is critical for discerning the overall shape of a distribution.
Changing the width of the bins not only changes their width, it can also change the apparent overall shape of the histogram.
Yet this choice often relies on potentially subjective decisions made by the analyst.
Unlike many statistical analyses, there is no single correct histogram for a given data set, so it is often helpful to examine several histograms with different bin widths.
Although there is no single correct answer for choosing the bin width for a given data set, some choices provide more clarity than others for discerning the basic shape of the distribution. The default histogram generated by an analysis system uses a default bin width, but that initial choice may need adjustment because the underlying algorithm does not always identify a useful value. Through a process of trial and error, select a bin width that displays as much detail as possible for the available sample size without introducing excessive random noise.
What guidance do we have for choosing a bin width? Although virtually any distributional shape is possible, most distributions of a continuous variable encountered in practice tend either to increase or decrease smoothly, or to resemble the normal, or “bell-shaped,” curve, with frequencies increasing up to a value and then decreasing after that value. In particular, most real-life distributions are not characterized by repeated up-and-down zigzags.
As shown in the following examples, random ups and downs in a histogram’s bars usually indicate sampling error resulting from a bin width that is too small. That problem can be corrected by increasing the bin width, but only to a point. Bins that are too wide obscure information and fail to use all the information available in the data.
Choosing an effective bin width depends on several factors, so there is no single precisely best answer. Still, some choices are better than others. One common problem is bins that are too small.
A bin width can be too small relative to the amount of available data, yielding an undersmoothed histogram.
Too many bins produce too much detail and therefore too much random noise.
The result is an undersmoothed histogram that suggests random ups and downs that would not be reproduced if a new comparable data set were collected and analyzed. The histogram shown in Figure 3 is somewhat undersmoothed.
Could the irregularities characteristic of that histogram, or of its associated frequency polygon, be real? Could those irregularities consistently reappear in other samples from the same population? The answer is yes: such zigzags could reflect an actual characteristic of the underlying population. However, with such a small sample of only 37 employees, the repeatability of those irregular fluctuations is very unlikely. Instead, the underlying distributional shape is likely much smoother.
To be more dramatic, we can decrease the bin with even more, as in Figure 5.
X(Salary, bin_width=5000)
bin_width: Parameter to specify the width of the bins to provide for custom values.
These undersmoothed histograms reflect too much random sampling variability—too many random ups and downs—relative to the likely much smoother true shape of the underlying distribution. A bin width that is too narrow obscures the underlying distribution’s true shape. Imagine trying to interpret this histogram, having to repeatedly state: Salaries just under $80,000 occur relatively infrequently, then above $80,000 there is an increase, followed by a quick decline just above $85,000. True enough for this particular sample, but largely irrelevant for the population as a whole—and therefore for any other comparable sample.
In practice, when encountering undersmoothed histograms, increase the bin width and experiment. The histogram in Figure 6, with a bin width of $13,000, is an excellent candidate for a final display.
To further illustrate the more smoothed estimation of the underlying distribution, consider the corresponding frequency polygon in Figure 7.
Interpretation. The histogram in Figure 6 indicates a distribution with relatively few employees earning around $50,000, followed by increasing frequency up to about $90,000, and then a rapid decline for higher salaries.
This histogram shows the same general form as the histogram in Figure 3, which uses a bin width of $10,000. The improvement is that the revised histogram with a bin width of $13,000 reveals the underlying shape without the distracting irregular fluctuations.
An undersmoothed histogram obscures the underlying distribution. Similarly, a bin width that is too wide also fails to properly display the distribution’s shape.
Bins that are too large obscure details of the underlying distribution.
Oversmoothed histograms do not utilize the data efficiently. They bury some of the information inherent in the data.
The histogram in Figure 8 is intentionally oversmoothed to illustrate this loss of information.
For this histogram, the bin width is $25000. If we had only this display, we might conclude that the largest number of salaries falls between $40,000 and $65,000, followed by a steady decline at higher salary levels. While this description is technically correct, the more informative histogram in Figure 6 reveals a more nuanced pattern: an initial increase in frequency at lower salary levels, followed by a decline.
To summarize, the default bin width provided by a computer algorithm is often reasonable, but it can frequently be improved by experimentation. The goal is to display as much meaningful information as possible, given the size of the sample from which the histogram is constructed.
Bin Shift
Another example of the potential arbitrariness of a histogram is the bin shift artifact.
Changing the histogram’s starting point often changes its shape.
Particularly in small samples, the histogram’s shape may be overly dependent on the starting position of the first bin. A histogram of the same data with the same bin width can substantially change shape simply by changing the starting point of the first bin.
To illustrate the impact of bin shift, consider a sample of 15 shipment times, measured as the number of days from the order date until the shipment is received, shown in Figure 9.
The default starting point of the histogram in Figure 9 is 5. The histogram indicates steadily increasing frequencies for longer ship times. To revise the histogram, retain the bin width of 1 but instead start the first bin at 4.5, as shown in Figure 10.
X(ShipTime, bin_start=4.5)
bin_start: Parameter to specify the starting value of the bins to provide for custom values.
Note: The data are available at: https://web.pdx.edu/~gerbing/data/shiptime.csv
Compare the histograms in Figure 9 and Figure 10 of ship Time, which have the same bin width but differ by 0.5 in their start values. The resulting histograms differ to the extent that, at first glance, they appear to describe different data sets. The reason for this change is that, given this small sample size, there are relatively large gaps between some adjacent data values. The continuity of the possible underlying values would be better represented by the relatively few values in the data set. The one-dimensional scatterplot or strip plot, explained later, is usually a preferred alternative to visualize the distribution of data values in a small data set.
Density Curve
The histogram is a 19th-century innovation. Its notable limitation is that the underlying distribution of a continuous variable is, well, continuous, but the histogram plots it in discrete chunks, or bins. Ideally, a continuous distribution would be represented by a smooth curve. Modern computer technology delivers that representation by directly estimating that smooth curve.
The smooth curve that directly visualizes continuity.
As with the histogram, the density curve indicates where the variable’s values tend to occur more or less frequently, as shown in Figure 11. Here, return to the Employee data set.
X(Salary, type="density")
type: Parameter to indicate the type of visualization to plot along the x-axis. Set to "density" to display the smoothed density curve over the histogram.
Note: Can still specify bin_width and bin_start if desired.
The density curve in Figure 11 is a bit wobbly. Its smoothness can be adjusted.
Determines the smoothness of the estimated density curve.
The value of each point of the density curve is computed by averaging the values of nearby data values. A larger bandwidth results in a smoother curve because it averages over a broader range of data points, leading to less sensitivity to fluctuations in individual data points. A smaller bandwidth produces a more jagged curve that closely follows the individual data points.
The lack of smoothness in Figure 11 is likely due to random sampling fluctuations characteristic of such small samples. The bandwidth applied to that histogram is 9000. To derive a smoother curve more resistant to this likely random error, the curve in Figure 12 was computed with a larger bandwidth than the density curve in Figure 11: 15000.
X(Salary, type="density", bandwidth=15000)
bandwidth: Specify the bandwidth for the density curve to provide for custom values.
As with the underlying histogram, there is no one best density curve. The goal is to find the optimal bandwidth that balances excessive smoothness with over-emphasizing random fluctuations in the data, which often exhibit large, usually random ups and downs in the density curve.
Box Plot
The alternative primary display for a continuous variable is the box plot, along with related visualizations that can enhance the basic box plot. To understand the box plot, we need another set of statistics based on the position of a data value within the sorted data from smallest to largest.
Order Statistics
There are many positions within a set of ordered data values that can be identified, which helps us understand the distribution of a continuous variable.
Specify the position in an ordered set of data values.
One use for order statistics is that, because their values depend only on the order of the sorted values in a distribution, they can be applied to ordinal data. Statistics such as the mean and the standard deviation require higher-quality interval or ratio data.
An essential order statistic indicates the middle of the distribution.
The value midway between the smallest and largest values of the sorted distribution.
A distribution with an odd number of values has a data value in the middle. For a distribution with an even number of values, the median is generally not a data value but the mean of the two values closest to the middle.
The median requires one split of the sorted data, but any number of splits can be made. The quartiles split the distribution into four equal parts: 1st , 2nd , 3rd , and 4th quartiles.
One of three values that together separate the values of an ordered distribution into four equal parts.
The median is the second quartile. The first quartile cuts off the bottom 25% of the distribution, and the third quartile cuts off the bottom 75%.
The more variable the data values in the distribution, the wider the box. From the quartiles, we can define an order statistic that describes the variability of the data values because its value is the width of the box in the box plot.
The middle 50% of the data values in a distribution is the difference between the first and third quartiles.
We have the median, counterpart to the mean, and the IQR, counterpart to the standard deviation.
Statistics such as the mean and standard deviation are called parametric statistics, which are compared to order statistics, sometimes referred to as non-parametric statistics.
The Box
Construct the box in the box plot from the three quartiles. The length of the box is the interquartile range or IQR, the range of data that contains the middle 50% of all data values, with a line through the median and perpendicular lines extending out from the edges. Figure 13 shows the box plot based on these quartiles.
Skewness
For a symmetric distribution, such as the normal curve, the right side is a mirror image of the left side. The median will split the box evenly down the middle. However, many distributions lack symmetry, with one tail larger than the other.
An indicator of a lack of distribution symmetry.
An asymmetric distribution has a longer tail on either the right or left side of a histogram or density plot. In a box plot, this asymmetry appears when the side with the longer tail occupies a greater distance from the median to the end of the box than the opposite side. Figure Figure 14 illustrates a right-skewed distribution. The upper half of the values, those above the median, extends across a much larger range than the lower half.
The implications of skew for decision-making can be substantial. For example, consider a decision about setting a reorder time. If the visualizations in Figure 14 represent shipping time, it is apparent that some shipments arrive much later than others. The average shipping time may therefore be an insufficient indicator of the expected lag between placing an order and receiving the shipment. When setting a reorder point based on inventory levels, the cost of running out of inventory because of an unusually late shipment must also be considered.
Outliers
As revealed by the box plot visualizations, lines extend from either side of the box. To understand the meaning of those lines, we first need to understand the concept of an outlier. Some values in a distribution may be anomalies. An essential task of virtually every data analysis is to identify these anomalies and then understand why they occurred.
A value far from most of the remaining data values.
Outliers should always be identified for each variable of interest, as their values may reflect a simple coding error. Or an outlier could represent the outcome of a process different from the one that generated most of the other values in the distribution.
A summary statistic should summarize data sampled from a single population, that is, data generated by a single process.
Data are meaningful only when the same process generates all the values for a variable. Mixing values from multiple processes into a single variable may yield accurate numerical results, but does not reflect any single real-world process. The process that generates an outlier can differ from the process that generated the remaining values.
- An analysis of salaries may include some part-time employees as well as full-time employees. Any resulting conclusions, such as the average salary, may then fail to represent either group well.
- The mean of nine salaries and last year’s annual GNP can be correctly calculated, but this mean has no meaningful interpretation.
The lack of meaning that results from mixing data from different processes stems from the absence of a single concept or entity shared by all the data values. Identify outliers generated by a different population, analyze them separately, and then generalize results only to the population of interest.
The box plot is the classic visualization for detecting outliers in a single distribution of values for a variable. In a box plot, outliers are defined relative to the size of the box. Two categories of outliers are commonly defined. As you read these definitions, remember that the IQR is the width of the box.
The well-known and influential statistician John Tukey developed the concept of the box plot in the 1970’s and provided the following definitions of an outlier.
Values between 1.5 IQR’s and 3.0 IQR’s The following from the edges of the box.
Values more than 3.0 IQR’s from either box’s edge.
The purpose of the lines extending from the central box in a box plot is to help identify outliers.
A line from a box’s edge that extends to the most extreme data value that is not a potential outlier, that is, within 1.5 IQRs of the edges, with a perpendicular line segment at the end of the line.
Figure 15 shows the box plot of Salary from the Employee data set.
X(Salary, type="box")
type: Parameter that specifies the type of plot. To call X() to create only a box plot, set that parameter to box. The following function call produces the same box plot as shown in Figure 15. The analysis variable needs to be continuous.
Note: The lessR box plot analysis displays the potential outliers with a relatively dark red dot. Identify the actual outliers by a stronger red dot.
Notice the red point displayed past the right whisker of the box plot in Figure 15. That point is a potential outlier, the largest salary in the employee data set, almost $145,500. That value is more than 1.5 times the box width beyond the third quartile. The data analyst conducting this salary analysis needs to investigate why that potential outlier exists.
Interpretation. Does some characteristic distinguish the outlier employee from the remaining 36 employees? For example, is this employee at a higher management level than the others? If so, that observation should be excluded from the analysis, and the resulting conclusions should be generalized only to the 36 employees at the same level of the company hierarchy. Whether to remove that value cannot be determined from the data alone. The analyst must instead use substantive knowledge about how salaries are defined within the company and how management levels are structured.
Subjective judgment and business domain knowledge cannot be separated from data analytic conclusions. Many pure mathematicians or statisticians prefer to live in a world of their geeked-out mathematical objective reality. However, the reality in which we are working and doing data analysis requires domain knowledge and judgment. When you encounter an outlier, always use your subjective judgment to evaluate the reason for the outlier:
- a coding error
- from a different process as the other data values
- a weird result from the same process
Statistics alone cannot answer that essential question.
Here is an example of outliers that result from the same process. To illustrate, generate 1000 normal curve simulated data values, as shown in Figure 16.
The outliers at the two extremes of the normal curve provide an excellent example of how outliers can occur even when all data values are generated by the same process. Weird things happen occasionally. For example, if you flip a fair coin 10 times, getting nine heads has probability 0.00978, or about 1%. But 1% is not 0%, so expect nine heads about 1 out of every 100 outcomes.
All the data in Figure 16 are properly sampled from the same normal distribution. The detected outliers therefore reflect unusual but expected outcomes from that same process. These unusual outcomes are expected in a large sample such as these 1000 simulated normally distributed data values.
VBS Plot
John Tukey developed the box plot just before computer technology became dominant and widely accessible for data visualization. As such, it is a pre-computer technology designed for paper-and-pencil drawings. However, the box plot still performs quite well, even with our modern, sophisticated computer visualizations. That said, we have a modern update that enhances the box plot.
Violin Plot
Replace the top and bottom flat edges of the box plot with a density plot, which often, but not necessarily, resembles a violin.
X(Salary, type="violin")
type: Parameter to specify the type of plot. Set to "violin" to create the violin plot.
Figure 17 shows an example of a violin plot.
Those flat edges of the box in the box plot do not provide any information. Replacing those edges with the corresponding density plots provides more information about the locations of the data values for the continuous variable of interest.
Strip Plot
The scatterplot provides a direct visualization of the data values and corresponding sample size.
A one dimensional scatter plot.
Each data value is plotted as a point according to its location along the corresponding value axis.
X(Salary, type="strip")
type: Parameter to specify the type of plot. Set to "strip" to create the 1-D scatterplot of a continuous variable, a strip plot.
What happens when points that share the same value overlap each other? Plot overlapping points either with partial transparency or with random perturbations to distinguish between multiple points of the same value.
Random displacement of the horizontal and/or vertical coordinates of a plotted point.
More overlap requires more jitter. If possible, apply only vertical jitter to preserve the plotted point’s value. If there are too many points with the same value, add horizontal jitter as well.
The scatterplot in Figure 18 displays slight jitter. To emphasize, Figure 19 displays more vertical jitter for the same data values.
X(Salary, type="strip", jitter_y=7)
jitter_y: Parameter to specify the extent of vertical jitter for a scatterplot to provide for custom values.
jitter_x: Parameter to specify the extent of horizontal jitter for a scatterplot to provide for custom values.
By default, the plotted values are jittered vertically approximately as needed, which can then be customized. The values of the jitter parameters used to create the scatterplot are listed as part of the output at the R console.
Parameter values (can be manually set)
-------------------------------------------------------
size: 0.61 size of plotted points
out_size: 0.82 size of plotted outlier points
jitter_y: 0.45 random vertical movement of points
jitter_x: 0.00 random horizontal movement of points
bw: 9529.04 set bandwidth higher for smoother edges
Again, there is no one correct answer for the amount of jitter to apply to the points in a scatterplot. The amount to apply depends on the analyst’s subjective perception, usually just enough to separate the points.
Integrating Violin, Box, and Strip Plots
With modern computer technology, we can go further and combine the violin plot, the box plot, and the one-dimensional scatterplot into a single comprehensive visualization. Contributing to the development of this type of visualization [@vbs], I call this combination the VBS plot for Violin/Box/Scatterplot, illustrated in Figure 20.
X(Salary, type="vbs")
type: Parameter to specify the type of plot. Set to "vbs" to create the integrated violin/box/strip plot.
The VBS plot simultaneously retains the advantages of all three of its constituent component visualizations. We have a density plot to estimate the smooth distribution of the continuous variable over the range of date values, a box plot with the median and first and 3rd quartiles, along with whiskers to identify outliers, and a direct visualization of the data points themselves.
Stratification
A common and essential task of data visualization is comparing a distribution across different groups.
Divide a data table into subgroups, or strata, based on the levels of one or more categorical variables to analyze how the distribution of a single variable, or the relationship between two variables, varies across those groups.
In data visualization, a stratification variable can modify the display in one of two ways.
Two arguments support stratification within and across panels:
by: overlay or nest visualizations of the same geometric form for different groups within a panel, distinguished by color, symbol, or position.facet: partition visualizations of the same geometric form across separate panels for different groups, in the style of Trellis displays (Cleveland 1993), more recently generally referred to as facet displays (Wickham 2016).
by Stratification Variable
Figure 21 (a) shows plotting separately for the two genders in the data.
X(Salary, by=Gender, type="vbs")
X(Salary, by=Gender, type="histogram")
by: Specify the value of a stratification variable, a categorical variable for which its levels will plot on the same panel.
Figure 21 shows two different distribution visualizations with a by variable stratifying according to the two levels of Gender in the data.
The VBS plot stratifies according to the plotted points and the histogram stratifies according to the plotted bars. However, the common characteristic for these two visualizations is that the distributions for each gender are plotted on the same panel.
facet Stratification Variable
In the employee data set, we have the salaries for 37 employees and their gender. In this data set, only male and female values surfaced. Now compare salaries across these two groups.
Plot the same visualization across panels, with each panel displaying the data values for a different group.
The Trellis plot, also known as a facet plot, visualizes stratification by the levels of at least one categorical variable. Each level of a categorical variable is represented by a facet.
The Trellis plot is so named because it resembles a garden trellis, also called a lattice. The garden trellis is a structure that supports climbing plants and vines. The trellis provides a vertical surface for plants to grow up. Figure 22 shows four examples.
Figure 23 shows a simple Trellis data visualization, a separate histogram for Men and Women. The categorical variable that specifies the levels for which to plot separate groups is called a stratification variable. For Figure 23, there are two levels of the stratification variable Gender present in these data: Male and Female.
X(Salary, facet=Gender)
X(Salary, facet=Gender, type="vbs")
facet: Specify the value of a stratification variable, a categorical variable for which its levels define the different panels or facets of the Trellis plot.
Figure 24 shows the equivalent VBS faceted visualization.
Interpretation. We conclude that a higher percentage of women occupy the lower salary ranges. For example, the median Salary for women is considerably smaller than that for men. More of the data points are shifted downward for women, even though the highest salary is for a woman. The pattern of these visualizations encourages further investigation to establish whether there are significant differences in Salary and, perhaps, a pattern of discrimination.
Current discrimination, however, cannot be supported by these data. It is potentially suggestive, but alternative explanations exist. Perhaps women in the past did not apply for jobs in equal numbers as men. Perhaps in the past, the maternity leave and support were insufficient and have since been upgraded. Or, perhaps management used to be dominated by cigar smoking, whiskey drinking chauvinists who did not hire women, but now all the old guys are retired, dead, or in jail.
Two Facet Variables
Further data subsetting occurs when we provide two categorical variables for stratification. In that situation, groups are formed for every combination of levels for the two variables. For the Employee data, we have two levels of Gender and three levels of health Plan, numbered 1, 2, and 3. There is a total of six groups to plot, as shown in Figure 25.
X(Salary, facet=c(Gender, Plan), type="vbs")
facet: Stratification variable for plotting across separate panels. Here, as a vector of two stratification variables.
Both Types of Stratification
In this example, there is a single stratification variable for plotting within each panel, Gender, and two facet stratification variables for plotting across panels, JobSat and Plan.
X(Salary, by=Gender, facet=c(JobSat, Plan), type="vbs")
by: Parameter to specify the stratification variable for plotting on the same panel.
facet: Parameter to specify the stratification variable for plotting across separate panels. Here, as a vector of two stratification variables.
Ideally, this example would be based on a larger sample, but it does show how multiple stratification variables can serve as the basis for a data visualization.