Definition
Visualization of any kind depends on visual aesthetics.
Visual aesthetic: a visual property such as shape, size, color, or location.
What is so intriguing about data visualizations?
Data visualization: the transformation of data into visual aesthetics.
To create a visualization, physically express a visual aesthetic, such as with a mark drawn on paper or a corresponding digital representation that selectively lights pixels on a computer monitor. Visualizations are typically drawn on two-dimensional surfaces, such as paper or a monitor.
Data visualization is an essential activity of virtually every analysis project. Why? As the opening paragraphs excerpted from the first edition of my book on data visualization explain (2nd edition due Fall of 2026), the history and survival of our species are rooted deeply in visualizing our world. In contrast, data is a recent invention.
We are wonderfully competent visual processors. As we move about our daily lives, we do what our ancestors did so well in the distant past: effortlessly process a panorama of shapes and images that surround us, patterns immersed in the landscape of our visual world. Modern life, however, delivers a new invention for us to consider: data. With data, we search for patterns such as normality, trends, and relationships, and for exceptions to these patterns. Examine rows and columns of data to uncover this information? Our distant ancestors never encountered tables of data, so our brains never adapted to evaluate data directly.
The solution? We return to our familiar form: visual images. To visualize data, we use computer technology to transform rows and columns into visual objects. We perceive these objects according to their visual aesthetics:
- as different shapes (points, lines, bars)
- displayed at varying sizes (areas, lengths)
- with a palette of different colors (hue, saturation, brightness, transparency)
- which occupy different positions (by axes that define a coordinate system)
Visual aesthetics focus our perception on emergent patterns inherent in the data. We literally see the distributions and relationships.
For example, the lengths of the bars on a bar graph could indicate the number of employees in each department. In a scatterplot where each point represents an employee, different colors of the points could identify the employee's department. Or the shapes of the points could vary by department.
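As a rough sketch of this kind of mapping, the following Python snippet counts employees per department to obtain bar lengths and assigns each department a color. The employee records, the department names, and the color palette are all hypothetical, chosen only for illustration, and the "bar graph" is a crude text rendering rather than the output of a plotting package.

```python
from collections import Counter

# Hypothetical employee records: (name, department)
employees = [
    ("Ana", "Sales"),
    ("Ben", "Sales"),
    ("Cho", "Finance"),
    ("Dev", "Marketing"),
    ("Eli", "Sales"),
    ("Fay", "Finance"),
]

# Size aesthetic: bar length = number of employees in each department
bar_lengths = Counter(dept for _, dept in employees)

# Color aesthetic: one color per department (arbitrary palette)
palette = ["steelblue", "darkorange", "seagreen"]
departments = sorted(bar_lengths)
color_of = dict(zip(departments, palette))

# A crude text bar graph: the length of each bar encodes the count
for dept in departments:
    print(f"{dept:<10} {'#' * bar_lengths[dept]}  ({color_of[dept]})")
```

The same data variable, department, drives two different aesthetics here: the length of each bar and its color.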
Coordinate System
Visualizations are typically, though not necessarily, drawn within a coordinate system defined by one or more axes. Each axis represents a variable. Each plotted coordinate corresponds to a data value on each axis.
One or more data values together determine the position of a point or object within a space.
A 1-dimensional space is a line, with each point determined by a single coordinate on a single axis. A 2-dimensional space is a plane, with each point determined by two coordinates on two axes. A 3-dimensional space is a volume, with each point determined by three coordinates on three axes.
The generic names for the axes are well accepted: x for the first axis, typically horizontal; y for the second axis, usually vertical; and, if there is a third axis, z. Projecting three dimensions onto a two-dimensional surface is problematic, and directly plotting more than three dimensions is not possible.
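One way to see why showing three dimensions on a flat surface is problematic: a simple orthographic projection, which drops the z coordinate, sends distinct three-dimensional points to the same two-dimensional location. The sketch below is only an illustration with made-up points, and the function name `project_to_plane` is hypothetical. A perspective projection similarly collapses all points along each line of sight.

```python
def project_to_plane(point):
    """Orthographic projection: keep x and y, discard z."""
    x, y, z = point
    return (x, y)

# Two hypothetical points that differ only in depth (z)
near = (3.5, 2.0, 1.0)
far = (3.5, 2.0, 9.0)

print(project_to_plane(near))
print(project_to_plane(far))

# Distinct points in space, identical positions on the flat surface
print(near != far and project_to_plane(near) == project_to_plane(far))
```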
One Variable
Some visualizations consist of only a single variable, so they have only a single axis to define the coordinates.
Figure 1 shows a single axis, the x-axis, for plotting the values of a single variable. The variable is generically named x, but within a specific data analysis it has a specific name, such as the categorical variable Gender or the continuous variable Salary.
The axis can represent a continuum for a continuous variable or discrete values for a categorical variable. If continuous, the values can represent any real number, including positive, zero, or negative. Or, the values can be restricted, for example, to just zero and positive numbers.
Figure 2 plots a single data value for variable x. Because x is continuous, its axis is the real number line, and any value along the axis is possible. In this visualization, assume the following values for the aesthetics:
- shape: point
- size: fixed diameter
- color: shade of violet red
- position: 3.5 coordinate relative to the axis
With only one dimension, one axis, the vertical height of the point above that axis is not relevant to the visualization. Choose a height that is visually pleasing and keeps the plot relatively compact. One potential height is zero, plotting the point directly on the x-axis, though that positioning can obscure the labels on the axis.
The variable plotted on the x-axis in Figure 2 represents the variable of interest. Suppose that variable is the number of years since the initiation of an investment. Then, the plotted point represents an investment with a duration of 3.5 years.
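The four aesthetics of this single point, and the step that turns its data value into a location on a drawing surface, can be sketched in Python. Everything beyond the 3.5-year value is an assumption made for illustration: the `Mark` class, the `to_pixel` function, the specific color name, the 8-pixel diameter, and an axis that spans 0 to 7 years across pixels 50 to 450.

```python
from dataclasses import dataclass

@dataclass
class Mark:
    shape: str       # e.g., "point"
    size: float      # diameter in pixels
    color: str       # any color name
    position: float  # data value on the x-axis

def to_pixel(value, data_min, data_max, pixel_min, pixel_max):
    """Linearly map a data value to a pixel location along an axis."""
    fraction = (value - data_min) / (data_max - data_min)
    return pixel_min + fraction * (pixel_max - pixel_min)

# An investment with a duration of 3.5 years
mark = Mark(shape="point", size=8, color="mediumvioletred", position=3.5)

# Assumed layout: the axis spans 0 to 7 years across pixels 50 to 450
x_pixel = to_pixel(mark.position, data_min=0, data_max=7,
                   pixel_min=50, pixel_max=450)
print(x_pixel)  # 250.0, the midpoint of the assumed axis
```

Only the position aesthetic depends on the data here. The shape, size, and color are fixed choices that would apply equally to any other plotted value.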
Two Variables
Most visualizations consist of two variables, plotted in two dimensions. The two axes in Figure 3 represent continuous dimensions on which the data values of generic variables x and y are plotted with potential \(+\) and \(-\) values for each variable. Each plotted point, identified as <x,y>, represents a set of paired data values for x and y for the unit of analysis, such as an employee or a geographic unit.
The axes for the continuous variables in Figure 3 accommodate both positive and negative values. Axes can also be defined for categorical variables or continuous variables with just positive (or negative) values.
As an example of plotting a point in a two-dimensional coordinate system, consider two variables that describe a person’s financial investment in a company’s stock: the number of years since the purchase of the stock on the x-axis and the investment’s percent return on the y-axis. Negative values on the y-axis are available because negative returns (losses) are possible, whereas the x-axis is restricted to non-negative values.
Suppose a 3.5-year investment earned a percent return of 2. Figure 4 shows a visualization of a single coordinate defined by the paired data values of variables x and y, Time and Percent_Return, here with respective values of 3.5 and 2. We have the same shape, size, and color aesthetics as in Figure 2, but now specify position with the paired coordinate <3.5,2>.
In practice, more than one point would likely be plotted. The investor could visualize investment success over many investments by plotting each investment as a separate point.
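Plotting several investments amounts to converting each pair of data values into a location on the drawing surface. The following Python sketch does this for three investments, of which only <3.5,2> comes from the text; the other two pairs, the axis ranges, the 400 by 300 pixel plotting area, and the helper name `make_scale` are assumptions for illustration.

```python
def make_scale(data_min, data_max, pixel_min, pixel_max):
    """Return a function that linearly maps data values to pixels."""
    def scale(value):
        fraction = (value - data_min) / (data_max - data_min)
        return pixel_min + fraction * (pixel_max - pixel_min)
    return scale

# Hypothetical investments: (Time in years, Percent_Return)
investments = [(1.0, -4.0), (3.5, 2.0), (5.0, 6.0)]

# Assumed layout: a 400 x 300 pixel plotting area.
# Time is restricted to non-negative values; Percent_Return may be negative.
x_scale = make_scale(0, 8, 0, 400)
# Screen y-coordinates grow downward, so the pixel range is reversed.
y_scale = make_scale(-8, 8, 300, 0)

pixels = [(x_scale(t), y_scale(r)) for t, r in investments]
for (t, r), (px, py) in zip(investments, pixels):
    print(f"<{t},{r}> -> pixel ({px}, {py})")
```

With these assumed ranges, a return of zero falls at the vertical midpoint of the plotting area, so losses appear below that line and gains above it.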