Plots of Data Collected over Time

David Gerbing

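Plot ordered data values collected over time in one of two ways, corresponding to how the values are labeled: as a run chart, labeled only by their position in the sequence, or as a time series, labeled by dates or times.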

Run Chart

A run chart is meaningful for sequentially ordered numerical data values, such as values ordered by time. To plot a run chart of a single variable, specify the name of the \(x\) variable, typically the first variable listed, as .Index, and Plot() generates the index values. The name begins with a period so as not to be confused with the name of an existing variable. Analogous to a time series visualization, the run chart plots the data values sequentially, but without dates or times. Plot() also provides an analysis of the runs.

Illustrate with the lessR Employee data.

d <- Read("Employee")
## 
## >>> Suggestions
## Recommended binary format for data files: feather
##   Create with Write(d, "your_file", format="feather")
## More details about your data, Enter:  details()  for d, or  details(name)
## 
## Data Types
## ------------------------------------------------------------
## character: Non-numeric data values
## integer: Numeric data values, integers only
## double: Numeric data values with decimal digits
## ------------------------------------------------------------
## 
##     Variable                  Missing  Unique 
##         Name     Type  Values  Values  Values   First and last values
## ------------------------------------------------------------------------------------------
##  1     Years   integer     36       1      16   7  NA  7 ... 1  2  10
##  2    Gender character     37       0       2   M  M  W ... W  W  M
##  3      Dept character     36       1       5   ADMN  SALE  FINC ... MKTG  SALE  FINC
##  4    Salary    double     37       0      37   53788.26  94494.58 ... 56508.32  57562.36
##  5    JobSat character     35       2       3   med  low  high ... high  low  high
##  6      Plan   integer     37       0       3   1  1  2 ... 2  2  1
##  7       Pre   integer     37       0      27   82  62  90 ... 83  59  80
##  8      Post   integer     37       0      22   92  74  86 ... 90  71  87
## ------------------------------------------------------------------------------------------

The data values for the variable Salary were not actually collected over time, but for illustration, create a run chart of Salary as if they were. Plot() creates the indices, the sequence of integers from 1 to the number of data values, so only the data values need to be specified. Setting the \(x\) variable to .Index instructs Plot() to plot the data in sequential order as a run chart.

Plot(.Index, Salary)

## >>> Suggestions
## Plot(.Index, Salary, lwd=0, fill="on")  # just area
## Plot(.Index, Salary, fill="on")  # default color fill 
## 
##      n   miss         mean           sd          min          mdn          max 
##      37      0    73795.557    21799.533    46124.970    69547.600   134419.230 
## 
## ------------
## Run Analysis
## ------------
## 
## Total number of runs: 21 
## Total number of values that do not equal the median: 36
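The run analysis reports the number of runs and the number of values that do not equal the median. The following base R sketch shows one way to compute such counts. It assumes, as the wording of the output suggests, that a run is a maximal sequence of consecutive values on the same side of the median and that values equal to the median are dropped. The function count_runs() is hypothetical and not part of lessR, and lessR's own algorithm may differ. For Salary, the 37 values are all unique and their count is odd, so exactly one value equals the median, which leaves the 36 reported.

```r
# Hypothetical helper, not part of lessR: count runs about the median.
# A run is a maximal sequence of consecutive values on the same side of
# the median; values exactly equal to the median are dropped first.
count_runs <- function(y) {
  y <- y[!is.na(y)]                  # ignore missing values
  side <- sign(y - median(y))        # +1 above, -1 below, 0 at the median
  side <- side[side != 0]            # drop values equal to the median
  list(n_runs = length(rle(side)$lengths),   # rle() groups consecutive equal signs
       n_not_median = length(side))
}

# Small made-up series: median is 4, the signs alternate, so 6 runs
count_runs(c(1, 5, 2, 6, 3, 7, 4))
```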

The default run chart displays the plotted points in a small size with connecting line segments. Change the size of the points with the parameter size, here set to zero to remove the points entirely. Fill the area under the line segments with the parameter area_fill, here set to "on" for the default color, though any color can be specified. Remove the center line by setting the parameter center_line to "off".

Plot(.Index, Salary, size=0, area_fill="on", center_line="off")

## >>> Suggestions
## Plot(.Index, Salary, size=0, area_fill="on", center_line="off", lwd=0, fill="on")  # just area 
## 
##      n   miss         mean           sd          min          mdn          max 
##      37      0    73795.557    21799.533    46124.970    69547.600   134419.230 
## 
## ------------
## Run Analysis
## ------------
## 
## Total number of runs: 21 
## Total number of values that do not equal the median: 36
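For comparison, here is a base R sketch of the same kind of run chart. The index values are just seq_along() of the data, polygon() fills the area under the line, and abline() adds a horizontal center line. The function run_chart() is hypothetical. Placing the center line at the median and filling down to the data minimum are assumptions of this sketch, not a description of how Plot() draws the chart.

```r
# Hypothetical helper, not part of lessR. Assumes y has no missing values.
run_chart <- function(y, fill = "gray80", center_line = TRUE) {
  idx <- seq_along(y)                       # the index values 1, 2, ..., n
  plot(idx, y, type = "l", xlab = "Index", ylab = "")
  if (!is.null(fill)) {                     # fill = NULL leaves the area empty
    base <- min(y)                          # assumption: fill down to the minimum
    polygon(c(idx[1], idx, idx[length(idx)]), c(base, y, base),
            col = fill, border = NA)
    lines(idx, y)                           # redraw the line on top of the fill
  }
  if (center_line) abline(h = median(y), lty = "dashed")  # assumption: median
  invisible(idx)
}

set.seed(1)
run_chart(rnorm(20, mean = 100, sd = 15))   # simulated data
```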

Time Series Chart

Plot() can plot a time series from three different data structures: a data frame in long format, a data frame in wide format, and an R time series object.

A time series requires two variables: the \(x\)-variable, the time or date, and the \(y\)-variable, the measured value at each time.

Plotting a variable of type Date as the \(x\)-variable in a scatterplot automatically creates a time series visualization. By default, Plot() draws the connecting line segments without the points at each time period (size=0). To add the area fill, set the area_fill parameter to "on" for the default color from the current color theme, or set it to a specific color.

Long-Format Data

Read time series data of monthly stock prices for three companies: Apple, IBM, and Intel. The data table, included with lessR, is in long form.

d <- Read("StockPrice")
## 
## >>> Suggestions
## Recommended binary format for data files: feather
##   Create with Write(d, "your_file", format="feather")
## More details about your data, Enter:  details()  for d, or  details(name)
## 
## Data Types
## ------------------------------------------------------------
## character: Non-numeric data values
## Date: Date with year, month and day
## double: Numeric data values with decimal digits
## ------------------------------------------------------------
## 
##     Variable                  Missing  Unique 
##         Name     Type  Values  Values  Values   First and last values
## ------------------------------------------------------------------------------------------
##  1     Month      Date   1419       0     473   1985-01-01 ... 2024-05-01
##  2   Company character   1419       0       3   Apple  Apple ... Intel  Intel
##  3     Price    double   1419       0    1400   0.100055  0.085392 ... 30.346739  30.555891
##  4    Volume    double   1419       0    1419   6366416000 ... 229147100
## ------------------------------------------------------------------------------------------
d[1:5,]
##        Month Company    Price     Volume
## 1 1985-01-01   Apple 0.100055 6366416000
## 2 1985-02-01   Apple 0.085392 4733388800
## 3 1985-03-01   Apple 0.076335 4615587200
## 4 1985-04-01   Apple 0.073316 2868028800
## 5 1985-05-01   Apple 0.059947 4639129600

Activate a time series plot by setting the \(x\)-variable to a variable of R type Date, as is the variable Month in this data set. Plot() can also plot a time series when passed a time series object, created with the base R function ts(), as the variable to plot. Plot() attempts to convert to type Date either a four-digit integer year that increases sequentially in increments of 1, or a date expressed as digits with / or - delimiters, such as 08/18/2024. However, this conversion can be ambiguous, so if the result is incorrect, specify the correct date format with the parameter time_format.
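The conversions described above can be illustrated with base R's as.Date(), whose format argument uses % codes such as %m (month), %d (day), and %Y (four-digit year). This sketch shows only why an explicit format resolves the ambiguity. It is not how Plot() performs its conversion, and it does not show the syntax that time_format accepts.

```r
# Base R illustration only; not how Plot() performs the conversion.

# Four-digit integer years, in increments of 1, as Dates on January 1
years <- 2019:2023
as.Date(paste0(years, "-01-01"))

# Read "08/18/2024" as month/day/year: %m month, %d day, %Y four-digit year
as.Date("08/18/2024", format = "%m/%d/%Y")   # 2024-08-18

# Ambiguous digits: the same string gives two different dates
as.Date("08/10/2024", format = "%m/%d/%Y")   # 2024-08-10 (August 10)
as.Date("08/10/2024", format = "%d/%m/%Y")   # 2024-10-08 (October 8)
```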

Here, plot the stock price over time just for Apple, with the two variables Month and Price (the stock price). The parameter filter specifies the rows of the input data frame retained for the analysis.

Plot(Month, Price, filter=(Company=="Apple"))
## 
## filter:  (Company == "Apple") 
## -----
## Rows of data before filtering:  1419 
## Rows of data after filtering:   473

## >>> Suggestions
## Plot(Month, Price, time_ahead=4)  # exponential smoothing forecast 4 time units
## Plot(Month, Price, time_unit="years")  # aggregate time by yearly sum
## Plot(Month, Price, time_unit="years", time_agg="mean")  # aggregate by yearly mean
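Conceptually, filter keeps the rows for which a logical condition is TRUE. The base R sketch below does the same with logical indexing on a small made-up data frame, and it mirrors the before-and-after row counts that Plot() reports. The name stock is hypothetical, and its values do not come from the StockPrice data.

```r
# Made-up long-form data, not the StockPrice data
stock <- data.frame(
  Month   = as.Date(c("2024-01-01", "2024-01-01", "2024-02-01", "2024-02-01")),
  Company = c("Apple", "IBM", "Apple", "IBM"),
  Price   = c(10, 20, 11, 21)
)

# Keep the rows where the condition is TRUE, as filter=(Company=="Apple") does
apple <- stock[stock$Company == "Apple", ]   # or subset(stock, Company == "Apple")
nrow(stock)   # rows before filtering: 4
nrow(apple)   # rows after filtering: 2
```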

Add the default fill color by setting the area_fill parameter to "on", or specify a custom color.

Plot(Month, Price, filter=(Company=="Apple"), area_fill="on")
## 
## filter:  (Company == "Apple") 
## -----
## Rows of data before filtering:  1419 
## Rows of data after filtering:   473

## >>> Suggestions
## Plot(Month, Price, time_ahead=4)  # exponential smoothing forecast 4 time units
## Plot(Month, Price, time_unit="years")  # aggregate time by yearly sum
## Plot(Month, Price, time_unit="years", time_agg="mean")  # aggregate by yearly mean

With the by parameter, plot all three companies on the same panel.

Plot(Month, Price, by=Company)

## >>> Suggestions
## Plot(Month, Price, time_ahead=4)  # exponential smoothing forecast 4 time units
## Plot(Month, Price, time_unit="years")  # aggregate time by yearly sum
## Plot(Month, Price, time_unit="years", time_agg="mean")  # aggregate by yearly mean

Stack the plots by setting the parameter stack to TRUE.

Plot(Month, Price, by=Company, stack=TRUE)

## >>> Suggestions
## Plot(Month, Price, time_ahead=4)  # exponential smoothing forecast 4 time units
## Plot(Month, Price, time_unit="years")  # aggregate time by yearly sum
## Plot(Month, Price, time_unit="years", time_agg="mean")  # aggregate by yearly mean

With the facet1 parameter, plot the three companies on different panels, a Trellis plot.

Plot(Month, Price, facet1=Company)
## [Trellis (facet) graphics from Deepayan Sarkar's lattice package]

Draw the Trellis plot with some color. Learn more about customizing visualizations in the vignette utilities.

style(sub_theme="black", window_fill="gray10")
Plot(Month, Price, facet1=Company, n_col=1, fill="darkred", color="red", trans=.55)
## [Trellis (facet) graphics from Deepayan Sarkar's lattice package]

Return to the default style, then turn off text output for subsequent analyses.

style()
## theme set to "colors"
style(quiet=TRUE)

For a Trellis plot with the default fill color, set the baseline of the filled area to 25 with the area_origin parameter.

Plot(Month, Price, facet1=Company, xlab="", area_fill="on", area_origin=25)

Change the aspect ratio with the aspect parameter, defined as height divided by width.

Plot(Month, Price, facet1=Company, aspect=.5, area_fill="slategray3")

Stack the three time series and fill under each curve with colors from the lessR sequential range "emeralds".

Plot(Month, Price, by=Company, trans=0.4, stack=TRUE, area_fill="emeralds")

Wide-Format Data

Plot() also reads wide-format data. Because lessR provides no wide-form time series data, first convert the long-form data as read into wide form. In wide form, each of the three companies has its own column of data, with one row for each date. Use the lessR function reshape_wide() to do the conversion.

dw <- reshape_wide(d, group="Company", response="Price", ID="Month")
head(dw)
##        Month    Apple      IBM    Intel
## 1 1985-01-01 0.100055 11.71846 0.359457
## 2 1985-02-01 0.085392 11.51437 0.327310
## 3 1985-03-01 0.076335 11.00154 0.324388
## 4 1985-04-01 0.073316 10.95822 0.321466
## 5 1985-05-01 0.059947 11.14231 0.308315
## 6 1985-06-01 0.062103 10.81489 0.303932
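For comparison, base R's stats::reshape() performs the same long-to-wide conversion. This sketch uses a small data frame built from the first two months of prices shown in the output above. reshape() names the new columns Price.Apple and so on, so the sketch strips the prefix to match the column names of dw.

```r
# First two months of Price for each company, from the output above
long <- data.frame(
  Month   = rep(as.Date(c("1985-01-01", "1985-02-01")), times = 3),
  Company = rep(c("Apple", "IBM", "Intel"), each = 2),
  Price   = c(0.100055, 0.085392, 11.71846, 11.51437, 0.359457, 0.327310)
)

# One row per Month, one Price column per Company
wide <- reshape(long, idvar = "Month", timevar = "Company", direction = "wide")
names(wide) <- sub("^Price\\.", "", names(wide))   # Price.Apple -> Apple, etc.
wide
```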

Now repeat a previous analysis, but with the wide-form data. Because the data frame is not the default d, specify it explicitly with the data parameter.

Plot(Month, c(Intel, Apple, IBM), area_fill="blues", stack=TRUE, trans=.4, data=dw)

Time-Series Object Data

Plot() can also plot directly from an R time series object, created with the base R ts() function.

a1.ts <- ts(dw$Apple, frequency=12, start=c(1985, 1))  # monthly data, beginning January 1985
Plot(a1.ts)
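A ts object stores its time index implicitly, through a start time and a frequency. The sketch below builds a monthly series from the first six Apple prices listed in the output above. Here start = c(1985, 1) means January 1985, the first Month in the StockPrice data, and frequency = 12 means twelve observations per year.

```r
# First six monthly Apple prices, from the output above
prices <- c(0.100055, 0.085392, 0.076335, 0.073316, 0.059947, 0.062103)
a.ts <- ts(prices, frequency = 12, start = c(1985, 1))

start(a.ts)       # 1985 1  (year, month)
end(a.ts)         # 1985 6
frequency(a.ts)   # 12
```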

Many themes can be selected with style(), such as "lightbronze", "dodgerblue", "darkred", and "gray" for gray scale. Calling style() with no theme or other parameter value returns to the default theme, "colors".

style()

Aggregation by Time

Here, aggregate monthly data to plot by quarter. Many time units are available, including "years", "quarters", "months", "weeks", and "days", as well as smaller units. Accomplish the aggregation with the parameter time_unit, which employs functions from the xts package.

n.q <- 42  # number of months of simulated data
month <- seq(as.Date("2013/1/1"), length.out=n.q, by="months")
x <- rnorm(n.q, 100, 15)  # simulated monthly values
Plot(month, x, time_unit="quarters")
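The aggregation itself is easy to sketch in base R: label each date with its year and quarter, then sum within each label. lessR relies on xts for this step, so the base R code below, which uses quarters() and tapply() on six made-up monthly values, only illustrates the idea.

```r
# Six months of made-up values
mon <- seq(as.Date("2013-01-01"), length.out = 6, by = "months")
val <- c(1, 2, 3, 4, 5, 6)

# Label each month by year and quarter, e.g. "2013 Q1"
qtr <- paste(format(mon, "%Y"), quarters(mon))

# Sum within each quarter, the default aggregation described in the text
tapply(val, qtr, sum)   # 2013 Q1: 6, 2013 Q2: 15
```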

The stock price for each company is reported monthly in the data table. To aggregate to quarters, use the time_unit parameter. The default aggregation is the sum over the specified time period. That value is appropriate if we are, for example, aggregating monthly sales over each quarter, but for stock Price we want the mean stock price over the specified time period. So, set the parameter time_agg to "mean".

d <- Read("StockPrice", quiet=TRUE)
Plot(Month, Price, time_unit="quarters", time_agg="mean")
## >>> Warning
## The  Date  variable is not sorted in Increasing Order.
## 
## For a data frame named d, enter: 
##     d <- sort_by(d, Month)
## Maybe you have a  by  variable with repeating Date values?
## Enter  ?sort_by  for more information and examples.

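The warning appears because the data include all three companies, so each Month value repeats. Instead, specify the by parameter to aggregate separately for each company, here by years.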

Plot(Month, Price, by=Company, time_unit="years", time_agg="mean")
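When the data form a monthly ts object, the base R aggregate() method for time series reduces the frequency directly, with FUN choosing the summary. The sketch below uses a made-up series to contrast the sum, suited to quantities such as sales, with the mean, suited to a price such as Price.

```r
m.ts <- ts(1:24, frequency = 12, start = c(2013, 1))   # two years of made-up monthly values

aggregate(m.ts, nfrequency = 4, FUN = sum)    # quarterly sums: 6, 15, 24, ...
aggregate(m.ts, nfrequency = 1, FUN = mean)   # yearly means: 6.5, 18.5
```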

Forecast

Plot() implements exponential smoothing forecasting with an accompanying visualization. The relevant parameters include time_ahead, the number of time units to forecast into the future, and time_format, which provides a specific format for the date variable if it is not detected correctly by default. Control aspects of the exponential smoothing estimation and prediction algorithms with the parameters es_level (alpha), es_trend (beta), es_seasons (gamma), es_type for additive or multiplicative seasonality, and es_PIlevel for the level of the prediction intervals.

To forecast Apple’s stock price, focus here on the last several years of the data, Rows 400 through 473, where Row 473 is the last row of data for Apple. In this example, forecast ahead 24 months.

d <- d[400:473,]
Plot(Month, Price, time_unit="months", time_agg="mean", time_ahead=24)
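Base R provides Holt-Winters exponential smoothing in HoltWinters(). It estimates the three smoothing parameters that the es_ parameters refer to: alpha (level), beta (trend), and gamma (seasons). Its seasonal argument chooses additive or multiplicative seasonality, and the level argument of predict() sets the coverage of the prediction intervals. The sketch uses a simulated monthly series and makes no claim to reproduce what Plot() computes internally.

```r
# Simulated monthly series with trend and seasonality (made-up data)
set.seed(2)
n <- 72
y <- ts(100 + 0.5 * (1:n) + 10 * sin(2 * pi * (1:n) / 12) + rnorm(n, sd = 3),
        frequency = 12, start = c(2018, 1))

# Estimate alpha (level), beta (trend), gamma (seasons) by minimizing squared error
fit <- HoltWinters(y, seasonal = "additive")   # or seasonal = "multiplicative"
c(fit$alpha, fit$beta, fit$gamma)

# Forecast 24 months ahead with 95% prediction intervals
fc <- predict(fit, n.ahead = 24, prediction.interval = TRUE, level = 0.95)
head(fc)   # columns fit, upr, lwr
```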

It is better to understand the characteristics of a time series visually before trying to forecast. To aid this understanding, decompose the time series into its seasonal and trend components with the lessR function STL(), which relies upon the base R function stl() but provides more information and allows more flexible input.

STL(Month, Price)

## 
## Total variance of Price: 2728.807
## Proportion of variance for components:
##   seasonality --- 0.006 
##   trend --------- 0.936 
##   remainder ----- 0.026 
## 
## Range of Price: 157.8754
## Range of components:
##   seasonality --- 11.568 
##   trend --------- 136.006 
##   remainder ----- 48.350
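The same kind of decomposition can be sketched with base R's stl(). The code below computes each component's variance divided by the variance of the series, on the assumption that this is how the reported proportions are defined. The proportions in the output above sum to 0.968 rather than 1, which is consistent with components that are not uncorrelated. The series is simulated, and lessR's exact computation may differ.

```r
set.seed(3)
n <- 72
y <- ts(50 + 0.8 * (1:n) + 5 * sin(2 * pi * (1:n) / 12) + rnorm(n, sd = 2),
        frequency = 12, start = c(2018, 1))     # simulated monthly series

dcmp <- stl(y, s.window = "periodic")           # seasonal + trend + remainder
comp <- dcmp$time.series                        # one column per component

# Share of the total variance for each component (need not sum to 1)
round(apply(comp, 2, var) / var(y), 3)

# Range (max - min) of each component
apply(comp, 2, function(v) diff(range(v)))
```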

Annotation

The annotations in the following visualization consist of the text field "iPhone" and an arrowhead that points to the time the first iPhone became available. With lessR, list each component of the annotation in a vector passed to the add parameter. Any value listed that is not a keyword, such as "rect" or "arrow", is interpreted as a text field. Then, in the order in which the objects occur in the add vector, list the coordinates each object needs. Placing the text field "iPhone" requires one coordinate, <x1,y1>. Placing an "arrow" requires two coordinates, <x1,y1> and <x2,y2>. For example, the second element of the y1 vector is the y1 value for the "arrow". The text field does not require a second coordinate, so specify x2 and y2 as single values instead of vectors.

d <- Read("StockPrice", quiet=TRUE)  # restore all rows; d was reduced to Rows 400-473 above
x <- as.Date("2007-06-01")
Plot(Month, Price, filter=(Company == "Apple"), area_fill="on",
     add=c("iPhone", "arrow"),
     x1=c(x,x), y1=c(100,90), x2=x, y2=30)
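To make the coordinate bookkeeping concrete, the base R sketch below draws the analogous annotation with text() and arrows() on made-up prices. The first elements of x1 and y1 place the text. The second elements, together with x2 and y2, define the arrow. The sketch only mirrors the description of add, x1, y1, x2, and y2 given above and does not call lessR.

```r
mon <- seq(as.Date("2005-01-01"), as.Date("2010-12-01"), by = "months")
price <- seq(1, 40, length.out = length(mon))           # made-up values

x <- as.Date("2007-06-01")
x1 <- c(x, x); y1 <- c(100, 90)                          # one coordinate per object
x2 <- x; y2 <- 30                                        # only the arrow needs these

plot(mon, price, type = "l", ylim = c(0, 110))
text(x1[1], y1[1], "iPhone")                             # first object: text field
arrows(x1[2], y1[2], x2, y2, length = 0.1)               # second object: arrow
```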

Full Manual

Use the base R help() function to view the full manual for Plot(). Simply enter a question mark followed by the name of the function.

?Plot

More

More on scatterplots, time series plots, and other visualizations from lessR and other packages such as ggplot2 is available in:

Gerbing, D., R Visualizations: Derive Meaning from Data, CRC Press, May, 2020, ISBN 978-1138599635.