Plot ordered data values collected over time in one of two ways that correspond to how the values are labeled.
A run chart is meaningful for numerical data values ordered sequentially, such as by time. Plot a run chart of a single variable by specifying the name of the \(x\) variable, typically the first variable listed, as .Index, which instructs Plot() to generate the index values. The name begins with a \(.\) so as not to be confused with an existing variable. Analogous to a time series visualization, the run chart plots the data values sequentially, but without dates or times. An analysis of the runs is also provided.
Illustrate with the lessR Employee data.
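The listing that follows is the kind of summary Read() prints when it loads a data set. A minimal sketch of such a call, assuming the lessR package is installed and that the built-in data set is requested by the name "Employee" (the name is an assumption here); d is the default data frame name referred to later in this text.

```r
library(lessR)

# Read the Employee data included with lessR into the default data frame d
# (the data set name "Employee" is assumed, not stated in the text)
d <- Read("Employee")
```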
##
## >>> Suggestions
## Recommended binary format for data files: feather
## Create with Write(d, "your_file", format="feather")
## More details about your data, Enter: details() for d, or details(name)
##
## Data Types
## ------------------------------------------------------------
## character: Non-numeric data values
## integer: Numeric data values, integers only
## double: Numeric data values with decimal digits
## ------------------------------------------------------------
##
## Variable Missing Unique
## Name Type Values Values Values First and last values
## ------------------------------------------------------------------------------------------
## 1 Years integer 36 1 16 7 NA 7 ... 1 2 10
## 2 Gender character 37 0 2 M M W ... W W M
## 3 Dept character 36 1 5 ADMN SALE FINC ... MKTG SALE FINC
## 4 Salary double 37 0 37 53788.26 94494.58 ... 56508.32 57562.36
## 5 JobSat character 35 2 3 med low high ... high low high
## 6 Plan integer 37 0 3 1 1 2 ... 2 2 1
## 7 Pre integer 37 0 27 82 62 90 ... 83 59 80
## 8 Post integer 37 0 22 92 74 86 ... 90 71 87
## ------------------------------------------------------------------------------------------
The data values for the variable Salary are not actually collected over time, but for illustration, here create a run chart of Salary as if the data were collected over time. Plot() creates the indices, the sequence of integers from 1 to the number of data values. Only the data values are specified. Invoke the run parameter to instruct Plot() to plot the data in sequential order as a run chart.
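A minimal sketch of a run chart call, assuming lessR is installed and that the Employee data are read by the name "Employee". The Suggestions output in this section writes the call with .Index as the \(x\) variable, so the sketch follows that form. Whether an installed version of lessR instead expects the run parameter is something to confirm with ?Plot.

```r
library(lessR)
d <- Read("Employee")   # data set name assumed

# Run chart of Salary: .Index asks Plot() to generate the indices 1, 2, ..., n
Plot(.Index, Salary)
```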
## >>> Suggestions
## Plot(.Index, Salary, lwd=0, fill="on") # just area
## Plot(.Index, Salary, fill="on") # default color fill
##
## n miss mean sd min mdn max
## 37 0 73795.557 21799.533 46124.970 69547.600 134419.230
##
## ------------
## Run Analysis
## ------------
##
## Total number of runs: 21
## Total number of values that do not equal the median: 36
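The two counts reported in the run analysis can be sketched in base R. This is an illustration on a small made-up vector under one common definition of a run (a maximal streak of consecutive values on the same side of the median, with values equal to the median set aside). The rule lessR applies is not stated in this text, so treat the definition as an assumption.

```r
# Small made-up vector, for illustration only
x <- c(3, 7, 5, 9, 1, 8, 2, 6, 5)

index <- seq_along(x)            # the indices a run chart places on the x-axis
med <- median(x)

# Values equal to the median belong to neither side, so set them aside
side <- sign(x[x != med] - med)  # -1 below the median, +1 above

n_not_median <- length(side)
n_runs <- length(rle(side)$lengths)  # each run is one streak on one side

cat("Total number of runs:", n_runs, "\n")
cat("Total number of values that do not equal the median:", n_not_median, "\n")
```

With an odd number of distinct values, as with the 37 salaries, the median is itself one of the data values, which is consistent with 36 values reported as not equal to the median.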
The default run chart displays the plotted points in a small size with connecting line segments. Change the size of the points with the parameter size, here set to zero to remove the points entirely. Fill the area under the line segments with the parameter area_fill, here set to the default, on, though it can be set to any color. Remove the center line with the parameter center_line set to off.
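A sketch of the customized call, using the three parameter settings just described and the .Index form echoed in the Suggestions output. It assumes lessR is installed and that the data set name is "Employee".

```r
library(lessR)
d <- Read("Employee")   # data set name assumed

# No points, default area fill, no center line
Plot(.Index, Salary, size=0, area_fill="on", center_line="off")
```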
## >>> Suggestions
## Plot(.Index, Salary, size=0, area_fill="on", center_line="off", lwd=0, fill="on") # just area
##
## n miss mean sd min mdn max
## 37 0 73795.557 21799.533 46124.970 69547.600 134419.230
##
## ------------
## Run Analysis
## ------------
##
## Total number of runs: 21
## Total number of values that do not equal the median: 36
Plot() can plot a time series from three different data structures: a long-form data frame, a wide-form data frame, and an R time series object.
A time series plotted from a data frame requires two variables: the \(x\)-variable, the time or date, and the \(y\)-variable, the measured value to plot at each time.
Plotting a variable of type Date as the \(x\)-variable in a scatterplot automatically creates a time series visualization. Plot() draws the connecting line segments, without the points at each time period (size=0). To add the area fill, for lessR set the area_fill parameter to TRUE for the default color from the current color theme. Or, set to a specific color.
Read time series data of stock Price for three companies: Apple, IBM, and Intel. The data table is in long form, part of lessR.
##
## >>> Suggestions
## Recommended binary format for data files: feather
## Create with Write(d, "your_file", format="feather")
## More details about your data, Enter: details() for d, or details(name)
##
## Data Types
## ------------------------------------------------------------
## character: Non-numeric data values
## Date: Date with year, month and day
## double: Numeric data values with decimal digits
## ------------------------------------------------------------
##
## Variable Missing Unique
## Name Type Values Values Values First and last values
## ------------------------------------------------------------------------------------------
## 1 Month Date 1419 0 473 1985-01-01 ... 2024-05-01
## 2 Company character 1419 0 3 Apple Apple ... Intel Intel
## 3 Price double 1419 0 1400 0.100055 0.085392 ... 30.346739 30.555891
## 4 Volume double 1419 0 1419 6366416000 ... 229147100
## ------------------------------------------------------------------------------------------
## Month Company Price Volume
## 1 1985-01-01 Apple 0.100055 6366416000
## 2 1985-02-01 Apple 0.085392 4733388800
## 3 1985-03-01 Apple 0.076335 4615587200
## 4 1985-04-01 Apple 0.073316 2868028800
## 5 1985-05-01 Apple 0.059947 4639129600
Activate a time series plot by setting the \(x\)-variable to a variable of R type Date, which is true of the variable Month in this data set. A time series can also be plotted by passing a time series object, created with the base R function ts(), as the variable to plot. Plot() will attempt to convert to a variable of type Date either a four-digit integer year, sequentially organized in increments of 1 year, or a date expressed as digits with / or - delimiters, such as 08/18/2024. However, this conversion is not without some ambiguity, so if it is incorrect, then specify the correct date format with the parameter time_format.
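The ambiguity is easy to see in base R, where the same digits parse to different dates under different formats. The sketch uses only as.Date() with strptime-style format codes. Whether time_format accepts exactly these codes is an assumption to confirm with ?Plot.

```r
# The same digits parse to different dates under different formats
s <- "08/03/2024"
as_mdy <- as.Date(s, format = "%m/%d/%Y")   # month first: 3 August 2024
as_dmy <- as.Date(s, format = "%d/%m/%Y")   # day first: 8 March 2024

# A day value above 12 removes the ambiguity: only month-first parses
s2 <- "08/18/2024"
ok  <- as.Date(s2, format = "%m/%d/%Y")
bad <- as.Date(s2, format = "%d/%m/%Y")     # NA, there is no month 18
```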
Here, plot the stock price over time just for Apple, with the two variables Month and Price, stock price. The parameter filter specifies the rows of the input data frame retained for the analysis.
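A sketch of the filtered call, with the filter expression taken from the output shown in this section. It assumes lessR is installed and that the stock data are read by the name "StockPrice" (the name is an assumption here).

```r
library(lessR)
d <- Read("StockPrice")   # data set name assumed

# Keep only the Apple rows, then plot Price against the Date variable Month
Plot(Month, Price, filter=(Company == "Apple"))
```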
##
## filter: (Company == "Apple")
## -----
## Rows of data before filtering: 1419
## Rows of data after filtering: 473
## >>> Suggestions
## Plot(Month, Price, time_ahead=4) # exponential smoothing forecast 4 time units
## Plot(Month, Price, time_unit="years") # aggregate time by yearly sum
## Plot(Month, Price, time_unit="years", time_agg="mean") # aggregate by yearly mean
Add the default fill color by setting the area_fill parameter to "on". Can also specify a custom color.
##
## filter: (Company == "Apple")
## -----
## Rows of data before filtering: 1419
## Rows of data after filtering: 473
## >>> Suggestions
## Plot(Month, Price, time_ahead=4) # exponential smoothing forecast 4 time units
## Plot(Month, Price, time_unit="years") # aggregate time by yearly sum
## Plot(Month, Price, time_unit="years", time_agg="mean") # aggregate by yearly mean
With the by parameter, plot all three companies on the same panel.
## >>> Suggestions
## Plot(Month, Price, time_ahead=4) # exponential smoothing forecast 4 time units
## Plot(Month, Price, time_unit="years") # aggregate time by yearly sum
## Plot(Month, Price, time_unit="years", time_agg="mean") # aggregate by yearly mean
Stack the plots by setting the parameter stack to TRUE.
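A sketch of the two multi-company calls described so far, first overlaid on one panel and then stacked. It assumes lessR is installed and that the stock data are read by the name "StockPrice". The parameter names by and stack come from the text, and passing the grouping variable as by=Company is an assumption about the call form.

```r
library(lessR)
d <- Read("StockPrice")   # data set name assumed

# One line per company on the same panel
Plot(Month, Price, by=Company)

# The same three series, stacked
Plot(Month, Price, by=Company, stack=TRUE)
```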
## >>> Suggestions
## Plot(Month, Price, time_ahead=4) # exponential smoothing forecast 4 time units
## Plot(Month, Price, time_unit="years") # aggregate time by yearly sum
## Plot(Month, Price, time_unit="years", time_agg="mean") # aggregate by yearly mean
With the facet1 parameter, plot each of the three companies on a different panel, a Trellis plot.
## [Trellis (facet) graphics from Deepayan Sarkar's lattice package]
Do the Trellis plot with some color. Learn more about customizing visualizations in the vignette utilities.
style(sub_theme="black", window_fill="gray10")
Plot(Month, Price, facet1=Company, n_col=1, fill="darkred", color="red", trans=.55)
## [Trellis (facet) graphics from Deepayan Sarkar's lattice package]
Return to the default style, then turn off text output for subsequent analyses.
## theme set to "colors"
Set a baseline of 25 with the area_origin parameter for a Trellis plot, with default fill color.
Change the aspect ratio with the aspect parameter, defined as height divided by width.
Stack the three time series, fill under each curve with a version of the lessR sequential range "emeralds".
Plot() also reads wide-format data. No wide-form time data are available with lessR, so first convert the long form as read to the wide form. In the wide form, the three companies each have their own column of data, with one row for each date. Use the lessR function reshape_wide() to do the conversion.
## Month Apple IBM Intel
## 1 1985-01-01 0.100055 11.71846 0.359457
## 2 1985-02-01 0.085392 11.51437 0.327310
## 3 1985-03-01 0.076335 11.00154 0.324388
## 4 1985-04-01 0.073316 10.95822 0.321466
## 5 1985-05-01 0.059947 11.14231 0.308315
## 6 1985-06-01 0.062103 10.81489 0.303932
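To make the long-to-wide idea concrete, here is an independent base R sketch with reshape(), applied to the first two months of the listing above. It is not the lessR reshape_wide() call, whose arguments are not shown in this text. The object names long and wide are introduced only for this illustration.

```r
# First two months for each company, in long form (values from the listing)
long <- data.frame(
  Month   = rep(as.Date(c("1985-01-01", "1985-02-01")), times = 3),
  Company = rep(c("Apple", "IBM", "Intel"), each = 2),
  Price   = c(0.100055, 0.085392, 11.71846, 11.51437, 0.359457, 0.327310)
)

# One row per Month, one Price column per Company
wide <- reshape(long, idvar = "Month", timevar = "Company",
                v.names = "Price", direction = "wide")

# reshape() names the new columns Price.Apple, and so on; keep the company name
names(wide) <- sub("^Price\\.", "", names(wide))
rownames(wide) <- NULL
wide
```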
Now the analysis, which repeats a previous analysis, but with wide-form data. Because the data frame is not the default d, explicitly indicate the data frame with the data parameter.
Can also plot directly from an R time series object, created with the base R ts() function.
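A minimal sketch of building such an object in base R from the first six Apple prices in the wide-form listing above. The start date and monthly frequency follow the dates in that listing, and the name a_ts is introduced only for this illustration.

```r
# First six monthly Apple prices, January to June 1985
apple <- c(0.100055, 0.085392, 0.076335, 0.073316, 0.059947, 0.062103)

# Monthly series (12 observations per year) starting in January 1985
a_ts <- ts(apple, start = c(1985, 1), frequency = 12)
a_ts

# With lessR attached, the ts object itself is the variable passed to Plot(),
# for example Plot(a_ts)
```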
With style() many themes can be selected, such as "lightbronze", "dodgerblue", "darkred", and "gray" for gray scale. When neither theme nor any other parameter value is specified, style() returns to the default theme, "colors".
Here, aggregate monthly data to plot by quarter. Many time units are available, including "years", "quarters", "months", "weeks", and "days", and smaller units as well. Accomplish the aggregation with the parameter time_unit (which employs functions from the xts package).
# Simulate 42 months of data beginning January 2013
n.q <- 42
month <- seq(as.Date("2013/1/1"), length.out=n.q, by="months")
x <- rnorm(n.q, 100, 15)
# Aggregate the monthly values to quarters
Plot(month, x, time_unit="quarters")
The stock price for each company is reported monthly in the data table. To aggregate to quarters, use the time_unit parameter. The default aggregation is the sum over the specified time period. That value is appropriate if we are, for example, aggregating monthly sales over each quarter, but for stock Price we want the mean stock price over the specified time period. So, set the parameter time_agg to "mean".
## >>> Warning
## The Date variable is not sorted in Increasing Order.
##
## For a data frame named d, enter:
## d <- sort_by(d, Month)
## Maybe you have a by variable with repeating Date values?
## Enter ?sort_by for more information and examples.
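The difference between aggregating by sum and by mean can be checked by hand in base R. The sketch below uses six made-up monthly values and tapply(). It illustrates the two summaries only and is not how lessR performs the aggregation internally.

```r
# Six made-up monthly values covering two quarters, for illustration only
month <- seq(as.Date("2013-01-01"), by = "month", length.out = 6)
price <- c(10, 20, 30, 40, 50, 60)

# Label each month with its year and quarter, such as "2013 Q1"
qtr <- paste(format(month, "%Y"), quarters(month))

q_sum  <- tapply(price, qtr, sum)    # suits totals such as sales
q_mean <- tapply(price, qtr, mean)   # suits levels such as a stock price

q_sum
q_mean
```

The quarterly sums are three times the quarterly means here, which is why the sum overstates the level of a price series.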
Or, aggregate by years, here for all three companies.
Plot() implements exponential smoothing forecasting with accompanying visualization. New parameters include time_ahead for the number of time units to forecast into the future, and time_format to provide a specific format for the date variable if not detected correctly by default. Control aspects of the exponential smoothing estimation and prediction algorithms with parameters es_level (alpha), es_trend (beta), es_seasons (gamma), es_type for additive or multiplicative seasonality, and es_PIlevel for the level of the prediction intervals.
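The roles of alpha, beta, gamma, the seasonality type, and the prediction interval level can be seen in a base R sketch with HoltWinters() from the stats package. This is a stand-in for illustration: the text does not say which routine lessR calls internally, and AirPassengers is a data set shipped with R, not part of this analysis.

```r
# AirPassengers: a monthly series shipped with base R (illustrative data only)
fit <- HoltWinters(AirPassengers, seasonal = "multiplicative")

# The three estimated smoothing parameters: level, trend, seasonality
params <- c(alpha = unname(fit$alpha),
            beta  = unname(fit$beta),
            gamma = unname(fit$gamma))
params

# Forecast 24 months ahead with 95% prediction intervals
fc <- predict(fit, n.ahead = 24, prediction.interval = TRUE, level = 0.95)
head(fc)
```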
To forecast Apple’s stock price, focus here on the last several years of the data, from Row 400 through Row 473, the last row of data for Apple. In this example, forecast ahead 24 months.
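A sketch of the forecast call, combining the row subset just described with the time_ahead parameter in the form shown in the Suggestions output. It assumes lessR is installed, that the stock data are read by the name "StockPrice", and that any packages lessR needs for forecasting are available.

```r
library(lessR)
d <- Read("StockPrice")   # data set name assumed

# Keep Rows 400 through 473, the most recent Apple observations
d <- d[400:473, ]

# Forecast 24 time units (months here) beyond the end of the data
Plot(Month, Price, time_ahead=24)
```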
It is better to understand the characteristics of the time series visually before trying to forecast. To aid this understanding, consider the decomposition of the time series into the seasonal and trend components with the lessR function STL(), which relies upon the base R function stl() but provides more information and allows more flexible input.
##
## Total variance of Price: 2728.807
## Proportion of variance for components:
## seasonality --- 0.006
## trend --------- 0.936
## remainder ----- 0.026
##
## Range of Price: 157.8754
## Range of components:
## seasonality --- 11.568
## trend --------- 136.006
## remainder ----- 48.350
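Summaries like those in the listing can be approximated from the base R stl() function on which STL() relies. The sketch uses nottem, a monthly series shipped with R, and assumes that each proportion is the variance of the component divided by the variance of the series. The exact computation in STL() is not shown in this text.

```r
# nottem: monthly temperatures shipped with base R (illustrative data only)
fit <- stl(nottem, s.window = "periodic")
comp <- fit$time.series          # columns: seasonal, trend, remainder

# Share of the series variance for each component (assumed measure)
prop <- apply(comp, 2, var) / var(as.numeric(nottem))
round(prop, 3)

# Range of each component
rng <- apply(comp, 2, function(v) diff(range(v)))
round(rng, 3)
```

Because the components are not perfectly uncorrelated, proportions computed this way need not sum to exactly 1.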
The annotations in the following visualization consist of the text field “iPhone” with an arrowhead that points to the time that the first iPhone became available. With lessR, list each component of the annotation as a vector for add. Any value listed that is not a keyword such as “rect” or “arrow” is interpreted as a text field. Then, in order of their occurrence in the vector for add, list the needed coordinates for the objects. To place the text field “iPhone” requires one coordinate, <x1,y1>. To place an “arrow” requires two coordinates, <x1,y1> and <x2,y2>. For example, the second element of the y1 vector is the y1 value for the “arrow”. The text field does not require a second coordinate, so specify x2 and y2 as single elements instead of vectors.
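The coordinate bookkeeping can be sketched with base R graphics, where text() needs one coordinate and arrows() needs two. This is not the lessR call: the series and the y positions are made up, and only the layout of the add, x1, y1, x2, and y2 values mirrors the description above. The date 2007-06-29 is when the first iPhone went on sale.

```r
# Made-up monthly series, for illustration only
month <- seq(as.Date("2006-01-01"), by = "month", length.out = 36)
price <- seq(2, 9, length.out = 36)

# Values in the order the objects are listed: text field first, then arrow
add <- c("iPhone", "arrow")
x1 <- as.Date(c("2007-06-29", "2007-06-29"))  # text position, arrow start
y1 <- c(8, 7.5)
x2 <- as.Date("2007-06-29")                   # arrow end only, so one value
y2 <- 5.5

pdf(NULL)                                     # draw without creating a file
plot(month, price, type = "l")
text(x1[1], y1[1], labels = add[1])           # one coordinate for the text
arrows(x1[2], y1[2], x2, y2, length = 0.1)    # two coordinates for the arrow
invisible(dev.off())
```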
Use the base R help() function to view the full manual for Plot(). Simply enter a question mark followed by the name of the function.
?Plot
More on Scatterplots, Time Series plots, and other visualizations from lessR and other packages such as ggplot2 at:
Gerbing, D., R Visualizations: Derive Meaning from Data, CRC Press, May, 2020, ISBN 978-1138599635.