Plot ordered data values collected over time in one of two ways that correspond to how the values are labeled.
Create a run chart from two variables: the \(x\)-variable, the Index values, is the sequence of consecutive integers from 1 to the number of data values, and the \(y\)-variable specifies the corresponding data values to be plotted. A run chart is meaningful for numerical data values that are sequentially ordered, such as by time. To plot a run chart of a single variable, have XY() generate the Index values by specifying the name of the \(x\)-variable, typically the first variable listed, as .Index. The name begins with a period, ., to avoid confusion with an existing variable. Analogous to a time series visualization, the run chart plots the data values sequentially, but without dates or times. An analysis of the runs can also be obtained.
d <- Read("Employee")
The data values for the variable Salary were not collected over time, but for illustration, here create a run chart of Salary as if they were. XY() creates the indices, the sequence of integers from 1 to the number of data values, when the \(x\)-variable is specified as .Index, which instructs XY() to plot the data in sequential order as a run chart.
XY(.Index, Salary)
## [Interactive chart from the Plotly R package (Sievert, 2020)]
The default run chart displays the plotted points in a small size with connecting line segments. Change the size of the points with the parameter size, here set to zero to remove the points entirely. Fill the area under the line segments with the parameter ts_area_fill, here set to "on" for the default fill color, though any color can be specified. Remove the center line by setting the parameter center_line to "off". Display the analysis of the runs by setting the parameter show_runs to TRUE.
XY(.Index, Salary, size=0, ts_area_fill="on", center_line="off", show_runs=TRUE)
## [Interactive chart from the Plotly R package (Sievert, 2020)]
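As a rough illustration of what the Index values and an analysis of runs involve, the following base R sketch builds the indices with seq_along() and counts the runs above and below the median with rle(). The vector y is hypothetical data, and counting runs about the median is only one common definition. It is not necessarily the computation that XY() performs.

```r
# Hypothetical data values in their collected order (not the Employee data)
y <- c(53, 61, 58, 47, 45, 66, 70, 72, 49, 51)

# The Index: consecutive integers from 1 to the number of data values
index <- seq_along(y)

# Classify each value as above (TRUE) or below (FALSE) the median,
# dropping any values that equal the median
center <- median(y)
side <- (y > center)[y != center]

# A run is a maximal stretch of consecutive values on the same side
runs <- rle(side)
n_runs <- length(runs$lengths)

# Base R analog of the run chart: points connected by line segments
plot(index, y, type = "b", xlab = "Index", ylab = "y")
abline(h = center, lty = "dashed")   # center line at the median
```

Here runs$lengths holds the length of each run, so a long run on one side of the center line stands out as a large entry.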
Create a time series from two variables, the \(x\)-variable as a date, and the \(y\)-variable that specifies the corresponding measured values to be plotted. Internally, the \(x\)-variable is stored as a variable of R type Date. Traditionally, the Date variable is created prior to calling XY(), such as with the R function as.Date(). However, XY() can also implicitly convert a character string numeric date value such as "08/18/2024" to a formal Date data value, as explained below. Plotting a variable of type Date as the \(x\)-variable in a scatterplot automatically creates a time series visualization with each pair of adjacent points connected by a line segment.
R does not automatically convert character string dates to a formal date variable, likely because the conversion is inherently ambiguous. A numerical date can be specified in multiple ways, so inferring the date format from the data values usually works but is not guaranteed. XY() will attempt the conversion for you. To facilitate verification, XY() displays its inferred date format. To override the inference, explicitly specify the date format with the parameter ts_format. View the list of all possible date formats by entering ?strptime to display the corresponding help file.
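The ambiguity can be seen directly with the base R function as.Date(). This minimal sketch reads the same character string under two different format specifications and obtains two different dates. The strings are illustrative only, and the sketch does not show how XY() infers a format.

```r
# The same character string read under two different formats
s <- "08/03/2024"

as_mdy <- as.Date(s, format = "%m/%d/%Y")  # read as August 3, 2024
as_dmy <- as.Date(s, format = "%d/%m/%Y")  # read as March 8, 2024

# A day value above 12 removes the ambiguity: only one format parses
ok  <- as.Date("08/18/2024", format = "%m/%d/%Y")  # August 18, 2024
bad <- as.Date("08/18/2024", format = "%d/%m/%Y")  # NA, there is no month 18

format(c(as_mdy, as_dmy, ok), "%Y-%m-%d")
```

When every day value in a column is 12 or less, both formats parse without error, which is why checking the displayed format matters.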
Following are the five different possibilities of numerical date values read as character strings that XY() will convert to actual dates, an R variable of type Date. Expressing the year with all four digits is recommended though not usually necessary. The following examples use the hyphen, -, as the delimiter, but the forward slash, /, and the period, ., can also be used.
Enter the dates for daily data values in one of the above five numerical formats. Or, use the ts_format parameter to specify a format for non-numerical date values that can include the name of the corresponding month (as per ?strptime).
Enter the dates for weekly data values as with daily data values except that consecutive dates are one week apart. For example, each date can represent the first day of the corresponding week, such as "04/03/2024" for the fourth day of March 2024, which begins the first full week in March 2024, followed by "11/03/2024" for the 11th day of the same month.
Two possibilities exist for entering monthly data. Enter the dates for monthly data values as either:
"01/03/2024" for the first day of March 2024, followed by "1/04/2024" for the first day of April 2024.
"2024 Jan" followed by "2024 Feb".
Two possibilities exist for entering quarterly data. Enter the dates for quarterly data values as either:
"01/01/2024" for the first day of the first quarter followed by "01/04/2024" for the first day of the second quarter.
"2024 Q1" followed by "2024 Q2".
Two possibilities exist for entering annual data. Enter the dates for annual data values as either:
"01/01/2024" for the first day of the year for 2024, followed by "01/01/2025" for the first day of the following year.
"2024" followed by "2025".
Read time series data of stock Price for three companies: Apple, IBM, and Intel. The data table is part of lessR, called StockPrice.
d <- Read("StockPrice")
d[1:5,]
## Month Company Price Volume
## 1 1985-01-01 Apple 0.09559606 175302400
## 23 1985-02-01 Apple 0.09816800 137737600
## 42 1985-03-01 Apple 0.08530761 247430400
## 63 1985-04-01 Apple 0.07416180 114060800
## 84 1985-05-01 Apple 0.07158987 57344000
Activate a time series plot by setting the \(x\)-variable to a variable of R type Date, which is true of the variable Month in this data set. A time series can also be plotted by passing a time series object, created with the base R function ts(), as the variable to plot. XY() will attempt to convert to a variable of type Date either a four-digit integer year that increases sequentially in increments of one year or a date expressed as digits with / or - delimiters, such as "08/18/2024". However, this conversion is not without some ambiguity, so if the result is incorrect, specify the correct date format with the parameter ts_format.
Here, plot the stock price over time just for Apple, with the two variables Month and Price, stock price. The parameter filter specifies the rows of the input data frame retained for the analysis.
XY(Month, Price, filter=(Company=="Apple"))
## [Interactive chart from the Plotly R package (Sievert, 2020)]
Add the default fill color by setting the ts_area_fill parameter to "on". A custom color can also be specified.
XY(Month, Price, filter=(Company=="Apple"), ts_area_fill="on")
## [Interactive chart from the Plotly R package (Sievert, 2020)]
With the by parameter, plot all three companies on the same panel.
XY(Month, Price, by=Company)
## [Interactive chart from the Plotly R package (Sievert, 2020)]
Stack the plots by setting the parameter ts_stack to TRUE.
XY(Month, Price, by=Company, ts_stack=TRUE)
## [Interactive chart from the Plotly R package (Sievert, 2020)]
With the facet parameter, plot the three companies on different panels, a Trellis plot.
XY(Month, Price, facet=Company)
## [Interactive chart from the Plotly R package (Sievert, 2020)]
Do the Trellis plot with some color. Learn more about customizing visualizations in the vignette utilities.
style(sub_theme="black", window_fill="gray10")
XY(Month, Price, facet=Company, n_col=1, fill="darkred", color="red", trans=.55)
## [Interactive chart from the Plotly R package (Sievert, 2020)]
Return to the default style and turn off text output for subsequent analyses.
style()
style(quiet=TRUE)
Set a baseline of 25 with the ts_area_split parameter for a Trellis plot, with default fill color.
XY(Month, Price, facet=Company, xlab="", ts_area_fill="on", ts_area_split=25)
## [Interactive chart from the Plotly R package (Sievert, 2020)]
Change the aspect ratio with the aspect parameter defined as height divided by width.
XY(Month, Price, facet=Company, aspect=.5, ts_area_fill="slategray3")
## [Interactive chart from the Plotly R package (Sievert, 2020)]
Stack the three time series, fill under each curve with a version of the lessR sequential range "emeralds".
XY(Month, Price, by=Company, trans=0.4, ts_stack=TRUE, ts_area_fill="emeralds")
## [Interactive chart from the Plotly R package (Sievert, 2020)]
This example aggregates monthly stock price data by quarter. Available time units are "years", "quarters", "months", "weeks", and "days". Also included is the special time unit "days7", explained below in the Forecasting section. Aggregate with the parameter ts_unit (which relies upon functions from the xts package). Generate and display the first several months of the monthly data.
The stock price for each company is reported monthly in the data table. To aggregate to quarters, use the ts_unit parameter. The default aggregation is the sum over the specified time period. That value is appropriate if we are, for example, aggregating monthly sales over each quarter, but for stock Price we want the mean stock price over the specified time period. Set the parameter ts_agg to "mean". Focus just on the Apple stock price data with the filter parameter.
d <- Read("StockPrice", quiet=TRUE)
XY(Month, Price, ts_unit="quarters", ts_agg="mean", filter=(Company=="Apple"))
## [Interactive chart from the Plotly R package (Sievert, 2020)]
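To make the difference between the two aggregation choices concrete, the following base R sketch aggregates hypothetical monthly values to quarters, once as the sum and once as the mean. The vectors month and value are invented for illustration, and tapply() stands in for whatever aggregation XY() performs internally through ts_unit and ts_agg.

```r
# Hypothetical monthly values for one year (not the StockPrice data)
month <- seq(as.Date("2024-01-01"), by = "month", length.out = 12)
value <- c(10, 12, 14, 20, 22, 24, 30, 32, 34, 40, 42, 44)

# Label each month with its year and quarter, such as "2024 Q1"
quarter <- paste(format(month, "%Y"), quarters(month))

q_sum  <- tapply(value, quarter, sum)    # suits totals such as sales
q_mean <- tapply(value, quarter, mean)   # suits levels such as a price
```

The sum triples the scale of the quarterly values relative to the monthly values, whereas the mean keeps them on the original scale, which is why the mean is the sensible choice for a price.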
Or, aggregate by years to smooth the curve further, with a darkred line.
XY(Month, Price, ts_unit="years", ts_agg="mean", filter=(Company=="Apple"),
   color="darkred")
## [Interactive chart from the Plotly R package (Sievert, 2020)]
In the following example, aggregate by years for each of the three companies.
XY(Month, Price, by=Company, ts_unit="years", ts_agg="mean")
## [Interactive chart from the Plotly R package (Sievert, 2020)]
XY() implements time series forecasting based on trend and seasonality with either exponential smoothing or regression analysis, including the accompanying visualization. Time series parameters include:
ts_method: Set at "es" for exponential smoothing, the default, or "lm" for linear model regression.
ts_unit: The time unit, either as the natural occurring interval between dates in the data, the default, or aggregated to a wider time interval.
ts_ahead: The number of time units to forecast into the future.
ts_agg: If aggregating the time unit, aggregate as the "sum", the default, or as the "mean".
ts_PIlevel: The confidence level of the prediction intervals, with 0.95 the default.
ts_seasons: Set to FALSE to turn off seasonality in the estimated model.
ts_trend: Set to FALSE to turn off trend in the estimated model.
ts_error: Type of error term.
ts_format: Provides a specific format for the date variable if not detected correctly by default.
ts_source: Default is time series forecasting from "fable" and related packages, or specify "classic".
To forecast Apple’s stock price, focus on the last several years of the data, beginning with Row 400 through Row 473, the last row of data for Apple. In this example, forecast ahead 24 months. Here, rely upon the default exponential smoothing estimation procedure from the fpp3 ecosystem package fable.
d <- d[400:473,]
XY(Month, Price, ts_unit="months", ts_agg="mean", ts_ahead=24)
## [Interactive chart from the Plotly R package (Sievert, 2020)]
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
##
## Attaching package: 'tsibble'
## The following objects are masked from 'package:base':
##
## intersect, setdiff, union
## Loading required package: fabletools
##
## Attaching package: 'fabletools'
## The following object is masked from 'package:lessR':
##
## model
Next, implement the classic Holt-Winters exponential smoothing method from the base R function HoltWinters().
XY(Month, Price, ts_unit="months", ts_agg="mean", ts_ahead=24,
   ts_source="classic")
## [Interactive chart from the Plotly R package (Sievert, 2020)]
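For reference, here is a minimal sketch of calling the base R function HoltWinters() directly, outside of XY(). It uses AirPassengers, a monthly series shipped with R, as a stand-in for the stock price data, and it makes no claim about the specific settings that XY() passes to HoltWinters().

```r
# AirPassengers: a monthly series shipped with R, used only as a stand-in
fit <- HoltWinters(AirPassengers)

# Forecast 24 months ahead with 95% prediction intervals
fc <- predict(fit, n.ahead = 24, prediction.interval = TRUE, level = 0.95)

head(fc)   # columns: fit, upr, lwr
```

The smoothing parameters alpha, beta, and gamma are estimated by default and are available afterward as fit$alpha, fit$beta, and fit$gamma.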
Or, forecast with a regression that includes seasonality by setting the parameter ts_method, here changed from its default value of "es" for exponential smoothing to "lm" for linear model. The data are de-seasonalized, the regression analysis is performed, and then the seasonality is added back.
XY(Month, Price, ts_unit="months", ts_agg="mean", ts_ahead=24, ts_source="classic", ts_method="lm")
## [Interactive chart from the Plotly R package (Sievert, 2020)]
## Warning in summary.lm(object, ...): essentially perfect fit: summary may be unreliable
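The three steps of a regression forecast with seasonality (de-seasonalize, regress, add the seasonality back) can be sketched in base R as follows. This is an illustration under stated assumptions: additive seasonal effects estimated with decompose(), a straight-line trend, and the AirPassengers series as stand-in data. The names y_adj, time_index, and forecast are hypothetical, and the details of the lessR implementation may differ.

```r
y <- AirPassengers          # monthly series shipped with R, a stand-in only
f <- frequency(y)           # 12 observations per yearly cycle

# 1. Estimate additive seasonal effects and remove them from the data
seasonal <- as.numeric(decompose(y)$seasonal)
dat <- data.frame(y_adj = as.numeric(y) - seasonal,
                  time_index = seq_along(seasonal))

# 2. Regress the de-seasonalized values on time
fit <- lm(y_adj ~ time_index, data = dat)

# 3. Extend the trend 24 months ahead, then add the seasonality back
h <- 24
t_new <- nrow(dat) + seq_len(h)
trend_fc <- predict(fit, newdata = data.frame(time_index = t_new))
season_fc <- seasonal[((t_new - 1) %% f) + 1]   # continue the yearly cycle
forecast <- as.numeric(trend_fc) + season_fc
```

Because the seasonal effects repeat every 12 months, the forecast for each month differs from the forecast one year earlier only by 12 steps of the fitted trend.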
Here, do the linear regression forecast but without seasonality according to the parameter ts_seasons.
XY(Month, Price, ts_unit="months", ts_agg="mean", ts_ahead=24, ts_source="classic", ts_method="lm", ts_seasons="N")
## [Interactive chart from the Plotly R package (Sievert, 2020)]
## Warning in summary.lm(object, ...): essentially perfect fit: summary may be unreliable
It is better to visually understand the characteristics of the time series before trying to forecast. To facilitate this understanding, here decompose the time series into its seasonal and trend components with the lessR function STL(), which relies upon the base R function stl() but provides more information and allows more flexible input.
STL(Month, Price)
##
## Total variance of Price: 2700.759
## Proportion of variance for components:
## seasonality --- 0.007
## trend --------- 0.926
## remainder ----- 0.027
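To show where proportions of this kind can come from, the following sketch calls the base R function stl() directly and divides the variance of each component by the variance of the series. It uses co2, a monthly series shipped with R, as stand-in data. The calculation shown is an assumption about how such proportions could be computed, not a statement of what STL() does internally.

```r
# co2: monthly atmospheric CO2 concentrations shipped with R, a stand-in only
fit <- stl(co2, s.window = "periodic")

# One column per component: seasonal, trend, remainder
comp <- fit$time.series

# Variance of each component relative to the variance of the series
total_var <- var(as.numeric(co2))
prop <- apply(comp, 2, var) / total_var
round(prop, 3)
```

Proportions computed this way need not sum to exactly 1, because the components can be correlated with one another and the covariances are not counted.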
The traditional time units, such as "days" or "quarters", evaluate seasonality over the entire year. Quarterly and even monthly data can usually be meaningfully assessed for seasonality over the entire year. With daily data, however, seasonality is generally more meaningfully assessed over the days of the week. For example, sales may typically be higher on Monday than they are on Sunday.
Consider the following daily data for which we wish to evaluate seasonality over the days of the week. To indicate potential seasonality of daily data within a week, specify the time unit with parameter ts_unit set to "days7".
XY(days, sales, ts_ahead=8, ts_unit="days7")
## [Interactive chart from the Plotly R package (Sievert, 2020)]
## Warning in xy.coords(x, y): NAs introduced by coercion
## Warning in xy.coords(x, y): NAs introduced by coercion
We now have a seasonality coefficient for each day of the week, which is projected into the future for forecasting.
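The idea of one seasonal effect per day of the week can be illustrated in base R. The sketch below constructs hypothetical daily sales with a known weekly pattern, recovers the pattern as the mean deviation for each weekday, and expresses the same idea as a time series with a cycle of 7 observations. The data and the names weekly_pattern and effect are invented, and the averaging shown is a simplification rather than the estimation that XY() performs.

```r
# Hypothetical daily sales for four full weeks, starting on Monday, 2024-01-01
days <- seq(as.Date("2024-01-01"), by = "day", length.out = 28)
weekly_pattern <- c(20, 5, 0, 0, 5, -10, -20)   # Monday through Sunday
sales <- 100 + rep(weekly_pattern, times = 4)

# Day of the week as 0 (Sunday) through 6 (Saturday), independent of locale
wday <- as.POSIXlt(days)$wday

# Mean deviation from the overall mean for each day of the week
effect <- tapply(sales - mean(sales), wday, mean)

# The same idea as a time series with a cycle of 7 observations
sales_ts <- ts(sales, frequency = 7)
```

With frequency set to 7, functions such as decompose() and HoltWinters() treat the week, not the year, as the seasonal cycle.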
If both the date value and the y-value are missing, then the nearest adjacent points are connected by a line segment that runs over the missing data value, effectively linearly interpolating the missing value across the two adjacent present values. For example, consider a daily time series related to the Tableau Superstore data such that “2021-01-07” and “2021-01-09” are both present with their corresponding y values, but there is no date value or y value for January 8, that is, “2021-01-08”. To yield a single data value of Sales for each day, aggregate Sales by day.
d <- read.table(text="
Order.Date Sales
2021-01-05 19.536
2021-01-06 473.820
2021-01-06 5.480
2021-01-06 12.780
2021-01-06 609.980
2021-01-06 31.120
2021-01-06 6.540
2021-01-06 19.440
2021-01-07 176.728
2021-01-07 10.430
2021-01-09 9.344
2021-01-09 31.200
2021-01-10 51.940
2021-01-10 2.890",
header = TRUE)
Two sales are recorded on January 7 and two on January 9, but there is no record of any sales, or even a date, for January 8. The entire row of data for January 8 is missing.
Next, plot the aggregated Sales data by day for dates from January 5 through January 10.
XY(Order.Date, Sales, ts_unit="days")
## [Interactive chart from the Plotly R package (Sievert, 2020)]
##
## Best guess for the date format: %Y-%m-%d
## If this format is wrong, specify with parameter: ts_format
## To see all possible formats, enter: ?strptime
## Examples: "08/18/2024" format is "%m/%d/%Y"
## "18-08-24" format is "%d-%m-%y"
## "August 18, 2024" format is "%B %d, %Y"
The resulting visualization plots the y-value for January 7 and also for January 9, with a line segment connecting those two points. There is no corresponding label on the x-axis for the missing data value nor is there a plotted point. And, the January 9 value is appropriately placed two days after the January 7 value on the visualization.
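The aggregation by day and the resulting gap can be reproduced in base R. This sketch uses only the rows from January 7 onward, sums Sales within each date with aggregate(), and then compares the dates present against the full daily sequence to find the missing day. It illustrates the data situation only and is not the code that XY() runs.

```r
# Rows from January 7 onward, with the date stored as type Date
d <- data.frame(
  Order.Date = as.Date(c("2021-01-07", "2021-01-07",
                         "2021-01-09", "2021-01-09", "2021-01-10")),
  Sales = c(176.728, 10.430, 9.344, 31.200, 51.940)
)

# One value of Sales per day: sum the individual sales within each date
daily <- aggregate(Sales ~ Order.Date, data = d, FUN = sum)

# Compare the dates present against every day in the observed range
all_days <- seq(min(daily$Order.Date), max(daily$Order.Date), by = "day")
missing_days <- all_days[!(all_days %in% daily$Order.Date)]
```

Because the dates are of type Date, January 9 lies two days after January 7 on a plotted axis even though the two rows are adjacent in the aggregated data.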
In terms of missing data, if the date value exists and the corresponding y-value is missing, with value NA, then the visualization leaves the corresponding y-value blank. Here, insert the missing row for January 8 with missing data, NA, for Sales on that date.
new_row <- data.frame(
Order.Date = "2021-01-08",
Sales = NA
)
d <- rbind(d, new_row)
d <- order_by(d, by=Order.Date)
d[9:12,]
## Order.Date Sales
## 9 2021-01-07 176.728
## 10 2021-01-07 10.430
## 15 2021-01-08 NA
## 11 2021-01-09 9.344
Now, plot.
XY(Order.Date, Sales, ts_unit="days")
## [Interactive chart from the Plotly R package (Sievert, 2020)]
##
## Best guess for the date format: %Y-%m-%d
## If this format is wrong, specify with parameter: ts_format
## To see all possible formats, enter: ?strptime
## Examples: "08/18/2024" format is "%m/%d/%Y"
## "18-08-24" format is "%d-%m-%y"
## "August 18, 2024" format is "%B %d, %Y"
There is now a blank space in the visualization for January 8. If it is instead better to treat the missing value as zero sales for that day, specify the value of 0 for the parameter ts_NA.
XY(Order.Date, Sales, ts_unit="days", ts_NA=0)
## [Interactive chart from the Plotly R package (Sievert, 2020)]
##
## Best guess for the date format: %Y-%m-%d
## If this format is wrong, specify with parameter: ts_format
## To see all possible formats, enter: ?strptime
## Examples: "08/18/2024" format is "%m/%d/%Y"
## "18-08-24" format is "%d-%m-%y"
## "August 18, 2024" format is "%B %d, %Y"
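The two treatments of a missing day, an explicit NA versus a zero, can be sketched in base R. Starting from hypothetical daily totals with no row for January 8, merge() against the full daily sequence makes the gap an explicit NA, and a simple assignment replaces it with 0. The data frame names are invented, and this is an analogy to the ts_NA parameter, not its implementation.

```r
# Daily totals with no row at all for January 8
daily <- data.frame(
  Order.Date = as.Date(c("2021-01-07", "2021-01-09", "2021-01-10")),
  Sales = c(187.158, 40.544, 51.940)
)

# Every day in the observed range, one row per day
full <- data.frame(
  Order.Date = seq(min(daily$Order.Date), max(daily$Order.Date), by = "day")
)

# Keep all days: days without a match receive NA for Sales
full <- merge(full, daily, by = "Order.Date", all.x = TRUE)

# Alternative treatment: a day without recorded sales had zero sales
zero_filled <- full
zero_filled$Sales[is.na(zero_filled$Sales)] <- 0
```

Whether zero is appropriate depends on why the row is absent: no sales that day, or sales that were not recorded.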
Data can be stored in different types of structures, different forms of organization. XY() can plot a time series from three different data structures:
The previous examples of plotting time series data read data stored in long format. Long format data organizes data with each row of the data table containing only a single measurement. If the entity provides multiple data values, then the data values are stored in multiple rows.
For example, if observations of Apple’s stock price are taken monthly, then each row of the data table contains only a single stock price. Or, from another perspective, the data values for each company are each stored on a separate row.
d <- Read("StockPrice", quiet=TRUE)
head(d)
## Month Company Price Volume
## 1 1985-01-01 Apple 0.09559606 175302400
## 23 1985-02-01 Apple 0.09816800 137737600
## 42 1985-03-01 Apple 0.08530761 247430400
## 63 1985-04-01 Apple 0.07416180 114060800
## 84 1985-05-01 Apple 0.07158987 57344000
## 106 1985-06-01 Apple 0.05487159 576016000
Many data analysis and visualization functions across a variety of statistical systems require long format data. As such, this organization of data is the most common data structure but other possibilities do exist.
XY() can also plot directly from an R time series object, created with the base R ts() function. We create the object from a wide-form data table. In the wide form, the three companies each have their own column of data, with one row for each date. Use the lessR function reshape_wide() to do the conversion.
dw <- reshape_wide(d, widen=Company, response=Price, ID=Month)
head(dw)
## Month Apple IBM Intel
## 1 1985-01-01 0.09559606 9.992542 0.3194396
## 23 1985-02-01 0.09816800 11.200316 0.3484795
## 42 1985-03-01 0.08530761 11.312118 0.3252473
## 63 1985-04-01 0.07416180 10.666898 0.3339593
## 84 1985-05-01 0.07158987 10.458775 0.3107277
## 106 1985-06-01 0.05487159 10.845310 0.3078233
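For comparison, the same long-to-wide conversion can be sketched with the base R function reshape(). The small data frame long is hypothetical, and the final step that strips the Price. prefix is added only so that the column names match the company names.

```r
# Hypothetical long-format data: one row per company per month
long <- data.frame(
  Month = as.Date(rep(c("2024-01-01", "2024-02-01"), times = 3)),
  Company = rep(c("Apple", "IBM", "Intel"), each = 2),
  Price = c(1, 2, 3, 4, 5, 6)
)

# Wide format: one row per month, one Price column per company
wide <- reshape(long, idvar = "Month", timevar = "Company",
                direction = "wide")

# reshape() names the new columns Price.Apple, Price.IBM, Price.Intel
names(wide) <- sub("^Price\\.", "", names(wide))
```

The long form has six rows of one price each, whereas the wide form has two rows of three prices each.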
From the wide-form data table for Apple stock price, create the time series object.
a1.ts <- ts(dw$Apple, frequency=12, start=c(1985, 1))
XY(a1.ts)
## [Interactive chart from the Plotly R package (Sievert, 2020)]
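A time series object stores the dates implicitly: ts() needs only the data values, the number of observations per year, and the year and period of the first observation. The following sketch uses a short hypothetical vector of prices to show how the start and frequency arguments determine the dates of all later values.

```r
# Hypothetical monthly prices, the first observed in January 1985
price <- c(0.096, 0.098, 0.085, 0.074, 0.072, 0.055)

# frequency=12 for monthly data, start=c(year, period within the year)
p_ts <- ts(price, frequency = 12, start = c(1985, 1))

start(p_ts)   # 1985 1
end(p_ts)     # 1985 6

# Extract March and April 1985 by date rather than by position
mar_apr <- window(p_ts, start = c(1985, 3), end = c(1985, 4))
```

If the start argument does not match the first date in the data, every value in the series is assigned to the wrong month.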
With the lessR style() function many themes can be selected, such as "lightbronze", "dodgerblue", "darkred", and "gray" for gray scale. When no theme or any other parameter value is specified, style() returns to the default theme and colors.
style()
The annotations in the following visualization consist of the text field “iPhone” with an arrowhead that points to the time that the first iPhone became available. With lessR, list each component of the annotation as a vector for add. Any value listed that is not a keyword such as “rect” or “arrow” is interpreted as a text field. Then, in order of their occurrence in the vector for add, list the needed coordinates for the objects. To place the text field “iPhone” requires one coordinate, <x1,y1>. To place an “arrow” requires two coordinates, <x1,y1> and <x2,y2>. For example, the second element of the y1 vector is the y1 value for the “arrow”. The text field does not require a second coordinate, so specify x2 and y2 as single elements instead of vectors.
d <- Read("StockPrice")
x <- as.Date("2007-06-01")
XY(Month, Price, filter=(Company == "Apple"), ts_area_fill="on",
add=c("iPhone", "arrow"),
   x1=c(x,x), y1=c(100,90), x2=x, y2=30)
## [Interactive chart from the Plotly R package (Sievert, 2020)]
Use the base R help() function to view the full manual for XY(). Simply enter a question mark followed by the name of the function.
?XY
More on Scatterplots, Time Series plots, and other visualizations from lessR and other packages such as ggplot2 at:
Gerbing, D., R Visualizations: Derive Meaning from Data, CRC Press, May, 2020, ISBN 978-1138599635.