Visualize Time Series

Time Series

A time series plots the value of a continuous variable against the corresponding dates or times at which each value was measured. The date values are usually plotted on the $x$-axis.

Time series visualization

A sequence of data values plotted against the corresponding dates and/or times at which the values were recorded, usually at regular intervals.

The analysis begins with a visualization of the time series. Before extracting structure analytically, view the patterns present over time. If forecasting is your goal, the more you understand the structure of the time series before proceeding with analytical forecasting methods, the better you can direct the forecasting methods to obtain the most accurate forecast.

Example Data

The data for the following examples are the monthly stock prices of Apple, IBM, and Intel from January 1985 through April 2026, obtained from finance.yahoo.com. The data were obtained from three separate downloads and then concatenated into a single file. The data are available on the web as an Excel file at the following location, or as part of the lessR download.

lessR data read

Always enclose the file reference in quotes within the call to any read function, including lessR Read(). The following reads an Excel file of some stock prices.

#d <- Read("https://web.pdx.edu/~gerbing/data/StockPrice.xlsx")
or
d <- Read("StockPrice")

The output of Read() provides useful information regarding the data file read into the R data frame, here d. Always review this information to ensure the data were read and formatted correctly. There are four variables in the data table: Month, Company, Price, and Volume. The file contains 1488 rows of data, with 496 unique dates reported on the first day of each month. There are no missing data.

The variable Month contains the dates to be plotted on the $x$-axis for the resulting time series visualization. The dates are repeated for each of the three companies.

Date/time variable type

Data analysis systems typically provide a variable type specifically for dates and times, which in R is the Date data type. Each Date data value consists of the year, month, and day.

Viewing the output of Read() under the heading of Type reveals that Month was read as a Date variable. This classification is correct because the data are stored in an Excel file, and Excel recognizes and treats character strings such as “08/18/2024” as date values. This Excel date formatting then transfers over to R when the data are read with the lessR function Read().

If the data file were stored as a text file, R would not automatically translate the character string of dates into a variable of type Date. In a text file, all data values are character strings. R converts character strings that are numbers to numeric variables when reading data into an R data frame. Not so with dates, which remain as character strings after reading into R. However, the relevant lessR functions also properly classify dates.

Traditionally, the variable with these date fields must be explicitly converted to a variable of type Date with the as.Date() function. Storing the data as an Excel file avoids this extra step because Excel has already done the conversion. Since lessR Version 4.3.9, lessR implicitly performs this conversion to an R Date variable.
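The explicit conversion described above can be sketched in base R as follows. The three date strings are illustrative values written in a month/day/year layout, which is an assumption about how a text file might store them; the format string passed to as.Date() must match whatever layout the file actually uses.

```r
# Dates read from a text file arrive as character strings
month_chr <- c("01/01/1985", "02/01/1985", "03/01/1985")
class(month_chr)   # "character"

# Convert explicitly, describing the month/day/year layout of the strings
month <- as.Date(month_chr, format = "%m/%d/%Y")
class(month)       # "Date"

# Printed in year-month-day order once converted
month
```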

Data Types
------------------------------------------------------------
character: Non-numeric data values
Date: Date with year, month and day
double: Numeric data values with decimal digits
------------------------------------------------------------

    Variable                  Missing  Unique 
        Name     Type  Values  Values  Values   First and last values
------------------------------------------------------------------------------------------
 1     Month      Date   1488       0     496   1985-01-01 ... 2026-04-01
 2   Company character   1488       0       3   Apple  Apple ... Intel  Intel
 3     Price    double   1488       0    1467   0.0953060239553452 ... 48.0299987792969
 4    Volume    double   1488       0    1486   175302400  137737600 ... 60705100  129598500
------------------------------------------------------------------------------------------

The dates are stored within R according to the ISO 8601 international standard, which defines a four-digit year, a hyphen, a two-digit month, a hyphen, and then a two-digit day.

ISO is the short name of the International Organization for Standardization (www.iso.org), the organization that sets global standards for goods and services.
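As a small check on the Month summary in the Read() output above, the following base R sketch generates 496 monthly dates beginning with January 1985 and displays them in the ISO 8601 layout. The object name month_seq is introduced here only for illustration.

```r
# 496 monthly dates, each on the first day of the month, starting January 1985
month_seq <- seq(as.Date("1985-01-01"), by = "month", length.out = 496)

# R displays Date values in ISO 8601 form: yyyy-mm-dd
format(month_seq[1])                  # "1985-01-01"
format(month_seq[length(month_seq)])  # "2026-04-01"

# The year, month, and day fields can also be extracted separately
format(month_seq[1], "%Y")
format(month_seq[1], "%m")
format(month_seq[1], "%d")
```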

The following are sample data rows. The first column of numbers is not data values but rather row names.

The first four rows of data, which are the first four rows of Apple data.

        Month Company      Price    Volume
1  1985-01-01   Apple 0.09530602 175302400
23 1985-02-01   Apple 0.09787019 137737600
42 1985-03-01   Apple 0.08504876 247430400
63 1985-04-01   Apple 0.07393682 114060800

The first four rows of IBM data.

         Month Company     Price  Volume
11  1985-01-01     IBM  9.815045 3650540
231 1985-02-01     IBM 11.001374 3474394
422 1985-03-01     IBM 11.111190 5786472
632 1985-04-01     IBM 10.477431 3517907

The first four rows of Intel data.

         Month Company     Price   Volume
12  1985-01-01   Intel 0.3194395 27259200
233 1985-02-01   Intel 0.3484793 17068800
423 1985-03-01   Intel 0.3252473 30768000
633 1985-04-01   Intel 0.3339592 34238400

With the data, we can proceed to the visualizations.

One Time Series

We can plot the time series for any one of the three companies in the data table. Because the data file contains stock prices for three companies, to plot the time series for only one company, we need to subset the data using filtering.

Data filtering

Extract a subset of the entire data table for the specified analysis.

Every analysis system provides a way to filter the data. Set up the time series visualization by plotting share Price vs. Month, filtering the data to show only Apple’s stock price, as shown in Figure 1.

lessR filtered time series

Plot two variables with the lessR XY() function of the form XY(x,y). When the $x$-variable is of type Date, here named Month, XY() creates a time series visualization. The $y$-variable in this example is Price.

XY(Month, Price, filter=(Company=="Apple"))

filter: Parameter to specify the logical condition for selecting rows of data for the analysis.

To visualize data for only one company, we need to select the rows for that company. Select specified rows from the data table for analysis according to a logical condition.

The R double equal sign, ==, means is equal to.
The == does not set equality; it evaluates equality, resulting in a value that is either TRUE or FALSE.
The expression (Company=="Apple") evaluates to TRUE only for those rows of data for which the data value for the variable Company equals "Apple".
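To see the logical condition at work outside of a plotting call, consider this minimal base R sketch. The data frame toy and its values are made up for illustration and are not the stock price data.

```r
# Small illustrative data table (made-up values)
toy <- data.frame(
  Company = c("Apple", "IBM", "Intel", "Apple"),
  Price   = c(10, 20, 5, 11)
)

# == evaluates equality row by row, returning TRUE or FALSE for each row
is_apple <- toy$Company == "Apple"
is_apple

# Keep only the rows for which the condition is TRUE
toy[is_apple, ]
```

Only the first and fourth rows are retained, the two rows for which Company equals "Apple".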

pt_size: Parameter to specify the size of the plotted points. By default, when plotting a time series with lessR, the point size is 0. Set a positive number to visualize the plotted points, which are connected by line segments by default. However, for a time series with as many points to plot as in this example, monthly from January 1985 for more than 30 years, showing individual points is not feasible.

Figure 1: Time series of Apple stock price.

A desirable option is the ability to fill the area under the curve to highlight the form of the plotted time series, a visualization often referred to as an area chart. Figure 2 illustrates the area chart.

lessR area chart

style(quiet=TRUE, lab_x_cex=.75, lab_y_cex=.75, axis_x_cex=.62, axis_y_cex=.62)
XY(Month, Price, filter=(Company=="Apple"),
ts_area_fill="slategray2", line_width=3)

ts_area_fill: Parameter that fills the area under the curve. Set the value to "on" to obtain the default fill color for the given color theme, or specify a color, such as with a color name.

line_width: Parameter to specify the line width of the time series line segments. In the accompanying plot, to increase the line thickness, the line width was set to 3 instead of the default 1.5.

Figure 2: Time series of Apple stock price with the default fill color.

Visualization systems also offer many customization options, including color options. Find an example in Figure 3.

lessR color customization

style(sub_theme="black")
XY(Month, Price, filter=(Company=="Apple"),
color="steelblue2", ts_area_fill="steelblue3", trans=.55)

style(): lessR function to set many style parameters. Here, set the background to black by setting the sub_theme parameter. Styles set with style() are persistent, that is, they remain set across the remaining visualizations until explicitly changed.

color: Parameter that sets the line color or edge color of a geometric object.

ts_area_fill: Parameter that sets the color of the area under the curve.

transparency: Parameter to set the transparency level, which can be shortened to trans. The value is a proportion from 0 (no transparency) to 1 (complete transparency, i.e., invisible).
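The relation between transparency and the more common alpha (opacity) channel of a color can be illustrated with base R. This is only a sketch of the general idea, assuming that a transparency of .55 corresponds to an opacity of 1 - .55 = .45. How lessR applies the value internally is not shown here.

```r
# A transparency of .55 corresponds to an opacity (alpha) of 1 - .55 = .45
trans <- 0.55
fill <- adjustcolor("steelblue3", alpha.f = 1 - trans)

# Inspect the red, green, blue, and alpha channels on a 0 to 255 scale
col2rgb(fill, alpha = TRUE)
```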

Figure 3: Time series of Apple stock prices with a transparent fill color against a black background.

Several Time Series

One Panel

The variable Company in this data table is a categorical variable with three values: Apple, IBM, and Intel. Visualization systems typically offer options to stratify time series plots by a categorical variable, such as Company. One option plots all three time series in the same panel.

lessR stratified time series, same panel

XY(Month, Price, by=Company)

by: Parameter that specifies to plot a different visualization for each value of a specified categorical variable on the same panel as in Figure 4.

Figure 4: Time series of stock price for Apple, IBM, and Intel plotted on the same panel.

Another option offered by some visualization systems when plotting multiple time series on the same panel is to stack the time series on top of one another, often called a stacked area chart.

lessR stratified stacked area chart

XY(Month, Price, by=Company, ts_stack=TRUE, trans=0.4)

ts_stack: Set this parameter to TRUE to stack the plots on top of each other. When stacked, the Price variable on the y-axis is the sum of the corresponding Price values for each Company. The y-value for Apple at each date is its actual value because it is listed first (alphabetically by default). The y-value for IBM is the corresponding value for Apple plus IBM’s value. And, for Intel, listed last, each point on the y-axis is the sum of all three prices.
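The stacking arithmetic amounts to a cumulative sum across companies at each date. The following base R sketch uses made-up prices for three dates, not the actual stock data, with the companies in alphabetical order.

```r
# Made-up prices for three dates, one row per company in alphabetical order
prices <- rbind(
  Apple = c(10, 12, 15),
  IBM   = c(20, 21, 19),
  Intel = c( 5,  6,  7)
)

# Cumulative sums down each column give the plotted heights of the stacked series
stacked <- apply(prices, 2, cumsum)
stacked
```

The Apple row is unchanged, the IBM row is Apple plus IBM, and the Intel row equals the sum of all three prices at each date.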

Figure 5: Stacked time series of stock price for Apple, IBM, and Intel plotted on the same panel.

Several Panels

A Trellis plot, more recently called a facet plot, stratifies the visualization on the levels of a categorical variable by plotting each level separately on a different panel (analogous to a garden trellis).

lessR Trellis time series

XY(Month, Price, facet=Company)

facet: Parameter that indicates to plot each time series on a separate panel according to the levels of the specified categorical variable.

Figure 6: Trellis time series visualizations of stock prices for Apple, IBM, and Intel.

Enhance the Trellis (facet) plot with a transparent orange fill on a black background, as shown in Figure 7.

lessR customized Trellis time series

XY(Month, Price, facet=Company,
color="orange3", ts_area_fill="orange1", trans=.55)

Figure 7: Trellis time series visualizations of stock prices for Apple, IBM, and Intel, with customized colors.

Missing Data

To demonstrate, modify the data table for Apple stock prices by removing the stock prices for 2017 through 2019. If the data were initially available in an Excel file, the corresponding cells for the missing Price values would typically be left blank.

Set Price for Years 2017 through 2019 as missing in the R data table

d[385:420, "Price"] <- NA

Rows 385 through 420 represent the data for years 2017 through 2019. The rows of data for those months are still present but the corresponding value of Price is set to the missing data code for R, NA, for not available.
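The correspondence between row numbers and dates can be checked with a short base R sketch. It assumes, as the data listings suggest, that the 496 Apple rows come first in the data table in date order. The vector price holds placeholder values, not the actual prices.

```r
# Monthly dates matching the Apple rows, assumed to come first in the data table
month_seq <- seq(as.Date("1985-01-01"), by = "month", length.out = 496)

# Rows 385 through 420 span January 2017 through December 2019
range(month_seq[385:420])

# Placeholder prices (made up), then set the same rows to missing
price <- rep(1, length(month_seq))
price[385:420] <- NA

length(price)       # still 496: the rows remain in place
sum(is.na(price))   # 36 monthly values are now missing
```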

The following brief excerpt from the modified data table shows the Price value missing for the first four months of 2017. These missing values continue throughout 2019. The NA value is the R code for data that is not available. Each data analysis system has a code for missing data, such as R’s NA. Blank cells, such as in an Excel workbook, would be converted to the missing data code, such as NA, when the data are read into the corresponding analysis system.

          Month Company    Price    Volume
8027 2016-11-01   Apple 25.54166 175303200
8048 2016-12-01   Apple 25.21226 148347600
8069 2017-01-01   Apple       NA 115127600
8089 2017-02-01   Apple       NA 447940000
8108 2017-03-01   Apple       NA 145658400
8131 2017-04-01   Apple       NA  79942800

Figure 8 shows the resulting time series visualization, which plots as before (see Figure 1) except that no values are plotted for rows with Price missing.

Figure 8: Apple stock price with years from 2017 through 2019 missing the value of Price.

The Price values cannot be plotted when they are unavailable. Note, however, that the data table still contains a row for each month even when that month’s Price is missing. Monthly data must be reported for every month, with the missing data code in place of any unavailable value.

Aggregate Over Time

Suppose your data are recorded daily, but you wish to analyze quarterly sales. Visualizing the data as they exist shows the time series of daily sales. To visualize the time series of sales by quarter, sum the sales for each day across each entire quarter.

Aggregation

Compute a statistic such as a sum or a mean over a range of data, which, for time series data, is over a time unit such as months or quarters.

Of course, you cannot aggregate at a level below the detail at which the data are specified. If you have monthly data, as with Apple stock price data, you cannot specify an aggregation level of weeks because the data are not available at that frequency. Doing so would result in termination of the analysis with an explanatory message.

Aggregate by Sums

Consider three variables in the Superstore data table included with the Tableau data visualization system: Order.Date, Sales, and Profit.

Read, partially display relevant subset of Tableau Superstore data

#d <- Read("https://web.pdx.edu/~gerbing/data/Superstore.xlsx")
d <- Read("~/Documents/BookNew/data/Superstore/Superstore.xlsx")
d[1:10, .(Order.Date, Sales, Profit)]

   Order.Date    Sales   Profit
1  2021-01-03   16.448   5.5512
2  2021-01-04    3.540  -5.4870
3  2021-01-04   11.784   4.2717
4  2021-01-04  272.736 -64.7748
5  2021-01-05   19.536   4.8840
6  2021-01-06 2573.820 746.4078
7  2021-01-06    5.480   1.4796
8  2021-01-06   12.780   5.2398
9  2021-01-06  609.980 274.4910
10 2021-01-06   31.120   0.3112

The resulting d data frame is reasonably large, with 10,194 rows of data reporting 10,194 individual sales. As shown in the data for these three variables in the first 10 rows, sales are reported daily, with multiple sales per day. For example, on January 4, 2021, there were three sales for $3.54, $11.78, and $272.74. Their sum represents the total sales for that day.

Because the data are in Excel format, the variable Order.Date is already properly formatted as a date variable. However, there are multiple orders per day, so the time series plot of the original data is not what would typically be desired. At a minimum, the sales data need to be collapsed and aggregated to a daily basis, presumably by summing. For example, the sum of the three sales for January 4, 2021 is $288.06, the value plotted for that date. The aggregated daily sales data provide the data needed to plot the daily time series shown in Figure 9.
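The daily summing can be sketched in base R with aggregate(), here applied only to the 10 rows of Order.Date and Sales displayed above. The data frame name sales is introduced for illustration. Because only the first 10 rows are used, the total for January 6 covers only the rows shown.

```r
# First 10 rows of Order.Date and Sales as displayed above
sales <- data.frame(
  Order.Date = as.Date(c("2021-01-03", "2021-01-04", "2021-01-04",
                         "2021-01-04", "2021-01-05", "2021-01-06",
                         "2021-01-06", "2021-01-06", "2021-01-06",
                         "2021-01-06")),
  Sales = c(16.448, 3.540, 11.784, 272.736, 19.536,
            2573.820, 5.480, 12.780, 609.980, 31.120)
)

# Sum Sales within each date, yielding one row per day
daily <- aggregate(Sales ~ Order.Date, data = sales, FUN = sum)
daily
```

The row for 2021-01-04 contains the sum of the three sales on that date, 288.06.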

lessR time series aggregation by days

XY(Order.Date, Sales, ts_unit="days")

To aggregate with the lessR function XY(), access the first and possibly the second of the following parameters.

ts_unit: Specify a value that is longer than the natural time intervals in the data. Possible values are days, weeks, months, quarters, and years. For example, if each sale were recorded with its date, then a value of days would aggregate sales by day, yielding a daily time series of sales.

ts_agg: If the aggregation should be based on the mean, then specify the value of the parameter as "mean". The default value is "sum". For example, if a stock price for a company is recorded weekly and the time series should be visualized as monthly, then the average stock price for a month is generally the preferred basis for the aggregation.

The dates contain gaps, so the intervals between consecutive dates are not all regular.

Figure 9: Superstore data aggregated by day.

However, even after sales are aggregated by day, the data remains too detailed, reporting a time series over three years on a daily basis. Sales at the extreme right of the time series in Figure 9 appear to be generally larger than those at the extreme left, but it is difficult to assess the change more precisely. Moreover, seasonality is also difficult to discern from the visualization of the daily time series. Instead, aggregate the data further by some larger time unit, such as quarters.

lessR time series aggregation by quarters

XY(Order.Date, Sales, ts_unit="quarters")

ts_unit: Parameter to specify the time unit for the aggregation. Based on functions from the xts package, currently implemented valid values include "days", "weeks", "months", "quarters", and "years".

ts_agg: Parameter that specifies the arithmetic operation of the aggregation. The default value is "sum", so there is no need to specify it in this function call.
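For readers who want to see the arithmetic behind a quarterly aggregation, the following is a base R sketch using tapply(). It is a stand-in for illustration, not the xts-based computation that lessR performs, and the daily sales values are made up.

```r
# Made-up daily sales for the first half of 2021, 100 per day for easy checking
dates <- seq(as.Date("2021-01-01"), as.Date("2021-06-30"), by = "day")
sales <- rep(100, length(dates))

# Label each date with its year and quarter, such as "2021 Q1"
quarter <- paste(format(dates, "%Y"), quarters(dates))

# Sum sales within each quarter
by_quarter <- tapply(sales, quarter, sum)
by_quarter
```

With 90 days in the first quarter of 2021 and 91 in the second, the two sums are 9000 and 9100.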

When aggregating sales by summing over consecutive quarters, the overall upward trend in sales is evident, as are the consistent seasonal fluctuations, with maximum sales in Q4 each year, as shown in Figure 10.

Figure 10: Superstore data aggregated by quarters.

In the next example, aggregate by the largest available time unit, years.

lessR time series aggregation by years

XY(Order.Date, Sales, ts_unit="years")

Figure 11 shows the trend over the four years.

Figure 11: Superstore data aggregated by years.

Aggregating by year obscures seasonal variations but clearly shows the overall trend. There was an initial downturn from 2022 to 2023, followed by an observable increase in sales.

We can compare Sales and Profit in a single visualization by stacking the plots of the two time series, as shown in Figure 12.

Two stacked time series aggregated by year

XY(Order.Date, c(Sales, Profit), ts_unit="years", ts_stack=TRUE)

Specify the second parameter, the $y$-variable, as a vector, here of two variables: Sales and Profit.

ts_stack: Parameter that, when set to TRUE, stacks the second variable in the visualization, Profit, on top of the first variable, Sales.

Figure 12: Two stacked time series with data aggregated by year.

The visualization demonstrates that profitability increased with sales.

Aggregate by Means

Return to the StockPrice monthly data.

lessR data read

#d <- Read("https://web.pdx.edu/~gerbing/data/StockPrice.xlsx")
d <- Read("StockPrice")

head(d)

         Month Company      Price    Volume
1   1985-01-01   Apple 0.09530602 175302400
23  1985-02-01   Apple 0.09787019 137737600
42  1985-03-01   Apple 0.08504876 247430400
63  1985-04-01   Apple 0.07393682 114060800
84  1985-05-01   Apple 0.07137269  57344000
106 1985-06-01   Apple 0.05470512 576016000

Previous examples visualized the data by the unit in which they were collected, monthly. Here, let’s aggregate to quarterly and yearly data. However, unlike the Tableau Superstore data, in this example, we wish to aggregate by mean rather than sum. In the Superstore data, each original data point, a recorded sale, is a part of the overall whole, such as daily or monthly sales. To get the full daily or monthly sales, we sum the sales over that period.

However, for the stock price, each monthly price indicates a value for that time unit. To aggregate, we want the average stock price over the given time period to represent the stock’s value during that period. In this example, focus on Apple’s stock price as in Figure 13.
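To contrast the two aggregations, the following base R sketch applies both the mean and the sum to the first six monthly Apple prices listed above, grouped into two quarters. The quarter labels are constructed by hand for illustration.

```r
# First six monthly Apple prices from the listing above (January to June 1985)
price <- c(0.09530602, 0.09787019, 0.08504876,
           0.07393682, 0.07137269, 0.05470512)

# Hand-built quarter labels: three months per quarter
quarter <- rep(c("1985 Q1", "1985 Q2"), each = 3)

# The mean gives a representative price for each quarter
mean_price <- tapply(price, quarter, mean)
mean_price

# The sum is roughly three times larger and is not a meaningful price
sum_price <- tapply(price, quarter, sum)
sum_price
```

Each quarterly mean lies within the range of the three monthly prices it summarizes, whereas each quarterly sum exceeds every monthly price in its quarter.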

lessR filtered time series aggregated by quarterly means

XY(Month, Price, filter=(Company=="Apple"),
ts_unit="quarters", ts_agg="mean")

filter: Parameter to specify the logical condition for selecting rows of data for the analysis.

ts_agg: Parameter to specify the arithmetic operation for which to aggregate over time. The default value is "sum", so explicitly specify the "mean" aggregation.

size: Parameter to specify the size of the plotted points. By default, when plotting a time series with lessR, the point size is 0. Set a positive number to visualize the plotted points, which are connected by line segments by default.

Figure 13: Time series of Apple stock price aggregated by quarters.

Or, aggregate yearly for a smoother curve focusing more on the overall trend, shown in Figure 14.

lessR filtered time series aggregated by yearly means

XY(Month, Price, filter=(Company=="Apple"),
ts_unit="years", ts_agg="mean")

Figure 14: Time series of Apple stock price aggregated by yearly means.