Overview

A choropleth map shades map areas according to the value of a variable for each area.

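Build a choropleth map from two data files: the mapping data that defines the boundary of each area, and the data file that contains the variable to display, here the Gini index for each state.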

The data manipulations merge the two data files by the entity being mapped, here the USA states. The variable to highlight in the choropleth map is merged into the file that contains the mapping data, a simple features data frame. The ggplot2 function geom_sf() then plots the map from the merged file, which is the simple features data frame with the Gini index added for each state.

Gini Index

The Gini index, also called the Gini coefficient, is a summary measure of income inequality. It condenses the detailed income shares data into a single statistic that summarizes the dispersion of income across the entire income distribution. The coefficient ranges from 0, indicating perfect equality (where everyone receives an equal share), to 1, indicating perfect inequality (where only one recipient or group of recipients receives all the income).
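To make the definition concrete, the following minimal sketch computes a Gini coefficient directly from a vector of incomes, as the sum of absolute differences between all pairs of incomes divided by twice the squared count times the mean. The function name gini() and the income vectors are illustrative assumptions, not part of the census data or of any package used here.

```r
# Hypothetical helper: Gini coefficient from individual incomes,
# sum of |x_i - x_j| over all ordered pairs / (2 * n^2 * mean(x))
gini <- function(x) {
  n <- length(x)
  sum(abs(outer(x, x, "-"))) / (2 * n^2 * mean(x))
}

equal_shares  <- gini(c(10, 10, 10, 10))  # everyone receives the same income
one_recipient <- gini(c(0, 0, 0, 10))     # one recipient receives all income
print(c(equal_shares, one_recipient))
```

With only four recipients the second value is 0.75 rather than 1, because this form of the coefficient has a maximum of (n - 1)/n, which approaches 1 as the number of recipients grows.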

Access Needed Functions

library(maps)
library(sf)
## Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
library(ggplot2)
suppressPackageStartupMessages(library(lessR))
style(suggest=FALSE)

Get the Data Files

Mapping Data

Get the mapping data for the states with the map() function from the maps package.

states <- map(database="state", fill=TRUE)

To draw a map with ggplot2, we need the mapping data in the form of a simple features data frame. To obtain this format, convert the native map data with the sf package function st_as_sf(). (The mapping data in the simple features data frame is stored in the variable geom, given as the coordinates of a multipolygon for each state.)

states <- st_as_sf(states)
head(states)
## Simple feature collection with 6 features and 1 field
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -124.3834 ymin: 30.24071 xmax: -71.78015 ymax: 42.04937
## Geodetic CRS:  +proj=longlat +ellps=clrk66 +no_defs +type=crs
##                      ID                           geom
## alabama         alabama MULTIPOLYGON (((-87.46201 3...
## arizona         arizona MULTIPOLYGON (((-114.6374 3...
## arkansas       arkansas MULTIPOLYGON (((-94.05103 3...
## california   california MULTIPOLYGON (((-120.006 42...
## colorado       colorado MULTIPOLYGON (((-102.0552 4...
## connecticut connecticut MULTIPOLYGON (((-73.49902 4...
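By default, map() also draws the map as a side effect. When only the data are needed, a sketch such as the following, which assumes the maps and sf packages are installed, suppresses the drawing with plot=FALSE and then checks the class and the variable names of the converted object. The object names states_map and states_sf are illustrative.

```r
library(maps)
library(sf)

# plot=FALSE returns the map data without drawing it
states_map <- map(database = "state", fill = TRUE, plot = FALSE)

# Convert the native map object to a simple features data frame
states_sf <- st_as_sf(states_map)

print(class(states_sf))   # includes "sf" for a simple features data frame
print(names(states_sf))   # the ID variable plus the geometry column
```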

2021 Gini Data

The Gini data is sourced from the USA government census site. The Gini information was extracted and then converted to a standard data file in Excel format.

d.gini <- Read("http://web.pdx.edu/~gerbing/521/resources/Gini2021byState.xlsx")
## [with the read.xlsx() function from Schauberger and Walker's openxlsx package] 
## Data Types
## ------------------------------------------------------------
## character: Non-numeric data values
## double: Numeric data values with decimal digits
## ------------------------------------------------------------
## 
##     Variable                  Missing  Unique 
##         Name     Type  Values  Values  Values   First and last values
## ------------------------------------------------------------------------------------------
##  1     State character     51       0      51   Alabama  Alaska ... Wisconsin  Wyoming
##  2      Gini    double     51       0      51   0.4823  0.4392 ... 0.4464  0.4638
## ------------------------------------------------------------------------------------------
## 
## 
## For the column State, each row of data is unique. Are these values
## a unique ID for each row? To define as a row name, re-read the data file
## with the following setting added to your Read() statement: row_names=1
head(d.gini)
##        State   Gini
## 1    Alabama 0.4823
## 2     Alaska 0.4392
## 3    Arizona 0.4629
## 4   Arkansas 0.4751
## 5 California 0.4924
## 6   Colorado 0.4604

We will merge the two files by USA state. The mapping data stores the names of the states entirely in lowercase. The data file with the Gini data capitalizes the first letter of each state name. To merge data files, the state names in the two data files must match exactly. Here, convert the state names in the Gini data file to lowercase with the Base R function tolower().

d.gini$State <- tolower(d.gini$State)
head(d.gini)
##        State   Gini
## 1    alabama 0.4823
## 2     alaska 0.4392
## 3    arizona 0.4629
## 4   arkansas 0.4751
## 5 california 0.4924
## 6   colorado 0.4604
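Before merging, it can help to check which state names fail to match. The sketch below uses small stand-in vectors rather than the full files. The names map_ids and gini_states are assumptions for illustration. With the real data, the same check would compare states$ID with d.gini$State.

```r
# Stand-in key vectors (illustrative subsets, not the full files)
map_ids     <- c("alabama", "arizona", "arkansas")
gini_states <- c("Alabama", "Alaska", "Arizona", "Arkansas")

# Match on lowercase names, as the mapping data stores them
gini_lower <- tolower(gini_states)

only_in_gini <- setdiff(gini_lower, map_ids)  # names with no mapping data
only_in_map  <- setdiff(map_ids, gini_lower)  # names with no Gini value
print(only_in_gini)
print(only_in_map)
```

Any name that appears in either result would be dropped by a default merge, so an unexpected entry usually signals a spelling or capitalization difference.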

Merge Data Files

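Merge the Gini data by state into the simple features data frame of mapping information, named states. After the merge, the simple features data frame contains the variable Gini. By default, merge() keeps only the rows with a matching state name in both files, so a state such as Alaska, which is in the Gini data but not in the mapping data, does not appear in the merged file.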

states <- merge(states, d.gini, by.x="ID", by.y="State")
head(states)
## Simple feature collection with 6 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -124.3834 ymin: 30.24071 xmax: -71.78015 ymax: 42.04937
## Geodetic CRS:  +proj=longlat +ellps=clrk66 +no_defs +type=crs
##            ID   Gini                       geometry
## 1     alabama 0.4823 MULTIPOLYGON (((-87.46201 3...
## 2     arizona 0.4629 MULTIPOLYGON (((-114.6374 3...
## 3    arkansas 0.4751 MULTIPOLYGON (((-94.05103 3...
## 4  california 0.4924 MULTIPOLYGON (((-120.006 42...
## 5    colorado 0.4604 MULTIPOLYGON (((-102.0552 4...
## 6 connecticut 0.4985 MULTIPOLYGON (((-73.49902 4...
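The behavior of merge() on unmatched rows can be seen with two toy data frames. This is a sketch under stated assumptions: map_df and gini_df are hypothetical stand-ins, the Gini values are copied from the listing above, and colorado is deliberately left out of the toy Gini table to create an unmatched row.

```r
# Toy stand-ins for the mapping data and the Gini data
map_df  <- data.frame(ID = c("alabama", "arizona", "colorado"))
gini_df <- data.frame(State = c("alabama", "alaska", "arizona"),
                      Gini  = c(0.4823, 0.4392, 0.4629))

# Default merge: only names present in both data frames survive
inner <- merge(map_df, gini_df, by.x = "ID", by.y = "State")

# all.x=TRUE: keep every row of the first data frame, with NA where unmatched
keep_map <- merge(map_df, gini_df, by.x = "ID", by.y = "State", all.x = TRUE)

print(inner)
print(keep_map)
```

For a map, all.x=TRUE may be the safer choice, because an area without data is then still drawn, in the missing-value color of the fill scale, instead of leaving a hole in the map.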

Create the Map

Plot with the default blue sequential scale. The fill color of each state is set according to its Gini index, creating a choropleth map. Create the choropleth map with one line of ggplot2 code: the geom_sf() function accesses our simple features data frame, states.

ggplot() + geom_sf(data=states, aes(fill=Gini))

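We can enhance the map with a custom sequential scale. The Base R function hcl() defines each end of the scale by hue, chroma, and luminance. Here the hue (0, red) and the chroma (70) stay constant while the luminance runs from dark (20) to light (85).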

ggplot() + geom_sf(data=states, aes(fill=Gini)) +
  scale_fill_gradient(low=hcl(0, 70, 20), high=hcl(0, 70, 85))
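To preview a scale without drawing the map, the endpoint colors can be inspected in Base R. The sketch below builds the same two hcl() colors and interpolates five steps between them with colorRampPalette(). The object names are illustrative, and the intermediate colors may differ slightly from those of ggplot2, which may interpolate in a different color space.

```r
# Endpoints of the sequential scale: same hue and chroma, different luminance
low  <- hcl(h = 0, c = 70, l = 20)
high <- hcl(h = 0, c = 70, l = 85)

# Five interpolated colors from dark to light (approximate preview only)
ramp <- colorRampPalette(c(low, high))(5)
print(ramp)

# Optional visual check: draw the five colors as adjacent bars
barplot(rep(1, 5), col = ramp, border = NA, axes = FALSE)
```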

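For more contrast, vary the hue as well as the luminance, here from a dark red (hue 0) to a light blue (hue 240), to further highlight the differences between the Gini values of the states. Strictly, scale_fill_gradient() blends directly from one color to the other. A true divergent palette passes through a neutral midpoint, which ggplot2 provides with scale_fill_gradient2().

Also, clean up the map by removing the axis values and tick marks, as well as the gray background.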

ggplot() + geom_sf(data=states, aes(fill=Gini)) +
  scale_fill_gradient(low=hcl(0, 70, 20), high=hcl(240, 70, 85)) +
  theme(
    panel.background = element_rect(fill="white"),
    axis.ticks=element_blank(),
    axis.text.x=element_blank(),
    axis.text.y=element_blank()
  )
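For comparison, the following sketch shows a divergent scale that passes through a neutral midpoint, using the ggplot2 function scale_fill_gradient2(). To stay self-contained it plots a toy data frame of the six Gini values listed earlier as bars instead of a map. The choice of the median as the midpoint and the specific colors are assumptions for illustration.

```r
library(ggplot2)

# Toy data: the six Gini values shown in the earlier listing
d <- data.frame(
  State = c("alabama", "alaska", "arizona", "arkansas", "california", "colorado"),
  Gini  = c(0.4823, 0.4392, 0.4629, 0.4751, 0.4924, 0.4604)
)

# Assumed midpoint: the median, so half the values fall on each side
mid <- median(d$Gini)

# Blue below the midpoint, white at the midpoint, red above it
p <- ggplot(d, aes(x = State, y = Gini, fill = Gini)) +
  geom_col() +
  scale_fill_gradient2(low = hcl(240, 70, 50), mid = "white",
                       high = hcl(0, 70, 50), midpoint = mid)
p
```

The same scale layer could replace scale_fill_gradient() in the geom_sf() map, with the midpoint computed from states$Gini.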