A choropleth map shades map areas according to the value of a variable for each area.
Build a choropleth map from two data files: one that contains the mapping data and one that contains the values of the variable to display.
The purpose of the data manipulations is to merge the two data files by the entity we are mapping, here the USA states. The data that we wish to highlight in the choropleth map is merged into the file that contains the mapping data, which is in the form of a simple features data frame. Then use the ggplot2 function geom_sf() to plot the map from the merged data file, which is the simple features data file with the added information by state, here the Gini index.
The Gini index is a summary measure of income inequality. The Gini coefficient incorporates the detailed shares data into a single statistic, which summarizes the dispersion of income across the entire income distribution. The Gini coefficient ranges from 0, indicating perfect equality (where everyone receives an equal share), to 1, perfect inequality (where only one recipient or group of recipients receives all the income).
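To make the definition concrete, the following minimal sketch computes a Gini coefficient from a vector of individual incomes, using the mean absolute difference form of the statistic. The function name gini() and the two income vectors are hypothetical illustrations, not part of the Census data or of any package used here. With a finite number n of recipients, this form reaches a maximum of (n - 1)/n rather than exactly 1.

```r
# Hypothetical helper: Gini coefficient as the sum of absolute differences
# between all pairs of incomes, divided by 2 * n^2 * mean income.
gini <- function(x) {
  n <- length(x)
  sum(abs(outer(x, x, "-"))) / (2 * n^2 * mean(x))
}

equal_shares  <- c(10, 10, 10, 10)  # everyone receives the same income
one_recipient <- c(0, 0, 0, 40)     # a single recipient receives all income

gini(equal_shares)    # 0, perfect equality
gini(one_recipient)   # 0.75, which is (n - 1)/n for n = 4
```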
library(maps)
library(sf)
## Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
library(ggplot2)
suppressPackageStartupMessages(library(lessR))
style(suggest=FALSE)
Get the mapping data for the states with the map() function from the maps package.
states <- map(database="state", fill=TRUE)
To draw a map with ggplot2, we need the mapping data in the form of a simple features data file. To obtain this file format, convert the native map data to the simple features format with the sf package function st_as_sf(). (In the listing of the simple features data file, the mapping data appears under the variable geom, given as the coordinates of a multipolygon for each state.)
states <- st_as_sf(states)
head(states)
## Simple feature collection with 6 features and 1 field
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -124.3834 ymin: 30.24071 xmax: -71.78015 ymax: 42.04937
## Geodetic CRS: +proj=longlat +ellps=clrk66 +no_defs +type=crs
## ID geom
## alabama alabama MULTIPOLYGON (((-87.46201 3...
## arizona arizona MULTIPOLYGON (((-114.6374 3...
## arkansas arkansas MULTIPOLYGON (((-94.05103 3...
## california california MULTIPOLYGON (((-120.006 42...
## colorado colorado MULTIPOLYGON (((-102.0552 4...
## connecticut connecticut MULTIPOLYGON (((-73.49902 4...
The Gini data is sourced from the USA government census site. The Gini information was extracted and then converted to a standard data file in Excel format.
d.gini <- Read("http://web.pdx.edu/~gerbing/521/resources/Gini2021byState.xlsx")
## [with the read.xlsx() function from Schauberger and Walker's openxlsx package]
## Data Types
## ------------------------------------------------------------
## character: Non-numeric data values
## double: Numeric data values with decimal digits
## ------------------------------------------------------------
##
## Variable Missing Unique
## Name Type Values Values Values First and last values
## ------------------------------------------------------------------------------------------
## 1 State character 51 0 51 Alabama Alaska ... Wisconsin Wyoming
## 2 Gini double 51 0 51 0.4823 0.4392 ... 0.4464 0.4638
## ------------------------------------------------------------------------------------------
##
##
## For the column State, each row of data is unique. Are these values
## a unique ID for each row? To define as a row name, re-read the data file
## with the following setting added to your Read() statement: row_names=1
head(d.gini)
## State Gini
## 1 Alabama 0.4823
## 2 Alaska 0.4392
## 3 Arizona 0.4629
## 4 Arkansas 0.4751
## 5 California 0.4924
## 6 Colorado 0.4604
We will merge the two files by USA state. The mapping data stores the names of the states entirely in lowercase. The data file with the Gini data stores the names of the states with an initial uppercase letter. To merge the data files, the state names in the two data files must match exactly. Here, convert the uppercase letters in the Gini data file to lowercase with the Base R function tolower().
d.gini$State = tolower(d.gini$State)
head(d.gini)
## State Gini
## 1 alabama 0.4823
## 2 alaska 0.4392
## 3 arizona 0.4629
## 4 arkansas 0.4751
## 5 california 0.4924
## 6 colorado 0.4604
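Before merging, it can help to check which state names fail to match across the two files. The sketch below illustrates the idea with short toy vectors that stand in for the two key columns. The vector names are hypothetical, and the values are a small subset chosen for illustration.

```r
# Toy stand-ins for the two key columns (not the full data).
map_ids     <- c("alabama", "arizona", "arkansas", "california")
gini_states <- tolower(c("Alabama", "Alaska", "Arizona", "Arkansas", "California"))

# Keys in the Gini data that have no matching map polygon
unmatched_gini <- setdiff(gini_states, map_ids)
# Keys in the map data that have no Gini value
unmatched_map  <- setdiff(map_ids, gini_states)

unmatched_gini   # "alaska"
unmatched_map    # character(0)
```

With the actual data, the analogous check would be setdiff(d.gini$State, states$ID). The state database of the maps package covers the mainland only, so Alaska and Hawaii are expected to appear as unmatched.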
Merge the Gini data by USA state into the simple features file of state mapping data, named states. After the merge, the simple features data file contains the variable Gini.
states = merge(states, d.gini, by.x="ID", by.y="State")
head(states)
## Simple feature collection with 6 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -124.3834 ymin: 30.24071 xmax: -71.78015 ymax: 42.04937
## Geodetic CRS: +proj=longlat +ellps=clrk66 +no_defs +type=crs
## ID Gini geometry
## 1 alabama 0.4823 MULTIPOLYGON (((-87.46201 3...
## 2 arizona 0.4629 MULTIPOLYGON (((-114.6374 3...
## 3 arkansas 0.4751 MULTIPOLYGON (((-94.05103 3...
## 4 california 0.4924 MULTIPOLYGON (((-120.006 42...
## 5 colorado 0.4604 MULTIPOLYGON (((-102.0552 4...
## 6 connecticut 0.4985 MULTIPOLYGON (((-73.49902 4...
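The by.x and by.y arguments name the key column in each file when the two names differ. The following minimal sketch shows the same idea on plain toy data frames. The names map_side and gini_side are hypothetical, and the Gini values are copied from the listings above. By default, merge() keeps only the keys found in both data frames, while all.x=TRUE keeps every row of the first.

```r
# Toy data frames with differently named key columns.
map_side  <- data.frame(ID = c("alabama", "arizona", "arkansas"))
gini_side <- data.frame(State = c("alabama", "alaska", "arizona"),
                        Gini  = c(0.4823, 0.4392, 0.4629))

# Default merge: keeps only the keys present in both data frames
inner <- merge(map_side, gini_side, by.x = "ID", by.y = "State")

# all.x = TRUE: keeps every row of the first data frame, NA where unmatched
left <- merge(map_side, gini_side, by.x = "ID", by.y = "State", all.x = TRUE)

nrow(inner)  # 2
nrow(left)   # 3
```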
Plot with the default blue sequential scale. The fill color of each state is set according to its Gini index, creating a choropleth map. Create the choropleth map with one line of ggplot2 code via the geom_sf() function, which accesses our simple features data file, states.
ggplot() + geom_sf(data=states, aes(fill=Gini))
We can enhance the map by plotting with a custom sequential scale using the Base R function hcl().
ggplot() + geom_sf(data=states, aes(fill=Gini)) +
  scale_fill_gradient(low=hcl(0, 70, 20), high=hcl(0, 70, 85))
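The three arguments of hcl() are hue (an angle in degrees, with 0 for red and 240 for blue), chroma, and luminance. The sketch below previews the endpoints of the scale used above. The variable names are hypothetical. The preview uses colorRampPalette(), which interpolates differently from ggplot2, so treat the intermediate colors as approximate.

```r
# hcl(h, c, l): hue in degrees, chroma, luminance (0 to 100)
low_color  <- hcl(h = 0, c = 70, l = 20)   # dark end of the red scale
high_color <- hcl(h = 0, c = 70, l = 85)   # light end of the red scale

# Interpolate five steps between the two endpoints to preview the scale
preview <- colorRampPalette(c(low_color, high_color))(5)
preview   # a character vector of five hex color codes
```

Holding hue and chroma fixed while varying only luminance is what makes this scale sequential.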
For more enhancement, assign a different hue to each end of the scale, which moves toward a divergent palette and further highlights the differences between the Gini values of the states.
Also, clean up the map by removing the axis values and tick marks, as well as the gray background.
ggplot() + geom_sf(data=states, aes(fill=Gini)) +
  scale_fill_gradient(low=hcl(0, 70, 20), high=hcl(240, 70, 85)) +
  theme(
    panel.background = element_rect(fill="white"),
    axis.ticks=element_blank(),
    axis.text.x=element_blank(),
    axis.text.y=element_blank()
  )
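The function scale_fill_gradient() interpolates between two colors. A possible alternative, sketched here and not part of the analysis above, is the ggplot2 function scale_fill_gradient2(), which adds a neutral middle color at an explicit midpoint. The helper name plot_gini_diverging(), the choice of the median as the midpoint, and the specific colors are all assumptions made for illustration.

```r
library(ggplot2)

# Hypothetical helper: draw a choropleth with a three-color diverging scale.
# Assumes sf_data is a simple features data frame with a numeric column Gini.
plot_gini_diverging <- function(sf_data, midpoint = median(sf_data$Gini)) {
  ggplot() +
    geom_sf(data = sf_data, aes(fill = Gini)) +
    scale_fill_gradient2(
      low = hcl(240, 70, 40),    # blue for values below the midpoint
      mid = "white",             # neutral color at the midpoint
      high = hcl(0, 70, 40),     # red for values above the midpoint
      midpoint = midpoint
    ) +
    theme_void()                 # drops axes, ticks, and the gray background
}
```

With the merged data from above, the call would be plot_gini_diverging(states). States near the median Gini value would then appear close to white, with the two hues marking the lower and higher values.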