Overview

Local spatial autocorrelation

Setup

Load required packages.

It is easy to use leaflet in R. The final results will be on a leaflet, so this outlines how to setup a leaflet map. First, lets use the pacman package to load all our dependent packages for the entire project:

if(!require(pacman)) {

    install.packages('pacman')

    library(pacman)
}

## Loading required package: pacman

# With pacman installed we can use the p_load() function instead of install.packages() and/or library()/require() to include all the packages we'll need for this project.

p_load(plyr

    ,sp
    
    ,spdep

    ,rgdal
    
    ,tidycensus

   #,sf # formatting this way with leading commas makes it very easy to exclude libraries. Here I've excluded the sf package using the hashtag

        ,leaflet
    )

Leaflet: interactive webmaps in R

Now we can begin to build a leaflet:

leaflet()

The screen is blank because we haven’t given any information to leaflet. Basemaps come in all types of styles which you can explore here. OpenStreetMaps (OSM) is the default, and that’s what we’ll set for now. We will change this later.

leaflet() %>% addTiles()

Notice we’re using a tidyverse tool by chaining functions together with a pipe (%>%) operator. The pipe is extremely useful to adding to the leaflet and workflow organization.

Next we’ll add data to the leaflet. It is important to know all data added to leaflet must be in WGS84 (this is known as “Web Mercator”). The tool sp::spTransform() can easily reproject data (NOTE: this can be done in the sf package as well and is some times easier). First, let’s add points in the form of fire stations around the Portland area.

# Read in fire stations spatial data. This is a shapefil from Metro - a management body for the city of Portland. The data can be found here: http://rlisdiscovery.oregonmetro.gov/?action=viewDetail&layerID=182
fs <- rgdal::readOGR('D:/projects/fire_stations/fire_sta.shp',verbose = FALSE)

# Assign a variable with the proj4 value for WGS84. 
wgs84 <- '+proj=longlat +datum=NAD83 +no_defs +ellps=GRS80 +towgs84=0,0,0'

# Reproject
fs_wgs84 <- sp::spTransform(fs, sp::CRS(wgs84))

# Add it to a leaflet map using "addMarkers" (Markers means points)
leaflet() %>% 
  addTiles() %>% 
  addMarkers(data = fs_wgs84)

This is the result in its simplest form. We can add much more to this leaflet and make it look cleaner with a few more lines:

leaflet() %>% 
  addTiles() %>% 
  addCircleMarkers( # this will change how fire stations are represented on our map
    data = fs_wgs84,
    radius = 6, # Circe size
    color = 'red', # Line color
    fillColor = 'darkred', 
    weight = 1, # Thickness of the line
    fillOpacity = 0.7, # how transparent the fire stations are (between 0 and 1)
    popup = paste0("Station Number: ", fs$STATION) # This creates informational "popup" for our fire stations. It is pulled from the layer attributes.
  )

That is points. We can also map polygons. For this example we’ll use census data. A census API key is required. It is easy to obtain a ccensus API Key. Go to http://api.census.gov/data/key_signup.html, and then supply the key to the census_api_key function (in the tidycensus package we already loaded!) to use it throughout this session. You can get census data for anything anywhere in the US! For this portion, we’re looking at Portland, Oregon. Our census data will be for Multnomah County.

# Get tidycensus data for Multnomah County:
mult <- get_acs(
  geography = "block group", # this is how the data will be relayed
  state = "OR", # this is for Oregon
  variables = "B19049_001E", # this is the code for data attribution. In this case we're looking at median household income (MHI)
  county = "Multnomah", # specify county
  output = "wide",
  geometry = TRUE)

# Convert it to "Spatial" (sp) instead of sf. This is not necessary, and sf is better to work with in a lot of ways. Because this demo is in sp we'll stick with sp.
mult <- as(mult,"Spatial")

Note: When coding a leaflet map in RMarkdown, be sure to avoid using results='hide' in code chunks.

Now let’s add the census data we obtained to a leaflet!

leaflet() %>% 
  addTiles() %>% 
  addPolygons(data=mult)

Again, this is the simplest result, but we should add more to it to make it more informative and pretty! We changed addTiles() to addProviderTiles to chose a different basemap option.

# Leaflet has color ramp creator. We'll use it to make our map look beautiful:
mhi_colors <- leaflet::colorQuantile(
  palette = 'Greens', # This will accept any ColorBrewer color pallette! You can get an idea of how you want to color your map by visiting colorbrewer2.org.
  domain = mult$B19049_001E, # the MHI variable
  n = 9 # Number of breaks (separates into 9 different colors)
)

leaflet() %>% 
  addProviderTiles(providers$Stamen.Toner) %>% # A different basemap option.
  addPolygons(
    data = mult, 
    weight = 0.5, # outline weight of polygons
    color = 'black', # color of outline on polygons
    fillColor = mhi_colors(mult$B19049_001E), # the color ramp we created
    fillOpacity = 0.7,
    popup = paste0(
      "<b>Median Household Income</b>: $", # title of the popup
      prettyNum(mult$B19049_001E, ',') # the comma gets put into the MHI $value to make it easy to interpret.
    )
  )

By adding a new value using the ifelse function it is very easy to look at highest and lowest MHI areas in Portland:

color_new <- ifelse(mult$B19049_001E > 150000, "green", "cornsilk") # block groups with high MHI will be green, else they will have a very pale color (cornsilk)
color_new <- ifelse(mult$B19049_001E < 15000, "red", color_new) # block groups with low MHI will be red, else read the above code

# This is all the same as above, except "fillColor":
leaflet() %>% 
  addTiles('http://{s}.tile.stamen.com/toner-lite/{z}/{x}/{y}.png') %>% # This is a different way to add provider tiles
  addPolygons(
    data = mult, 
    weight = 0.5,
    color = 'black',
    fillColor = color_new, # we changed the way we look at our map
    fillOpacity = 0.7,
    popup = paste0(
      "<b>Median Household Income</b>: $",
      prettyNum(mult$B19049_001E, ',')
    )
  )

Popups provide critical information to maps, but can complicate info quickly. Adding a tiny bit of HTML can help you out:

italics: italics

BOLD: bold

 : line break

Going back to fire station data, we can add a bunch more information in a popup:

# Create a popup
fs_pop <- paste0(
  "<b>Station Number: </b>", fs$STATION, "<br>",
  "<b>Address: </b>", fs$ADDRESS, "<br>",
  "<b>City: </b>", fs$CITY, "<br>",
  "<b>District: </b>", fs$DISTRICT
)
fs_pop[1] # this is shows us a quick example, in the console, of how the popup will look

## [1] "<b>Station Number: </b>50<br><b>Address: </b>12617 SW WALNUT ST<br><b>City: </b>TIGARD<br><b>District: </b>TUALATIN VALLEY FIRE & RESCUE"

This bit of HTML tells browsers how to format the popup

leaflet() %>% 
  addTiles() %>% 
  addMarkers(
    data = fs_wgs84, 
    popup = fs_pop) # this is the new popup. Click around to see the info supplied

There is a TON more to do with leaflet. This basic tutorial should provide enough information to explain how the next section displays spatial autocorrelation!

Local Getis-Ord G Analysis

To understand the Local Getis-Ord statistic, use tidycensus to obtain data. First, use get_acs() to acquire data on percentage of households on food assistance in the Detroit Metro Area at the tract level. This table explains the codes in census data with the information we’re looking for:

variable	code
Total Households	B22010_001E
Number of Houses on Food Assistance	B22010_002E

The Detroit Metro Area will be defined as the following counties: Oakland, Macomb, and Wayne.

These steps will create:

A leaflet web map with results from a spdep::localG() analysis. The map will have:
- a color ramp depicting the autocorrelation test
- popups describing polygon results
I’ll provide a little write up about the results. It is essential we know how to explain our results, what they mean, and what this means in the bigger picture.

Part 1: Leaflet web map with Local Getis-Ord

# Get ACS for total households in detroit.
detroit <- tidycensus::get_acs(
  geography = 'tract',
  state = 'MI',
  county = c('Wayne', 'Oakland', 'Macomb'),# Metro Detroit is in three counties for this example
  variables = c('B22010_001E', 'B22010_002E'), # the codes for total number of households and households on food assistance
  geometry = TRUE,
  output = 'wide', )

## Getting data from the 2013-2017 5-year ACS

## Downloading feature geometry from the Census website.  To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.

We have collected our data using the code above. It is stored as detroit. We’ll transform the projection to map data. We already assigned a proj4 variable: wgs84 <- '+proj=longlat +datum=NAD83 +no_defs +ellps=GRS80 +towgs84=0,0,0'

# Reproject detroit. Notice any difference in the code?
detroit_wgs84 <- st_transform(detroit, CRS(wgs84))

Now it can be mapped:

# Maps tracts in the detroit area:
leaflet() %>% 
  addTiles() %>% 
  addPolygons(data=detroit_wgs84)

Let’s start creating the variables needed to run Getis-Ord. This requires data manipulation, and these simple steps can complete the needed process.

# thh = total households
detroit$thh <- detroit$B22010_001E 

# foods = total households with food assistance
detroit$foods <- detroit$B22010_002E

# Divide these two vectors together to get percentage on stamps
detroit$pct_snap <- detroit$foods/detroit$thh

# Make NAs = 0 (***NOTE: this is NOT a scientifically accurate way to do this. It is just the easiest...
detroit$pct_snap <- ifelse(is.na(detroit$pct_snap), 0 , detroit$pct_snap) 

# All results in the sf object have to be above 0. 
isAboveZero <- as.vector(sf::st_area(detroit)) > 0 

# Checking the result shows one item is NOT above zero...
table(isAboveZero)

## isAboveZero
## FALSE  TRUE 
##     1  1165

# Remove that one result and convert the vector to polygons:
detroit <- as(detroit[isAboveZero,], "Spatial")

# Using Queen Contiguity, reading from the spdep library, set pl(polygons list). This DEFINES THE NEIGHBOR
detroit_nb <- spdep::poly2nb(pl = detroit, queen = TRUE)

# Above code set our neighbors and now this sets a weight for our neighbors. Style = W 
detroit_w <- spdep::nb2listw(neighbours = detroit_nb, style = "W", zero.policy = TRUE)

#now that I have weights and % food assistance I can run local getis-ord at the tract level:
tract_local_g <- as.vector(spdep::localG(x = detroit$pct_snap,listw = detroit_w, zero.policy = TRUE))

# Take a look at some results to get an idea of what was just created (we can look at all the results by typing just tract_local_g):
summary(tract_local_g)

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -3.42154 -1.78094 -0.84754  0.00482  1.87669  6.19447

Local Getis-Ord has returned z-scores between -3.4 and 6.2. This statistic can describe where hot and cold spots cluster. In this case, the statistic describes where percentages of households on SNAP benefits are highest and where they are lowest. Because the result is a z-score it doesn’t clearly describe what is or is not significant. To parse out significance, create a color ramp at the critical values of bounds on normal distribution: anything outside of -1.96 to 1.96.

# For all colors put nothing
color4_foods_z <- NA

# For all z-scores between -1.96 and 1.96 put a boring color. These are not critical.
color4_foods_z <- ifelse(tract_local_g > -1.96 & tract_local_g < 1.96, "gray", color4_foods_z)

# For all z-scores between 1.96 and 2.58 put a color that stands out a bit. These are critical at p=0.05
color4_foods_z <- ifelse(tract_local_g >= 1.96 & tract_local_g < 2.58, "pink", color4_foods_z)

# For all z-scores above 2.58 put a color that REALLY stands out; p = 0.01
color4_foods_z <- ifelse(tract_local_g >= 2.58, "firebrick", color4_foods_z)

# For all z-scores between -1.96 and -2.58 put a color that stands out a bit. These are critical at p=0.05 at the other tail.
color4_foods_z <- ifelse(tract_local_g < -1.96 & tract_local_g > -2.58, 'lightblue', color4_foods_z)

# For all z-scores below -2.58 put a color that REALLY stands out. p = 0.01
color4_foods_z <- ifelse(tract_local_g <= -2.58, "blue", color4_foods_z)

We have a Local Getis-Ord statistic and a color ramp. The next step is to add it to a leaflet.

leaflet() %>% 
  addTiles() %>% 
  addPolygons(
    data = detroit, 
    weight = 1,
    color = 'black',
    fillColor = color4_foods_z, # The color ramp created above
    fillOpacity = 0.7,
    popup = paste0(
      "<b>Households on Food Assistance (%)</b>: ",
      prettyNum(100*detroit$pct_snap)
    )
  )

# For more experience, figure out how to add % in popup (instead of in the title) and reduce decimals

Part 2: what do the results show?

I created a little write-up about the results. It is only meant to demonstrate how the results might tell a story. Many stories can be told from these same results:

The percentage of households on food assistance increases substantially around one spot. The data display major clustering. The location of 8 mile is the southern border of the hot spot (8 mile is an infamous road separating high and low income areas in Detroit). This fits with our visual. It is difficult to tell without further analysis, but there appears to be a density component as well. The cluster of tracts with more food assistance appears to be highly dense which increases the number of households in the area, while tract polygons grow much larger on the fringe. These individual households are likely much larger by area - houses commonly found in suburbs. There is a somewhat large buffer between areas of high food assistance and areas with low food assistance. This city seems to have an extreme income disparity. I’ve heard the vacancy rate in Detroit was one of the highest in the country. I wonder how this might impact the data.

Local Moran’s I (Univariate)

This will repeat the addition of Local Getis-Ord statistic on a leaflet, but instead we’ll use a different statistic: Local Moran’s I.

What’s the difference?

There are similarities and differences between Local Moran’s I and Local Getis-Ord, and it can be complicated to know which to use and when. The biggest difference is when using Local Moran’s I the value of the feature is not included, just neighboring values. For Local Getis-Ord each feature is included in its analysis. For example, say you are looking at your house compared to the houses around your house; for Moran’s I your house would not be included in the analysis but it would impact the results of the houses around you (and vice versa). For Getis-Ord your house is included in the analysis. This is described further here.

To determine which statistic to use in what scenarios it depends on the question one asks over what scale. This demonstration does both to show how to run different statistics and how to add them to a leaflet. It is up to the user to determine which statistic to add in their own analysis.

Products from the addition of Local Moran’s I

A Local Moran’s I scatter plot with clusters colored (eg. High-High, Low-High, etc.).
A leaflet map showing results of a spdep::localG() analysis. This map has:
- color representation for this spatial autocorrelation test (the Local Moran’s I results)
- a popup with basic information about the resulting polygons
Again, I’ll provide a little write up about the results because it is still essential we know how to explain our results, what they mean and what this means in the bigger picture.

Part 1. Make a pretty graph

First, a Local Moran’s I scatter plot provides information about the variable we’re looking at: food assistance in Detroit.

# This runs Local Moran's I using the perent food assistance and list weight created in part one:
tract_local_morans <- as.data.frame(spdep::localmoran(x = detroit$pct_snap, listw = detroit_w))

# Create a lag in the variable "detroit$pct_snap" using the list weight. 
detroit$lag_pct_snap <- spdep::lag.listw(x = detroit_w, var = detroit$pct_snap, zero.policy = TRUE)

# Take a look at a plot to check if things are looking good so far. Yes they are.
spplot(detroit,'pct_snap', main='Percent of households on food assistance in Detroit')

spplot(detroit,'lag_pct_snap', main='Lag of Percent of households on food assistance in Detroit')

# Scale data to make the graph look cleaner
scale_lag_pct_snap <- scale(detroit$lag_pct_snap)  
scale_pct_snap <- scale(detroit$pct_snap) 

# Make a linear model using the unscaled data. The slope of the line is Moran's I!
mod <- lm(detroit$lag_pct_snap ~ detroit$pct_snap)

The Local Moran’s I statistic is reported between -1 and 1, where -1 represents clustered negative values, 0 represents spatial randomness and 1 represents clustered positive values.

# The Global Moran's I value:
coefficients(mod)[2]

## detroit$pct_snap 
##        0.7309537

# Take a closer look at the results
summary(mod)

## 
## Call:
## lm(formula = detroit$lag_pct_snap ~ detroit$pct_snap)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.30523 -0.04309 -0.01142  0.03747  0.51746 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      0.052755   0.003524   14.97   <2e-16 ***
## detroit$pct_snap 0.730954   0.013139   55.63   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.08103 on 1163 degrees of freedom
## Multiple R-squared:  0.7269, Adjusted R-squared:  0.7266 
## F-statistic:  3095 on 1 and 1163 DF,  p-value: < 2.2e-16

The next step builds a color ramp, similar to what we did for Local Getis-Ord. However, this is just for visualization on a graph. We’ll add the color ramp to a leaflet in the next section.

# Set up the color scheme for the symbology on the graph. h = high and l = low, so we can easily graph clusters and see a trend
hh <- as.vector(scale_pct_snap > 0 & scale_lag_pct_snap > 0)
lh <- as.vector(scale_pct_snap < 0 & scale_lag_pct_snap > 0)
ll <- as.vector(scale_pct_snap < 0 & scale_lag_pct_snap < 0)
hl <- as.vector(scale_pct_snap > 0 & scale_lag_pct_snap < 0)

quad_color <- NA
quad_color <- ifelse(hh, 'red', quad_color)
quad_color <- ifelse(lh, 'skyblue', quad_color)
quad_color <- ifelse(ll, 'navy', quad_color)
quad_color <- ifelse(hl, 'orange', quad_color)

Plot the results on a graph

# Main = title, xlab = label on the x axis, pch = plotted symbology, col = color for pch, xlim = length of x axis, ylim = length of yaxis, cex = pch size.
plot(scale_lag_pct_snap ~ scale_pct_snap, 
    main = "Food assistance in Detroit: Lag vs. Estimate", 
    xlab = "Estimated food assistance", 
    ylab = "Lag food assistance", 
    pch = 16, 
    col = quad_color, 
    xlim = c(-2,4), 
    ylim = c(-2,4), 
    cex = 0.7)

# Add the linear model. col = line color, lwd = thickness of line
abline(mod, col = 'maroon', lwd = 1.5)

# Add crosshairs on x =0 and y = 0. h = x axis, v = y axis, col = line color, lwd = line thickness, lty = line type
abline(h = 0, col = 'gray60', lwd = 2, lty = 5)
abline(v = 0, col = 'gray60', lwd = 2, lty = 5)

# Add text for each quadrant and for Morans I (the slope). Place text on the graph using coordinates (2, 3; -1.5, 3; etc.)
text(2, 3, "High-High", cex = 1.5, col = 'royalblue3')
text(-1.5, 3, "Low-High", cex = 1.5, col = 'royalblue3')
text(-1.5, -1.8, "Low-Low", cex = 1.5, col = 'royalblue3')
text(2, -1.8, "High-Low", cex = 1.5, col = 'royalblue3')
text(2.5, -0.5, paste0("Moran's I: ~", round(mod$coefficients[2],3)))

Part 2: Add this to a leaflet web map….

# Re-center a leaflet on detroit (for organizational purposes):
leaflet() %>% 
  addTiles() %>% 
  addPolygons(data = detroit)

# Take a look at Local Moran's I summary:
summary(tract_local_morans)

##        Ii               E.Ii                Var.Ii       
##  Min.   :-2.2588   Min.   :-0.0008591   Min.   :0.06161  
##  1st Qu.: 0.1569   1st Qu.:-0.0008591   1st Qu.:0.12407  
##  Median : 0.5331   Median :-0.0008591   Median :0.14192  
##  Mean   : 0.7310   Mean   :-0.0008591   Mean   :0.16251  
##  3rd Qu.: 0.9101   3rd Qu.:-0.0008591   3rd Qu.:0.16572  
##  Max.   : 5.3230   Max.   :-0.0008591   Max.   :0.99860  
##       Z.Ii           Pr(z > 0)       
##  Min.   :-6.8019   Min.   :0.000000  
##  1st Qu.: 0.3971   1st Qu.:0.009928  
##  Median : 1.2868   Median :0.099085  
##  Mean   : 1.8625   Mean   :0.191358  
##  3rd Qu.: 2.3290   3rd Qu.:0.345632  
##  Max.   :14.2847   Max.   :1.000000

# Copy the color ramp from the graph above and add code for insignificant values.
hh <- as.vector(scale_pct_snap > 0 & scale_lag_pct_snap > 0)
lh <- as.vector(scale_pct_snap < 0 & scale_lag_pct_snap > 0)
ll <- as.vector(scale_pct_snap < 0 & scale_lag_pct_snap < 0)
hl <- as.vector(scale_pct_snap > 0 & scale_lag_pct_snap < 0)

quad_color <- NA
quad_color <- ifelse(hh, 'red', quad_color)
quad_color <- ifelse(lh, 'skyblue', quad_color)
quad_color <- ifelse(ll, 'navy', quad_color)
quad_color <- ifelse(hl, 'orange', quad_color)
quad_color <- ifelse(tract_local_morans[5] > 0.05, 'snow', quad_color)


# Get MHI data for detroit so we can put it in a pop up
detroit_mhi <- tidycensus::get_acs(
  geography = 'tract',
  state = 'MI',
  county = c('Wayne', 'Oakland', 'Macomb'),# Detroit is in three counties
  variables = c('B19049_001E'), # Code for median household income
  geometry = TRUE,
  output = 'wide')

## Getting data from the 2013-2017 5-year ACS

## Downloading feature geometry from the Census website.  To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.

# Adds MHI column to our dataset
detroit_mhi$mhi <- detroit_mhi$B19049_001E

This should set up everything to put our data onto a leaflet!

# Add color ramp to the map along with a pop up stating MHI.
leaflet() %>% 
  addTiles() %>% 
  addPolygons(
  data = detroit, 
  weight = 1,
  color = 'black',
  fillColor = quad_color, # Created color ramp
  fillOpacity = 0.7,
  popup = paste0(
      "<b>Median Household Income ($) </b>: ",
      prettyNum(detroit_mhi$mhi, ',')
    )
  )

Part 3. Brief story about these results:

The biggest difference I see between the Getis-Ord leaflet and the Morans I leaflet is that none of the middling areas (I made them light blue or orange) are significant. The final map for Morans I is just red, blue or white with all other tracts showing no significance. It’s hard for me to tell if I did something wrong or if these results are accurate…other than that, I see very little difference between the leaflets for part 1 and part 2. The scaled Morans I plot shows lots of clustering in the low-low quadrant which seems to suggest high-income people live near high-income people less frequently than low-income people living near low-income people. As noted before, this is likely tied to the size of the tracts: tracts get bigger moving out from the center of the city.

I added median household income. Looking at median household income might provide a glimpse into the misuse of government assistance. In this case, I’m not seeing tracts with high income and high percentages of food assistance. Other interesting variables might include access to health insurance (or lack thereof), average household price, and it would be really interesting to determine average food prices in red areas compared to blue areas. Who is paying more for food overall? Is it easier to increase food prices for high-income areas because they are more willing to pay more for increased quality? Or can low-income areas afford high quality food because they benefit from a system that can provide access to the same goods high-income households have access to?

Local Spatial Autocorrelation and Leaflet

Spatial Quantitative Analysis

February 6th, 2019