Geospatial data refers to data which has a geographic component in it. Usually this means that the data entries are associated to some point on the surface of the earth. Therefore, geospatial data are usually visualized on maps.

Because the earth is round, in order to make a flat map,
one must transform the earth's surface to a flat surface. Such transformations are called *projections* by cartographers (not to be confused with linear projections from linear algebra). Mathematicians know that a transformation between topologically different regions
must be discontinous somewhere. So these projections, while very useful, cannot be perfect replicas of reality. It is useful to know this and a bit more about python modules for projections while attempting to visualize geospatial data on the globe.

Many references, including
Geographic Data with Basemap of [JV-H], use the python module `basemap`

. However in recent years, the module basemap has been deprecated in favor of the new python mapping project called Cartopy. Therefore, this activity aims at taking a quick look at cartopy. Cartopy, together with geopandas, a package built on top of pandas, shows promise in easing geospatial visualization. They are nonetheless relatively new efforts. You will notice that their documentation, while constantly improving, is not as mature as, say numpy. There will likely be a number of changes in the provided facilities as these efforts take hold. Our goal in this activity is to get acquainted with these new developments for visualizing geospatial data.

In [1]:

```
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import pandas as pd
import geopandas as gpd
from cartopy import crs
```

The `GeoDataFrame`

class of `geopandas`

is a `pandas`

data frame with a special column representing geometry. This column is a `GeoSeries`

object, which may be viewed as a `pandas`

series where each entry is a set of shapes. The shapes are geometric objects like a a set of points, lines, a single polygon, or many polygons. These shapes are objects made using the shapely package. Together they allow easy interaction with `matplotlib`

for plotting geospatial data.

Here is a `GeoDataFrame`

object we have already used in 01 Overview:

In [2]:

```
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
type(world)
```

Out[2]:

The above-mentioned special column of the `world`

data frame is this.

In [3]:

```
type(world.geometry)
```

Out[3]:

A `GeoDataFrame`

can have many columns of type `GeoSeries`

, but there can only be one *active* geometry column, which is what is returned by the attribute `GeoDataFrame.geometry`

. Note also that the name of this active `GeoSeries`

can be anything (not necessarily `geometry`

), but in the `world`

object, the column happens to be called `geometry`

, as can be seen below.

In [4]:

```
world.geometry.name
```

Out[4]:

In [5]:

```
world.head()
```

Out[5]:

The `plot`

method of the data frame is redefined in a `GeoDataFrame`

to use the geometry objects in the active geometry column. So to plot this map, all we have to do is use the `plot`

method:

In [6]:

```
world.plot()
```

Out[6]:

A `GeoDataFrame`

has more attributes than a regular `pandas`

data frame. For example, it stores the centroids of the shapes in the active geometry column.

In [7]:

```
type(world.centroid)
```

Out[7]:

This is a `GeoSeries`

that did not show up when we queried `world.head()`

, but it is an attribute of `world`

. We can, of course, make it an additional column of `world`

by in the usual pandas way.

In [8]:

```
world['centroids'] = world.centroid
world.head()
```

Out[8]:

Now, `world`

has two `GeoSeries`

columns. If we make the `centroids`

column the *active* geometry column, then the output of the `plot`

method changes since it uses the active column's geometry specifications.

In [9]:

```
world = world.set_geometry('centroids') # change the active geometry column
world.plot();
```

An essential data structure of cartopy is CRS, or *Coordinate Reference Systems,* the name used by cartopy (and other python projects) for the projections used in maps. A CRS often used as default is the the Plate Carrée projection, which cartopy provides as follows.

In [10]:

```
crs.PlateCarree()
```

Out[10]:

As you guessed from the above output, the CRS object is able to plot itself using matplotlib. This points to one avenue for visualizing geospatial data that has no need for geopandas. Cartopy produces a matplotlib axis on which you can overlay your data as you see fit: if your data has latitude and longitude associated to it, cartopy can apply the relevant projection automatically to place it at the proper place in the figure. Below, we will focus on alternative visualization methods using geopandas and facilities to interact between cartopy and geopandas.

The `world`

object of class `GeoDataFrame`

comes with a CRS attribute, another attribute that does not show up in `world.head()`

output.

In [11]:

```
world.crs
```

Out[11]:

We see that the CRS above is codenamed EPSG:4326. Here EPSG stands for European Petroleum Survey Group. They maintain several data sets at https://spatialreference.org/ref/epsg/. Each set in “EPSG Geodetic Parameter Dataset is a collection of definitions of coordinate reference systems and coordinate transformations which may be global, regional, national or local in application.”

EPSG-number 4326 that we have here belongs to the WGS 84 coordinate system,
the latest in the
World Geodetic System (WGS)
where the earth is represented as an oblate spheroid with coordinates in
decimal degrees (`Lat, Lon`

).

Changing the active geometry from previously set `centroids`

to the original `geometry`

column, let us plot the `world`

again in order to compare the result with another CRS.

In [12]:

```
world = world.set_geometry('geometry') # set the active geometry
world.plot(); plt.title('World in WGS 84 CRS');
```

We compare this output with another very well-known projection, the Mercator projection, which has the nice property that it is conformal, i.e., it preserves angles. EPSG has made available a *Web Mercator* projection. We can easily
convert `world`

from the WGS 84 to the Mercator CRS:

In [13]:

```
world_Mercator = world.to_crs("EPSG:3395")
world_Mercator.plot();
plt.title('World in Mercator CRS');
```

Generations of school kids have grown up with this Mercator map.
Note how
the Mercator projection distorts the size of objects as the latitude increases from the equator to the poles.
Some countries near equator look smaller than they really are. For example, Brazil is actually *larger* than the contiguous United States, but it looks smaller in the Mercator projection. If you have time for a digression, have a look into the many discussions online, e.g., The Problem With Our Maps,
on how the false sizes on maps (perhaps inadvertently) shape our views on countries.

The *Azimuthal Equidistant* and *Albers Equal Area* coordinate reference systems show areas of the globe in better proportions than the Mercator projection, as their names suggest. These are implemented in cartopy. We want to leverage geopandas' ability to work with cartopy in the next step. We don't always get meaningful plots after an arbitrary CRS to CRS conversion, but what is officially possible is laid out in the current geopandas documentation, which would be a good source to check back on for future changes.

First, for conversion from the default WGS 84 to the azimuthal equidistant CRS, we create a cartopy CRS object.

In [14]:

```
ae = crs.AzimuthalEquidistant()
type(ae)
```

Out[14]:

Then we convert the cartopy object to a form usable by geopandas through an intermediate step. That step offers a glimpse of what lies at the core of many of the mapping tools, the PROJ project. (All these dependencies made installing geopandas more involved, as you recall from our early install sessions.)

In [15]:

```
aeproj4 = ae.proj4_init # Convert to`proj4` string/dict usable in gpd
world_ae = world.to_crs(aeproj4) # Then call to_crs method
world_ae.plot()
```

Out[15]:

This represents the geopandas object `world_ae`

, which is the `world`

object whose geometry has been converted to the azimuthal equidistant CRS. From the above output, it we see that the azimuthal equidistant projection shows the central view in good proportions, with obvious distortions farther out although the distortions are evocative of the globe. However, this CRS is often difficult to use for showing data that is spread over the populous parts of the globe. (You can change the central view as shown in the next cell.) See, for example, how difficult it is to get the far east, the west, and Europe, together in any perspective, due to the vastness of the intermediate Pacific ocean.

In [16]:

```
crs.AzimuthalEquidistant(central_longitude=200, central_latitude=10)
```

Out[16]:

It is therefore useful to get to know another projection from `cartopy`

called `AlbersEqualArea`

projection.

In [17]:

```
aea = crs.AlbersEqualArea()
aea
```

Out[17]:

Finally, as an illustration of how to plot geopandas geometries into an axis generated by cartopy, we will convert or project the existing geometry objects in `world_ae`

to `AlbersEqualArea`

CRS as shown below.

In [18]:

```
aea_geo = [aea.project_geometry(ii, src_crs=ae)
for ii in world_ae['geometry'].values]
```

Since `cartopy`

works directly with `matplotlib`

, we can immediately render the resulting geometries in matplotlib's axis.

In [19]:

```
fig, ax = plt.subplots(subplot_kw={'projection': aea})
ax.add_geometries(aea_geo, crs=aea);
```

We can alternately produce essentially the same plot using geopandas as follows.

In [20]:

```
gpd.GeoDataFrame(world, geometry=aea_geo, crs=aea.proj4_init).plot();
```

As an application of the above-discussed geospatial visualization techniques, we will now make a map of COVID-19 cases throughout the world using the `AlbersEqualArea`

coordinate reference system.

In [21]:

```
import os
from git import Repo
```

First update your data folder by pulling the newest reports of COVID-19 from the GitHub repository maintained by Johns Hopkins' researchers.

In [22]:

```
covidfolder = '../../data_external/covid19'
if os.path.isdir(covidfolder): # if repo exists, pull newest data
repo = Repo(covidfolder)
repo.remotes.origin.pull()
else: # otherwise, clone from remote
repo = Repo.clone_from('https://github.com/CSSEGISandData/COVID-19.git',
covidfolder)
datadir = repo.working_dir + '/csse_covid_19_data/csse_covid_19_time_series'
f = datadir + '/time_series_covid19_confirmed_global.csv'
```

In [23]:

```
c = pd.read_csv(os.path.abspath(f))
c = c.rename(columns={'Country/Region': 'country'}).iloc[:, 1:]
c.head()
```

Out[23]:

Some countries are repeated (as their provinces are being counted in separate data entries), as can be seen from the following nonzero difference:

In [24]:

```
len(c['country']) - len(set(c['country']))
```

Out[24]:

Therefore, we sum up each country's confirmed COVID-19 counts for each day using a `pandas`

groupby operation. Of course, it doesn't make sense to sum up the latitudes and longitudes, so we take their average values within countries. (Taking the mean of latitudes and longitudes will do for our current purposes, but it is definitely not a perfect solution, which is why I'll be asking you to improve on it by making a chloropleth map in the next assignment.)

In [25]:

```
cg = c.groupby('country')[c.columns[3:]].sum()
cg['Lat'] = c.groupby('country')['Lat'].mean()
cg['Long'] = c.groupby('country')['Long'].mean()
```

This newly created data frame has no `GeoSeries`

. We use the latitude and longitude information to create point geometries. With this information, we can make a `GeoDataFrame`

.

In [26]:

```
geo = gpd.points_from_xy(cg['Long'], cg['Lat'])
c_aea_geo = [aea.project_geometry(ii) for ii in geo]
cg = gpd.GeoDataFrame(cg, geometry=c_aea_geo, crs=aea.proj4_init)
```

Now, in `cg`

, we have a `GeoDataFrame`

object that should know how to plot its data columns in the data's associated places on the globe.

In [27]:

```
def covidworldmap(date):
fig, ax = plt.subplots(figsize=(12, 10))
# put the world map on an axis
w = gpd.GeoDataFrame(world, geometry=aea_geo, crs=aea.proj4_init)
w.plot(ax=ax, color='midnightblue', edgecolor='darkslategray')
ax.set_facecolor('dimgray')
mx = cg.iloc[:, :-3].max().max() # get max across data
# set marker sizes, with a min marker size for cases > 1000
msz = 500 * np.where(cg[date]-1000, np.maximum(cg[date]/mx, 0.001), 0)
cg.plot(ax=ax, cmap='Wistia', markersize=msz, alpha=0.5)
ax.set_xticks([]) # remove axis marks
ax.set_yticks([]);
```

In [28]:

```
covidworldmap('5/5/20')
```