The SEIR model of infectious diseases

April 22, 2020

Recent news of COVID-19 has brought to our attention the stories of the many earlier pandemics the world has seen. A classic case is a strain of influenza that invaded New York City during 1968-1969, then dubbed the Hong Kong flu. The following data (from [DM]) shows the number of deaths that winter in New York City believed to be due to this flu.

In [1]:
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn; seaborn.set();,14), [14,28,50,66,156,190,156,108,68,77,33,65,24])
plt.xlabel('Week'); plt.ylabel('Excess deaths');
plt.xticks(range(1,14)); plt.title('1968-69 Influenza in New York City');

Notice how the data from Week 1 to Week 13 roughly fits into a bell-shaped curve. You have, by now, no doubt heard enough times that we all must do our part to flatten the curve. The bell-shaped curve, which has been identified in many disease progressions, is the curve we want to flatten. Some mathematical models of epidemic evolution, for instance the well-known "SIR model" discussed in [DM], produces such bell curves. Flattening the curve can then be interpreted as bringing relevant model parameters into a range that produces a shallow bell.

Mathematical models are often used as tools for prediction. However, we should be wary that models only approximate a few features of reality, and only when realistic parameter values (which are often missing) are supplied. Yet, as the saying goes, "All models are wrong, but some are useful." Even if a model is far away from the "truth", the "whole truth", it helps us understand the process being modeled by revealing the consequences of various hypotheses. Hence mathematical models are key instruments of computational thinking.

In this activity, we will study a mathematical model called the SEIR model of infectious disease progression. In the last few weeks, many researchers have been furiously working to fit the emerging COVID-19 data into variants of the SEIR model. A number of contributions can be viewed at the Bulletin of World Health Organization (WHO) which now maintains a special COVID-19 Open archive.

A number that emerges from models like the SIR or the SEIR model, called $R_0$, or the basic reproduction number often makes its appearance in popular science. It is even explained in a film from 2011 called the Contagion), which has now gained in popularity in view of its almost prescient plot. The epidemiological definition of $R_0$ is the average number of secondary cases produced by one infected individual introduced into a population entirely of susceptible individuals. One suspects from this definition that if $R_0>1$, then there will be an epidemic outbreak. We will see that this number also naturally emerges from a mathematical model. A quite readable review of $R_0$ (written before the COVID-19 pandemic) gives an $R_0$ of 14.5 for a measles outbreak in Ghana in the sixties. By all current accounts, the $R_0$ for COVID-19 appears to be between 2 and 3.

Construction of the SEIR model

The SEIR model divides the population into four categories, called "S", "E", "I", and "R".

  • Category "S" consists of individuals who are susceptible to the disease being modeled.
  • Category "E" consists of individuals who are exposed to the disease. Diseases (like COVID-19) often have an incubation period or a latency period and this category accommodates it. (The SIR model does not have this category.)
  • Category "I" consists of individuals infected with the disease and are capable of infecting others.
  • Category "R" consists of individuals who can be removed from the system, e.g., because they have gained immunity to the disease, or because they have succumbed to the disease.


The model then postulates rules on how populations in each category can move to other categories. Let us consider the following simple set of rules.

  • Assume that individuals move from S to E at the exposure rate $\lambda$, i.e., the population in category S decreases with respect to time $t$ at the rate $\lambda \times S$ and the population in E correspondingly increases at the same rate: $$ \begin{aligned} \frac{dS}{dt} & = -\lambda S + \cdots \\ \frac{dE}{dt} & = +\lambda S + \cdots \end{aligned} $$ where "$\cdots$" serves to remind us that there may be other unmodeled factors. In this discussion, the number of individuals in each category (S, E, etc.) is denoted in italic type by the same letter ($S, E$ etc.).

  • The exposure rate $\lambda$ should grow with $I$, the number of infected individuals. A standard hypothesis is that $\lambda$ is the product of the transmission rate (or the rate of contact) $\beta$ and the probability of infection given that contact occurred, which is $I / N$ in a total population of $N$ individuals, i.e.,

$$ \lambda = \frac{\beta \, I}{N}. $$
  • The incubation rate $\sigma$ is the rate at which exposed hosts become infected, i.e.,
\begin{alignat*}{4} \frac{dE}{dt} & = +\lambda S &- \sigma E + \cdots \\ \frac{dI}{dt} & = & + \sigma E + \cdots \end{alignat*}
  • The recovery rate $\gamma$ is the rate at which infected individuals move to the R category:
\begin{alignat*}{4} \frac{dI}{dt} & = + \sigma E & - \gamma I + \cdots \\ \frac{dR}{dt} & = & + \gamma I + \cdots \end{alignat*}

Collecting the above-derived equations (and omitting the unknown/unmodeled "$\cdots$"), we have the following basic SEIR model system:

$$ \begin{aligned} \frac{dS}{dt} & = -\frac{\beta \, I}{N} S, \\ \frac{dE}{dt} & = \frac{\beta \, I}{N} S - \sigma E, \\ \frac{dI}{dt} & = \sigma E - \gamma I \\ \frac{dR}{dt} & = \gamma I \end{aligned} $$

The three critical parameters in the model are $\beta, \sigma,$ and $\gamma$.

Note that we have left several features unmodeled: exposed individuals in "E" might contribute to $\lambda$ to spread the infection; some exposed individuals in "E" might move directly to the "R" category; some infected individuals in category "I" might not gain perfect immunity and so may move back to susceptible category "S." Despite these limitations, even this basic SEIR model can provide some useful insights on the disease evolution.

Initial value problem

A system of ordinary differential equations (ODE) like the above, together with some initial conditions (values of the variables of the model at initial starting time, say $t=0$), make up an initial value problem, or IVP. IVPs are ubiquitous in modeling systems that evolve in time. They encapsulate how a future state of a system is determined by the present state (the initial data) plus certain rules on how quantities evolve (the ODEs).

Before we talk about a python module to numerically solve an IVP, let us make a simplification. The total population $N$ in the system (the sum of individuals in all categories) is likely to be a huge number. Instead of working with such large numbers, let us divide each side of each equation by $N$ and work instead with the proportions $$ s = \frac{S}{N}, \quad e = \frac{E}{N}, \quad i = \frac{I}{N}, \quad r =\frac{R}{N}. $$

The equivalent ODE system to be solved for the unknown functions $s(t), e(t), i(t),$ and $r(t)$, has now become $$ \begin{aligned} \frac{ds}{dt} & = -\beta \, i \, s , \\ \frac{de}{dt} & = \beta \, i \, s - \sigma \,e, \\ \frac{di}{dt} & = \sigma \, e - \gamma\, i \\ \frac{dr}{dt} & = \gamma \, i. \end{aligned} $$ When supplemented with some initial conditions, say $$ s(0) = 0.99, \quad e(0) = 0.01, \quad i(0) = 0, \quad r(0) = 0, $$ we have completed our formulation of the IVP to be solved. Note that the above initial conditions correspond to a starting scenario where just 1% of the population is exposed.

Solving the IVP using scipy module

In [2]:
from scipy.integrate import solve_ivp
import numpy as np

Scipy's integrate module provides a solve_ivp facility for solving IVPs like the above. The facility assumes you have an IVP of the form $$ \begin{aligned} \frac{d\vec{Y}}{dt} &= \vec{f} (t, \vec{Y}), && \qquad t_0 \le t \le t_1, \\ \vec{Y}(t_0) & = \vec{Y}_0, && \qquad t = t_0, \end{aligned} $$ where you know the function $\vec{f}: [t_0, t_1] \times \mathbb{R}^n \to \mathbb{R}^n$ and the initial data $\vec{Y}_0$. It can then compute an approximation of the solution $\vec{Y}(t)$ for $t$ in the interval $[t_0, t_1]$ using numerical ODE solvers that you might learn about if you take a numerical analysis course. Type in help(solve_ivp) into a cell to get more information on how to use this function.

Let us apply this to the SEIR model. To fit to the setting required for solve_ivp, we put $$ \vec Y = \begin{bmatrix} s \\ e \\ i \\ r \end{bmatrix} $$ and $$ \vec{f} (t, \vec{Y}) = \begin{bmatrix} -\beta \, i \, s , \\ \beta \, i \, s - \sigma \,e, \\ \sigma \, e - \gamma\, i \\ \gamma \, i \end{bmatrix}. $$ We have to give this $\vec f$ as a function argument to solve_ivp. Let's first define this $\vec f$, called seir_f in the code below, keeping in mind that we also need to provide some values of $\beta, \sigma,$ and $\gamma$ before we can solve it. We pass these values as additional arguments to seir_f.

In [3]:
def seir_f(t, y, beta, sigma, gamma):
    s, e, i, r = y
    return np.array([-beta * i * s,
                     -sigma * e + beta * i * s, 
                     -gamma * i + sigma * e, 
                     gamma * i])
In [4]:
# try some parameter values
beta = 1
sigma = 1
gamma = 0.1

Following the documentation from help(solve_ivp) we now proceed to solve by calling solve_ivp as follows.

In [5]:
sol = solve_ivp(seir_f, [0, 60], [0.99, 0.01, 0, 0], 
                rtol=1e-6, args=(beta, sigma, gamma))

Examining the resulting solution object sol you will notice that it has a numpy array as its data member sol.y containing the values of the computed solution $\vec Y (t)$ at values of t contained in another data member sol.t. We can easily send these arrays to the matplotlib module to get a plot of the solution.

In [6]:
fig = plt.figure(); ax = fig.gca()
curves = ax.plot(sol.t, sol.y.T)
ax.legend(curves, ['S', 'E', 'I', 'R']);

As you can see, even with 1% exposed population, the number of infections rapidly rise. However, with more time, they begin to fall, making for a bell-shaped curve, like the one from the previously mentioned New York City data.

Parameter study

Having a function to compute and plot $E$ and $I$ together makes it easy to study the variations in solutions with respect to the three parameters. Let's make such a function by putting together the previous steps.

In [7]:
def plot_ei(beta=1, sigma=1, gamma=0.1, s0=0.99,
            e0=0.01, i0=0, r0=0, t1=60):
    # apply ODE solver 
    sol = solve_ivp(seir_f, [0, t1], [s0, e0, i0, r0], rtol=1e-7,
                    args=(beta, sigma, gamma))    
    # plot I and E components 
    fig = plt.figure(); ax = fig.gca()
    ax.plot(sol.t, sol.y[1, :].T, color='brown',
            linestyle='dashed', label='Exposed')
    ax.plot(sol.t, sol.y[2, :].T, color='red', label='Infected')
    return ax
In [8]:
plot_ei();   # baseline with the default  parameters above
In [9]:
plot_ei(beta=0.5);    # what happens if beta is reduced?  
In [10]:
plot_ei(gamma=0.5);   # what happens if gamma is increased?  
In [11]:
plot_ei(sigma=0.1);   # what's the effect of sigma? 


In the study of evolution of dynamical systems like the SEIR model, equilibria play an important role. An equilibrium state is a value of the vector $\vec{Y}$ (i.e., values of $s, e, i,$ and $r$) for which the rate of change $d\vec{Y}/dt = 0$, i.e., if the system happens to enter an exact equilibrium, then it no longer changes.

For our SEIR system, an equilibrium state $s, e, i, r$ satisfies $$ \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} -\beta \, i \, s , \\ \beta \, i \, s - \sigma \,e, \\ \sigma \, e - \gamma\, i \\ \gamma \, i \end{bmatrix}. $$ You should be able to conclude (exercise!) that the only solutions for this system are of the form $$ s \equiv constant, \quad e = i = 0, \qquad r \equiv constant. $$ In other words, since $e=i=0$, all equilibria of our model are disease-free equilibria. This matches our previous observations from our simulations. After a transitional phase, where $i$ and $e$ increases and decreases per the bell-curve, the system settles into an equilibrium of the form above.

There are other scenarios where an infection persists and never quite disappears from the population. Such equilibria where the disease is endemic are sometimes called endemic equilibria.

As an example, suppose our model represents a city's population, and suppose travel into and out of the city is allowed. Then we must add terms that represent the influx of travelers in each category (the number of people entering minus the number of people leaving). Even if we assume that infected people do not travel, a small influx into susceptible category S and exposed category E will disturb the disease-free equilibrium of our model. Let us add terms a and b representing these influxes and see what happens.

In [12]:
def seir_f2(t, y, beta, sigma, gamma, a, b):
    s, e, i, r = y
    return np.array([-beta * i * s + a,
                     -sigma * e + beta * i * s  + b, 
                     -gamma * i + sigma * e, 
                     gamma * i - (a + b)])

def plot_ei2(beta=1, sigma=1, gamma=0.1, a=0.005, b=0.001, t1=150):
    sol = solve_ivp(seir_f2, [0, t1], [0.99, 0.01, 0, 0], rtol=1e-7,
                    args=(beta, sigma, gamma, a, b))
    fig = plt.figure(); ax = fig.gca()
    ax.plot(sol.t, sol.y[1, :].T, color='brown', linestyle='dashed', label='Exposed')
    ax.plot(sol.t, sol.y[2, :].T, color='red', label='Infected')
In [13]:
plot_ei2(a=0.005, b=0.001)

As you can see from this output, the percentage of the population with the disease now remains at around 5% and never quite vanishes, an example of an endemic equilibrium.

The emergence of $R_0$

The stability of equilibria is another important consideration in the study of dynamical systems. Loosely speaking, an equilibrium is considered stable if a solution, when perturbed from the equilibrium, moves back to it over time. Returning to our simple model $$ \frac{d}{dt} \begin{bmatrix} s \\ e \\ i \\ r \end{bmatrix} = \begin{bmatrix} -\beta \, i \, s , \\ \beta \, i \, s - \sigma \,e, \\ \sigma \, e - \gamma\, i \\ \gamma \, i \end{bmatrix} $$ suppose we want to guess the stability of one of the previously discussed disease-free equilibrium states, $$ s = s_0, \quad e = i = 0, \quad r = r_0. $$ where $s_0$ and $r_0$ are some constants. Adding the $e$ and the $i$ equations, we observe that

$$ \begin{aligned} \frac{d (e + i)}{ d t} & = (\beta \, s - \gamma)\, i. \end{aligned} $$

Thus, despite a perturbation brought about by a small surge in the infected population (resulting in a small positive $i$ value), if the above derivative is negative, i.e., if $$ \beta \, s_0 - \gamma <0, $$ then, the value of $e+i$ will decrease to its equilibrium value. This simple argument already hints at the relevance of the number $$ R_0 = \frac{\beta}{\gamma}s_0, $$ which is the basic reproductive number for this model. In some texts, $R_0$ is defined (to match the epidemiological definition) using an initial population that is 100% susceptible, in which case $s_0=1$ and $R_0 = \beta / \gamma.$

The argument sketched above was not a proof that the system will return to the disease-free equilibrium, but rather a sketch to show you why $R_0$ naturally emerges from the model, using only the calculus tools you have already studied. (Nonetheless, one can indeed rigorously prove that if $R_0 < 1$, the disease-free equilibrium is stable, see e.g., [HSW].)

The effect of $R_0$: outbreak or no outbreak

The simple argument sketched above shows that if $R_0 = \beta s_0/ \gamma >1$ then $e+i$ will increase (at least for some time), while if $R_0 <1$, then $e+i$ will decrease. Let us return to the code and examine the results from the model to see if there is agreement.

Here is an example where $R_0 = \beta s_0/ \gamma < 1$.

In [14]:
plot_ei(beta=0.6, gamma=1, s0=0.9, i0=0.1);

Clearly, the infected population, despite a positive bump in infections, decays as seen below. In other words, when $R_0<1$ there is no outbreak.

Next, consider an example where $R_0 >1$.

In [15]:
plot_ei(beta=1, gamma=0.5, s0=0.9, i0=0.1);

This output clearly shows that the small percentage of introduced infections rapidly increase. (If you worry about the small initial dip in $i$ observed in the output, do please try to plot $e+i$ to convince yourselves.) The system eventually goes on to attain (the unique) disease-free equilibrium, but only after inflicting some damage. Summarizing, when $R_0 >1,$ we can expect an epidemic outbreak.

Regarding the impact of vaccinations (provided a vaccine exists) the model does have something to say. If a fraction, say $v$, of the population is vaccinated, then that population moves directly from the S category to the R category. Therefore, changing $s_0$ to $s_0(1-v)$, we see that $R_0$ reduces to $R_0 = \beta s_0 (1- v) / \gamma$. A vaccine would therefore be effective to prevent an epidemic outbreak if enough people are vaccinated, i.e., if $v$ is sufficiently large in order to bring $R_0$ under 1.

Application to COVID-19

There are a number of difficulties in applying the SEIR model to the COVID-19 epidemic we are now facing. One difficulty is in applying it to a single country: we would have to carefully develop terms that model inflow due to travel to or from the country. Of course, this problem disappears if we consider the entire world as our system. But other problems remain. As you must have heard in the news, we now suspect that exposed individuals in E category, who are not symptomatic, might be spreading the infection (i.e., $\lambda$ might depend not only on $I$, but also on $E$). Our model does not take this into account. Although we can easily add terms to model this, without accurate testing of both symptomatic and asymptomatic populations, it is impossible to conclude the required parameter values. Notwithstanding these (significant) limitations, we can forge ahead to see what a simulation would give us, provided we can gather some data on the remaining parameter values.

A recent submission to the Bulletin of WHO uses an $R_0$ of $2.2$, which was reported in an earlier paper, published in January 2020, in the New England Journal of Medicine. Other reported values now found in the internet seem to be higher. (Inexact parameter values are indeed one of the difficulties in dealing with real-word problems.) Nonetheless, let us continue with $R_0 = \beta/\gamma = 2.2$. Let us also use the values of $\sigma$ and $\gamma$ that others have used:

  • $\sigma = 1/5.2$ $\text{days}^{-1}$,
  • $\gamma = 1/2.3$ $\text{days}^{-1}$,
  • $\beta = R_0 \gamma = 2.2 \gamma.$

Finally, let us additionally assume a scenario where 0.02% of the world's population is infected at the start of the simulation, and thrice that many are exposed. (The current number of active infections worldwide appear to be around 0.02% of the world's population.) Here are the outputs of the simulation under these values.

In [16]:
ax = plot_ei(beta=2.2/2.3, sigma=1/5.2, gamma=1/2.3,
             i0=0.02/100, e0=3*0.02/100, t1=100)
ax.set_xlabel('days'); ax.set_ylabel('population fraction');

These output curves seem to suggest that the infection will proceed well into the next two months before subsiding.

The social distancing and other governmental measures that we are now practicing can be viewed from the perspective of this simple SEIR model. They are designed to reduce the transmission rate $\beta$. Please go ahead and experiment to see what you get with lower $\beta$ values that you can imagine.

You will see that lowering $\beta$ by a little has two effects:

  • it reduces the peak value of the curves (multiply the percentage value by the world's population $\approx$ 7.5 billion, to see the effect in terms of the reduction in number of people affected), and

  • it moves the location of the infection peak farther out in time (i.e., the infection persists longer but in lower numbers).

On the other hand, lowering $\beta$ by a lot (enough to make $R_0<1$) will take you to a regime where $e+i$ decreases, indicating the other side of the peak, where we really want the world to be as soon as possible.