Short-term forecasts of COVID-19 deaths in multiple countries

Introduction

As of 30th May 2022, more than 526,100,000 cases of COVID-19 (including potential re-infections) have been reported across the world, with more than 6,286,000 deaths (1).

This weekly report presents forecasts of the reported number of deaths in the week ahead for 35 countries with active transmission.

The accuracy of these forecasts varies with the quality of surveillance and reporting in each country. We use the reported number of deaths due to COVID-19 to make these short-term forecasts as these are likely more reliable and stable over time than reported cases. In countries with poor reporting of deaths, these forecasts will likely represent an under-estimate, while the forecasts for countries with few deaths might be unreliable. If there are reporting delays, prominent outliers, or infrequent reporting, then we exclude those countries from the analysis.

Note that the results presented in this report do not explicitly model the various interventions and control efforts put in place by countries. Our estimates of transmissibility reflect the epidemiological situation at the time of the infection of COVID-19 fatalities. Therefore, the impact of controls on estimated transmissibility will be quantifiable with a delay between transmission and death.

For short-term forecasts in low-and-middle-income countries using models explicitly accounting for interventions, see here.

Figure 1. (A) The reported number of deaths due to COVID-19 in Africa, Asia, Europe, North & Central America, Oceania, and South America. (B) The number of countries with active transmission (at least 100 deaths reported, and at least ten deaths observed in each of the past two weeks) in Africa, Asia, Europe, North & Central America, Oceania, and South America.

Objectives and Caveats

The main objective in this report is to produce forecasts of the number of deaths in the week ahead for each country with active transmission.

  • We define a country as having active transmission if at least 100 deaths have been reported in a country so far, and at least ten deaths were observed in the country in each of the past two weeks. For the week starting 30th May 2022, the number of countries/regions included based on these thresholds is 35.

  • We forecast the number of potential deaths as the reporting of deaths is likely to be more reliable and stable over time than the reporting of cases.

  • As we are forecasting deaths, the latest estimates of transmissibility reflect the epidemiological situation at the time of the infection of COVID-19 fatalities. Therefore, the impact of controls on estimated transmissibility will be quantifiable with a delay between transmission and death.

Methods

We define a country to have active transmission if

  • at least 100 deaths have been observed in the country so far; and
  • at least ten deaths were observed in the country in the last two consecutive weeks.
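The two thresholds above can be sketched as a simple check on a country's daily death series. This is a minimal illustration only: the function name and the assumption that the input is a list of daily counts (oldest first) are ours and are not taken from the analysis code.

```python
def has_active_transmission(daily_deaths, total_threshold=100, weekly_threshold=10):
    """Return True if a country meets both thresholds.

    daily_deaths: list of daily reported deaths, oldest first (assumed format).
    """
    if len(daily_deaths) < 14:
        return False  # fewer than two full weeks of data
    cumulative = sum(daily_deaths)
    last_week = sum(daily_deaths[-7:])
    previous_week = sum(daily_deaths[-14:-7])
    return (cumulative >= total_threshold
            and last_week >= weekly_threshold
            and previous_week >= weekly_threshold)
```

For example, a series with 20 deaths a day for ten days followed by a week with no deaths fails the check even though the cumulative total exceeds 100.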

We intend to produce forecasts every week, for the week ahead. Ensemble forecasts are produced from the outputs of three different models. We assume a gamma distributed serial interval with mean 3.3 days and standard deviation of 3.5 days following (2).
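As an illustration of the serial interval assumption, the sketch below converts the stated mean and standard deviation into gamma shape and scale parameters and discretises the distribution into daily weights \(w_s\). The discretisation convention (mass on \((s-1, s]\) assigned to day \(s\), \(w_0 = 0\), truncation at 30 days) is an assumption made for this example and may differ from the one used in the analysis.

```python
import math

def gamma_shape_scale(mean, sd):
    """Moment matching: mean = shape * scale, variance = shape * scale**2."""
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    return shape, scale

def gamma_cdf(x, shape, scale):
    """Gamma CDF via the power series of the regularised lower incomplete gamma."""
    if x <= 0:
        return 0.0
    z = x / scale
    term = 1.0 / shape
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (shape + n)
        total += term
        n += 1
    return total * math.exp(-z + shape * math.log(z) - math.lgamma(shape))

def discretised_serial_interval(mean, sd, max_days=30):
    """Daily weights w_0..w_max_days, with w_0 = 0 (assumed convention)."""
    shape, scale = gamma_shape_scale(mean, sd)
    weights = [0.0] + [gamma_cdf(s, shape, scale) - gamma_cdf(s - 1, shape, scale)
                       for s in range(1, max_days + 1)]
    total = sum(weights)
    return [v / total for v in weights]

# Serial interval with mean 3.3 days and standard deviation 3.5 days
w = discretised_serial_interval(3.3, 3.5)
```

With these values the shape parameter is below 1, so the density is decreasing and most of the weight falls on the first few days.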

Model 1

The approach estimates the current reproduction number (the average number of secondary cases generated by a typical infected individual, \(R_t\)) and uses it to forecast the future incidence of deaths. The current reproduction number is estimated assuming constant transmissibility during a chosen time-window (here, one week).

Estimating current transmissibility

Here we relied on a well-established and simple method (3) that assumed the daily incidence, \(I_t\) (here representing deaths), could be approximated with a Poisson process following the renewal equation (4):

\[I_t \sim Pois\left( R_t \sum_{s=0}^tI_{t-s}w_s\right)\]

where \(R_t\) is the instantaneous reproduction number and \(w\) is the serial interval distribution. From this, a likelihood of the data given a set of model parameters can be calculated, as well as the posterior distribution of \(R_t\) given previous observations of incidence and knowledge of the serial interval (5).
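To make the renewal equation concrete, the sketch below computes the sum \(\sum_s I_{t-s} w_s\) and the resulting posterior for \(R_t\) over a fixed window. It assumes a gamma prior on \(R_t\), for which the Poisson likelihood gives a gamma posterior as in (5). The prior parameters (shape 1, scale 5), the convention \(w_0 = 0\) and the function names are assumptions made for illustration.

```python
def total_infectiousness(incidence, w, t):
    """Sum over s >= 1 of I_{t-s} * w_s (w[0] is assumed to be 0)."""
    return sum(incidence[t - s] * w[s]
               for s in range(1, min(t, len(w) - 1) + 1))

def posterior_rt(incidence, w, window, prior_shape=1.0, prior_scale=5.0):
    """Gamma posterior (shape, scale) for R_t, constant over the last `window` days."""
    t_end = len(incidence) - 1
    days = range(t_end - window + 1, t_end + 1)
    shape = prior_shape + sum(incidence[t] for t in days)
    rate = 1.0 / prior_scale + sum(total_infectiousness(incidence, w, t)
                                   for t in days)
    return shape, 1.0 / rate

# Toy example: flat incidence and a two-day serial interval
shape, scale = posterior_rt([10] * 20, [0.0, 0.5, 0.5], window=7)
posterior_mean = shape * scale
```

With flat incidence, as in the toy example, the posterior mean of \(R_t\) is close to 1, as expected.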

We used this approach to estimate \(R_t\) over three alternative time-windows defined by assuming a constant \(R_t\) for 10 days prior to the most recent data-point. We made no assumptions regarding the epidemiological situation and transmissibility prior to each time-window. Therefore, no data prior to the time-window were used to estimate \(R_t\), and instead we jointly estimated \(R_t\) as well as back-calculated the incidence before the time-window. Specifically, we jointly estimated the \(R_t\) and the incidence level 100 days before the time-window. Past incidence was then calculated using the known relationship between the serial interval, growth rate and reproduction number. The joint posterior distribution of \(R_t\) and the early epidemic curve (from which forecasts will be generated) was inferred using Markov Chain Monte Carlo (MCMC) sampling.
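For a gamma distributed serial interval with shape \(a\) and scale \(\theta\), the relationship between the exponential growth rate \(r\) and the reproduction number is \(R = (1 + r\theta)^a\). The sketch below inverts this relationship and back-calculates an incidence curve from an assumed initial level. It is an illustration under the assumption of purely exponential growth before the time-window, with hypothetical function names; it is not the MCMC procedure itself.

```python
import math

def growth_rate_from_r(r_number, shape, scale):
    """Invert R = (1 + r * scale) ** shape for the growth rate r."""
    return (r_number ** (1.0 / shape) - 1.0) / scale

def back_calculated_incidence(initial_incidence, r_number, shape, scale, n_days=100):
    """Expected incidence from day 0 to day n_days under exponential growth."""
    r = growth_rate_from_r(r_number, shape, scale)
    return [initial_incidence * math.exp(r * day) for day in range(n_days + 1)]

# Gamma serial interval with mean 3.3 and sd 3.5 days
shape, scale = (3.3 / 3.5) ** 2, 3.5 ** 2 / 3.3
curve = back_calculated_incidence(0.5, 1.2, shape, scale)
```

In the estimation, the initial incidence level and \(R_t\) would both be sampled parameters; here they are fixed values chosen for the example.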

The model has the advantage of being robust to changes in reporting before the time-window used for inference.

Forward projections

We used the renewal equation (4) to project the incidence forward, given a back-calculated early incidence curve, an estimated reproduction number, and the observed incidence over the calibration period. We sampled sets of back-calculated early incidence curves and reproduction numbers from the posterior distribution obtained in the estimation process. For each of these sets, we simulated stochastic realisations of the renewal equation from the end of the calibration period leading to projected incidence trajectories.

Projections were made on a 7-day horizon. The transmissibility is assumed to remain constant over this time period. If transmissibility were to decrease as a result of control interventions and/or changes in behaviour over this time period, we would predict fewer deaths; similarly, if transmissibility were to increase over this time period, we would predict more deaths. We limited our projection to 7 days only, as assuming constant transmissibility over longer time horizons seemed unrealistic in light of the different interventions implemented by different countries and potential voluntary behaviour changes.
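A minimal sketch of the forward projection step is given below. For each sampled reproduction number, it simulates Poisson realisations of the renewal equation over a 7-day horizon, holding transmissibility constant. As a simplification, it starts from an observed incidence series rather than from sampled back-calculated curves, and the Poisson sampler, function names and toy inputs are assumptions made for this example.

```python
import random

def poisson_draw(mean, rng):
    """Poisson draw: count unit-rate exponential arrivals in [0, mean)."""
    count, arrival = 0, rng.expovariate(1.0)
    while arrival < mean:
        count += 1
        arrival += rng.expovariate(1.0)
    return count

def project_incidence(incidence, w, r_samples, horizon=7, seed=1):
    """One simulated trajectory of length `horizon` per sampled R value."""
    rng = random.Random(seed)
    trajectories = []
    for r_t in r_samples:
        series = list(incidence)
        for _ in range(horizon):
            t = len(series)
            # Renewal equation: past incidence weighted by the serial interval
            lam = sum(series[t - s] * w[s]
                      for s in range(1, min(t, len(w) - 1) + 1))
            series.append(poisson_draw(r_t * lam, rng))
        trajectories.append(series[-horizon:])
    return trajectories

# Toy inputs: five days of deaths, a two-day serial interval, three R samples
trajectories = project_incidence([5, 8, 12, 15, 20], [0.0, 0.6, 0.4], [0.9, 1.2, 1.5])
```

Summaries such as the median and 95% CrI of the projections can then be taken across the simulated trajectories for each day.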

Model 2

Estimating current transmissibility

The standard approach to inferring the effective reproduction number at \(t\), \(R_t\), from an incidence curve (with cases at \(t\) denoted \(I_t\)) is provided by (5). This method assumes that \(R_t\) is constant over a window back in time of size \(k\) units (e.g. days or weeks) and uses the part of the incidence curve contained in this window to estimate \(R_t\). However, estimates of \(R_t\) can depend strongly on the width of the time-window used for estimation. Thus mis-specified time-windows can bias our inference. In (6) we use information theory to extend the approach of Cori et al. to optimise the choice of the time-window and refine estimates of \(R_t\). Specifically:

  • We integrate over the entire posterior distribution of \(R_t\), to obtain the posterior predictive distribution of incidence at time \(t+1\) as \(P(I_{t+1} \mid I_1^t)\) with \(I_1^t\) as the incidence curve up to \(t\). For a gamma posterior distribution over \(R_t\) this is analytic and negative binomial (see (6) for exact formulae).

  • We compute this distribution sequentially and causally across the existing incidence curve and then evaluate every observed case-count according to this posterior predictive distribution. For example at \(t = 5\), we pick the true incidence value \(I_5^*\) and evaluate the probability of seeing this value under the predictive distribution i.e. \(P(I_5 = I_5^* \mid I_1^4)\).

This allows us to construct the accumulated predictive error (APE) under some window length k and under a given generation time distribution as:

\[\text{APE}_{k} = \sum_{t = 0}^{T - 1} - \log P\left( I_{t + 1} = I_{t + 1}^{*} \mid I_{t - k + 1}^{t} \right)\]

The optimal window length k* is then \(k^{*} = \arg{\min_{k}{\text{APE}_{k}}}\). Here T is the last time point in the existing incidence curve.
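The APE calculation can be sketched as follows, assuming a gamma prior on \(R_t\) so that the one-step-ahead predictive distribution is negative binomial. The prior parameters, the convention \(w_0 = 0\), the decision to start scoring from the second time point and the function names are assumptions made for illustration; see (6) for the exact formulae.

```python
import math

def total_infectiousness(incidence, w, t):
    """Sum over s >= 1 of I_{t-s} * w_s (w[0] is assumed to be 0)."""
    return sum(incidence[t - s] * w[s]
               for s in range(1, min(t, len(w) - 1) + 1))

def neg_binom_logpmf(x, n, p):
    """Log pmf of a negative binomial with real-valued size n."""
    return (math.lgamma(x + n) - math.lgamma(n) - math.lgamma(x + 1)
            + n * math.log(p) + x * math.log1p(-p))

def ape(incidence, w, k, prior_shape=1.0, prior_scale=5.0):
    """Accumulated predictive error for window length k."""
    total = 0.0
    for t in range(1, len(incidence) - 1):
        start = max(0, t - k + 1)
        # Gamma posterior for R_t using the window [start, t]
        shape = prior_shape + sum(incidence[start:t + 1])
        rate = 1.0 / prior_scale + sum(total_infectiousness(incidence, w, u)
                                       for u in range(start, t + 1))
        lam_next = total_infectiousness(incidence, w, t + 1)
        if lam_next <= 0:
            continue  # no predictive distribution without past infections
        p = rate / (rate + lam_next)
        total -= neg_binom_logpmf(incidence[t + 1], shape, p)
    return total

incidence = [2, 3, 5, 4, 6, 8, 7, 9, 12, 10, 13, 15]
w = [0.0, 0.5, 0.3, 0.2]
scores = {k: ape(incidence, w, k) for k in (2, 4, 7)}
best_k = min(scores, key=scores.get)
```

The window length with the smallest score plays the role of \(k^{*}\) in the expression above.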

Forward Projections

Forward projections are made assuming that the transmissibility remains unchanged over the projection horizon and is the same as the transmissibility in the last time-window. The projections are made using the standard branching process model with a Poisson offspring distribution.

Model 3

Objectives

  • Estimate trends in case ascertainment and the ratio of deaths to reported cases.
  • Use these to forecast the number of deaths in the coming week.

Assumptions

We assume

  • that deaths due to COVID-19 are perfectly reported;
  • a known distribution for delay from report to death (gamma distribution with mean 10 days and standard deviation 2 days).

Let \(D_{i, t}\) be the number of deaths in location \(i\) at time \(t\). Let \(I_{i, t}^r\) be the reported number of cases in location \(i\) at time \(t\) and \(I_{i, t}^{true}\) be the true number of cases. We assume that the reporting to death delay \(\delta\) is distributed according to a gamma distribution with mean \(\mu\) and standard deviation \(\sigma\). That is,

\[\delta \sim \Gamma(\mu, \sigma).\]

Let \(r_{i, t}\) be the ratio of deaths to reported cases in location \(i\) at time \(t\). We assume that deaths are distributed according to a Binomial distribution thus: \[ D_{i, t} \sim Binom\left( \int\limits_0^{\infty}{\Gamma(x \mid \mu, \sigma)I_{i, t - x}^{r}dx} , r_{i, t}\right). \]

This allows us to obtain a posterior distribution for \(r_{i, t}\).
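As an illustration of this step, the sketch below weights reported cases by a discretised report-to-death delay distribution (gamma with mean 10 days and standard deviation 2 days) and then updates a Beta prior on the ratio of deaths to reported cases. The Beta(1, 1) prior, the discretisation of the delay at whole days, the truncation at 30 days and the function names are assumptions made for this example and are not taken from the analysis.

```python
import math

def gamma_pdf(x, mean, sd):
    """Gamma density parameterised by mean and standard deviation."""
    shape, scale = (mean / sd) ** 2, sd ** 2 / mean
    if x <= 0:
        return 0.0
    return math.exp((shape - 1) * math.log(x) - x / scale
                    - math.lgamma(shape) - shape * math.log(scale))

def delay_weights(mean=10.0, sd=2.0, max_delay=30):
    """Normalised daily weights approximating the delay distribution."""
    raw = [gamma_pdf(x, mean, sd) for x in range(max_delay + 1)]
    total = sum(raw)
    return [v / total for v in raw]

def weighted_cases(reported_cases, t, weights):
    """Discrete analogue of the integral of Gamma(x) * I^r_{t-x} over x."""
    return sum(weights[x] * reported_cases[t - x]
               for x in range(min(t, len(weights) - 1) + 1))

def ratio_posterior(deaths, reported_cases, t, weights, prior_a=1.0, prior_b=1.0):
    """Beta posterior (a, b) for the ratio of deaths to reported cases at time t."""
    n = weighted_cases(reported_cases, t, weights)
    d = deaths[t]
    return prior_a + d, prior_b + max(n - d, 0.0)

weights = delay_weights()
cases = [1000] * 60
deaths = [20] * 60
a, b = ratio_posterior(deaths, cases, 59, weights)
posterior_mean = a / (a + b)
```

In this toy example, with 1000 reported cases and 20 deaths a day, the posterior mean of the ratio is close to 2%.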

To obtain forecasts of deaths, we rely on reported cases to obtain \(\int\limits_0^{\infty}{\Gamma(x \mid \mu,\sigma)I_{i, t - x}^{r}dx}\). As cases reported in the coming week may die within the same week (i.e. for \(x \in [0, 7]\), \(\Gamma(x \mid \mu,\sigma) > 0\)), we estimate newly reported cases in the coming week by sampling from a gamma distribution with mean and standard deviation estimated from the number of observed cases in the last week.

While this assumes no growth or decline in reported cases in the coming week, this baseline assumption is justifiable as a null-hypothesis scenario: it does not influence our results dramatically because the contribution of these cases to the forecast deaths is very small (i.e. less than 2%).

We therefore obtain the forecasted number of deaths as: \[ D_{i, t} \sim Binom\left( \int\limits_0^{\infty}{\Gamma(x \mid \mu, \sigma)I_{i, t - x}^{r}dx} , r_{i, \mu}\right). \]

where \(r_{i, \mu}\) is the estimated ratio of deaths to reported cases for the last week of data, and \(\int\limits_0^{\infty}{\Gamma(x \mid \mu,\sigma)I_{i, t - x}^{r}dx}\) relies on observed reported cases up to the last day with available and estimated reported cases as described above.

Ensemble Model

For all countries included in the analysis, the ensemble model is built from Models 1, 2 and 3.

Individual Model Outputs

Projections

Europe

Figure 15. Projections (7-day ahead) for the week starting 30th May 2022 from individual models for each country in Europe with active transmission (see Methods). For each model, the solid line shows the median and the shaded region shows the 95% CrI of the projections.

Asia

Figure 16. Projections (7-day ahead) for the week starting 30th May 2022 from individual models (Models 1, 2 and 3) for each country in Asia with active transmission (see Methods). For each model, the solid line shows the median and the shaded region shows the 95% CrI of the projections.

Africa

Figure 17. Projections (7-day ahead) for the week starting 30th May 2022 from individual models (Models 1, 2 and 3) for each country in Africa with active transmission (see Methods). For each model, the solid line shows the median and the shaded region shows the 95% CrI of the projections.

North & Central America

Figure 18. Projections (7-day ahead) for the week starting 30th May 2022 from individual models (Models 1, 2 and 3) for each country in North & Central America with active transmission (see Methods). For each model, the solid line shows the median and the shaded region shows the 95% CrI of the projections.

Oceania

Figure 19. Projections (7-day ahead) for the week starting 30th May 2022 from individual models (Models 1, 2 and 3) for each country in Oceania with active transmission (see Methods). For each model, the solid line shows the median and the shaded region shows the 95% CrI of the projections.

South America

Figure 20. Projections (7-day ahead) for the week starting 30th May 2022 from individual models (Models 1, 2 and 3) for each country in South America with active transmission (see Methods). For each model, the solid line shows the median and the shaded region shows the 95% CrI of the projections.

Effective Reproduction Number

Europe

Figure 21. Estimates of \(R_t\) from individual models for each country in Europe with active transmission (see Methods) for the week starting 30th May 2022.

Asia

Figure 22. Estimates of \(R_t\) from individual models for each country in Asia with active transmission (see Methods) for the week starting 30th May 2022.

Africa

Figure 23. Estimates of \(R_t\) from individual models for each country in Africa with active transmission (see Methods) for the week starting 30th May 2022.

North & Central America

Figure 24. Estimates of \(R_t\) from individual models for each country in North & Central America with active transmission (see Methods) for the week starting 30th May 2022.

Oceania

Figure 25. Estimates of \(R_t\) from individual models for each country in Oceania with active transmission (see Methods) for the week starting 30th May 2022.

South America

Figure 26. Estimates of \(R_t\) from individual models for each country in South America with active transmission (see Methods) for the week starting 30th May 2022.

Authors

This is an official product of the Imperial College COVID-19 response team: the WHO Collaborating Centre for Infectious Disease Modelling within the MRC Centre for Global Infectious Disease Analysis, Abdul Latif Jameel Institute for Disease and Emergency Analytics (J-IDEA), Imperial College London.

Sangeeta Bhatia, Jack Wardle, Rebecca K Nash, Anne Cori, Kris V Parag, Swapnil Mishra, Laura V Cooper, Kylie E C Ainslie, Marc Baguelin, Samir Bhatt, Adhiratha Boonyasiri, Olivia Boyd, Lorenzo Cattarino, Zulma Cucunubá, Gina Cuomo-Dannenburg, Amy Dighe, Ilaria Dorigatti, Sabine van Elsland, Rich FitzJohn, Han Fu, Katy Gaythorpe, Will Green, Arran Hamlet, David Haw, Sarah Hayes, Wes Hinsley, Natsuko Imai, David Jorgensen, Edward Knock, Daniel Laydon, Gemma Nedjati-Gilani, Lucy C Okell, Steven Riley, Hayley Thompson, Juliette Unwin, Robert Verity, Michaela Vollmer, Caroline Walters, Hao Wei Wang, Patrick GT Walker, Oliver Watson, Charles Whittaker, Yuanrong Wang, Peter Winskill, Xiaoyue Xi, Azra C Ghani, Christl A Donnelly, Neil M Ferguson, Pierre Nouvellet

References

The forecasts produced use the reported daily counts of deaths per country available on the WHO website: https://covid19.who.int

Notes

  • Some countries have been excluded from the analysis despite meeting the threshold because the number of deaths per day did not allow reliable inference.

  • We have excluded US states from the analysis because a majority of the states are not reporting data on a daily basis.

1. WHO Coronavirus Disease (COVID-19) Dashboard. 2022

2. Abbott S, Sherratt K, Gerstung M, Funk S. 2022. Estimation of the test to test distribution as a proxy for generation interval distribution for the Omicron variant in England. medRxiv

3. Nouvellet P, Cori A, Garske T, Blake IM, Dorigatti I, et al. 2018. A simple approach to measure transmissibility and forecast incidence. Epidemics. 22:29–35

4. Fraser C. 2007. Estimating individual and household reproduction numbers in an emerging epidemic. PloS One. 2(8):

5. Cori A, Ferguson NM, Fraser C, Cauchemez S. 2013. A new framework and software to estimate time-varying reproduction numbers during epidemics. American Journal of Epidemiology. 178(9):1505–12

6. Parag KV, Donnelly CA. 2020. Using information theory to optimise epidemic models for real-time prediction and estimation. PLOS Computational Biology. 16(7):e1007990