Guide 4 - Generating Epidemiological Data From Model Output
CGuideDEpiDataGenerate.Rmd
This vignette demonstrates how the output of the dynamic yellow fever model functions (see Worked Example 1 - Single Model Run) can be processed to produce epidemiological data similar to that produced by real-world surveillance. Two types of data are discussed here:
-Serological survey data, the proportion of people tested in a chosen age range who test positive for yellow fever antibodies (indicating that they have either recovered from the disease or been vaccinated, with individuals known to have been vaccinated normally excluded from surveys - see below for more on this) [TBA - Link to example publications]. -Annual reported cases and deaths, representing the number of laboratory-confirmed yellow fever cases in a region in a given year and the number which resulted in death. These usually represent a fraction of the actual number of annual yellow fever infections and deaths in a region (in particular, it is usually assumed that infections causing mild or no symptoms are never reported as non-fatal cases cases).
A set of epidemiological data for multiple regions can be constructed using additional functions described in Worked Example 2 - Generating Epidemiological Data Sets; this combines running the model as described in Worked Example 1, calculating epidemiological output data as described in this vignette, and averaging the epidemiological data across multiple stochastic repetitions where relevant.
Data is generated as described in Worked Example 1:
library(YEP)
input_data <- readRDS(file=paste(path.package("YEP"),
"/exdata/input_data_example.Rds",sep=""))
model_data <- Model_Run(1.0e-7, 0.0, input_data$vacc_data[1,,], input_data$pop_data[1,,], c(1990:2000),
NULL, "full", input_data$years_labels[1], 1, 1.0, 1.0, 1, 1, FALSE)
The seroprevalence in a given age range in one or more years for one particle may be calculated from model data using the sero_calculate() function as shown below (in this case, for 0-15 year olds from 1990-2000). Seroprevalence for a given year calculated in this way is a year-round average. The variable vc_factor is used when creating modelled seroprevalence data to compare with real-world surveys, to account for the fact that in real-world seroprevalence surveys, while normally only unvaccinated individuals are tested, some surveyed individuals’ vaccination status may be unknown. If vc_factor = 0 (i.e. all individuals tested are confirmed unvaccinated), seroprevalence is calculated only for the unvaccinated population. If vc_factor > 0, seroprevalence is calculated for an appropriate mixture of the unvaccinated population (where seroprevalence = R/(S+E+I+R)) and the overall population (where seroprevalence = (R+V)/(S+E+I+R+V)).
# Minimum age in age range to survey
age_min <- 0
# Maximum age in age range to survey
age_max <- 15
# Year for which to calculate seroprevalence
year <- 1990
# Fraction of the population (from 0 to 1) for whom vaccination status is unknown
vc_factor <- 0
# Particle selected from model output data
n_p <- 1
n_t=which(model_data$year %in% year)
ages=c((age_min+1):age_max)
samples=model_data$S[ages,n_p,n_t]+model_data$E[ages,n_p,n_t]+model_data$I[ages,n_p,n_t]+model_data$R[ages,n_p,n_t]
positives=model_data$R[ages,n_p,n_t]
sero_value=sum(positives)/sum(samples)
Seroprevalence calculation is automated using the sero_calculate() function:
years <- c(1990:2000)
seroprevalence <- sero_calculate(age_min, age_max, years, vc_factor, model_data, n_p)
matplot(x=years,y=seroprevalence,type="p",pch=1,col=1,xlab="Year",ylab="Seroprevalence")
A separate function, sero_calculate2(), can be used to generate numbers of samples and positive tests based on a template. The number of samples and positives is based on population. [TBC?]
template=data.frame(year=rep(1990,3),age_min=c(0,16,41),age_max=c(15,40,100),vc_factor=rep(0,3))
sero_data_pn <- sero_calculate2(template, model_data, n_p = 1)
sero_data_pn
Annual reported severe and fatal case data is generated from C, the output data on the number of new infectious individuals at each time point. The actual number of severe and fatal cases is calculated via binomial functions using the variables p_severe_inf (the probability of an infection leading to severe symptoms) and p_death_severe_inf (the probability that a severe infection with lead to death). The number of reported severe and fatal cases is then similarly calculated binomially using the variables p_rep_severe (probability of a severe case being reported in public health figures, usually meaning lab-confirmed cases) and p_rep_death (probability of a fatal case being reported). The total number of reported cases (rep_cases in the output data frame annual_data below) is the sum of reported severe cases and reported fatal cases (so it will always be equal to or greater than the number of reported deaths).
p_severe_inf=0.12
p_death_severe_inf=0.39
p_rep_severe=0.05
p_rep_death=0.1
n_years=length(years)
annual_data=data.frame(rep_cases=rep(NA,n_years),rep_deaths=rep(NA,n_years))
set.seed(1)
for(n_year in 1:n_years){
infs=sum(model_data$C[,1,model_data$year==years[n_year]])
severe_infs=rbinom(1,floor(infs),p_severe_inf)
deaths=rbinom(1,severe_infs,p_death_severe_inf)
annual_data$rep_deaths[n_year]=rbinom(1,deaths,p_rep_death)
annual_data$rep_cases[n_year]=annual_data$rep_deaths[n_year] + rbinom(1,severe_infs-deaths,p_rep_severe)
}
matplot(x=years,y=annual_data$rep_cases,type="p",pch=1,col=1,xlab="Year",ylab="Individuals")
matplot(x=years,y=annual_data$rep_deaths,type="p",pch=2,col=2,add=TRUE)
legend("topright",legend=c("Reported cases","Reported deaths"),pch=c(1,2),col=c(1,2))
Case data calculation can similarly be automated with the case_data_calculate() function:
set.seed(1)
annual_data=case_data_calculate(model_data, 1, p_severe_inf, p_death_severe_inf, p_rep_severe, p_rep_death,
output_type = "annual", deterministic = FALSE)