Worked Example 3 - Generating Epidemiological Data Sets
This worked example demonstrates how to use the package to generate a set of epidemiological data (serological and/or annual reported severe/fatal cases) for specified years and regions (and age ranges, in the case of serological data).
This can be set up to either:
- Match a set of observed data supplied in appropriate formats
- Create a hypothetical data set with no observed counterpart by matching to “dummy” data that supply the dates, regions and/or age ranges in the appropriate formats
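For the second case, a dummy template can be assembled directly in base R. The sketch below is illustrative only: the column names (region, year, age_min, age_max, samples, cases, deaths), the region codes and the age band convention are all assumptions made for this example rather than taken from the package, so they should be checked against the example template files before use.

```r
# Hypothetical dummy templates; column names, region codes and age bands are
# assumed for illustration and must be checked against the package's templates.
regions <- c("AAA01", "AAA02")   # placeholder region identifiers
years <- 2000:2002

# Serological template: one row per region/year/age band
age_bands <- data.frame(age_min = c(0, 15), age_max = c(15, 100))
region_years <- expand.grid(region = regions, year = years,
                            stringsAsFactors = FALSE)
dummy_sero <- merge(region_years, age_bands, by = NULL)  # by = NULL gives a cross join
dummy_sero$samples <- 100        # placeholder sample size per row

# Case template: one row per region/year
dummy_case <- region_years
dummy_case$cases <- 0            # placeholder values only
dummy_case$deaths <- 0
```

Here the values in the data columns are placeholders; what matters for a dummy template is that the regions, years and age ranges are laid out in the expected format.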
We first load the input data set, the regional environmental covariate values, the coefficients of those covariates used to calculate spillover force of infection (FOI) and R0 values, and the observed serological and annual case data to be matched (here generated by the model itself):
library(YEP)
input_data <- readRDS(file = system.file("exdata", "input_data_example.Rds", package = "YEP"))
enviro_data <- read.csv(file = system.file("exdata", "enviro_data_example.csv", package = "YEP"),
                        header = TRUE)
enviro_coeffs <- read.csv(file = system.file("exdata", "enviro_coeffs_example.csv", package = "YEP"),
                          header = TRUE)
# Seroprevalence data for comparison, by region, year & age group, in format
# no. samples/no. positives
sero_template <- read.csv(file = system.file("exdata", "sero_template_example.csv", package = "YEP"),
                          header = TRUE)
# Annual reported case/death data for comparison, by region and year, in format
# no. cases/no. deaths
case_template <- read.csv(file = system.file("exdata", "case_template_example.csv", package = "YEP"),
                          header = TRUE)
We calculate values of spillover FOI and R0 as described in Guide 2 - Calculating Parameters From Environmental Data:
FOI_values <- epi_param_calc(coeffs_const = as.numeric(enviro_coeffs[c(1:5)]), coeffs_var = c(0),
                             enviro_data_const = enviro_data, enviro_data_var = NULL)
R0_values <- epi_param_calc(coeffs_const = as.numeric(enviro_coeffs[c(6:10)]), coeffs_var = c(0),
                            enviro_data_const = enviro_data, enviro_data_var = NULL)
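As a rough illustration of what a calculation of this kind involves, the sketch below assumes that each regional value is a weighted sum of that region's covariates, with the coefficients as weights. That form is an assumption made for illustration (Guide 2 describes the package's actual calculation), and the covariate names and numbers are invented.

```r
# Invented covariates for three regions (first column = region identifier)
toy_enviro <- data.frame(region = c("A", "B", "C"),
                         covar1 = c(1.0, 2.0, 3.0),
                         covar2 = c(0.5, 0.0, 1.5))
toy_coeffs <- c(covar1 = 1e-3, covar2 = 2e-3)  # invented coefficients

# Assumed form: value for a region = sum over covariates of coefficient * covariate
toy_values <- as.vector(as.matrix(toy_enviro[, names(toy_coeffs)]) %*% toy_coeffs)
names(toy_values) <- toy_enviro$region
toy_values
```

Under this assumption, one value is produced per region, which is consistent with FOI_values and R0_values being passed to the generation step alongside the regional input data.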
We set the non-region-specific parameters for the dataset generation:
vaccine_efficacy <- 1.0 # Vaccine efficacy
time_inc <- 1.0 # Time increment in days
mode_start <- 1 #
start_SEIRV <- NULL #
# Type of time variation of FOI_spillover and R0; here set to 0 for constant values
mode_time <- 0
n_reps <- 1 # Number of stochastic repetitions
# True/false flag indicating whether or not to run model in deterministic mode
# (so that binomial calculations give average instead of randomized output)
deterministic <- FALSE
p_severe_inf <- 0.12 # Probability of an infection causing severe symptoms
p_death_severe_inf <- 0.39 # Probability of an infection with severe symptoms causing death
p_rep_severe <- 0.1 # Probability of reporting of an infection with severe (non-fatal) symptoms
p_rep_death <- 0.2 # Probability of reporting of a fatal infection
# Variable to set different modes for running on multiple processors simultaneously;
# here set to FALSE so that parallel processing is not used
mode_parallel <- FALSE
# TBA
output_frame <- TRUE
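To see how the four probabilities above combine, and what the deterministic flag changes, consider the following base-R sketch. It assumes a simple chain (infections to severe infections to deaths, with fatal and non-fatal severe infections reported at different rates, and reported cases counted as the sum of both). That ordering and the number of infections are assumptions made for illustration, not a reproduction of the package's internal code.

```r
# Probabilities as set above
p_severe_inf <- 0.12
p_death_severe_inf <- 0.39
p_rep_severe <- 0.1
p_rep_death <- 0.2
n_infections <- 1000  # hypothetical number of infections in one region and year

# "Deterministic" version: each binomial step is replaced by its mean
exp_severe <- n_infections * p_severe_inf                    # 120
exp_deaths <- exp_severe * p_death_severe_inf                # 46.8
exp_rep_deaths <- exp_deaths * p_rep_death                   # 9.36
exp_rep_severe <- (exp_severe - exp_deaths) * p_rep_severe   # 7.32
exp_rep_cases <- exp_rep_severe + exp_rep_deaths             # 16.68

# Stochastic version: each step is a binomial draw, so results vary by run
set.seed(1)
severe <- rbinom(1, n_infections, p_severe_inf)
deaths <- rbinom(1, severe, p_death_severe_inf)
rep_deaths <- rbinom(1, deaths, p_rep_death)
rep_severe <- rbinom(1, severe - deaths, p_rep_severe)
rep_cases <- rep_severe + rep_deaths
```

With these values, fewer than 2% of infections would be expected to appear as reported cases, which is why the reporting probabilities have a large effect on the generated case data.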
We run the Generate_Dataset() function to produce the dataset. This uses the approaches described in Guide 4 - Generating Epidemiological Data From Model Output to generate serological and/or case data.
set.seed(1)
dataset1 <- Generate_Dataset(FOI_values, R0_values, input_data, sero_template, case_template,
                             vaccine_efficacy, time_inc, mode_start, start_SEIRV, mode_time,
                             n_reps, deterministic, p_severe_inf, p_death_severe_inf,
                             p_rep_severe, p_rep_death, mode_parallel, cluster = FALSE,
                             output_frame)
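The set.seed(1) call fixes the state of R's random number generator. Assuming the stochastic draws come from that generator, this makes a run with deterministic set to FALSE repeatable. The principle can be shown in base R, with rbinom() standing in for the model:

```r
# Same seed, same sequence of random draws
set.seed(1)
draw_a <- rbinom(5, size = 100, prob = 0.12)
set.seed(1)
draw_b <- rbinom(5, size = 100, prob = 0.12)
identical(draw_a, draw_b)  # TRUE
```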
The output is a list containing data frames for serological and case data:
head(dataset1$model_sero_data, 3)
head(dataset1$model_case_data, 21)
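A natural next step is to convert serological output into seroprevalence. The sketch below uses a small invented data frame in place of dataset1$model_sero_data and assumes columns named samples and positives, following the "no. samples/no. positives" format noted earlier. The column names should be adjusted to match the actual output.

```r
# Invented stand-in for the serological output; column names are assumed
toy_sero <- data.frame(region = c("A", "A", "B"),
                       year = c(2000, 2001, 2000),
                       samples = c(100, 80, 50),
                       positives = c(12, 20, 5))

# Seroprevalence = proportion of samples testing positive
toy_sero$seroprevalence <- toy_sero$positives / toy_sero$samples
toy_sero
```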