How to update the database with new article data

We provide the ability for people to update the database with new data as it becomes available. New data will be represented as a new row in the article data set, and potentially multiple new rows in the other three data sets (outbreaks, parameters and models). We suggest the below process to add new data, however the only requirement is the submission of a pull request, with the updated csv files together with details of the peer-reviewed paper which is being added. The paper must conform with the inclusion and exclusion criteria of the underlying systematic review.

Step 1 – git clone repo

Please clone the git repository git clone https://github.com/mrc-ide/epireview.git and create a new branch to add data for the paper(s).

Step 2 – add article details

Please provide all available data for the article. For the quality assessment, we refer to the wiki on the epireview github repo, which provides details for how to assess the quality of papers (the score should be 0, 1 or NA if not applicable).

# load packages and source code
library(tidyverse)
library(validate)
source('R/create_new_article_entry.R')
source('R/append_new_entry_to_table.R')
source('R/load_epidata.R')

new_article <- 
  create_new_article_entry(
    pathogen = "marburg",
    new_article = c(list("first_author_first_name" = as.character("Joe")),
                    list("first_author_surname"    = as.character("Blocks")),
                    list("article_title"           = as.character("hello")),
                    list("doi"                     = as.character(NA)),
                    list("journal"                 = as.character("ABC")),
                    list("year_publication"        = as.integer(2000)),
                    list("volume"                  = as.integer(NA)),
                    list("issue"                   = as.integer(NA)),
                    list("page_first"              = as.integer(NA)),
                    list("page_last"               = as.integer(NA)),
                    list("paper_copy_only"         = as.logical(NA)),
                    list("notes"                   = as.character(NA)),
                    list("qa_m1"                   = as.integer(1)),
                    list("qa_m2"                   = as.integer(0)),
                    list("qa_a3"                   = as.integer(NA)),
                    list("qa_a4"                   = as.integer(1)),
                    list("qa_d5"                   = as.integer(0)),
                    list("qa_d6"                   = as.integer(NA)),
                    list("qa_d7"                   = as.integer(1))),
    vignette_prepend = "")

This creates a new entry for this article. We enforce that the first author’s first name (or initials), first author’s surname, article title, journal, and year of publication, needs to be provided. Please take note of the article_id and covidence_id as you will need these to link any parameters, outbreaks and models to this paper. This can be appended to the article dataset by running the following command (which includes validations for the data provided):

new_article_dataset <- append_new_entry_to_table(pathogen = "marburg",
                                                 table_type = "article",
                                                 new_row = new_article,
                                                 validate = TRUE,
                                                 write_table = TRUE,
                                                 vignette_prepend = "")

Step 3 – add model details

If the paper does not include a model you can skip this section. For the model table we have a set number of field options which can be accepted. These field options can be viewed in the Database field options vignette. The validations in the function will assert that valid options have been used.

# load packages and source code
library(tidyverse)
library(validate)
source('R/create_new_model_entry.R')

new_model <- create_new_model_entry(
  pathogen = "marburg",
  new_model = c(list("article_id"           = as.integer(1)),
                list("model_type"           = as.character("Compartmental")),
                list("compartmental_type"   = as.character("SEIR, SIR")),
                list("stoch_deter"          = as.character("Deterministic")),
                list("theoretical_model"    = as.logical(FALSE)),
                list("interventions_type"   = as.character("Vaccination")),
                list("code_available"       = as.logical(TRUE)),
                list("transmission_route"   = as.character("Sexual")),
                list("assumptions"          = as.character("Unspecified")),
                list("covidence_id"         = as.integer(2059))),
  vignette_prepend = "")

If you would like to add multiple models, it is important to add each model sequentially in order to ensure that the model_data_id is set correctly.

new_model_dataset <- append_new_entry_to_table(pathogen = "marburg",
                                               table_type = "model",
                                               new_row = new_model,
                                               validate = TRUE,
                                               write_table = TRUE,
                                               vignette_prepend = "")

Step 4 – add outbreak details

If the paper does not include an outbreak you can skip this section. For this table we have a set number of field options which can be accepted. These field options can be viewed in the Database field options vignette. The validations in the function will assert that valid options have been used.

# load packages and source code
library(tidyverse)
library(validate)
source('R/create_new_outbreak_entry.R')

new_outbreak <- create_new_outbreak_entry(
  pathogen = "marburg",
  new_outbreak = c(list("article_id"                = as.integer(1)),
                   list("outbreak_start_day"        = as.integer(NA)),
                   list("outbreak_start_month"      = as.character(NA)),
                   list("outbreak_start_year"       = as.integer(1999)),
                   list("outbreak_end_day"          = as.integer(NA)),
                   list("outbreak_end_month"        = as.character(NA)),
                   list("outbreak_date_year"        = as.integer(2001)),
                   list("outbreak_duration_months"  = as.integer(NA)),
                   list("outbreak_size"             = as.integer(2)),
                   list("asymptomatic_transmission" = as.integer(0)),
                   list("outbreak_country"          = as.character("Tanzania")),
                   list("outbreak_location"         = as.character(NA)),
                   list("cases_confirmed"           = as.integer(NA)),
                   list("cases_mode_detection"      = as.character(NA)),
                   list("cases_suspected"           = as.integer(NA)),
                   list("cases_asymptomatic"        = as.integer(NA)),
                   list("deaths"                    = as.integer(2)),
                   list("cases_severe_hospitalised" = as.integer(NA)),
                   list("covidence_id"              = as.integer(2059))),
  vignette_prepend = "")

If you would like to add multiple outbreaks, it is important to add each outbreak sequentially in order to ensure that the outbreak_id is set correctly.

new_outbreak_dataset <- append_new_entry_to_table(pathogen = "marburg",
                                                  table_type = "outbreak",
                                                  new_row = new_outbreak,
                                                  validate = TRUE,
                                                  write_table = TRUE,
                                                  vignette_prepend = "")

Step 5 – add parameter details

If the paper does not include parameters you can skip this section.

# load packages and source code
library(tidyverse)
library(validate)
source('R/create_new_parameter_entry.R')

new_parameter <- create_new_parameter_entry(
  pathogen = "marburg",
  new_param   = c(list("article_id"                    = as.integer(1)),
                  list("parameter_type"                = as.character(NA)),
                  list("parameter_value"               = as.double(NA)),
                  list("parameter_unit"                = as.character(NA)),
                  list("parameter_lower_bound"         = as.double(NA)),
                  list("parameter_upper_bound"         = as.double(NA)),
                  list("parameter_value_type"          = as.character(NA)),
                  list("parameter_uncertainty_single_value" = as.double(NA)),
                  list("parameter_uncertainty_singe_type"   = as.character(NA)),
                  list("parameter_uncertainty_lower_value"  = as.double(NA)),
                  list("parameter_uncertainty_upper_value"  = as.double(NA)),
                  list("parameter_uncertainty_type"         = as.character(NA)),
                  list("cfr_ifr_numerator"             = as.integer(NA)),
                  list("cfr_ifr_denominator"           = as.integer(NA)),
                  list("distribution_type"             = as.character(NA)),
                  list("distribution_par1_value"       = as.double(NA)),
                  list("distribution_par1_type"        = as.character(NA)),
                  list("distribution_par1_uncertainty" = as.logical(NA)),
                  list("distribution_par2_value"       = as.double(NA)),
                  list("distribution_par2_type"        = as.character(NA)),
                  list("distribution_par2_uncertainty" = as.logical(NA)),
                  list("method_from_supplement"        = as.logical(NA)),
                  list("method_moment_value"           = as.character(NA)),
                  list("cfr_ifr_method"                = as.character(NA)),
                  list("method_r"                      = as.character(NA)),
                  list("method_disaggregated_by"       = as.character(NA)),
                  list("method_disaggregated"          = as.logical(NA)),
                  list("method_disaggregated_only"     = as.logical(NA)),
                  list("riskfactor_outcome"            = as.character(NA)),
                  list("riskfactor_name"               = as.character(NA)),
                  list("riskfactor_occupation"         = as.character(NA)),
                  list("riskfactor_significant"        = as.character(NA)),
                  list("riskfactor_adjusted"           = as.character(NA)),
                  list("population_sex"                = as.character(NA)),
                  list("population_sample_type"        = as.character(NA)),
                  list("population_group"              = as.character(NA)),
                  list("population_age_min"            = as.integer(NA)),
                  list("population_age_max"            = as.integer(NA)),
                  list("population_sample_size"        = as.integer(NA)),
                  list("population_country"            = as.character(NA)),
                  list("population_location"           = as.character(NA)),
                  list("population_study_start_day"    = as.integer(NA)),
                  list("population_study_start_month"  = as.character(NA)),
                  list("population_study_start_year"   = as.integer(NA)),
                  list("population_study_end_day"      = as.integer(NA)),
                  list("population_study_end_month"    = as.character(NA)),
                  list("population_study_end_year"     = as.integer(NA)),
                  list("genome_site"                   = as.character(NA)),
                  list("genomic_sequence_available"    = as.logical(NA)),
                  list("parameter_class"               = as.character(NA)),
                  list("covidence_id"                  = as.integer(2059)),
                  list("Uncertainty"                   = as.character(NA)),
                  list("Survey year"                   = as.character(NA))),
  vignette_prepend = "")

If you would like to add multiple parameters, it is important to add each parameter sequentially in order to ensure that the parameter_data_id is set correctly.

new_parameter_dataset <- append_new_entry_to_table(pathogen = "marburg",
                                                   table_type = "parameter",
                                                   new_row = new_parameter,
                                                   validate = TRUE,
                                                   write_table = TRUE,
                                                   vignette_prepend = "")

Step 6 – create a pull request

Once you have completed the above steps, you can push the updated datasets to your branch of the epireview repository, and then submit a pull request. Please ensure that your pull request contains the following:

Details of the paper, including doi
Confirmation that you have checked that this paper satisfies the inclusion and exclusion criteria of the systematic review
The pathogen this update concerns and which tables you have updated
Any comments or observations that the reviewer may find helpful