How to update the database with new article data
Latest update: June 2023
Source: vignettes/pathogen_database_update.Rmd
Users can update the database with new data as it becomes available. Each new paper is represented as one new row in the article data set and, potentially, multiple new rows in the other three data sets (outbreaks, parameters and models). We suggest the process below for adding new data; however, the only requirement is a pull request containing the updated csv files together with details of the peer-reviewed paper being added. The paper must conform to the inclusion and exclusion criteria of the underlying systematic review.
Step 1 – git clone repo
Please clone the git repository:
git clone https://github.com/mrc-ide/epireview.git
Then create a new branch to add data for the paper(s).
Step 2 – add article details
Please provide all available data for the article. For the quality assessment, refer to the wiki on the epireview GitHub repo, which explains how to assess the quality of papers (each score should be 0 or 1, or NA if not applicable).
# load packages and source code
library(tidyverse)
library(validate)
source('R/create_new_article_entry.R')
source('R/append_new_entry_to_table.R')
source('R/load_epidata.R')

new_article <-
  create_new_article_entry(
    pathogen = "marburg",
    new_article = c(list("first_author_first_name" = as.character("Joe")),
                    list("first_author_surname" = as.character("Blocks")),
                    list("article_title" = as.character("hello")),
                    list("doi" = as.character(NA)),
                    list("journal" = as.character("ABC")),
                    list("year_publication" = as.integer(2000)),
                    list("volume" = as.integer(NA)),
                    list("issue" = as.integer(NA)),
                    list("page_first" = as.integer(NA)),
                    list("page_last" = as.integer(NA)),
                    list("paper_copy_only" = as.logical(NA)),
                    list("notes" = as.character(NA)),
                    list("qa_m1" = as.integer(1)),
                    list("qa_m2" = as.integer(0)),
                    list("qa_a3" = as.integer(NA)),
                    list("qa_a4" = as.integer(1)),
                    list("qa_d5" = as.integer(0)),
                    list("qa_d6" = as.integer(NA)),
                    list("qa_d7" = as.integer(1))),
    vignette_prepend = "")
This creates a new entry for the article. The first author’s first name (or initials), the first author’s surname, the article title, the journal, and the year of publication must all be provided. Please take note of the article_id and covidence_id, as you will need these to link any parameters, outbreaks and models to this paper. The new entry can be appended to the article dataset by running the following command (which includes validations for the data provided):
new_article_dataset <- append_new_entry_to_table(pathogen = "marburg",
                                                 table_type = "article",
                                                 new_row = new_article,
                                                 validate = TRUE,
                                                 write_table = TRUE,
                                                 vignette_prepend = "")
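Before creating the article entry, it can help to check a draft of the fields yourself. The sketch below is a hypothetical pre-check (check_article_fields is not part of epireview). It assumes the required fields are the five named in this step and that every field starting with qa_ must be 0, 1 or NA; the package's own validations remain the authority.

```r
# Hypothetical pre-check, not part of epireview: mirrors the rules described
# in this step so obvious problems are caught before the entry is created.
required_fields <- c("first_author_first_name", "first_author_surname",
                     "article_title", "journal", "year_publication")

check_article_fields <- function(entry) {
  problems <- character(0)

  # Required fields must be present and not NA
  for (field in required_fields) {
    value <- entry[[field]]
    if (is.null(value) || is.na(value)) {
      problems <- c(problems, paste("missing required field:", field))
    }
  }

  # Quality assessment scores (fields starting with "qa_") must be 0, 1 or NA
  qa_fields <- grep("^qa_", names(entry), value = TRUE)
  for (field in qa_fields) {
    value <- entry[[field]]
    if (!is.na(value) && !(value %in% c(0L, 1L))) {
      problems <- c(problems, paste("invalid QA score in:", field))
    }
  }

  problems
}

# Shortened draft reusing the example values from this step
draft <- list(first_author_first_name = "Joe",
              first_author_surname    = "Blocks",
              article_title           = "hello",
              journal                 = "ABC",
              year_publication        = 2000L,
              qa_m1                   = 1L,
              qa_m2                   = 0L,
              qa_a3                   = NA_integer_)

# An empty character vector means no problems were found
check_article_fields(draft)
```

The helper returns the list of problems rather than stopping at the first one, so a draft can be corrected in a single pass.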
Step 3 – add model details
If the paper does not include a model you can skip this section. For the model table, only a fixed set of options is accepted for certain fields. These field options can be viewed in the Database field options vignette. The validations in the functions used below will assert that valid options have been used.
# load packages and source code
library(tidyverse)
library(validate)
source('R/create_new_model_entry.R')

new_model <- create_new_model_entry(
  pathogen = "marburg",
  new_model = c(list("article_id" = as.integer(1)),
                list("model_type" = as.character("Compartmental")),
                list("compartmental_type" = as.character("SEIR, SIR")),
                list("stoch_deter" = as.character("Deterministic")),
                list("theoretical_model" = as.logical(FALSE)),
                list("interventions_type" = as.character("Vaccination")),
                list("code_available" = as.logical(TRUE)),
                list("transmission_route" = as.character("Sexual")),
                list("assumptions" = as.character("Unspecified")),
                list("covidence_id" = as.integer(2059))),
  vignette_prepend = "")
If you would like to add multiple models, it is important to add each model sequentially to ensure that the model_data_id is set correctly.
new_model_dataset <- append_new_entry_to_table(pathogen = "marburg",
                                               table_type = "model",
                                               new_row = new_model,
                                               validate = TRUE,
                                               write_table = TRUE,
                                               vignette_prepend = "")
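One way to see why the order matters: if a new identifier is derived from the rows already in the table, then two entries created before either is appended would receive the same identifier. That rule is an assumption made for illustration; the actual logic lives in the epireview source. The toy functions below (next_id, add_row) are hypothetical, the model_type values are placeholders, and a plain data frame stands in for the model dataset.

```r
# Toy illustration only: assumes the next id is max(existing id) + 1.
next_id <- function(tbl) {
  if (nrow(tbl) == 0) 1L else max(tbl$model_data_id) + 1L
}

# Append one row whose id is computed from the table as it currently stands
add_row <- function(tbl, model_type) {
  rbind(tbl, data.frame(model_data_id = next_id(tbl),
                        model_type    = model_type,
                        stringsAsFactors = FALSE))
}

# Stand-in for an existing model table with three rows
models <- data.frame(model_data_id = 1:3,
                     model_type    = c("Compartmental", "Other", "Compartmental"),
                     stringsAsFactors = FALSE)

# Wrong order: both ids are computed before anything is appended
id_a <- next_id(models)
id_b <- next_id(models)
identical(id_a, id_b)   # TRUE, so the two new rows would clash

# Sequential: create and append one row, then create and append the next
models <- add_row(models, "Compartmental")
models <- add_row(models, "Other")
models$model_data_id    # 1 2 3 4 5
```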
Step 4 – add outbreak details
If the paper does not include an outbreak you can skip this section. For the outbreak table, only a fixed set of options is accepted for certain fields. These field options can be viewed in the Database field options vignette. The validations in the functions used below will assert that valid options have been used.
# load packages and source code
library(tidyverse)
library(validate)
source('R/create_new_outbreak_entry.R')

new_outbreak <- create_new_outbreak_entry(
  pathogen = "marburg",
  new_outbreak = c(list("article_id" = as.integer(1)),
                   list("outbreak_start_day" = as.integer(NA)),
                   list("outbreak_start_month" = as.character(NA)),
                   list("outbreak_start_year" = as.integer(1999)),
                   list("outbreak_end_day" = as.integer(NA)),
                   list("outbreak_end_month" = as.character(NA)),
                   list("outbreak_date_year" = as.integer(2001)),
                   list("outbreak_duration_months" = as.integer(NA)),
                   list("outbreak_size" = as.integer(2)),
                   list("asymptomatic_transmission" = as.integer(0)),
                   list("outbreak_country" = as.character("Tanzania")),
                   list("outbreak_location" = as.character(NA)),
                   list("cases_confirmed" = as.integer(NA)),
                   list("cases_mode_detection" = as.character(NA)),
                   list("cases_suspected" = as.integer(NA)),
                   list("cases_asymptomatic" = as.integer(NA)),
                   list("deaths" = as.integer(2)),
                   list("cases_severe_hospitalised" = as.integer(NA)),
                   list("covidence_id" = as.integer(2059))),
  vignette_prepend = "")
If you would like to add multiple outbreaks, it is important to add each outbreak sequentially to ensure that the outbreak_id is set correctly.
new_outbreak_dataset <- append_new_entry_to_table(pathogen = "marburg",
                                                  table_type = "outbreak",
                                                  new_row = new_outbreak,
                                                  validate = TRUE,
                                                  write_table = TRUE,
                                                  vignette_prepend = "")
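Because outbreaks, models and parameters are linked to their paper through article_id and covidence_id, a quick check that the pair used in a new row exists in the article dataset can catch copy-paste mistakes. The sketch below uses small made-up data frames and a hypothetical helper, is_linked_to_article; it does not call any epireview function.

```r
# Hypothetical helper: TRUE when the (article_id, covidence_id) pair of a new
# row matches exactly one row of the article table.
is_linked_to_article <- function(new_row, articles) {
  matches <- articles$article_id == new_row$article_id &
    articles$covidence_id == new_row$covidence_id
  sum(matches, na.rm = TRUE) == 1
}

# Made-up article table standing in for the real article dataset
articles <- data.frame(article_id   = c(1L, 2L),
                       covidence_id = c(2059L, 3001L))

good_row <- list(article_id = 1L, covidence_id = 2059L)
bad_row  <- list(article_id = 1L, covidence_id = 3001L)  # ids from two different papers

is_linked_to_article(good_row, articles)  # TRUE
is_linked_to_article(bad_row, articles)   # FALSE
```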
Step 5 – add parameter details
If the paper does not include parameters you can skip this section.
# load packages and source code
library(tidyverse)
library(validate)
source('R/create_new_parameter_entry.R')

new_parameter <- create_new_parameter_entry(
  pathogen = "marburg",
  new_param = c(list("article_id" = as.integer(1)),
                list("parameter_type" = as.character(NA)),
                list("parameter_value" = as.double(NA)),
                list("parameter_unit" = as.character(NA)),
                list("parameter_lower_bound" = as.double(NA)),
                list("parameter_upper_bound" = as.double(NA)),
                list("parameter_value_type" = as.character(NA)),
                list("parameter_uncertainty_single_value" = as.double(NA)),
                list("parameter_uncertainty_singe_type" = as.character(NA)),
                list("parameter_uncertainty_lower_value" = as.double(NA)),
                list("parameter_uncertainty_upper_value" = as.double(NA)),
                list("parameter_uncertainty_type" = as.character(NA)),
                list("cfr_ifr_numerator" = as.integer(NA)),
                list("cfr_ifr_denominator" = as.integer(NA)),
                list("distribution_type" = as.character(NA)),
                list("distribution_par1_value" = as.double(NA)),
                list("distribution_par1_type" = as.character(NA)),
                list("distribution_par1_uncertainty" = as.logical(NA)),
                list("distribution_par2_value" = as.double(NA)),
                list("distribution_par2_type" = as.character(NA)),
                list("distribution_par2_uncertainty" = as.logical(NA)),
                list("method_from_supplement" = as.logical(NA)),
                list("method_moment_value" = as.character(NA)),
                list("cfr_ifr_method" = as.character(NA)),
                list("method_r" = as.character(NA)),
                list("method_disaggregated_by" = as.character(NA)),
                list("method_disaggregated" = as.logical(NA)),
                list("method_disaggregated_only" = as.logical(NA)),
                list("riskfactor_outcome" = as.character(NA)),
                list("riskfactor_name" = as.character(NA)),
                list("riskfactor_occupation" = as.character(NA)),
                list("riskfactor_significant" = as.character(NA)),
                list("riskfactor_adjusted" = as.character(NA)),
                list("population_sex" = as.character(NA)),
                list("population_sample_type" = as.character(NA)),
                list("population_group" = as.character(NA)),
                list("population_age_min" = as.integer(NA)),
                list("population_age_max" = as.integer(NA)),
                list("population_sample_size" = as.integer(NA)),
                list("population_country" = as.character(NA)),
                list("population_location" = as.character(NA)),
                list("population_study_start_day" = as.integer(NA)),
                list("population_study_start_month" = as.character(NA)),
                list("population_study_start_year" = as.integer(NA)),
                list("population_study_end_day" = as.integer(NA)),
                list("population_study_end_month" = as.character(NA)),
                list("population_study_end_year" = as.integer(NA)),
                list("genome_site" = as.character(NA)),
                list("genomic_sequence_available" = as.logical(NA)),
                list("parameter_class" = as.character(NA)),
                list("covidence_id" = as.integer(2059)),
                list("Uncertainty" = as.character(NA)),
                list("Survey year" = as.character(NA))),
  vignette_prepend = "")
If you would like to add multiple parameters, it is important to add each parameter sequentially to ensure that the parameter_data_id is set correctly.
new_parameter_dataset <- append_new_entry_to_table(pathogen = "marburg",
                                                   table_type = "parameter",
                                                   new_row = new_parameter,
                                                   validate = TRUE,
                                                   write_table = TRUE,
                                                   vignette_prepend = "")
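Most fields in a parameter entry are often NA, so the long list can be tedious to type. One option, sketched here, is to keep a template of typed NA values and override only the fields the paper reports, using base R's modifyList(). The template below is deliberately shortened to a handful of the fields shown in this step, and the filled-in values are made up for illustration; a real template would need to list every field.

```r
# Shortened template: typed NAs for a few of the parameter fields listed above
param_template <- list(
  article_id            = NA_integer_,
  parameter_type        = NA_character_,
  parameter_value       = NA_real_,
  parameter_unit        = NA_character_,
  parameter_lower_bound = NA_real_,
  parameter_upper_bound = NA_real_,
  population_country    = NA_character_,
  covidence_id          = NA_integer_
)

# Override only what the paper reports (illustrative values, not real data)
reported <- list(
  article_id         = 1L,
  parameter_value    = 0.5,
  population_country = "Tanzania",
  covidence_id       = 2059L
)

# Guard against typos: every overridden name must already be in the template
stopifnot(all(names(reported) %in% names(param_template)))

# Fields not named in `reported` keep their typed NA from the template
new_param <- modifyList(param_template, reported)

str(new_param)
```

The result is a plain named list, the same kind of object produced by combining single-element lists with c(), and it keeps the field order of the template.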
Step 6 – create a pull request
Once you have completed the above steps, push the updated datasets to your branch of the epireview repository and then submit a pull request. Please ensure that your pull request contains the updated csv files together with details of the peer-reviewed paper being added.
We strongly encourage the use of the utility functions set out in this vignette. We will run the data through these validation tests again, and if they fail, the pull request will be rejected.