Reading in Data and Calculating Prevalence • STAVE

This tutorial covers the following topics:

Creating a new STAVE object
Loading data into this object
Calculating prevalence, with and without ambiguous matches

Reading in data

Our input data take the form of three tables: studies, surveys and counts. We will use the example data that comes loaded with the STAVE package, in which these tables are already correctly formatted. If you are using your own data, it must match this input format (see explanation):

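# load STAVE, plus kableExtra for the kbl() and scroll_box() table helpers
library(STAVE)
library(kableExtra)

data("example_input")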

example_input$studies |>
  kbl(format = "html", table.attr = "style='width:100%; white-space: nowrap;'") |>
  scroll_box(width = "100%", height = NULL)

study_id	study_name	study_type	authors	publication_year	url
Dama_2017	Reduced ex vivo susceptibility of Plasmodium falciparum after oral artemether-lumefantrine treatment in Mali	peer_reviewed	Dama et al	2017	https://doi.org/10.1186/s12936-017-1700-8
Asua_2019	Changing Molecular Markers of Antimalarial Drug Sensitivity across Uganda	peer_reviewed	Asua et al	2019	https://doi.org/10.1128/aac.01818-18


example_input$surveys |>
  kbl(format = "html", table.attr = "style='width:100%; white-space: nowrap;'") |>
  scroll_box(width = "100%", height = NULL)

study_key	survey_id	country_name	site_name	latitude	longitude	spatial_notes	collection_start	collection_end	collection_day	time_notes
Dama_2017	Bamako_2014	Mali	Koulikoro	12.612900	-8.13560	WWARN lat and long	2014-01-01	2014-12-31	2014-07-02	automated midpoint
Asua_2019	Agago_2017	Uganda	Agago	2.984722	33.33055	WWARN lat and long	2017-01-01	2017-12-31	2017-07-02	automated midpoint
Asua_2019	Arua_2017	Uganda	Arua	3.030000	30.91000	WWARN lat and long	2017-01-01	2017-12-31	2017-07-02	automated midpoint
Asua_2019	Kole_2017	Uganda	Kole	2.428611	32.80111	WWARN lat and long	2017-01-01	2017-12-31	2017-07-02	automated midpoint
Asua_2019	Lamwo_2017	Uganda	Lamwo	3.533333	32.80000	WWARN lat and long	2017-01-01	2017-12-31	2017-07-02	automated midpoint
Asua_2019	Mubende_2017	Uganda	Mubende	0.557500	31.39500	WWARN lat and long	2017-01-01	2017-12-31	2017-07-02	automated midpoint


example_input$counts |>
  kbl(format = "html", table.attr = "style='width:100%; white-space: nowrap;'") |>
  scroll_box(width = "100%", height = NULL)

study_key	survey_key	variant_string	variant_num	total_num
Dama_2017	Bamako_2014	crt:76:T	130	170
Dama_2017	Bamako_2014	mdr1:86:Y	46	158
Asua_2019	Agago_2017	k13:469:Y	42	42
Asua_2019	Agago_2017	k13:675:V	42	42
Asua_2019	Arua_2017	k13:675:V	43	43
Asua_2019	Kole_2017	k13:469:Y	47	47
Asua_2019	Kole_2017	k13:675:V	47	47
Asua_2019	Lamwo_2017	k13:469:Y	43	43
Asua_2019	Lamwo_2017	k13:675:V	43	43
Asua_2019	Mubende_2017	k13:469:F	45	45
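
If you are preparing your own data, the layout above can be written out by hand as three ordinary data frames. The sketch below rebuilds the Dama_2017 rows from the example tables and then runs a few basic consistency checks. The link between study_key and study_id, and between survey_key and survey_id, is inferred from the column names and values shown above. The column types (for instance dates stored as character strings) and the checks themselves are assumptions for illustration, not the package's own validation rules.

```r
# One study, one survey and two count rows, copied from the example tables
studies <- data.frame(
  study_id = "Dama_2017",
  study_name = paste("Reduced ex vivo susceptibility of Plasmodium falciparum",
                     "after oral artemether-lumefantrine treatment in Mali"),
  study_type = "peer_reviewed",
  authors = "Dama et al",
  publication_year = 2017,
  url = "https://doi.org/10.1186/s12936-017-1700-8",
  stringsAsFactors = FALSE
)

surveys <- data.frame(
  study_key = "Dama_2017",
  survey_id = "Bamako_2014",
  country_name = "Mali",
  site_name = "Koulikoro",
  latitude = 12.6129,
  longitude = -8.1356,
  spatial_notes = "WWARN lat and long",
  collection_start = "2014-01-01",  # assumed to be acceptable as character
  collection_end = "2014-12-31",
  collection_day = "2014-07-02",
  time_notes = "automated midpoint",
  stringsAsFactors = FALSE
)

counts <- data.frame(
  study_key = c("Dama_2017", "Dama_2017"),
  survey_key = c("Bamako_2014", "Bamako_2014"),
  variant_string = c("crt:76:T", "mdr1:86:Y"),
  variant_num = c(130, 46),
  total_num = c(170, 158),
  stringsAsFactors = FALSE
)

# Illustrative pre-checks (assumed, not taken from STAVE itself):
# every key must point at an existing row, and counts must be consistent
keys_ok <- all(surveys$study_key %in% studies$study_id) &&
  all(counts$study_key %in% studies$study_id) &&
  all(counts$survey_key %in% surveys$survey_id)
counts_ok <- all(counts$variant_num <= counts$total_num)

keys_ok
counts_ok
```

Running checks like these before loading makes it easier to spot a mistyped key or a numerator that exceeds its denominator in your own tables.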

With the data correctly formatted, we can create a STAVE object and append the data:

# create new object
s <- STAVE_object$new()

# append data
s$append_data(studies_dataframe = example_input$studies,
              surveys_dataframe = example_input$surveys,
              counts_dataframe = example_input$counts)
#> data correctly appended


# check how many studies are now loaded
s
#> Studies: 2
#> Surveys: 6

Once data are loaded, we can always view the different tables using get functions. However, we cannot alter the values directly.

As a side note, if we want to know all the variants in our loaded data, we can use the get_variants() function.

s$get_variants()
#> [1] "crt:76:T"  "k13:469:F" "k13:469:Y" "k13:675:V" "mdr1:86:Y"
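
Each of the variant strings above has three colon-separated parts: a gene name, a codon position and an amino acid. As a rough sketch, they can be split into columns with base R. The column names gene, position and amino_acid are hypothetical, and the code only handles simple single-position strings like the ones shown here.

```r
# Variant strings as returned by get_variants() above
variants <- c("crt:76:T", "k13:469:F", "k13:469:Y", "k13:675:V", "mdr1:86:Y")

# Split each string on ":" and stack the pieces into a character matrix
parts <- do.call(rbind, strsplit(variants, ":", fixed = TRUE))

# Hypothetical column names, chosen for this illustration only
variant_table <- data.frame(
  gene = parts[, 1],
  position = as.integer(parts[, 2]),
  amino_acid = parts[, 3],
  stringsAsFactors = FALSE
)

variant_table
```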

Calculating prevalence

We can calculate the prevalence of any variant using get_prevalence(). This joins the study and survey information to the counts for that variant, with final columns giving the numerator, denominator, prevalence (%) and the lower and upper limits of the 95% confidence interval (CI):

s$get_prevalence("k13:469:Y") |>
  kbl(format = "html", table.attr = "style='width:100%; white-space: nowrap;'") |>
  scroll_box(width = "100%", height = NULL)

study_id	study_name	study_type	authors	publication_year	url	survey_id	country_name	site_name	latitude	longitude	spatial_notes	collection_start	collection_end	collection_day	time_notes	numerator	denominator	prevalence	prevalence_lower	prevalence_upper
Dama_2017	Reduced ex vivo susceptibility of Plasmodium falciparum after oral artemether-lumefantrine treatment in Mali	peer_reviewed	Dama et al	2017	https://doi.org/10.1186/s12936-017-1700-8	Bamako_2014	Mali	Koulikoro	12.612900	-8.13560	WWARN lat and long	2014-01-01	2014-12-31	2014-07-02	automated midpoint	0	0	NA	NA	NA
Asua_2019	Changing Molecular Markers of Antimalarial Drug Sensitivity across Uganda	peer_reviewed	Asua et al	2019	https://doi.org/10.1128/aac.01818-18	Agago_2017	Uganda	Agago	2.984722	33.33055	WWARN lat and long	2017-01-01	2017-12-31	2017-07-02	automated midpoint	42	42	100	91.59161	100.00000
Asua_2019	Changing Molecular Markers of Antimalarial Drug Sensitivity across Uganda	peer_reviewed	Asua et al	2019	https://doi.org/10.1128/aac.01818-18	Arua_2017	Uganda	Arua	3.030000	30.91000	WWARN lat and long	2017-01-01	2017-12-31	2017-07-02	automated midpoint	0	0	NA	NA	NA
Asua_2019	Changing Molecular Markers of Antimalarial Drug Sensitivity across Uganda	peer_reviewed	Asua et al	2019	https://doi.org/10.1128/aac.01818-18	Kole_2017	Uganda	Kole	2.428611	32.80111	WWARN lat and long	2017-01-01	2017-12-31	2017-07-02	automated midpoint	47	47	100	92.45143	100.00000
Asua_2019	Changing Molecular Markers of Antimalarial Drug Sensitivity across Uganda	peer_reviewed	Asua et al	2019	https://doi.org/10.1128/aac.01818-18	Lamwo_2017	Uganda	Lamwo	3.533333	32.80000	WWARN lat and long	2017-01-01	2017-12-31	2017-07-02	automated midpoint	43	43	100	91.77889	100.00000
Asua_2019	Changing Molecular Markers of Antimalarial Drug Sensitivity across Uganda	peer_reviewed	Asua et al	2019	https://doi.org/10.1128/aac.01818-18	Mubende_2017	Uganda	Mubende	0.557500	31.39500	WWARN lat and long	2017-01-01	2017-12-31	2017-07-02	automated midpoint	0	45	0	0.00000	7.87051
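
The interval limits in the table above agree with an exact (Clopper-Pearson) binomial interval, which base R provides through binom.test(). The sketch below recomputes three rows from their numerator and denominator. The helper prev_ci() is hypothetical and not part of STAVE, and the agreement with the table is a numerical observation rather than a statement about how the package computes its intervals.

```r
# Hypothetical helper, not part of STAVE: prevalence (%) with an exact
# Clopper-Pearson interval for a single numerator/denominator pair
prev_ci <- function(numerator, denominator, conf_level = 0.95) {
  if (denominator == 0) {
    # no genetic data for this survey, so nothing can be estimated
    return(c(prevalence = NA_real_, lower = NA_real_, upper = NA_real_))
  }
  ci <- binom.test(numerator, denominator, conf.level = conf_level)$conf.int
  c(prevalence = 100 * numerator / denominator,
    lower = 100 * ci[1],
    upper = 100 * ci[2])
}

prev_ci(42, 42)  # Agago_2017
prev_ci(0, 45)   # Mubende_2017
prev_ci(0, 0)    # Bamako_2014
```

For Agago_2017 this gives a lower limit of about 91.59%, and for Mubende_2017 an upper limit of about 7.87%, matching the prevalence_lower and prevalence_upper columns.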

Notice that results are given for every loaded survey, even if there is no corresponding genetic data to calculate a prevalence. In that case the denominator is 0 and the prevalence columns are NA. If instead you only want rows for which there is a non-zero denominator, use the argument return_full = FALSE.
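
To make the effect of dropping zero-denominator rows concrete, the sketch below applies the same filter by hand to three rows copied from the output above (key columns only). It mimics the behaviour described for return_full = FALSE. Whether the package implements it in exactly this way is an assumption.

```r
# Three rows copied from the get_prevalence() output above
prev <- data.frame(
  survey_id = c("Bamako_2014", "Agago_2017", "Mubende_2017"),
  numerator = c(0, 42, 0),
  denominator = c(0, 42, 45),
  prevalence = c(NA, 100, 0),
  stringsAsFactors = FALSE
)

# Keep only surveys where the locus was actually genotyped
prev_nonzero <- prev[prev$denominator > 0, ]

prev_nonzero$survey_id
```

Mubende_2017 is kept because it has a denominator of 45: a prevalence of 0 is an observed result, whereas Bamako_2014 has no data at this locus and is dropped.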

Here is another call to get_prevalence(), this time allowing for ambiguous matches.

s$get_prevalence("k13:469:Y", keep_ambiguous = TRUE, prev_from_min = TRUE) |>
  kbl(format = "html", table.attr = "style='width:100%; white-space: nowrap;'") |>
  scroll_box(width = "100%", height = NULL)

study_id	study_name	study_type	authors	publication_year	url	survey_id	country_name	site_name	latitude	longitude	spatial_notes	collection_start	collection_end	collection_day	time_notes	numerator_min	numerator_max	denominator	prevalence	prevalence_lower	prevalence_upper
Dama_2017	Reduced ex vivo susceptibility of Plasmodium falciparum after oral artemether-lumefantrine treatment in Mali	peer_reviewed	Dama et al	2017	https://doi.org/10.1186/s12936-017-1700-8	Bamako_2014	Mali	Koulikoro	12.612900	-8.13560	WWARN lat and long	2014-01-01	2014-12-31	2014-07-02	automated midpoint	0	0	0	NA	NA	NA
Asua_2019	Changing Molecular Markers of Antimalarial Drug Sensitivity across Uganda	peer_reviewed	Asua et al	2019	https://doi.org/10.1128/aac.01818-18	Agago_2017	Uganda	Agago	2.984722	33.33055	WWARN lat and long	2017-01-01	2017-12-31	2017-07-02	automated midpoint	42	42	42	100	91.59161	100.00000
Asua_2019	Changing Molecular Markers of Antimalarial Drug Sensitivity across Uganda	peer_reviewed	Asua et al	2019	https://doi.org/10.1128/aac.01818-18	Arua_2017	Uganda	Arua	3.030000	30.91000	WWARN lat and long	2017-01-01	2017-12-31	2017-07-02	automated midpoint	0	0	0	NA	NA	NA
Asua_2019	Changing Molecular Markers of Antimalarial Drug Sensitivity across Uganda	peer_reviewed	Asua et al	2019	https://doi.org/10.1128/aac.01818-18	Kole_2017	Uganda	Kole	2.428611	32.80111	WWARN lat and long	2017-01-01	2017-12-31	2017-07-02	automated midpoint	47	47	47	100	92.45143	100.00000
Asua_2019	Changing Molecular Markers of Antimalarial Drug Sensitivity across Uganda	peer_reviewed	Asua et al	2019	https://doi.org/10.1128/aac.01818-18	Lamwo_2017	Uganda	Lamwo	3.533333	32.80000	WWARN lat and long	2017-01-01	2017-12-31	2017-07-02	automated midpoint	43	43	43	100	91.77889	100.00000
Asua_2019	Changing Molecular Markers of Antimalarial Drug Sensitivity across Uganda	peer_reviewed	Asua et al	2019	https://doi.org/10.1128/aac.01818-18	Mubende_2017	Uganda	Mubende	0.557500	31.39500	WWARN lat and long	2017-01-01	2017-12-31	2017-07-02	automated midpoint	0	0	45	0	0.00000	7.87051

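Minimum and maximum numerators (numerator_min and numerator_max) are now given. There is still only a single prevalence estimate and 95% CI, calculated from either the minimum or the maximum numerator as specified by the prev_from_min argument. In this example the two numerators are equal in every row, so the estimates are the same as before.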
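
Because the minimum and maximum numerators coincide in the example data, the effect of prev_from_min is not visible in the table. The sketch below uses invented counts to show how the point estimate would differ. The helper prev_from_bounds() is hypothetical, the numbers are made up for illustration, and the reading of prev_from_min = FALSE as "use the maximum" is an assumption based on the description above.

```r
# Invented counts for a single survey, for illustration only
numerator_min <- 10
numerator_max <- 15
denominator <- 50

# Hypothetical helper, not part of STAVE: choose which numerator feeds the
# prevalence estimate and return the result in percent
prev_from_bounds <- function(numerator_min, numerator_max, denominator,
                             prev_from_min = TRUE) {
  numerator <- if (prev_from_min) numerator_min else numerator_max
  100 * numerator / denominator
}

prev_low <- prev_from_bounds(numerator_min, numerator_max, denominator,
                             prev_from_min = TRUE)
prev_high <- prev_from_bounds(numerator_min, numerator_max, denominator,
                              prev_from_min = FALSE)

c(prev_low = prev_low, prev_high = prev_high)
```

With these made-up counts the estimate is 20% when taken from the minimum numerator and 30% when taken from the maximum.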