Reading in Data and Calculating Prevalence
This tutorial covers the following topics:
- Creating a new STAVE object
- Loading data into this object
- Calculating prevalence, with and without ambiguous matches
Reading in data
Our input data take the form of three tables: studies, surveys, and counts. We will use the example data that comes loaded with the STAVE package; these tables are already correctly formatted. If you are using your own data, you need to match this input format (see explanation):
library(STAVE)
library(kableExtra)  # provides kbl() and scroll_box() for table display
data("example_input")
example_input$studies |>
  kbl(format = "html", table.attr = "style='width:100%; white-space: nowrap;'") |>
  scroll_box(width = "100%", height = NULL)
study_id | study_name | study_type | authors | publication_year | url |
---|---|---|---|---|---|
Dama_2017 | Reduced ex vivo susceptibility of Plasmodium falciparum after oral artemether-lumefantrine treatment in Mali | peer_reviewed | Dama et al | 2017 | https://doi.org/10.1186/s12936-017-1700-8 |
Asua_2019 | Changing Molecular Markers of Antimalarial Drug Sensitivity across Uganda | peer_reviewed | Asua et al | 2019 | https://doi.org/10.1128/aac.01818-18 |
example_input$surveys |>
  kbl(format = "html", table.attr = "style='width:100%; white-space: nowrap;'") |>
  scroll_box(width = "100%", height = NULL)
study_key | survey_id | country_name | site_name | latitude | longitude | spatial_notes | collection_start | collection_end | collection_day | time_notes |
---|---|---|---|---|---|---|---|---|---|---|
Dama_2017 | Bamako_2014 | Mali | Koulikoro | 12.612900 | -8.13560 | WWARN lat and long | 2014-01-01 | 2014-12-31 | 2014-07-02 | automated midpoint |
Asua_2019 | Agago_2017 | Uganda | Agago | 2.984722 | 33.33055 | WWARN lat and long | 2017-01-01 | 2017-12-31 | 2017-07-02 | automated midpoint |
Asua_2019 | Arua_2017 | Uganda | Arua | 3.030000 | 30.91000 | WWARN lat and long | 2017-01-01 | 2017-12-31 | 2017-07-02 | automated midpoint |
Asua_2019 | Kole_2017 | Uganda | Kole | 2.428611 | 32.80111 | WWARN lat and long | 2017-01-01 | 2017-12-31 | 2017-07-02 | automated midpoint |
Asua_2019 | Lamwo_2017 | Uganda | Lamwo | 3.533333 | 32.80000 | WWARN lat and long | 2017-01-01 | 2017-12-31 | 2017-07-02 | automated midpoint |
Asua_2019 | Mubende_2017 | Uganda | Mubende | 0.557500 | 31.39500 | WWARN lat and long | 2017-01-01 | 2017-12-31 | 2017-07-02 | automated midpoint |
example_input$counts |>
  kbl(format = "html", table.attr = "style='width:100%; white-space: nowrap;'") |>
  scroll_box(width = "100%", height = NULL)
study_key | survey_key | variant_string | variant_num | total_num |
---|---|---|---|---|
Dama_2017 | Bamako_2014 | crt:76:T | 130 | 170 |
Dama_2017 | Bamako_2014 | mdr1:86:Y | 46 | 158 |
Asua_2019 | Agago_2017 | k13:469:Y | 42 | 42 |
Asua_2019 | Agago_2017 | k13:675:V | 42 | 42 |
Asua_2019 | Arua_2017 | k13:675:V | 43 | 43 |
Asua_2019 | Kole_2017 | k13:469:Y | 47 | 47 |
Asua_2019 | Kole_2017 | k13:675:V | 47 | 47 |
Asua_2019 | Lamwo_2017 | k13:469:Y | 43 | 43 |
Asua_2019 | Lamwo_2017 | k13:675:V | 43 | 43 |
Asua_2019 | Mubende_2017 | k13:469:F | 45 | 45 |
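If you are preparing your own data rather than using example_input, the three tables above might be assembled as ordinary data frames. The sketch below is only an illustration: the column names are copied from the tables above, every value is a made-up placeholder, and the column types STAVE expects (for instance, whether dates may be plain strings) are an assumption to check against the input format explanation. The key columns (study_key, survey_key) appear to link each row to its parent table, so the final checks test that reading.

```r
# Hypothetical placeholder values throughout; only the column names are taken
# from the example tables above.
my_studies <- data.frame(
  study_id = "Example_2020",
  study_name = "Placeholder study title",
  study_type = "peer_reviewed",
  authors = "Example et al",
  publication_year = 2020,
  url = "https://example.org/placeholder"
)

my_surveys <- data.frame(
  study_key = "Example_2020",
  survey_id = "SiteA_2019",
  country_name = "Placeholder country",
  site_name = "Site A",
  latitude = 0.5,
  longitude = 30.5,
  spatial_notes = "placeholder",
  collection_start = "2019-01-01",
  collection_end = "2019-12-31",
  collection_day = "2019-07-02",
  time_notes = "placeholder"
)

my_counts <- data.frame(
  study_key = "Example_2020",
  survey_key = "SiteA_2019",
  variant_string = "k13:469:Y",
  variant_num = 5,
  total_num = 50
)

# Basic consistency checks before handing the tables to any loader:
# every key should exist in its parent table, and counts should be coherent.
stopifnot(all(my_surveys$study_key %in% my_studies$study_id))
stopifnot(all(my_counts$survey_key %in% my_surveys$survey_id))
stopifnot(all(my_counts$variant_num <= my_counts$total_num))
```

Running checks like these first can catch mismatched keys early, independently of whatever validation the package itself performs.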
With the data correctly formatted, we can create a STAVE object and append the data:
# create new object
s <- STAVE_object$new()
# append data
s$append_data(studies_dataframe = example_input$studies,
              surveys_dataframe = example_input$surveys,
              counts_dataframe = example_input$counts)
#> data correctly appended
# check how many studies are now loaded
s
#> Studies: 2
#> Surveys: 6
Once data are loaded, we can always view the different tables using get functions. However, we cannot alter the values directly.
As a side note, if we want to know all the variants in our loaded data, we can use the get_variants() function:
s$get_variants()
#> [1] "crt:76:T" "k13:469:F" "k13:469:Y" "k13:675:V" "mdr1:86:Y"
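The variant strings above appear to follow a gene:position:amino-acid pattern. That reading is inferred from these five examples rather than stated here, so treat the following as a rough sketch. It splits such strings into columns using base R only, and it handles only simple single-position strings like the ones shown.

```r
# Variant strings copied from the get_variants() output above
variants <- c("crt:76:T", "k13:469:F", "k13:469:Y", "k13:675:V", "mdr1:86:Y")

# Split each string on ":" and stack the pieces into a three-column matrix
parts <- do.call(rbind, strsplit(variants, ":", fixed = TRUE))

# Column names here are illustrative labels, not STAVE terminology
variant_table <- data.frame(
  gene = parts[, 1],
  position = as.integer(parts[, 2]),
  amino_acid = parts[, 3]
)

# Which genes are represented in the loaded data?
unique(variant_table$gene)
```

A table like this makes it easy to list, for example, all variants at a given gene before deciding which one to pass to a prevalence query.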
Calculating prevalence
We can calculate the prevalence of any variant using get_prevalence(). This joins the study and survey information onto each row, with final columns giving the numerator, denominator, prevalence (%), and lower and upper 95% CI bounds:
s$get_prevalence("k13:469:Y") |>
  kbl(format = "html", table.attr = "style='width:100%; white-space: nowrap;'") |>
  scroll_box(width = "100%", height = NULL)
study_id | study_name | study_type | authors | publication_year | url | survey_id | country_name | site_name | latitude | longitude | spatial_notes | collection_start | collection_end | collection_day | time_notes | numerator | denominator | prevalence | prevalence_lower | prevalence_upper |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Dama_2017 | Reduced ex vivo susceptibility of Plasmodium falciparum after oral artemether-lumefantrine treatment in Mali | peer_reviewed | Dama et al | 2017 | https://doi.org/10.1186/s12936-017-1700-8 | Bamako_2014 | Mali | Koulikoro | 12.612900 | -8.13560 | WWARN lat and long | 2014-01-01 | 2014-12-31 | 2014-07-02 | automated midpoint | 0 | 0 | NA | NA | NA |
Asua_2019 | Changing Molecular Markers of Antimalarial Drug Sensitivity across Uganda | peer_reviewed | Asua et al | 2019 | https://doi.org/10.1128/aac.01818-18 | Agago_2017 | Uganda | Agago | 2.984722 | 33.33055 | WWARN lat and long | 2017-01-01 | 2017-12-31 | 2017-07-02 | automated midpoint | 42 | 42 | 100 | 91.59161 | 100.00000 |
Asua_2019 | Changing Molecular Markers of Antimalarial Drug Sensitivity across Uganda | peer_reviewed | Asua et al | 2019 | https://doi.org/10.1128/aac.01818-18 | Arua_2017 | Uganda | Arua | 3.030000 | 30.91000 | WWARN lat and long | 2017-01-01 | 2017-12-31 | 2017-07-02 | automated midpoint | 0 | 0 | NA | NA | NA |
Asua_2019 | Changing Molecular Markers of Antimalarial Drug Sensitivity across Uganda | peer_reviewed | Asua et al | 2019 | https://doi.org/10.1128/aac.01818-18 | Kole_2017 | Uganda | Kole | 2.428611 | 32.80111 | WWARN lat and long | 2017-01-01 | 2017-12-31 | 2017-07-02 | automated midpoint | 47 | 47 | 100 | 92.45143 | 100.00000 |
Asua_2019 | Changing Molecular Markers of Antimalarial Drug Sensitivity across Uganda | peer_reviewed | Asua et al | 2019 | https://doi.org/10.1128/aac.01818-18 | Lamwo_2017 | Uganda | Lamwo | 3.533333 | 32.80000 | WWARN lat and long | 2017-01-01 | 2017-12-31 | 2017-07-02 | automated midpoint | 43 | 43 | 100 | 91.77889 | 100.00000 |
Asua_2019 | Changing Molecular Markers of Antimalarial Drug Sensitivity across Uganda | peer_reviewed | Asua et al | 2019 | https://doi.org/10.1128/aac.01818-18 | Mubende_2017 | Uganda | Mubende | 0.557500 | 31.39500 | WWARN lat and long | 2017-01-01 | 2017-12-31 | 2017-07-02 | automated midpoint | 0 | 45 | 0 | 0.00000 | 7.87051 |
Notice that results are given for every loaded survey, even if there is no corresponding genetic data to calculate a prevalence; in that case the denominator is 0 and the prevalence and CI columns are NA. If instead you only want rows for which there is a non-zero denominator, use the argument return_full = FALSE.
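To see where the prevalence and interval columns come from, the numbers can be reproduced by hand. The values in the table above are consistent with an exact (Clopper-Pearson) binomial interval, which base R provides through binom.test(). That match is an observation from the printed output, not a statement of how STAVE computes its intervals internally, and exact_prevalence() below is a hypothetical helper written for this illustration.

```r
# Hypothetical helper: prevalence (%) with an exact binomial 95% interval.
# Returns NA values when the denominator is zero, mirroring the NA rows above.
exact_prevalence <- function(numerator, denominator) {
  if (denominator == 0) {
    return(c(prevalence = NA_real_, lower = NA_real_, upper = NA_real_))
  }
  ci <- binom.test(numerator, denominator, conf.level = 0.95)$conf.int
  c(prevalence = 100 * numerator / denominator,
    lower = 100 * ci[1],
    upper = 100 * ci[2])
}

# Numerators and denominators taken from the table above
agago   <- exact_prevalence(42, 42)  # Agago_2017
mubende <- exact_prevalence(0, 45)   # Mubende_2017
bamako  <- exact_prevalence(0, 0)    # Bamako_2014, no data for this variant

rbind(agago, mubende, bamako)
```

Comparing these rows against the prevalence_lower and prevalence_upper columns is a quick sanity check when working with your own data.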
Here is another example, this time allowing for ambiguous matches.
s$get_prevalence("k13:469:Y", keep_ambiguous = TRUE, prev_from_min = TRUE) |>
  kbl(format = "html", table.attr = "style='width:100%; white-space: nowrap;'") |>
  scroll_box(width = "100%", height = NULL)
study_id | study_name | study_type | authors | publication_year | url | survey_id | country_name | site_name | latitude | longitude | spatial_notes | collection_start | collection_end | collection_day | time_notes | numerator_min | numerator_max | denominator | prevalence | prevalence_lower | prevalence_upper |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Dama_2017 | Reduced ex vivo susceptibility of Plasmodium falciparum after oral artemether-lumefantrine treatment in Mali | peer_reviewed | Dama et al | 2017 | https://doi.org/10.1186/s12936-017-1700-8 | Bamako_2014 | Mali | Koulikoro | 12.612900 | -8.13560 | WWARN lat and long | 2014-01-01 | 2014-12-31 | 2014-07-02 | automated midpoint | 0 | 0 | 0 | NA | NA | NA |
Asua_2019 | Changing Molecular Markers of Antimalarial Drug Sensitivity across Uganda | peer_reviewed | Asua et al | 2019 | https://doi.org/10.1128/aac.01818-18 | Agago_2017 | Uganda | Agago | 2.984722 | 33.33055 | WWARN lat and long | 2017-01-01 | 2017-12-31 | 2017-07-02 | automated midpoint | 42 | 42 | 42 | 100 | 91.59161 | 100.00000 |
Asua_2019 | Changing Molecular Markers of Antimalarial Drug Sensitivity across Uganda | peer_reviewed | Asua et al | 2019 | https://doi.org/10.1128/aac.01818-18 | Arua_2017 | Uganda | Arua | 3.030000 | 30.91000 | WWARN lat and long | 2017-01-01 | 2017-12-31 | 2017-07-02 | automated midpoint | 0 | 0 | 0 | NA | NA | NA |
Asua_2019 | Changing Molecular Markers of Antimalarial Drug Sensitivity across Uganda | peer_reviewed | Asua et al | 2019 | https://doi.org/10.1128/aac.01818-18 | Kole_2017 | Uganda | Kole | 2.428611 | 32.80111 | WWARN lat and long | 2017-01-01 | 2017-12-31 | 2017-07-02 | automated midpoint | 47 | 47 | 47 | 100 | 92.45143 | 100.00000 |
Asua_2019 | Changing Molecular Markers of Antimalarial Drug Sensitivity across Uganda | peer_reviewed | Asua et al | 2019 | https://doi.org/10.1128/aac.01818-18 | Lamwo_2017 | Uganda | Lamwo | 3.533333 | 32.80000 | WWARN lat and long | 2017-01-01 | 2017-12-31 | 2017-07-02 | automated midpoint | 43 | 43 | 43 | 100 | 91.77889 | 100.00000 |
Asua_2019 | Changing Molecular Markers of Antimalarial Drug Sensitivity across Uganda | peer_reviewed | Asua et al | 2019 | https://doi.org/10.1128/aac.01818-18 | Mubende_2017 | Uganda | Mubende | 0.557500 | 31.39500 | WWARN lat and long | 2017-01-01 | 2017-12-31 | 2017-07-02 | automated midpoint | 0 | 0 | 45 | 0 | 0.00000 | 7.87051 |
A min and max numerator are now given (numerator_min and numerator_max). There is still only a single prevalence estimate and 95% CI, calculated using either the min or the max values, as specified by the prev_from_min argument.
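In the example data the min and max numerators happen to coincide, so the choice makes no visible difference. The arithmetic below uses made-up counts to show how the two options could diverge. It assumes that numerator_min counts only unambiguous matches while numerator_max also counts ambiguous ones; that interpretation, and the prevalence_from() helper, are assumptions for illustration rather than a description of STAVE internals.

```r
# Hypothetical survey (made-up numbers, not from the example data):
# 40 unambiguous matches, 5 further ambiguous matches, 50 samples in total.
numerator_min <- 40
numerator_max <- 45
denominator <- 50

# Hypothetical helper mirroring the role of the prev_from_min argument
prevalence_from <- function(prev_from_min) {
  numerator <- if (prev_from_min) numerator_min else numerator_max
  100 * numerator / denominator
}

prevalence_from(TRUE)   # prevalence (%) based on numerator_min
prevalence_from(FALSE)  # prevalence (%) based on numerator_max
```

Under this reading, using the min gives a conservative estimate and using the max gives an upper bound on what the data could support.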