Calculating Prevalence
This tutorial covers the following topics:
- Calculating prevalence at a single locus
- Calculating prevalence at multiple loci and dealing with ambiguities
Calculating prevalence
Let’s begin by creating a new STAVE object and appending the example data:
# create new object
s <- STAVE_object$new()
# append example data
s$append_data(studies_dataframe = example_input$studies,
surveys_dataframe = example_input$surveys,
counts_dataframe = example_input$counts)
#> data correctly appended
Before calculating prevalence, it is often useful to inspect the set of variants encoded in the object:
s$get_variants()
#> [1] "crt:76:T" "k13:469:F" "k13:469:Y" "k13:675:V" "mdr1:86:Y"
By default, get_variants() lists single-locus variants.
To list all multi-locus haplotypes instead, set report_haplo = TRUE:
s$get_variants(report_haplo = TRUE)
#> [1] "crt:76:T" "k13:469:F" "k13:469:Y" "k13:675:V" "mdr1:86:Y"
In this example the two outputs are identical, because the example data contain no multi-locus haplotypes.
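Single-locus variant names follow a `gene:position:amino_acid` pattern (for example, `crt:76:T` reads as the T allele at codon 76 of crt). The sketch below splits such names into their parts with base R. The helper `parse_variant()` is hypothetical and not part of STAVE, and it assumes every name has exactly three colon-separated fields, so it is not meant for multi-locus haplotype strings.

```r
# Hypothetical helper (not part of STAVE): split single-locus variant names
# of the assumed form "gene:position:amino_acid" into separate columns.
parse_variant <- function(variant) {
  parts <- strsplit(variant, ":", fixed = TRUE)
  data.frame(
    variant    = variant,
    gene       = vapply(parts, `[`, character(1), 1),
    position   = as.integer(vapply(parts, `[`, character(1), 2)),
    amino_acid = vapply(parts, `[`, character(1), 3),
    stringsAsFactors = FALSE
  )
}

# The variant names returned by get_variants() above
variants <- c("crt:76:T", "k13:469:F", "k13:469:Y", "k13:675:V", "mdr1:86:Y")
parse_variant(variants)
```

This makes it easy to, for instance, select all k13 variants with `subset(parse_variant(variants), gene == "k13")`.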
Prevalence at a single locus
To calculate the prevalence of a specific variant, use
get_prevalence(). For example, here is the prevalence of
the mutation crt:76:T:
s$get_prevalence(target_variant = "crt:76:T")

| study_id | study_label | description | access_level | contributors | reference | reference_year | PMID | survey_id | country_name | site_name | latitude | longitude | location_method | location_notes | collection_start | collection_end | collection_day | time_method | time_notes | numerator | denominator | prevalence | prevalence_lower | prevalence_upper |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Dama_2017 | Reduced ex vivo susceptibility of Plasmodium falciparum after oral artemether-lumefantrine treatment in Mali | NA | public | Dama et al. | https://pubmed.ncbi.nlm.nih.gov/28148267/ | 2017 | 28148267 | Dama_2017_Bamako_2014 | Mali | Koulikoro | 12.612900 | -8.13560 | WWARN lat and long | NA | 2014-01-01 | 2014-12-31 | 2014-07-02 | automated midpoint | NA | 130 | 170 | 76.47059 | 69.36751 | 82.62694 |
| Asua_2019 | Changing Molecular Markers of Antimalarial Drug Sensitivity across Uganda | NA | public | Asua et al. | https://pubmed.ncbi.nlm.nih.gov/30559133/ | 2019 | 30559133 | Asua_2019_Agago_2017 | Uganda | Agago | 2.984722 | 33.33055 | WWARN lat and long | NA | 2017-01-01 | 2017-12-31 | 2017-07-02 | automated midpoint | NA | 0 | 0 | NA | NA | NA |
| Asua_2019 | Changing Molecular Markers of Antimalarial Drug Sensitivity across Uganda | NA | public | Asua et al. | https://pubmed.ncbi.nlm.nih.gov/30559133/ | 2019 | 30559133 | Asua_2019_Arua_2017 | Uganda | Arua | 3.030000 | 30.91000 | WWARN lat and long | NA | 2017-01-01 | 2017-12-31 | 2017-07-02 | automated midpoint | NA | 0 | 0 | NA | NA | NA |
| Asua_2019 | Changing Molecular Markers of Antimalarial Drug Sensitivity across Uganda | NA | public | Asua et al. | https://pubmed.ncbi.nlm.nih.gov/30559133/ | 2019 | 30559133 | Asua_2019_Kole_2017 | Uganda | Kole | 2.428611 | 32.80111 | WWARN lat and long | NA | 2017-01-01 | 2017-12-31 | 2017-07-02 | automated midpoint | NA | 0 | 0 | NA | NA | NA |
| Asua_2019 | Changing Molecular Markers of Antimalarial Drug Sensitivity across Uganda | NA | public | Asua et al. | https://pubmed.ncbi.nlm.nih.gov/30559133/ | 2019 | 30559133 | Asua_2019_Lamwo_2017 | Uganda | Lamwo | 3.533333 | 32.80000 | WWARN lat and long | NA | 2017-01-01 | 2017-12-31 | 2017-07-02 | automated midpoint | NA | 0 | 0 | NA | NA | NA |
| Asua_2019 | Changing Molecular Markers of Antimalarial Drug Sensitivity across Uganda | NA | public | Asua et al. | https://pubmed.ncbi.nlm.nih.gov/30559133/ | 2019 | 30559133 | Asua_2019_Mubende_2017 | Uganda | Mubende | 0.557500 | 31.39500 | WWARN lat and long | NA | 2017-01-01 | 2017-12-31 | 2017-07-02 | automated midpoint | NA | 0 | 0 | NA | NA | NA |
The output is a joined table that combines study, survey, and count information with the estimated prevalence (as a percentage) and its 95% confidence interval for each survey.
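To make the prevalence columns concrete, the sketch below recomputes a percentage prevalence and a 95% interval from a numerator and denominator using base R's `binom.test()`, which returns an exact (Clopper-Pearson) interval. It is an assumption that `get_prevalence()` uses this interval method; the point estimate reproduces the 76.47059 shown for `Dama_2017_Bamako_2014`, but the bounds may differ if STAVE uses another method. The helper name `prevalence_ci()` is hypothetical.

```r
# Hypothetical helper: percentage prevalence with an exact binomial 95% CI.
# Assumption: this is not confirmed to be the interval used by get_prevalence().
prevalence_ci <- function(numerator, denominator, conf_level = 0.95) {
  if (denominator == 0) {
    # No samples typed for this variant, so prevalence is undefined
    return(c(prevalence = NA_real_, lower = NA_real_, upper = NA_real_))
  }
  ci <- binom.test(numerator, denominator, conf.level = conf_level)$conf.int
  100 * c(prevalence = numerator / denominator, lower = ci[1], upper = ci[2])
}

prevalence_ci(130, 170)  # Dama_2017_Bamako_2014
prevalence_ci(0, 0)      # e.g. Asua_2019_Agago_2017, all values NA
```

The zero-denominator case mirrors the Uganda rows in the table, where prevalence and both interval bounds are NA.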
Note that the output contains a row for every loaded survey, even when the denominator is zero; in those rows the prevalence and its confidence interval are NA. To return only surveys with non-zero denominators, use:
s$get_prevalence(target_variant = "crt:76:T", return_full = FALSE)

| study_id | study_label | description | access_level | contributors | reference | reference_year | PMID | survey_id | country_name | site_name | latitude | longitude | location_method | location_notes | collection_start | collection_end | collection_day | time_method | time_notes | numerator | denominator | prevalence | prevalence_lower | prevalence_upper |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Dama_2017 | Reduced ex vivo susceptibility of Plasmodium falciparum after oral artemether-lumefantrine treatment in Mali | NA | public | Dama et al. | https://pubmed.ncbi.nlm.nih.gov/28148267/ | 2017 | 28148267 | Dama_2017_Bamako_2014 | Mali | Koulikoro | 12.6129 | -8.1356 | WWARN lat and long | NA | 2014-01-01 | 2014-12-31 | 2014-07-02 | automated midpoint | NA | 130 | 170 | 76.47059 | 69.36751 | 82.62694 |
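The effect of `return_full = FALSE` can be pictured as a simple row filter on the denominator. The sketch below rebuilds the numerator and denominator columns of the full `crt:76:T` table above as a plain data frame and keeps only rows with a non-zero denominator. Treating `return_full = FALSE` as exactly this filter is an assumption based on the description above, not on STAVE's source.

```r
# Numerators and denominators copied from the full crt:76:T table above
prev <- data.frame(
  survey_id = c("Dama_2017_Bamako_2014", "Asua_2019_Agago_2017",
                "Asua_2019_Arua_2017", "Asua_2019_Kole_2017",
                "Asua_2019_Lamwo_2017", "Asua_2019_Mubende_2017"),
  numerator   = c(130, 0, 0, 0, 0, 0),
  denominator = c(170, 0, 0, 0, 0, 0),
  stringsAsFactors = FALSE
)

# Percentage prevalence, NA where nothing was typed
prev$prevalence <- ifelse(prev$denominator > 0,
                          100 * prev$numerator / prev$denominator, NA_real_)

# Assumed equivalent of return_full = FALSE: drop zero-denominator surveys
informative <- prev[prev$denominator > 0, ]
informative
```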
Prevalence of a haplotype and ambiguous matches
Here is another example, this time allowing for ambiguous matches.
s$get_prevalence("crt:76:T", keep_ambiguous = TRUE, prev_from_min = TRUE)

| study_id | study_label | description | access_level | contributors | reference | reference_year | PMID | survey_id | country_name | site_name | latitude | longitude | location_method | location_notes | collection_start | collection_end | collection_day | time_method | time_notes | numerator | numerator_min | numerator_max | denominator | prevalence | prevalence_lower | prevalence_upper |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Dama_2017 | Reduced ex vivo susceptibility of Plasmodium falciparum after oral artemether-lumefantrine treatment in Mali | NA | public | Dama et al. | https://pubmed.ncbi.nlm.nih.gov/28148267/ | 2017 | 28148267 | Dama_2017_Bamako_2014 | Mali | Koulikoro | 12.612900 | -8.13560 | WWARN lat and long | NA | 2014-01-01 | 2014-12-31 | 2014-07-02 | automated midpoint | NA | 130 | 130 | 130 | 170 | 76.47059 | 69.36751 | 82.62694 |
| Asua_2019 | Changing Molecular Markers of Antimalarial Drug Sensitivity across Uganda | NA | public | Asua et al. | https://pubmed.ncbi.nlm.nih.gov/30559133/ | 2019 | 30559133 | Asua_2019_Agago_2017 | Uganda | Agago | 2.984722 | 33.33055 | WWARN lat and long | NA | 2017-01-01 | 2017-12-31 | 2017-07-02 | automated midpoint | NA | 0 | 0 | 0 | 0 | NA | NA | NA |
| Asua_2019 | Changing Molecular Markers of Antimalarial Drug Sensitivity across Uganda | NA | public | Asua et al. | https://pubmed.ncbi.nlm.nih.gov/30559133/ | 2019 | 30559133 | Asua_2019_Arua_2017 | Uganda | Arua | 3.030000 | 30.91000 | WWARN lat and long | NA | 2017-01-01 | 2017-12-31 | 2017-07-02 | automated midpoint | NA | 0 | 0 | 0 | 0 | NA | NA | NA |
| Asua_2019 | Changing Molecular Markers of Antimalarial Drug Sensitivity across Uganda | NA | public | Asua et al. | https://pubmed.ncbi.nlm.nih.gov/30559133/ | 2019 | 30559133 | Asua_2019_Kole_2017 | Uganda | Kole | 2.428611 | 32.80111 | WWARN lat and long | NA | 2017-01-01 | 2017-12-31 | 2017-07-02 | automated midpoint | NA | 0 | 0 | 0 | 0 | NA | NA | NA |
| Asua_2019 | Changing Molecular Markers of Antimalarial Drug Sensitivity across Uganda | NA | public | Asua et al. | https://pubmed.ncbi.nlm.nih.gov/30559133/ | 2019 | 30559133 | Asua_2019_Lamwo_2017 | Uganda | Lamwo | 3.533333 | 32.80000 | WWARN lat and long | NA | 2017-01-01 | 2017-12-31 | 2017-07-02 | automated midpoint | NA | 0 | 0 | 0 | 0 | NA | NA | NA |
| Asua_2019 | Changing Molecular Markers of Antimalarial Drug Sensitivity across Uganda | NA | public | Asua et al. | https://pubmed.ncbi.nlm.nih.gov/30559133/ | 2019 | 30559133 | Asua_2019_Mubende_2017 | Uganda | Mubende | 0.557500 | 31.39500 | WWARN lat and long | NA | 2017-01-01 | 2017-12-31 | 2017-07-02 | automated midpoint | NA | 0 | 0 | 0 | 0 | NA | NA | NA |
The output now includes a minimum and a maximum numerator (numerator_min and numerator_max). In
this example there is no ambiguity, because we are calculating prevalence at a
single locus, but for longer haplotypes the minimum and
maximum can differ. The prevalence and 95% CI are calculated from
either the minimum or the maximum numerator, as specified by
the prev_from_min argument.
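One way to picture the minimum and maximum numerators: samples that definitely carry the target haplotype always count, while samples that only ambiguously match count toward the maximum but not the minimum. The sketch below encodes that reading with a hypothetical helper, `prevalence_bounds()`, and purely illustrative counts that are not from the example data. It assumes ambiguity can be summarised as a single count of possibly matching samples, and that `prev_from_min = TRUE` selects the minimum; STAVE's actual matching rules are not described in this tutorial.

```r
# Hypothetical helper: 'definite' samples certainly carry the haplotype,
# 'ambiguous' samples may or may not (an assumed simplification).
prevalence_bounds <- function(definite, ambiguous, denominator,
                              prev_from_min = TRUE) {
  numerator_min <- definite
  numerator_max <- definite + ambiguous
  # Choose which numerator drives the reported prevalence
  numerator <- if (prev_from_min) numerator_min else numerator_max
  c(numerator_min = numerator_min,
    numerator_max = numerator_max,
    prevalence    = 100 * numerator / denominator)
}

# Illustrative numbers only
prevalence_bounds(definite = 20, ambiguous = 10, denominator = 100)
prevalence_bounds(definite = 20, ambiguous = 10, denominator = 100,
                  prev_from_min = FALSE)
```

Using the minimum gives a conservative (lower-bound) prevalence, while the maximum gives an upper bound; at a single locus with no ambiguous samples the two coincide, as in the table above.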