vignettes/tutorial_analysis.Rmd
Here, we outline the main steps in analysing data using the DRpower Bayesian model. Although we focus on the pfhrp2/3 use-case here, the same steps can be used to analyse the prevalence of drug resistance markers.
The main thing we usually want to estimate is the prevalence of pfhrp2/3 deletions. This is simple to do using the `get_prevalence()` function. We pass this function two sets of values: 1) the number of pfhrp2/3 deletions observed in each site (numerator), and 2) the total sample size in each site (denominator):
```r
# define observed data
num_deletions <- c(3, 12, 4)
sample_size <- c(100, 130, 65)

# estimate prevalence
get_prevalence(n = num_deletions,
               N = sample_size)
#>    MAP CrI_lower CrI_upper prob_above_threshold
#> 1 6.81       2.3     19.52               0.8791
```
We obtain a point estimate (the maximum a posteriori, or MAP, value) of 6.81% prevalence, with a 95% credible interval (CrI) from 2.3% to 19.52%. When presenting our estimates we should always report the full CrI and not just the central estimate of 6.81%, as the point estimate on its own may give a misleading impression of how confident we are in this value.
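As a minimal sketch of reporting the estimate together with its full CrI, the base R snippet below turns the printed values into a single report-ready string. The data frame `prev_est` and the helper `report_prevalence()` are hypothetical: the numbers are copied by hand from the output above, and storing the result of `get_prevalence()` directly assumes it returns a data frame with the column names shown.

```r
# Values copied by hand from the get_prevalence() output printed above.
# In practice you might store the result directly, e.g.
#   prev_est <- get_prevalence(n = num_deletions, N = sample_size)
# assuming it returns a data frame with these column names.
prev_est <- data.frame(MAP = 6.81, CrI_lower = 2.3, CrI_upper = 19.52)

# Hypothetical helper: build a string that always includes the full CrI
report_prevalence <- function(est, level = 95L) {
  sprintf("%.2f%% (%d%% CrI: %.2f%% to %.2f%%)",
          est$MAP, level, est$CrI_lower, est$CrI_upper)
}

report_prevalence(prev_est)
#> [1] "6.81% (95% CrI: 2.30% to 19.52%)"
```

Keeping the interval inside the same string as the point estimate makes it harder for the CrI to be dropped when results are copied into tables or reports.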
The second thing we may want to do is to establish whether the prevalence is above the 5% threshold at the domain level. The probability of being above this threshold is given in the `prob_above_threshold` output above, in this case 0.8791.
Before conducting this analysis, we should have decided what level of confidence we need in order to accept this hypothesis; we advise using 0.95 by default. In this case, 0.8791 is below 0.95, so we do not have sufficient evidence to conclude that prevalence is above 5% at the domain level.
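The decision rule itself is a simple comparison against a confidence level fixed before looking at the data. The sketch below writes it as a small R function; `accept_above_threshold()` is a hypothetical helper for illustration and is not part of DRpower.

```r
# Hypothetical helper, not part of DRpower: accept the hypothesis that
# prevalence is above the threshold only if the posterior probability of
# exceeding it is greater than a confidence level chosen in advance.
accept_above_threshold <- function(prob_above_threshold, confidence = 0.95) {
  prob_above_threshold > confidence
}

# prob_above_threshold taken from the get_prevalence() output above
accept_above_threshold(0.8791)
#> [1] FALSE
```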
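Note that it is possible for the CrI to span the 5% threshold, but for `prob_above_threshold` to still be greater than 0.95. This is because the CrI is two-sided, whereas the hypothesis test is one-sided: a two-sided 95% interval leaves posterior mass in both tails (2.5% in each tail for an equal-tailed interval), so its lower bound can fall below 5% even when more than 95% of the posterior lies above 5%.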
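To see this concretely, the base R sketch below uses a made-up Beta(16, 184) distribution as a stand-in posterior for domain-level prevalence. This distribution is an assumption chosen for illustration, not the posterior that DRpower computes, and the interval is an equal-tailed one from `qbeta()`, which may differ from the way DRpower constructs its CrI.

```r
# Hypothetical posterior for domain-level prevalence: Beta(16, 184).
# This is a stand-in chosen for illustration, not a DRpower posterior.
shape1 <- 16
shape2 <- 184
threshold <- 0.05

# Two-sided 95% equal-tailed credible interval (2.5% in each tail)
CrI <- qbeta(c(0.025, 0.975), shape1, shape2)

# One-sided posterior probability that prevalence exceeds the threshold
prob_above <- pbeta(threshold, shape1, shape2, lower.tail = FALSE)

CrI[1] < threshold  # the interval spans 5%...
#> [1] TRUE
prob_above > 0.95   # ...yet the one-sided test is still passed
#> [1] TRUE
```

Here roughly 4% of the posterior mass lies below 5%: more than the 2.5% cut off by the lower end of the interval, but less than the 5% that would make the one-sided test fail.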
The prevalence estimates above already take into account uncertainty in the intra-cluster correlation (ICC). That said, it can be useful to present our estimate of the ICC to help contextualise results and to guide future studies. This can be done using the `get_ICC()` function, which takes the same two inputs:
```r
# estimate ICC
get_ICC(n = num_deletions,
        N = sample_size)
#>      MAP CrI_lower CrI_upper
#> 1 0.0074         0    0.1912
```
Our point estimate (MAP) of the ICC is 0.0074, with a CrI from 0 to 0.1912. This point estimate is fairly low, although the CrI extends up to 0.1912, so we should take this information into account when we conduct follow-up studies or studies in nearby regions.
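One common way to see what an ICC value means for study planning is the textbook design effect for cluster sampling with equal cluster sizes, $\text{Deff} = 1 + (m - 1)\,\text{ICC}$, where $m$ is the number of samples per site. The sketch below evaluates it at the MAP and at the upper CrI bound from the output above. The helper `design_effect()` and the choice of 100 samples per site are hypothetical, and this approximation is not necessarily how DRpower itself uses the ICC.

```r
# Textbook design effect for cluster sampling with equal cluster sizes:
#   Deff = 1 + (m - 1) * ICC, where m is the number of samples per site.
# Hypothetical helper for illustration; not part of DRpower.
design_effect <- function(m, ICC) {
  1 + (m - 1) * ICC
}

# Hypothetical follow-up study with 100 samples per site, evaluated at the
# MAP and the upper CrI bound of the ICC from get_ICC() above
deff <- design_effect(m = 100, ICC = c(MAP = 0.0074, CrI_upper = 0.1912))
deff  # 1.7326 at the MAP, 19.9288 at the upper bound
```

Dividing the per-site sample size by the design effect gives a rough effective sample size: about 58 independent samples per site at the MAP, but only about 5 at the upper end of the CrI, which is why the full interval matters when planning future studies.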