Analysing data

Here, we outline the main steps in analysing data using the DRpower Bayesian model. Although we focus on the pfhrp2/3 use-case here, the same steps can be used to analyse the prevalence of drug resistance markers.

1. Estimate prevalence

The main quantity we usually want to estimate is the prevalence of pfhrp2/3 deletions. This is done with the get_prevalence() function, which takes two sets of values: 1) the number of pfhrp2/3 deletions observed at each site (numerator), and 2) the total sample size at each site (denominator):

# define observed data
num_deletions <- c(3, 12, 4)
sample_size <- c(100, 130, 65)

# estimate prevalence
get_prevalence(n = num_deletions,
               N = sample_size)
#>    MAP CrI_lower CrI_upper prob_above_threshold
#> 1 6.81       2.3     19.52               0.8791

We obtain a point estimate (the maximum a posteriori, or MAP, value) of 6.81% prevalence, with a 95% credible interval (CrI) of 2.3% to 19.52%. When presenting our estimates we should always report the full credible interval and not just the central estimate of 6.81%, as the point estimate on its own may give a misleading impression of how confident we are in this value.
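As a quick sanity check on the inputs, the raw per-site and pooled proportions can be computed in base R. This is only a sketch for orientation and is not how get_prevalence() arrives at its estimate: a naive pooled proportion treats all samples as one simple random sample and ignores intra-cluster correlation. The names site_prevalence and pooled_prevalence are illustrative and not part of DRpower.

```r
# observed data (same values as above)
num_deletions <- c(3, 12, 4)
sample_size <- c(100, 130, 65)

# raw prevalence at each site, as a percentage
site_prevalence <- 100 * num_deletions / sample_size

# naive pooled prevalence, ignoring any clustering by site
pooled_prevalence <- 100 * sum(num_deletions) / sum(sample_size)

print(round(site_prevalence, 2))
print(round(pooled_prevalence, 2))
```

The per-site values range from 3% to roughly 9.2%. This between-site variation is part of why the credible interval from the model is wide.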

2. Compare prevalence against a threshold

The second thing we may want to do is establish whether the prevalence is above the 5% threshold at the domain level. The probability of being above this threshold is given in the prob_above_threshold output above, in this case 0.8791. Before conducting this analysis, we should have decided what level of confidence we need in order to accept this hypothesis; we advise using 0.95 by default. In this case, 0.8791 is below 0.95, so we do not have sufficient evidence to conclude that prevalence is above 5% at the domain level.
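The decision rule described above can be written out explicitly. The following is a minimal sketch in base R using the value reported above. The variable names confidence_level and conclude_above are illustrative and not part of DRpower.

```r
# posterior probability of exceeding the threshold, as reported above
prob_above_threshold <- 0.8791

# confidence level fixed before the analysis (0.95 is the suggested default)
confidence_level <- 0.95

# conclude that prevalence is above the threshold only if the posterior
# probability exceeds the pre-specified confidence level
conclude_above <- prob_above_threshold > confidence_level
print(conclude_above)
#> [1] FALSE
```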

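Note that it is possible for the CrI to span the 5% threshold, but for the prob_above_threshold to still be greater than 0.95. This is because the CrI is two-sided, excluding posterior probability in both tails, whereas the hypothesis test is one-sided and considers only the probability that lies above the threshold.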
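The note above can be illustrated with a toy posterior. The sketch below assumes a Beta(3, 13) distribution as a stand-in for the posterior on prevalence and uses an equal-tailed 95% interval. Both are assumptions chosen for illustration and do not reproduce the DRpower model or its interval type.

```r
# stand-in posterior for prevalence: Beta(3, 13) (an assumption for illustration)
shape1 <- 3
shape2 <- 13
threshold <- 0.05

# two-sided 95% interval: 2.5% of posterior mass is cut from each tail
cri <- qbeta(c(0.025, 0.975), shape1, shape2)

# one-sided test: all posterior mass above the threshold counts
prob_above <- pbeta(threshold, shape1, shape2, lower.tail = FALSE)

# the interval contains the threshold...
spans_threshold <- cri[1] < threshold && threshold < cri[2]
# ...yet the probability of exceeding it is still above 0.95
passes_test <- prob_above > 0.95

print(round(cri, 4))
print(round(prob_above, 4))
print(c(spans_threshold = spans_threshold, passes_test = passes_test))
```

Here about 3.6% of the posterior mass lies below 5%. That is more than the 2.5% excluded from the lower tail of the interval, so the interval reaches below 5%. It is still less than the 5% that the one-sided test tolerates, so the test passes at the 0.95 level.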

3. Estimate the ICC

The prevalence estimates above have already taken into account uncertainty in the intra-cluster correlation (ICC). That being said, it can be useful to present our estimate of the ICC to help contextualise results, and to guide future studies. This can be achieved through the get_ICC() function, which takes the same two inputs:

# estimate ICC
get_ICC(n = num_deletions,
        N = sample_size)
#>      MAP CrI_lower CrI_upper
#> 1 0.0074         0    0.1912

We estimate that the ICC is around 0.0074, with a credible interval of [0, 0.1912]. This is a fairly low value of the ICC, and we should take this information into account when we conduct follow-up studies or studies in nearby regions.
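One way to see why the ICC matters for future studies is the classical design effect for equal cluster sizes, 1 + (m - 1) * ICC, where m is the number of samples per site. The sketch below applies this textbook formula to the estimates above. It is not necessarily how DRpower performs its own power calculations, and samples_per_site is a hypothetical planning value rather than a value from the data.

```r
# ICC point estimate and upper credible limit, as reported above
icc_map <- 0.0074
icc_upper <- 0.1912

# hypothetical planned number of samples per site in a follow-up study
samples_per_site <- 100

# classical design effect for equal cluster sizes: 1 + (m - 1) * ICC
design_effect <- function(icc, m) {
  1 + (m - 1) * icc
}

print(design_effect(icc_map, samples_per_site))
print(design_effect(icc_upper, samples_per_site))
```

With 100 samples per site, the point estimate gives a design effect of about 1.7, while the upper credible limit gives about 19.9. Even a small ICC can noticeably inflate the required sample size, and the uncertainty in the ICC carries over into study planning.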

Bob Verity

Last updated: 08 Nov 2024
