Margin of error calculations using the Bayesian DRpower model when estimating prevalence from a clustered survey

As well as comparing against a threshold, the function get_prevalence() can be used to estimate a Bayesian credible interval (CrI) on the prevalence. get_margin_Bayesian() returns the margin of error (MOE) we can expect from this method, expressed as the expected lower and upper limits of the CrI.

get_margin_Bayesian(
  N,
  prevalence = 0.2,
  ICC = 0.05,
  alpha = 0.05,
  prior_prev_shape1 = 1,
  prior_prev_shape2 = 1,
  prior_ICC_shape1 = 1,
  prior_ICC_shape2 = 9,
  CrI_type = "HDI",
  n_intervals = 20,
  round_digits = 2,
  reps = 100,
  use_cpp = TRUE,
  return_full = FALSE,
  silent = FALSE
)

Arguments

N: vector of the number of samples obtained from each cluster.
prevalence: assumed true prevalence of pfhrp2/3 deletions as a proportion between 0 and 1.
ICC: assumed true intra-cluster correlation (ICC) as a value between 0 and 1.
alpha: the significance level of the credible interval - for example, use alpha = 0.05 for a 95% interval. See also CrI_type argument for how this is calculated.
prior_prev_shape1, prior_prev_shape2, prior_ICC_shape1, prior_ICC_shape2: parameters that dictate the shape of the Beta priors on prevalence and the ICC. See the Wikipedia page on the Beta distribution for more detail. The default values of these parameters were chosen based on an analysis of historical pfhrp2/3 studies, although this does not guarantee that they will be suitable in all settings.
CrI_type: which method to use when computing credible intervals. Options are "ETI" (equal-tailed interval) or "HDI" (highest-density interval). The ETI excludes a posterior probability of alpha/2 from each tail of the distribution. The HDI is the narrowest interval that contains a proportion 1-alpha of the distribution. The HDI is used by default because it guarantees that the maximum a posteriori (MAP) estimate lies within the credible interval, which is not always the case for the ETI.
n_intervals: the number of intervals used in the adaptive quadrature method. Increasing this value gives a more accurate representation of the true posterior, but comes at the cost of reduced speed.
round_digits: the number of digits after the decimal point that are used when reporting estimates. This is to simplify results and to avoid giving the false impression of extreme precision.
reps: number of times to repeat simulation per parameter combination.
use_cpp: if TRUE (the default) then use an Rcpp implementation of the adaptive quadrature approach that is much faster than the base R method.
return_full: if TRUE then return the complete distribution of lower and upper CrI limits in a data.frame. If FALSE (the default) return a summary including the mean and 95% CI of these limits.
silent: if TRUE then suppress all console output.
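
The difference between the two CrI_type options can be illustrated with a small base R sketch. It does not use the package's own posterior: the Beta(4, 18) distribution below is an arbitrary stand-in chosen only because it is skewed, and the variable names are hypothetical.

```r
# Arbitrary skewed stand-in for a posterior on prevalence (an assumption,
# not the posterior computed by the package)
shape1 <- 4
shape2 <- 18
alpha <- 0.05

# ETI: exclude a probability of alpha/2 from each tail
eti <- qbeta(c(alpha / 2, 1 - alpha / 2), shape1, shape2)

# HDI: among all intervals holding probability 1 - alpha, find the narrowest.
# Each candidate is indexed by the probability left in its lower tail.
interval_width <- function(lower_tail) {
  qbeta(lower_tail + 1 - alpha, shape1, shape2) - qbeta(lower_tail, shape1, shape2)
}
best_tail <- optimize(interval_width, interval = c(0, alpha))$minimum
hdi <- qbeta(c(best_tail, best_tail + 1 - alpha), shape1, shape2)

# Mode of the Beta distribution, playing the role of the MAP estimate
map <- (shape1 - 1) / (shape1 + shape2 - 2)

print(rbind(ETI = eti, HDI = hdi))
print(map)
```

For a right-skewed distribution like this one, the HDI sits closer to zero than the ETI and is no wider, and the mode falls inside it by construction.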

Value

If return_full = FALSE (the default), returns an estimate of the lower and upper CrI limits in the form of a data.frame. The first row gives the lower limit and the second row gives the upper limit, both as percentages. The first column gives the point estimate, and the subsequent columns give the 95% CI on this estimate. If return_full = TRUE, returns a complete data.frame of all lower and upper CrI realisations over simulations.

Details

Estimates MOE using the following approach:

1. Simulate data via the function rbbinom_reparam() using known values, e.g. a known "true" prevalence and intra-cluster correlation.
2. Analyse data using get_prevalence(). Determine the upper and lower limits of the credible interval.
3. Repeat steps 1-2 reps times to obtain the distribution of upper and lower limits. Return the mean of this distribution along with upper and lower 95% CIs. To be completely clear, we are producing a 95% CI on the limits of a CrI, which can be confusing! See Value for a clear explanation of how to interpret the output.
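
The three steps can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the package's implementation: sim_counts() and grid_CrI() are hypothetical stand-ins for rbbinom_reparam() and get_prevalence(), the posterior is approximated on a coarse grid rather than by adaptive quadrature, the interval is an ETI rather than the default HDI, and the 95% CI on each limit is assumed to be a normal approximation around the mean.

```r
set.seed(1)

# Assumed "true" values (the same as in the example call on this page)
N <- c(120, 90, 150)
prevalence <- 0.15
ICC <- 0.01
reps <- 50

# Step 1: draw one beta-binomial count per cluster, parametrised by
# prevalence p and intra-cluster correlation rho
sim_counts <- function(N, p, rho) {
  shape1 <- p * (1 / rho - 1)
  shape2 <- (1 - p) * (1 / rho - 1)
  rbinom(length(N), size = N, prob = rbeta(length(N), shape1, shape2))
}

# Step 2: grid approximation to the marginal posterior of prevalence,
# followed by an equal-tailed CrI
grid_CrI <- function(n, N, alpha = 0.05, n_grid = 100) {
  mid <- (seq_len(n_grid) - 0.5) / n_grid
  g <- expand.grid(p = mid, rho = mid)
  shape1 <- g$p * (1 / g$rho - 1)
  shape2 <- (1 - g$p) * (1 / g$rho - 1)
  # Log-priors: Beta(1, 1) on prevalence and Beta(1, 9) on the ICC
  log_post <- dbeta(g$p, 1, 1, log = TRUE) + dbeta(g$rho, 1, 9, log = TRUE)
  # Beta-binomial log-likelihood, dropping terms constant in (p, rho)
  for (i in seq_along(N)) {
    log_post <- log_post +
      lbeta(n[i] + shape1, N[i] - n[i] + shape2) - lbeta(shape1, shape2)
  }
  # Rows index prevalence, columns index ICC; sum over columns to marginalise
  post <- matrix(exp(log_post - max(log_post)), n_grid, n_grid)
  cdf <- cumsum(rowSums(post)) / sum(post)
  c(lower = mid[which(cdf >= alpha / 2)[1]],
    upper = mid[which(cdf >= 1 - alpha / 2)[1]])
}

# Step 3: repeat steps 1-2 and summarise each limit as a percentage
limits <- t(replicate(reps, grid_CrI(sim_counts(N, prevalence, ICC), N)))
summarise_limit <- function(x) {
  se <- sd(x) / sqrt(length(x))
  100 * c(estimate = mean(x),
          CI_2.5 = mean(x) - 1.96 * se,
          CI_97.5 = mean(x) + 1.96 * se)
}
moe_summary <- t(apply(limits, 2, summarise_limit))
print(round(moe_summary, 2))
```

Because the grid has a resolution of one percentage point, the individual limits are coarse. The sketch is meant to show the structure of the calculation, not to reproduce the package's numbers.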

Note that we have not implemented a function to return the sample size needed to achieve a given MOE under the Bayesian model, as this would require repeated simulation over different values of N which is computationally costly. The appropriate value can be established manually if needed by running get_margin_Bayesian() for different sample sizes.
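
One way to carry out this manual search is sketched below. The helper moe_by_cluster_size() is hypothetical and not part of the package. It assumes equal sample sizes in every cluster, reads the output by position as described under Value, and reports half the CrI width as a crude single-number summary (the CrI is generally not symmetric around the estimate). The call to get_margin_Bayesian() only runs if the DRpower package is installed.

```r
# Hypothetical helper: evaluate a margin function over several per-cluster
# sample sizes. margin_fun must accept N and return a data.frame whose first
# column holds the lower (row 1) and upper (row 2) limit estimates.
moe_by_cluster_size <- function(sizes, n_clust, margin_fun, ...) {
  rows <- lapply(sizes, function(s) {
    res <- margin_fun(N = rep(s, n_clust), ...)
    data.frame(per_cluster = s, lower = res[1, 1], upper = res[2, 1])
  })
  out <- do.call(rbind, rows)
  out$half_width <- (out$upper - out$lower) / 2
  out
}

# Usage with the real function, skipped when DRpower is not available
if (requireNamespace("DRpower", quietly = TRUE)) {
  print(moe_by_cluster_size(
    sizes = c(50, 100, 200),
    n_clust = 3,
    margin_fun = DRpower::get_margin_Bayesian,
    prevalence = 0.15, ICC = 0.01, reps = 100, silent = TRUE
  ))
}
```

The smallest per-cluster size whose limits are acceptable can then be read off the resulting table. Because each row is a simulation estimate, increasing reps makes the comparison between neighbouring sizes more reliable.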

Examples

get_margin_Bayesian(N = c(120, 90, 150), prevalence = 0.15, ICC = 0.01, reps = 1e2)
#>       estimate    CI_2.5   CI_97.5
#> lower   7.8694  7.501811  8.236989
#> upper  28.9751 28.250169 29.700031