Takes raw counts of the number of positive samples per cluster (numerator) and the number of tested samples per cluster (denominator) and returns posterior estimates of the prevalence and intra-cluster correlation coefficient (ICC).
get_prevalence(
  n,
  N,
  alpha = 0.05,
  prev_thresh = 0.05,
  ICC = NULL,
  prior_prev_shape1 = 1,
  prior_prev_shape2 = 1,
  prior_ICC_shape1 = 1,
  prior_ICC_shape2 = 9,
  MAP_on = TRUE,
  post_mean_on = FALSE,
  post_median_on = FALSE,
  post_CrI_on = TRUE,
  post_thresh_on = TRUE,
  post_full_on = FALSE,
  post_full_breaks = seq(0, 1, l = 1001),
  CrI_type = "HDI",
  n_intervals = 20,
  round_digits = 2,
  use_cpp = TRUE,
  silent = FALSE
)
get_ICC(
  n,
  N,
  alpha = 0.05,
  prior_prev_shape1 = 1,
  prior_prev_shape2 = 1,
  prior_ICC_shape1 = 1,
  prior_ICC_shape2 = 9,
  MAP_on = TRUE,
  post_mean_on = FALSE,
  post_median_on = FALSE,
  post_CrI_on = TRUE,
  post_full_on = FALSE,
  post_full_breaks = seq(0, 1, l = 1001),
  CrI_type = "HDI",
  n_intervals = 20,
  round_digits = 4,
  use_cpp = TRUE
)
n, N: the numerator (n) and denominator (N) per cluster. These are both integer vectors.
alpha: the significance level of the credible interval. For example, use alpha = 0.05 for a 95% interval. See also the CrI_type argument for how this is calculated.
prev_thresh: the prevalence threshold that we are comparing against. Can be a vector, in which case the return object contains one value for each threshold.
ICC: normally this should be set to NULL (the default), in which case the ICC is estimated from the data. However, a fixed value can be entered here, in which case this overrides the use of the prior distribution as specified by prior_ICC_shape1 and prior_ICC_shape2.
prior_prev_shape1, prior_prev_shape2, prior_ICC_shape1, prior_ICC_shape2: parameters that dictate the shape of the Beta priors on prevalence and the ICC. See the Wikipedia page on the Beta distribution for more detail. The default values of these parameters were chosen based on an analysis of historical pfhrp2/3 studies, although this does not guarantee that they will be suitable in all settings.
MAP_on, post_mean_on, post_median_on, post_CrI_on, post_thresh_on, post_full_on: a series of boolean values specifying which outputs to produce. The options are:
MAP_on: if TRUE then return the maximum a posteriori (MAP) estimate.
post_mean_on: if TRUE then return the posterior mean.
post_median_on: if TRUE then return the posterior median.
post_CrI_on: if TRUE then return the posterior credible interval at significance level alpha. See the CrI_type argument for how this is calculated.
post_thresh_on: if TRUE then return the posterior probability of being above the threshold(s) specified by prev_thresh.
post_full_on: if TRUE then return the full posterior distribution, produced using the adaptive quadrature approach, at breaks specified by post_full_breaks.
post_full_breaks: a vector of breaks at which to evaluate the full posterior distribution (only used if post_full_on = TRUE). Defaults to 0.1% intervals from 0% to 100%.
CrI_type: which method to use when computing credible intervals. Options are "ETI" (equal-tailed interval) or "HDI" (highest-density interval). The ETI excludes a probability of alpha/2 from each end of the [0,1] interval. The HDI is the narrowest interval that contains a proportion 1-alpha of the distribution. The HDI method is used by default as it guarantees that the MAP estimate is within the credible interval, which is not always the case for the ETI.
n_intervals: the number of intervals used in the adaptive quadrature method. Increasing this value gives a more accurate representation of the true posterior, but comes at the cost of reduced speed.
round_digits: the number of digits after the decimal point that are used when reporting estimates. This is to simplify results and to avoid giving the false impression of extreme precision.
use_cpp: if TRUE (the default) then use an Rcpp implementation of the adaptive quadrature approach that is much faster than the base R method.
silent: if TRUE then suppress all console output.
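To make the prior and credible interval arguments more concrete, the following base R sketch looks at the default ICC prior, Beta(1, 9), on its own. It is an illustration only and does not call DRpower: the grid search used for the HDI and all variable names are assumptions made for this example, not the package's method.

```r
# Default ICC prior from the usage above: Beta(shape1 = 1, shape2 = 9).
shape1 <- 1
shape2 <- 9
alpha <- 0.05

# The mean of a Beta(a, b) distribution is a / (a + b).
prior_mean <- shape1 / (shape1 + shape2)

# ETI: cut a probability of alpha / 2 from each tail.
eti <- qbeta(c(alpha / 2, 1 - alpha / 2), shape1, shape2)

# HDI (illustrative grid search, an assumption of this sketch): among all
# intervals holding probability 1 - alpha, keep the narrowest one.
lower_tail <- seq(0, alpha, length.out = 501)
upper_tail <- pmin(lower_tail + 1 - alpha, 1)  # guard against rounding above 1
lower <- qbeta(lower_tail, shape1, shape2)
upper <- qbeta(upper_tail, shape1, shape2)
best <- which.min(upper - lower)
hdi <- c(lower[best], upper[best])

print(prior_mean)
print(eti)
print(hdi)
```

For this prior the mode is at 0, so the HDI starts at 0 while the ETI starts slightly above it. This is the situation described under CrI_type in which the ETI can exclude the MAP.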
There are two unknown quantities in the DRpower model: the prevalence and the intra-cluster correlation (ICC). These functions integrate over a prior on one quantity to arrive at the marginal posterior distribution of the other. Possible outputs include the maximum a posteriori (MAP) estimate, the posterior mean, the posterior median, the credible interval (CrI), the probability of being above a set threshold, and the full posterior distribution. For speed, distributions are approximated using an adaptive quadrature approach in which the full distribution is split into intervals and each interval is approximated using Simpson's rule. The number of intervals used in quadrature can be increased for more accurate results at the cost of speed.
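The trade-off between the number of intervals and accuracy can be sketched in base R. This is a simplified illustration, not the package's implementation: it uses equal-width intervals rather than an adaptive scheme, and the function simpson_composite and the example kernel are hypothetical names and values chosen for this sketch.

```r
# Apply Simpson's rule separately on each of n_intervals equal-width
# intervals and sum the results (non-adaptive, for illustration only).
simpson_composite <- function(f, lower, upper, n_intervals = 20) {
  edges <- seq(lower, upper, length.out = n_intervals + 1)
  left <- edges[-length(edges)]
  right <- edges[-1]
  mid <- (left + right) / 2
  sum((right - left) / 6 * (f(left) + 4 * f(mid) + f(right)))
}

# Hypothetical unnormalised density on [0, 1]: a Beta(4, 78) kernel,
# which is sharply peaked near zero.
kernel <- function(p) p^3 * (1 - p)^77

# The exact integral of this kernel is the Beta function B(4, 78).
exact <- beta(4, 78)

approx_20 <- simpson_composite(kernel, 0, 1, n_intervals = 20)
approx_200 <- simpson_composite(kernel, 0, 1, n_intervals = 200)

# Relative error for each number of intervals.
print(abs(approx_20 / exact - 1))
print(abs(approx_200 / exact - 1))
```

With a sharply peaked function, a small number of equal-width intervals places few evaluation points under the peak, so the relative error is noticeably larger with 20 intervals than with 200.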
# basic example of estimating prevalence and
# ICC from observed counts
sample_size <- c(80, 110, 120)
deletions <- c(3, 5, 6)
get_prevalence(n = deletions, N = sample_size)
#>    MAP CrI_lower CrI_upper prob_above_threshold
#> 1 4.96       1.7     15.72               0.6739
get_ICC(n = deletions, N = sample_size)
#>   MAP CrI_lower CrI_upper
#> 1   0         0    0.1642
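A further sketch shows how some of the optional arguments might be combined, using only arguments listed in the usage above. It assumes the DRpower package is installed and is skipped otherwise. The choice of thresholds is arbitrary, and no output is shown because it has not been run here.

```r
sample_size <- c(80, 110, 120)
deletions <- c(3, 5, 6)

# Only run if DRpower is available on this machine.
if (requireNamespace("DRpower", quietly = TRUE)) {
  res <- DRpower::get_prevalence(
    n = deletions,
    N = sample_size,
    prev_thresh = c(0.05, 0.10),  # compare against two thresholds
    post_mean_on = TRUE,          # also report the posterior mean
    CrI_type = "ETI"              # equal-tailed interval instead of HDI
  )
  print(res)
}
```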