7. Power and sample size under the z-test

In the previous section, we found that mixing CI estimation with hypothesis testing resulted in low power. Here, we start from scratch with a purely hypothesis-testing approach. Our null hypothesis will be that the prevalence of deletions equals a known value (the threshold), and we will try to reject this hypothesis.

The standard test here is the one-sample z-test, also called the one-sample proportion test. We start by calculating the \(Z\) test statistic as follows:

\(Z = \frac{\displaystyle |\hat{p} - \mu|}{\displaystyle \sqrt{\frac{\hat{p}(1-\hat{p})}{N}D_{eff}}}\)

Under the null hypothesis, this statistic approximately follows the z-distribution (i.e. the standard normal distribution). We can control how often we come to a false-positive conclusion by setting a significance level \(\alpha\). Because the numerator takes the absolute difference, this is a two-sided test: we reject the null hypothesis whenever our \(Z\) statistic exceeds the critical value \(z_{\tiny 1 - \alpha/2}\).
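To make this concrete, here is a minimal sketch of the test in Python. The observed counts (48 deletions out of 600 samples) and the design effect of 1.5 are hypothetical placeholders:

```python
import numpy as np
from scipy.stats import norm

def z_test(k, N, mu, D_eff=1.0, alpha=0.05):
    """Test H0: prevalence equals mu, given k deletions in N samples."""
    p_hat = k / N
    Z = abs(p_hat - mu) / np.sqrt(p_hat * (1 - p_hat) / N * D_eff)
    z_crit = norm.ppf(1 - alpha / 2)
    return Z, Z > z_crit  # True means we reject H0

Z, reject = z_test(k=48, N=600, mu=0.05, D_eff=1.5)
print(f"Z = {Z:.3f}, reject H0: {reject}")
```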

We can calculate power under the z-test as the probability that the test statistic exceeds this critical value when the true prevalence is \(p\). This is given by the following formula:

\(\text{Power} = \Phi\Bigg( \frac{|p - \mu|}{\sqrt{\frac{p(1-p)}{N}D_{eff} }} - z_{\tiny 1 - \alpha/2} \Bigg)\)

where \(\Phi()\) is the cumulative distribution function of the standard normal distribution. What would happen if we used the value of \(N\) taken from the mistaken approach described in the 2020 Master Protocol? We can explore this very neatly by substituting the value \(N = z_{\tiny 1 - \alpha/2}^2 \frac{\displaystyle p(1-p)}{\displaystyle (p-\mu)^2}D_{eff}\) from the Master Protocol into the formula above. Interestingly, the first term inside the brackets reduces to exactly \(z_{\tiny 1 - \alpha/2}\), which cancels with the critical value! We are left with \(\text{Power} = \Phi(0)\), which is equal to 0.5. This is essentially a mathematical proof of what we described in the previous section - that power is forced to equal 50% when using the formula from the Master Protocol.
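We can also confirm this numerically. In the snippet below the inputs are assumptions (a true prevalence of 3.2% against a 5% threshold, with \(D_{eff} = 1.5\)), chosen because they reproduce the 551 figure from the previous section; the power comes out at exactly 0.5 whatever inputs we pick:

```python
import numpy as np
from scipy.stats import norm

p, mu, alpha, D_eff = 0.032, 0.05, 0.05, 1.5
z_crit = norm.ppf(1 - alpha / 2)

# sample size implied by the Master Protocol (CI-based) formula
N_mp = z_crit**2 * p * (1 - p) / (p - mu)**2 * D_eff
print(int(np.ceil(N_mp)))  # 551

# power of the z-test at exactly this sample size
power = norm.cdf(abs(p - mu) / np.sqrt(p * (1 - p) / N_mp * D_eff) - z_crit)
print(power)  # 0.5
```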

Hopefully by this point I have convinced you of the issues with the CI-based approach. So, what should be the correct sample size formula? We can find this by rearranging the power formula above to give the smallest \(N\) that achieves any given power. We will use the symbol \(\beta\) to represent the false-negative rate, meaning \(\text{Power} = 1 - \beta\). We will also define \(z_{\tiny 1 - \beta}\) to be the quantile of the standard normal distribution at probability \(1 - \beta\). This results in the following formula:

\(N \geq (z_{\tiny 1 - \beta} + z_{\tiny 1 - \alpha/2})^2 \frac{\displaystyle p(1-p)}{\displaystyle (p-\mu)^2}D_{eff}\)
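As a quick sketch in Python, reusing the same assumed inputs as above and targeting 80% power (\(\beta = 0.2\)):

```python
from math import ceil
from scipy.stats import norm

p, mu, alpha, beta, D_eff = 0.032, 0.05, 0.05, 0.20, 1.5
z_alpha = norm.ppf(1 - alpha / 2)  # critical value for the two-sided test
z_beta = norm.ppf(1 - beta)        # quantile at the target power

N = (z_beta + z_alpha)**2 * p * (1 - p) / (p - mu)**2 * D_eff
print(ceil(N))  # 1126
```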

It is interesting to see how similar this corrected formula is to the one in the Master Protocol - the only difference is the extra \(z_{\tiny 1 - \beta}\) term. For any target power above 50% this term is positive, meaning the Master Protocol formula will always underestimate the required sample size.

But just how wrong was the original formula? Dividing one formula by the other, we find that z-test sample sizes exceed Master Protocol sample sizes by a factor of \((1 + z_{\tiny 1 - \beta}/z_{\tiny 1 - \alpha/2})^2\). If we are aiming for a power of 80% with \(\alpha = 0.05\) then this factor equals 2.043. So, the actual sample size required is almost exactly double what we would get from the Master Protocol formula.
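A one-line check of this factor, again assuming \(\alpha = 0.05\):

```python
from scipy.stats import norm

z_alpha = norm.ppf(0.975)  # ~1.960
z_beta = norm.ppf(0.80)    # ~0.842
print((1 + z_beta / z_alpha)**2)  # ~2.043
```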

To put this in context, if previously we found that 551 samples are needed, then with this new (corrected) formula we find that 1126 samples are needed. This is substantially higher even than the updated recommendation of 600, and is likely to be beyond what many malaria control programmes can achieve within their logistic and financial constraints. To overcome this, we need to turn to more powerful statistical approaches.