
A Bernoulli trial is an experiment in which the outcome can be one of two mutually exclusive results, e.g. true/false, success/failure, heads/tails and so on. A number of methods are available to compute confidence intervals after many such trials have been performed. The most common have been discussed and reviewed by Brown et al. (2001), and are presented below. Consider $n$ trials, with $X$ successes and a significance level of $\alpha=0.05$ to obtain a 95% confidence interval. For each of the methods, the interval is shown graphically for $1 \leqslant n \leqslant 100$ and $1 \leqslant X \leqslant n$ .

Wald

This is the most common method, discussed in many textbooks, and probably the most problematic for small samples. It is based on a normal approximation to the binomial distribution, and is often called the “Wald interval” because of its relationship with the Wald test. The interval is calculated as:

Lower bound: $L=p-k\sqrt{pq/n}$
Upper bound: $U=p+k\sqrt{pq/n}$

where $k = \Phi^{-1}\{1-\alpha/2\}$ , $\Phi^{-1}$ is the probit function, $p=X/n$ and $q=1-p$ .

Wald confidence interval.
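The Wald bounds above translate almost directly into code. The following is a minimal sketch using only the Python standard library; the function name `wald_interval` and its arguments are illustrative choices, not taken from the original text.

```python
import math
from statistics import NormalDist


def wald_interval(x, n, alpha=0.05):
    """Return (L, U) for x successes in n trials, using the Wald formula."""
    k = NormalDist().inv_cdf(1 - alpha / 2)  # probit of 1 - alpha/2
    p = x / n
    q = 1 - p
    half_width = k * math.sqrt(p * q / n)
    return p - half_width, p + half_width


print(wald_interval(50, 100))
print(wald_interval(0, 10))  # the interval collapses to a single point at 0
```

The second call hints at why the method is problematic for small samples: with $X=0$ (or $X=n$) the estimated variance is zero and the interval has no width.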

Wilson

This interval appeared in Wilson (1927) and is defined as:

Lower bound: $L = \tilde{p} - (k/\tilde{n})\sqrt{npq+k^2/4}$
Upper bound: $U = \tilde{p} + (k/\tilde{n})\sqrt{npq+k^2/4}$

where $\tilde{p} = \tilde{X}/\tilde{n}$ , $\tilde{n}=n+k^2$ , $\tilde{X} = X+ k^2/2$ , and the remaining quantities ( $k$ , $p$ and $q$ ) are as defined for the Wald interval. This interval is probably the most appropriate for the majority of situations.

Wilson confidence interval.
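As a sketch of the Wilson formula above, again with the standard library only (the name `wilson_interval` is an illustrative assumption):

```python
import math
from statistics import NormalDist


def wilson_interval(x, n, alpha=0.05):
    """Return (L, U) for x successes in n trials, using the Wilson formula."""
    k = NormalDist().inv_cdf(1 - alpha / 2)
    p = x / n
    q = 1 - p
    n_tilde = n + k**2
    p_tilde = (x + k**2 / 2) / n_tilde  # centre of the interval
    half_width = (k / n_tilde) * math.sqrt(n * p * q + k**2 / 4)
    return p_tilde - half_width, p_tilde + half_width


print(wilson_interval(0, 10))
```

Unlike the Wald interval, the result for $X=0$ still has a positive width, because the centre $\tilde{p}$ is pulled away from the boundary.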

Agresti-Coull

This interval appeared in Agresti and Coull (1998) and shares many features with the Wilson interval. It is defined as:

Lower bound: $L = \tilde{p} - k\sqrt{\tilde{p}\tilde{q}/\tilde{n}}$
Upper bound: $U = \tilde{p} + k\sqrt{\tilde{p}\tilde{q}/\tilde{n}}$

where $\tilde{q}=1-\tilde{p}$ , and $\tilde{p}$ , $\tilde{n}$ and $k$ are as defined for the Wilson interval.

Agresti-Coull confidence interval.
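The Agresti-Coull bounds reuse the Wilson centre and adjusted sample size, so a sketch differs from the Wilson one only in the half-width. The function name is illustrative, and the bounds are returned exactly as the formula gives them, without truncation to $[0, 1]$ (whether to truncate is a choice left to the reader).

```python
import math
from statistics import NormalDist


def agresti_coull_interval(x, n, alpha=0.05):
    """Return (L, U) for x successes in n trials, using the Agresti-Coull formula."""
    k = NormalDist().inv_cdf(1 - alpha / 2)
    n_tilde = n + k**2
    p_tilde = (x + k**2 / 2) / n_tilde
    q_tilde = 1 - p_tilde
    half_width = k * math.sqrt(p_tilde * q_tilde / n_tilde)
    # Not clipped: the lower bound can fall below 0 and the upper above 1.
    return p_tilde - half_width, p_tilde + half_width


print(agresti_coull_interval(0, 10))
```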

Jeffreys

This interval has a Bayesian motivation and uses the Jeffreys prior (Jeffreys, 1946). It seems to have been introduced by Brown et al. (2001). It is defined as:

Lower bound: $L = \mathcal{B}^{-1}\{\alpha/2,X+1/2,n-X+1/2\}$
Upper bound: $U = \mathcal{B}^{-1}\{1-\alpha/2,X+1/2,n-X+1/2\}$

where $\mathcal{B}^{-1}\{x,s_1,s_2\}$ is the inverse cdf (quantile function) of the beta distribution (not to be confused with the beta function), evaluated at the probability $x$ , with shape parameters $s_1$ and $s_2$ .

Jeffreys confidence interval.
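The Jeffreys bounds need the inverse cdf of a beta distribution, which the Python standard library does not provide. The sketch below is one way to work around that under stated assumptions: the substitution $p=\sin^2 t$ turns the beta density with shapes $X+1/2$ and $n-X+1/2$ into the smooth integrand $2\sin^{2X}t\,\cos^{2(n-X)}t$ , which is integrated with Simpson's rule and then inverted by bisection. All function names, the number of integration steps and the tolerance are illustrative choices; a statistics library with a beta quantile function would normally be used instead, and very large $n$ may underflow in this simple version.

```python
import math


def jeffreys_cdf(p, x, n, steps=2000):
    """CDF of Beta(x + 1/2, n - x + 1/2) at p, by Simpson's rule (steps must be even)."""
    upper = math.asin(math.sqrt(p))
    h = upper / steps

    def f(t):
        # Beta density after the change of variable p = sin(t)**2.
        return 2.0 * math.sin(t) ** (2 * x) * math.cos(t) ** (2 * (n - x))

    total = f(0.0) + f(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    integral = total * h / 3.0
    # log of the beta function B(x + 1/2, n - x + 1/2)
    log_beta = math.lgamma(x + 0.5) + math.lgamma(n - x + 0.5) - math.lgamma(n + 1.0)
    return integral / math.exp(log_beta)


def jeffreys_quantile(prob, x, n, tol=1e-10):
    """Invert jeffreys_cdf by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if jeffreys_cdf(mid, x, n) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def jeffreys_interval(x, n, alpha=0.05):
    """Return (L, U) for x successes in n trials, using the Jeffreys formula."""
    return (jeffreys_quantile(alpha / 2, x, n),
            jeffreys_quantile(1 - alpha / 2, x, n))


print(jeffreys_interval(5, 10))
```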

Clopper-Pearson

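This interval was proposed by Clopper and Pearson (1934) and is based on inverting a binomial test rather than on approximations, which is why it is sometimes called “exact”. It is not exact in the common sense, though: its coverage is guaranteed to be at least $1-\alpha$ rather than equal to it, and the interval is generally considered overly conservative. It is defined as: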

Lower bound: $L = \mathcal{B}^{-1}\{\alpha/2,X,n-X+1\}$
Upper bound: $U = \mathcal{B}^{-1}\{1-\alpha/2,X+1,n-X\}$

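where $\mathcal{B}^{-1}\{x,s_1,s_2\}$ is the inverse cdf of the beta distribution as above. When $X=0$ the lower bound is taken as $0$ , and when $X=n$ the upper bound is taken as $1$ , since the beta distribution is not defined for a shape parameter equal to zero.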

Clopper-Pearson confidence interval.
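The beta quantiles in the Clopper-Pearson bounds are equivalent to solving binomial tail equations: $L$ is the value of the success probability for which observing $X$ or more successes has probability $\alpha/2$ , and $U$ is the value for which observing $X$ or fewer successes has probability $\alpha/2$ . The sketch below uses that equivalence with a plain bisection, so that only the standard library is needed. The function names and the tolerance are illustrative, and the direct summation is only suitable for moderate $n$ .

```python
import math


def binom_upper_tail(x, n, p):
    """P(at least x successes in n trials with success probability p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(x, n + 1))


def solve_increasing(f, target, tol=1e-12):
    """Find p in [0, 1] with f(p) = target, for f increasing in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson_interval(x, n, alpha=0.05):
    """Return (L, U) for x successes in n trials, using the Clopper-Pearson bounds."""
    # Lower bound: P(at least x successes) = alpha/2; taken as 0 when x = 0.
    lower = 0.0 if x == 0 else solve_increasing(
        lambda p: binom_upper_tail(x, n, p), alpha / 2)
    # Upper bound: P(at most x successes) = alpha/2, i.e.
    # P(at least x + 1 successes) = 1 - alpha/2; taken as 1 when x = n.
    upper = 1.0 if x == n else solve_increasing(
        lambda p: binom_upper_tail(x + 1, n, p), 1 - alpha / 2)
    return lower, upper


print(clopper_pearson_interval(0, 10))
```

For $X=0$ the upper bound has the closed form $1-(\alpha/2)^{1/n}$ , which gives a convenient check on the numerical solution.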

Arc-Sine

This interval is based on the arc-sine variance-stabilising transformation. The interval is defined as:

Lower bound: $L = \sin^2\{\arcsin\{\sqrt{a}\} - k/(2\sqrt{n})\}$
Upper bound: $U = \sin^2\{\arcsin\{\sqrt{a}\} + k/(2\sqrt{n})\}$

where $a=\frac{X+3/8}{n+3/4}$ replaces what otherwise would be $p$ (Anscombe, 1948).

Arc-sine confidence interval.
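A minimal sketch of the arc-sine bounds follows; the function name is illustrative. The angle is used exactly as the formula gives it, with no clamping to $[0, \pi/2]$ , so for $X$ very close to $0$ or $n$ the squared sine can fold back on itself; how to handle that case is left open here.

```python
import math
from statistics import NormalDist


def arcsine_interval(x, n, alpha=0.05):
    """Return (L, U) for x successes in n trials, using the arc-sine formula."""
    k = NormalDist().inv_cdf(1 - alpha / 2)
    a = (x + 3 / 8) / (n + 3 / 4)  # Anscombe's adjusted proportion
    centre = math.asin(math.sqrt(a))
    shift = k / (2 * math.sqrt(n))
    # The square applies to the sine, not to its argument.
    return math.sin(centre - shift) ** 2, math.sin(centre + shift) ** 2


print(arcsine_interval(50, 100))
```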

Logit

This interval is obtained by computing a Wald interval for the log-odds $\lambda = \ln\{\frac{X}{n-X}\}$ and transforming it back to the probability scale. It is defined as:

Lower bound: $L = e^{\lambda_L}/(1+e^{\lambda_L})$
Upper bound: $U = e^{\lambda_U}/(1+e^{\lambda_U})$

where $\lambda_L = \lambda - k\sqrt{\hat{V}}$ , $\lambda_U = \lambda + k\sqrt{\hat{V}}$ , and $\hat{V} = \frac{n}{X(n-X)}$ .

Logit confidence interval.
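The logit bounds can be sketched as below, with illustrative names. Because $\lambda$ and $\hat{V}$ are undefined when $X=0$ or $X=n$ , this sketch raises an error in those cases; that behaviour is a choice made here, not something prescribed by the text.

```python
import math
from statistics import NormalDist


def logit_interval(x, n, alpha=0.05):
    """Return (L, U) for x successes in n trials, using the logit formula."""
    if x <= 0 or x >= n:
        raise ValueError("the logit interval is undefined for x = 0 or x = n")
    k = NormalDist().inv_cdf(1 - alpha / 2)
    lam = math.log(x / (n - x))  # log-odds
    v_hat = n / (x * (n - x))    # estimated variance of the log-odds
    lam_lower = lam - k * math.sqrt(v_hat)
    lam_upper = lam + k * math.sqrt(v_hat)

    def expit(t):
        # Inverse of the logit transformation.
        return math.exp(t) / (1 + math.exp(t))

    return expit(lam_lower), expit(lam_upper)


print(logit_interval(50, 100))
```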

Anscombe

This interval was proposed by Anscombe (1956) and is a modification of the logit interval:

Lower bound: $L = e^{\lambda_L}/(1+e^{\lambda_L})$
Upper bound: $U = e^{\lambda_U}/(1+e^{\lambda_U})$

The difference is that $\lambda=\ln\{\frac{X+1/2}{n-X+1/2}\}$ and $\hat{V}=\frac{(n+1)(n+2)}{n(X+1)(n-X+1)}$ . The values for $\lambda_L$ and $\lambda_U$ are computed as for the logit interval, using these modified $\lambda$ and $\hat{V}$ .
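A sketch of the Anscombe modification, with an illustrative function name, shows that the added constants keep both $\lambda$ and $\hat{V}$ finite even when $X=0$ or $X=n$ :

```python
import math
from statistics import NormalDist


def anscombe_interval(x, n, alpha=0.05):
    """Return (L, U) for x successes in n trials, using Anscombe's logit formula."""
    k = NormalDist().inv_cdf(1 - alpha / 2)
    lam = math.log((x + 0.5) / (n - x + 0.5))               # adjusted log-odds
    v_hat = (n + 1) * (n + 2) / (n * (x + 1) * (n - x + 1))  # adjusted variance
    lam_lower = lam - k * math.sqrt(v_hat)
    lam_upper = lam + k * math.sqrt(v_hat)

    def expit(t):
        # Inverse of the logit transformation.
        return math.exp(t) / (1 + math.exp(t))

    return expit(lam_lower), expit(lam_upper)


print(anscombe_interval(0, 10))
```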