# Confidence intervals for Bernoulli trials

A Bernoulli trial is an experiment in which the outcome can be one of two mutually exclusive results, e.g. true/false, success/failure, heads/tails and so on. A number of methods are available to compute confidence intervals after many such trials have been performed. The most common have been discussed and reviewed by Brown et al. (2001), and are presented below. Consider $n$ trials, with $X$ successes and a significance level of $\alpha=0.05$ to obtain a 95% confidence interval. For each of the methods, the interval is shown graphically for $1 \leqslant n \leqslant 100$ and $1 \leqslant X \leqslant n$.

## Wald

This is the most common method, discussed in many textbooks, and probably the most problematic for small samples. It is based on a normal approximation to the binomial distribution, and is often called the “Wald interval” for its relationship with the Wald test. The interval is calculated as:

• Lower bound: $L=p-k\sqrt{pq/n}$
• Upper bound: $U=p+k\sqrt{pq/n}$

where $k = \Phi^{-1}\{1-\alpha/2\}$, $\Phi^{-1}$ is the probit function (the inverse cdf of the standard normal distribution), $p=X/n$ and $q=1-p$.
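As a minimal sketch, the Wald bounds can be transcribed directly into Python using only the standard library. The function name `wald_interval` and its signature are illustrative assumptions, not taken from the implementation referenced at the end of this article.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(x, n, alpha=0.05):
    """Wald interval for x successes in n trials (illustrative sketch)."""
    # k is the probit of 1 - alpha/2, about 1.96 for alpha = 0.05.
    k = NormalDist().inv_cdf(1 - alpha / 2)
    p = x / n
    q = 1 - p
    half_width = k * sqrt(p * q / n)
    return p - half_width, p + half_width


if __name__ == "__main__":
    print(wald_interval(50, 100))
    print(wald_interval(1, 10))    # lower bound falls below zero
    print(wald_interval(10, 10))   # interval collapses to a single point
```

The last two calls hint at why the method is problematic for small samples: the bounds are not constrained to $[0,1]$, and the interval has zero width when $X=0$ or $X=n$.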

## Wilson

This interval appeared in Wilson (1927) and is defined as:

• Lower bound: $L = \tilde{p} - (k/\tilde{n})\sqrt{npq+k^2/4}$
• Upper bound: $U = \tilde{p} + (k/\tilde{n})\sqrt{npq+k^2/4}$

where $\tilde{p} = \tilde{X}/\tilde{n}$, $\tilde{n}=n+k^2$, $\tilde{X} = X+ k^2/2$, and the remaining quantities are as above. This is probably the most appropriate interval for the majority of situations.
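The Wilson bounds can be sketched in the same way. The name `wilson_interval` is a hypothetical choice for illustration, and the code simply follows the definitions of $\tilde{p}$, $\tilde{n}$ and $\tilde{X}$ given above.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(x, n, alpha=0.05):
    """Wilson interval for x successes in n trials (illustrative sketch)."""
    k = NormalDist().inv_cdf(1 - alpha / 2)
    p = x / n
    q = 1 - p
    n_tilde = n + k ** 2
    x_tilde = x + k ** 2 / 2
    p_tilde = x_tilde / n_tilde
    half_width = (k / n_tilde) * sqrt(n * p * q + k ** 2 / 4)
    return p_tilde - half_width, p_tilde + half_width


if __name__ == "__main__":
    print(wilson_interval(50, 100))
    print(wilson_interval(0, 10))  # lower bound is zero up to rounding
```

Unlike the Wald interval, the centre is shifted towards $1/2$ and the interval keeps a nonzero width at $X=0$ and $X=n$.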

## Agresti-Coull

This interval appeared in Agresti and Coull (1998) and shares many features with the Wilson interval. It is defined as:

• Lower bound: $L = \tilde{p} - k\sqrt{\tilde{p}\tilde{q}/\tilde{n}}$
• Upper bound: $U = \tilde{p} + k\sqrt{\tilde{p}\tilde{q}/\tilde{n}}$

where $\tilde{q}=1-\tilde{p}$, and the remaining quantities are as defined for the Wilson interval.
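A possible transcription of the Agresti-Coull bounds is shown below. It is a sketch under the same assumptions as before: the function name `agresti_coull_interval` is hypothetical, and $\tilde{n}=n+k^2$ and $\tilde{X}=X+k^2/2$ are taken from the Wilson section.

```python
from math import sqrt
from statistics import NormalDist


def agresti_coull_interval(x, n, alpha=0.05):
    """Agresti-Coull interval for x successes in n trials (illustrative sketch)."""
    k = NormalDist().inv_cdf(1 - alpha / 2)
    n_tilde = n + k ** 2
    p_tilde = (x + k ** 2 / 2) / n_tilde
    q_tilde = 1 - p_tilde
    # Same form as the Wald interval, but applied to the adjusted counts.
    half_width = k * sqrt(p_tilde * q_tilde / n_tilde)
    return p_tilde - half_width, p_tilde + half_width


if __name__ == "__main__":
    print(agresti_coull_interval(5, 20))
```

The interval shares its centre $\tilde{p}$ with the Wilson interval; only the half-width differs.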

## Jeffreys

This interval has a Bayesian motivation and uses the Jeffreys prior (Jeffreys, 1946). It seems to have been introduced by Brown et al. (2001). It is defined as:

• Lower bound: $L = \mathcal{B}^{-1}\{\alpha/2,X+1/2,n-X+1/2\}$
• Upper bound: $U = \mathcal{B}^{-1}\{1-\alpha/2,X+1/2,n-X+1/2\}$

where $\mathcal{B}^{-1}\{x,s_1,s_2\}$ is the inverse cdf of the beta distribution (not to be confused with the beta function), at the quantile $x$, and with shape parameters $s_1$ and $s_2$.

## Clopper-Pearson

This interval was proposed by Clopper and Pearson (1934) and is based on inverting a binomial test rather than on approximations, which is why it is sometimes called “exact”. It is not “exact” in the common sense, however: its actual coverage is at least, rather than exactly, the nominal $1-\alpha$, and it is considered overly conservative.

• Lower bound: $L = \mathcal{B}^{-1}\{\alpha/2,X,n-X+1\}$
• Upper bound: $U = \mathcal{B}^{-1}\{1-\alpha/2,X+1,n-X\}$

where $\mathcal{B}^{-1}\{x,s_1,s_2\}$ is the inverse cdf of the beta distribution as above.
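The Python standard library has no inverse beta cdf, so the sketch below takes a different route to the same bounds. For integer shape parameters, the beta cdf with parameters $X$ and $n-X+1$ evaluated at $p$ equals the probability of at least $X$ successes in $n$ trials with success probability $p$, so each bound can be found by bisection on a binomial tail probability. The helper names (`binom_upper_tail`, `_solve_increasing`, `clopper_pearson_interval`) and the fixed iteration count are assumptions made for illustration. With SciPy available, `scipy.stats.beta.ppf` would be a more direct transcription of the formulas.

```python
from math import comb


def binom_upper_tail(x, n, p):
    """P(at least x successes in n trials with success probability p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(x, n + 1))


def _solve_increasing(f, target, iters=100):
    """Bisection on [0, 1] for f(p) = target, with f increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson_interval(x, n, alpha=0.05):
    """Clopper-Pearson interval via binomial tails (illustrative sketch)."""
    # Lower bound: beta quantile alpha/2 with shapes (x, n - x + 1); zero when x == 0.
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: binom_upper_tail(x, n, p), alpha / 2)
    # Upper bound: beta quantile 1 - alpha/2 with shapes (x + 1, n - x); one when x == n.
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: binom_upper_tail(x + 1, n, p), 1 - alpha / 2)
    return lower, upper


if __name__ == "__main__":
    print(clopper_pearson_interval(5, 10))
    print(clopper_pearson_interval(0, 10))
```

The special cases $X=0$ and $X=n$ are handled explicitly because one of the beta shape parameters would be zero there.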

## Arc-Sine

This interval is based on the arc-sine variance-stabilising transformation. The interval is defined as:

• Lower bound: $L = \sin^2\{\arcsin\{\sqrt{a}\} - k/(2\sqrt{n})\}$
• Upper bound: $U = \sin^2\{\arcsin\{\sqrt{a}\} + k/(2\sqrt{n})\}$

where $a=\frac{X+3/8}{n+3/4}$ replaces what otherwise would be $p$ (Anscombe, 1948).
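The arc-sine bounds can be sketched as follows, where the outer square applies to the value of the sine. The function name `arcsine_interval` is hypothetical, and the code is a literal transcription of the formulas with no extra safeguards.

```python
from math import asin, sin, sqrt
from statistics import NormalDist


def arcsine_interval(x, n, alpha=0.05):
    """Arc-sine interval for x successes in n trials (illustrative sketch)."""
    k = NormalDist().inv_cdf(1 - alpha / 2)
    # Anscombe's adjusted proportion, used in place of p = x / n.
    a = (x + 3 / 8) / (n + 3 / 4)
    centre = asin(sqrt(a))
    delta = k / (2 * sqrt(n))
    # No clamping of the angles: this follows the formulas literally.
    return sin(centre - delta) ** 2, sin(centre + delta) ** 2


if __name__ == "__main__":
    print(arcsine_interval(50, 100))
```

Because the angles are not clamped to $[0, \pi/2]$, the squared sine can fold back for $X$ near $0$ or $n$ with small $n$. A more careful implementation might want to handle that case explicitly.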

## Logit

This interval is based on the Wald interval for $\lambda = \ln\{\frac{X}{n-X}\}$. It is defined as:

• Lower bound: $L = e^{\lambda_L}/(1+e^{\lambda_L})$
• Upper bound: $U = e^{\lambda_U}/(1+e^{\lambda_U})$

where $\lambda_L = \lambda - k\sqrt{\hat{V}}$, $\lambda_U = \lambda + k\sqrt{\hat{V}}$, and $\hat{V} = \frac{n}{X(n-X)}$.
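A minimal sketch of the logit interval is given below. The name `logit_interval` is an illustrative assumption. Raising an error for $X=0$ or $X=n$ is a design choice of this sketch, made because $\lambda$ and $\hat{V}$ are undefined in those cases.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def logit_interval(x, n, alpha=0.05):
    """Logit interval for x successes in n trials (illustrative sketch)."""
    if x <= 0 or x >= n:
        # log(x / (n - x)) and n / (x * (n - x)) are undefined here.
        raise ValueError("logit interval requires 0 < x < n")
    k = NormalDist().inv_cdf(1 - alpha / 2)
    lam = log(x / (n - x))
    v_hat = n / (x * (n - x))
    lam_lower = lam - k * sqrt(v_hat)
    lam_upper = lam + k * sqrt(v_hat)
    # Map the bounds back from the logit scale to the probability scale.
    lower = exp(lam_lower) / (1 + exp(lam_lower))
    upper = exp(lam_upper) / (1 + exp(lam_upper))
    return lower, upper


if __name__ == "__main__":
    print(logit_interval(50, 100))
```

Since the back-transformation maps any real number into $(0,1)$, the bounds can never leave the unit interval.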

## Anscombe

This interval was proposed by Anscombe (1956) and is based on the logit interval:

• Lower bound: $L = e^{\lambda_L}/(1+e^{\lambda_L})$
• Upper bound: $U = e^{\lambda_U}/(1+e^{\lambda_U})$

The difference is that $\lambda=\ln\{\frac{X+1/2}{n-X+1/2}\}$ and $\hat{V}=\frac{(n+1)(n+2)}{n(X+1)(n-X+1)}$. The values for $\lambda_L$ and $\lambda_U$ are as above.
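The Anscombe variant changes only $\lambda$ and $\hat{V}$, which a short sketch makes concrete. As before, the function name `anscombe_interval` is hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def anscombe_interval(x, n, alpha=0.05):
    """Anscombe's logit-based interval (illustrative sketch)."""
    k = NormalDist().inv_cdf(1 - alpha / 2)
    # The added 1/2 keeps the log-odds finite when x == 0 or x == n.
    lam = log((x + 0.5) / (n - x + 0.5))
    v_hat = (n + 1) * (n + 2) / (n * (x + 1) * (n - x + 1))
    lam_lower = lam - k * sqrt(v_hat)
    lam_upper = lam + k * sqrt(v_hat)
    lower = exp(lam_lower) / (1 + exp(lam_lower))
    upper = exp(lam_upper) / (1 + exp(lam_upper))
    return lower, upper


if __name__ == "__main__":
    print(anscombe_interval(50, 100))
    print(anscombe_interval(0, 10))  # defined even with no successes
```

In contrast to the plain logit interval, this version remains defined at $X=0$ and $X=n$.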

## Octave/MATLAB implementation

A function that computes these intervals is available here: confint.m.