Confidence intervals for Bernoulli trials

A Bernoulli trial is an experiment in which the outcome can be one of two mutually exclusive results, e.g. true/false, success/failure, heads/tails and so on. A number of methods are available to compute confidence intervals after many such trials have been performed. The most common have been discussed and reviewed by Brown et al. (2001), and are presented below. Consider n trials, with X successes and a significance level of \alpha=0.05 to obtain a 95% confidence interval. For each of the methods, the interval is shown graphically for 1 \leqslant n \leqslant 100 and 1 \leqslant X \leqslant n.

Wald

This is the most common method, discussed in many textbooks, and probably the most problematic for small samples. It is based on a normal approximation to the binomial distribution, and it is often called “Wald interval” for it’s relationship with the Wald test. The interval is calculated as:

  • Lower bound: L=p-k\sqrt{pq/n}
  • Upper bound: U=p+k\sqrt{pq/n}

where k = \Phi^{-1}\{1-\alpha/2\}, \Phi^{-1} is the probit function, p=X/n and q=1-p.

Wald confidence interval.

Wilson

This interval appeared in Wilson (1927) and is defined as:

  • Lower bound: L = \tilde{p} - (k/\tilde{n})\sqrt{npq+k^2/4}
  • Upper bound: U = \tilde{p} + (k/\tilde{n})\sqrt{npq+k^2/4}

where \tilde{p} = \tilde{X}/\tilde{n}, \tilde{n}=n+k^2, \tilde{X} = X+ k^2/2, and the remaining are as above. This is probably the most appropriate for the majority of situations.

Wilson confidence interval.

Agresti-Coull

This interval appeared in Agresti and Coull (1998) and shares many features with the Wilson interval. It is defined as:

  • Lower bound: L = \tilde{p} - k\sqrt{\tilde{p}\tilde{q}/\tilde{n}}
  • Upper bound: U = \tilde{p} + k\sqrt{\tilde{p}\tilde{q}/\tilde{n}}

where \tilde{q}=1-\tilde{p}, and the remaining are as above.

Agresti-Coull confidence interval.

Jeffreys

This interval has a Bayesian motivation and uses the Jeffreys prior (Jeffreys, 1946). It seems to have been introduced by Brown et al. (2001). It is defined as:

  • Lower bound: L = \mathcal{B}^{-1}\{\alpha/2,X+1/2,n-X+1/2\}
  • Upper bound: U = \mathcal{B}^{-1}\{1-\alpha/2,X+1/2,n-X+1/2\}

where \mathcal{B}^{-1}\{x,s_1,s_2\} is the inverse cdf of the beta distribution (not to be confused with the beta function), at the quantile x, and with shape parameters s_1 and s_2.

Jeffreys confidence interval.

Clopper-Pearson

This interval was proposed by Clopper and Pearson (1934) and is based on a binomial test, rather than on approximations, hence sometimes being called “exact”, although it is not “exact” in the common sense. It is considered overly conservative.

  • Lower bound: L = \mathcal{B}^{-1}\{\alpha/2,X,n-X+1\}
  • Upper bound: U = \mathcal{B}^{-1}\{1-\alpha/2,X+1,n-X\}

where \mathcal{B}^{-1}\{x,s_1,s_2\} is the inverse cdf of the beta distribution as above.

Clopper-Pearson confidence interval.

Arc-Sine

This interval is based on the arc-sine variance-stabilising transformation. The interval is defined as:

  • Lower bound: L = \sin\{\arcsin\{\sqrt{a}\} - k/(2\sqrt{n})\}^2
  • Upper bound: U = \sin\{\arcsin\{\sqrt{a}\} + k/(2\sqrt{n})\}^2

where a=\frac{X+3/8}{n+3/4} replaces what otherwise would be p (Anscombe, 1948).

Arc-sine confidence interval.

Logit

This interval is based on the Wald interval for \lambda = \ln\{\frac{X}{n-X}\}. It is defined as:

  • Lower bound: L = e^{\lambda_L}/(1+e^{\lambda_L})
  • Upper bound: U = e^{\lambda_U}/(1+e^{\lambda_U})

where \lambda_L = \lambda - k\sqrt{\hat{V}}, \lambda_U = \lambda + k\sqrt{\hat{V}}, and \hat{V} = \frac{n}{X(n-X)}.

Logit confidence interval.

Anscombe

This interval was proposed by Anscombe (1956) and is based on the logit interval:

  • Lower bound: L = e^{\lambda_L}/(1+e^{\lambda_L})
  • Upper bound: U = e^{\lambda_U}/(1+e^{\lambda_U})

The difference is that \lambda=\ln\{\frac{X+1/2}{n-X+1/2}\} and \hat{V}=\frac{(n+1)(n+2)}{n(X+1)(n-X+1)}. The values for \lambda_L and \lambda_U are as above.

Anscombe’s confidence interval.

Octave/MATLAB implementation

A function that computes these intervals is available here: confint.m.

References

A script to create platonic polyhedra, with recursive subdivision

Suppose you need a tri-dimensional file that contains a platonic polyhedron, say, an octahedron or an icosahedron, stored as a mesh. This file should, however, meet certain criteria: be in an useful format that can be imported into modelling applications, such as Blender, and be easily manipulable using scripts in languages suited for numerical computations, such as Octave or R and perhaps even be manipulated directly from the shell, using, e.g., gawk.

To meet these demands, below is a tool that creates any of the five platonic polyhedra in Wavefront Object format, with the *.obj extension (in ASCII). The user can specify the desired edge length, the area of the face, the total area, the total volume, the radius of the circumscribed sphere, or the radius of the inscribed sphere.

The tool also allows recursive subdivision of the faces of the polyhedra that are made of triangular faces (tetrahedron, octahedron and icosahedron). The subdivision implemented allows geodesic spheres of Class I (Kenner, 1976) to be produced easily. It is also possible to apply affine transformations to the coordinates of the vertices, so that the resulting mesh can be scaled, translated, rotated and sheared in any direction. Finally, it is possible to randomly perturb the vertex positions, so that the mesh becomes irregular, yet preserving the original topology.

In the case of recursive subdivision, it is important to note that not all edges have the same length, and not all faces have the same area. See the figure below for an illustration:

(a) A geodesic sphere can be produced from recursive subdivision of a regular icosahedron. At each iteration, the number of faces is quadruplied. (b) After the first iteration the faces no longer have regular sizes, with the largest face being approximately 1.3 times larger than the smallest as n increases.

The formulas to calculate the number of vertices, edges and faces are:

  • Number of vertices: V=4^n(V_0-2)+2;
  • Number of edges: E=4^nE_0;
  • Number of faces: F=4^nF_0;

where F_0, V_0 and E_0 are, respectively, the number of faces, vertices and edges of the polyhedron with triangular faces used for the initial subdivision:

  • Tetrahedron: V_0=4, E_0=6 and F_0=4;
  • Octahedron: V_0=6, E_0=12 and F_0=8;
  • Icosahedron: V_0=12, E_0=30 and F_0=20.

The figure above and the formulas have been discussed by Winkler et al., 2012.

Main file

The links to the script are below. Either of these will work; choose according to your needs:

  • platonic – This is the Octave script. Simply download it, change the first line to point to your correct Octave location, make it executable, and run it directly from the shell. Call it without arguments to obtain usage information.
  • platonic.m – This is the same as above, but can be executed from within Octave or MATLAB. Type ‘help platonic’ to obtain usage information.

Requirements

  • subdivtri.m – Add this to your Octave or MATLAB path.

Examples

To create an icosahedron with radius 100, recursively subdivided 7 times, if from the shell, use:

platonic ico7.obj ico sph 7 100

or, if from inside Octave/MATLAB, use:

platonic('ico7.obj','ico','sph',7,100);

The outputs from platonic are always in Wavefront (.obj) format. To use this icosahedron, e.g., with areal interpolation, it needs to be converted to FreeSurfer ascii format:

obj2srf ico7.obj > ico7.srf

The obj2srf command is available in the package for areal analysis, here.

References