# Inverse normal transformation in SOLAR

At the user's discretion, SOLAR can apply a rank-based inverse-normal transformation to the data with the command `inormal`. This is the transformation suggested by Van der Waerden (1952), given by: $\tilde y_i = \Phi^{-1}\left\{\dfrac{r_i}{n+1}\right\}$

where $\tilde y_i$ is the transformed value for observation $i$, $\Phi^{-1}\left\{\cdot\right\}$ is the probit function, and $r_i$ is the ordinary rank of the $i$-th case among $n$ observations.
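As a small worked example (the four data values are made up for illustration), take $y = (10, 30, 20, 40)$, so that $n = 4$ and the ranks are $r = (1, 3, 2, 4)$. Then $\dfrac{r_i}{n+1} = (0.2, 0.6, 0.4, 0.8)$, and applying the probit function gives $\tilde y \approx (-0.842, 0.253, -0.253, 0.842)$.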

This transformation is a particular case of the family of transformations discussed by Beasley et al. (2009). The family can be represented as: $\tilde y_i = \Phi^{-1}\left\{\dfrac{r_i-c}{n-2c+1}\right\}$

where $c$ is a constant and the remaining variables are as above. The value of $c$ differs between the proposed methods: Blom (1958) suggests $c=3/8$, Tukey (1962) suggests $c=1/3$, Bliss (1967) suggests $c=1/2$ and, as just described, Van der Waerden's method corresponds to $c=0$.
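The family can be sketched in a few lines of Python. This is only an illustration, not the code behind SOLAR or inormal.m: it uses the form $(r_i - c)/(n - 2c + 1)$ from Beasley et al. (2009), and the function name `inverse_normal`, the `OFFSETS` dictionary and the sample data are all made up for the example.

```python
import numpy as np
from scipy.stats import norm, rankdata

# Offsets c named in the text; the dictionary keys are illustrative labels only.
OFFSETS = {
    "van_der_waerden": 0.0,
    "tukey": 1 / 3,
    "blom": 3 / 8,
    "bliss": 1 / 2,
}

def inverse_normal(y, c=0.0):
    """Rank-based inverse normal transformation with offset c (illustrative)."""
    y = np.asarray(y, dtype=float)
    r = rankdata(y)  # ranks 1..n; tied values share their average rank
    n = y.size
    return norm.ppf((r - c) / (n - 2 * c + 1))

# Made-up sample data, only to show how the offsets change the result.
y = np.array([2.3, 0.7, 5.1, 1.9, 3.4])
for name, c in OFFSETS.items():
    print(name, np.round(inverse_normal(y, c), 3))
```

Every choice of $c$ keeps the ordering of the data and gives values symmetric around zero; larger $c$ pushes the extreme observations further into the tails.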

Interestingly enough, the Q-Q plots produced by Octave use the Bliss (1967) transformation.

An Octave/MATLAB function that performs these transformations on arbitrary data is available here: inormal.m (note that this function does not require or use SOLAR).

## Version history

• 23.Jul.2011: First public release.
• 19.Jun.2014: Added ability to deal with ties, as well as NaNs in the data.

## 3 thoughts on “Inverse normal transformation in SOLAR”

1. Sam Mathias said:

Useful! Here is a function to do the same thing in Python.

    import numpy as np
    from scipy.stats import rankdata, norm

    def rank_based_inv_norm(x, c=0):
        """
        Perform the rank-based inverse normal transformation on data x.

        Reference: Beasley TM, Erickson S, Allison DB. Rank-based inverse normal
        transformations are increasingly used, but are they merited? Behav Genet.
        2009; 39(5):580-95.

        :param x: input array-like
        :param c: constant parameter
        :return: y
        """
        x = np.asarray(x)  # accept lists as well as NumPy arrays
        # rankdata assigns tied values their average rank by default
        return norm.ppf((rankdata(x) - c) / (x.size - 2*c + 1))

• A. M. Winkler said:

Hey Sam, good to have this in other languages. You may need to add extra lines to handle ties; otherwise the results aren’t accurate in all cases. Also note that the default in inormal.m is c=3/8, whereas you’re using c=0 (both are fine, just different).
Cheers,
Anderson

• I chose c=0 because most of the time we will probably want to replicate the SOLAR output (in the IoL lab anyway).

scipy.stats.rankdata has a number of optional methods for handling ties; the default is ‘average’, which I assumed was the one SOLAR uses, but I couldn’t find a reference. For continuous traits measured with high precision the probability of tied ranks is quite low, so I think this function will give the same output as SOLAR most of the time, but I still need to check this.
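
To make the point about ties concrete, here is a minimal sketch comparing some of the tie-handling methods in `scipy.stats.rankdata`. The helper name `inv_norm_with_ties` and the sample data are made up, and dropping NaNs before ranking is an assumption for illustration; it is not claimed to match what SOLAR or inormal.m does.

```python
import numpy as np
from scipy.stats import norm, rankdata

def inv_norm_with_ties(x, c=0.0, method="average"):
    """Hypothetical helper: transform the non-NaN entries, leave NaNs in place."""
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    ok = ~np.isnan(x)                   # mask of observed values
    n = ok.sum()                        # n counts only the observed values
    r = rankdata(x[ok], method=method)  # tie handling is chosen by `method`
    out[ok] = norm.ppf((r - c) / (n - 2 * c + 1))
    return out

# Made-up data with one tie (3.4 appears twice) and one missing value.
x = [1.2, 3.4, 3.4, np.nan, 0.5]
for method in ("average", "min", "max", "ordinal"):
    print(method, np.round(inv_norm_with_ties(x, method=method), 3))
```

With `average`, `min` and `max` the two tied observations receive the same transformed value, whereas `ordinal` breaks the tie by order of appearance, so the choice of method only matters when ties are present.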