# Variance components in genetic analyses

Pedigree-based analyses allow investigation of genetic and environmental influences on anatomy, physiology, and behaviour.

Methods based on components of variance have been used extensively to assess genetic influences and identify loci associated with various traits quantifying aspects of anatomy, physiology, and behaviour, in both normal and pathological conditions. In an earlier post, indices of genetic resemblance between relatives were presented, and in the last post, the kinship matrix was defined. In this post, these topics are used to present a basic model that allows partitioning of the phenotypic variance into sources of variation that can be ascribed to genetic, environmental, and other factors.

## A simple model

Consider the model:

$\mathbf{Y} = \mathbf{X}\mathbf{B} + \boldsymbol{\Upsilon}$

where, for $S$ subjects, $T$ traits, $P$ covariates and $K$ variance components, $\mathbf{Y}_{S \times T}$ are the observed trait values for each subject, $\mathbf{X}_{S \times P}$ is a matrix of covariates, $\mathbf{B}_{P \times T}$ is a matrix of unknown covariates’ weights, and $\boldsymbol{\Upsilon}_{S \times T}$ are the residuals after the covariates have been taken into account.

The elements of each column $t$ of $\boldsymbol{\Upsilon}$ are assumed to follow a multivariate normal distribution $\mathcal{N}\left(0;\mathbf{S}\right)$, where $\mathbf{S}$ is the between-subject covariance matrix. The elements of each row $s$ of $\boldsymbol{\Upsilon}$ are assumed to follow a multivariate normal distribution $\mathcal{N}\left(0;\mathbf{R}\right)$, where $\mathbf{R}$ is the between-trait covariance matrix. Both $\mathbf{R}$ and $\mathbf{S}$ are seen as the sum of $K$ variance components, i.e. $\mathbf{R} = \sum_{k} \mathbf{R}_{k}$ and $\mathbf{S} = \sum_{k} \mathbf{S}_{k}$. For a discussion on these equalities, see Eisenhart (1947) [see references at the end].

## An equivalent model

The same model can be written in an alternative way. Let $\mathbf{y}_{S \cdot T \times 1}$ be the stacked vector of traits, $\mathbf{\tilde{X}}_{S \cdot T \times P \cdot T} = \mathbf{X} \otimes \mathbf{I}_{T \times T}$ the matrix of covariates, $\boldsymbol{\beta}_{P \cdot T \times 1}$ the vector of the covariates’ weights, and $\boldsymbol{\upsilon}_{S \cdot T \times 1}$ the residuals after the covariates have been taken into account, where $\otimes$ represents the Kronecker product. The model can then be written as:

$\mathbf{y} = \mathbf{\tilde{X}}\boldsymbol{\beta} + \boldsymbol{\upsilon}$

The stacked residuals $\boldsymbol{\upsilon}$ are assumed to follow a multivariate normal distribution $\mathcal{N}\left(0;\boldsymbol{\Omega}\right)$, where $\boldsymbol{\Omega}$ can be seen as the sum of $K$ variance components:

$\boldsymbol{\Omega} = \sum_{k} \mathbf{R}_k \otimes \mathbf{S}_k$
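As a concrete illustration, $\boldsymbol{\Omega}$ can be assembled with Kronecker products. In this sketch all component matrices are made up, with $S=3$ subjects, $T=2$ traits, and $K=2$ components:

```python
import numpy as np

# Made-up example: S = 3 subjects, T = 2 traits, K = 2 variance components.
S1 = np.array([[1.0, 0.5, 0.5],
               [0.5, 1.0, 0.5],
               [0.5, 0.5, 1.0]])   # e.g. 2*Phi for three non-inbred full sibs
S2 = np.eye(3)                      # e.g. unique environment, independent
R1 = np.array([[0.6, 0.2],
               [0.2, 0.5]])         # between-trait (co)variances, component 1
R2 = np.array([[0.4, 0.1],
               [0.1, 0.5]])         # between-trait (co)variances, component 2

# Omega = sum over k of R_k Kronecker S_k: the covariance of the stacked
# residuals, of size (S*T) x (S*T), here 6 x 6
Omega = np.kron(R1, S1) + np.kron(R2, S2)
```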

The $\boldsymbol{\Omega}$ here is the same as in Almasy and Blangero (1998). $\mathbf{S}_k$ can be modelled as correlation matrices. The associated scalars are absorbed into the (to be estimated) $\mathbf{R}_k$. $\mathbf{R}$ is the phenotypic covariance matrix between the residuals of the traits:

$\mathbf{R} = \left[ \begin{array}{ccc} \mathsf{Var}(\upsilon_1) & \cdots & \mathsf{Cov}(\upsilon_1,\upsilon_T) \\ \vdots & \ddots & \vdots \\ \mathsf{Cov}(\upsilon_T,\upsilon_1) & \cdots & \mathsf{Var}(\upsilon_T) \end{array}\right]$

whereas each $\mathbf{R}_k$ is the share of these covariances attributable to the $k$-th component:

$\mathbf{R}_k = \left[ \begin{array}{ccc} \mathsf{Var}_k(\upsilon_1) & \cdots & \mathsf{Cov}_k(\upsilon_1,\upsilon_T) \\ \vdots & \ddots & \vdots \\ \mathsf{Cov}_k(\upsilon_T,\upsilon_1) & \cdots & \mathsf{Var}_k(\upsilon_T) \end{array}\right]$

$\mathbf{R}$ can be converted to a between-trait phenotypic correlation matrix $\mathbf{\mathring{R}}$ as:

$\mathbf{\mathring{R}} = \left[ \begin{array}{ccc} \frac{\mathsf{Var}(\upsilon_1)}{\mathsf{Var}(\upsilon_1)} & \cdots & \frac{\mathsf{Cov}(\upsilon_1,\upsilon_T)}{\left(\mathsf{Var}(\upsilon_1)\mathsf{Var}(\upsilon_T)\right)^{1/2}} \\ \vdots & \ddots & \vdots \\ \frac{\mathsf{Cov}(\upsilon_T,\upsilon_1)}{\left(\mathsf{Var}(\upsilon_T)\mathsf{Var}(\upsilon_1)\right)^{1/2}} & \cdots & \frac{\mathsf{Var}(\upsilon_T)}{\mathsf{Var}(\upsilon_T)} \end{array}\right]$

The above phenotypic correlation matrix has unit diagonal and can still be fractioned into its $K$ components:

$\mathbf{\mathring{R}}_k = \left[ \begin{array}{ccc} \frac{\mathsf{Var}_k(\upsilon_1)}{\mathsf{Var}(\upsilon_1)} & \cdots & \frac{\mathsf{Cov}_k(\upsilon_1,\upsilon_T)}{\left(\mathsf{Var}(\upsilon_1)\mathsf{Var}(\upsilon_T)\right)^{1/2}} \\ \vdots & \ddots & \vdots \\ \frac{\mathsf{Cov}_k(\upsilon_T,\upsilon_1)}{\left(\mathsf{Var}(\upsilon_T)\mathsf{Var}(\upsilon_1)\right)^{1/2}} & \cdots & \frac{\mathsf{Var}_k(\upsilon_T)}{\mathsf{Var}(\upsilon_T)} \end{array}\right]$

The relationship $\mathbf{\mathring{R}} = \sum_k \mathbf{\mathring{R}}_k$ holds. The diagonal elements of $\mathbf{\mathring{R}}_k$ may receive particular names, e.g., heritability, environmentability, dominance effects, shared environmental effects, etc., depending on what is modelled in the corresponding $\mathbf{S}_k$. However, the off-diagonal elements of $\mathbf{\mathring{R}}_k$ are not the $\rho_k$ that correspond, e.g., to the genetic or environmental correlation. These off-diagonal elements are instead the signed $\text{\textsc{erv}}$ when $\mathbf{S}_k=2\cdot\boldsymbol{\Phi}$, or their $\text{\textsc{erv}}_k$-equivalent for other variance components (see below). In this particular case, they can also be called “bivariate heritabilities” (Falconer and Mackay, 1996). A matrix $\mathbf{\breve{R}}_k$ that contains these correlations $\rho_k$, i.e., the fraction of the variance attributable to the $k$-th component that is shared between pairs of traits, is given by:

$\mathbf{\breve{R}}_k = \left[ \begin{array}{ccc} \frac{\mathsf{Var}_k(\upsilon_1)}{\mathsf{Var}_k(\upsilon_1)} & \cdots & \frac{\mathsf{Cov}_k(\upsilon_1,\upsilon_T)}{\left(\mathsf{Var}_k(\upsilon_1)\mathsf{Var}_k(\upsilon_T)\right)^{1/2}} \\ \vdots & \ddots & \vdots \\ \frac{\mathsf{Cov}_k(\upsilon_T,\upsilon_1)}{\left(\mathsf{Var}_k(\upsilon_T)\mathsf{Var}_k(\upsilon_1)\right)^{1/2}} & \cdots & \frac{\mathsf{Var}_k(\upsilon_T)}{\mathsf{Var}_k(\upsilon_T)} \end{array}\right]$

As for the phenotypic correlation matrix, each $\mathbf{\breve{R}}_k$ has unit diagonal.
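To make the distinction between $\mathbf{\mathring{R}}_k$ and $\mathbf{\breve{R}}_k$ concrete, the sketch below computes both from made-up components $\mathbf{R}_k$ for two traits (all numerical values are illustrative):

```python
import numpy as np

# Made-up variance components for T = 2 traits (K = 2, e.g. genetic and
# environmental); all values are illustrative.
R1 = np.array([[0.6, 0.2],
               [0.2, 0.5]])                # e.g. genetic component
R2 = np.array([[0.4, 0.1],
               [0.1, 0.5]])                # e.g. environmental component
R = R1 + R2                                 # phenotypic covariance

d  = np.sqrt(np.diag(R))                    # phenotypic standard deviations
d1 = np.sqrt(np.diag(R1))                   # component standard deviations

Rring   = R  / np.outer(d, d)    # phenotypic correlations, unit diagonal
Rring1  = R1 / np.outer(d, d)    # diagonal: h^2; off-diagonal: signed ERV
Rbreve1 = R1 / np.outer(d1, d1)  # off-diagonal: rho (e.g. genetic correlation)
```

Note that the off-diagonal of `Rring1` equals the off-diagonal of `Rbreve1` scaled by $\sqrt{h_i^2 h_j^2}$, the identity derived in the section on the $\text{\textsc{erv}}$.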

## The most common case

A particular case is when $\mathbf{S}_1 = 2\cdot\boldsymbol{\Phi}$, the coefficient of familial relationship between subjects, and $\mathbf{S}_2 = \mathbf{I}_{S \times S}$. In this case, the $T$ diagonal elements of $\mathbf{\mathring{R}}_1$ represent the heritability ($h_t^2$) for each trait $t$. The diagonal of $\mathbf{\mathring{R}}_2$ contains $1-h_t^2$, the environmentability. The off-diagonal elements of $\mathbf{\mathring{R}}_1$ contain the signed $\text{\textsc{erv}}$ (see below). The genetic correlations, $\rho_g$ are the off-diagonal elements of $\mathbf{\breve{R}}_1$, whereas the off-diagonal elements of $\mathbf{\breve{R}}_2$ are $\rho_e$, the environmental correlations between traits. In this particular case, the components of $\mathbf{R}$, i.e., $\mathbf{R}_k$ are equivalent to $\mathbf{G}$ and $\mathbf{E}$ covariance matrices as in Almasy et al (1997).

## Relationship with the ERV

To see how the off-diagonal elements of $\mathbf{\mathring{R}}_k$ are the signed Endophenotypic Ranking Values for each of the $k$-th variance component, $\text{\textsc{erv}}_k$ (Glahn et al, 2011), note that for a pair of traits $i$ and $j$:

$\mathring{R}_{kij} = \frac{\mathsf{Cov}_k(\upsilon_i,\upsilon_j)}{\left(\mathsf{Var}(\upsilon_i)\mathsf{Var}(\upsilon_j)\right)^{1/2}}$

Multiplying both numerator and denominator by $\left(\mathsf{Var}_k(\upsilon_i)\mathsf{Var}_k(\upsilon_j)\right)^{1/2}$ and rearranging the terms gives:

$\mathring{R}_{kij} = \frac{\mathsf{Cov}_k(\upsilon_i,\upsilon_j)}{\left(\mathsf{Var}_k(\upsilon_i)\mathsf{Var}_k(\upsilon_j)\right)^{1/2}} \left(\frac{\mathsf{Var}_k(\upsilon_i)}{\mathsf{Var}(\upsilon_i)}\frac{\mathsf{Var}_k(\upsilon_j)}{\mathsf{Var}(\upsilon_j)}\right)^{1/2}$

When $\mathbf{S}_k = 2\cdot\boldsymbol{\Phi}$, the above reduces to $\mathring{R}_{kij} = \rho_k \sqrt{h^2_i h^2_j}$, which is the signed version of $\text{\textsc{erv}}=\left|\rho_g\sqrt{h_i^2h_j^2}\right|$ when $k$ is the genetic component.

## Positive-definiteness

$\mathbf{R}$ and $\mathbf{R}_k$ are covariance matrices and so are positive-definite, whereas the correlation matrices $\mathbf{\mathring{R}}$, $\mathbf{\mathring{R}}_k$ and $\mathbf{\breve{R}}_k$ are positive-semidefinite. A hybrid matrix that does not have to be positive-definite or semidefinite is:

$\mathbf{\check{R}}_k = \mathbf{I} \odot \mathbf{\mathring{R}}_k + \left(\mathbf{J}-\mathbf{I}\right) \odot \mathbf{\breve{R}}_k$

where $\mathbf{J}$ is a matrix of ones, $\mathbf{I}$ is the identity, both of size $T \times T$, and $\odot$ is the Hadamard product. A practical use of such a matrix is to show concisely the heritabilities for each trait in the diagonal and the genetic correlations in the off-diagonal.
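A minimal sketch of assembling this hybrid matrix, with made-up values for $\mathbf{\mathring{R}}_k$ and $\mathbf{\breve{R}}_k$:

```python
import numpy as np

# Made-up inputs: diagonal of Rring_k holds heritabilities, off-diagonal of
# Rbreve_k holds the component correlations (values are illustrative).
Rring_k  = np.array([[0.60, 0.12],
                     [0.12, 0.50]])
Rbreve_k = np.array([[1.00, 0.36],
                     [0.36, 1.00]])

T = Rring_k.shape[0]
I = np.eye(T)
J = np.ones((T, T))

# Hadamard (elementwise) products: take the diagonal from Rring_k and the
# off-diagonal from Rbreve_k
Rcheck_k = I * Rring_k + (J - I) * Rbreve_k
```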

## Cauchy-Schwarz

Algorithmic advantages can be obtained from the positive-definiteness of $\mathbf{\mathring{R}}_k$. The Cauchy–Schwarz inequality (Cauchy, 1821; Schwarz, 1888) states that:

$\left|\mathring{R}_{kij}\right| \leqslant \sqrt{\mathring{R}_{kii}\mathring{R}_{kjj}}$

Hence, the bounds for the off-diagonal elements can be known from the diagonal elements, which, in turn, can be estimated in a simpler, univariate model.

The Cauchy-Schwarz inequality imposes limits on the off-diagonal values of the matrix that contains the genetic covariances (or bivariate heritabilities).

## Parameter estimation

Under the multivariate normal assumption, the parameters can be estimated by maximising the following loglikelihood function:

$\mathcal{L}\left(\mathbf{R}_k,\boldsymbol{\beta}\Big|\mathbf{y},\mathbf{\tilde{X}}\right) = -\frac{1}{2} \left(N \ln 2\pi + \ln \left|\boldsymbol{\Omega}\right| + \left(\mathbf{y}-\mathbf{\tilde{X}}\boldsymbol{\beta}\right)'\boldsymbol{\Omega}^{-1}\left(\mathbf{y}-\mathbf{\tilde{X}}\boldsymbol{\beta}\right)\right)$

where $N=S \cdot T$ is the number of observations on the stacked vector $\mathbf{y}$. Unbiased estimates for $\boldsymbol{\beta}$, although inefficient and inappropriate for hypothesis testing, can be obtained with ordinary least squares (OLS).
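For given $\boldsymbol{\beta}$ and $\boldsymbol{\Omega}$, the loglikelihood can be evaluated as in this sketch, with the quadratic form weighted by the inverse of $\boldsymbol{\Omega}$:

```python
import numpy as np

# Sketch: evaluate the loglikelihood for given parameters; the quadratic
# form uses Omega^{-1} via a linear solve rather than an explicit inverse.
def loglik(y, Xt, beta, Omega):
    N = y.size
    r = y - Xt @ beta                         # residuals
    sign, logdet = np.linalg.slogdet(Omega)   # ln|Omega|, numerically stable
    quad = r @ np.linalg.solve(Omega, r)      # r' Omega^{-1} r
    return -0.5 * (N * np.log(2 * np.pi) + logdet + quad)
```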

## Parametric inference

Hypothesis testing can be performed with the likelihood ratio test (LRT), i.e., the test statistic is produced by subtracting from the loglikelihood of the model in which all the parameters are free to vary ($\mathcal{L}_1$) the loglikelihood of a model in which the parameters being tested are constrained to zero, the null model ($\mathcal{L}_0$). The statistic is given by $\lambda = 2\left(\mathcal{L}_1-\mathcal{L}_0\right)$ (Wilks, 1938), which here is asymptotically distributed as a 50:50 mixture of $\chi^2_0$ and $\chi^2_{\text{df}}$ distributions, where df is the number of parameters being tested, i.e., free to vary in the unconstrained model and fixed at zero in the null (Self and Liang, 1987). From this distribution the p-values can be obtained.
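For a single tested parameter (df = 1), the p-value under this 50:50 mixture can be computed with the standard library alone, since $P(\chi^2_1 > \lambda) = \operatorname{erfc}(\sqrt{\lambda/2})$; a sketch:

```python
import math

# Sketch for df = 1: p-value from the 50:50 mixture of chi2_0 and chi2_1,
# using P(chi2_1 > lam) = erfc(sqrt(lam / 2)); no scipy needed.
def lrt_pvalue(lam):
    if lam <= 0:
        return 1.0  # all the chi2_0 mass lies at zero
    return 0.5 * math.erfc(math.sqrt(lam / 2.0))
```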

## References

The photograph at the top (elephants) is by Anja Osenberg and was generously released into public domain.

# Understanding the kinship matrix

Coefficients to assess the genetic resemblance between individuals were presented in the last post. Among these, the coefficient of kinship, $\phi$, is probably the most interesting. It gives a probabilistic estimate that a random gene from a given subject $i$ is identical by descent (ibd) to a gene in the same locus from a subject $j$. For $N$ subjects, these probabilities can be assembled in a $N \times N$ matrix termed kinship matrix, usually represented as $\mathbf{\Phi}$, that has elements $\phi_{ij}$, and that can be used to model the covariance between individuals in quantitative genetics.

Consider the pedigree in the figure below, consisting of 14 subjects:

The corresponding kinship matrix, already multiplied by two to indicate expected covariances between subjects (i.e., $2\cdot\mathbf{\Phi}$), is:

Note that the diagonal elements can have values above unity, given the consanguineous mating in the family (between s09 and s12, indicated by the double line in the pedigree diagram).

In the next post, details on how the kinship matrix can be used to investigate heritabilities, genetic correlations, and to perform association studies will be presented.

# Genetic resemblance between relatives

How similar?

The degree of relationship between two related individuals can be estimated by the probability that a gene in one subject is identical by descent to the corresponding gene (i.e., in the same locus) in the other. Two genes are said to be identical by descent (ibd) if both are copies of the same ancestral gene. Genes that are not ibd may still be identical through separate mutations, and be therefore identical by state (ibs), though these will not be considered in what follows.

The coefficients below were introduced by Jacquard in 1970, in a book originally published in French, and translated to English in 1974. A similar content appeared in an article by the same author in the journal Biometrics in 1972 (see the references at the end).

## Coefficients of identity

Consider a particular autosomal gene $G$. Each individual has two copies, one of paternal, another of maternal origin; these can be indicated as $G_i^P$ and $G_i^M$ for individual $i$. There are exactly 15 distinct ways (states) in which $G$ can be identical or not identical between two individuals, as shown in the figure below.

To each of these states $S_{1, \ldots , 15}$, a respective probability $\delta_{1, \ldots , 15}$ can be assigned; these are called coefficients of identity by descent. These probabilities can be calculated at every generation following very elementary rules. For most problems, however, the distinction between paternal and maternal origin of a gene is irrelevant, and some of the above states are equivalent to others. If these are condensed, we can retain 9 distinct ways, shown in the figure below:

As before, to each of these states $\Sigma_{1, \ldots , 9}$, a respective probability $\Delta_{1, \ldots , 9}$ can be assigned; these are called condensed coefficients of identity by descent, and relate to the former as:

• $\Delta_1 = \delta_1$
• $\Delta_2 = \delta_6$
• $\Delta_3 = \delta_2 + \delta_3$
• $\Delta_4 = \delta_7$
• $\Delta_5 = \delta_4 + \delta_5$
• $\Delta_6 = \delta_8$
• $\Delta_7 = \delta_9 + \delta_{12}$
• $\Delta_8 = \delta_{10} + \delta_{11} + \delta_{13} + \delta_{14}$
• $\Delta_9 = \delta_{15}$
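These relations can be written compactly; the sketch below condenses a vector of the 15 coefficients $\delta$ into the 9 coefficients $\Delta$:

```python
# Sketch: condense the 15 coefficients delta into the 9 coefficients Delta.
# The argument d is a list holding delta_1..delta_15 (zero-based indexing).
def condense(d):
    return [d[0],                          # Delta_1 = delta_1
            d[5],                          # Delta_2 = delta_6
            d[1] + d[2],                   # Delta_3 = delta_2 + delta_3
            d[6],                          # Delta_4 = delta_7
            d[3] + d[4],                   # Delta_5 = delta_4 + delta_5
            d[7],                          # Delta_6 = delta_8
            d[8] + d[11],                  # Delta_7 = delta_9 + delta_12
            d[9] + d[10] + d[12] + d[13],  # Delta_8
            d[14]]                         # Delta_9 = delta_15
```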

A similar method was proposed by Cotterman (1940), in his highly influential but only much later published doctoral thesis. The $\Delta_9$, $\Delta_8$ and $\Delta_7$ correspond to his coefficients $k_0$, $k_1$ and $k_2$.

## Coefficient of kinship

The above refer to probabilities of finding particular genes identical among subjects. However, a different coefficient can be defined for random genes: the probability that a random gene from subject $i$ is identical by descent to a gene at the same locus from subject $j$ is the coefficient of kinship, and can be represented as $\phi_{ij}$:

• $\phi_{ij} = \Delta_1 + \frac{1}{2}(\Delta_3 + \Delta_5 + \Delta_7) + \frac{1}{4}\Delta_8$

If $i$ and $j$ are in fact the same individual, then $\phi_{ii}$ is the kinship of a subject with himself. Two genes taken from the same individual can either be the same gene (probability $\frac{1}{2}$ of being the same) or be the genes inherited from father and mother, in which case the probability is given by the coefficient of kinship between the parents. In other words, $\phi_{ii} = \frac{1}{2} + \frac{1}{2}\phi_{\text{FM}}$. If both parents are unrelated, $\phi_{\text{FM}}=0$, such that the kinship of a subject with himself is $\phi_{ii} = \frac{1}{2}$.

The value of $\phi_{ij}$ can be determined from the number of generations up to a common ancestor $k$. A random gene from individual $i$ can be identical to a random gene from individual $j$ at the same locus if both come from the common ancestor $k$, an event that can happen if either (1) both are copies of the same gene in $k$, or (2) they are copies of different genes in $k$, but $k$ is inbred; the latter has probability $\frac{1}{2}f_k$ (see below about the coefficient of inbreeding, $f$). Thus, if there are $m$ generations between $i$ and $k$, and $n$ generations between $j$ and $k$, the coefficient of kinship can be computed as $\phi_{ij} = \left(\frac{1}{2}\right)^{m+n+1}(1+f_k)$. If $i$ and $j$ have more than one common ancestor, then more than one line of descent is possible, and the kinship is determined by summing over all such $K$ common ancestors:

• $\phi_{ij} = \sum_{k=1}^K \left(\frac{1}{2}\right)^{m_k+n_k+1}(1+f_k)$

For a set of subjects, the pairwise coefficients of kinship $\phi_{ij}$ can be arranged in a square matrix $\boldsymbol{\Phi}$, and used to model the covariance between subjects as $2\cdot\boldsymbol{\Phi}$ (see here).
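These rules translate into a standard recursive algorithm, equivalent to summing over the lines of descent: $\phi_{ii} = \frac{1}{2}(1+\phi_{\text{FM}})$, and, for distinct subjects, the kinship with a non-founder is the average of the kinships with that subject’s parents. A sketch, assuming ids are numbered so that parents precede their offspring (the pedigree at the end is hypothetical):

```python
# Sketch of a recursive computation of the coefficient of kinship.
def kinship(i, j, ped, memo=None):
    """Return phi_ij given ped: id -> (father, mother); founders: (None, None).

    Assumes ids are ordered so that parents have smaller ids than offspring.
    """
    if memo is None:
        memo = {}
    key = (i, j) if i <= j else (j, i)
    if key in memo:
        return memo[key]
    if i == j:
        f, m = ped[i]
        # phi_ii = 1/2 * (1 + phi_FM); equals 1/2 for a non-inbred subject
        phi = 0.5 if f is None else 0.5 * (1.0 + kinship(f, m, ped, memo))
    else:
        a, b = key                # b is the later individual
        f, m = ped[b]
        # average the kinships of a with b's parents
        phi = 0.0 if f is None else 0.5 * (kinship(a, f, ped, memo) +
                                           kinship(a, m, ped, memo))
    memo[key] = phi
    return phi

# Hypothetical pedigree: 1-4 are founders; 5 and 6 are full sibs; 7 and 8
# are first cousins; 9 is the offspring of a sib-mating between 5 and 6.
ped = {1: (None, None), 2: (None, None), 3: (None, None), 4: (None, None),
       5: (1, 2), 6: (1, 2), 7: (5, 3), 8: (6, 4), 9: (5, 6)}
```

With this pedigree, `kinship(5, 6, ped)` gives $\frac{1}{4}$ (full sibs) and `kinship(7, 8, ped)` gives $\frac{1}{16}$ (first cousins), matching the table of particular cases further down; the self-kinship of the inbred subject 9 is $\frac{1}{2}(1+\frac{1}{4})=\frac{5}{8}$.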

## Coefficient of inbreeding

The coefficient of inbreeding $f$ of a given subject is the coefficient of kinship between their parents. While the above coefficients provide information about pairs of individuals, the coefficient of inbreeding gives information about a particular subject. Yet, for a pair of subjects $i$ and $j$, their coefficients of inbreeding can be computed from the condensed coefficients of identity:

• $f_{i} = \Delta_1 + \Delta_2 + \Delta_3 + \Delta_4$
• $f_{j} = \Delta_1 + \Delta_2 + \Delta_5 + \Delta_6$

Note that all these coefficients are based on probabilities, but it is now possible to identify the actual presence of a particular gene using marker data. Also note that while the illustrations above suggest application to livestock, the same applies to studies of human populations.

## Some particular cases

The computation of the above coefficients can be done algorithmically, and is performed automatically by software that allows analysis of pedigree data, such as solar. Some common particular cases are shown below:

| Relationship | $\Delta_1$ | $\Delta_2$ | $\Delta_3$ | $\Delta_4$ | $\Delta_5$ | $\Delta_6$ | $\Delta_7$ | $\Delta_8$ | $\Delta_9$ | $\phi_{ij}$ |
|---|---|---|---|---|---|---|---|---|---|---|
| Self | $0$ | $0$ | $0$ | $0$ | $0$ | $0$ | $1$ | $0$ | $0$ | $\frac{1}{2}$ |
| Parent-offspring | $0$ | $0$ | $0$ | $0$ | $0$ | $0$ | $0$ | $1$ | $0$ | $\frac{1}{4}$ |
| Half sibs | $0$ | $0$ | $0$ | $0$ | $0$ | $0$ | $0$ | $\frac{1}{2}$ | $\frac{1}{2}$ | $\frac{1}{8}$ |
| Full sibs/dizygotic twins | $0$ | $0$ | $0$ | $0$ | $0$ | $0$ | $\frac{1}{4}$ | $\frac{1}{2}$ | $\frac{1}{4}$ | $\frac{1}{4}$ |
| Monozygotic twins | $0$ | $0$ | $0$ | $0$ | $0$ | $0$ | $1$ | $0$ | $0$ | $\frac{1}{2}$ |
| First cousins | $0$ | $0$ | $0$ | $0$ | $0$ | $0$ | $0$ | $\frac{1}{4}$ | $\frac{3}{4}$ | $\frac{1}{16}$ |
| Double first cousins | $0$ | $0$ | $0$ | $0$ | $0$ | $0$ | $\frac{1}{16}$ | $\frac{6}{16}$ | $\frac{9}{16}$ | $\frac{1}{8}$ |
| Second cousins | $0$ | $0$ | $0$ | $0$ | $0$ | $0$ | $0$ | $\frac{1}{16}$ | $\frac{15}{16}$ | $\frac{1}{64}$ |
| Uncle-nephew | $0$ | $0$ | $0$ | $0$ | $0$ | $0$ | $0$ | $\frac{1}{2}$ | $\frac{1}{2}$ | $\frac{1}{8}$ |
| Offspring of sib-matings | $\frac{1}{16}$ | $\frac{1}{32}$ | $\frac{1}{8}$ | $\frac{1}{32}$ | $\frac{1}{8}$ | $\frac{1}{32}$ | $\frac{7}{32}$ | $\frac{5}{16}$ | $\frac{1}{16}$ | $\frac{3}{8}$ |

## References

• Cotterman C. A calculus for statistico-genetics. 1940. PhD Thesis. Ohio State University.
• Jacquard, A. Structures génétiques des populations. Masson, Paris, France, 1970, later translated and republished as Jacquard, A. The genetic structure of populations. Springer, Heidelberg, 1974.
• Jacquard A. Genetic information given by a relative. Biometrics. 1972;28(4):1101-1114.

The photograph at the top (sheep) is in public domain.

# The NIFTI-2 file format

In a previous post, the nifti-1 file format was presented. An update of this format has recently been produced by the Data Format Working Group (dfwg). The updated version retains generally the same information as the previous one, with the crucial difference that it allows far more datapoints in each dimension, thus permitting the same overall file structure to be used to store, for instance, surface-based scalar data, or large connectivity matrices. Neither of these had originally been intended at the time the analyze or the nifti-1 formats were developed. While packages such as FreeSurfer developed their own formats for surface-based scalar data, a more general solution was still pending.

## Compatible, but not as before

Users who participated in the transition from analyze to nifti-1 may remember that the same libraries used to read analyze would also read nifti, perhaps with a few minor difficulties, but the bulk of the actual data would be read by most analyze-compliant applications. This was possible because a large part of the relevant information in the header was kept at exactly the same position counted from the beginning of the file. An application could read information at a given byte position and find what it expected, without error and without finding something else.

This time things are different. While a large degree of compatibility exists, this compatibility helps the developer more than the end user. If before an application made to read only analyze could read nifti-1, this time an application made to read nifti-1 will not read nifti-2 without a few, even if minor, changes to its source code. In other words, the new version of the format is not bitwise compatible with the previous one. The reasons for this “almost compatibility” will become clear below.

## Changing types

The limitation that became evident with the new uses found for the nifti format refers particularly to the maximum number of points (e.g., voxels) in each dimension. This limitation stems from the field short dim[8], which allows only 2 bytes (16 bits) for each dimension; since only positive values are accepted (short is signed), this imposes a cap of no more than $2^{15}-1 = 32767$ voxels per dimension. In the nifti-2 format, this was replaced by int64_t dim[8], which guarantees 8 bytes (64 bits) per dimension, and so a much larger number of points per dimension, namely $2^{63}-1$ = 9,223,372,036,854,775,807.

This change alone already renders the nifti-2 not bitwise compatible with the nifti-1. Yet, other changes were made, some as a consequence of the modifications to dim[8], such as slice_start and slice_end, both also promoted from short to int64_t. Other changes were made so as to improve the general ability to store data with higher precision. A complete table listing the modifications to the field types is below:

| NIFTI-1 | NIFTI-2 |
|---|---|
| short dim[8] | int64_t dim[8] |
| float intent_p1 | double intent_p1 |
| float intent_p2 | double intent_p2 |
| float intent_p3 | double intent_p3 |
| float pixdim[8] | double pixdim[8] |
| float vox_offset | int64_t vox_offset |
| float scl_slope | double scl_slope |
| float scl_inter | double scl_inter |
| float cal_max | double cal_max |
| float cal_min | double cal_min |
| float slice_duration | double slice_duration |
| float toffset | double toffset |
| short slice_start | int64_t slice_start |
| short slice_end | int64_t slice_end |
| char slice_code | int32_t slice_code |
| char xyzt_units | int32_t xyzt_units |
| short intent_code | int32_t intent_code |
| short qform_code | int32_t qform_code |
| short sform_code | int32_t sform_code |
| float quatern_b | double quatern_b |
| float quatern_c | double quatern_c |
| float quatern_d | double quatern_d |
| float srow_x[4] | double srow_x[4] |
| float srow_y[4] | double srow_y[4] |
| float srow_z[4] | double srow_z[4] |
| char magic[4] | char magic[8] |

## Fields removed, fields reordered, fields added

Seven fields that only existed in the nifti-1 for compatibility with the old analyze format were removed entirely. These are:

• char data_type[10]
• char db_name[18]
• int extents
• short session_error
• char regular
• int glmin
• int glmax

Another change is that the fields were reordered, which is an improvement over the nifti-1: the magic string, for instance, is now near the beginning of the file, which helps testing what kind of file it is. All constraints that had been imposed on the nifti-1 to allow compatibility with the analyze were finally dropped. At the far end of the header, a 15-byte field was included to pad the header to a total size of 540 bytes, and to ensure 16-byte alignment after the 4 final bytes that indicate extra information.

## Overview of the new header structure

With the modifications above, the overall structure of the nifti-2 header became:

| Type | Name | Offset | Size | Description |
|---|---|---|---|---|
| int | sizeof_hdr | 0B | 4B | Size of the header. Must be 540 (bytes). |
| char | magic[8] | 4B | 8B | Magic string, defining a valid signature. |
| int16_t | data_type | 12B | 2B | Data type. |
| int16_t | bitpix | 14B | 2B | Number of bits per voxel. |
| int64_t | dim[8] | 16B | 64B | Data array dimensions. |
| double | intent_p1 | 80B | 8B | 1st intent parameter. |
| double | intent_p2 | 88B | 8B | 2nd intent parameter. |
| double | intent_p3 | 96B | 8B | 3rd intent parameter. |
| double | pixdim[8] | 104B | 64B | Grid spacings (unit per dimension). |
| int64_t | vox_offset | 168B | 8B | Offset into a .nii file. |
| double | scl_slope | 176B | 8B | Data scaling, slope. |
| double | scl_inter | 184B | 8B | Data scaling, offset. |
| double | cal_max | 192B | 8B | Maximum display intensity. |
| double | cal_min | 200B | 8B | Minimum display intensity. |
| double | slice_duration | 208B | 8B | Time for one slice. |
| double | toffset | 216B | 8B | Time axis shift. |
| int64_t | slice_start | 224B | 8B | First slice index. |
| int64_t | slice_end | 232B | 8B | Last slice index. |
| char | descrip[80] | 240B | 80B | Any text. |
| char | aux_file[24] | 320B | 24B | Auxiliary filename. |
| int | qform_code | 344B | 4B | Use of the quaternion fields. |
| int | sform_code | 348B | 4B | Use of the affine fields. |
| double | quatern_b | 352B | 8B | Quaternion b parameter. |
| double | quatern_c | 360B | 8B | Quaternion c parameter. |
| double | quatern_d | 368B | 8B | Quaternion d parameter. |
| double | qoffset_x | 376B | 8B | Quaternion x shift. |
| double | qoffset_y | 384B | 8B | Quaternion y shift. |
| double | qoffset_z | 392B | 8B | Quaternion z shift. |
| double | srow_x[4] | 400B | 32B | 1st row affine transform. |
| double | srow_y[4] | 432B | 32B | 2nd row affine transform. |
| double | srow_z[4] | 464B | 32B | 3rd row affine transform. |
| int | slice_code | 496B | 4B | Slice timing order. |
| int | xyzt_units | 500B | 4B | Units of pixdim[1..4]. |
| int | intent_code | 504B | 4B | nifti intent. |
| char | intent_name[16] | 508B | 16B | Name or meaning of the data. |
| char | dim_info | 524B | 1B | Encoding directions. |
| char | unused_str[15] | 525B | 15B | Unused, to be padded with zeroes. |

Total size: 540 bytes.
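As a cross-check, the layout above can be written as a Python struct format string (a sketch; the `'<'` prefix assumes little-endian, and the field order follows the table):

```python
import struct

# nifti-2 header, field by field, little-endian ('<' also disables padding):
# sizeof_hdr, magic[8], data_type, bitpix, dim[8], intent_p1..p3, pixdim[8],
# vox_offset, scl_slope..toffset (6 doubles), slice_start, slice_end,
# descrip[80], aux_file[24], qform_code, sform_code, quaternion parameters
# and offsets (6 doubles), srow_x/y/z (12 doubles), slice_code, xyzt_units,
# intent_code, intent_name[16], dim_info, unused_str[15].
NIFTI2_HDR_FMT = '<i8s2h8q3d8dq6d2q80s24s2i6d12d3i16sc15s'

# struct.calcsize(NIFTI2_HDR_FMT) == 540, matching sizeof_hdr
```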

## NIFTI-1 or NIFTI-2?

For the developer writing input/output functions to handle nifti files, a simple check can be used to test the version and the endianness of the file: the first four bytes are read (int sizeof_hdr): if equal to 348, then it is a nifti-1 file; if equal to 540, then it is a nifti-2 file. If equal to neither, then swap the bytes, as if reading in the non-native endianness, and repeat the test; if this time the size of the header is found as 348 or 540, the version is determined, and this also implies that all bytes in the file need to be swapped to match the endianness of the current architecture. If, however, the first four bytes do not contain 348 or 540 in either endianness, then it is not a valid nifti file.

Once the version and the endianness have been determined, if it is a nifti-1 file, jump to byte 344 and check if the content is 'ni1' (or '6E 69 31 00' in hexadecimal), indicating a pair .hdr/.img, or if it is 'n+1' ('6E 2B 31 00'), indicating a single .nii file. If, however, it is a nifti-2 file, just read the next 8 bytes and check if the content is 'n+2' ('6E 2B 32 00') followed by '0D 0A 1A 0A' (hex).
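The version and endianness check described above can be sketched in a few lines (assuming the first bytes of the file are already in memory):

```python
import struct

def nifti_version(first4):
    """Return (version, byteorder) from the first 4 bytes of a file.

    byteorder is '<' (little-endian) or '>' (big-endian); raises
    ValueError if the bytes do not identify a nifti-1 or nifti-2 file.
    """
    for bo in ('<', '>'):
        (size,) = struct.unpack(bo + 'i', first4[:4])
        if size == 348:
            return 1, bo   # nifti-1
        if size == 540:
            return 2, bo   # nifti-2
    raise ValueError('not a valid nifti file')
```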

## Storing extra information

Just like in the nifti-1, the four bytes after the end of the nifti-2 header are used to indicate extensions and more information. Thus, the actual data begins after byte 544. See the post on the nifti-1 for details. The cifti-2 file format (used extensively by the Human Connectome Project) is built on top of the nifti-2 format, and uses this extra information.

The official definition of the nifti-2 format is available as a c header file (nifti2.h) here and mirrored here.

# All GLM formulas

We often find ourselves needing to quickly compute a statistic for a certain dataset, but finding the formulas is not always straightforward. Using statistical software helps, but it also often happens that the reported results are not exactly what one believes they represent. Moreover, even when using these packages, it is always good to keep in mind the meaning of the statistics and how they are computed. Here the formulas for the statistics most commonly used with the general linear model (glm) are presented, all in matrix form, so that they can be easily implemented in Octave or Matlab.

## I — Model

We consider two models, one univariate, another multivariate. The univariate is a particular case of the multivariate, but for univariate problems, it is simpler to use the smaller, particular case.

## Univariate model

The univariate glm can be written as:

$\mathbf{y} = \mathbf{M}\boldsymbol{\psi} + \boldsymbol{\epsilon}$

where $\mathbf{y}$ is the $N \times 1$ vector of observations, $\mathbf{M}$ is the $N \times s$ matrix of explanatory variables, $\boldsymbol{\psi}$ is the $s \times 1$ vector of regression coefficients, and $\boldsymbol{\epsilon}$ is the $N \times 1$ vector of residuals.

The null hypothesis can be stated as $\mathcal{H}_0 : \mathbf{C}'\boldsymbol{\psi} = 0$, where $\mathbf{C}$ is an $s \times s'$ matrix that defines a contrast of regression coefficients, satisfying $\mathsf{rank}(\mathbf{C}) = s'$ and $1 \leqslant s' \leqslant s$.

## Multivariate model

The multivariate glm can be written as:

$\mathbf{Y} = \mathbf{M}\boldsymbol{\Psi} + \boldsymbol{\epsilon}$

where $\mathbf{Y}$ is the $N \times q$ matrix of observations, $\mathbf{M}$ is the $N \times s$ matrix of explanatory variables, $\boldsymbol{\Psi}$ is the $s \times q$ matrix of regression coefficients, and $\boldsymbol{\epsilon}$ is the $N \times q$ matrix of residuals.

The null hypothesis can be stated as $\mathcal{H}_0 : \mathbf{C}'\boldsymbol{\Psi}\mathbf{D} = 0$, where $\mathbf{C}$ is defined as above, and $\mathbf{D}$ is a $q \times q'$ matrix that defines a contrast of observed variables, satisfying $\mathsf{rank}(\mathbf{D}) = q'$ and $1 \leqslant q' \leqslant q$.

## II — Estimation of parameters

In the model, the unknowns of interest are the values arranged in $\boldsymbol{\Psi}$. These can be estimated as:

$\boldsymbol{\hat{\Psi}} = (\mathbf{M}'\mathbf{M})^{+}(\mathbf{M}'\mathbf{Y})$

where the $^{+}$ represents a pseudo-inverse. The residuals can be computed as:

$\boldsymbol{\hat{\epsilon}} = \mathbf{Y} - \mathbf{M}\boldsymbol{\hat{\Psi}}$

The above also applies to the univariate case ($\mathbf{y}$ is a particular case of $\mathbf{Y}$, and $\boldsymbol{\psi}$ of $\boldsymbol{\Psi}$).
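These two steps can be sketched with simulated data (all sizes and values below are illustrative):

```python
import numpy as np

# Simulated data; sizes and values are made up for illustration.
rng = np.random.default_rng(0)
N, s, q = 20, 3, 2
M = np.column_stack([np.ones(N), rng.normal(size=(N, s - 1))])
Psi_true = rng.normal(size=(s, q))
Y = M @ Psi_true + 0.1 * rng.normal(size=(N, q))

# Psi_hat = (M'M)^+ (M'Y), with ^+ the pseudo-inverse; then the residuals
Psi_hat = np.linalg.pinv(M.T @ M) @ (M.T @ Y)
eps_hat = Y - M @ Psi_hat
```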

## Coefficient of determination, R2

The following is the fraction of the variance explained by the part of the model determined by the contrast. It applies to mean-centered data and design, i.e., $\tilde{\mathbf{y}}=\mathbf{y}-\bar{y}$ and $\tilde{\mathbf{M}}=\mathbf{M}-\bar{\mathbf{m}}$.

$R^2 = \dfrac{\boldsymbol{\hat{\psi}}'\mathbf{C} \left(\mathbf{C}'(\tilde{\mathbf{M}}'\tilde{\mathbf{M}})^{-1}\mathbf{C} \right)^{-1} \mathbf{C}'\boldsymbol{\hat{\psi}}}{\tilde{\mathbf{y}}'\tilde{\mathbf{y}}}$

Note that the portion of the variance explained by nuisance variables (if any) remains in the denominator. To have these taken into account, consider the squared partial correlation coefficient, or Pillai’s trace with univariate data, both described further down.

## Pearson’s correlation coefficient

When $\mathsf{rank}\left(\mathbf{C}\right) = 1$, the multiple correlation coefficient can be computed from the $R^2$ statistic as:

$R = \mathsf{sign}\left(\mathbf{C}'\boldsymbol{\hat{\psi}}\right)\sqrt{R^2}$

This value should not be confused, even in the presence of nuisance, with the partial correlation coefficient (see below).

## Student’s t statistic

If $\mathsf{rank}\left(\mathbf{C}\right) = 1$, the Student’s $t$ statistic can be computed as:

$t = \boldsymbol{\hat{\psi}}'\mathbf{C} \left(\mathbf{C}'(\mathbf{M}'\mathbf{M})^{-1}\mathbf{C} \right)^{-\frac{1}{2}} \left/ \sqrt{\dfrac{\boldsymbol{\hat{\epsilon}}'\boldsymbol{\hat{\epsilon}}}{N-\mathsf{rank}\left(\mathbf{M}\right)}} \right.$

## F statistic

The $F$ statistic can be computed as:

$F = \left.\dfrac{\boldsymbol{\hat{\psi}}'\mathbf{C} \left( \mathbf{C}'(\mathbf{M}'\mathbf{M})^{-1}\mathbf{C} \right)^{-1} \mathbf{C}'\boldsymbol{\hat{\psi}}}{\mathsf{rank}\left(\mathbf{C} \right)} \right/ \dfrac{\boldsymbol{\hat{\epsilon}}'\boldsymbol{\hat{\epsilon}}}{N-\mathsf{rank}\left(\mathbf{M}\right)}$
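Both statistics can be sketched numerically. The check below (Python/NumPy, simulated data) confirms that $F = t^2$ for a rank-1 contrast:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 25

# Intercept plus one regressor of interest; rank-1 contrast on the regressor.
M = np.column_stack([np.ones(N), rng.standard_normal(N)])
y = M @ np.array([1.0, 0.5]) + rng.standard_normal(N)
C = np.array([[0.0], [1.0]])

psi = np.linalg.pinv(M.T @ M) @ (M.T @ y)
eps = y - M @ psi
sigma2 = (eps @ eps) / (N - np.linalg.matrix_rank(M))

num = (C.T @ psi).item()
den = (C.T @ np.linalg.inv(M.T @ M) @ C).item()

t = num / np.sqrt(den * sigma2)
F = num**2 / (den * np.linalg.matrix_rank(C) * sigma2)

print(np.isclose(F, t**2))   # F = t^2 when rank(C) = 1
```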

## Aspin-Welch v

If homoscedastic variances cannot be assumed, and $\mathsf{rank}\left(\mathbf{C}\right) = 1$, this is equivalent to the Behrens-Fisher problem, and the Aspin-Welch $v$ statistic can be computed as:

$v = \boldsymbol{\hat{\psi}}'\mathbf{C} \left(\mathbf{C}'(\mathbf{M}'\mathbf{W}\mathbf{M})^{-1}\mathbf{C} \right)^{-\frac{1}{2}}$

where $\mathbf{W}$ is a diagonal matrix that has elements:

$W_{nn}=\dfrac{\sum_{n' \in g_{n}}R_{n'n'}}{\boldsymbol{\hat{\epsilon}}_{g_{n}}'\boldsymbol{\hat{\epsilon}}_{g_{n}}}$

and where $R_{n'n'}$ are the $n'$ diagonal elements of the residual forming matrix $\mathbf{R} = \mathbf{I}-\mathbf{M}\mathbf{M}^{+}$, and $g_{n}$ is the variance group to which the $n$-th observation belongs.

## Generalised G statistic

If variances cannot be assumed to be the same across all observations, a generalisation of the $F$ statistic can be computed as:

$G = \dfrac{\boldsymbol{\hat{\psi}}'\mathbf{C} \left(\mathbf{C}'(\mathbf{M}'\mathbf{W}\mathbf{M})^{-1}\mathbf{C} \right)^{-1} \mathbf{C}'\boldsymbol{\hat{\psi}}}{\Lambda \cdot \mathsf{rank}\left(\mathbf{C}\right)}$

where $\mathbf{W}$ is defined as above, and the remaining denominator term, $\Lambda$, is given by:

$\Lambda = 1+\frac{2(s-1)}{s(s+2)}\sum_{g} \frac{1}{\sum_{n \in g}R_{nn}} \left(1-\frac{\sum_{n \in g}W_{nn}}{\mathsf{trace}\left(\mathbf{W}\right)}\right)^2$

There is another post on the G-statistic (here).

## Partial correlation coefficient

When $\mathsf{rank}\left(\mathbf{C}\right) = 1$, the partial correlation can be computed from the Student’s $t$ statistic as:

$r = \mathsf{sign}\left(t\right)\sqrt{\dfrac{t^2}{N-\mathsf{rank}\left(\mathbf{M}\right)+t^2}}$

The square of the partial correlation corresponds to Pillai’s trace applied to a univariate model, and it can also be computed from the $F$-statistic as:

$r^2 = \dfrac{F}{\frac{N-\mathsf{rank}\left(\mathbf{M}\right)}{\mathsf{rank}\left(\mathbf{C}\right)}+F}$

Likewise, if $r$ is known, the formula can be solved for $t$:

$t = \dfrac{r\sqrt{N-\mathsf{rank}\left(\mathbf{M}\right)}}{\sqrt{1-r^2}}$

or for $F$:

$F = \dfrac{r^2}{1-r^2}\times\dfrac{N-\mathsf{rank}\left(\mathbf{M}\right)}{\mathsf{rank}\left(\mathbf{C}\right)}$
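These conversions are easy to verify numerically. The sketch below uses arbitrary toy values for $t$, $N$ and the ranks (an assumption for illustration only):

```python
import math

# Assumed toy values: a t statistic from a model with N = 20 observations,
# rank(M) = 2 and a rank-1 contrast.
t, N, rank_M, rank_C = 2.5, 20, 2, 1
df = N - rank_M

# Partial correlation from t.
r = math.copysign(math.sqrt(t**2 / (df + t**2)), t)

# Solving back for t recovers the original value.
t_back = r * math.sqrt(df) / math.sqrt(1 - r**2)
print(math.isclose(t, t_back))

# The F route gives the same r^2 (F = t^2 when rank(C) = 1).
F = t**2
r2 = F / (df / rank_C + F)
print(math.isclose(r2, r**2))
```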

The partial correlation can also be computed at once for all variables vs. all other variables as follows. Let $\mathbf{A} = \left[\mathbf{y}\; \mathbf{M}\right]$, and $\mathsf{r}\left(\mathbf{A}\right)$ be the inverse of the correlation matrix of the columns of $\mathbf{A}$, and $\mathsf{d}\left(\cdot\right)$ the diagonal operator, that returns a column vector with the diagonal entries of a square matrix. Then the matrix with the partial correlations is:

$\mathbf{r} = -\mathsf{r}\left(\mathbf{A}\right) \odot \left(\mathsf{d}\left(\mathsf{r}\left(\mathbf{A}\right)\right)\mathsf{d}\left(\mathsf{r}\left(\mathbf{A}\right)\right)'\right)^{-\frac{1}{2}}$

where $\odot$ is the Hadamard product, and the power “$-\frac{1}{2}$” is taken elementwise (i.e., not matrix power).
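A sketch of this matrix recipe in Python/NumPy (simulated data; the diagonal of $\mathbf{r}$ is left as the formula produces it, i.e., $-1$), cross-checked against the residual-based definition of the partial correlation:

```python
import numpy as np

rng = np.random.default_rng(3)

# Columns of A play the role of [y M]: three correlated variables.
A = rng.standard_normal((50, 3))
A[:, 2] = A[:, 0] + A[:, 1] + 0.5 * rng.standard_normal(50)

P = np.linalg.inv(np.corrcoef(A, rowvar=False))   # r(A) in the text
d = np.sqrt(np.diag(P))
r = -P / np.outer(d, d)                           # elementwise normalisation

# Cross-check r[0,1] by correlating the residuals of columns 0 and 1
# after regressing each on the remaining column (with an intercept).
def residual(v, z):
    Z = np.column_stack([np.ones(len(z)), z])
    return v - Z @ np.linalg.lstsq(Z, v, rcond=None)[0]

r01 = np.corrcoef(residual(A[:, 0], A[:, 2]),
                  residual(A[:, 1], A[:, 2]))[0, 1]
print(np.isclose(r[0, 1], r01))
```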

## Multivariate statistics

For the multivariate statistics, define generically $\mathbf{E} = (\boldsymbol{\hat{\epsilon}}\mathbf{D})'(\boldsymbol{\hat{\epsilon}}\mathbf{D})$ as the sums of the products of the residuals, and $\mathbf{H}=(\mathbf{C}'\boldsymbol{\hat{\Psi}}\mathbf{D})' \left(\mathbf{C}'(\mathbf{M}'\mathbf{M})^{-1}\mathbf{C} \right)^{-1} (\mathbf{C}'\boldsymbol{\hat{\Psi}}\mathbf{D})$ as the sums of products of the hypothesis. In fact, the original model can be modified as $\tilde{\mathbf{Y}} = \mathbf{M}\tilde{\boldsymbol{\Psi}} + \tilde{\boldsymbol{\epsilon}}$, where $\tilde{\mathbf{Y}}=\mathbf{Y}\mathbf{D}$, $\tilde{\boldsymbol{\Psi}} = \boldsymbol{\Psi}\mathbf{D}$ and $\tilde{\boldsymbol{\epsilon}}=\boldsymbol{\epsilon}\mathbf{D}$.

If $\mathsf{rank}\left(\mathbf{D}\right)=1$, this is a univariate model; otherwise it remains multivariate, although $\mathbf{D}$ can be omitted from the formulas. From now on this simplification is adopted, so that $\mathbf{E} = \boldsymbol{\hat{\epsilon}}'\boldsymbol{\hat{\epsilon}}$ and $\mathbf{H}=\boldsymbol{\hat{\Psi}}'\mathbf{C} \left(\mathbf{C}'(\mathbf{M}'\mathbf{M})^{-1}\mathbf{C} \right)^{-1} \mathbf{C}'\boldsymbol{\hat{\Psi}}$.

## Hotelling T2

If $\mathsf{rank}\left(\mathbf{C}\right) = 1$, the Hotelling’s $T^2$ statistic can be computed as:

$T^2 = \mathbf{C}'\boldsymbol{\hat{\Psi}}\boldsymbol{\Sigma}^{-\frac{1}{2}}\left(\mathbf{C}'(\mathbf{M}'\mathbf{M})^{-1}\mathbf{C} \right)^{-1}\boldsymbol{\Sigma}^{-\frac{1}{2}} \boldsymbol{\hat{\Psi}}'\mathbf{C}$

where $\boldsymbol{\Sigma} = \mathbf{E}/\left(N-\mathsf{rank}\left(\mathbf{M}\right)\right)$

## Multivariate alternatives to the F statistic

Classical manova/mancova statistics can be based in the ratio between the sums of products of the hypothesis and the sums of products of the residuals, or the ratio between the sums of products of the hypothesis and the total sums of products. In other words, define:

$\begin{array}{ccl} \mathbf{A} &=& \mathbf{H}\mathbf{E}^{-1} = \boldsymbol{\hat{\Psi}}'\mathbf{C} \left(\mathbf{C}'(\mathbf{M}'\mathbf{M})^{-1}\mathbf{C} \right)^{-1} \mathbf{C}'\boldsymbol{\hat{\Psi}}\mathbf{E}^{-1}\\ \mathbf{B} &=& \mathbf{H}\left(\mathbf{E}+\mathbf{H}\right)^{-1} \end{array}$

Let $\lambda_i$ be the eigenvalues of $\mathbf{A}$, and $\theta_i$ the eigenvalues of $\mathbf{B}$. Then:

• Wilks’ $\Lambda = \prod_{i} \dfrac{1}{1+\lambda_i} = \dfrac{|\mathbf{E}|}{|\mathbf{E}+\mathbf{H}|}$.
• Lawley–Hotelling’s trace: $\sum_i \lambda_i = \mathsf{trace}\left(\mathbf{A}\right)$.
• Pillai’s trace: $\sum_i \dfrac{\lambda_i}{1+\lambda_i} = \sum_i \theta_i = \mathsf{trace}\left(\mathbf{B}\right)$.
• Roy’s largest root (ii): $\lambda_1 = \mathsf{max}_i\left(\lambda_i\right) = \dfrac{\theta_1}{1-\theta_1}$ (analogous to $F$).
• Roy’s largest root (iii): $\theta_1 = \mathsf{max}_i\left(\theta_i\right) = \dfrac{\lambda_1}{1+\lambda_1}$ (analogous to $R^2$).

When $\mathsf{rank}\left(\mathbf{C}\right) = 1$, or when $\mathbf{Y}$ is univariate, or both, Lawley–Hotelling’s trace is equal to Roy’s (ii) largest root, Pillai’s trace is equal to Roy’s (iii) largest root, and Wilks’ $\Lambda$ added to Pillai’s trace equals unity. The value $\rho_i=\sqrt{\theta_i}$ is the $i$-th canonical correlation.
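The quantities above can be sketched numerically (Python/NumPy, simulated data); the check below verifies the determinant identity for Wilks’ $\Lambda$:

```python
import numpy as np

rng = np.random.default_rng(4)
N, P, T = 40, 3, 2

M = np.column_stack([np.ones(N), rng.standard_normal((N, P - 1))])
Y = rng.standard_normal((N, T))
C = np.eye(P)[:, 1:]               # test the two non-intercept regressors

Psi = np.linalg.pinv(M.T @ M) @ (M.T @ Y)
eps = Y - M @ Psi
E = eps.T @ eps
H = (C.T @ Psi).T @ np.linalg.inv(C.T @ np.linalg.inv(M.T @ M) @ C) @ (C.T @ Psi)

# Eigenvalues of A = H E^{-1} (real, since A is similar to a symmetric matrix).
lam = np.linalg.eigvals(H @ np.linalg.inv(E)).real

wilks  = np.prod(1.0 / (1.0 + lam))
lawley = np.sum(lam)               # Lawley-Hotelling's trace
pillai = np.sum(lam / (1.0 + lam)) # Pillai's trace
roy    = lam.max()                 # Roy's largest root (ii)

# Wilks' Lambda also equals |E| / |E + H|.
print(np.isclose(wilks, np.linalg.det(E) / np.linalg.det(E + H)))
```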

# Fast surface smoothing on a common grid

Smoothing scalar data on the surface of a high resolution sphere can be a slow process. If the filter is not truncated, the distances between all the vertices or barycentres of faces need to be calculated, a very time-consuming process. If the filter is truncated, the process can be faster, but with the resolutions typically used in brain imaging it can still be very slow, a problem that is amplified when data from many subjects are analysed.

However, if the data for each subject have already been interpolated to a common grid, such as an icosahedron recursively subdivided multiple times (details here), then the distances do not need to be calculated repeatedly for each of them. Doing so just once suffices. Furthermore, the implementation of the filter itself can be made in such a way that the smoothing process can be performed as a single matrix multiplication.

Consider the smoothing defined in Lombardi (2002), which we used in Winkler et al. (2012):

$\tilde{Q}_n = \dfrac{\sum_j Q_j K\left(g\left(\mathbf{x}_n,\mathbf{x}_j\right)\right)}{\sum_j K\left(g\left(\mathbf{x}_n,\mathbf{x}_j\right)\right)}$

where $\tilde{Q}_n$ is the smoothed quantity at the vertex or face $n$, $Q_j$ is the quantity assigned to the $j$-th vertex or face of a sphere containing $J$ vertices or faces, $g\left(\mathbf{x}_n,\mathbf{x}_j\right)$ is the geodesic distance between vertices or faces with coordinates $\mathbf{x}_n$ and $\mathbf{x}_j$, and $K(g)$ is the Gaussian filter, defined as a function of the geodesic distance between points.

The above formula requires that all distances between the current vertex or face $n$ and the other points $j$ are calculated, and that this is repeated for each $n$, a time-consuming process that needs to be repeated for every additional subject. If, however, the distances $g$ are known and organised into a distance matrix $\mathbf{G}$, the filter can take the form of a matrix of the same size, $\mathbf{K}$, with the values at each row scaled so as to add up to unity, and the same smoothing can proceed as a simple matrix multiplication:

$\mathbf{\tilde{Q}} = \mathbf{K}\mathbf{Q}$

If the grid is the same for all subjects, which is the typical case when comparisons across subjects are performed, $\mathbf{K}$ can be calculated just once, saved, and reused for each subject.
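The idea can be sketched with a toy example. Below, points on a circle stand in for the spherical mesh (an assumption for illustration only): the filter matrix $\mathbf{K}$ is built from the distance matrix with rows normalised to unity, and smoothing becomes a single matrix multiplication:

```python
import numpy as np

# Toy example: 5 points on a circle stand in for vertices on a sphere,
# with geodesic distances given by arc length.
J = 5
theta = 2 * np.pi * np.arange(J) / J
G = np.abs(theta[:, None] - theta[None, :])
G = np.minimum(G, 2 * np.pi - G)            # geodesic (arc) distance matrix

# Gaussian filter as a function of distance; FWHM converted to sigma.
fwhm = 1.0
sigma = fwhm / (2 * np.sqrt(2 * np.log(2)))
K = np.exp(-G**2 / (2 * sigma**2))
K /= K.sum(axis=1, keepdims=True)           # rows scaled to sum to unity

# Smoothing is now a single matrix multiplication.
Q = np.arange(J, dtype=float)
Q_smooth = K @ Q

# A constant field is left unchanged by a row-normalised filter.
print(np.allclose(K @ np.ones(J), 1.0))
```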

It should be noted, however, that although running faster, the method requires much more memory. For a filter of full-width at half-maximum (FWHM) $f$, truncated at a distance $t \cdot f$ from the filter center, in a sphere of radius $r$, the number of non-zero elements in $\mathbf{K}$ is approximately:

$\text{nnz} \approx \dfrac{J^2}{2} \left(1-\cos\left(t \cdot \dfrac{f}{r}\right)\right)$

whereas the total number of elements is $J^2$. Even using sparse matrices, this may require a large amount of memory. For practical purposes, a filter with width $f$ = 20 mm can be truncated at twice the width ($t$ = 2), for application in a sphere of 100 mm radius made by 7 subdivisions of an icosahedron, still fitting comfortably in a computer with 16 GB of RAM. Wider filters may require more memory to run.
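A quick numeric check of this estimate (a sketch; note that the fraction of non-zero elements, $\text{nnz}/J^2$, does not depend on $J$):

```python
import math

# Values from the text: FWHM f = 20 mm, truncation t = 2, radius r = 100 mm.
f, t, r = 20.0, 2.0, 100.0

# Fraction of non-zero elements: nnz / J^2 = (1 - cos(t*f/r)) / 2.
frac = (1.0 - math.cos(t * f / r)) / 2.0
print(round(frac, 4))   # about 4% of the J^2 elements are non-zero
```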

The script smoothdpx, part of the areal interpolation tools, available here, can be used to do both things, that is, smooth the data for any given subject and also save the filter so that it can be reused with other subjects. To apply a previously saved filter, rpncalc can be used. These commands require Octave or MATLAB, and if Octave is available, they can be executed directly from the command line.

## Figures

The figures above represent facewise data on the surface of a sphere of 100 mm radius, made by recursive subdivision of a regular icosahedron 4 times, constructed with the platonic command (details here), shown without smoothing, and smoothed with filters with FWHM = 7, 14, 21, 28 and 35 mm.

# The G-statistic

## Preliminaries

Consider the common analysis of a neuroimaging experiment. At each voxel, vertex, face or edge (or any other imaging unit), we have a linear model expressed as:

$\mathbf{Y} = \mathbf{M} \boldsymbol{\psi} + \boldsymbol{\epsilon}$

where $\mathbf{Y}$ contains the experimental data, $\mathbf{M}$ contains the regressors, $\boldsymbol{\psi}$ the regression coefficients, which are to be estimated, and $\boldsymbol{\epsilon}$ the residuals. Consider a linear null hypothesis $\mathcal{H}_0 : \mathbf{C}'\boldsymbol{\psi}=\mathbf{0}$, where $\mathbf{C}$ is a contrast. If $\mathsf{rank}\left(\mathbf{C}\right) = 1$, the Student’s t statistic can be calculated as:

$t = \boldsymbol{\hat{\psi}}'\mathbf{C} \left(\mathbf{C}'(\mathbf{M}'\mathbf{M})^{-1}\mathbf{C} \right)^{-\frac{1}{2}} \left/ \sqrt{\dfrac{\boldsymbol{\hat{\epsilon}}'\boldsymbol{\hat{\epsilon}}}{N-\mathsf{rank}\left(\mathbf{M}\right)}} \right.$

where the hat on $\boldsymbol{\hat{\psi}}$ and $\boldsymbol{\hat{\epsilon}}$ indicate that these are quantities estimated from the sample. If $\mathsf{rank}\left(\mathbf{C}\right) \geqslant 1$, the F statistic can be obtained as:

$F = \left.\dfrac{\boldsymbol{\hat{\psi}}'\mathbf{C} \left(\mathbf{C}'(\mathbf{M}'\mathbf{M})^{-1}\mathbf{C} \right)^{-1} \mathbf{C}'\boldsymbol{\hat{\psi}}}{\mathsf{rank}\left(\mathbf{C}\right)} \right/ \dfrac{\boldsymbol{\hat{\epsilon}}'\boldsymbol{\hat{\epsilon}}}{N-\mathsf{rank}\left(\mathbf{M}\right)}$

When $\mathsf{rank}\left(\mathbf{C}\right) = 1$, $F = t^2$. For either of these statistics, we can assess their significance by repeating the same fit after permuting $\mathbf{Y}$ or $\mathbf{M}$ (i.e., a permutation test), or by referring to the formula for the distribution of the corresponding statistic, which is available in most statistical software packages (i.e., a parametric test).

Permutation tests don’t depend on the same assumptions on which parametric tests are based. As some of these assumptions can be quite stringent in practice, permutation methods arguably should be preferred as a general framework for the statistical analysis of imaging data. At each permutation, a new statistic is computed and used to build its empirical distribution from which the p-values are obtained. In practice it’s not necessary to build the full distribution, and it suffices to increment a counter at each permutation. At the end, the counter is divided by the number of permutations to produce a p-value.
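The counter-based procedure can be sketched in plain Python (made-up data; a one-sided test on the difference of means, with the unpermuted statistic included in the count, a common convention):

```python
import random

# Two small made-up samples; one-sided test on the difference of means.
a = [4.1, 3.9, 4.5, 4.8, 4.3]
b = [3.2, 3.0, 3.6, 2.9, 3.4]

def stat(x, y):
    return sum(x) / len(x) - sum(y) / len(y)

observed = stat(a, b)
pooled = a + b
rng = random.Random(0)
n_perm = 1000

counter = 1                        # the unpermuted case counts as well
for _ in range(n_perm):
    rng.shuffle(pooled)
    if stat(pooled[:len(a)], pooled[len(a):]) >= observed:
        counter += 1

# Dividing the counter by the number of permutations gives the p-value.
p = counter / (n_perm + 1)
print(p < 0.05)
```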

An example of a permutation distribution

Using permutation tests, correction for multiple testing using the family-wise error rate (fwer) is trivial: rather than build the permutation distribution at each voxel, a single distribution of the global maximum of the statistic across the image is constructed. Each permutation yields one maximum, which is used to build the distribution. Any dependence between the tests is implicitly captured, with no need to model it explicitly, nor to introduce even more assumptions, a problem that hinders methods such as the random field theory.

## Exchangeability blocks

Permutation is allowed if it doesn’t affect the joint distribution of the error terms, i.e., if the errors are exchangeable. Some types of experiments may involve repeated measurements or other kinds of dependency, such that exchangeability cannot be guaranteed between all observations. However, various cases of structured dependency can still be accommodated if sets (blocks) of observations are shuffled as a whole, or if shuffling happens only within block. It is not necessary to know or to model the exact dependence structure between observations, which is captured implicitly as long as the blocks are defined correctly.

Permutation within block.

Permutation of blocks as a whole.

The two figures above are of designs constructed using the fsl software package. In fsl, within-block permutation is available in randomise with the option -e, used to supply a file with the definition of blocks. For whole-block permutation, in addition to the option -e, the option --permuteBlocks needs to be supplied.

## The G-statistic

The presence of exchangeability blocks solves a problem, but creates another. Having blocks implies that observations may not be pooled together to produce a non-linear parameter estimate such as the variance. In other words: the mere presence of exchangeability blocks, either for shuffling within or as a whole, implies that the variances may not be the same across all observations, and a single estimate of this variance is likely to be inaccurate whenever the variances truly differ, or if the groups don’t have the same size. This also means that the F or t statistics may not behave as expected.

The solution is to use the block definitions and the permutation strategy to define groups of observations that are known or assumed to have identical variances, and to pool only the observations within each group for variance estimation, i.e., to define variance groups (vgs).

The F-statistic, however, does not accommodate multiple variance groups, and we need to resort to another statistic. In Winkler et al. (2014) we propose:

$G = \dfrac{\boldsymbol{\hat{\psi}}'\mathbf{C} \left(\mathbf{C}'(\mathbf{M}'\mathbf{W}\mathbf{M})^{-1}\mathbf{C} \right)^{-1} \mathbf{C}'\boldsymbol{\hat{\psi}}}{\Lambda \cdot s}$

where $\mathbf{W}$ is a diagonal matrix that has elements:

$W_{nn}=\dfrac{\sum_{n' \in g_{n}}R_{n'n'}}{\boldsymbol{\hat{\epsilon}}_{g_{n}}'\boldsymbol{\hat{\epsilon}}_{g_{n}}}$

and where $R_{n'n'}$ are the $n'$ diagonal elements of the residual forming matrix, and $g_{n}$ is the variance group to which the $n$-th observation belongs. The remaining denominator term, $\Lambda$, is given by (Welch, 1951):

$\Lambda = 1+\frac{2(s-1)}{s(s+2)}\sum_{g} \frac{1}{\sum_{n \in g}R_{nn}} \left(1-\frac{\sum_{n \in g}W_{nn}}{\mathsf{trace}\left(\mathbf{W}\right)}\right)^2$

where $s=\mathsf{rank}\left(\mathbf{C}\right)$. The matrix $\mathbf{W}$ can be seen as a weighting matrix, the square root of which normalises the model such that the errors then have unit variance and can be ignored. It can also be seen as being itself a variance estimator. In fact, it is the very same variance estimator proposed by Horn et al. (1975).
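The construction of $\mathbf{W}$, $\Lambda$ and G can be sketched numerically (a minimal Python/NumPy sketch with simulated data and two variance groups, not the implementation used in the paper):

```python
import numpy as np

rng = np.random.default_rng(5)
N = 30

# Two variance groups with different error variances, and two regressors
# of interest (a group indicator and a covariate), so rank(C) = s = 2.
vg = np.repeat([0, 1], N // 2)
x = rng.standard_normal(N)
M = np.column_stack([np.ones(N), vg, x])
y = 1.0 + 0.8 * vg + 0.3 * x \
    + rng.standard_normal(N) * np.where(vg == 0, 1.0, 3.0)
C = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

psi = np.linalg.pinv(M.T @ M) @ (M.T @ y)
eps = y - M @ psi
R = np.eye(N) - M @ np.linalg.pinv(M)     # residual forming matrix

# W: within each variance group, sum of R's diagonal elements divided
# by the sum of squared residuals in that group.
W = np.zeros(N)
for g in (0, 1):
    idx = vg == g
    W[idx] = R[idx, idx].sum() / (eps[idx] @ eps[idx])
Wm = np.diag(W)

s = np.linalg.matrix_rank(C)
cpsi = C.T @ psi
num = cpsi @ np.linalg.inv(C.T @ np.linalg.inv(M.T @ Wm @ M) @ C) @ cpsi

# Welch's correction term Lambda.
Lam = 1.0
for g in (0, 1):
    idx = vg == g
    Lam += 2 * (s - 1) / (s * (s + 2)) / R[idx, idx].sum() \
           * (1 - W[idx].sum() / W.sum()) ** 2

G = num / (Lam * s)
print(bool(G > 0 and Lam > 1))
```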

The W matrix used with the G statistic. It is constructed from the estimated variances of the error terms.

The matrix $\mathbf{W}$ has a crucial role in making the statistic pivotal in the presence of heteroscedasticity. Pivotality means that the statistic has a sampling distribution that is not dependent on any unknown parameter. For imaging experiments, it’s important that the statistic has this property, otherwise correction for multiple testing that controls fwer will be inaccurate, or possibly invalid altogether.

When $\mathsf{rank}\left(\mathbf{C}\right)=1$, the t-equivalent to the G-statistic is $v = \boldsymbol{\hat{\psi}}'\mathbf{C} \left(\mathbf{C}'(\mathbf{M}'\mathbf{W}\mathbf{M})^{-1}\mathbf{C} \right)^{-\frac{1}{2}}$, which is the well known Aspin-Welch $v$ statistic for the Behrens-Fisher problem. The relationship between $v$ and G is the same as between t and F, i.e., when the rank of the contrast equals one, the latter is simply the square of the former. The G statistic is a generalisation of all these, and more, as we show in the paper, and summarise in the table below:

| | $\mathsf{rank}\left(\mathbf{C}\right) = 1$ | $\mathsf{rank}\left(\mathbf{C}\right) > 1$ |
|---|---|---|
| Homoscedastic errors, unrestricted exchangeability | Square of Student’s $t$ | $F$-ratio |
| Homoscedastic within vg, restricted exchangeability | Square of Aspin-Welch $v$ | Welch’s $v^2$ |

In the absence of variance groups (i.e., all observations belong to the same vg), G and $v$ are equivalent to F and t respectively.

Although not typically necessary if permutation methods are to be preferred, approximate parametric p-values for the G-statistic can be computed from an F-distribution with $\nu_1=s$ and $\nu_2=\frac{2(s-1)}{3(\Lambda-1)}$.

While the error rates are controlled adequately (a feature of permutation tests in general), the G-statistic offers excellent power when compared to the F-statistic, even when the assumptions of the latter are perfectly met. Moreover, by preserving pivotality, it is an adequate statistic for controlling the error rate in the presence of multiple tests.

In this post, the focus is on using G for imaging data, but of course, it can be used for any dataset analysed with a linear model in which the variances cannot be assumed to be the same, i.e., when heteroscedasticity is or could be present.

Note that the G-statistic has nothing to do with the G-test. It is named as this for being a generalisation over various tests, including the commonly used t and F tests, as shown above.

## Main reference

The core reference and results for the G-statistic have just been published in NeuroImage:

## Other references

The two other references cited, which are useful for understanding the variance estimator and the parametric approximation, are:

# Splitting the cortical surface into independent regions

FreeSurfer offers excellent visualisation capabilities with tksurfer and FreeView. However, there are endless other possibilities using various computer graphics software. In previous posts, it was shown here in the blog how to generate cortical and subcortical surfaces that could be imported into these applications, how to generate models with vertexwise and facewise colours, and even a description of common file formats. It was also previously shown how to arbitrarily change the colours of regions for use with FreeSurfer’s own tools. However, a method to render cortical regions with different colours in software such as Blender was missing. This is what this post is about.

The idea is simple: splitting the cortical surface into one mesh per parcellation allows each to be imported as an independent object, and so, it becomes straightforward to apply a different colour for each one. To split, the first step is to convert the FreeSurfer annotation file to a data-per-vertex file (*.dpv). This can be done with the command annot2dpv.

./annot2dpv lh.aparc.annot lh.aparc.annot.dpv


Before running, be sure that ${FREESURFER_HOME}/matlab is in the Octave/MATLAB path. With the data-per-vertex file ready, do the splitting of the surface with splitsrf:

./splitsrf lh.white lh.aparc.annot.dpv lh.white_roi

This will create several files named as lh.white_roi*. Each corresponds to one piece of the cortex, in *.srf format. To convert to a format that can be read directly into computer graphics software, see the instructions here. The annot2dpv and splitsrf commands are now included in the package for areal analysis, available here.

With the meshes imported, let your imagination and creativity fly. Once produced, labels can be added to the renderings using software such as Inkscape, to produce images as the one above, of the Desikan-Killiany atlas, which illustrates the paper Cortical Thickness or Gray Matter Volume: The Importance of Selecting the Phenotype for Imaging Genetics Studies.

Another method is also possible, without the need to split the cortex, but instead painting the voxels. This can be done with the command replacedpx, also available from the package above. In this case each region index is replaced by its corresponding statistical value (or any other value), then maps are produced with dpx2map, shown in an earlier blog post, here. This other method, however, requires that the label indices are known for each region, which in FreeSurfer depend on the RGB colours assigned to them. Moreover, the resulting maps don’t have borders as sharp and beautiful as when the surface is split into independent pieces.

# Displaying vertexwise and facewise brain maps

In a previous post, a method to display FreeSurfer cortical regions in arbitrary colours was presented. Suppose that, instead, you would like to display the results from vertexwise or facewise analyses. For vertexwise data, these can be shown using tksurfer or Freeview. The same does not apply, however, to facewise data, which, at the time of this writing, cannot be displayed in any neuroimaging software. In this article a tool to generate files with facewise or vertexwise data is provided, along with some simple examples.

## The dpx2map tool

The tool to generate the maps is dpx2map (right-click to download, then make it executable). Call it without arguments to get usage information. This tool uses Octave as the backend, and it assumes that it is installed in its usual location (/usr/bin/octave). It is also possible to run it from inside Octave or MATLAB using a slight variant, dpx2map.m (in which case, type help dpx2map for usage). In either case, the commands srfread, dpxread and mtlwrite must be available. These are part of the areal package discussed here. And yes, dpx2map is now included in the latest release of the package too.

To use dpx2map, you need to specify a surface object that will provide the geometry on which the data colours will be overlaid, and the data itself. The surface should be in FreeSurfer format (*.asc or *.srf), and the data should be in FreeSurfer “curvature” format (*.asc, *.dpv) for vertexwise, or in facewise format (*.dpf). A description of these formats is available here.

It is possible to specify the data range that will be used when computing the scaling to make the colours, as well as which range will actually be shown. It is also possible to split the scale so that a central part of it is omitted or shown in a colour outside the colourscale. This is useful to show thresholded positive and negative maps.

The output is saved either in Stanford Polygon format (*.ply) for vertexwise data, or in Wavefront Object format (*.obj + *.mtl) for facewise data, and can be imported directly in many computer graphics software. All input and output files must be in their respective ascii versions, not binary. The command also outputs an image with the colourbar, in Portable Network Graphics format (*.png).

## An example object

With a simple geometric shape as this, it is much easier to demonstrate how to generate the maps than with an object as complicated as the brain; the strategy for colouring remains the same. For the next examples, an ellipsoid was created using the platonic command. The command line used was:

platonic ellipsoid.obj ico sph 7 '[.25 0 0 0; 0 3 0 0; 0 0 .25 0; 0 0 0 1]'

This ellipsoid has maximum y-coordinate equal to 3, and maximum x- and z-coordinates equal to 0.25. This file was converted from Wavefront *.obj to FreeSurfer ascii format, and scalar fields simply describing the coordinates (x,y,z) were created with:

obj2srf ellipsoid.obj > ellipsoid.srf
srf2area ellipsoid.srf ellipsoid.dpv dpv
gawk '{print $1,$2,$3,$4,$2}' ellipsoid.dpv > ellipsoid-x.dpv
gawk '{print $1,$2,$3,$4,$3}' ellipsoid.dpv > ellipsoid-y.dpv
gawk '{print $1,$2,$3,$4,$4}' ellipsoid.dpv > ellipsoid-z.dpv


It is the ellipsoid-y.dpv that is used for the next examples.

## Vertexwise examples

The examples below use the same surface (*.srf) and the same curvature, data-per-vertex file (*.dpv). The only differences are the way the map is generated and presented, using different colour maps and different scaling. The jet colour map is the same as available in Octave and Matlab. The coolhot5 is a custom colour map that will be made available, along with a few others, in another article to be posted soon.

### Example A

In this example, defaults are used. The input files are specified, along with a prefix (exA) to be used to name the output files.

dpx2map ellipsoid-y.dpv ellipsoid.srf exA


### Example B

In this example, the data between values -1.5 and 1.5 is coloured, and the remaining values receive the colours of the extreme points (dark blue and dark red).

dpx2map ellipsoid-y.dpv ellipsoid.srf exB jet '[-1.5 1.5]'


### Example C

In this example, the data between -2 and 2 is used to define the colours, with the values below/above receiving the extreme colours. However, the range between -1 and 1 is not shown or used for the colour scaling. This is because the dual option is set as true, as well as the coption.

dpx2map ellipsoid-y.dpv ellipsoid.srf exC coolhot5 '[-2 2]' '[-1 1]' true '[.75 .75 .75]' true


### Example D

This example is similar to the one above, except that the values between -1 and 1, despite not being shown, are used for the scaling of the colours. This is due to the coption being set as false.

dpx2map ellipsoid-y.dpv ellipsoid.srf exD coolhot5 '[-2 2]' '[-1 1]' true '[.75 .75 .75]' false


### Example E

Here the data between -2 and 2 is used for scaling, but only the points between -1 and 1 are shown. This is because the option dual was set as false. The values below -1 or above 1 receive the same colours as these numbers, because the coption was configured as true. Note that because all points will receive some colour, it is not necessary to define the colourgap.

dpx2map ellipsoid-y.dpv ellipsoid.srf exE coolhot5 '[-2 2]' '[-1 1]' false '[]' true


### Example F

This is similar to the previous example, except that the values between -1 and 1 receive a colour off of the colour map. This is because both dual and coption were set as false.

dpx2map ellipsoid-y.dpv ellipsoid.srf exF coolhot5 '[-2 2]' '[-1 1]' false '[.75 .75 .75]' false


## Facewise data

The process to display facewise data is virtually identical. The only two differences are that (1) instead of supplying a *.dpv file, a *.dpf file is given to the script as input, and (2) the output isn’t a *.ply file, but instead a pair of files *.obj + *.mtl. Note that very few software can handle thousands of colours per object in the case of facewise data. Blender is recommended over most commercial products especially for this reason (and of course, it is free, as in freedom).

The dpx2map is available here, and it is also included in the areal package, described here, where all its dependencies are satisfied. You must have Octave (free) or Matlab available to use this tool.

## How to cite

If you use dpx2map for your scientific research, please, remember to mention the brainder.org website in your paper.

## Update: Display in PDF documents

3D models as these, with vertexwise colours, can be shown in interactive PDF documents. Details here.

# Merging multiple surfaces

Say you have a number of meshes in FreeSurfer ascii format (with extension *.asc or *.srf), one brain structure per file. However, for later processing or to import in some computer graphics software, you would like to have these multiple meshes all in a single file. This post provides a small script to accomplish this: mergesrf.

To use it, right click and save the file above, make it executable and, ideally, put it in a place where it can be found (or add its location to the environment variable ${PATH}). Then run something as:

mergesrf file1.srf file2.srf fileN.srf mergedfile.srf

In this example, the output file is saved as mergedfile.srf. Another example is to convert all subcortical structures into just one large object, after aseg2srf as described here. To convert all, just change the current directory to ${SUBJECTS_DIR}//ascii, then run:

mergesrf * aseg_all.srf


A list with the input files and the output at the end is shown below:

The script uses Octave, which can be downloaded freely. The same script, with a small modification, can also run from inside MATLAB. This other version can be downloaded here: mergesrf.m

## Requirements

In addition to Octave (or matlab), the script also requires functions to read and write surface files, which are available from the areal package (described here and downloadable here).