Three HCP utilities

If you are working with data from the Human Connectome Project (HCP), perhaps these three small Octave/MATLAB utilities may be of some use:

  • hcp2blocks.m: Takes the restricted file with information about kinship and zygosity and produces a multi-level exchangeability blocks file that can be used with PALM for permutation inference. It is fully described here.
  • hcp2solar.m: Takes restricted and unrestricted files to produce a pedigree file that can be used with SOLAR for heritability and genome-wide association analyses.
  • picktraits.m: Takes either restricted or unrestricted files, a list of traits and a list of subject IDs to produce tables with selected traits for the selected subjects. These can be used to, e.g., produce design matrices for subsequent analysis.

These functions need to parse relatively large CSV files, which is somewhat inefficient in MATLAB and Octave. Still, since these commands usually have to be executed only once for a particular analysis, a 1-2 minute wait seems acceptable.

If downloaded directly from the above links, remember also to download the prerequisites: strcsvread.m and strcsvwrite.m. Alternatively, clone the full repository from GitHub. The link is this. Other tools may be added in the future.

A fourth utility

For the HCP-S1200 release (March/2017), zygosity information is provided in the fields ZygositySR (self-reported zygosity) and ZygosityGT (zygosity determined by genetic methods for select subjects). If needed, these two fields can be merged into a new field named simply Zygosity. To do so, use a fourth utility, the command mergezyg.

Downsampling (decimating) a brain surface

Downsampled average cortical surfaces at different iterations (n), with the respective number of vertices (V), edges (E) and faces (F).

In the previous post, a method to display brain surfaces interactively in PDF documents was presented. While the method is already much more efficient than it was when it first appeared some years ago, the display of highly resolved meshes can be computationally intensive, and may make even the most enthusiastic readers give up even opening the file.

If the data being shown has low spatial frequency, an alternative that largely preserves the information content is to decimate the mesh, downsampling it to a lower resolution. Although in theory this can be done in the native (subject-level) geometry through retessellation (i.e., interpolation of coordinates), the interest in downsampling is usually at the group level, in which case the subjects have all been interpolated to a common grid, generally a geodesic sphere produced by recursively subdividing an icosahedron (see this earlier post). If, at each iteration, the vertices and faces are added in a certain order (such as in FreeSurfer’s fsaverage or in the one generated with the platonic command), downsampling is relatively straightforward, whatever the type of data.

Vertexwise data

For vertexwise data, downsampling can be based on the fact that vertices are added (appended) in a certain order as the icosahedron is constructed:

  • Vertices 1-12 correspond to n = 0, i.e., no subdivision, or ico0.
  • Vertices 13-42 correspond to the vertices that, once added to the ico0, make it ico1 (first iteration of subdivision, n = 1).
  • Vertices 43-162 correspond to the vertices that, once added to ico1, make it ico2 (second iteration, n = 2).
  • Vertices 163-642, likewise, make ico3.
  • Vertices 643-2562 make ico4.
  • Vertices 2563-10242 make ico5.
  • Vertices 10243-40962 make ico6, etc.

Thus, if the data is vertexwise (also known as curvature, such as cortical thickness or curvature indices proper), the above information is sufficient to downsample the data: to reduce down to an ico3, for instance, all one needs to do is pick vertices 1 through 642, ignoring 643 onwards.
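
The vertex ranges listed above follow from the fact that an icosahedron subdivided n times has V = 10·4^n + 2 vertices (and, likewise, E = 30·4^n edges and F = 20·4^n faces). The sketch below, in Python, is only an illustration and not the author's icodown.m; the function names are hypothetical, and it assumes the data are stored as a plain list in the vertex order described above.

```python
def ico_counts(n):
    """Vertices, edges and faces of an icosahedron subdivided n times."""
    k = 4 ** n
    return 10 * k + 2, 30 * k, 20 * k

def downsample_vertexwise(data, n_target):
    """Keep only the values of the vertices that exist at level n_target.

    Assumes vertices were appended level by level, as described above.
    """
    n_vertices, _, _ = ico_counts(n_target)
    if len(data) < n_vertices:
        raise ValueError("data has fewer vertices than the target level")
    return data[:n_vertices]

# Example: made-up thickness values on an ico4 (2562 vertices), reduced to ico3.
thickness_ico4 = [2.5] * ico_counts(4)[0]
thickness_ico3 = downsample_vertexwise(thickness_ico4, 3)
print(len(thickness_ico3))  # 642
```

No interpolation is involved: the values at the retained vertices are left untouched.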

Facewise data

Data stored at each face (triangle) generally correspond to areal quantities, which require mass conservation. For both fsaverage and platonic icosahedrons, the faces are added in a particular order such that, at each iteration of the subdivision, a given face index is replaced in situ by four other faces. To downsample, one can therefore simply collapse (via sum or average) the data of every four faces into a new one.
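
A minimal Python sketch of this collapsing step is shown below. It is not the author's implementation: the function name is hypothetical, and it assumes that the four faces that replaced a given parent face are stored at four consecutive positions, which should be verified for the mesh in use. Summing preserves the total of an areal quantity; averaging may suit quantities that are not areal.

```python
def downsample_facewise(data, levels=1, average=False):
    """Collapse every four consecutive faces into one, `levels` times.

    ASSUMPTION: the four children of parent face k are stored at
    positions 4k .. 4k+3; check this against your own mesh.
    """
    for _ in range(levels):
        if len(data) % 4 != 0:
            raise ValueError("number of faces must be a multiple of 4")
        groups = [data[i:i + 4] for i in range(0, len(data), 4)]
        data = [sum(g) / 4 if average else sum(g) for g in groups]
    return data

# Example: 80 faces (ico1), each with area 1, collapsed to the 20 faces of ico0.
area_ico1 = [1.0] * 80
area_ico0 = downsample_facewise(area_ico1)
print(len(area_ico0), sum(area_ico0))  # 20 80.0
```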

Surface geometry

If the objective is to decimate the surface geometry, i.e., the mesh itself, as opposed to quantities assigned to vertices or faces, one can use similar steps:

  1. Select the vertices from the first up to the last vertex of the icosahedron in the level needed.
  2. Iteratively downsample the face indices: first select the faces formed by three vertices that were appended in the current iteration, then those that have two vertices appended in the current iteration, and finally connect the remaining three vertices to form a new, larger face.

Applications

Downsampled data are useful not only for displaying meshes in PDF documents: some analyses may not require a resolution as high as that of the default mesh (ico7), particularly for processes that vary smoothly across the cortex, such as cortical thickness. Using a lower resolution mesh can be just as informative, while operating at a fraction of the computational cost.

A script

A script that does the tasks above using Matlab/Octave is here: icodown.m. It is also available as part of the areal package described here, which also satisfies all its dependencies. Input and output formats are described here.

Interactive 3D brains in PDF documents

A screenshot from Acrobat Reader. The example file is here.

Would it not be helpful to be able to navigate through three-dimensional, surface-based representations of the brain when reading a paper, without having to download separate datasets or use external software? Since 2004, with the release of version 1.6 of the Portable Document Format (PDF), this has been possible. However, the means to generate the file were not easily available until about 2008, when Intel released a set of libraries and tools. This still did not do much for popularity, as in-document rendering of complex 3D models requires a lot of memory and processing, which made its use difficult in practice at the time. The fact that Acrobat Reader was quite bloated did not help either.

Now, almost eight years later, things have become easier for users who want to open these documents. Newer versions of Acrobat are much lighter, and the capabilities of ordinary computers have increased. Yet, it seems that interest in this kind of visualisation has faded. The objective of this post is to show that it is remarkably simple to have interactive 3D objects in PDF documents, which can be used in any document published online, including theses, presentations, and papers: journals such as PNAS and Journal of Neuroscience are at the forefront in accepting interactive manuscripts.

Requirements

  • U3D Tools: Make sure you have the IDTFConverter utility, from the U3D tools, available on SourceForge as part of the MathGL library. A direct link to version 1.4.4 is here; an alternative link, of a repackaged version of the same, is here. Compiling instructions for Linux and Mac are in the “readme” file. There are some dependencies that must be satisfied, and are described in the documentation. If you decide not to install the U3D tools, but only compile them, make sure the path of the executable is both in the $PATH and in the $LD_LIBRARY_PATH. This can be done with:
cd /path/to/the/directory/of/IDTFConverter
export PATH=${PATH}:$(pwd)
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:$(pwd)
  • The ply2idtf function: Make sure you have the latest version of the areal package, which contains the MATLAB/Octave function ply2idtf.m used below.
  • Certain LaTeX packages: The packages movie15 or media9, that allow embedding the 3D object into the PDF using LaTeX. Either will work. Below it is assumed the older, movie15 package, is used.

Step 1: Generate the PLY maps

Once you have a map of vertexwise cortical data that needs to be shown, follow the instructions from this earlier blog post that explains how to generate Stanford PLY files to display colour-coded vertexwise data. These PLY files will be used below.

Step 2: Convert the PLY to IDTF files

IDTF stands for Intermediate Data Text Format. As the name implies, it is an intermediate text file, used as a step before the creation of the U3D files, which are the ones embedded into the PDF. Use the function ply2idtf for this:

ply2idtf(...
   {'lh.pial.thickness.avg.ply','LEFT', eye(4);...
    'rh.pial.thickness.avg.ply','RIGHT',eye(4)},...
    'thickness.idtf');

The first argument is a cell array with 3 columns, and as many rows as PLY files being added to the IDTF file. The first column contains the file name, the second the label (or node) for that file, and the third an affine matrix that maps the coordinates from the PLY file to the world coordinate system of the (to be created) U3D. The second (last) argument to the command is the name of the output file.

Step 3: Convert the IDTF to U3D files

From a terminal window (not MATLAB or Octave), run:

IDTFConverter -input thickness.idtf -output thickness.u3d

Step 4: Configure default views

Here we use the older movie15 LaTeX package, and the same can be accomplished with the newer, media9 package. Various viewing options are configurable, all of which are described in the documentation. These options can be saved in a text file with extension .vws, and later supplied in the LaTeX document. An example is below.

VIEW=Both Hemispheres
  COO=0 -14 0,
  C2C=-0.75 0.20 0.65
  ROO=325
  AAC=30
  ROLL=-0.03
  BGCOLOR=.5 .5 .5
  RENDERMODE=Solid
  LIGHTS=CAD
  PART=LEFT
    VISIBLE=true
  END
  PART=RIGHT
    VISIBLE=true
  END
END
VIEW=Left Hemisphere
  COO=0 -14 0,
  C2C=-1 0 0
  ROO=325
  AAC=30
  ROLL=-0.03
  BGCOLOR=.5 .5 .5
  RENDERMODE=Solid
  LIGHTS=CAD
  PART=LEFT
    VISIBLE=true
  END
  PART=RIGHT
    VISIBLE=false
  END
END
VIEW=Right Hemisphere
  COO=0 -14 0,
  C2C=1 0 0
  ROO=325
  AAC=30
  ROLL=0.03
  BGCOLOR=.5 .5 .5
  RENDERMODE=Solid
  LIGHTS=CAD
  PART=LEFT
    VISIBLE=false
  END
  PART=RIGHT
    VISIBLE=true
  END
END

Step 5: Add the U3D to the LaTeX source

Interactive, 3D viewing is unfortunately not supported by most PDF readers. However, it is supported by the official Adobe Acrobat Reader since version 7.0, including the recent version DC. Thus, it is important to let the users/readers of the document know that they must open the file using a recent version of Acrobat. This can be done in the document itself, using a message placed with the option text of the \includemovie command of the movie15 package. A minimalistic LaTeX source is shown below (it can be downloaded here).

\documentclass{article}

% Relevant package:
\usepackage[3D]{movie15}

% pdfLaTeX and color links setup:
\usepackage{color}
\usepackage[pdftex]{hyperref}
\definecolor{colorlink}{rgb}{0, 0, .6}  % dark blue
\hypersetup{colorlinks=true,citecolor=colorlink,filecolor=colorlink,linkcolor=colorlink,urlcolor=colorlink}

\begin{document}
\title{Interactive 3D brains in PDF documents}
\author{}
\date{}
\maketitle

\begin{figure}[!h]
\begin{center}
\includemovie[
text=\fbox{\parbox[c][9cm][c]{9cm}{\centering {\footnotesize (Use \href{http://get.adobe.com/reader/}{Adobe Acrobat Reader 7.0+} \\to view the interactive content.)}}},
poster,label=Average,3Dviews2=pial.vws]{10cm}{10cm}{thickness.u3d}
\caption{An average 3D brain, showing colour-coded average thickness (for simplicity, colour scale not shown). Click to rotate. Right-click for a menu with various options. Details at \href{http://brainder.org}{http://brainder.org}.}
\end{center}
\end{figure}

\end{document}

Step 6: Generate the PDF

For LaTeX, use pdfLaTeX as usual:

pdflatex document.tex

What you get

After generating the PDF, the result of this example is shown here (a screenshot is at the top). It is possible to rotate in any direction, zoom, pan, change views to predefined modes, and alternate between orthogonal and perspective projections. It’s also possible to change rendering modes (including transparency), and experiment with various lighting options.

In Acrobat Reader, by right-clicking, a menu with various useful options is presented. A toolbar (as shown in the top image) can also be enabled through the menu.

The same strategy works also with the Beamer class, such that interactive slides can be created and used in talks, and with XeTeX, allowing these with a richer variety of text fonts.

See also

  • Wikipedia has an article on U3D files.
  • Alexandre Gramfort has developed a set of tools that covers much the same ground as above. It’s freely available in Matlab FileExchange.
  • To display molecules interactively (including proteins), the steps are similar. Instructions for Jmol and Pymol are available.
  • Commercial products offering features that build on these resources are also available.

Fast surface smoothing on a common grid

Smoothing scalar data on the surface of a high resolution sphere can be a slow process. If the filter is not truncated, the distances between all the vertices or barycentres of faces need to be calculated, in a very time consuming process. If the filter is truncated, the process can be faster, but with resolutions typically used in brain imaging, it can still be very slow, a problem that is amplified if data from many subjects are analysed.

However, if the data for each subject have already been interpolated to a common grid, such as an icosahedron recursively subdivided multiple times (details here), then the distances do not need to be calculated repeatedly for each of them. Doing so just once suffices. Furthermore, the implementation of the filter itself can be made in such a way that the smoothing process can be performed as a single matrix multiplication.

Consider the smoothing defined in Lombardi (2002), which we used in Winkler et al. (2012):

\tilde{Q}_n = \dfrac{\sum_j Q_j K\left(g\left(\mathbf{x}_n,\mathbf{x}_j\right)\right)}{\sum_j K\left(g\left(\mathbf{x}_n,\mathbf{x}_j\right)\right)}

where \tilde{Q}_n is the smoothed quantity at the vertex or face n, Q_j is the quantity assigned to the j-th vertex or face of a sphere containing J vertices or faces, g\left(\mathbf{x}_n,\mathbf{x}_j\right) is the geodesic distance between vertices or faces with coordinates \mathbf{x}_n and \mathbf{x}_j, and K(g) is the Gaussian filter, defined as a function of the geodesic distance between points.

The above formula requires that all distances between the current vertex or face n and the other points j are calculated, and that this is repeated for each n, in a time consuming process that needs to be repeated over again for every additional subject. If, however, the distances g are known and organised into a distance matrix \mathbf{G}, the filter can take the form of a matrix of the same size, \mathbf{K}, with the values at each row scaled so as to add up to unity, and the same smoothing can proceed as a simple matrix multiplication:

\mathbf{\tilde{Q}} = \mathbf{K}\mathbf{Q}

If the grid is the same for all subjects, which is the typical case when comparisons across subjects are performed, \mathbf{K} can be calculated just once, saved, and reused for each subject.
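
The idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code of smoothdpx: the function names are hypothetical, the Gaussian is parameterised through the usual relation sigma = FWHM / sqrt(8 ln 2), the filter is not truncated, and a toy grid with six points stands in for a subdivided icosahedron.

```python
import math

def geodesic(a, b, radius):
    """Great-circle distance between two points on a sphere of given radius."""
    cosang = sum(x * y for x, y in zip(a, b)) / radius ** 2
    return radius * math.acos(max(-1.0, min(1.0, cosang)))

def smoothing_matrix(points, radius, fwhm):
    """Row-normalised Gaussian filter K built from the pairwise distances."""
    sigma = fwhm / math.sqrt(8 * math.log(2))
    K = []
    for p in points:
        row = [math.exp(-geodesic(p, q, radius) ** 2 / (2 * sigma ** 2))
               for q in points]
        total = sum(row)
        K.append([w / total for w in row])  # each row adds up to unity
    return K

def smooth(K, Q):
    """Apply the filter as a matrix-vector product: Q_tilde = K Q."""
    return [sum(k * q for k, q in zip(row, Q)) for row in K]

# Toy grid: the six vertices of an octahedron on a sphere of radius 100 mm.
r = 100.0
points = [(r, 0, 0), (-r, 0, 0), (0, r, 0), (0, -r, 0), (0, 0, r), (0, 0, -r)]
K = smoothing_matrix(points, r, fwhm=150.0)  # computed once...
subject1 = smooth(K, [1, 0, 0, 0, 0, 0])     # ...and reused for every subject
subject2 = smooth(K, [3, 3, 3, 3, 3, 3])
print(subject1)
print(subject2)
```

Because every row of K adds up to one, constant data (as in subject2) are left unchanged by the smoothing, up to rounding.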

It should be noted, however, that although running faster, the method requires much more memory. For a filter of full-width at half-maximum (FWHM) f, truncated at a distance t \cdot f from the filter center, in a sphere of radius r, the number of non-zero elements in \mathbf{K} is approximately:

\text{nnz} \approx \dfrac{J^2}{2} \left(1-\cos\left(t \cdot \dfrac{f}{r}\right)\right)

whereas the total number of elements is J^2. Even using sparse matrices, this may require a large amount of memory. For practical purposes, a filter with width f = 20 mm can be truncated at twice the width (t = 2), for application in a sphere of 100 mm radius made by 7 subdivisions of an icosahedron, and still run comfortably on a computer with 16GB of RAM. Wider filters may require more memory to run.
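
The approximation above amounts to multiplying J^2 by the fraction of the sphere covered by a spherical cap of angular radius t·f/r, which is (1 − cos(t·f/r))/2. A short Python sketch for estimating it before building the filter is below; the function name is hypothetical, and the estimate is only meaningful for t·f/r up to π.

```python
import math

def approx_nnz(J, fwhm, t, radius):
    """Approximate non-zero count of K for a filter truncated at t * fwhm."""
    return J ** 2 / 2 * (1 - math.cos(t * fwhm / radius))

# An icosahedron subdivided 7 times has 10 * 4**7 + 2 = 163842 vertices.
J = 10 * 4 ** 7 + 2
nnz = approx_nnz(J, fwhm=20, t=2, radius=100)
print(f"non-zero fraction: {nnz / J ** 2:.4f}")
print(f"non-zero elements: {nnz:.3e}")
```

The actual memory footprint depends on how the sparse matrix is stored, so the count is best treated as a lower bound on what will be needed.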

The script smoothdpx, part of the areal interpolation tools, available here, can be used to do both things, that is, smooth the data for any given subject, and also save the filter so that it can be reused with other subjects. To apply a previously saved filter, the command rpncalc can be used. These commands require Octave or MATLAB, and if Octave is available, they can be executed directly from the command line.

Figures

The figures above represent facewise data on the surface of a sphere of 100 mm radius, made by recursive subdivision of a regular icosahedron 4 times, constructed with the platonic command (details here), shown without smoothing, and smoothed with filters with FWHM = 7, 14, 21, 28 and 35 mm.

References

Splitting the cortical surface into independent regions

FreeSurfer offers excellent visualisation capabilities with tksurfer and FreeView. However, there are endless other possibilities using various computer graphics software. Previous posts in this blog showed how to generate cortical and subcortical surfaces that could be imported into these applications, how to generate models with vertexwise and facewise colours, and even gave a description of common file formats. It was also previously shown how to arbitrarily change the colours of regions for use with FreeSurfer’s own tools. However, a method to allow rendering cortical regions with different colours in software such as Blender was missing. This is what this post is about.

The idea is simple: splitting the cortical surface into one mesh per region of the parcellation allows each to be imported as an independent object, and so it becomes straightforward to apply a different colour to each one. To split, the first step is to convert the FreeSurfer annotation file to a data-per-vertex file (*.dpv). This can be done with the command annot2dpv.

./annot2dpv lh.aparc.annot lh.aparc.annot.dpv

Before running, be sure that ${FREESURFER_HOME}/matlab is in the Octave/MATLAB path. With the data-per-vertex file ready, do the splitting of the surface with splitsrf:

./splitsrf lh.white lh.aparc.annot.dpv lh.white_roi

This will create several files named as lh.white_roi*. Each corresponds to one piece of the cortex, in *.srf format. To convert to a format that can be read directly into computer graphics software, see the instructions here.

The annot2dpv and splitsrf are now included in the package for areal analysis, available here.

With the meshes imported, let your imagination and creativity fly. Once produced, labels can be added to the renderings using software such as Inkscape, to produce images as the one above, of the Desikan-Killiany atlas, which illustrates the paper Cortical Thickness or Gray Matter Volume: The Importance of Selecting the Phenotype for Imaging Genetics Studies.

Another method is also possible, without the need to split the cortex, but instead painting the vertices. This can be done with the command replacedpx, also available from the package above. In this case each region index is replaced by its corresponding statistical value (or any other value), then maps are produced with dpx2map, shown in an earlier blog post, here. This other method, however, requires that the label indices are known for each region, which in FreeSurfer depend on the RGB colours assigned to them. Moreover, the resulting maps don’t have borders as sharp and beautiful as when the surface is split into independent pieces.

Displaying vertexwise and facewise brain maps

In a previous post, a method to display FreeSurfer cortical regions in arbitrary colours was presented. Suppose that, instead, you would like to display the results from vertexwise or facewise analyses. For vertexwise, these can be shown using tksurfer or Freeview. The same does not apply, however, to facewise data, which, at the time of this writing, is not available in any neuroimaging software. In this article a tool to generate files with facewise or vertexwise data is provided, along with some simple examples.

The dpx2map tool

The tool to generate the maps is dpx2map (right-click to download, then make it executable). Call it without arguments to get usage information. This tool uses Octave as the backend, and it assumes that it is installed in its usual location (/usr/bin/octave). It is also possible to run it from inside Octave or Matlab using a slight variant, dpx2map.m (in which case, type help dpx2map for usage).

In either case, the commands srfread, dpxread and mtlwrite must be available. These are part of the areal package discussed here. And yes, dpx2map is now included in the latest release of the package too.

To use dpx2map, you need to specify a surface object that will provide the geometry on which the data colours will be overlaid, and the data itself. The surface should be in FreeSurfer format (*.asc or *.srf), and the data should be in FreeSurfer “curvature” format (*.asc, *.dpv) for vertexwise, or in facewise format (*.dpf). A description of these formats is available here.

It is possible to specify the data range that will be used when computing the scaling to make the colours, as well as which range will actually be shown. It is also possible to split the scale so that a central part of it is omitted or shown in a colour outside the colourscale. This is useful to show thresholded positive and negative maps.

The output is saved either in Stanford Polygon (*.ply) for vertexwise, or in Wavefront Object (*.obj + *.mtl) for facewise data, and can be imported directly into many computer graphics applications. All input and output files must be/are in their respective ascii versions, not binary. The command also outputs an image with the colourbar, in Portable Network Graphics format (*.png).

An example object

It is much simpler to demonstrate how to generate the maps with a simple geometric shape than with an object as complicated as the brain. The strategy for colouring remains the same. For the next examples, an ellipsoid was created using the platonic command. The command line used was:

platonic ellipsoid.obj ico sph 7 '[.25 0 0 0; 0 3 0 0; 0 0 .25 0; 0 0 0 1]'

This ellipsoid has a maximum y-coordinate equal to 3, and maximum x- and z-coordinates equal to 0.25. This file was converted from Wavefront *.obj to FreeSurfer ascii, and scalar fields simply describing the coordinates (x,y,z) were created with:

obj2srf ellipsoid.obj > ellipsoid.srf
srf2area ellipsoid.srf ellipsoid.dpv dpv
gawk '{print $1,$2,$3,$4,$2}' ellipsoid.dpv > ellipsoid-x.dpv
gawk '{print $1,$2,$3,$4,$3}' ellipsoid.dpv > ellipsoid-y.dpv
gawk '{print $1,$2,$3,$4,$4}' ellipsoid.dpv > ellipsoid-z.dpv

It is the ellipsoid-y.dpv that is used for the next examples.
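
The effect of the affine matrix passed to platonic in the command above can be checked with a small Python sketch. It is only an illustration: the function name is hypothetical, and it assumes that the sphere has unit radius before the matrix is applied and that points are treated as column vectors in homogeneous coordinates.

```python
def apply_affine(A, xyz):
    """Apply a 4x4 affine matrix (nested lists) to a point (x, y, z)."""
    x, y, z = xyz
    v = (x, y, z, 1.0)  # homogeneous coordinates
    return tuple(sum(A[i][j] * v[j] for j in range(4)) for i in range(3))

# The matrix given to platonic in the command above.
A = [[0.25, 0, 0, 0],
     [0, 3, 0, 0],
     [0, 0, 0.25, 0],
     [0, 0, 0, 1]]

# Extreme points of a unit sphere along each axis.
print(apply_affine(A, (1, 0, 0)))  # (0.25, 0.0, 0.0)
print(apply_affine(A, (0, 1, 0)))  # (0.0, 3.0, 0.0)
print(apply_affine(A, (0, 0, 1)))  # (0.0, 0.0, 0.25)
```

The y-coordinate therefore ranges from -3 to 3, which is why the examples that follow use ranges such as -2 to 2 when colouring ellipsoid-y.dpv.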

Vertexwise examples

The examples below use the same surface (*.srf) and the same curvature, data-per-vertex file (*.dpv). The only differences are in the way the map is generated and presented, using different colour maps and different scaling. The jet colour map is the same available in Octave and Matlab. The coolhot5 is a custom colour map that will be made available, along with a few others, in another article to be posted soon.

Example A

In this example, defaults are used. The input files are specified, along with a prefix (exA) to be used to name the output files.

dpx2map ellipsoid-y.dpv ellipsoid.srf exA

Example B

In this example, the data between values -1.5 and 1.5 is coloured, and the remaining receive the colours of the extreme points (dark blue and dark red).

dpx2map ellipsoid-y.dpv ellipsoid.srf exB jet '[-1.5 1.5]'

Example C

In this example, the data between -2 and 2 is used to define the colours, with the values below/above receiving the extreme colours. However, the range between -1 and 1 is not shown or used for the colour scaling. This is because the dual option is set as true as well as the coption.

dpx2map ellipsoid-y.dpv ellipsoid.srf exC coolhot5 '[-2 2]' '[-1 1]' true '[.75 .75 .75]' true

Example D

This example is similar to the one above, except that the values between -1 and 1, despite not being shown, are used for the scaling of the colours. This is due to the coption being set as false.

dpx2map ellipsoid-y.dpv ellipsoid.srf exD coolhot5 '[-2 2]' '[-1 1]' true '[.75 .75 .75]' false

Example E

Here the data between -2 and 2 is used for scaling, but only the points between -1 and 1 are shown. This is because the option dual was set as false. The values below -1 or above 1 receive the same colours as these numbers, because the coption was configured as true. Note that because all points will receive some colour, it is not necessary to define the colourgap.

dpx2map ellipsoid-y.dpv ellipsoid.srf exE coolhot5 '[-2 2]' '[-1 1]' false '[]' true

Example F

This is similar to the previous example, except that the values between -1 and 1 receive a colour that is not part of the colour map. This is because both dual and coption were set as false.

dpx2map ellipsoid-y.dpv ellipsoid.srf exF coolhot5 '[-2 2]' '[-1 1]' false '[.75 .75 .75]' false

Facewise data

The process to display facewise data is virtually identical. The only two differences are that (1) instead of supplying a *.dpv file, a *.dpf file is given to the script as input, and (2) the output isn’t a *.ply file, but instead a pair of files *.obj + *.mtl. Note that very few software packages can handle thousands of colours per object in the case of facewise data. Blender is recommended over most commercial products especially for this reason (and of course, it is free, as in freedom).

Download

The dpx2map is available here, and it is also included in the areal package, described here, where all its dependencies are satisfied. You must have Octave (free) or Matlab available to use this tool.

How to cite

If you use dpx2map for your scientific research, please, remember to mention the brainder.org website in your paper.

Update: Display in PDF documents

3D models such as these, with vertexwise colours, can be shown in interactive PDF documents. Details here.

Merging multiple surfaces

Say you have a number of meshes in FreeSurfer ascii format (with extension *.asc or *.srf), one brain structure per file. However, for later processing or to import in some computer graphics software, you would like to have these multiple meshes all in a single file. This post provides a small script to accomplish this: mergesrf.

To use it, right click and save the file above, make it executable and, ideally, put it in a place where it can be found (or add its location to the environment variable ${PATH}). Then run something like:

mergesrf file1.srf file2.srf fileN.srf mergedfile.srf

In this example, the output file is saved as mergedfile.srf. Another example is to convert all subcortical structures into just one large object, after aseg2srf as described here. To convert all, just change the current directory to ${SUBJECTS_DIR}/<subject_name>/ascii, then run:

mergesrf * aseg_all.srf

A list with the input files and the output at the end is shown below:

The script uses Octave, which can be downloaded freely. The same script, with a small modification, can also run from inside matlab. This other version can be downloaded here: mergesrf.m

Requirements

In addition to Octave (or matlab), the script also requires functions to read and write surface files, which are available from the areal package (described here and downloadable here).

Competition ranking and empirical distributions

Estimating the empirical cumulative distribution function (empirical cdf) from a set of observed values requires computing, for each value, how many of the other values are smaller or equal to it. In other words, let X = \{x_1, x_2, \ldots , x_N\} be the observed values, then:

F(x) = P(X \leqslant x) = \frac{1}{N} \sum_{n=1}^{N}I(x_n \leqslant x)

where F(x) is the empirical cdf, P(X \leqslant x) is the probability of observing a value in X that is smaller than or equal to x, I(\cdot) is a function that evaluates as 1 if the condition in the parenthesis is satisfied or 0 otherwise, and N is the number of observations. This definition is straightforward and can be found in many statistical textbooks (see, e.g., Papoulis, 1991).

For continuous distributions, a p-value for a given x \in X can be computed as 1-F(x). For a discrete cdf, however, this trivial relationship does not hold. The reason is that, while for continuous distributions the probability P(X = x) = 0, for discrete distributions (such as empirical distributions), P(X = x) > 0 whenever x \in X. As we often want to assign a probability to one of the values that have been observed, the condition x \in X always holds, and the simple relationship between the cdf and the p-value is lost. The solution, however, is still simple: provided that there are no repeated values, a p-value can be obtained from the cdf as 1-F(x) + 1/N.

In the absence of repeated values (ties), the cdf can be obtained computationally by sorting the observed data in ascending order, i.e., X_{s} = \{x_{(1)}, x_{(2)}, \ldots , x_{(N)}\}. Then F(x)=(n_x)/N, where (n_x) represents the ascending rank of x. Likewise, the p-value can be obtained by sorting the data in descending order, and using a similar formula, P(X \geqslant x) = (\tilde{n}_x)/N, where (\tilde{n}_x) represents the descending rank of x.

Example 1: Consider the following sequence of 10 observed values, already sorted in ascending order:

X=\{75, 76, 79, 80, 84, 85, 86, 88, 90, 94\}

The corresponding ranks are simply 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. Consequently, the cdf is:

F(x|x\in X) = \{0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1\}

The p-values are computed by using the ranks in descending order, i.e., 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, which yields:

P(X \geqslant x | x \in X) = \{1, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1\}
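
The numbers of Example 1 can be reproduced directly from the definition of the empirical cdf, by counting rather than ranking. The Python sketch below is an illustration only, with hypothetical function names; it is not the csort.m function offered further down.

```python
def ecdf(values, x):
    """F(x): fraction of observed values smaller than or equal to x."""
    return sum(v <= x for v in values) / len(values)

def pvalue(values, x):
    """P(X >= x): fraction of observed values greater than or equal to x."""
    return sum(v >= x for v in values) / len(values)

X = [75, 76, 79, 80, 84, 85, 86, 88, 90, 94]
print([ecdf(X, x) for x in X])
# [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
print([pvalue(X, x) for x in X])
# [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
```

Counting in this way needs a full pass over the data for every value queried, which is why sorting and ranking is the preferred route in practice.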

The problem with ties

Simple sorting, as above, is effective when there are no repeated values in the data. However, in the presence of ties, simple sorting that produces ordinal rankings yields incorrect results.

Example 2: Consider the following sequence of 10 observed values, in which ties are present:

X = \{81, 81, 82, 83, 83, 83, 84, 85, 85, 85\}

If the same computational strategy discussed above were used, then we would conclude that the p-value for x=85 would be either 0.1, 0.2 or 0.3, depending on which instance of the “85” we were to choose. The correct value, however, is 0.3 (only), since 3 out of 10 values are equal to or above 85. The solution to this problem is to replace the simple, ordinal ranking with a modified version of the so-called competition ranking.

The competition ranking

The competition ranking assigns the same rank to values that are identical. The standard competition ranking for the previous sequence is, in ascending order, 1, 1, 3, 4, 4, 4, 7, 8, 8, 8. In descending order, the ranks are 9, 9, 8, 5, 5, 5, 4, 1, 1, 1. Note that in this ranking, the ties receive the “best” possible rank. A slightly different approach consists in assigning the “worst” possible rank to the ties. This modified competition ranking applied to the previous sequence gives, in ascending order, the ranks 2, 2, 3, 6, 6, 6, 7, 10, 10, 10, whereas in descending order, the ranks are 10, 10, 8, 7, 7, 7, 4, 3, 3, 3.

Competition ranking solves the problem with ties for both the empirical cdf and for the calculation of the p-values. In both cases, it is the modified version of the ranking that needs to be used, i.e., the ranking that assigns the worst rank to identical values. Still using the same sequence as example, the cdf is given by:

F(x|x\in X) = \{0.2, 0.2, 0.3, 0.6, 0.6, 0.6, 0.7, 1, 1, 1\}

and the p-values are given by:

P(X \geqslant x | x \in X) = \{1, 1, 0.8, 0.7, 0.7, 0.7, 0.4, 0.3, 0.3, 0.3\}

Note that, as in the case with no ties, the p-values cannot be computed simply as 1-F(x) from the cdf as it would for continuous distributions. The correct formula is 1-F(x)+(n_x)/N, with (n_x) here representing the frequency of x.
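
A compact Python sketch of both variants of the competition ranking is given below, applied to the sequence of Example 2. It is an illustration only and not the csort.m function: the function name and arguments are hypothetical, and the implementation favours clarity over speed (it compares every pair of values).

```python
def competition_rank(values, modified=False, descending=False):
    """Competition ranks, in the original order of `values`.

    Standard: ties receive the best (lowest) rank of their group.
    Modified: ties receive the worst (highest) rank of their group.
    """
    ranks = []
    for x in values:
        if descending:
            strictly_ahead = sum(v > x for v in values)
        else:
            strictly_ahead = sum(v < x for v in values)
        tied = sum(v == x for v in values)
        ranks.append(strictly_ahead + (tied if modified else 1))
    return ranks

X = [81, 81, 82, 83, 83, 83, 84, 85, 85, 85]
N = len(X)
asc = competition_rank(X, modified=True)
desc = competition_rank(X, modified=True, descending=True)
print([r / N for r in asc])   # cdf
# [0.2, 0.2, 0.3, 0.6, 0.6, 0.6, 0.7, 1.0, 1.0, 1.0]
print([r / N for r in desc])  # p-values
# [1.0, 1.0, 0.8, 0.7, 0.7, 0.7, 0.4, 0.3, 0.3, 0.3]
```

With modified=False the same function returns the standard competition ranks, in which ties receive the best rank.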

Uses

The cdf of a set of observed data is useful in many types of non-parametric inference, particularly those that use bootstrap, jack-knife, Monte Carlo, or permutation tests. Ties can be quite common when working with discrete data and discrete explanatory variables using any of these methods.

Competition ranking in MATLAB and Octave

A function that computes the competition ranks for Octave and/or MATLAB is available to download here: csort.m.

Once installed, type help csort to obtain usage information.

References

Papoulis A. Probability, Random Variables and Stochastic Processes. McGraw-Hill, New York, 3rd ed, 1991.

Automatic atlas queries in FSL

The fmrib Software Library (fsl) provides a tool to query whether regions belong to any of the various atlases available. This well-known tool is called atlasquery, and it requires one region per image per run. To run it for multiple separate regions on a single image (e.g., a thresholded statistical map), a separate call to fsl's cluster command (not to be confused with the homonymous cluster command that is part of the GraphViz package) is needed.

In order to automate this task, a small script called autoaq is available (UPDATE: the script is no longer supplied here; it has been incorporated into fsl). Usage information is provided by calling the command without arguments. This is the same script we posted recently to the fsl mailing list (here). Obviously, it does not run in Microsoft Windows. To use it, you need any recent Linux or Mac computer with fsl installed.

An example call is shown below:

./autoaq -i pvals.nii.gz -t 0.95 -o report.txt \
         -a "JHU White-Matter Tractography Atlas"

The command will write temporary files to the directory from which it is called, hence it needs to be called from a directory to which the user has write permission. The atlas name can be any of the atlases available in fsl, currently those listed below (note the quotes, " ", that need to be provided when calling autoaq):

  • "Cerebellar Atlas in MNI152 space after normalization with FLIRT"
  • "Cerebellar Atlas in MNI152 space after normalization with FNIRT"
  • "Harvard-Oxford Cortical Structural Atlas"
  • "Harvard-Oxford Subcortical Structural Atlas"
  • "JHU ICBM-DTI-81 White-Matter Labels"
  • "JHU White-Matter Tractography Atlas"
  • "Juelich Histological Atlas"
  • "MNI Structural Atlas"
  • "Oxford Thalamic Connectivity Probability Atlas"
  • "Oxford-Imanova Striatal Connectivity Atlas 3 sub-regions"
  • "Oxford-Imanova Striatal Connectivity Atlas 7 sub-regions"
  • "Oxford-Imanova Striatal Structural Atlas"
  • "Talairach Daemon Labels"

The list can always be obtained through atlasquery --dumpatlases. Information about these atlases is available here.
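If reports for more than one atlas are needed, the calls can be generated programmatically. The Python sketch below is only an illustration: the helper autoaq_commands, the output file naming and the assumption that autoaq is available in the PATH are not part of the script itself. Only the options -i, -t, -o and -a shown in the example call are used, and the commands are printed rather than executed:

```python
import shlex


def autoaq_commands(image, threshold, atlases, prefix="report",
                    executable="autoaq"):
    """Build one autoaq command line per atlas (hypothetical helper)."""
    commands = []
    for k, atlas in enumerate(atlases, start=1):
        argv = [executable,
                "-i", image,
                "-t", str(threshold),
                "-o", f"{prefix}_{k}.txt",  # one report per atlas (assumed naming)
                "-a", atlas]
        # shlex.join quotes the atlas name, which contains spaces
        commands.append(shlex.join(argv))
    return commands


atlases = ["JHU White-Matter Tractography Atlas",
           "Harvard-Oxford Cortical Structural Atlas"]
commands = autoaq_commands("pvals.nii.gz", 0.95, atlases)
for cmd in commands:
    print(cmd)
```

The printed lines can be pasted into a shell or saved as a script. Quoting is delegated to shlex.join so that atlas names with spaces are passed as a single argument.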

The output is divided into three sections. In the first, a table containing the cluster indices, sizes and coordinates of the peaks and centres of mass is provided. In the second, the structures to which the cluster peaks belong are presented, along with the associated probabilities. In the third, probabilities for each cluster as a whole are presented. If the atlas is a binary label atlas, the number shown is in fact the overlap percentage between the cluster and the respective atlas label. If the atlas is probabilistic, the value is the mean probability in the overlapping region.

Version history:

  • 03.Oct.2012: Update – Fixed issue with the md5 command under a different name in vanilla Mac.
  • 25.Jan.2014: Update – Added options -u (to update/append a previous report) and -p to show peak coordinates instead of center of mass.
  • 10.Dec.2014: The autoaq script has been integrated into the freely available fmrib Software Library (fsl) and is no longer provided here. Hope you enjoy using it directly in fsl! :-)

Facewise brain cortical surface area: scripts now available

https://brainder.org/download/areal

In the paper Measuring and comparing brain cortical surface area and other areal quantities (Neuroimage, 2012;61(4):1428-43, doi:10.1016/j.neuroimage.2012.03.026, pmid:22446492), we described the steps to perform pycnophylactic interpolation, i.e., mass-conservative interpolation, of brain surface area. The method is equally valid for any other quantity that is areal by nature.

The scripts to perform the steps we propose in the paper are now freely available for Octave and/or MATLAB. They can also be executed directly from the shell using Octave as the interpreter. In this case, multiple instances can be easily submitted to run in a cluster. Click here to download and for step-by-step instructions.