How many principal components?

Posted on 05.April.2021 by A. M. Winkler

Better than scree, even in beautiful settings.

Very often we find ourselves handling large matrices and, for multiple different reasons, may want or need to reduce the dimensionality of the data by retaining the most relevant principal components. A question that arises then is how many such principal components should be retained? That is, what are the principal components that provide information that can be distinguished from variability that is indistinguishable from random noise?

One simple heuristic method is the scree plot (Cattell, 1966): one computes the singular values of the matrix, plots them in descending order, and visually looks for some kind of “elbow” (or the starting point of the “scree”) to be used as a threshold. Those singular vectors (principal components) that have corresponding singular values larger than that threshold are retained, otherwise discarded. Unfortunately, the scree test is too subjective, and worse, many realistic problems can produce matrices that have multiple such “elbows”.

Many other methods to help select the number of principal components to retain exist. Some, for example, are based on information theory, or on probabilistic formulations, or on hard thresholds related the size of the matrix or the (explained) variance from the data. In this article, the little known Wachter method (Wachter, 1976) is presented.

Let $\mathbf{X}$ be a $n \times p$ matrix of random values with zero mean and variance $\sigma^2$ . The density of the bulk spectrum of the singular values $\lambda_k$ , $k \in \left\{1,\ldots,K\right\}$ , $K = \mathrm{min}(n,p)$ , of $\mathbf{X}'\mathbf{X}/n$ is:

$p(\lambda;y,\sigma^2) = \begin{cases}\sqrt{(b-\lambda)(\lambda-a)}/(2\pi \lambda y \sigma^2), & \text{if } a \leqslant \lambda \leqslant b\\0, & \text{otherwise}\end{cases}$

and a point mass $1-1/y$ at the origin if $y > 1$ , where $a = \sigma^2(1-\sqrt{y})^2$ , $b = \sigma^2(1+\sqrt{y})^2$ , and $n/p \rightarrow y \in (0,\infty)$ (Bai, 1999). This is the Marčenko and Pastur (1967) distribution. The cumulative distribution function (cdf) can be obtained by integration, i.e., $P(\lambda;y,\sigma^2) = \int_a^\lambda p(x;y,\sigma^2)\mathrm{d}x$ . MATLAB/Octave functions that compute the cdf and its inverse are here: mpcdf.m and mpinv.m.

If we know the probability of finding a singular value $\lambda$ or larger for a random matrix, we have the means to judge whether that $\lambda$ is sufficiently large to be considered interesting enough for its corresponding singular vector (principal component) to be retained. Simply computing the p-value, however, is not enough, because the distribution makes no reference to the position $k$ of the sorted singular values. A different procedure needs be considered.

The core idea of the Wachter method is to use a QQ plot of the observed singular values versus the quantiles obtained from the inverse cdf of the Marčenko–Pastur distribution, and use eventual deviations from the identity line to help finding the threshold that separates the “good” from the “unnecessary” principal components. That is, plot the eigenvalues $\lambda_{K+1-k}$ versus the quantiles $P^{-1}((k-\frac{1}{2})/K;y,\sigma^2)$ . Deviations from the identity line are evidence for excess variance not expected from random variables alone.

The function wachter.m computes the singular values from a given observed matrix $\mathbf{X}$ , the expected singular values should $\mathbf{X}$ be a random matrix, as well as the p-values for each singular value in either case. The observed and expected singular values can be used to built a QQ-plot. The p-values can be used for a comparison, e.g., via the logarithm of their ratio. For example:

% For reproducibility, reset the random number generator.
% Use "rand" instead of "rng" to ensure compatibility with old versions.
rand('seed',0);

% Simulate data. See Gavish and Donoho (2014) for this example.
X = diag([1.7 2.5 zeros(1,98)]) + randn(100)*sqrt(.01); 

% Compute the expected and observed singular values,
% as well as the respective cumulative probabilities (p-values).
% See the help of wachter.m for syntax.
[Exp,Obs,Pexp,Pobs] = wachter(X,[],false,true);

% Log of the ratio between observed and expected p-values.
% Large values are evidence for "good" components.
P_ratio = -log10(Pobs./Pexp);

% Plot the spectrum.
subplot(1,2,1);
plot(Obs,'x:');
title('Singular values');
xlabel('index (k)');

% Construct the QQ-plot.
subplot(1,2,2);
qqplot(Exp,Obs);
title('QQ plot');

The result is:

In this example, two of the singular values can be considered “good”, and should be retained. The others can, according to this criterion, be dropped.

The function can take into account nuisance variables (provided in the 2nd argument), including intercept (for mean-centering), allows normalization of columns to unit variance, and can operate on p-values (upper tail from the density function) or on the cumulative probabilities. See the help text inside the function for usage information.

References

Bai ZD. Methodologies in spectral analysis of large dimensional random matrices, a review. Statistica Sinica. 1999;9(3):611–77.
Cattell RB. The scree test for the number of factors. Multivariate Behavioral Research. 1966;1(2):245–276.
Gavish M, Donoho DL. The optimal hard threshold for singular values is 4/sqrt(3). arXiv:13055870. 2014.
Johnstone IM. On the distribution of the largest eigenvalue in principal components analysis. The Annals of Statistics. 2001;29(2):295–327.
Marčenko VA, Pastur LA. Distribution of eigenvalues for some sets of random matrices. Math USSR Sb. 1967;1(4):457–83.
Wachter KW. Probability plotting points for principal components. In: Proceedings of the Ninth Interface Symposium on Computer Science and Statistics. Harvard University and Massachussetts Institute of Technology: Prindle, Weber & Schmidt; 1976. p. 299–308.

The image at the top is of the Drei Zinnen, in the Italian Alps during the Summer, in which a steep slope scattered with small stones (scree) is visible. Photo by Heinz Melion from Pixabay.

A higher Octave for PALM

Posted on 08.October.2019 by A. M. Winkler

PALM — Pemutation Analysis of Linear Models — uses either MATLAB or Octave behind the scenes. It can be executed from within either of these environments, or from the shell, in which case either of these is invoked, depending on how PALM was configured.

For users who do not want or cannot spend thousands of dollars in MATLAB licenses, Octave comes for free, and offers quite much the same benefits. However, for Octave, some functionalities in PALM require version 4.4.1 or higher. However, stable Linux distributions such as Red Hat Enterprise Linux and related (such as CentOS and Scientific Linux) still offer only 3.8.2 in the official repositories at the time of this writing, leaving the user with the task of finding an unofficial package or compiling from the source. The latter task, however, can be daunting: Octave is notoriously difficult to compile, with a myriad of dependencies.

A much simpler approach is to use Flatpak or Snappy. These are systems for distribution of Linux applications. Snappy is sponsored by Canonical (that maintains Ubuntu), whereas Flatpak appears to be the preferred tool for Fedora and openSUSE. Using either system is quite simple, and here the focus is on Flatpak.

To have a working installation of Octave, all that needs be done is:

1) Make sure Flatpak is installed:

On a RHEL/CentOS system, use (as root):

yum install flatpak

For openSUSE, use (as root):

zypper install flatpak

For Ubuntu and other Debian-based systems:

sudo apt install flatpak

2) Add the Flathub repository:

flatpak remote-add --if-not-exists flathub https://flathub.org/repo/flathub.flatpakrepo

3) Install Octave:

flatpak install flathub org.octave.Octave

4) Run it!

flatpak run org.octave.Octave

Only the installation of Flatpak needs be done as root. Once it has been installed, repositories and applications (such as Octave, among many others) can be installed at the user level. These can also be installed and made available system-wide (if run as root).

Configuring PALM

From version alpha117 onwards, the executable file ‘palm’ (not to be confused with ‘palm.m’) will include a variable named “OCTAVEBIN”, which specifies how Octave should be called. Change it from the default:

OCTAVEBIN=/usr/bin/octave

so that it invokes the version installed with Flatpak:

OCTAVEBIN="/usr/bin/flatpak run org.octave.Octave"

After making the above edits, it should be possible to run PALM directly from the system shell using the version installed via Flatpak. Alternatively, it should also be possible to invoke Octave (as in step 4 above), then use the command “addpath” to specify the location of palm.m, and then call PALM from the Octave prompt.

Octave packages

Handling of packages when Octave is installed via Flatpak is the same as usual, that is, via the command ‘pkg’ run from within Octave. More details here.

Installing NiDB

Posted on 19.February.2018 by A. M. Winkler

NiDB is a light, powerful, and simple to use neuroimaging database. One of its main strengths is that it was developed using a stack of stable and proven technologies: Linux, Apache, MySQL/MariaDB, PHP, and Perl. None of these technologies are new, and the fact that they have been around for so many years means that there is a lot of documentation and literature available, as well as a myriad of libraries (for PHP and Perl) that can do virtually anything. Although both PHP and Perl have received some degree of criticism (not unreasonably), and in some cases are being replaced by tools such as Node.js and Python, the volume of information about them means it is easy to find solutions when problems appear.

This article covers installation steps for either CentOS or RHEL 7, but similar steps should work with other distributions since the overall strategy is the same. By separating apart each of the steps, as opposed to doing all the configuration and installation as a single script, it becomes easier to adapt to different systems, and to identify and correct problems that may arise due to local particularities. The steps below are derived from the scripts setup-centos7.sh and setup-ubuntu16.sh, that are available in the NiDB repository, but here these will be ignored. Note that the instructions below are not “official”; for the latter, consult the NiDB documentation. The intent of this article is to facilitate the process and mitigate some frustration you may feel if trying to do it all by yourself. Also, by looking at the installation steps, you should be able to have a broad overview of the pieces that constitute database.

1) Begin with a fresh install.

If installing CentOS from the minimal DVD, choose a “Minimal Install” and leave to add the desktop in the next step.

2) Update the system.

This is a good time to install the most recent updates and patches, and reboot if the updates include a new kernel:

yum update
/sbin/reboot

3) Have a graphical mode.

While not strictly necessary, having a graphical interface for a web-based application will be handy. Install your favourite desktop, and a VNC server if you intend to manage the system remotely. For a lightweight desktop, consider MATE:

First add the EPEL repository. Depending on what you already have configured, use either:

yum install epel-release

or:

yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

Then:

yum groupinstall "MATE Desktop"
systemctl set-default graphical.target
systemctl isolate graphical.target
systemctl enable lightdm
systemctl start lightdm

For VNC, there are various options available. Consider, for example, TurboVNC.

4) Define some environment variables to be used later.

These will help when entering the commands later.

# Directory where NiDB will be installed
NIDBROOT=/nidb

# Directory of the webpages and PHP files:
WWWROOT=/var/www/html

# Linux username under which NiDB will run:
NIDBUSER=nidb

# MySQL/MariaDB root password:
MYSQLROOTPASS=[YOUR_PASSWORD_HERE]

# MySQL/MariaDB username that will have access to the database, and associated password:
MYSQLUSER=nidb
MYSQLPASS=[YOUR_PASSWORD_HERE]

These variables are only used during the installation, and all the steps here are done as root. Considering clearing your shell history at the end, so as not to have your passwords stored there.

5) Create an account for the user under which NiDB will run.

This is the user that will run the processes related to the database. It is not necessary that this user has administrative privileges on the system, and from a security perspective, it is better if not.

useradd -m ${NIDBUSER}
passwd ${NIDBUSER} # choose a sensible password

6) Install and configure Apache.

Add the repository for a more recent version, then install:

yum install httpd

Configure it to run as the ${NIDBUSER} user:

sed -i "s/User apache/User ${NIDBUSER}/" /etc/httpd/conf/httpd.conf
sed -i "s/Group apache/Group ${NIDBUSER}/" /etc/httpd/conf/httpd.conf

Enable it at boot, and also start it now:

systemctl enable httpd.service
systemctl start httpd.service

Open the relevant ports in the firewall, then reload the rules:

firewall-cmd --permanent --add-port=80/tcp
firewall-cmd --permanent --add-port=443/tcp
firewall-cmd --reload

7) Install and configure MySQL/MariaDB.

For MariaDB 10.2, the repository can be added to /etc/yum.repos.d/ as:

echo "[mariadb]
name = MariaDB
baseurl = http://yum.mariadb.org/10.2/centos7-amd64
gpgkey = https://yum.mariadb.org/RPM-GPG-KEY-MariaDB
gpgcheck = 1" >> /etc/yum.repos.d/MariaDB.repo

For other versions or distributions, visit this address. Then do the actual installation:

yum install MariaDB-server MariaDB-client

Enable it at boot and start now too:

systemctl enable mariadb.service
systemctl start mariadb.service

Secure the MySQL/MariaDB installation:

mysql_secure_installation

Pay attention to the questions on the root password and set it here to what was chosen in the ${MYSQLROOTPASS} variable. Make sure your database is secure.

8) Install and configure PHP.

First add the repositories for PHP 7.2:

yum install http://rpms.remirepo.net/enterprise/remi-release-7.rpm
yum install yum-utils
yum-config-manager --enable remi-php72
yum install php php-mysql php-gd php-process php-pear php-mcrypt php-mbstring

Install some additional PHP packages:

pear install Mail
pear install Mail_Mime
pear install Net_SMTP

Edit the PHP configuration:

sed -i 's/^short_open_tag = .*/short_open_tag = On/g' /etc/php.ini
sed -i 's/^session.gc_maxlifetime = .*/session.gc_maxlifetime = 28800/g' /etc/php.ini
sed -i 's/^memory_limit = .*/memory_limit = 5000M/g' /etc/php.ini
sed -i 's/^upload_tmp_dir = .*/upload_tmp_dir = \/${NIDBROOT}\/uploadtmp/g' /etc/php.ini
sed -i 's/^upload_max_filesize = .*/upload_max_filesize = 5000M/g' /etc/php.ini
sed -i 's/^max_file_uploads = .*/max_file_uploads = 1000/g' /etc/php.ini
sed -i 's/^max_input_time = .*/max_input_time = 600/g' /etc/php.ini
sed -i 's/^max_execution_time = .*/max_execution_time = 600/g' /etc/php.ini
sed -i 's/^post_max_size = .*/post_max_size = 5000M/g' /etc/php.ini
sed -i 's/^display_errors = .*/display_errors = On/g' /etc/php.ini
sed -i 's/^error_reporting = .*/error_reporting = E_ALL \& \~E_DEPRECATED \& \~E_STRICT \& \~E_NOTICE/' /etc/php.ini

Also, edit /etc/php.ini to make sure your timezone is correct, for example:

date.timezone = America/New_York

For a list of time zones, see here. Finally:

chown -R ${NIDBUSER}:${NIDBUSER} /var/lib/php/session

9) Install Perl and other pieces.

These are all in the main repositories already added so you should be able to simply run:

yum install perl* cpan git gcc gcc-c++ java ImageMagick vim libpng12 libmng wget iptraf* pv

Install also various Perl packages from CPAN. The first time you run cpan, various configuration questions will be asked; it is safe to accept default answers for all:

cpan File::Path
cpan Net::SMTP::TLS
cpan List::Util
cpan Date::Parse
cpan Image::ExifTool
cpan String::CRC32
cpan Date::Manip
cpan Sort::Naturally
cpan Digest::MD5
cpan Digest::MD5::File
cpan Statistics::Basic
cpan Email::Send::SMTP::Gmail
cpan Math::Derivative

Then put these into a place where NiDB can find them:

mkdir /usr/local/lib64/perl5
cp -rv /root/perl5/lib/perl5/* /usr/local/lib64/perl5/

10) (Optional) Disable SELinux.

Disabling SELinux is not strictly necessary provided that you ensure that all processes related to NiDB (webserver, database server), and all its files, belong to the same user, nidb, and that file access policies are set correctly. In any case, you may feel this is useful so as to stop receiving too many irrelevant warnings during the installation. You can enable it again later.

sed -i 's/^SELINUX=.*/SELINUX=disabled/g' /etc/selinux/config
setenforce 0

Note that enabling or disabling SELinux requires a reboot to take effect (it is not sufficient to simply restart a daemon; there is not one in fact).

11) Install FSL.

FSL functions are used by various internal scripts. After the installation, make sure the environment variable FSLDIR exists and points to the correct location (typically /usr/local/fsl, but can be different if you installed it elsewhere). This variable is used below when defining the crontab jobs.

FSLDIR=/usr/local/fsl

12) Download and install the NiDB files.

The official Github repository is https://github.com/gbook/nidb. However, I have made a fork with a couple of changes that better adapt to the system I am working with. You can probably go with either way.

mkdir -p ${NIDBROOT}
cd ${NIDBROOT}
mkdir -p archive backup dicomincoming deleted download ftp incoming problem programs/lock programs/logs uploadtmp uploaded
git clone https://github.com/andersonwinkler/nidb install
cd install
cp -Rv setup/Mysql* /usr/local/lib64/perl5/
cp -Rv programs/* ${NIDBROOT}/programs/
cp -Rv web/* ${WWWROOT}/
chown -R ${NIDBUSER}:${NIDBUSER} ${NIDBROOT}
chown -R ${NIDBUSER}:${NIDBUSER} ${WWWROOT}

Edit the file ${WWWROOT}/functions.php and complete two pieces of configuration. Locate these two lines:

$cfg = LoadConfig();
date_default_timezone_set();

In the first parenthesis, (), put what you get when you run:

echo "${NIDBROOT}/programs/nidb.cfg"

whereas in the second (), put what you get when you run:

timedatectl | grep "Time zone:" | awk '{print $3}'

For example, depending on your variables and time zone, you could edit to look like this:

$cfg = LoadConfig("/nidb/programs/nidb.cfg");
date_default_timezone_set("America/New_York")

13) Set up the database.

First, create the nidb user in MySQL/MariaDB. This is the only user (other than root) that will be able to do anything in the database:

mysql -uroot -p${MYSQLROOTPASS} -e "CREATE USER '${MYSQLUSER}'@'%' IDENTIFIED BY '${MYSQLPASS}'; GRANT ALL PRIVILEGES ON *.* TO '${MYSQLUSER}'@'%';"

Now create the NiDB database proper:

cd ${NIDBROOT}/install/setup
mysql -uroot -p${MYSQLROOTPASS} -e "CREATE DATABASE IF NOT EXISTS nidb; GRANT ALL ON *.* TO 'root'@'localhost' IDENTIFIED BY '${MYSQLROOTPASS}'; FLUSH PRIVILEGES;"
mysql -uroot -p${MYSQLROOTPASS} nidb < nidb.sql
mysql -uroot -p${MYSQLROOTPASS} nidb < nidb-data.sql

14) Setup cron jobs.

These jobs will take care of various automated input/output tasks.

cat <<EOC > ~/tempcron.txt
* * * * * cd ${NIDBROOT}/programs; perl parsedicom.pl > /dev/null 2>&1
* * * * * cd ${NIDBROOT}/programs; perl modulemanager.pl > /dev/null 2>&1
* * * * * cd ${NIDBROOT}/programs; perl pipeline.pl > /dev/null 2>&1
* * * * * cd ${NIDBROOT}/programs; perl datarequests.pl > /dev/null 2>&1
* * * * * cd ${NIDBROOT}/programs; perl fileio.pl > /dev/null 2>&1
* * * * * cd ${NIDBROOT}/programs; perl importuploaded.pl > /dev/null 2>&1
* * * * * cd ${NIDBROOT}/programs; perl qc.pl > /dev/null 2>&1
* * * * * FSLDIR=${FSLDIR}; PATH=${FSLDIR}/bin:${PATH}; . ${FSLDIR}/etc/fslconf/fsl.sh; export FSLDIR PATH; cd ${NIDBROOT}/programs; perl mriqa.pl > /dev/null 2>&1
@hourly find ${NIDBROOT}/programs/logs/*.log -mtime +4 -exec rm {} \;
@daily  /usr/bin/mysqldump nidb -u root -p${MYSQLROOTPASS} | gzip > ${NIDBROOT}/backup/db-\$(date +%Y-%m-%d).sql.gz
@hourly /bin/find /tmp/* -mmin +120 -exec rm -rf {} \;
@daily  find ${NIDBROOT}/ftp/* -mtime +7 -exec rm -rf {} \
@daily  find ${NIDBROOT}/tmp/* -mtime +7 -exec rm -rf {} \;
EOC
crontab -u ${NIDBUSER} ~/tempcron.txt && rm ~/tempcron.txt

15) Edit the main configuration.

The main configuration file, ${NIDBROOT}/programs/nidb.cfg, should be edited to reflect your paths, usernames, and passwords. It is this file that will contain the admin password for accessing NiDB. Use the ${NIDBROOT}/programs/nidb.cfg.sample as an example.

Once you have logged in as admin, you can also edit this file again in the database interface, in the menu Admin -> NiDB Settings.

16) (Optional) Install a MySQL/MariaDB frontend.

It will likely increase your productivity when doing maintenance to have a friendly frontend for MySQL/MariaDB. Two popular choices are phpMyAdmin (web-based) and Oracle MySQL Workbench.

For phpMyAdmin:

wget https://www.phpmyadmin.net/downloads/phpMyAdmin-latest-english.zip
unzip phpMyAdmin-latest-english.zip
mv phpMyAdmin-*-english ${WWWROOT}/phpMyAdmin
chown -R ${NIDBUSER}:${NIDBUSER} ${WWWROOT}
chmod 755 ${WWWROOT}
cp ${WWWROOT}/phpMyAdmin/config.sample.inc.php ${WWWROOT}/phpMyAdmin/config.inc.php

For MySQL Workbench, the repositories are listed at this link:

wget http://dev.mysql.com/get/mysql57-community-release-el7-11.noarch.rpm
rpm -Uvh mysql57-community-release-el7-11.noarch.rpm
yum install mysql-workbench

However, at the time of this writing, the current version (6.3.10) crashes upon start. The solution is to downgrade:

yum install yum-plugin-versionlock
yum versionlock mysql-workbench-community-6.3.8-1.el7.*
yum install mysql-workbench-community

17) That’s it!

You should by now have a working installation of NiDB, accessible from your web-browser at http://localhost. There are additional pieces you may consider configuring, such as a listener in one of your server ports to bring DICOMs from the scanner in automatically as the images are collected, and also other changes to the database schema and web interface. Now you have a starting point.

References

For more information on NiDB, see these two papers:

Book GA, Anderson BM, Stevens MC, Glahn DC, Assaf M, Pearlson GD. Neuroinformatics database (NiDB) – A modular, portable database for the storage, analysis, and sharing of neuroimaging data. Neuroinformatics. 2013;11(4):495-505.
Book GA, Stevens MC, Assaf M, Glahn DC, Pearlson GD. Neuroimaging data sharing on the neuroinformatics database platform. Neuroimage. 2016 Jan;124(0):1089-92.

Three HCP utilities

Posted on 01.August.2016 by A. M. Winkler

If you are working with data from the Human Connectome Project (HCP), perhaps these three small Octave/MATLAB utilities may be of some use:

hcp2blocks.m: Takes the restricted file with information about kinship and zygosity and produces a multi-level exchangeability blocks file that can be used with PALM for permutation inference. It is fully described here.
hcp2solar.m: Takes restricted and unrestricted files to produce a pedigree file that can be used with SOLAR for heritability and genome-wide association analyses.
picktraits.m: Takes either restricted or unrestricted files, a list of traits and a list of subject IDs to produce tables with selected traits for the selected subjects. These can be used to, e.g., produce design matrices for subsequent analysis.

These functions need to parse relatively large CSV files, which is somewhat inefficient in MATLAB and Octave. Still, since these commands usually have to be executed only once for a particular analysis, a 1-2 minute wait seems acceptable.

If downloaded directly from the above links, remember also to download the prerequisites: strcsvread.m and strcsvwrite.m. Alternatively, clone the full repository from GitHub. The link is this. Other tools may be added in the future.

A fourth utility

For the HCP-S1200 release (March/2017), zygosity information is provided in the fields ZygositySR (self-reported zygosity) and ZygosityGT (zygosity determined by genetic methods for select subjects). If needed, these two fields can be merged into a new field named simply Zygosity. To do so, use a fourth utility, command mergezyg.

Downsampling (decimating) a brain surface

Posted on 31.May.2016 by A. M. Winkler

Downsampled average cortical surfaces at different iterations (n), with the respective number of vertices (V), edges (E) and faces (F).

In the previous post, a method to display brain surfaces interactively in PDF documents was presented. While the method is already much more efficient than it was when it first appeared some years ago, the display of highly resolved meshes can be computationally intensive, and may make even the most enthusiastic readers give up even opening the file.

If the data being shown has low spatial frequency, an alternative way to display, which preserves generally the amount of information, is to decimate the mesh, downsampling it to a lower resolution. Although in theory this can be done in the native (subject-level) geometry through retessellation (i.e., interpolation of coordinates), the interest in downsampling usually is at the group level, in which case the subjects have all been interpolated to a common grid, which in general is a geodesic sphere produced by subdividing recursively an icosahedron (see this earlier post). If, at each iteration, the vertices and faces are added in a certain order (such as in FreeSurfer‘s fsaverage or in the one generated with the platonic command), downsampling is relatively straightforward, whatever is the type of data.

Vertexwise data

For vertexwise data, downsampling can be based on the fact that vertices are added (appended) in a certain order as the icosahedron is constructed:

Vertices 1-12 correspond to n = 0, i.e., no subdivision, or ico0.
Vertices 13-42 correspond to the vertices that, once added to the ico0, make it ico1 (first iteration of subdivision, n = 1).
Vertices 43-162 correspond to the vertices that, once added to ico1, make it ico2 (second iteration, n = 2).
Vertices 163-642, likewise, make ico3.
Vertices 643-2562 make ico4.
Vertices 2563-10242 make ico5.
Vertices 10243-40962 make ico6, etc.

Thus, if the data is vertexwise (also known as curvature, such as cortical thickness or curvature indices proper), the above information is sufficient to downsample the data: to reduce down to an ico3, for instance, all what one needs to do is to pick the vertices 1 through 642, ignoring 643 onwards.

Facewise data

Data stored at each face (triangle) generally correspond to areal quantities, that require mass conservation. For both fsaverage and platonic icosahedrons, the faces are added in a particular order such that, at each iteration of the subdivision, a given face index is replaced in situ for four other faces: one can simply collapse (via sum or average) the data of every four faces into a new one.

Surface geometry

If the objective is to decimate the surface geometry, i.e., the mesh itself, as opposed to quantities assigned to vertices or faces, one can use similar steps:

Select the vertices from the first up to the last vertex of the icosahedron in the level needed.
Iteratively downsample the face indices by selecting first those that are formed by three vertices that were appended for the current iteration, then for those that have two vertices appended in the current iteration, then connecting the remaining three vertices to form a new, larger face.

Applications

Using downsampled data is useful not only to display meshes in PDF documents, but also, some analyses may not require a high resolution as the default mesh (ico7), particularly for processes that vary smoothly across the cortex, such as cortical thickness. Using a lower resolution mesh can be just as informative, while operating at a fraction of the computational cost.

A script

A script that does the tasks above using Matlab/Octave is here: icodown.m. It is also available as part of the areal package described here, which also satisfies all its dependencies. Input and output formats are described here.

Interactive 3D brains in PDF documents

Posted on 10.May.2016 by A. M. Winkler

A screenshot from Acrobat Reader. The example file is here.

Would it not be helpful to be able to navigate through tri-dimensional, surface-based representations of the brain when reading a paper, without having to download separate datasets, or using external software? Since 2004, with the release of the version 1.6 of the Portable Document Format (PDF), this has been possible. However, the means to generate the file were not easily available until about 2008, when Intel released of a set of libraries and tools. This still did not help much to improve popularity, as in-document rendering of complex 3D models requires a lot of memory and processing, making its use difficult in practice at the time. The fact that Acrobat Reader was a lot bloated did not help much either.

Now, almost eight years later, things have become easier for users who want to open these documents. Newer versions of Acrobat are much lighter, and capabilities of ordinary computers have increased. Yet, it seems the interest on this kind of visualisation have faded. The objective of this post is to show that it is remarkably simple to have interactive 3D objects in PDF documents, which can be used in any document published online, including theses, presentations, and papers: journals as PNAS and Journal of Neuroscience are at the forefront in accepting interactive manuscripts.

Requirements

U3D Tools: Make sure you have the IDTFConverter utility, from the U3D tools, available on SourceForge as part of the MathGL library. A direct link to version 1.4.4 is here; an alternative link, of a repackaged version of the same, is here. Compiling instructions for Linux and Mac are in the “readme” file. There are some dependencies that must be satisfied, and are described in the documentation. If you decide not to install the U3D tools, but only compile them, make sure the path of the executable is both in the $PATH and in the $LD_LIBRARY_PATH. This can be done with:

cd /path/to/the/directory/of/IDTFConverter
export PATH=${PATH}:$(pwd)
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:$(pwd)

The ply2idtf function: Make sure you have the latest version of the areal package, which contains the MATLAB/Octave function ply2idtf.m used below.
Certain LaTeX packages: The packages movie15 or media9, that allow embedding the 3D object into the PDF using LaTeX. Either will work. Below it is assumed the older, movie15 package, is used.

Step 1: Generate the PLY maps

Once you have a map of vertexwise cortical data that needs to be shown, follow the instructions from this earlier blog post that explains how to generate Stanford PLY files to display colour-coded vertexwise data. These PLY files will be used below.

Step 2: Convert the PLY to IDTF files

IDTF stands for Intermediate Data Text Format. As the name implies, it is a text, intermediate file, used as a step before the creation of the U3D files, the latter that are embedded into the PDF. Use the function ply2idtf for this:

ply2idtf(...
   {'lh.pial.thickness.avg.ply','LEFT', eye(4);...
    'rh.pial.thickness.avg.ply','RIGHT',eye(4)},...
    'thickness.idtf');

The first argument is a cell array with 3 columns, and as many rows as PLY files being added to the IDTF file. The first column contains the file name, the second the label (or node) that for that file, and the third an affine matrix that maps the coordinates from the PLY file to the world coordinate system of the (to be created) U3D. The second (last) argument to the command is the name of the output file.

Step 3: Convert the IDTF to U3D files

From a terminal window (not MATLAB or Octave), run:

IDTFConverter -input thickness.idtf -output thickness.u3d

Step 4: Configure default views

Here we use the older movie15 LaTeX package, and the same can be accomplished with the newer, media9 package. Various viewing options are configurable, all of which are described in the documentation. These options can be saved in a text file with extension .vws, and later supplied in the LaTeX document. An example is below.

VIEW=Both Hemispheres
  COO=0 -14 0,
  C2C=-0.75 0.20 0.65
  ROO=325
  AAC=30
  ROLL=-0.03
  BGCOLOR=.5 .5 .5
  RENDERMODE=Solid
  LIGHTS=CAD
  PART=LEFT
    VISIBLE=true
  END
  PART=RIGHT
    VISIBLE=true
  END
END
VIEW=Left Hemisphere
  COO=0 -14 0,
  C2C=-1 0 0
  ROO=325
  AAC=30
  ROLL=-0.03
  BGCOLOR=.5 .5 .5
  RENDERMODE=Solid
  LIGHTS=CAD
  PART=LEFT
    VISIBLE=true
  END
  PART=RIGHT
    VISIBLE=false
  END
END
VIEW=Right Hemisphere
  COO=0 -14 0,
  C2C=1 0 0
  ROO=325
  AAC=30
  ROLL=0.03
  BGCOLOR=.5 .5 .5
  RENDERMODE=Solid
  LIGHTS=CAD
  PART=LEFT
    VISIBLE=false
  END
  PART=RIGHT
    VISIBLE=true
  END
END

Step 5: Add the U3D to the LaTeX source

Interactive, 3D viewing is unfortunately not supported by most PDF readers. However, it is supported by the official Adobe Acrobat Reader since version 7.0, including the recent version DC. Thus, it is important to let the users/readers of the document know that they must open the file using a recent version of Acrobat. This can be done in the document itself, using a message placed with the option text of the \includemovie command of the movie15 package. A minimalistic LaTeX source is shown below (it can be downloaded here).

\documentclass{article}

% Relevant package:
\usepackage[3D]{movie15}

% pdfLaTeX and color links setup:
\usepackage{color}
\usepackage[pdftex]{hyperref}
\definecolor{colorlink}{rgb}{0, 0, .6}  % dark blue
\hypersetup{colorlinks=true,citecolor=colorlink,filecolor=colorlink,linkcolor=colorlink,urlcolor=colorlink}

\begin{document}
\title{Interactive 3D brains in PDF documents}
\author{}
\date{}
\maketitle

\begin{figure}[!h]
\begin{center}
\includemovie[
text=\fbox{\parbox[c][9cm][c]{9cm}{\centering {\footnotesize (Use \href{http://get.adobe.com/reader/}{Adobe Acrobat Reader 7.0+} \\to view the interactive content.)}}},
poster,label=Average,3Dviews2=pial.vws]{10cm}{10cm}{thickness.u3d}
\caption{An average 3D brain, showing colour-coded average thickness (for simplicity, colour scale not shown). Click to rotate. Right-click to for a menu with various options. Details at \href{http://brainder.org}{http://brainder.org}.}
\end{center}
\end{figure}

\end{document}

Step 6: Generate the PDF

For LaTeX, use pdfLaTeX as usual:

pdflatex document.tex

What you get

After generating the PDF, the result of this example is shown here (a screenshot is at the top). It is possible to rotate in any direction, zoom, pan, change views to predefined modes, and alternate between orthogonal and perspective projections. It’s also possible to change rendering modes (including transparency), and experiment with various lightning options.

In Acrobat Reader, by right-clicking, a menu with various useful options is presented. A toolbar (as shown in the top image) can also be enabled through the menu.

The same strategy works also with the Beamer class, such that interactive slides can be created and used in talks, and with XeTeX, allowing these with a richer variety of text fonts.

Fast surface smoothing on a common grid

Posted on 30.November.2014 by A. M. Winkler

Smoothing scalar data on the surface of a high resolution sphere can be a slow process. If the filter is not truncated, the distances between all the vertices or barycentres of faces need to be calculated, in a very time consuming process. If the filter is truncated, the process can be faster, but with resolutions typically used in brain imaging, it can still be very slow, a problem that is amplified if data from many subjects are analysed.

However, if the data for each subject have already been interpolated to a common grid, such as an icosahedron recursively subdivided multiple times (details here), then the distances do not need to be calculated repeatedly for each of them. Doing so just once suffices. Furthermore, the implementation of the filter itself can be made in such a way that the smoothing process can be performed as a single matrix multiplication.

Consider the smoothing defined in Lombardi, (2002), which we used in Winkler et al., (2012):

$\tilde{Q}_n = \dfrac{\sum_j Q_j K\left(g\left(\mathbf{x}_n,\mathbf{x}_j\right)\right)}{\sum_j K\left(g\left(\mathbf{x}_n,\mathbf{x}_j\right)\right)}$

where $\tilde{Q}_n$ is the smoothed quantity at the vertex or face $n$ , $Q_j$ is the quantity assigned to the $j$ -th vertex or face of a sphere containining $J$ vertices or faces, $g\left(\mathbf{x}_n,\mathbf{x}_j\right)$ is the geodesic distance between vertices or faces with coordinates $\mathbf{x}_n$ and $\mathbf{x}_j$ , and $K(g)$ is the Gaussian filter, defined as a function of the geodesic distance between points.

The above formula requires that all distances between the current vertex or face $n$ and the other points $j$ are calculated, and that this is repeated for each $n$ , in a time consuming process that needs to be repeated over again for every additional subject. If, however, the distances $g$ are known and organised into a distance matrix $\mathbf{G}$ , the filter can take the form of a matrix of the same size, $\mathbf{K}$ , with the values at each row scaled so as to add up to unity, and the same smoothing can proceed as a simple matrix multiplication:

$\mathbf{\tilde{Q}} = \mathbf{K}\mathbf{Q}$

If the grid is the same for all subjects, which is the typical case when comparisons across subjects are performed, $\mathbf{K}$ can be calculated just once, saved, and reused for each subject.

It should be noted, however, that although running faster, the method requires much more memory. For a filter of full-width at half-maximum (FWHM) $f$ , truncated at a distance $t \cdot f$ from the filter center, in a sphere of radius $r$ , the number of non-zero elements in $\mathbf{K}$ is approximately:

$\text{nnz} \approx \dfrac{J^2}{2} \left(1-\cos\left(t \cdot \dfrac{f}{r}\right)\right)$

whereas the total number of elements is $J^2$ . Even using sparse matrices, this may require a large amount of memory space. For practical purposes, a filter with width $f$ = 20 mm can be truncated at twice the width ( $t$ = 2), for application in a sphere with 100 mm made by 7 subdivisions of an icosahedron, still comfortably in a computer with 16GB of RAM. Wider filters may require more memory to run.

The script smoothdpx, part of the areal interpolation tools, available here, can be used to do both things, that is, smooth the data for any given subject, and also save the filter so that it can be reused with other subjects. To apply a previously saved filter, the rpncalc can be used. These commands require Octave or MATLAB, and if Octave is available, they can be executed directly from the command line.

Figures

The figures above represent facewise data on the surface of a sphere of 100 mm radius, made by recursive subdivision of a regular icosahedron 4 times, constructed with the platonic command (details here), shown without smoothing, and smoothed with filters with FWHM = 7, 14, 21, 28 and 35 mm.

References

Lombardi M. Interpolation and smoothing. Astron. Astrophys. 2002; 395(2), 733–745.
Winkler AM, Sabuncu MR, Yeo BT, Fischl B, Greve DN, Kochunov P, Nichols TE, Blangero J, Glahn DC. Measuring and comparing brain cortical surface area and other areal quantities. Neuroimage. 2012 Jul 16;61(4):1428-43.

Splitting the cortical surface into independent regions

Posted on 08.January.2014 by A. M. Winkler

FreeSurfer offers excellent visualisation capabilities with tksurfer and FreeView. However, there are endless other possibilities using various different computer graphics software. In previous posts, it was shown here in the blog how to generate cortical and subcortical surfaces that could be imported into these applications, as well as how to generate models with vertexwise and facewise colours, and even a description of common file formats. It was also previously shown how to arbitrarily change the colours of regions for use with FreeSurfer own tools. However, a method to allow rendering cortical regions with different colours in software such as Blender was missing. This is what this post is about.

The idea is simple: splitting the cortical surface into one mesh per parcellation allows each to be imported as an independent object, and so, it becomes straightforward to apply a different colour for each one. To split, the first step is to convert the FreeSurfer annotation file to a data-per-vertex file (*.dpv). This can be done with the command annot2dpv.

./annot2dpv lh.aparc.annot lh.aparc.annot.dpv

Before running, be sure that ${FREESURFER_HOME}/matlab is in the Octave/matlab, path. With the data-per-vertex file ready, do the splitting of the surface with splitsrf:

./splitsrf lh.white lh.aparc.annot.dpv lh.white_roi

This will create several files names as lh.white_roi*. Each corresponds to one piece of the cortex, in *.srf format. To convert to a format that can be read directly into computer graphics software, see the instructions here.

The annot2dpv and splitsrf are now included in the package for areal analysis, available here.

With the meshes imported, let your imagination and creativity fly. Once produced, labels can be added to the renderings using software such as Inkscape, to produce images as the one above, of the Desikan-Killiany atlas, which illustrates the paper Cortical Thickness or Gray Matter Volume: The Importance of Selecting the Phenotype for Imaging Genetics Studies.

Another method is also possible, without the need to split the cortex, but instead, painting the voxels. This can be done with the command replacedpx, also available from the package above. In this case each region index is replaced by its corresponding statistical value (or any other value), then maps are produced with the dpx2map, shown in an earlier blog post, here. This other method, however, requires that the label indices are known for each region, which in FreeSurfer depends on the rgb colors assigned to them. Moreover, the resulting maps don’t have as sharp and beautiful borders as when the surface is split into independent pieces.

Displaying vertexwise and facewise brain maps

Posted on 28.July.2013 by A. M. Winkler

In a previous post, a method to display FreeSurfer cortical regions in arbitrary colours was presented. Suppose that, instead, you would like to display the results from vertexwise or facewise analyses. For vertexwise, these can be shown using tksurfer or Freeview. The same does not apply, however, to facewise data, which, at the time of this writing, is not available in any neuroimaging software. In this article a tool to generate files with facewise or vertexwise data is provided, along with some simple examples.

The dpx2map tool

The tool to generate the maps is dpx2map (right-click to download, then make it executable). Call it without arguments to get usage information. This tool uses Octave as the backend, and it assumes that it is installed in its usual location (/usr/bin/octave). It is also possible to run it from inside Octave or Matlab using a slight variant, dpx2map.m (in which case, type help dpx2map for usage).

In either case, the commands srfread, dpxread and mtlwrite must be available. These are part of the areal package discussed here. And yes, dpx2map is now included in the latest release of the package too.

To use dpx2map, you need to specify a surface object that will provide the geometry on which the data colours will be overlaid, and the data itself. The surface should be in FreeSurfer format (*.asc or *.srf), and the data should be in FreeSurfer “curvature” format (*.asc, *.dpv) for vertexwise, or in facewise format (*.dpf). A description of these formats is available here.

It is possible to specify the data range that will be used when computing the scaling to make the colours, as well which range will be actually shown. It is also possible to split the scale so that a central part of it is omitted or shown in a colour outside the colourscale. This is useful to show thresholded positive and negative maps.

The output is saved either in Stanford Polygon (*.ply) for vertexwise, or in Wavefront Object (*.obj + *.mtl) for facewise data, and can be imported directly in many computer graphics software. All input and output files must be/are in their respective ascii versions, not binary. The command also outputs a image with the colourbar, in Portable Network Graphics format (*.png).

An example object

With a simple geometric shape as this it is much simpler to demonstrate how to generate the maps, than using a complicated object as the brain. The strategy for colouring remains the same. For the next examples, an ellipsoid was created using the platonic command. The command line used was:

platonic ellipsoid.obj ico sph 7 '[.25 0 0 0; 0 3 0 0; 0 0 .25 0; 0 0 0 1]'

This ellipsoid has maximum y-coordinate equal to 3, and a maximum x- and z-coordinates equal to 0.25. This file was converted from Wavefront *.obj to FreeSurfer ascii, and scalar fields simply describing the coordinates (x,y,z), were created with:

obj2srf ellipsoid.obj > ellipsoid.srf
srf2area ellipsoid.srf ellipsoid.dpv dpv
gawk '{print $1,$2,$3,$4,$2}' ellipsoid.dpv > ellipsoid-x.dpv
gawk '{print $1,$2,$3,$4,$3}' ellipsoid.dpv > ellipsoid-y.dpv
gawk '{print $1,$2,$3,$4,$4}' ellipsoid.dpv > ellipsoid-z.dpv

It is the ellipsoid-y.dpv that is used for the next examples.

Vertexwise examples

The examples below use the same surface (*.srf) and the same curvature, data-per-vertex file (*.dpv). The only differences are the way as the map is generated and presented, using different colour maps and different scaling. The jet colour map is the same available in Octave and Matlab. The coolhot5 is a custom colour map that will be made available, along with a few others, in another article to be posted soon.

Example A

In this example, defaults are used. The input files are specified, along with a prefix (exA) to be used to name the output files.

dpx2map ellipsoid-y.dpv ellipsoid.srf exA

Example B

In this example, the data between values -1.5 and 1.5 is coloured, and the remaining receive the colours of the extreme points (dark blue and dark red).

dpx2map ellipsoid-y.dpv ellipsoid.srf exB jet '[-1.5 1.5]'

Example C

In this example, the data between -2 and 2 is used to define the colours, with the values below/above receiving the extreme colours. However, the range between -1 and 1 is not shown or used for the colour scaling. This is because the dual option is set as true as well as the coption.

dpx2map ellipsoid-y.dpv ellipsoid.srf exC coolhot5 '[-2 2]' '[-1 1]' true '[.75 .75 .75]' true

Example D

This example is similar as above, except that the values between -1 and 1, despite not being shown, are used for the scaling of the colours. This is due to the coption being set as true.

dpx2map ellipsoid-y.dpv ellipsoid.srf exD coolhot5 '[-2 2]' '[-1 1]' true '[.75 .75 .75]' false

Example E

Here the data between -2 and 2 is used for scaling, but only the points between -1 and 1 are shown. This is because the option dual was set as false. The values below -1 or above 1 receive the same colours as these numbers, because the coption was configured as true. Note that because all points will receive some colour, it is not necessary to define the colourgap.

dpx2map ellipsoid-y.dpv ellipsoid.srf exE coolhot5 '[-2 2]' '[-1 1]' false '[]' true

Example F

This is similar as the previous example, except that the values between -1 and 1 receive a colour off of the colour map. This is because both dual and coption were set as false.

dpx2map ellipsoid-y.dpv ellipsoid.srf exF coolhot5 '[-2 2]' '[-1 1]' false '[.75 .75 .75]' false

Facewise data

The process to display facewise data is virtually identical. The only two differences are that (1) instead of supplying a *.dpv file, a *.dpf file is given to the script as input, and (2) the output isn’t a *.ply file, but instead a pair of files *.obj + *.mtl. Note that very few software can handle thousands of colours per object in the case of facewise data. Blender is recommended over most commercial products specially for this reason (and of course, it is free, as in freedom).

Download

The dpx2map is available here, and it is also included in the areal package, described here, where all its dependencies are satisfied. You must have Octave (free) or Matlab available to use this tool.

How to cite

If you use dpx2map for your scientific research, please, remember to mention the brainder.org website in your paper.

Update: Display in PDF documents

3D models as these, with vertexwise colours, can be shown in interactive PDF documents. Details here.

Merging multiple surfaces

Posted on 11.July.2013 by A. M. Winkler

Say you have a number of meshes in FreeSurfer ascii format (with extension *.asc or *.srf), one brain structure per file. However, for later processing or to import in some computer graphics software, you would like to have these multiple meshes all in a single file. This post provides a small script to accomplish this: mergesrf.

To use it, right click and save the file above, make it executable and, ideally, put it in a place where it can be found (or add its location to the environmental variable ${PATH}. Then run something as:

mergesrf file1.srf file2.srf fileN.srf mergedfile.srf

In this example, the output file is saved as mergedfile.srf. Another example is to convert all subcortical structures into just one large object, after aseg2srf as described here. To convert all, just change the current directory to ${SUBJECTS_DIR}//ascii, then run:

mergesrf * aseg_all.srf

A list with the input files and the output at the end is shown below:

The script uses Octave, which can be downloaded freely. The same script, with a small modification, can also run from inside matlab. This other version can be downloaded here: mergesrf.m

Requirements

In addition to Octave (or matlab), the script also requires functions to read and write surface files, which are available from the areal package (described here and downloadable here).