Simplifying Freedman-Lane

Doing a permutation test with the general linear model (GLM) in the presence of nuisance variables can be challenging. Let the model be:

\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\boldsymbol{\gamma} + \boldsymbol{\epsilon}

where \mathbf{Y} is a matrix of observed variables, \mathbf{X} is a matrix of predictors of interest, \mathbf{Z} is a matrix of covariates (of no interest), and \boldsymbol{\epsilon} is a matrix of the same size as \mathbf{Y} with the residuals.

Because the interest is in testing the relationship between \mathbf{Y} and \mathbf{X}, in principle it is these that would need to be permuted; doing so, however, also breaks their relationship with \mathbf{Z}, which is undesirable. Over the years, many methods have been proposed to deal with this problem. A review can be found in Winkler et al. (2014); earlier work includes the papers by Anderson and Legendre (1999) and Anderson and Robinson (2001).

One of these methods is the one published by Freedman and Lane (1983): the data are residualised with respect to the covariates, the residuals are permuted, the estimated covariate effects are added back, and the full model is fitted to the result. The procedure can be performed through the following steps (a code sketch is given after the list):

  1. Regress \mathbf{Y} against the full model that contains both the effects of interest and the nuisance variables, i.e., \mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\boldsymbol{\gamma} + \boldsymbol{\epsilon}. Use the estimated parameters \boldsymbol{\hat{\beta}} to compute the statistic of interest, and call this statistic T_{0}.
  2. Regress \mathbf{Y} against a reduced model that contains only the covariates, i.e. \mathbf{Y} = \mathbf{Z}\boldsymbol{\gamma} + \boldsymbol{\epsilon}_{\mathbf{Z}}, obtaining estimated parameters \boldsymbol{\hat{\gamma}} and estimated residuals \boldsymbol{\hat{\epsilon}}_{\mathbf{Z}}.
  3. Compute a set of permuted data \mathbf{Y}^{*}_{j}. This is done by pre-multiplying the residuals from the reduced model produced in the previous step, \boldsymbol{\hat{\epsilon}}_{\mathbf{Z}}, by a permutation matrix \mathbf{P}_{j}, then adding back the estimated nuisance effects, i.e. \mathbf{Y}^{*}_{j} = \mathbf{P}_{j}\boldsymbol{\hat{\epsilon}}_{\mathbf{Z}} + \mathbf{Z}\boldsymbol{\hat{\gamma}}.
  4. Regress the permuted data \mathbf{Y}^{*}_{j} against the full model, i.e. \mathbf{Y}^{*}_{j} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\boldsymbol{\gamma} + \boldsymbol{\epsilon}, obtaining estimated parameters \boldsymbol{\hat{\beta}}^{*}_{j}.
  5. Use the estimated \boldsymbol{\hat{\beta}}^{*}_{j} to compute the statistic of interest. Call this statistic T^{*}_{j}.
  6. Repeat Steps 3-5 many times to build the reference distribution of T^{*} under the null hypothesis of no association between \mathbf{Y} and \mathbf{X}.
  7. Count how many times T^{*}_{j} was found to be equal to or larger than T_{0}, and divide the count by the number of permutations; the result is the p-value.
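
To make the steps concrete, below is a minimal sketch in Python/NumPy (not from the original paper; the names fstat and freedman_lane, and the choice of a simple F statistic for the joint effect of the columns of \mathbf{X} as the statistic of interest, are illustrative assumptions). It expects \mathbf{Y}, \mathbf{X} and \mathbf{Z} as arrays with observations along the rows:

import numpy as np

def fstat(Y, X, Z):
    # F statistic for the effect of X in the full model [X Z], computed per column of Y
    n = Y.shape[0]
    M = np.hstack([X, Z])
    rss_full = np.sum((Y - M @ np.linalg.lstsq(M, Y, rcond=None)[0]) ** 2, axis=0)
    rss_red  = np.sum((Y - Z @ np.linalg.lstsq(Z, Y, rcond=None)[0]) ** 2, axis=0)
    return ((rss_red - rss_full) / X.shape[1]) / (rss_full / (n - M.shape[1]))

def freedman_lane(Y, X, Z, n_perm=1000, seed=None):
    rng = np.random.default_rng(seed)
    n = Y.shape[0]
    # Step 1: statistic for the unpermuted data
    T0 = fstat(Y, X, Z)
    # Step 2: reduced model (covariates only)
    gamma_hat = np.linalg.lstsq(Z, Y, rcond=None)[0]
    fitted_Z  = Z @ gamma_hat
    resid_Z   = Y - fitted_Z
    # Steps 3-6: permute the residuals, add the nuisance effects back, refit, collect T*
    count = np.zeros_like(T0)
    for _ in range(n_perm):
        Y_star = resid_Z[rng.permutation(n)] + fitted_Z   # Step 3
        count += fstat(Y_star, X, Z) >= T0                # Steps 4-5 and counting
    # Step 7: the proportion of T* >= T0 gives the p-value
    return T0, count / n_perm

For instance, T0, pval = freedman_lane(Y, X, Z, n_perm=5000) returns the observed statistics and their permutation p-values; in practice the identity permutation is usually also included in the count.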

Steps 1-4 can be written concisely as:

\left(\mathbf{P}_{j}\mathbf{R}_{\mathbf{Z}}+\mathbf{H}_{\mathbf{Z}}\right) \mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\boldsymbol{\gamma}+\boldsymbol{\epsilon}

where \mathbf{P}_{j} is a permutation matrix (for the j-th permutation), \mathbf{H}_{\mathbf{Z}}=\mathbf{Z}\mathbf{Z}^{+} is the hat matrix due to the covariates, and \mathbf{R}_{\mathbf{Z}} = \mathbf{I} - \mathbf{H}_{\mathbf{Z}} is the residual-forming matrix; the superscript ^{+} denotes a matrix pseudo-inverse.
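
As a quick sanity check of this compact formulation, the snippet below (with randomly generated \mathbf{Y} and \mathbf{Z}, purely for illustration) verifies numerically that \left(\mathbf{P}_{j}\mathbf{R}_{\mathbf{Z}}+\mathbf{H}_{\mathbf{Z}}\right)\mathbf{Y} reproduces the permuted data \mathbf{Y}^{*}_{j} constructed explicitly in Step 3:

import numpy as np

rng = np.random.default_rng(0)
n = 12
Z = np.hstack([np.ones((n, 1)), rng.standard_normal((n, 1))])   # intercept plus one covariate
Y = rng.standard_normal((n, 3))
H = Z @ np.linalg.pinv(Z)                            # hat matrix H_Z = Z Z^+
R = np.eye(n) - H                                    # residual-forming matrix R_Z = I - H_Z
P = np.eye(n)[rng.permutation(n)]                    # a permutation matrix P_j
gamma_hat = np.linalg.pinv(Z) @ Y                    # reduced-model estimates
eps_Z     = Y - Z @ gamma_hat                        # reduced-model residuals
print(np.allclose((P @ R + H) @ Y, P @ eps_Z + Z @ gamma_hat))   # True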

On page 385 of Winkler et al. (2014), my colleagues and I state that:

[…] adding the nuisance variables back in Step 3 is not strictly necessary, and the model can be expressed simply as \mathbf{P}_{j}\mathbf{R}_{\mathbf{Z}}\mathbf{Y}=\mathbf{X}\boldsymbol{\beta}+\mathbf{Z}\boldsymbol{\gamma}+\boldsymbol{\epsilon}, implying that the permutations can actually be performed just by permuting the rows of the residual-forming matrix \mathbf{R}_{\mathbf{Z}}.

However, in the paper we do not offer a proof of this important result, which allows for algorithmic acceleration. Here we remedy that. Let's start with two brief lemmata:

Lemma 1: The product of a hat matrix and its corresponding residual-forming matrix is zero, that is, \mathbf{R}_{\mathbf{Z}}\mathbf{H}_{\mathbf{Z}} = \mathbf{H}_{\mathbf{Z}}\mathbf{R}_{\mathbf{Z}} = \mathbf{0}.

This is because \mathbf{R}_{\mathbf{Z}} = \mathbf{I} - \mathbf{H}_{\mathbf{Z}}, hence \mathbf{R}_{\mathbf{Z}}\mathbf{H}_{\mathbf{Z}} = \mathbf{R}_{\mathbf{Z}}(\mathbf{I} - \mathbf{R}_{\mathbf{Z}}) = \mathbf{R}_{\mathbf{Z}} - \mathbf{R}_{\mathbf{Z}}\mathbf{R}_{\mathbf{Z}} = \mathbf{R}_{\mathbf{Z}} - \mathbf{R}_{\mathbf{Z}} = \mathbf{0} since \mathbf{R}_{\mathbf{Z}} is idempotent.
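
For a quick numerical confirmation of Lemma 1 (using a random \mathbf{Z}, purely for illustration):

import numpy as np

rng = np.random.default_rng(0)
Z = rng.standard_normal((20, 3))
H = Z @ np.linalg.pinv(Z)        # hat matrix
R = np.eye(20) - H               # residual-forming matrix
print(np.allclose(R @ H, 0), np.allclose(H @ R, 0))   # True True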

Lemma 2 (Frisch–Waugh–Lovell theorem): Given a GLM expressed as \mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\boldsymbol{\gamma} + \boldsymbol{\epsilon}, we can estimate \boldsymbol{\beta} from an equivalent GLM written as \mathbf{R}_{\mathbf{Z}}\mathbf{Y} = \mathbf{R}_{\mathbf{Z}}\mathbf{X}\boldsymbol{\beta} + \mathbf{R}_{\mathbf{Z}}\boldsymbol{\epsilon}.

To see why, remember that multiplying both sides of an equation by the same factor preserves the equality; the Frisch–Waugh–Lovell theorem further guarantees that the least-squares estimate of \boldsymbol{\beta} is unchanged by this particular transformation. Let's start from:

\mathbf{R}_{\mathbf{Z}}\mathbf{Y} = \mathbf{R}_{\mathbf{Z}}(\mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\boldsymbol{\gamma} + \boldsymbol{\epsilon})

Then remove the parentheses:

\mathbf{R}_{\mathbf{Z}}\mathbf{Y} = \mathbf{R}_{\mathbf{Z}}\mathbf{X}\boldsymbol{\beta} + \mathbf{R}_{\mathbf{Z}}\mathbf{Z}\boldsymbol{\gamma} + \mathbf{R}_{\mathbf{Z}}\boldsymbol{\epsilon}

Since \mathbf{R}_{\mathbf{Z}} = \mathbf{I} - \mathbf{H}_{\mathbf{Z}}:

\mathbf{R}_{\mathbf{Z}}\mathbf{Y} = \mathbf{R}_{\mathbf{Z}}\mathbf{X}\boldsymbol{\beta} + (\mathbf{I}-\mathbf{H}_{\mathbf{Z}})\mathbf{Z}\boldsymbol{\gamma} + \mathbf{R}_{\mathbf{Z}}\boldsymbol{\epsilon}

and that \mathbf{H}_{\mathbf{Z}} = \mathbf{Z}\mathbf{Z}^{+}:

\mathbf{R}_{\mathbf{Z}}\mathbf{Y} = \mathbf{R}_{\mathbf{Z}}\mathbf{X}\boldsymbol{\beta} + (\mathbf{Z}-\mathbf{Z}\mathbf{Z}^{+}\mathbf{Z})\boldsymbol{\gamma} + \mathbf{R}_{\mathbf{Z}}\boldsymbol{\epsilon}

Since \mathbf{Z}\mathbf{Z}^{+}\mathbf{Z}=\mathbf{Z}, a defining property of the pseudo-inverse, the middle term vanishes:

\mathbf{R}_{\mathbf{Z}}\mathbf{Y} = \mathbf{R}_{\mathbf{Z}}\mathbf{X}\boldsymbol{\beta} + \mathbf{0}\boldsymbol{\gamma} + \mathbf{R}_{\mathbf{Z}}\boldsymbol{\epsilon}

\mathbf{R}_{\mathbf{Z}}\mathbf{Y} = \mathbf{R}_{\mathbf{Z}}\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}_{\mathbf{Z}}

where \boldsymbol{\epsilon}_{\mathbf{Z}}=  \mathbf{R}_{\mathbf{Z}}\boldsymbol{\epsilon}.
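
The snippet below is a small numerical check of Lemma 2: the estimate of \boldsymbol{\beta} obtained from the full model [\mathbf{X}\;\mathbf{Z}] matches the one obtained by regressing \mathbf{R}_{\mathbf{Z}}\mathbf{Y} on \mathbf{R}_{\mathbf{Z}}\mathbf{X} (random data, for illustration only):

import numpy as np

rng = np.random.default_rng(1)
n, px = 50, 2
X = rng.standard_normal((n, px))
Z = np.hstack([np.ones((n, 1)), rng.standard_normal((n, 2))])
Y = rng.standard_normal((n, 4))
R = np.eye(n) - Z @ np.linalg.pinv(Z)                # residual-forming matrix R_Z
M = np.hstack([X, Z])                                # full design
beta_full = np.linalg.lstsq(M, Y, rcond=None)[0][:px]          # beta from the full model
beta_fwl  = np.linalg.lstsq(R @ X, R @ Y, rcond=None)[0]       # beta from the residualised model
print(np.allclose(beta_full, beta_fwl))              # True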

Main result

Now we are ready for the main result. The Freedman-Lane model is:

\left(\mathbf{P}_{j}\mathbf{R}_{\mathbf{Z}}+\mathbf{H}_{\mathbf{Z}}\right) \mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\boldsymbol{\gamma}+\boldsymbol{\epsilon}

Per Lemma 2, it can be rewritten as:

\mathbf{R}_{\mathbf{Z}}\left(\mathbf{P}_{j}\mathbf{R}_{\mathbf{Z}}+\mathbf{H}_{\mathbf{Z}}\right) \mathbf{Y} = \mathbf{R}_{\mathbf{Z}}\mathbf{X}\boldsymbol{\beta}+\boldsymbol{\epsilon}_{\mathbf{Z}}

Dropping the parentheses:

\mathbf{R}_{\mathbf{Z}}\mathbf{P}_{j}\mathbf{R}_{\mathbf{Z}}\mathbf{Y} +\mathbf{R}_{\mathbf{Z}}\mathbf{H}_{\mathbf{Z}} \mathbf{Y} = \mathbf{R}_{\mathbf{Z}}\mathbf{X}\boldsymbol{\beta}+\boldsymbol{\epsilon}_{\mathbf{Z}}

Per Lemma 1:

\mathbf{R}_{\mathbf{Z}}\mathbf{P}_{j}\mathbf{R}_{\mathbf{Z}}\mathbf{Y} + \mathbf{0}\mathbf{Y} = \mathbf{R}_{\mathbf{Z}}\mathbf{X}\boldsymbol{\beta}+\boldsymbol{\epsilon}_{\mathbf{Z}}

\mathbf{R}_{\mathbf{Z}}\mathbf{P}_{j}\mathbf{R}_{\mathbf{Z}}\mathbf{Y}= \mathbf{R}_{\mathbf{Z}}\mathbf{X}\boldsymbol{\beta}+\boldsymbol{\epsilon}_{\mathbf{Z}}

What is left has the same form as the result of Lemma 2. Thus, reversing it, we obtain the final result:

\mathbf{P}_{j}\mathbf{R}_{\mathbf{Z}} \mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\boldsymbol{\gamma} +\boldsymbol{\epsilon}

Hence, the hat matrix \mathbf{H}_{\mathbf{Z}} cancels out, meaning that adding the nuisance effects back is not necessary: the estimated \boldsymbol{\hat{\beta}}, and therefore the statistic of interest, are the same either way.
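
A direct numerical check of this equivalence is sketched below (random data, for illustration only): whether or not the nuisance effects are added back, the estimated coefficients for \mathbf{X} and the full-model residuals coincide, and therefore so do statistics computed from them, such as t or F.

import numpy as np

rng = np.random.default_rng(2)
n, px = 30, 1
X = rng.standard_normal((n, px))
Z = np.hstack([np.ones((n, 1)), rng.standard_normal((n, 1))])
Y = rng.standard_normal((n, 5))
H = Z @ np.linalg.pinv(Z)
R = np.eye(n) - H
P = np.eye(n)[rng.permutation(n)]                    # one permutation matrix P_j
M = np.hstack([X, Z])                                # full design

def fit(Yp):
    # coefficients for the X block and residuals of the full model
    coef = np.linalg.lstsq(M, Yp, rcond=None)[0]
    return coef[:px], Yp - M @ coef

b_with,    e_with    = fit((P @ R + H) @ Y)          # Freedman-Lane, nuisance effects added back
b_without, e_without = fit(P @ R @ Y)                # permuted residuals only
print(np.allclose(b_with, b_without), np.allclose(e_with, e_without))   # True True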

References

Anderson MJ, Legendre P. An empirical comparison of permutation methods for tests of partial regression coefficients in a linear model. Journal of Statistical Computation and Simulation. 1999;62(3):271-303.

Anderson MJ, Robinson J. Permutation tests for linear models. Australian & New Zealand Journal of Statistics. 2001;43(1):75-88.

Freedman D, Lane D. A nonparametric approach to the evaluation of summary statistics. Journal of Business and Economic Statistics. 1983;1(4):292-298.

Winkler AM, Ridgway GR, Webster MA, Smith SM, Nichols TE. Permutation inference for the general linear model. NeuroImage. 2014;92:381-397.
