False Discovery Rate: Corrected & Adjusted P-values

Until the mid-1990s, control of the error rate under multiple testing was done, in general, using the family-wise error rate (FWER); an example of this kind of correction is the Bonferroni correction. In an influential paper, Benjamini and Hochberg (1995) introduced the concept of the false discovery rate (FDR) as a way to allow inference when many tests are being conducted. Unlike FWER, which controls the probability of committing a type I error in any of a family of tests, FDR allows the researcher to tolerate that a certain fraction of discoveries be false. The word rate in FDR is in fact a misnomer: the FDR is the expected proportion of discoveries that are false among all discoveries, i.e., the proportion of incorrect rejections among all rejections of the null hypothesis.

Benjamini and Hochberg’s FDR-controlling procedure

Consider testing N hypotheses, H_{1}, H_{2}, \ldots, H_{N}, based on their respective p-values, p_{1}, p_{2}, \ldots, p_{N}. Suppose a fraction q of discoveries is allowed (tolerated) to be false. Sort the p-values in ascending order, p_{(1)} \leq p_{(2)} \leq \ldots \leq p_{(N)}, and denote by H_{(i)} the hypothesis corresponding to p_{(i)}. Let k be the largest i for which p_{(i)} \leq \frac{i}{N}\frac{q}{c(N)}. Then reject all H_{(i)}, i=1, 2, \ldots, k. The constant c(N) is not in the original publication; it appeared in Benjamini and Yekutieli (2001) for cases in which independence cannot be ascertained. The possible choices for the constant are c(N)=1 or c(N)=\sum_{i=1}^{N} 1/i. The second is valid in any situation, whereas the first is valid in most situations, particularly when there are no negative correlations among the tests. The B&H procedure has found many applications across different fields, including neuroimaging, as introduced by Genovese et al. (2002).
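
In code, the procedure amounts to a sort followed by a comparison against a line of slope q/(N·c(N)). Below is a minimal Octave/MATLAB sketch (the function name and variable names are illustrative, not the fdr.m implementation linked under Implementation):

    function pthr = bh_threshold(pval, q, cV)
    % Minimal sketch of the B&H FDR-threshold procedure.
    % pval: vector of p-values; q: tolerated FDR level;
    % cV:   the constant c(N), either 1 or sum(1./(1:numel(pval))).
    N     = numel(pval);
    psort = sort(pval(:));                             % p(1) <= p(2) <= ... <= p(N)
    k     = find(psort <= (1:N)'/N * q/cV, 1, 'last'); % largest i with p(i) <= (i/N)*(q/c(N))
    if isempty(k)
        pthr = 0;        % nothing survives; no null hypothesis is rejected
    else
        pthr = psort(k); % reject all H(i) with p-value <= pthr
    end
    end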

FDR correction

The procedure described above effectively defines a single number, a threshold, that can be used to declare tests significant or not at the level q. One might, instead, be interested in knowing, for each original p-value, the minimum q level at which it would still be rejected. To find this, define q_{(i)}=\frac{p_{(i)}N}{i}c(N). This formulation, however, has a problem, as discussed next.
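
Computationally, this is just the previous inequality solved for q. A sketch, continuing with the same illustrative variable names as above:

    % FDR-corrected values: q(i) = p(i)*N/i*c(N), computed over the sorted
    % p-values, then put back into the original order.
    [psort, order] = sort(pval(:));
    N           = numel(psort);
    pcor_sorted = psort * N ./ (1:N)' * cV;  % not monotonic; may exceed 1
    pcor        = zeros(N, 1);
    pcor(order) = pcor_sorted;               % corrected values, original order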

FDR adjustment

The problem with the FDR correction is that q_{(i)} is not a monotonic function of p_{(i)}. This means that if someone uses any q_{(i)} to threshold the set of FDR-corrected values, the result is not the same as would be obtained by applying the B&H procedure sequentially at each of the corresponding q levels.

To address this concern, Yekutieli and Benjamini (1999) introduced the FDR adjustment, in which monotonicity is enforced and whose definition is compatible with the original FDR definition. Let q^*_{(i)} be the FDR-adjusted value for p_{(i)}. Its value is the smallest q_{(k)}, k \geq i, where q_{(k)} is the FDR-corrected value as defined above. That's it!
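
Enforcing monotonicity is a running minimum taken from the largest p-value downwards. A sketch, continuing from the previous snippet (again with illustrative names):

    % FDR-adjusted values: for each i, take the smallest FDR-corrected value
    % among positions i..N (a cumulative minimum from the end), which
    % enforces monotonicity.
    padj_sorted = pcor_sorted;
    for i = N-1:-1:1
        padj_sorted(i) = min(padj_sorted(i), padj_sorted(i+1));
    end
    padj_sorted = min(padj_sorted, 1);  % an adjusted value never exceeds 1
    padj        = zeros(N, 1);
    padj(order) = padj_sorted;          % back to the original order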

Example

Consider the image below: on the left, a 9×9-pixel square, each pixel containing a t-statistic, with some sparse, low-magnitude signal added. On the right are the corresponding p-values, ranging approximately between 0 and 1:

[Figure: statistical maps. Left: statistic; right: p-values; with colorscale.]

Using the B&H procedure to compute the threshold with q=0.20, and applying this threshold to the image, rejects the null hypothesis in 6 pixels. On the right panel, the red line corresponds to the threshold; all p-values (in blue) below this line are declared significant.

[Figure: FDR-threshold.]

Computing the FDR-corrected values at each pixel and thresholding at the same q=0.20, however, does not produce the same result: just 1 pixel is declared significant. Although conservative, this "correction" is far from the nominal false discovery rate and is not appropriate. Note, on the right panel, the lack of monotonicity.

[Figure: FDR-corrected.]

Computing instead the FDR-adjusted values and thresholding again at q=0.20 produces the same result as with the simpler FDR threshold. The adjustment is, thus, more "correct" than the "correction". Observe, on the right panel, the enforced monotonicity.

[Figure: FDR-adjusted.]

Implementation

An implementation of the three styles of FDR for Octave/MATLAB is available here: fdr.m
SPM users will find a similar feature in the function spm_P_FDR.m.
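
A hypothetical usage example (the output names below are assumptions; check the help text inside fdr.m for the actual signature):

    % Hypothetical call to fdr.m: pthr would be the B&H threshold, pcor the
    % FDR-corrected values, and padj the FDR-adjusted values.
    [pthr, pcor, padj] = fdr(pval);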

References

Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B. 1995;57(1):289-300.

Benjamini Y, Yekutieli D. The control of the false discovery rate in multiple testing under dependency. Annals of Statistics. 2001;29(4):1165-1188.

Genovese CR, Lazar NA, Nichols TE. Thresholding of statistical maps in functional neuroimaging using the false discovery rate. NeuroImage. 2002;15(4):870-878.

Wilkinson B. A statistical consideration in psychological research. Psychological Bulletin. 1951;48(3):156-158.

Yekutieli D, Benjamini Y. Resampling-based false discovery rate controlling multiple test procedures for correlated test statistics. Journal of Statistical Planning and Inference. 1999;82(1-2):171-196.

Comments

  1. Hi,
    I am doing a genome-wide association study. After the analysis I am getting p-values and FDR-adjusted p-values. Could you please tell me which SNPs are most significant? I am pasting some of the results…

    SNP        P-value    FDR-adjusted p-value
    SNP13114   3.56E-09   1.52E-05
    SNP4683    6.58E-09   1.88E-05
    SNP13114   8.80E-09   1.88E-05
    SNP13061   1.23E-08   1.71E-05
    SNP12690   7.58E-05   0.113065
    SNP9766    7.81E-05   0.113065
    SNP9769    7.81E-05   0.113065
    SNP2215    0.000472   0.511954
    SNP3436    0.001216   0.877387
    SNP3221    0.001532   0.877387
    SNP5367    1.74E-05   0.999769
    SNP5010    1.92E-05   0.999769
    SNP10332   3.29E-05   0.999769

    Does this table mean that the last SNPs are not significantly associated with the traits, or is my interpretation wrong?
    Thanks,

    • Are these all the p-values you found, or only some (e.g., only the lowest 13)? If these are all, then all are significant at q<=0.05. The threshold is 0.0015320. However, the adjusted p-values I get are different from yours; see below (in the same order).
      3.8133e-08
      3.8133e-08
      3.8133e-08
      3.9975e-08
      1.0153e-04
      1.0153e-04
      1.0153e-04
      5.5782e-04
      1.3173e-03
      1.5320e-03
      4.1600e-05
      4.1600e-05
      6.1100e-05

      • PS: Oh, these are only some of the p-values, as this is from a GWA study. If you obtained the adjusted p-values using the 'fdr.m' function, they should be correct, and you can say confidently that all the adjusted values below 0.05 (or any other q-level you prefer) are significant. Hope this helps!

  2. No, these are not all the p-values; I've chosen them from the whole set to clarify my question. Suppose I am getting p-value = 3.56E-09 with FDR-adjusted p-value = 1.52E-05, while in a second case p-value = 3.29E-05 with adjusted p-value = 0.999769. What do both values mean? Is the first significant, and the second significant by p-value but non-significant due to its high FDR value, or is there some other meaning?
    These are GWAS data from a diploid plant species for genome-wide SNPs.
    What do you suggest I do?

    Thanks for your replies.

    Vinod Kumar

    • Hi Vinod,
      The adjusted values that are below q=0.05 (or another q-level you may choose) can be declared as significant. The same would be obtained if, instead of the p-adjustment, the FDR threshold had been calculated, in which case the p-values below the threshold would be declared as significant. So, in the example you gave, for a q=0.05, the first can be considered significant, but not the second. In the initial list of p-values you posted, the first four p-values are significant at q=0.05, but not the remaining ones.
      Hope this helps!
      All the best,
      Anderson

  3. Hi Anderson,
    Thanks for making the point clear to me. If I follow this, then in my GWAS using 6000 SNPs, only 10-15 SNPs are significant, and for only two traits, while I've used 28 traits in my analysis. I am confused about what to do with this. Is there any method which assures me that the FDR-adjusted p-values are correct and that all my SNPs are null?
    Thanks,
    Vinod

    • Hi Vinod,
      Sounds like you don't have strong results… The method for adjustment is the same described in Yekutieli & Benjamini (1999) and should be correct. The FDR (either the B&H threshold or the Y&B adjustment) has the property that, if there are no true effects, it controls the FWER (i.e., the family-wise error rate). This may explain why significant results were found in about 2 out of the 28 traits (~7%). However, this doesn't prove the null, it only fails to reject it, as do most other methods…
      All the best!
      Anderson

  4. Hi,
    I have some confusion regarding the results of a GWAS conducted on a diploid plant species for Na content, measured under control and under salt-imposed conditions. I got 32 highly significant (FDR-corrected) SNPs only under the control condition, while I am not getting any significant SNP under the salt-imposed condition. Does it mean that my study is not fruitful, or can I still say something about it?
    Can you researchers here help me in this regard?
    Thanks

    Vinod

  5. Hi Vinod,
    Your question sounds more biology/GWAS-specific than FDR-related. If you did everything correctly, then in principle, yes, you can say that there is no effect. But there may be other aspects of your study that need to be checked before giving up entirely, with no bearing on multiple testing.
    All the best,
    Anderson

  6. Hi,
    I have a question regarding GWAS. Suppose I am getting 5 FDR-corrected SNPs in an MLM analysis, while the same SNPs are not significant in a GLM analysis and have higher p-values. What does this mean? Is familial relationship doing something here, or what should I deduce from these results?
    Thanks,

    Vinod


  7. Hi,
    How can we calculate the FDR threshold? Is there any program that can calculate it? I am using the Nexus Copy Number program.
    Thanks for your replies.

    • Hi Tuğçe,
      Please see the "fdr.m" file in the main article. It's a function that can be used in either MATLAB or Octave, and it will give the threshold as well as the corrected and adjusted p-values. I'm not familiar with Nexus, so I can't help on that one.
      All the best,
      Anderson

  8. Is there a mistake in the equation for FDR correction?
    It should be:
    q(i) = (p(i)*N)/i
    and not
    q(i) = pow(p(i),N)/i
    or am I getting something wrong?

    • Hi Nic,
      The formula in the article is correct, i.e., there is no power involved. The “(i)” is a subscript, and the p-values are multiplied by N, not raised to N.
      It might give this impression because of the small formatting in WordPress (the blog engine), but if you hover the mouse over the equation, it shows the original LaTeX source, which should disambiguate this.
      All the best,
      Anderson

  9. I am looking for an FDR analysis that takes account of the number of values that pass the uncorrected p criterion even if none pass the B&H (1995) criterion. Suppose I have a correlation table with 100 tests, of which 20 have p-values < 0.05, though none are < 0.01. By chance, we would expect only 5 to pass the < 0.05 criterion, so the implication is that 15 of the correlations are actually significant, although we don't know which ones.

    I am looking for support for two (novel) approaches that I am not finding in the literature. One is to say that there is a 75% probability (15/20) that each of these correlations is significant at p < 0.05. The other is to suggest that the 15 with the lowest p-values are the ones most likely to actually be significant. I have not found any discussion of either of these options in the FDR literature and would appreciate an informed opinion.

    • Hi Christopher,

      I believe something along these lines may have been done before. Please see: Wilkinson B. A statistical consideration in psychological research. Psychol Bull. 1951 May;48(3):156-8.

      The paper proposes computing a p-value for the probability of observing a certain number of significant findings when there are multiple tests using a binomial expansion.
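
      For instance, for the 20-out-of-100 example above, the chance of seeing 20 or more tests with p < 0.05 when all nulls are true can be computed from the binomial tail. A quick sketch (assuming Octave/MATLAB with the statistics functions available; binocdf is in the Statistics Toolbox/package):

      N = 100; k = 20; alpha = 0.05;
      p = 1 - binocdf(k-1, N, alpha);  % P(X >= k), with X ~ Binomial(N, alpha)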

      Specifically about the two approaches you mention, and in the context of FDR, I haven’t heard anything about.

      All the best,

      Anderson

  10. Hi

    In your great explanation you say:
    Let q*… be the FDR-adjusted value … where q is the FDR-corrected

    But in your code…
    padj(i) = min(prev,pval(i)*V*cV/i);
    when you should use pcor, not pval

    Greetings

    • Hi Howard,

      Thanks for commenting. The code is fine: pcor(i) = pval(i)*V*cV/i.

      Having said that, perhaps the code could be simplified, such that that line would read instead:

      padj(i) = min(prev,pcor(i));

      since pcor is already produced a few lines above, and thus pcor(i) wouldn’t need to be computed again at the i-th iteration. Regardless, this would only make the code just a tiny bit faster, but wouldn’t change the results.

      Thanks again for the feedback!

      All the best,

      Anderson

  11. Thanks so much for this. It has demystified a few things.

    Which version is implemented in FSL? Both in PALM’s FDR correction and the standalone fdr program?

    • Hi Tobias,
      Thanks for the comments. In FSL, the option “-a” in the command “fdr” gives the adjusted results. In PALM, the option “-fdr” also gives adjusted. Hope this helps!
      All the best,
      Anderson
