01.Association12.Multiple comparisons

1. Use cases: in which situations should I use this method?

  1. Multiple comparison adjustments arise from the concern - in frequentist statistics - that running multiple tests will lead to spurious findings of statistical significance [1]. Despite the concern, studies often do not make that adjustment [2].

  2. From a frequentist perspective, multiple comparisons are usually addressed through statistical adjustments such as the conservative Bonferroni correction or the False Discovery Rate (FDR). From a Bayesian perspective, this correction is usually not a concern, and multiple tests can often be run without explicit adjustment, for example when estimates are partially pooled in multilevel models [3] and [4]. The FDR is the expected proportion of false positives (type I errors) among the tests called significant. Usually, we keep the false positive rate (FPR), i.e. the significance threshold, at 5% (alpha = 0.05), which means we accept that 5% of true null hypotheses will be incorrectly rejected. While this rate is acceptable for a single test, when many tests are run on the same data it can translate into a large number of false positives, as illustrated in the simulation sketch after this list.

  3. Yet another approach is Empirical Bayes, where the prior is estimated from the data.
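
A minimal simulation sketch in R (all names and numbers are illustrative assumptions, not part of the SporeData codebase) showing how false positives accumulate when many tests are run at the conventional 5% level, even when every null hypothesis is true:

```r
set.seed(42)

n_tests <- 100   # number of hypothesis tests run in the same project
alpha   <- 0.05  # per-test false positive rate

# Simulate p-values under the null hypothesis (uniformly distributed)
p_values <- runif(n_tests)

# Unadjusted testing: roughly alpha * n_tests spurious "discoveries" expected
sum(p_values < alpha)

# Probability of at least one false positive across all independent tests
1 - (1 - alpha)^n_tests  # about 0.99 for 100 tests
```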

2. Input: what kind of data does the method require?

  1. A single project in which more than one analysis has been conducted, often without a specific a priori hypothesis.

3. Algorithm: how does the method work?

Although we rarely use frequentist models at SporeData, there are methods to minimize the impact of multiple comparisons [5].

Model mechanics

Describing in words

  • Bonferroni adjustments divide the alpha level (the significance threshold, not the p-value itself) by the number of tested hypotheses, so each individual test is held to a stricter threshold (see the R sketch after this list).
  • FDR control is less strict than Bonferroni (which controls the familywise error rate, or FWER); its principle is to keep the FDR below a pre-specified threshold. The most widely used procedure is the Benjamini-Hochberg method: rank the p-values in ascending order and compare each one to its rank divided by the total number of tests, multiplied by the chosen alpha value (the target false discovery rate); all p-values up to the largest one meeting this criterion are declared significant. FDR thresholds up to 0.15 may be used depending on the study purpose (e.g., exploratory transcriptome studies).
  • Empirical Bayes is a frequentist method where the prior distribution is derived from the data. Despite criticisms, the method has gained popularity for exploratory analyses with a large number of tests [6]. This popularity stems from the empirical prior being quick to set up compared with full Bayesian multilevel models [7].
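
A minimal sketch of both adjustments using base R's `p.adjust()` (the p-value vector here is simulated purely for illustration):

```r
set.seed(42)
p_values <- runif(100)  # illustrative p-values; replace with your own

# Bonferroni: equivalent to comparing each raw p-value against alpha / m
p_bonferroni <- p.adjust(p_values, method = "bonferroni")

# Benjamini-Hochberg: compares the i-th ranked p-value against (i / m) * alpha
p_bh <- p.adjust(p_values, method = "BH")

# Discoveries at alpha = 0.05 under each criterion
sum(p_bonferroni < 0.05)  # FWER control (stricter)
sum(p_bh < 0.05)          # FDR control (less strict)
```
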
Data science packages
  • fdrtool [8] (see the usage sketch after this list).
  • Multiple R packages for FDR [9].
  • ebbr [10].
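
A hedged usage sketch for `fdrtool` (the input vector is simulated; in practice you would pass the p-values from your own analyses):

```r
# install.packages("fdrtool")  # available on CRAN, if not already installed
library(fdrtool)

set.seed(42)
p_values <- runif(500)  # placeholder p-values for illustration

# Estimate tail-area (q-values) and local false discovery rates from p-values
fdr_fit <- fdrtool(p_values, statistic = "pvalue", plot = FALSE)

head(fdr_fit$qval)  # tail-area FDR (q-values)
head(fdr_fit$lfdr)  # local FDR
```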

Suggested companion methods

Learning materials

  1. Books
  2. Articles combining theory and scripts

4. Output: how do I interpret this method's results?

Reporting guidelines

5. SporeData-specific

Data science functions

  • sdatools::parallelCoordinatePlot

References

[1] Dossett LA, Kaji AH, Dimick JB. Practical guide to mixed methods. JAMA Surgery. 2020 Mar 1;155(3):254-5.
[2] Odutayo A, Gryaznov D, Copsey B, Monk P, Speich B, Roberts C, Vadher K, Dutton P, Briel M, Hopewell S, Altman DG. Design, analysis and reporting of multi-arm trials and strategies to address multiple testing. International Journal of Epidemiology. 2020 Mar 16.
[3] Gelman A, Hill J, Yajima M. Why we (usually) don't have to worry about multiple comparisons. Journal of Research on Educational Effectiveness. 2012 Apr 1;5(2):189-211.
[4] Models with grouped outcomes
[5] Yadav K, Lewis RJ. Gatekeeping strategies for avoiding false-positive results in clinical trials with many comparisons. JAMA. 2017 Oct 10;318(14):1385-6.
[6] Gelman A. Objections to Bayesian statistics. Bayesian Analysis. 2008;3(3):445-9.
[7] van de Wiel MA, Te Beest DE, Münch MM. Learning from a lot: Empirical Bayes for high‐dimensional model‐based prediction. Scandinavian Journal of Statistics. 2019 Mar;46(1):2-5.
[8] Klaus B, Strimmer K. fdrtool: Estimation of (local) false discovery rates and higher criticism. R package version. 2015;1:15.
[9] False Discovery Rate Analysis in R
[10] Introducing the ebbr package for empirical Bayes estimation.
