Bonferroni correction

In statistics, the Bonferroni correction is a method used to counteract the problem of multiple comparisons. It is named after Italian mathematician Carlo Emilio Bonferroni for its use of Bonferroni inequalities,[1] but modern usage is often credited to Olive Jean Dunn, who described the procedure in a pair of articles written in 1959 and 1961.[2][3]

Informal introduction

A common form of frequentist statistical inference, often called null-hypothesis significance testing (NHST), rejects the null hypothesis when the probability of obtaining data at least as extreme as those observed, assuming the null hypothesis is true, is low. The problem of multiplicity arises because, as the number of hypotheses being tested increases, so does the probability of observing a rare event by chance, and therefore the probability of incorrectly rejecting at least one null hypothesis (i.e., making a Type I error).
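The growth of this error probability can be illustrated with a minimal sketch. It assumes, purely for illustration, that the m tests are independent and that every null hypothesis is true, in which case the probability of at least one Type I error is 1-(1-\alpha)^{m}. The function name fwer_uncorrected is hypothetical and not part of any library.

```python
def fwer_uncorrected(alpha: float, m: int) -> float:
    """Probability of at least one Type I error across m tests.

    Assumption (for illustration only): the m tests are independent,
    all m null hypotheses are true, and each test is run at level alpha.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    return 1.0 - (1.0 - alpha) ** m


# Show how the uncorrected error rate grows with the number of tests.
for m in (1, 5, 8, 20):
    print(m, round(fwer_uncorrected(0.05, m), 4))
```

Under these assumptions, eight uncorrected tests at \alpha = 0.05 already give roughly a one-in-three chance of at least one false rejection.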

The Bonferroni correction is based on the idea that if an experimenter is testing m hypotheses, then one way of maintaining the familywise error rate (FWER) is to test each individual hypothesis at a statistical significance level of 1/m times the desired maximum overall level.

So, if the desired significance level for the whole family of tests is \alpha, then the Bonferroni correction would test each individual hypothesis at a significance level of \alpha/m. For example, if a trial is testing m = 8 hypotheses with a desired \alpha = 0.05, then the Bonferroni correction would test each individual hypothesis at \alpha/m = 0.05/8 = 0.00625.
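The arithmetic of this example can be reproduced with a short sketch. The function name bonferroni_level is a hypothetical helper introduced here for illustration.

```python
def bonferroni_level(alpha: float, m: int) -> float:
    """Per-test significance level alpha / m for a family of m tests."""
    if m < 1:
        raise ValueError("m must be at least 1")
    return alpha / m


# The example from the text: m = 8 hypotheses, family-wide alpha = 0.05.
per_test_level = bonferroni_level(0.05, 8)
print(per_test_level)  # 0.00625
```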

Statistically significant simply means that a given result is unlikely to occur if the null hypothesis is true (i.e., no difference among groups, no effect of treatment, no relation among variables).

The practice of trying many comparisons in the hope of finding a significant one (for example, giving people a vitamin pill and then testing for many potential health improvements in the hope that the pill will appear beneficial in at least one way) is a known problem, seen particularly in poor-quality scientific research, whether it is done unintentionally or deliberately.[4] It is known as data dredging or p-hacking.[5][6]

Definition

Let H_{1},...,H_{m} be a family of null hypotheses and p_{1},...,p_{m} their corresponding p-values. Let m be the total number of null hypotheses and m_{0} the number of true null hypotheses, taken without loss of generality to be H_{1},...,H_{m_0}. The familywise error rate (FWER) is the probability of rejecting at least one true H_{i}, that is, of making at least one Type I error. The Bonferroni correction rejects the null hypothesis for every p_{i}\leq\frac{\alpha}{m}, which keeps the FWER at or below \alpha. The proof follows from Boole's inequality:

FWER = P\left\{ \bigcup_{i=1}^{m_0}\left(p_{i}\leq\frac{\alpha}{m}\right)\right\} \leq\sum_{i=1}^{m_0}\left\{P\left(p_{i}\leq\frac{\alpha}{m}\right)\right\}\leq m_{0}\frac{\alpha}{m}\leq m\frac{\alpha}{m}=\alpha

This control does not require any assumptions about dependence among the p-values.[7]
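The rejection rule in this definition can be sketched as follows. The function name bonferroni and the example p-values are hypothetical. The "adjusted" p-values min(1, m p_{i}) are a common equivalent presentation that is not described in the text above: comparing them with \alpha is intended to give the same decisions as comparing p_{i} with \alpha/m.

```python
from typing import List, Sequence, Tuple


def bonferroni(
    p_values: Sequence[float], alpha: float = 0.05
) -> Tuple[List[bool], List[float]]:
    """Apply the Bonferroni rule to a family of p-values.

    Returns (reject, adjusted): reject[i] is True when p_i <= alpha / m,
    and adjusted[i] = min(1, m * p_i) is the Bonferroni-adjusted p-value.
    """
    m = len(p_values)
    if m == 0:
        raise ValueError("at least one p-value is required")
    threshold = alpha / m
    reject = [p <= threshold for p in p_values]
    adjusted = [min(1.0, p * m) for p in p_values]
    return reject, adjusted


# Hypothetical p-values, for illustration only.
p_values = [0.001, 0.012, 0.03, 0.2]
reject, adjusted = bonferroni(p_values, alpha=0.05)
print(reject)    # [True, True, False, False] (threshold is 0.05 / 4 = 0.0125)
print(adjusted)
```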

Extensions

Generalization

Rather than testing each hypothesis at the \alpha/m level, the hypotheses may be tested at any combination of levels that add up to \alpha, provided that the level of each specific test is determined before looking at the data. For example, for two hypothesis tests, an overall \alpha of .05 could be maintained by conducting one test at .04 and the other at .01.
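A minimal sketch of this unequal allocation follows. The function name weighted_bonferroni and the example p-values are hypothetical, and the per-test levels are assumed to have been fixed before the data were examined, as the text requires.

```python
from typing import List, Sequence


def weighted_bonferroni(
    p_values: Sequence[float], levels: Sequence[float], alpha: float = 0.05
) -> List[bool]:
    """Test each hypothesis at its own pre-specified level.

    The levels must be non-negative and must sum to at most alpha
    (a small tolerance absorbs floating-point rounding).
    """
    if len(p_values) != len(levels):
        raise ValueError("p_values and levels must have the same length")
    if any(level < 0 for level in levels):
        raise ValueError("levels must be non-negative")
    if sum(levels) > alpha + 1e-12:
        raise ValueError("levels must sum to at most alpha")
    return [p <= level for p, level in zip(p_values, levels)]


# The split from the text: one test at .04 and the other at .01.
# The p-values are hypothetical.
decisions = weighted_bonferroni([0.03, 0.02], levels=[0.04, 0.01], alpha=0.05)
print(decisions)  # [True, False]
```

In this example the first hypothesis is rejected at its more generous level of .04, while the second is not rejected because its p-value exceeds .01, even though it would have passed an equal split of .025.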

Confidence intervals

The Bonferroni correction can also be used to adjust confidence intervals. If m confidence intervals are formed and an overall (simultaneous) confidence level of at least 1-\alpha is desired, each individual confidence interval can be constructed at the level 1-\frac{\alpha}{m}.
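The adjustment can be sketched for one simple case. The example below assumes normal-theory (z) intervals with known standard errors, which is an illustrative choice rather than something the text prescribes. The function name bonferroni_z_intervals and the numbers are hypothetical. NormalDist is part of the Python standard library (version 3.8 and later).

```python
from statistics import NormalDist
from typing import List, Sequence, Tuple


def bonferroni_z_intervals(
    estimates: Sequence[float], std_errors: Sequence[float], alpha: float = 0.05
) -> List[Tuple[float, float]]:
    """Two-sided z intervals, each at level 1 - alpha / m.

    Assumption (illustrative): each estimate is approximately normal
    with a known standard error.
    """
    m = len(estimates)
    if m == 0 or m != len(std_errors):
        raise ValueError("estimates and std_errors must be non-empty and equal in length")
    per_interval_alpha = alpha / m
    # Critical value for a two-sided interval at level 1 - alpha / m.
    z = NormalDist().inv_cdf(1.0 - per_interval_alpha / 2.0)
    return [(est - z * se, est + z * se) for est, se in zip(estimates, std_errors)]


# Hypothetical estimates and standard errors for m = 3 parameters.
for low, high in bonferroni_z_intervals([1.2, 0.4, -0.3], [0.5, 0.2, 0.1]):
    print(round(low, 3), round(high, 3))
```

Each adjusted interval is wider than its unadjusted counterpart, because the critical value grows from about 1.96 for a single 95% interval to about 2.39 when three intervals share \alpha = 0.05.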

Alternatives

There are alternative ways to control the familywise error rate. For example, the Holm–Bonferroni method and the Šidák correction are universally more powerful procedures than the Bonferroni correction, meaning that they are always at least as powerful. However, unlike the Bonferroni procedure, these methods do not control the per-family Type I error rate (the expected number of Type I errors per family).[8]
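The two alternatives named above can be sketched briefly. The function names holm_bonferroni and sidak_level and the example p-values are hypothetical. The Holm step-down rule compares the k-th smallest p-value with \alpha/(m-k+1) and stops at the first failure. The Šidák per-test level 1-(1-\alpha)^{1/m} is usually derived under an assumption of independent tests, which is not stated in the text above.

```python
from typing import List, Sequence


def holm_bonferroni(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Holm step-down procedure; returns decisions in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is compared with alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # Stop at the first non-rejection; all larger p-values are retained.
    return reject


def sidak_level(alpha: float, m: int) -> float:
    """Sidak per-test level (assumes independent tests)."""
    if m < 1:
        raise ValueError("m must be at least 1")
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


# Hypothetical p-values, for illustration only.
p_values = [0.02, 0.2, 0.001, 0.013]
holm = holm_bonferroni(p_values, alpha=0.05)
plain = [p <= 0.05 / len(p_values) for p in p_values]
print(holm)   # [True, False, True, True]
print(plain)  # [False, False, True, False]
print(sidak_level(0.05, 8))
```

With these p-values the plain Bonferroni rule rejects one hypothesis while the Holm procedure rejects three, and the Šidák level for m = 8 is slightly larger than the Bonferroni level of 0.00625.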

Criticisms

The Bonferroni correction can be somewhat conservative if there are a large number of tests and/or the test statistics are positively correlated. The correction also comes at the cost of increasing the probability of producing false negatives, and consequently reducing statistical power.
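The effect of positive correlation can be seen in an idealized extreme case, offered here only as an illustration. Assume that all m null hypotheses are true and that the tests are perfectly dependent, so that every test yields the same p-value p, uniformly distributed on [0, 1] under the null hypothesis. The union of the m rejection events then collapses to a single event:

```latex
\mathrm{FWER} = P\left\{ \bigcup_{i=1}^{m}\left(p_{i}\leq\frac{\alpha}{m}\right)\right\} = P\left(p\leq\frac{\alpha}{m}\right) = \frac{\alpha}{m} \leq \alpha
```

With \alpha = 0.05 and m = 8, the realized familywise error rate in this case would be 0.00625, well below the nominal 0.05.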

Another criticism concerns the concept of a family of hypotheses. There is no definitive consensus on how to define a family in all cases. Because there is no standard definition, test results may change dramatically simply as a result of changing how the hypotheses are grouped into families.

All of these criticisms, however, apply to adjustments for multiple comparisons in general, and are not specific to the Bonferroni correction.

References

  1. Bonferroni, C. E. (1936). "Teoria statistica delle classi e calcolo delle probabilità". Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commerciali di Firenze.
  2. Dunn, Olive Jean (1959). "Estimation of the Medians for Dependent Variables". Annals of Mathematical Statistics 30 (1): 192–197. doi:10.1214/aoms/1177706374. JSTOR 2237135.
  3. Dunn, Olive Jean (1961). "Multiple Comparisons Among Means" (PDF). Journal of the American Statistical Association 56 (293): 52–64. doi:10.1080/01621459.1961.10482090.
  4. Young, S. S., Karr, A. (2011). "Deming, data and observational studies" (PDF). Significance 8 (3).
  5. Smith, G. D., Shah, E. (2002). "Data dredging, bias, or confounding". BMJ 325 (7378): 1437–1438. doi:10.1136/bmj.325.7378.1437. PMC 1124898. PMID 12493654.
  6. Bohannon, John. "I Fooled Millions Into Thinking Chocolate Helps Weight Loss. Here's How". io9. Gawker Media. Retrieved 5 April 2016.
  7. Goeman, Jelle J.; Solari, Aldo (2014). "Multiple Hypothesis Testing in Genomics". Statistics in Medicine 33 (11). doi:10.1002/sim.6082.
  8. Frane, Andrew (2015). "Are per-family Type I error rates relevant in social and behavioral science?". Journal of Modern Applied Statistical Methods 14 (1): 12–23.

This article is issued from Wikipedia - version of the Friday, April 15, 2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.