Šidák correction for t-test

One application of Student's t-test is to test the location of a single sequence of independent and identically distributed random variables. If we want to test the locations of multiple such sequences simultaneously, the Šidák correction should be applied in order to calibrate the level of the Student's t-test. Moreover, if we want to test the locations of a number of sequences that grows without bound, the Šidák correction can still be used, but with caution. More specifically, the validity of the Šidák correction then depends on how fast the number of sequences goes to infinity.

Introduction to the Šidák correction

Suppose we are interested in m different null hypotheses,  H_{1},...,H_{m} , and would like to check whether all of them are true. The hypothesis testing scheme then becomes

 H_{null} : all of the  H_{i}  are true;
 H_{alternative} : at least one of the  H_{i}  is false.

Let  \alpha be the level of this test, that is, the probability that we falsely reject  H_{null} when it is in fact true. We aim to design a test with level  \alpha . Suppose that, when testing each null hypothesis  H_{i}, the test statistic we use is  t_{i}. If these  t_{i} are independent, then a test for  H_{null} can be constructed by the following procedure, known as the Šidák correction.

Step 1: test each of the m null hypotheses at level  1-(1-\alpha)^\frac{1}{m} .
Step 2: if any of these m null hypotheses is rejected, reject  H_{null} .
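
As a simple illustration, the following sketch (in Python; the p-values p_1, ..., p_m of the individual tests are hypothetical inputs assumed to be available) carries out the two steps above:

def sidak_reject(p_values, alpha=0.05):
    """Šidák correction: reject H_null if any individual p-value falls
    below the per-test level 1 - (1 - alpha)**(1/m)."""
    m = len(p_values)
    per_test_level = 1 - (1 - alpha) ** (1 / m)        # Step 1: per-test level
    return any(p < per_test_level for p in p_values)   # Step 2: reject H_null if any test rejects

For example, with m = 10 and  \alpha = 0.05 the per-test level is  1-0.95^{1/10} \approx 0.0051 , slightly larger than the Bonferroni level 0.05/10 = 0.005.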

Šidák correction for finitely many t-tests

Suppose  Y_{ij}=\mu_{i}+\epsilon_{ij}, \ i=1,...,N, \ j=1,...,n, where, for each i, the errors  \epsilon_{i1},...,\epsilon_{in} are independent and identically distributed; for each j, the errors  \epsilon_{1j},...,\epsilon_{Nj} are independent but not necessarily identically distributed; and each  \epsilon_{ij} has a finite fourth moment.

Our goal is to design a test for  H_{null}: \mu_{i}=0, \forall i=1,...,N with level  \alpha . This test can be based on the t-statistic of each sequence, that is,

 t_{i}=\frac{\bar{Y}_{i}}{S_{i}/\sqrt{n}} , where
 \bar{Y}_{i}=\frac{1}{n}\sum_{j=1}^{n}Y_{ij} ,  S_{i}^{2}=\frac{1}{n}\sum_{j=1}^{n}(Y_{ij}-\bar{Y}_{i})^{2} .
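
A direct transcription of these statistics (a minimal sketch in Python; note that the variance  S_{i}^{2} uses the 1/n divisor exactly as defined above):

def t_statistic(y):
    """t-statistic of one sequence Y_i1, ..., Y_in, following the
    definitions above (mean over n, variance with the 1/n divisor)."""
    n = len(y)
    y_bar = sum(y) / n                               # sample mean \bar{Y}_i
    s2 = sum((yj - y_bar) ** 2 for yj in y) / n      # S_i^2 with the 1/n divisor
    return y_bar / ((s2 ** 0.5) / (n ** 0.5))        # t_i = \bar{Y}_i / (S_i / sqrt(n))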

Using the Šidák correction, we reject  H_{null} if any of the t-tests based on the t-statistics above rejects at level  1-(1-\alpha)^{1/N}. More specifically, we reject  H_{null} when

 \exists i=1,...,N, |t_{i}|> \zeta_{\alpha,N} , where
 P(|Z|>\zeta_{\alpha,N})=1-(1-\alpha)^{1/N},  Z\sim N(0,1)
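
Putting the pieces together, the rejection rule can be sketched as follows (Python; the critical value  \zeta_{\alpha,N} is obtained from the standard normal quantile function via statistics.NormalDist, and t_statistic is the helper sketched above):

from statistics import NormalDist

def sidak_critical_value(alpha, N):
    """Critical value zeta such that P(|Z| > zeta) = 1 - (1 - alpha)**(1/N)."""
    per_test_level = 1 - (1 - alpha) ** (1 / N)
    return NormalDist().inv_cdf(1 - per_test_level / 2)

def reject_global_null(t_stats, alpha=0.05):
    """Reject H_null when any |t_i| exceeds the Šidák-corrected critical value."""
    zeta = sidak_critical_value(alpha, len(t_stats))
    return any(abs(t) > zeta for t in t_stats)

For N = 100 and  \alpha = 0.05 the critical value is roughly 3.5, compared with about 1.96 for a single two-sided test at level 0.05.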

The test defined above has asymptotic level α, because

 \text{level} = P_{null}(\text{reject } H_{null}) = P_{null}(\exists i=1,...,N, |t_{i}|>\zeta_{\alpha,N})
 = 1-P_{null}(\forall i=1,...,N, |t_{i}|\leq\zeta_{\alpha,N}) = 1-\prod_{i=1}^{N}P_{null}(|t_{i}|\leq\zeta_{\alpha,N}) \text{ (by independence of the } t_{i})
 \rightarrow 1-\prod_{i=1}^{N}P(|Z_{i}|\leq\zeta_{\alpha,N}), \text{ where } Z_{i}\sim N(0,1)
 = 1-\left((1-\alpha)^{1/N}\right)^{N} = \alpha
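
The asymptotic level can also be checked empirically. The following Monte Carlo sketch (Python; standard normal errors and the illustrative values N = 10, n = 500 are assumptions made here for convenience) estimates the true level under  H_{null} , which should be close to  \alpha for large n:

import random
from statistics import NormalDist

def simulate_level(N=10, n=500, alpha=0.05, reps=1000, seed=0):
    """Estimate the true level of the Šidák-corrected t-test by simulating
    data under H_null (all mu_i = 0) with standard normal errors."""
    rng = random.Random(seed)
    per_test_level = 1 - (1 - alpha) ** (1 / N)
    zeta = NormalDist().inv_cdf(1 - per_test_level / 2)   # zeta_{alpha,N}
    rejections = 0
    for _ in range(reps):
        for _i in range(N):
            y = [rng.gauss(0, 1) for _ in range(n)]
            y_bar = sum(y) / n
            s = (sum((v - y_bar) ** 2 for v in y) / n) ** 0.5
            if abs(y_bar / (s / n ** 0.5)) > zeta:         # |t_i| > zeta_{alpha,N}
                rejections += 1
                break                                      # H_null rejected for this replication
    return rejections / reps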

Šidák correction for infinitely many t-tests

In some cases the number of sequences,  N , increases as the data size of each sequence,  n , increases. In particular, suppose  N(n)\rightarrow \infty \text{ as } n \rightarrow \infty . In this case we need to test a null hypothesis comprising infinitely many hypotheses, that is,

  H_{null}: \text{ all of } H_{i}  \text{ are true, } i=1,2,....

To design a test, the Šidák correction may be applied, as in the case of finitely many t-tests. However, when  N(n)\rightarrow \infty \text{ as } n\rightarrow \infty , the Šidák-corrected t-test may not achieve the level we want; that is, the true level of the test may not converge to the nominal level  \alpha as n goes to infinity. This result is related to high-dimensional statistics and was proven by Fan, Hall and Yao (2007).[1] Specifically, if we want the true level of the test to converge to the nominal level  \alpha , we need a restraint on how fast  N(n)\rightarrow \infty . Indeed, Fan, Hall and Yao (2007) show that the convergence holds provided  \log N(n)=o(n^{1/3}) , equivalently  N(n)=o(e^{n^{1/3}}) .
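
For example, under this growth condition, polynomially many sequences are always acceptable: if  N(n)=n^{2} , then  \log N(n)=2\log n=o(n^{1/3}) . By contrast, exponential growth such as  N(n)=e^{n} violates the condition.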

The results above are based on the Central Limit Theorem. According to the Central Limit Theorem, each of the t-statistics  t_{i} has an asymptotically standard normal distribution, so the difference between the distribution of each  t_{i} and the standard normal distribution is asymptotically negligible. The question is: if we aggregate the differences between the distribution of each  t_{i} and the standard normal distribution over all i, is this aggregate difference still asymptotically negligible?

When we have finitely many  t_{i} , the answer is yes. But when we have infinitely many  t_{i} , the answer can become no, because in the latter case we are summing up infinitely many infinitesimal terms. If the number of terms goes to infinity too fast, that is, if  N(n) \rightarrow \infty too fast, then the aggregate difference may not vanish, the joint distribution of the t-statistics cannot be approximated by independent standard normal distributions, the true level does not converge to the nominal level  \alpha , and the Šidák correction fails.
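
A rough way to quantify this argument (a simplified sketch, assuming for simplicity that every per-test probability has the same error  \delta_{n} ): suppose  P_{null}(|t_{i}|\leq\zeta_{\alpha,N})=(1-\alpha)^{1/N}+\delta_{n} with  \delta_{n}\rightarrow 0 by the Central Limit Theorem. Then

 \text{level} = 1-\left((1-\alpha)^{1/N}+\delta_{n}\right)^{N} \approx 1-(1-\alpha)\,e^{N\delta_{n}(1-\alpha)^{-1/N}} ,

which converges to  \alpha if and only if  N\delta_{n}\rightarrow 0 : each individual error is negligible, but their N-fold aggregation need not be when  N(n) grows too fast.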


Notes

  1. Fan, Jianqing; Hall, Peter; Yao, Qiwei (2007). "To How Many Simultaneous Hypothesis Tests Can Normal, Student's t or Bootstrap Calibration Be Applied?". Journal of the American Statistical Association 102 (480): 1282–1288.

