Cramér's V

In statistics, Cramér's V (sometimes referred to as Cramér's phi and denoted as φc) is a measure of association between two nominal variables, giving a value between 0 and +1 (inclusive). It is based on Pearson's chi-squared statistic and was published by Harald Cramér in 1946.[1]

Usage and interpretation

φc is the intercorrelation of two discrete variables[2] and may be used with variables having two or more levels. φc is a symmetrical measure: it does not matter which variable is placed in the columns and which in the rows. The order of the rows and columns does not matter either, so φc may be used with nominal data types or higher (ordered, numerical, etc.).

Cramér's V may also be applied to goodness-of-fit chi-squared models when there is a 1×k table (i.e. r = 1). In this case k is taken as the number of possible outcomes, and V functions as a measure of tendency towards a single outcome.
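The 1×k case can be illustrated with a minimal Python sketch. It rests on two assumptions that the text does not spell out: the expected frequencies are taken to be equal across the k outcomes, and the chi-squared statistic is divided by n(k − 1). The function name goodness_of_fit_v is hypothetical.

```python
import math

def goodness_of_fit_v(counts):
    """Cramér's V for a 1 x k table of observed counts.

    Assumption (not from the article): equal expected frequencies n / k,
    and normalisation of chi-squared by n * (k - 1).
    """
    k = len(counts)            # number of possible outcomes
    n = sum(counts)            # sample size
    expected = n / k           # same expected count for every outcome
    chi2 = sum((observed - expected) ** 2 / expected for observed in counts)
    return math.sqrt(chi2 / (n * (k - 1)))

# All observations in one outcome: strongest tendency towards a single outcome.
print(goodness_of_fit_v([12, 0, 0]))
# Observations spread evenly: no tendency towards any outcome.
print(goodness_of_fit_v([4, 4, 4]))
```

Under these assumptions the value is 1 when every observation falls in the same outcome and 0 when the observations are spread evenly.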

Cramér's V varies from 0 (corresponding to no association between the variables) to 1 (complete association) and can reach 1 only when the two variables are equal to each other.

φc² is the mean square canonical correlation between the variables.

In the case of a 2×2 contingency table Cramér's V is equal to the Phi coefficient.
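To see why, note that a 2×2 table has r = k = 2, so the divisor min(k − 1, r − 1) in the formula given under Calculation equals 1 and

V = \sqrt{\frac{\chi^2/n}{1}} = \sqrt{\varphi^2} = |\varphi|

Because the phi coefficient can be negative while V cannot, the equality holds for the absolute value of phi.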

Because chi-squared values tend to increase with the number of cells, the greater the difference between the number of rows r and the number of columns k, the more likely it is that φc will tend to 1 without strong evidence of a meaningful correlation.

V may be viewed as the association between two variables as a percentage of their maximum possible variation.

Calculation

Let a sample of size n of the jointly distributed variables A and B, for i=1,\ldots,r; j=1,\ldots,k, be given by the frequencies

n_{ij}= number of times the values (A_i,B_j) were observed.

The chi-squared statistic then is:

\chi^2=\sum_{i,j}\frac{(n_{ij}-\frac{n_{i.}n_{.j}}{n})^2}{\frac{n_{i.}n_{.j}}{n}}

Cramér's V is computed by dividing the chi-squared statistic by the sample size and by the smaller of k − 1 and r − 1, and then taking the square root:

V = \sqrt{\frac{\varphi^2}{\min(k - 1,r-1)}} = \sqrt{ \frac{\chi^2/n}{\min(k - 1,r-1)}} 

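where \varphi^2 = \chi^2/n is the squared phi coefficient, \chi^2 is the Pearson chi-squared statistic given above, n is the total number of observations, r is the number of categories of A (rows) and k is the number of categories of B (columns).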

The p-value for the significance of V is the same one that is calculated using the Pearson's chi-squared test.

The formula for the variance of V is known.[3]

In R, the function cramersV() from the lsr package calculates V using the chisq.test function from the stats package.[4]
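For readers not using R, the calculation in this section can be sketched in plain Python. The function names chi_squared and cramers_v and the example table are illustrative assumptions, not taken from any package. The sketch also assumes that every row and column total is positive, so that no expected frequency is zero.

```python
import math

def chi_squared(table):
    """Return (chi-squared statistic, sample size n) for an r x k table.

    `table` is a list of r rows, each a list of k observed counts n_ij.
    """
    row_totals = [sum(row) for row in table]          # n_i.
    col_totals = [sum(col) for col in zip(*table)]    # n_.j
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n

def cramers_v(table):
    """Cramér's V: sqrt((chi2 / n) / min(k - 1, r - 1))."""
    r = len(table)
    k = len(table[0])
    chi2, n = chi_squared(table)
    return math.sqrt((chi2 / n) / min(k - 1, r - 1))

# Hypothetical 2 x 3 table of counts, used only for illustration.
example = [[10, 20, 30],
           [20, 20, 10]]
print(cramers_v(example))
# Swapping rows and columns leaves V unchanged up to rounding (symmetry).
print(cramers_v([list(col) for col in zip(*example)]))
```

A table with perfect association, such as [[5, 0], [0, 5]], gives V = 1, and a table whose rows are proportional to each other gives V = 0.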

Bias correction

Cramér's V can be a heavily biased estimator of its population counterpart and will tend to overestimate the strength of association.[5] A 2013 paper[5] proposes the following simple and effective bias correction. Using the above notation, let

\tilde V = \sqrt{\frac{\tilde\varphi^2}{\min(k - 1,r-1)}}  

where

 \tilde\varphi^2 = \max\left(0,\varphi^2 - \frac{(k-1)(r-1)}{n-1}\right)  

Then \tilde V estimates the same population quantity as Cramér's V but with typically much smaller mean squared error. The rationale for the correction is that under independence, E[\varphi^2]=\frac{(k-1)(r-1)}{n-1}.[6]
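The correction as written in this section can be sketched in Python as follows. The function name corrected_cramers_v and the numbers in the example are hypothetical. The sketch implements only the two formulas given here and takes the chi-squared statistic, sample size and table dimensions as inputs.

```python
import math

def corrected_cramers_v(chi2, n, r, k):
    """Bias-corrected V following the two formulas in this section.

    chi2 : Pearson chi-squared statistic of the r x k table
    n    : sample size (must exceed 1)
    r, k : number of rows and columns (each at least 2)
    """
    phi2 = chi2 / n
    # Subtract the expected value of phi^2 under independence, floored at 0.
    phi2_tilde = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    return math.sqrt(phi2_tilde / min(k - 1, r - 1))

# Hypothetical inputs: a 3 x 4 table with n = 200 and chi-squared = 18.
uncorrected = math.sqrt((18.0 / 200) / min(4 - 1, 3 - 1))
corrected = corrected_cramers_v(18.0, 200, 3, 4)
print(uncorrected, corrected)
```

Because a non-negative term is subtracted before the square root is taken, the corrected value never exceeds the uncorrected one, and it is exactly 0 whenever \varphi^2 falls below its expectation under independence.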

References

  1. Cramér, Harald (1946). Mathematical Methods of Statistics. Princeton: Princeton University Press, p. 282. ISBN 0-691-08004-6.
  2. Sheskin, David J. (1997). Handbook of Parametric and Nonparametric Statistical Procedures. Boca Raton, FL: CRC Press.
  3. Liebetrau, Albert M. (1983). Measures of Association. Newbury Park, CA: Sage Publications. Quantitative Applications in the Social Sciences Series No. 32, pp. 15–16.
  4. http://artax.karlin.mff.cuni.cz/r-help/library/lsr/html/cramersV.html
  5. Bergsma, Wicher (2013). A bias correction for Cramér's V and Tschuprow's T. Journal of the Korean Statistical Society 42: 323–328.
  6. Bartlett, Maurice S. (1937). Properties of sufficiency and statistical tests. Proceedings of the Royal Society of London, Series A: 268–282.

This article is issued from Wikipedia - version of the Thursday, February 18, 2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.