Somers' D
In statistics, Somers’ D, sometimes incorrectly referred to as Somer’s D, is a measure of ordinal association between two variables $X$ and $Y$. Somers’ D takes values between $-1$, when all pairs of the variables disagree, and $1$, when all pairs of the variables agree. Somers’ D is named after R. H. Somers, who proposed it in 1962.[1]
Somers’ D plays a central role in rank statistics and is the parameter behind many nonparametric methods.[2] It is also used as a quality measure of logistic regressions and credit scoring models.
Somers’ D for a sample
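We say that two pairs $(x_i, y_i)$ and $(x_j, y_j)$ are concordant if the ranks of both elements agree, that is, if $x_i > x_j$ and $y_i > y_j$, or if $x_i < x_j$ and $y_i < y_j$. We say that two pairs $(x_i, y_i)$ and $(x_j, y_j)$ are discordant if the ranks of both elements disagree, that is, if $x_i > x_j$ and $y_i < y_j$, or if $x_i < x_j$ and $y_i > y_j$. If $x_i = x_j$ or $y_i = y_j$, the pair is neither concordant nor discordant.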
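Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be a set of observations of two possibly dependent random variables $X$ and $Y$. Define the Kendall tau rank correlation coefficient $\tau$ as
$$\tau(X, Y) = \frac{N_C - N_D}{n(n-1)/2},$$
where $N_C$ is the number of concordant pairs and $N_D$ is the number of discordant pairs. Somers’ D of $Y$ with respect to $X$ is defined as $D_{YX} = \tau(X, Y) / \tau(X, X)$.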
Note that Kendall's tau is symmetric in $X$ and $Y$, whereas Somers’ D is asymmetric in $X$ and $Y$.
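The sample definitions above can be sketched in a few lines of Python. This is only an illustration, not a reference implementation: the function names `sign`, `kendall_tau_a`, and `somers_d_yx` are hypothetical, the data are made up, and the quadratic loop over all pairs is chosen for clarity rather than speed.

```python
from itertools import combinations


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_tau_a(xs, ys):
    """Kendall's tau as (N_C - N_D) / (n(n-1)/2), with no tie correction."""
    n = len(xs)
    # Each pair contributes +1 if concordant, -1 if discordant, 0 if tied.
    total = sum(
        sign(xs[i] - xs[j]) * sign(ys[i] - ys[j])
        for i, j in combinations(range(n), 2)
    )
    return total / (n * (n - 1) / 2)


def somers_d_yx(xs, ys):
    """Somers' D of ys with respect to xs: tau(X, Y) / tau(X, X).

    Raises ZeroDivisionError if every value in xs is the same,
    because then no pair has unequal x values.
    """
    return kendall_tau_a(xs, ys) / kendall_tau_a(xs, xs)


# Made-up data: x has ties, y does not.
x = [1, 1, 1, 2]
y = [1, 2, 3, 4]

print(kendall_tau_a(x, y))  # symmetric in its arguments
print(somers_d_yx(x, y))    # D of y with respect to x
print(somers_d_yx(y, x))    # D of x with respect to y
```

In this toy data set the three tied values of `x` shrink the denominator for the first Somers’ D but not for the second, so the two directions give different values while Kendall's tau is the same either way.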
Somers’ D for a distribution
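Let two bivariate random variables $(X_1, Y_1)$ and $(X_2, Y_2)$ be drawn independently from the same probability distribution $P_{XY}$. Again, Somers’ D can be defined through Kendall's tau
$$\tau(X, Y) = \operatorname{E}\bigl[\operatorname{sgn}(X_1 - X_2)\operatorname{sgn}(Y_1 - Y_2)\bigr] = P\bigl(\operatorname{sgn}(X_1 - X_2)\operatorname{sgn}(Y_1 - Y_2) = 1\bigr) - P\bigl(\operatorname{sgn}(X_1 - X_2)\operatorname{sgn}(Y_1 - Y_2) = -1\bigr),$$
or the difference between the probabilities of concordance and discordance. Somers’ D of $Y$ with respect to $X$ is defined as $D_{YX} = \tau(X, Y) / \tau(X, X)$. Thus, $D_{YX}$ is the difference between the two corresponding probabilities, conditional on the $X$ values not being equal. If $X$ has a continuous CDF, then $\tau(X, X) = 1$ and Kendall's tau and Somers’ D coincide. Somers’ D normalizes Kendall's tau for possible mass points of variable $X$.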
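The conditioning statement can be checked with a short derivation. The sketch below assumes the notation $(X_1, Y_1)$, $(X_2, Y_2)$ for the two independent draws and writes Kendall's tau as an expectation of a product of signs; the labels "concordant" and "discordant" stand for the events that the sign product equals $1$ and $-1$ respectively.

```latex
\begin{aligned}
\tau(X, Y) &= \operatorname{E}\bigl[\operatorname{sgn}(X_1 - X_2)\,\operatorname{sgn}(Y_1 - Y_2)\bigr]
            = P(\text{concordant}) - P(\text{discordant}), \\
\tau(X, X) &= \operatorname{E}\bigl[\operatorname{sgn}(X_1 - X_2)^2\bigr] = P(X_1 \neq X_2), \\
D_{YX} &= \frac{\tau(X, Y)}{\tau(X, X)}
        = \frac{P(\text{concordant}) - P(\text{discordant})}{P(X_1 \neq X_2)} \\
       &= P(\text{concordant} \mid X_1 \neq X_2) - P(\text{discordant} \mid X_1 \neq X_2).
\end{aligned}
```

The last step uses the fact that a concordant or discordant pair always has unequal first components, so dividing by $P(X_1 \neq X_2)$ is the same as conditioning on that event.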
If $X$ and $Y$ are both binary with values 0 and 1, then Somers’ D is the difference between two probabilities:
$$D_{YX} = P(Y = 1 \mid X = 1) - P(Y = 1 \mid X = 0).$$
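The binary case can be checked numerically on a small made-up sample, treating sample frequencies as the probabilities. In the sketch below, `somers_d_yx` and `risk_difference` are hypothetical helper names: the first counts concordant and discordant pairs directly, the second computes the difference of the two conditional frequencies.

```python
from itertools import combinations


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def somers_d_yx(xs, ys):
    """(N_C - N_D) divided by the number of pairs with unequal x values."""
    pairs = list(combinations(range(len(xs)), 2))
    numerator = sum(sign(xs[i] - xs[j]) * sign(ys[i] - ys[j]) for i, j in pairs)
    denominator = sum(1 for i, j in pairs if xs[i] != xs[j])
    return numerator / denominator


def risk_difference(xs, ys):
    """Share of y == 1 among x == 1 minus share of y == 1 among x == 0."""
    y_when_x1 = [y for x, y in zip(xs, ys) if x == 1]
    y_when_x0 = [y for x, y in zip(xs, ys) if x == 0]
    return sum(y_when_x1) / len(y_when_x1) - sum(y_when_x0) / len(y_when_x0)


# Made-up binary data: four observations with x = 1, four with x = 0.
xs = [1, 1, 1, 1, 0, 0, 0, 0]
ys = [1, 1, 1, 0, 1, 0, 0, 0]

print(somers_d_yx(xs, ys))      # pair-counting definition
print(risk_difference(xs, ys))  # difference of conditional frequencies
```

Both calls return the same number for this sample, as the identity predicts.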
Somers’ D for logistic regression
Several statistics can be used to measure the quality of logistic regressions: AUC or c-statistic, Goodman and Kruskal's gamma, Kendall's tau (Tau-a), Somers’ D, etc. Somers’ D is probably the most widely used of the available rank order correlation statistics.[3] For $X$ being the predicted probability of the outcome and $Y$ being the outcome, Somers’ D for logistic regression can be rewritten as
$$D_{XY} = \frac{N_C - N_D}{N_C + N_D + N_T},$$
where $N_T$ is the number of pairs tied on variable $X$ but not on $Y$.
In logistic regressions, Somers’ D is related to the well-known area under the receiver operating characteristic curve (AUC): $\mathrm{AUC} = (D_{XY} + 1)/2$.
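The pair-counting form of Somers’ D for a binary outcome, and its link to the area under the ROC curve, can be illustrated as follows. This is a minimal sketch with made-up scores. The names `pair_counts`, `somers_d_xy`, and `auc_from_pairs` are hypothetical, and the AUC is computed with the common convention that a tied pair counts as half a concordant pair.

```python
def pair_counts(probs, outcomes):
    """Count concordant, discordant, and score-tied pairs.

    Only pairs with different outcomes (one 1, one 0) are compared,
    so every counted pair is untied on the outcome.
    """
    pos = [p for p, y in zip(probs, outcomes) if y == 1]
    neg = [p for p, y in zip(probs, outcomes) if y == 0]
    n_c = n_d = n_t = 0
    for p1 in pos:
        for p0 in neg:
            if p1 > p0:
                n_c += 1      # higher score for the positive outcome
            elif p1 < p0:
                n_d += 1      # higher score for the negative outcome
            else:
                n_t += 1      # tied on the score
    return n_c, n_d, n_t


def somers_d_xy(probs, outcomes):
    """(N_C - N_D) / (N_C + N_D + N_T)."""
    n_c, n_d, n_t = pair_counts(probs, outcomes)
    return (n_c - n_d) / (n_c + n_d + n_t)


def auc_from_pairs(probs, outcomes):
    """AUC as the share of concordant pairs, ties counted as one half."""
    n_c, n_d, n_t = pair_counts(probs, outcomes)
    return (n_c + 0.5 * n_t) / (n_c + n_d + n_t)


# Made-up predicted probabilities and observed binary outcomes.
probs = [0.9, 0.8, 0.4, 0.7, 0.4, 0.2]
outcomes = [1, 1, 1, 0, 0, 0]

d = somers_d_xy(probs, outcomes)
print(d)                                # Somers' D
print(auc_from_pairs(probs, outcomes))  # AUC from pair counts
print((d + 1) / 2)                      # AUC recovered from Somers' D
```

The last two printed values agree, which is the relation between Somers’ D and the AUC in the case of a binary outcome.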
References
- Somers, R. H. (1962). "A new asymmetric measure of association for ordinal variables". American Sociological Review 27: 799–811.
- Newson, Roger (2002). "Parameters behind "nonparametric" statistics: Kendall's tau, Somers' D and median differences". Stata Journal 2 (1): 45–64.
- O'Connell, A. A. (2005). Logistic Regression Models for Ordinal Response Variables (Quantitative Applications in the Social Sciences). Ohio State University, USA.