Somers' D
In statistics, Somers’ D, sometimes incorrectly referred to as Somer’s D, is a measure of ordinal association between two variables $X$ and $Y$. Somers’ D takes values between $-1$, when all pairs of the variables disagree, and $1$, when all pairs of the variables agree. Somers’ D is named after R. H. Somers, who proposed it in 1962.[1]
Somers’ D plays a central role in rank statistics and is the parameter behind many nonparametric methods.[2] It is also used as a quality measure of logistic regressions and credit scoring models.
Somers’ D for a sample
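We say that two pairs $(x_i, y_i)$ and $(x_j, y_j)$ are concordant if the ranks of both elements agree, that is, if $x_i > x_j$ and $y_i > y_j$, or if $x_i < x_j$ and $y_i < y_j$. We say that two pairs $(x_i, y_i)$ and $(x_j, y_j)$ are discordant if the ranks of both elements disagree, that is, if $x_i > x_j$ and $y_i < y_j$, or if $x_i < x_j$ and $y_i > y_j$. If $x_i = x_j$ or $y_i = y_j$, the pair is neither concordant nor discordant.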
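Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be a set of observations of two possibly dependent random variables $X$ and $Y$. Define the Kendall tau rank correlation coefficient $\tau$ as

$$\tau(X, Y) = \frac{N_C - N_D}{n(n-1)/2},$$

where $N_C$ is the number of concordant pairs and $N_D$ is the number of discordant pairs. Somers’ D of $Y$ with respect to $X$ is defined as

$$D_{YX} = \frac{\tau(X, Y)}{\tau(X, X)}.$$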
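Note that Kendall's tau is symmetric in $X$ and $Y$, whereas Somers’ D is asymmetric in $X$ and $Y$. Since the denominator $\tau(X, X)$ is the fraction of pairs with $x_i \neq x_j$, Somers’ D of $Y$ with respect to $X$ equals $N_C - N_D$ divided by the number of pairs whose $X$ values are unequal.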
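The sample definitions translate directly into a loop over all pairs. A minimal Python sketch, assuming plain lists of numbers and an $O(n^2)$ enumeration of pairs (adequate for small samples; the function names are hypothetical):

```python
from itertools import combinations


def sign(v):
    return (v > 0) - (v < 0)


def kendall_tau_a(x, y):
    """Kendall's tau: (N_C - N_D) / (n(n-1)/2).

    Each concordant pair adds +1, each discordant pair -1, tied pairs 0.
    """
    n = len(x)
    s = sum(sign(x[i] - x[j]) * sign(y[i] - y[j])
            for i, j in combinations(range(n), 2))
    return s / (n * (n - 1) / 2)


def somers_d(y, x):
    """Somers' D of y with respect to x: tau(x, y) / tau(x, x).

    Raises ZeroDivisionError if every pair is tied on x.
    """
    return kendall_tau_a(x, y) / kendall_tau_a(x, x)


x = [1, 1, 2, 3]  # one tie on x (first two observations)
y = [1, 2, 3, 4]  # no ties on y
print(round(kendall_tau_a(x, y), 4))  # 0.8333
print(round(somers_d(y, x), 4))       # 1.0    (D_YX)
print(round(somers_d(x, y), 4))       # 0.8333 (D_XY)
```

Here every pair untied on $X$ is concordant, so $D_{YX} = 1$, while the pair tied on $X$ is still untied on $Y$ and therefore counts in the denominator of $D_{XY}$, which pulls it below 1.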
Somers’ D for a distribution
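Let two bivariate random variables $(X_1, Y_1)$ and $(X_2, Y_2)$ be drawn independently from the same probability distribution $P_{XY}$. Again, Somers’ D can be defined through Kendall's tau

$$\tau(X, Y) = \operatorname{E}\big[\operatorname{sign}(X_1 - X_2)\,\operatorname{sign}(Y_1 - Y_2)\big] = P\big(\operatorname{sign}(X_1 - X_2)\,\operatorname{sign}(Y_1 - Y_2) = 1\big) - P\big(\operatorname{sign}(X_1 - X_2)\,\operatorname{sign}(Y_1 - Y_2) = -1\big),$$

that is, the difference between the probabilities of concordance and discordance. Somers’ D of $Y$ with respect to $X$ is defined as $D_{YX} = \tau(X, Y) / \tau(X, X)$. Thus, $D_{YX}$ is the difference between the two corresponding probabilities, conditional on the $X$ values not being equal.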
If $X$ has a continuous CDF, then $\tau(X, X) = 1$ and Kendall's tau and Somers’ D coincide. Somers’ D normalizes Kendall's tau for possible mass points of variable $X$.
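For a distribution with finitely many support points, the expectation in the definition of Kendall's tau becomes a double sum over the support. A minimal sketch, assuming the joint distribution is given as a Python dict mapping $(x, y)$ to its probability (this representation, the function names and the example distribution are hypothetical):

```python
def sign(v):
    return (v > 0) - (v < 0)


def tau(pmf):
    """Kendall's tau E[sign(X1 - X2) * sign(Y1 - Y2)] for two independent
    draws (X1, Y1), (X2, Y2) from a finite joint pmf {(x, y): probability}."""
    return sum(p1 * p2 * sign(x1 - x2) * sign(y1 - y2)
               for (x1, y1), p1 in pmf.items()
               for (x2, y2), p2 in pmf.items())


def tau_xx(pmf):
    """tau(X, X) = P(X1 != X2), the probability that X is not tied."""
    return sum(p1 * p2
               for (x1, _), p1 in pmf.items()
               for (x2, _), p2 in pmf.items()
               if x1 != x2)


def somers_d_yx(pmf):
    """Somers' D of Y with respect to X for the distribution pmf."""
    return tau(pmf) / tau_xx(pmf)


# Hypothetical joint distribution: X has mass points at 0, 1 and 2.
pmf = {(0, 0): 0.2, (1, 0): 0.2, (1, 1): 0.2, (2, 1): 0.4}
print(round(tau(pmf), 4))          # 0.4
print(round(tau_xx(pmf), 4))       # 0.64
print(round(somers_d_yx(pmf), 4))  # 0.625
```

Here $\tau(X, X) = 1 - (0.2^2 + 0.4^2 + 0.4^2) = 0.64 < 1$ because $X$ has mass points, so Somers’ D is larger in magnitude than Kendall's tau; for a continuous $X$ ties have probability zero and the two coincide.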
If $X$ and $Y$ are both binary with values 0 and 1, then Somers’ D is the difference between two probabilities:

$$D_{YX} = P(Y = 1 \mid X = 1) - P(Y = 1 \mid X = 0).$$
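For binary variables the identity can be checked directly on a 2-by-2 table of sample counts. Writing $a, b, c, d$ for the counts of $(X, Y) = (1,1), (1,0), (0,1), (0,0)$ (labels chosen here for illustration), every pair untied on $X$ matches an $X = 1$ observation with an $X = 0$ observation, so $N_C = ad$, $N_D = bc$, and there are $(a+b)(c+d)$ such pairs; the ratio $(ad - bc)/((a+b)(c+d))$ simplifies to $a/(a+b) - c/(c+d)$. A short Python sketch with hypothetical toy counts:

```python
def somers_d_2x2(a, b, c, d):
    """D_YX for binary X and Y from a 2x2 table of counts.

    a = #(X=1, Y=1), b = #(X=1, Y=0), c = #(X=0, Y=1), d = #(X=0, Y=0).
    """
    n_c = a * d                      # (1,1) paired with (0,0): concordant
    n_d = b * c                      # (1,0) paired with (0,1): discordant
    untied_on_x = (a + b) * (c + d)  # every X=1 row paired with every X=0 row
    return (n_c - n_d) / untied_on_x


def prob_difference(a, b, c, d):
    """P(Y=1 | X=1) - P(Y=1 | X=0) estimated from the same table."""
    return a / (a + b) - c / (c + d)


print(somers_d_2x2(4, 1, 2, 3), prob_difference(4, 1, 2, 3))  # 0.4 0.4
```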
Somers’ D for logistic regression
Several statistics can be used to measure the quality of logistic regressions: the AUC or c-statistic, Goodman and Kruskal's gamma, Kendall's tau (Tau-a), Somers’ D, etc. Somers’ D is probably the most widely used of the available rank order correlation statistics.[3] For $X$ being the predicted probability of the outcome and $Y$ being the outcome, Somers’ D for logistic regression can be rewritten as

$$D_{XY} = \frac{N_C - N_D}{N_C + N_D + N_T},$$

where $N_T$ is the number of pairs with different outcomes $Y$ that are tied on variable $X$.
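This count is easy to compute from predicted probabilities and observed 0/1 outcomes: only pairs with different outcomes enter, and each is classified by whether the event case has the higher, lower or equal predicted probability, giving $(N_C - N_D)/(N_C + N_D + N_T)$. A minimal Python sketch, assuming outcomes coded as 0 and 1 (the function name and the toy data are hypothetical):

```python
from itertools import combinations


def somers_d_logistic(p, y):
    """D_XY for predicted probabilities p (X) and binary outcomes y (Y).

    Only pairs with different outcomes are counted. Among them, a pair is
    concordant if the event case (y = 1) has the higher prediction,
    discordant if it has the lower one, and tied (N_T) if they are equal.
    """
    n_c = n_d = n_t = 0
    for i, j in combinations(range(len(y)), 2):
        if y[i] == y[j]:
            continue  # pairs tied on the outcome are not used
        # Orient the pair: e is the event case, ne the non-event case.
        e, ne = (i, j) if y[i] == 1 else (j, i)
        if p[e] > p[ne]:
            n_c += 1
        elif p[e] < p[ne]:
            n_d += 1
        else:
            n_t += 1
    return (n_c - n_d) / (n_c + n_d + n_t)


p = [0.9, 0.5, 0.6, 0.6, 0.3, 0.2]  # toy predicted probabilities
y = [1, 1, 0, 1, 0, 0]              # toy observed outcomes
print(round(somers_d_logistic(p, y), 4))  # 0.6667
```

For these toy values $N_C = 7$, $N_D = 1$ and $N_T = 1$ (the two cases with predicted probability 0.6), so $D_{XY} = 6/9$.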
In logistic regressions, Somers’ D is related to the well-known area under the receiver operating characteristic curve (AUC) by $D_{XY} = 2\,\mathrm{AUC} - 1$.
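The relation follows from counting pairs, assuming the common convention that the AUC scores each event/non-event pair tied on the predicted probability as one half. With $N_C$, $N_D$ and $N_T$ denoting the concordant, discordant and tied-on-$X$ pairs among the pairs with different outcomes:

$$\mathrm{AUC} = \frac{N_C + \tfrac{1}{2} N_T}{N_C + N_D + N_T}, \qquad 2\,\mathrm{AUC} - 1 = \frac{2N_C + N_T - (N_C + N_D + N_T)}{N_C + N_D + N_T} = \frac{N_C - N_D}{N_C + N_D + N_T} = D_{XY}.$$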
References
- Somers, R. H. (1962). "A new asymmetric measure of association for ordinal variables". American Sociological Review 27: 799–811.
- Newson, R. (2002). "Parameters behind ‘nonparametric’ statistics: Kendall's tau, Somers' D and median differences". Stata Journal 2 (1): 45–64.
- O'Connell, A. A. (2005). Logistic Regression Models for Ordinal Response Variables (Quantitative Applications in the Social Sciences). Ohio State University, USA.