Somers' D

In statistics, Somers’ D, sometimes incorrectly referred to as Somer’s D, is a measure of ordinal association between two variables $X$ and $Y$. Somers’ D takes values between $-1$, when all pairs of the variables disagree, and $1$, when all pairs of the variables agree. Somers’ D is named after R. H. Somers, who proposed it in 1962.^[1]

Somers’ D plays a central role in rank statistics and is the parameter behind many nonparametric methods.^[2] It is also used as a quality measure of logistic regressions and credit scoring models.

Somers’ D for a sample

We say that two pairs $(x_i,y_i)$ and $(x_j,y_j)$ are concordant if the ranks of both elements agree, that is, if $x_i>x_j$ and $y_i>y_j$, or if $x_i<x_j$ and $y_i<y_j$. The two pairs are discordant if the ranks of both elements disagree, that is, if $x_i>x_j$ and $y_i<y_j$, or if $x_i<x_j$ and $y_i>y_j$. If $x_i=x_j$ or $y_i=y_j$, the pair is neither concordant nor discordant.

Let $(x_1,y_1), (x_2,y_2), \ldots, (x_n,y_n)$ be a set of observations of two possibly dependent random variables $X$ and $Y$. Define the Kendall tau rank correlation coefficient $\tau$ as

\tau=\frac{N_S-N_D}{n(n-1)/2},

where $N_S$ is the number of concordant pairs and $N_D$ is the number of discordant pairs. Somers’ D of $Y$ with respect to $X$ is defined as $D_{YX}=\tau(X,Y)/\tau(X,X)$.

Note that Kendall's tau is symmetric in $X$ and $Y$, whereas Somers’ D is asymmetric in $X$ and $Y$.
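The sample definitions above can be sketched in a few lines of Python. This is a minimal illustration by direct pair counting, not a reference implementation: the function names `sign`, `kendall_tau_a` and `somers_d` and the toy data are assumptions introduced here, and the quadratic loop is only suitable for small samples.

```python
from itertools import combinations


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_tau_a(x, y):
    """(N_S - N_D) / (n(n-1)/2), counted over all unordered pairs."""
    n = len(x)
    # Each concordant pair contributes +1, each discordant pair -1, ties 0.
    total = sum(
        sign(x[i] - x[j]) * sign(y[i] - y[j])
        for i, j in combinations(range(n), 2)
    )
    return total / (n * (n - 1) / 2)


def somers_d(x, y):
    """D_{YX} = tau(X, Y) / tau(X, X); undefined if all x are equal."""
    return kendall_tau_a(x, y) / kendall_tau_a(x, x)


# Hypothetical toy data: X has ties, Y has none.
x = [1, 1, 1, 2, 2]
y = [1, 2, 3, 4, 5]

d_yx = somers_d(x, y)  # D of Y with respect to X
d_xy = somers_d(y, x)  # D of X with respect to Y
print(kendall_tau_a(x, y), d_yx, d_xy)
```

On this toy data Kendall's tau is 0.6 in either order, while $D_{YX}=1$ and $D_{XY}=0.6$: the ties in $X$ are removed from the denominator of $D_{YX}$ but not from that of $D_{XY}$.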

Somers’ D for a distribution

Let two bivariate random variables $(X_1, Y_1)$ and $(X_2, Y_2)$ be independently drawn from the same probability distribution $\operatorname{P}_{XY}$. Again, Somers’ D can be defined through Kendall's tau

\tau(X,Y)=\operatorname{E}(\operatorname{sgn}(X_1-X_2)\operatorname{sgn}(Y_1-Y_2))=\operatorname{P}(\operatorname{sgn}(X_1-X_2)\operatorname{sgn}(Y_1-Y_2)=1)-\operatorname{P}(\operatorname{sgn}(X_1-X_2)\operatorname{sgn}(Y_1-Y_2)=-1),

or the difference between the probabilities of concordance and discordance. Somers’ D of $Y$ with respect to $X$ is defined as $D_{YX} =\tau(X,Y)/\tau(X,X)$. Thus, $D_{YX}$ is the difference between the two corresponding probabilities, conditional on the $X$ values not being equal. If $X$ has a continuous CDF, then $\tau(X,X)=1$ and Kendall's tau and Somers’ D coincide. Somers’ D normalizes Kendall's tau for possible mass points of variable $X$.
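
To make the normalization concrete, note that $\operatorname{sgn}(X_1-X_2)^2$ equals 1 when $X_1\neq X_2$ and 0 otherwise, so $\tau(X,X)$ is the probability that two independent draws of $X$ differ. As an illustration (the notation $p_k$ is introduced here and is not part of the definitions above), if $X$ is discrete with $\operatorname{P}(X=k)=p_k$, then

\tau(X,X)=\operatorname{P}(X_1\neq X_2)=1-\sum_k p_k^2.

For instance, a binary $X$ with $\operatorname{P}(X=1)=p$ gives $\tau(X,X)=1-p^2-(1-p)^2=2p(1-p)$.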

If $X$ and $Y$ are both binary with values 0 and 1, then Somers’ D is the difference between two probabilities:

D_{YX}=\operatorname{P}(Y=1 \mid X=1)-\operatorname{P}(Y=1\mid X=0).
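
The binary identity can be checked numerically on a sample, with empirical proportions standing in for the probabilities. The sketch below is illustrative only: the helper names `somers_d` and `cond_prob` and the data are assumptions made for this example, and exact fractions are used so that the two sides can be compared without rounding.

```python
from fractions import Fraction
from itertools import combinations


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def somers_d(x, y):
    """Sample D_{YX}: (N_S - N_D) divided by the number of pairs not tied on x."""
    pairs = list(combinations(range(len(x)), 2))
    numerator = sum(sign(x[i] - x[j]) * sign(y[i] - y[j]) for i, j in pairs)
    untied_on_x = sum(1 for i, j in pairs if x[i] != x[j])
    return Fraction(numerator, untied_on_x)


def cond_prob(x, y, x_value):
    """Empirical P(Y = 1 | X = x_value)."""
    hits = sum(1 for xi, yi in zip(x, y) if xi == x_value and yi == 1)
    total = sum(1 for xi in x if xi == x_value)
    return Fraction(hits, total)


# Hypothetical binary sample: 4 observations with X = 1, 6 with X = 0.
x = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

d_yx = somers_d(x, y)
p1 = cond_prob(x, y, 1)  # 3/4
p0 = cond_prob(x, y, 0)  # 1/6
print(d_yx, p1 - p0)     # both are 7/12
```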

Somers’ D for logistic regression

Several statistics can be used to measure the quality of logistic regressions: AUC or c-statistic, Goodman and Kruskal's gamma, Kendall's tau (Tau-a), Somers’ D, etc. Somers’ D is probably the most widely used of the available rank order correlation statistics.^[3] Taking $X$ to be the observed outcome and $Y$ to be the predicted probability of the outcome, Somers’ D for logistic regression can be rewritten as

D_{YX}=\frac{N_S-N_D}{N_S+N_D+T_Y},

where $T_Y$ is the number of pairs tied on variable $Y$ but not on $X$, so that the denominator counts all pairs that are not tied on $X$.

In logistic regressions, Somers’ D is related to the well-known area under the receiver operating characteristic curve (AUC): $\mathrm{AUC}=D_{YX}/2+0.5$.
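
The relation to the AUC can be verified by counting pairs. The following sketch assumes that the binary outcome plays the role of $X$ and the predicted score the role of $Y$, and that ties between a positive and a negative score count one half towards the AUC. The function names `pair_counts` and `auc` and the scores are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def pair_counts(x, y):
    """Return (N_S, N_D, T_Y) over the pairs that are not tied on x."""
    n_s = n_d = t_y = 0
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # pairs tied on x do not enter D_{YX}
        dy = y[i] - y[j]
        if dy == 0:
            t_y += 1  # tied on y but not on x
        elif (x[i] - x[j]) * dy > 0:
            n_s += 1  # concordant
        else:
            n_d += 1  # discordant
    return n_s, n_d, t_y


def auc(outcome, score):
    """Share of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for o, s in zip(outcome, score) if o == 1]
    neg = [s for o, s in zip(outcome, score) if o == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes and predicted probabilities (not fitted by any model).
outcome = [0, 0, 0, 1, 0, 1, 1, 1]
score = [0.1, 0.2, 0.4, 0.4, 0.6, 0.7, 0.8, 0.9]

n_s, n_d, t_y = pair_counts(outcome, score)
d_yx = (n_s - n_d) / (n_s + n_d + t_y)
auc_value = auc(outcome, score)
print(d_yx, auc_value, d_yx / 2 + 0.5)
```

Here $N_S=14$, $N_D=1$ and $T_Y=1$, so $D_{YX}=13/16=0.8125$ and the AUC is $0.90625=D_{YX}/2+0.5$.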

References

1. Somers, R. H. (1962). "A new asymmetric measure of association for ordinal variables". American Sociological Review 27: 799–811.
2. Newson, Roger (2002). "Parameters behind 'nonparametric' statistics: Kendall's tau, Somers' D and median differences". Stata Journal 2 (1): 45–64.
3. O'Connell, A. A. (2005). Logistic Regression Models for Ordinal Response Variables (Quantitative Applications in the Social Sciences). Ohio State University, USA.

This article is issued from Wikipedia - version of the Tuesday, April 05, 2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.