Somers' D

In statistics, Somers’ D, sometimes incorrectly referred to as Somer’s D, is a measure of ordinal association between two variables X and Y. Somers’ D takes values between -1, when all pairs of the variables disagree, and 1, when all pairs of the variables agree. Somers’ D is named after R. H. Somers, who proposed it in 1962.[1]

Somers’ D plays a central role in rank statistics and is the parameter behind many nonparametric methods.[2] It is also used as a quality measure of logistic regressions and credit scoring models.

Somers’ D for a sample

We say that two pairs (x_i,y_i) and (x_j,y_j) are concordant if the ranks of both elements agree, that is, if x_i>x_j and y_i>y_j or if x_i<x_j and y_i<y_j. We say that they are discordant if the ranks of both elements disagree, that is, if x_i>x_j and y_i<y_j or if x_i<x_j and y_i>y_j. If x_i=x_j or y_i=y_j, the pair is neither concordant nor discordant.
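The three-way classification above translates directly into a comparison of directions. A minimal sketch; the function name classify_pair is hypothetical and chosen only for illustration:

```python
def classify_pair(xi, yi, xj, yj):
    """Classify observations (xi, yi) and (xj, yj) as concordant, discordant, or neither."""
    if xi == xj or yi == yj:
        return "neither"  # tied on X or on Y
    # Concordant when X and Y move in the same direction between the two observations.
    if (xi > xj) == (yi > yj):
        return "concordant"
    return "discordant"

print(classify_pair(1, 2, 3, 4))  # concordant
print(classify_pair(1, 4, 3, 2))  # discordant
print(classify_pair(1, 4, 1, 2))  # neither
```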

Let (x_1,y_1), (x_2,y_2), \ldots, (x_n,y_n) be a set of observations of two possibly dependent random variables X and Y. Define the Kendall tau rank correlation coefficient \tau as

\tau=\frac{N_S-N_D}{n(n-1)/2},

where N_S is the number of concordant pairs and N_D is the number of discordant pairs. Somers’ D of Y with respect to X is defined as D_{YX}=\tau(X,Y)/\tau(X,X).
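The normalization by \tau(X,X) can be made explicit. Comparing X with itself, every pair with x_i \neq x_j is concordant and no pair is discordant. Writing M_X for the number of pairs with x_i \neq x_j (notation introduced here for illustration), a short derivation gives:

```latex
\tau(X,X)=\frac{M_X}{n(n-1)/2},
\qquad
D_{YX}=\frac{\tau(X,Y)}{\tau(X,X)}
      =\frac{(N_S-N_D)\,/\,\bigl(n(n-1)/2\bigr)}{M_X\,/\,\bigl(n(n-1)/2\bigr)}
      =\frac{N_S-N_D}{M_X}.
```

So the sample D_{YX} is the excess of concordant over discordant pairs among the pairs that are not tied on X.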

Note that Kendall's tau is symmetric in X and Y, whereas Somers’ D is asymmetric in X and Y.
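To see the asymmetry concretely, here is a minimal brute-force sketch that computes \tau and Somers’ D directly from the definitions by looping over all n(n-1)/2 pairs. The names kendall_tau_a and somers_d are hypothetical helpers, not a library API, and the O(n^2) loop is chosen for clarity rather than speed.

```python
from itertools import combinations

def sign(v):
    """Return +1, 0 or -1 according to the sign of v."""
    return (v > 0) - (v < 0)

def kendall_tau_a(x, y):
    """tau = (N_S - N_D) / (n(n-1)/2); tied pairs contribute 0 to the sum."""
    n = len(x)
    # sign(dx) * sign(dy) is +1 for concordant, -1 for discordant, 0 for tied pairs.
    s = sum(sign(x[i] - x[j]) * sign(y[i] - y[j])
            for i, j in combinations(range(n), 2))
    return s / (n * (n - 1) / 2)

def somers_d(x, y):
    """Somers' D of y with respect to x: tau(x, y) / tau(x, x).
    Undefined (division by zero) if all x values are equal."""
    return kendall_tau_a(x, y) / kendall_tau_a(x, x)

x = [1, 1, 2, 3]
y = [1, 2, 3, 4]
print(round(somers_d(x, y), 4))  # D_YX: 1.0
print(round(somers_d(y, x), 4))  # D_XY: 0.8333
```

Every pair untied on X is concordant here, so D_{YX}=1, while the single pair tied on X but not on Y pulls D_{XY} down to 5/6.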

Somers’ D for a distribution

Let two bivariate random variables (X_1, Y_1) and (X_2, Y_2) be drawn independently from the same probability distribution \operatorname{P}_{XY}. Again, Somers’ D can be defined through Kendall's tau

\tau(X,Y)=\operatorname{E}(\operatorname{sgn}(X_1-X_2)\operatorname{sgn}(Y_1-Y_2))=\operatorname{P}(\operatorname{sgn}(X_1-X_2)\operatorname{sgn}(Y_1-Y_2)=1)-\operatorname{P}(\operatorname{sgn}(X_1-X_2)\operatorname{sgn}(Y_1-Y_2)=-1),

or the difference between the probabilities of concordance and discordance. Somers’ D of Y with respect to X is defined as D_{YX}=\tau(X,Y)/\tau(X,X). Because \tau(X,X)=\operatorname{P}(X_1\neq X_2), D_{YX} is the difference between the two corresponding probabilities, conditional on the X values not being equal. If X has a continuous CDF, then \tau(X,X)=1, and Kendall's tau and Somers’ D coincide. Somers’ D thus normalizes Kendall's tau for possible mass points of X.
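For a discrete joint distribution the expectations above are finite sums over pairs of support points, so the population D_{YX} can be computed exactly. A minimal sketch, assuming the distribution is supplied as a dictionary mapping (x, y) to its probability (a hypothetical input format); population_somers_d is an illustrative name, not a library function:

```python
def sign(v):
    """Return +1, 0 or -1 according to the sign of v."""
    return (v > 0) - (v < 0)

def population_somers_d(pmf):
    """Population Somers' D of Y with respect to X.

    pmf: dict mapping (x, y) -> probability of a discrete joint distribution.
    """
    tau_xy = 0.0
    tau_xx = 0.0
    for (x1, y1), p1 in pmf.items():
        for (x2, y2), p2 in pmf.items():
            w = p1 * p2  # probability of this ordered pair of independent draws
            tau_xy += w * sign(x1 - x2) * sign(y1 - y2)
            tau_xx += w * sign(x1 - x2) ** 2  # accumulates P(X1 != X2)
    return tau_xy / tau_xx

pmf = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}
print(round(population_somers_d(pmf), 6))  # 0.4
```

In this example X has mass points, so \tau(X,X)=\operatorname{P}(X_1\neq X_2)=0.5 rather than 1, and the division rescales \tau(X,Y)=0.2 up to 0.4.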

If X and Y are both binary with values 0 and 1, then Somers’ D is the difference between two probabilities:

D_{YX}=\operatorname{P}(Y=1 \mid X=1)-\operatorname{P}(Y=1\mid X=0).
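The identity also holds exactly for a sample: with binary X, every pair untied on X matches an X=1 observation with an X=0 observation, and the sample D_{YX} equals the difference of the two observed proportions of Y=1. A minimal sketch with made-up illustrative data; somers_d_yx is a hypothetical helper name:

```python
from itertools import combinations

def sign(v):
    """Return +1, 0 or -1 according to the sign of v."""
    return (v > 0) - (v < 0)

def somers_d_yx(x, y):
    """Sample Somers' D of y w.r.t. x: (N_S - N_D) / (number of pairs with x_i != x_j)."""
    num = den = 0
    for i, j in combinations(range(len(x)), 2):
        sx = sign(x[i] - x[j])
        num += sx * sign(y[i] - y[j])  # +1 concordant, -1 discordant, 0 tied
        den += sx * sx                 # counts pairs untied on x
    return num / den

x = [0, 0, 0, 0, 1, 1, 1, 1, 1]
y = [0, 0, 1, 0, 1, 1, 0, 1, 1]

# Observed proportions of Y = 1 within each X group.
p1 = sum(yi for xi, yi in zip(x, y) if xi == 1) / x.count(1)
p0 = sum(yi for xi, yi in zip(x, y) if xi == 0) / x.count(0)
print(round(somers_d_yx(x, y), 4), round(p1 - p0, 4))  # 0.55 0.55
```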

Somers’ D for logistic regression

Several statistics can be used to measure the quality of logistic regressions: the AUC or c-statistic, Goodman and Kruskal's gamma, Kendall's tau (Tau-a), Somers’ D, etc. Somers’ D is probably the most widely used of the available rank order correlation statistics.[3] For X being the predicted probability of the outcome and Y being the outcome, the relevant statistic is Somers’ D of X with respect to Y, D_{XY}=\tau(X,Y)/\tau(Y,Y), which conditions on pairs with different outcomes. It can be rewritten as

D_{XY}=\frac{N_S-N_D}{N_S+N_D+T_X},

where T_X is the number of pairs tied on X but not on Y.

In logistic regressions, Somers’ D is related to the well-known area under the receiver operating characteristic curve (AUC): AUC=D_{XY}/2+0.5.
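A minimal sketch of the logistic-regression case, using made-up predicted probabilities and binary outcomes (illustrative data only). It computes Somers’ D of the predicted probability with respect to the outcome by counting concordant, discordant, and prediction-tied pairs among pairs with different outcomes, and separately computes the AUC as the probability that an event scores higher than a non-event, with ties counted as one half. The helper names somers_d_xy and auc are hypothetical, not a library API.

```python
from itertools import combinations

def sign(v):
    """Return +1, 0 or -1 according to the sign of v."""
    return (v > 0) - (v < 0)

def somers_d_xy(pred, outcome):
    """Somers' D of pred w.r.t. a binary outcome, over pairs with different outcomes."""
    n_s = n_d = t_x = 0
    for i, j in combinations(range(len(pred)), 2):
        if outcome[i] == outcome[j]:
            continue  # pairs tied on the outcome are excluded
        s = sign(pred[i] - pred[j]) * sign(outcome[i] - outcome[j])
        if s > 0:
            n_s += 1  # event has the higher predicted probability
        elif s < 0:
            n_d += 1  # event has the lower predicted probability
        else:
            t_x += 1  # tied on the prediction, different outcomes
    return (n_s - n_d) / (n_s + n_d + t_x)

def auc(pred, outcome):
    """AUC as P(event scores higher than non-event), counting ties as 1/2."""
    pos = [p for p, o in zip(pred, outcome) if o == 1]
    neg = [p for p, o in zip(pred, outcome) if o == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

pred = [0.1, 0.4, 0.35, 0.8, 0.4, 0.7]
outcome = [0, 0, 1, 1, 1, 0]
d = somers_d_xy(pred, outcome)
print(round(d, 4), round(auc(pred, outcome), 4), round(d / 2 + 0.5, 4))  # 0.2222 0.6111 0.6111
```

Of the nine event/non-event pairs, five are concordant, three discordant, and one tied, giving D = 2/9 and AUC = 5.5/9.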

References

  1. Somers, R. H. (1962). "A new asymmetric measure of association for ordinal variables". American Sociological Review 27: 799–811.
  2. Newson, Roger (2002). "Parameters behind 'nonparametric' statistics: Kendall's tau, Somers' D and median differences". Stata Journal 2 (1): 45–64.
  3. O'Connell, A. A. (2005). Logistic Regression Models for Ordinal Response Variables (Quantitative Applications in the Social Sciences). Ohio State University, USA.
This article is issued from Wikipedia - version of the Tuesday, April 05, 2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.