Variation of information

In probability theory and information theory, the variation of information or shared information distance is a measure of the distance between two clusterings (partitions of elements). It is closely related to mutual information; indeed, it is a simple linear expression involving the mutual information. Unlike the mutual information, however, the variation of information is a true metric, in that it obeys the triangle inequality.[1]

Venn diagram illustrating the relation between information entropies, mutual information and variation of information.

Definition

Suppose we have two partitions X and Y of a set A into disjoint subsets, namely X = \{X_{1}, X_{2}, \ldots, X_{k}\} and Y = \{Y_{1}, Y_{2}, \ldots, Y_{l}\}. Let n = \sum_{i} |X_{i}| = \sum_{j} |Y_{j}| = |A|, p_{i} = |X_{i}| / n, q_{j} = |Y_{j}| / n, and r_{ij} = |X_{i} \cap Y_{j}| / n. Then the variation of information between the two partitions is:

VI(X; Y) = - \sum_{i,j} r_{ij} \left[ \log(r_{ij}/p_{i}) + \log(r_{ij}/q_{j}) \right].

This is equivalent to the shared information distance between the random variables i and j with respect to the uniform probability measure on A defined by \mu(B) := |B|/n for B \subseteq A.
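The definition can be evaluated directly from two label sequences. The following is a minimal sketch, not a reference implementation: the function name variation_of_information and the representation of each partition as a list of cluster labels (one label per element of A) are assumptions made for illustration. It uses the natural logarithm, and empty intersections are skipped, following the usual convention that terms with r_{ij} = 0 contribute zero.

```python
from collections import Counter
from math import log


def variation_of_information(labels_x, labels_y):
    """Variation of information between two clusterings of the same elements.

    Each argument is a sequence of cluster labels; position k holds the
    label of element k in that partition (an illustrative representation).
    """
    if len(labels_x) != len(labels_y):
        raise ValueError("both clusterings must label the same elements")
    n = len(labels_x)
    if n == 0:
        raise ValueError("the set A must not be empty")

    size_x = Counter(labels_x)                   # |X_i|
    size_y = Counter(labels_y)                   # |Y_j|
    size_xy = Counter(zip(labels_x, labels_y))   # |X_i ∩ Y_j|, non-empty only

    vi = 0.0
    for (i, j), count in size_xy.items():
        r = count / n          # r_ij
        p = size_x[i] / n      # p_i
        q = size_y[j] / n      # q_j
        vi -= r * (log(r / p) + log(r / q))
    return vi


if __name__ == "__main__":
    # Identical partitions are at distance zero.
    print(variation_of_information([0, 0, 1, 1], [0, 0, 1, 1]))
    # Two "crossed" partitions of four elements: every r_ij is 1/4 and
    # every p_i and q_j is 1/2, so the sum evaluates to 2 ln 2.
    print(variation_of_information([0, 0, 1, 1], [0, 1, 0, 1]))
```

Only the grouping matters, not the label values themselves, so relabelling the clusters of either partition leaves the result unchanged.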

Identities

The variation of information satisfies

VI(X; Y) = H(X) + H(Y) - 2I(X, Y),

where H(X) is the entropy of X, H(Y) is the entropy of Y, and I(X, Y) is the mutual information between X and Y with respect to the uniform probability measure on A. This can be rewritten as

VI(X; Y) = H(X, Y) - I(X, Y),

where H(X, Y) is the joint entropy of X and Y, or

VI(X; Y) = H(X|Y) + H(Y|X),

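where H(X|Y) and H(Y|X) are the respective conditional entropies. This last form follows directly from the definition: r_{ij}/q_{j} is the conditional probability of X_{i} given Y_{j}, so -\sum_{i,j} r_{ij} \log(r_{ij}/q_{j}) = H(X|Y), and likewise -\sum_{i,j} r_{ij} \log(r_{ij}/p_{i}) = H(Y|X).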
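The identities can be checked numerically on a small example. The sketch below is illustrative only: the function name vi_forms and the label-list representation of partitions are assumptions, not part of the source. It computes the entropies, the mutual information and the conditional entropies separately from the counts, then evaluates the variation of information in each of the forms given above, alongside the defining sum.

```python
from collections import Counter
from math import isclose, log


def vi_forms(labels_x, labels_y):
    """Return VI from the definition and from each of the three identities."""
    n = len(labels_x)
    size_x = Counter(labels_x)                   # |X_i|
    size_y = Counter(labels_y)                   # |Y_j|
    size_xy = Counter(zip(labels_x, labels_y))   # |X_i ∩ Y_j|, non-empty only

    # Marginal and joint entropies under the uniform measure on A.
    h_x = -sum(c / n * log(c / n) for c in size_x.values())
    h_y = -sum(c / n * log(c / n) for c in size_y.values())
    h_xy = -sum(c / n * log(c / n) for c in size_xy.values())

    mi = 0.0            # I(X, Y)
    h_x_given_y = 0.0   # H(X|Y)
    h_y_given_x = 0.0   # H(Y|X)
    definition = 0.0    # the defining double sum
    for (i, j), c in size_xy.items():
        r, p, q = c / n, size_x[i] / n, size_y[j] / n
        mi += r * log(r / (p * q))
        h_x_given_y -= r * log(r / q)
        h_y_given_x -= r * log(r / p)
        definition -= r * (log(r / p) + log(r / q))

    return {
        "definition": definition,
        "H(X) + H(Y) - 2 I(X, Y)": h_x + h_y - 2 * mi,
        "H(X, Y) - I(X, Y)": h_xy - mi,
        "H(X|Y) + H(Y|X)": h_x_given_y + h_y_given_x,
    }


if __name__ == "__main__":
    # An arbitrary pair of partitions of six elements, chosen for illustration.
    forms = vi_forms([0, 0, 0, 1, 1, 2], [0, 0, 1, 1, 1, 1])
    for name, value in forms.items():
        print(f"{name}: {value:.12f}")
    reference = forms["definition"]
    print(all(isclose(v, reference, abs_tol=1e-12) for v in forms.values()))
```

The four values should agree up to floating-point rounding, which is why the comparison uses a small tolerance rather than exact equality.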

References

  1. Alexander Kraskov, Harald Stögbauer, Ralph G. Andrzejak, and Peter Grassberger (2003), "Hierarchical Clustering Based on Mutual Information", arXiv:q-bio/0311039.

This article is issued from Wikipedia - version of the Monday, March 14, 2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.