Bregman divergence

In mathematics, a Bregman divergence or Bregman distance is similar to a metric, but satisfies neither the triangle inequality nor symmetry. There are three ways in which Bregman divergences are important. Firstly, they generalize squared Euclidean distance to a class of distances that all share similar properties. Secondly, they bear a strong connection to exponential families of distributions: as shown by Banerjee et al. (2005), there is a bijection between regular exponential families and regular Bregman divergences. Finally, Bregman divergences appear naturally as regret functions in problems involving optimization over convex sets (Harremoës 2015).

Bregman divergences are named after Lev M. Bregman, who introduced the concept in 1967. More recently, researchers in geometric algorithms have shown that many important algorithms can be generalized from Euclidean metrics to distances defined by Bregman divergences (Banerjee et al. 2005; Nielsen and Nock 2006; Boissonnat et al. 2010).

Definition

Let F: \Omega \to \mathbb{R} be a continuously differentiable, strictly convex function defined on a closed convex set \Omega.

The Bregman distance associated with F for points p, q \in \Omega is the difference between the value of F at point p and the value of the first-order Taylor expansion of F around point q evaluated at point p:

D_F(p, q) = F(p)-F(q)-\langle \nabla F(q), p-q\rangle.
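
The definition translates directly into code. Below is a minimal sketch (the helper name bregman_divergence and the use of NumPy are illustrative, not from the source), shown with F(x) = \|x\|^2, for which the Bregman distance reduces to squared Euclidean distance:

```python
import numpy as np

def bregman_divergence(F, grad_F, p, q):
    """D_F(p, q) = F(p) - F(q) - <grad F(q), p - q>."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return F(p) - F(q) - np.dot(grad_F(q), p - q)

# F(x) = ||x||^2 gives D_F(p, q) = ||p - q||^2 (squared Euclidean distance).
F = lambda x: np.dot(x, x)
grad_F = lambda x: 2.0 * x

p, q = np.array([1.0, 2.0]), np.array([0.0, 3.0])
print(bregman_divergence(F, grad_F, p, q))  # 2.0, equal to ||p - q||^2
```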

Properties

Non-negativity: D_F(p, q) \ge 0 for all p, q, with equality if and only if p = q; this follows from the strict convexity of F.
Linearity: for \lambda \ge 0,
D_{F_1 + \lambda F_2}(p, q) = D_{F_1}(p, q) + \lambda D_{F_2}(p, q)
Duality: if F^* is the convex conjugate of F, then
D_{F^*}(p^*, q^*) = D_F(q, p)
Here, p^* = \nabla F(p) and q^* = \nabla F(q) are the dual points corresponding to p and q.
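
The duality property can be checked numerically. The sketch below (the names are illustrative, not from the source) uses F(x) = \sum_i x_i \log x_i - x_i on the positive orthant, whose convex conjugate is F^*(y) = \sum_i e^{y_i}, so that \nabla F(x) = \log x and \nabla F^*(y) = e^y:

```python
import numpy as np

# F(x) = sum(x log x - x) on the positive orthant; its convex
# conjugate is F*(y) = sum(exp(y)). Gradients: log x and exp(y).
F       = lambda x: np.sum(x * np.log(x) - x)
grad_F  = lambda x: np.log(x)
Fs      = lambda y: np.sum(np.exp(y))
grad_Fs = lambda y: np.exp(y)

def breg(f, grad_f, a, b):
    return f(a) - f(b) - np.dot(grad_f(b), a - b)

p, q = np.array([0.2, 0.5, 0.3]), np.array([0.4, 0.4, 0.2])
p_star, q_star = grad_F(p), grad_F(q)  # dual points p* and q*

# Duality: D_{F*}(p*, q*) == D_F(q, p)
print(np.isclose(breg(Fs, grad_Fs, p_star, q_star),
                 breg(F, grad_F, q, p)))  # True
```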

Examples

Squared Euclidean distance D_F(p, q) = \|p - q\|^2 is the canonical example, generated by the convex function F(x) = \|x\|^2.

The generalized Kullback–Leibler divergence
D_F(p, q) = \sum_i p(i) \log \frac{p(i)}{q(i)} - \sum_i p(i) + \sum_i q(i)
is generated by the convex function
F(p) = \sum_i p(i)\log p(i) - \sum_i p(i)

The Itakura–Saito distance
D_F(p, q) = \sum_i \left(\frac {p(i)}{q(i)} - \log \frac{p(i)}{q(i)} - 1 \right)
is generated by the convex function
F(p) = - \sum_i \log p(i)
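
Both divergences are straightforward to implement. A minimal sketch (function names are illustrative), assuming strictly positive vectors p and q:

```python
import numpy as np

def generalized_kl(p, q):
    """Generalized Kullback-Leibler divergence:
    sum p log(p/q) - sum p + sum q, for p, q > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum(p * np.log(p / q)) - np.sum(p) + np.sum(q)

def itakura_saito(p, q):
    """Itakura-Saito distance: sum(p/q - log(p/q) - 1), for p, q > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    r = p / q
    return np.sum(r - np.log(r) - 1.0)

p, q = np.array([0.2, 0.5, 0.3]), np.array([0.4, 0.4, 0.2])
print(generalized_kl(p, q), itakura_saito(p, q))  # both nonnegative
```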

Generalizing projective duality

A key tool in computational geometry is the idea of projective duality, which maps points to hyperplanes and vice versa, while preserving incidence and above-below relationships. There are numerous analytical forms of the projective dual: one common form maps the point p = (p_1, \ldots, p_d) to the hyperplane x_{d+1} = \sum_{i=1}^d 2p_i x_i. This mapping can be interpreted (identifying the hyperplane with its normal) as the convex conjugate mapping that takes the point p to its dual point p^* = \nabla F(p), where F defines the d-dimensional paraboloid x_{d+1} = \sum x_i^2.
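
To make the identification concrete: for the paraboloid F(x) = \sum_i x_i^2, the gradient map is

\nabla F(p) = (2p_1, \ldots, 2p_d),

so the dual point p^* = \nabla F(p) is exactly the normal of the hyperplane x_{d+1} = \sum_{i=1}^d 2p_i x_i above.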

If we now replace the paraboloid by an arbitrary convex function, we obtain a different dual mapping that retains the incidence and above-below properties of the standard projective dual. This implies that natural dual concepts in computational geometry like Voronoi diagrams and Delaunay triangulations retain their meaning in distance spaces defined by an arbitrary Bregman divergence. Thus, algorithms from "normal" geometry extend directly to these spaces (Boissonnat, Nielsen and Nock, 2010).

Bregman divergence on other objects

Bregman divergences can also be defined between matrices, between functions, and between measures (distributions). Bregman divergences between matrices include Stein's loss and the von Neumann entropy. Bregman divergences between functions include total squared error, relative entropy, and squared bias; see the references by Frigyik et al. for definitions and properties. Similarly, Bregman divergences have also been defined over sets, through a submodular set function, which is known as the discrete analog of a convex function. The submodular Bregman divergences subsume a number of discrete distance measures, such as the Hamming distance, precision and recall, mutual information, and some other set-based distance measures (see Iyer & Bilmes, 2012, for more details and properties of the submodular Bregman).
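
As an illustration, the von Neumann divergence mentioned above is the matrix Bregman divergence generated by F(X) = \operatorname{tr}(X \log X - X) on positive-definite matrices. A minimal sketch (assuming symmetric positive-definite inputs; the function name is illustrative):

```python
import numpy as np
from scipy.linalg import logm

def von_neumann_divergence(X, Y):
    """Bregman divergence of F(X) = tr(X log X - X):
    D(X, Y) = tr(X log X - X log Y - X + Y)."""
    return float(np.trace(X @ logm(X) - X @ logm(Y) - X + Y).real)

X = np.array([[2.0, 0.5], [0.5, 1.0]])  # symmetric positive definite
Y = np.array([[1.5, 0.2], [0.2, 1.2]])  # symmetric positive definite
print(von_neumann_divergence(X, Y))     # nonnegative; zero iff X == Y
```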

For a list of common matrix Bregman divergences, see Table 15.1 in [2].

References

  1. "Joint and separate convexity of the Bregman Distance", by H. Bauschke and J. Borwein, in D. Butnariu, Y. Censor, and S. Reich, editors, Inherently Parallel Algorithms in Feasibility and Optimization and their Applications, Elsevier 2001
  2. "Matrix Information Geometry", R. Nock, B. Magdalou, E. Briys and F. Nielsen, pdf, from this book
