Rademacher complexity

In computational learning theory (machine learning and theory of computation), Rademacher complexity, named after Hans Rademacher, measures richness of a class of real-valued functions with respect to a probability distribution.

Given a training sample $S=(z_1, z_2, \dots, z_m) \in Z^m$ , and a class $\mathcal{H}$ of real-valued functions defined on a domain space $Z$ , the empirical Rademacher complexity of $\mathcal{H}$ is defined as:

$\widehat{\mathcal{R}}_S(\mathcal{H}) = \frac{2}{m} \mathbb{E} \left[ \sup_{h \in \mathcal{H}} \left| \sum_{i=1}^m \sigma_i h(z_i) \right| \ \bigg| \ S \right]$

where $\sigma_1, \sigma_2, \dots, \sigma_m$ are independent random variables drawn from the Rademacher distribution i.e. $\Pr(\sigma_i = +1) = \Pr(\sigma_i = -1) = 1/2$ for $i=1,2,\dots,m$ .

Let $P$ be a probability distribution over $Z$ . The Rademacher complexity of the function class $\mathcal{H}$ with respect to $P$ for sample size $m$ is:

$\mathcal{R}_m(\mathcal{H}) = \mathbb{E} \left[ \widehat{\mathcal{R}}_S(\mathcal{H}) \right]$

where the above expectation is taken over an identically independently distributed (i.i.d.) sample $S=(z_1, z_2, \dots, z_m)$ generated according to $P$ .

One can show, for example, that there exists a constant $C$ , such that any class of $\{0,1\}$ -indicator functions with Vapnik-Chervonenkis dimension $d$ has Rademacher complexity upper-bounded by $C\sqrt{\frac{d}{m}}$ .

Gaussian complexity

Gaussian complexity is a similar complexity with similar physical meanings, and can be obtained from the previous complexity using the random variables $g_i$ instead of $\sigma_i$ , where $g_i$ are Gaussian i.i.d. random variables with zero-mean and variance 1, i.e. $g_i \sim \mathcal{N} \left( 0, 1 \right)$ .

References

Peter L. Bartlett, Shahar Mendelson (2002) Rademacher and Gaussian Complexities: Risk Bounds and Structural Results. Journal of Machine Learning Research 3 463-482
Giorgio Gnecco, Marcello Sanguineti (2008) Approximation Error Bounds via Rademacher's Complexity. Applied Mathematical Sciences, Vol. 2, 2008, no. 4, 153 - 176

This article is issued from Wikipedia - version of the Wednesday, March 30, 2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.