Fano's inequality

In information theory, Fano's inequality (also known as the Fano converse and the Fano lemma) relates the average information lost in a noisy channel to the probability of the categorization error. It was derived by Robert Fano in the early 1950s while teaching a Ph.D. seminar in information theory at MIT, and later recorded in his 1961 textbook.

It is used to find a lower bound on the error probability of any decoder as well as the lower bounds for minimax risks in density estimation.

Let the random variables X and Y represent input and output messages with a joint probability $P(x,y)$ . Let e represent an occurrence of error; i.e., that $X\neq \tilde{X}$ , being $\tilde{X}=f(Y)$ a noise approximate version of $X$ . Fano's inequality is

H(X|Y)\leq H(e)+P(e)\log(|\mathcal{X}|-1),

where $\mathcal{X}$ denotes the support of X,

H\left(X|Y\right)=-\sum_{i,j} P(x_i,y_j)\log P\left(x_i|y_j\right)

is the conditional entropy,

P(e)=P(X\neq \tilde{X})

is the probability of the communication error, and

H(e)=-P(e)\log P(e)-(1-P(e))\log(1-P(e))

is the corresponding binary entropy.

Alternative formulation

Let X be a random variable with density equal to one of $r+1$ possible densities $f_1,\ldots,f_{r+1}$ . Furthermore, the Kullback–Leibler divergence between any pair of densities cannot be too large,

D_{KL}(f_i\|f_j)\leq \beta

for all

i\not = j.

Let $\psi(X)\in\{1,\ldots, r+1\}$ be an estimate of the index. Then

\sup_i P_i(\psi(X)\not = i) \geq 1-\frac{\beta+\log 2}{\log r}

where $P_i$ is the probability induced by $f_i$

Generalization

The following generalization is due to Ibragimov and Khasminskii (1979), Assouad and Birge (1983).

Let F be a class of densities with a subclass of r + 1 densities ƒ_θ such that for any θ ≠ θ′

\|f_{\theta}-f_{\theta'}\|_{L_1}\geq \alpha,\,

D_{KL}(f_\theta\|f_{\theta'})\leq \beta.\,

Then in the worst case the expected value of error of estimation is bound from below,

\sup_{f\in \mathbf{F}} E \|f_n-f\|_{L_1}\geq \frac{\alpha}{2}\left(1-\frac{n\beta+\log 2}{\log r}\right)

where ƒ_n is any density estimator based on a sample of size n.

References

P. Assouad, "Deux remarques sur l'estimation", Comptes Rendus de L'Academie des Sciences de Paris, Vol. 296, pp. 1021–1024, 1983.
L. Birge, "Estimating a density under order restrictions: nonasymptotic minimax risk", Technical report, UER de Sciences Économiques, Universite Paris X, Nanterre, France, 1983.
T. Cover, J. Thomas, Elements of Information Theory. pp. 43.
L. Devroye, A Course in Density Estimation. Progress in probability and statistics, Vol 14. Boston, Birkhauser, 1987. ISBN 0-8176-3365-0, ISBN 3-7643-3365-0.
R. Fano, Transmission of information; a statistical theory of communications. Cambridge, Massachusetts, M.I.T. Press, 1961. ISBN 0-262-06001-9
R. Fano, Fano inequality Scholarpedia, 2008.
I. A. Ibragimov, R. Z. Has′minskii, Statistical estimation, asymptotic theory. Applications of Mathematics, vol. 16, Springer-Verlag, New York, 1981. ISBN 0-387-90523-5

This article is issued from Wikipedia - version of the Wednesday, April 29, 2015. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.