Binary entropy function

In information theory, the binary entropy function, denoted $\operatorname{H}(p)$ or $\operatorname{H}_\text{b}(p)$, is defined as the entropy of a Bernoulli process with probability of success $p$. Mathematically, the Bernoulli trial is modelled as a random variable $X$ that can take on only two values: 0 and 1. The event $X = 1$ is considered a success and the event $X = 0$ is considered a failure. (These two events are mutually exclusive and exhaustive.)
If $\operatorname{Pr}(X = 1) = p$, then $\operatorname{Pr}(X = 0) = 1 - p$ and the entropy of $X$ (in shannons) is given by

$$\operatorname{H}(X) = \operatorname{H}_\text{b}(p) = -p \log_2 p - (1 - p) \log_2 (1 - p),$$

where $0 \log_2 0$ is taken to be 0. The logarithms in this formula are usually taken (as shown in the graph) to the base 2. See binary logarithm.
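A minimal Python sketch of this formula (an illustration, not drawn from the cited literature; the function name binary_entropy is an arbitrary choice), using the convention $0 \log_2 0 = 0$:

```python
import math

def binary_entropy(p: float) -> float:
    """Binary entropy H_b(p) in shannons (bits), with 0*log2(0) taken as 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(binary_entropy(0.5))   # 1.0, the maximum (the unbiased bit)
print(binary_entropy(0.75))  # about 0.8113
```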
When $p = \tfrac{1}{2}$, the binary entropy function attains its maximum value. This is the case of the unbiased bit, the most common unit of information entropy.
$\operatorname{H}_\text{b}(p)$ is distinguished from the entropy function $\operatorname{H}(X)$ in that the former takes a single real number as a parameter whereas the latter takes a distribution or random variable as a parameter. Sometimes the binary entropy function is also written as $\operatorname{H}_2(p)$. However, it is different from and should not be confused with the Rényi entropy, which is denoted as $\operatorname{H}_2(X)$.
Explanation
In terms of information theory, entropy is considered to be a measure of the uncertainty in a message. To put it intuitively, suppose $p = 0$. At this probability, the event is certain never to occur, and so there is no uncertainty at all, leading to an entropy of 0. If $p = 1$, the result is again certain, so the entropy is 0 here as well. When $p = \tfrac{1}{2}$, the uncertainty is at a maximum; if one were to place a fair bet on the outcome in this case, there is no advantage to be gained with prior knowledge of the probabilities. In this case, the entropy is maximum at a value of 1 bit. Intermediate values fall between these cases; for instance, if $p = \tfrac{3}{4}$, there is still a measure of uncertainty on the outcome, but one can still predict the outcome correctly more often than not, so the uncertainty measure, or entropy, is less than 1 full bit.
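As a concrete check of the $p = \tfrac{3}{4}$ case, the formula evaluates to

$$\operatorname{H}_\text{b}\!\left(\tfrac{3}{4}\right) = -\tfrac{3}{4} \log_2 \tfrac{3}{4} - \tfrac{1}{4} \log_2 \tfrac{1}{4} = \tfrac{3}{4} \log_2 \tfrac{4}{3} + \tfrac{1}{4} \cdot 2 \approx 0.311 + 0.5 \approx 0.811 \text{ bits},$$

which is indeed between 0 and 1 full bit.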
Derivative
The derivative of the binary entropy function may be expressed as the negative of the logit function:
$$\frac{d}{dp} \operatorname{H}_\text{b}(p) = -\operatorname{logit}_2(p) = -\log_2\!\left(\frac{p}{1-p}\right).$$
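A quick numerical sanity check of this identity (an illustrative sketch, repeating the binary_entropy helper from above so the snippet is self-contained; a central finite difference should agree with the closed form):

```python
import math

def binary_entropy(p: float) -> float:
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

p, h = 0.3, 1e-6
finite_diff = (binary_entropy(p + h) - binary_entropy(p - h)) / (2 * h)
neg_logit2 = -math.log2(p / (1 - p))
print(finite_diff, neg_logit2)  # both approximately 1.2224
```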
Taylor series
The Taylor series of the binary entropy function in a neighborhood of 1/2 is
$$\operatorname{H}_\text{b}(p) = 1 - \frac{1}{2 \ln 2} \sum_{n=1}^{\infty} \frac{(1-2p)^{2n}}{n(2n-1)}$$

for $0 \le p \le 1$.
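As an illustrative check (again repeating the binary_entropy sketch for self-containment; the truncation at 20 terms is an arbitrary choice), a short truncation of this series already matches the exact value closely away from the endpoints:

```python
import math

def binary_entropy(p: float) -> float:
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def taylor_binary_entropy(p: float, terms: int = 20) -> float:
    """Truncated series expansion of H_b(p) around p = 1/2."""
    s = sum((1 - 2 * p) ** (2 * n) / (n * (2 * n - 1)) for n in range(1, terms + 1))
    return 1 - s / (2 * math.log(2))

print(binary_entropy(0.3), taylor_binary_entropy(0.3))  # both about 0.8813
```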
