Binary entropy function

In information theory, the binary entropy function, denoted $\operatorname{H}(p)$ or $\operatorname{H}_\text{b}(p)$, is defined as the entropy of a Bernoulli process with probability of success $p$. Mathematically, the Bernoulli trial is modelled as a random variable $X$ that can take on only two values: 0 and 1. The event $X = 1$ is considered a success and the event $X = 0$ is considered a failure. (These two events are mutually exclusive and exhaustive.)
If $\operatorname{Pr}(X = 1) = p$, then $\operatorname{Pr}(X = 0) = 1 - p$ and the entropy of $X$ (in shannons) is given by

$$\operatorname{H}(X) = \operatorname{H}_\text{b}(p) = -p \log_2 p - (1 - p) \log_2 (1 - p),$$

where $0 \log_2 0$ is taken to be 0. The logarithms in this formula are usually taken (as shown in the graph) to the base 2. See binary logarithm.
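A minimal Python sketch of this formula (an illustration, not drawn from the cited literature; the function name binary_entropy is an arbitrary choice), using the convention $0 \log_2 0 = 0$:

```python
import math

def binary_entropy(p: float) -> float:
    """Binary entropy H_b(p) in shannons (bits), with 0*log2(0) taken as 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(binary_entropy(0.5))   # 1.0, the maximum (the unbiased bit)
print(binary_entropy(0.75))  # about 0.8113
```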
When $p = \tfrac{1}{2}$, the binary entropy function attains its maximum value. This is the case of the unbiased bit, the most common unit of information entropy.
$\operatorname{H}_\text{b}(p)$ is distinguished from the entropy function $\operatorname{H}(X)$ in that the former takes a single real number as a parameter whereas the latter takes a distribution or random variable as a parameter. Sometimes the binary entropy function is also written as $\operatorname{H}_2(p)$. However, it is different from and should not be confused with the Rényi entropy, which is denoted as $\operatorname{H}_2(X)$.
Explanation
In terms of information theory, entropy is considered to be a measure of the uncertainty in a message. To put it intuitively, suppose $p = 0$. At this probability, the event is certain never to occur, and so there is no uncertainty at all, leading to an entropy of 0. If $p = 1$, the result is again certain, so the entropy is 0 here as well. When $p = \tfrac{1}{2}$, the uncertainty is at a maximum; if one were to place a fair bet on the outcome in this case, there is no advantage to be gained with prior knowledge of the probabilities. In this case, the entropy is maximum at a value of 1 bit. Intermediate values fall between these cases; for instance, if $p = \tfrac{3}{4}$, there is still a measure of uncertainty on the outcome, but one can still predict the outcome correctly more often than not, so the uncertainty measure, or entropy, is less than 1 full bit.
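As a concrete check of the $p = \tfrac{3}{4}$ case, the formula evaluates to

$$\operatorname{H}_\text{b}\!\left(\tfrac{3}{4}\right) = -\tfrac{3}{4} \log_2 \tfrac{3}{4} - \tfrac{1}{4} \log_2 \tfrac{1}{4} = \tfrac{3}{4} \log_2 \tfrac{4}{3} + \tfrac{1}{4} \cdot 2 \approx 0.311 + 0.5 \approx 0.811 \text{ bits},$$

which is indeed between 0 and 1 full bit.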
Derivative
The derivative of the binary entropy function may be expressed as the negative of the logit function:
$$\frac{d}{dp} \operatorname{H}_\text{b}(p) = -\operatorname{logit}_2(p) = -\log_2\!\left(\frac{p}{1-p}\right).$$
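A quick numerical sanity check of this identity (an illustrative sketch, repeating the binary_entropy helper from above so the snippet is self-contained; a central finite difference should agree with the closed form):

```python
import math

def binary_entropy(p: float) -> float:
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

p, h = 0.3, 1e-6
finite_diff = (binary_entropy(p + h) - binary_entropy(p - h)) / (2 * h)
neg_logit2 = -math.log2(p / (1 - p))
print(finite_diff, neg_logit2)  # both approximately 1.2224
```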
Taylor series
The Taylor series of the binary entropy function in a neighborhood of 1/2 is
$$\operatorname{H}_\text{b}(p) = 1 - \frac{1}{2 \ln 2} \sum_{n=1}^{\infty} \frac{(1-2p)^{2n}}{n(2n-1)}$$

for $0 \le p \le 1$.
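As an illustrative check (again repeating the binary_entropy sketch for self-containment; the truncation at 20 terms is an arbitrary choice), a short truncation of this series already matches the exact value closely away from the endpoints:

```python
import math

def binary_entropy(p: float) -> float:
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def taylor_binary_entropy(p: float, terms: int = 20) -> float:
    """Truncated series expansion of H_b(p) around p = 1/2."""
    s = sum((1 - 2 * p) ** (2 * n) / (n * (2 * n - 1)) for n in range(1, terms + 1))
    return 1 - s / (2 * math.log(2))

print(binary_entropy(0.3), taylor_binary_entropy(0.3))  # both about 0.8813
```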
