Hoeffding's lemma

In probability theory, Hoeffding's lemma is an inequality that bounds the moment-generating function of any bounded random variable. It is named after the Finnish-American mathematical statistician Wassily Hoeffding.

The proof of Hoeffding's lemma uses Taylor's theorem and Jensen's inequality. Hoeffding's lemma is itself used in the proof of McDiarmid's inequality.

Statement of the lemma

Let X be any real-valued random variable with expected value E[X] = 0 and such that a ≤ X ≤ b almost surely. Then, for all λ ∈ R,

\mathbf{E} \left[ e^{\lambda X} \right] \leq \exp \left( \frac{\lambda^2 (b - a)^2}{8} \right).


Note that because the random variable X has zero expectation, the a and b in the lemma must satisfy a \leq 0 \leq b.
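As an illustration, the statement can be checked numerically for a simple two-point distribution. The sketch below is not part of the lemma or its proof. It assumes X takes the value a with probability b/(b − a) and the value b with probability −a/(b − a), which gives E[X] = 0. The function names and the chosen values of a, b and λ are illustrative only.

```python
import math

def two_point_mgf(a: float, b: float, lam: float) -> float:
    """E[exp(lam * X)] for X = a w.p. b/(b-a) and X = b w.p. -a/(b-a) (zero mean)."""
    p_a = b / (b - a)
    p_b = -a / (b - a)
    return p_a * math.exp(lam * a) + p_b * math.exp(lam * b)

def hoeffding_bound(a: float, b: float, lam: float) -> float:
    """Right-hand side of the lemma: exp(lam^2 (b - a)^2 / 8)."""
    return math.exp(lam ** 2 * (b - a) ** 2 / 8)

def lemma_holds(a: float, b: float, lams) -> bool:
    """Check the inequality at each lam, with a small relative tolerance for rounding."""
    return all(
        two_point_mgf(a, b, lam) <= hoeffding_bound(a, b, lam) * (1 + 1e-12)
        for lam in lams
    )

# Illustrative grid of lambda values in [-5, 5].
lams = [k / 10 for k in range(-50, 51)]
print(lemma_holds(-1.0, 2.0, lams))
```

A numerical check of this kind only samples finitely many values of λ and one family of distributions, so it illustrates the bound rather than proving it.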

Proof of the lemma

Since  e^{\lambda x} is a convex function of x, we have

e^{\lambda x}\leq \frac{b-x}{b-a}e^{\lambda a}+\frac{x-a}{b-a}e^{\lambda b}\qquad \forall a\leq x\leq b

So,  \mathbf{E}\left[e^{\lambda X}\right] \leq \frac{b-EX}{b-a}e^{\lambda a}+\frac{EX-a}{b-a}e^{\lambda b}.
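The convexity step says that on [a, b] the exponential lies below the straight line joining its endpoint values. A minimal sketch of a numerical check follows. The helper names, the grid size and the tolerance are assumptions made for illustration.

```python
import math

def chord(x: float, a: float, b: float, lam: float) -> float:
    """Line through (a, e^{lam a}) and (b, e^{lam b}), evaluated at x."""
    return (b - x) / (b - a) * math.exp(lam * a) + (x - a) / (b - a) * math.exp(lam * b)

def chord_dominates(a: float, b: float, lam: float, n: int = 200) -> bool:
    """Check e^{lam x} <= chord(x) on an evenly spaced grid of n + 1 points in [a, b]."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    return all(
        math.exp(lam * x) <= chord(x, a, b, lam) * (1 + 1e-12)
        for x in xs
    )

print(chord_dominates(-1.0, 2.0, 0.7))
```

Because the chord is linear in x, taking expectations only requires E[X], which is how the bound on the moment-generating function follows.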

Let  h=\lambda(b-a),  p=\frac{-a}{b-a} and  L(h)=-hp+\ln(1-p+pe^h)

Then, since  EX=0 and  \lambda a=-hp,  \frac{b-EX}{b-a}e^{\lambda a}+\frac{EX-a}{b-a}e^{\lambda b}=(1-p)e^{\lambda a}+pe^{\lambda b}=e^{-hp}\left(1-p+pe^{h}\right)=e^{L(h)}.

Differentiating  L(h) with respect to  h, we find

 L(0)=L^{'}(0)=0\text{ and } L^{''}(h)\leq \frac{1}{4}

By Taylor's theorem with the Lagrange form of the remainder,

 L(h)\leq \frac{1}{8}h^2=\frac{1}{8}\lambda^2(b-a)^2

Hence,  \mathbf{E}\left[e^{\lambda X}\right] \leq e^{\frac{1}{8}\lambda^2(b-a)^2}
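The key inequality L(h) ≤ h²/8 can also be inspected numerically. The following sketch evaluates L on a grid of h values for a given p in (0, 1). The grid, the tolerance and the function names are assumptions chosen for illustration.

```python
import math

def L(h: float, p: float) -> float:
    """L(h) = -h p + ln(1 - p + p e^h), as defined in the proof."""
    return -h * p + math.log(1 - p + p * math.exp(h))

def taylor_bound_holds(p: float, hs) -> bool:
    """Check L(h) <= h^2 / 8 at each h, allowing a tiny absolute rounding tolerance."""
    return all(L(h, p) <= h * h / 8 + 1e-12 for h in hs)

# Illustrative grid of h values in [-10, 10].
hs = [k / 20 for k in range(-200, 201)]
print(taylor_bound_holds(0.5, hs), taylor_bound_holds(1 / 3, hs))
```

For p = 1/2 the function reduces to L(h) = ln cosh(h/2), which is the case where the bound is tightest near h = 0.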

(The "alternative proof" below is the same proof with more explanation.)

Alternative proof

First note that if one of a or b is zero, then \textstyle\mathbb{P}\left(X=0\right)=1 and the inequality follows. If both are nonzero, then a must be negative and b must be positive.

Next, recall that e^{sx} is a convex function of x on the real line, so on [a, b] it lies below the chord through its endpoint values:

\forall x \in [a, b]: \qquad e^{sx}\leq \frac{b-x}{b-a}e^{sa}+\frac{x-a}{b-a}e^{sb}.

Applying E[ ⋅ ] to both sides of the above inequality gives us:

\begin{align}
\mathrm{E} \left [e^{sX} \right ] &\leq \frac{b-\mathrm{E}[X]}{b-a} e^{sa} + \frac{\mathrm{E}[X]-a}{b-a}e^{sb} \\
&= \frac{b}{b-a} e^{sa} + \frac{-a}{b-a}e^{sb} && \mathrm{E}[X] = 0\\
&= \left (-\frac{a}{b-a} \right ) e^{sa} \left (-\frac{b}{a}+e^{sb-sa} \right ) \\
&= \left (-\frac{a}{b-a} \right ) e^{sa} \left (-\frac{b-a+a}{a}+e^{s(b-a)} \right ) \\
&= \left (-\frac{a}{b-a} \right ) e^{sa} \left (-\frac{b-a}{a}-1+e^{s(b-a)} \right ) \\
&= \left (1-\theta+\theta e^{s(b-a)} \right ) e^{-s\theta(b-a)} && \theta=-\frac{a}{b-a}>0
\end{align}

Let u = s(b − a) and define:

\begin{cases} \varphi:\mathbf{R}\to\mathbf{R} \\ \varphi(u)=-\theta u+\log \left(1-\theta+\theta e^u \right)\end{cases}

φ is well defined on R because the argument of the logarithm is always positive. To see this, we calculate:

\begin{align}
1-\theta+\theta e^u &= \theta \left (\frac{1}{\theta} - 1 + e^u \right) \\
& = \theta \left ( -\frac{b}{a} + e^u \right ) \\
& > 0 && \theta > 0, \quad \frac{b}{a} <0
\end{align}

The definition of φ implies

\mathrm{E} \left [e^{sX} \right ] \leq e^{\varphi(u)}.

By Taylor's theorem, for every real u there exists a v between 0 and u such that

\varphi(u)=\varphi(0)+u\varphi'(0)+\tfrac{1}{2} u^2\varphi''(v).

Note that:

\begin{align}
\varphi(0)  &= 0 \\
\varphi'(0) &= -\theta+ \left.\frac{\theta e^u}{1-\theta +\theta e^u}\right|_{u=0} \\
&=0 \\[6pt]
\varphi''(v) &= \frac{\theta e^v \left (1-\theta+\theta e^v \right)-\theta^{2}e^{2v}}{\left (1-\theta+\theta e^v \right)^2 }\\[6pt]
&=\frac{\theta e^v}{1-\theta+\theta e^v}\left(1-\frac{\theta e^v}{1-\theta+\theta e^v}\right)\\[6pt]
&= t(1-t) && t=\frac{\theta e^v}{1-\theta+\theta e^v} \\
&\leq \tfrac{1}{4} && t > 0
\end{align}
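The closed form φ''(v) = t(1 − t) can be compared against a central finite-difference approximation of the second derivative. This is only a sketch: the step size eps, the sample points and the function names are assumptions, and the comparison is approximate by nature.

```python
import math

def phi(u: float, theta: float) -> float:
    """phi(u) = -theta u + log(1 - theta + theta e^u)."""
    return -theta * u + math.log(1 - theta + theta * math.exp(u))

def phi_second_closed_form(v: float, theta: float) -> float:
    """phi''(v) = t (1 - t) with t = theta e^v / (1 - theta + theta e^v)."""
    t = theta * math.exp(v) / (1 - theta + theta * math.exp(v))
    return t * (1 - t)

def phi_second_numeric(v: float, theta: float, eps: float = 1e-3) -> float:
    """Central finite-difference approximation of phi''(v)."""
    return (phi(v + eps, theta) - 2 * phi(v, theta) + phi(v - eps, theta)) / eps ** 2

theta = 0.3  # illustrative value in (0, 1)
for v in [-5.0, -1.0, 0.0, 0.5, 3.0]:
    exact = phi_second_closed_form(v, theta)
    approx = phi_second_numeric(v, theta)
    print(v, exact <= 0.25, abs(exact - approx) < 1e-6)
```

Since t always lies strictly between 0 and 1 here, the product t(1 − t) is at most 1/4, with equality exactly when t = 1/2.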

Therefore,

\varphi (u)\leq 0 + u \cdot 0 + \tfrac{1}{2}u^2 \cdot \tfrac{1}{4} = \tfrac{1}{8} u^2 = \tfrac{1}{8}s^2(b-a)^2.

This implies

\mathrm{E} \left [e^{sX} \right ] \leq \exp\left(\tfrac{1}{8}s^2(b-a)^2\right).


This article is issued from Wikipedia - version of the Wednesday, April 27, 2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.