Proof of Stein's example
Stein's example is an important result in decision theory, which can be stated as follows:
- The ordinary decision rule for estimating the mean of a multivariate Gaussian distribution is inadmissible under mean squared error risk in dimension at least 3.
The following is an outline of its proof. The reader is referred to the main article for more information.
Sketched proof
The risk function of the decision rule $d(\mathbf{x}) = \mathbf{x}$ is

$$R(\theta,d) = \mathbb{E}_\theta\left[ |\mathbf{\theta - X}|^2 \right] = n.$$

Now consider the decision rule

$$d'(\mathbf{x}) = \mathbf{x} - \frac{\alpha}{|\mathbf{x}|^2}\,\mathbf{x},$$

where $\alpha = n-2$. We will show that $d'$ is a better decision rule than $d$. The risk function is

$$\begin{aligned}
R(\theta,d') &= \mathbb{E}_\theta\left[ \left|\mathbf{\theta - X} + \frac{\alpha}{|\mathbf{X}|^2}\,\mathbf{X}\right|^2\right] \\
&= \mathbb{E}_\theta\left[ |\mathbf{\theta - X}|^2 + 2(\mathbf{\theta - X})^T\frac{\alpha}{|\mathbf{X}|^2}\,\mathbf{X} + \frac{\alpha^2}{|\mathbf{X}|^4}\,|\mathbf{X}|^2 \right] \\
&= \mathbb{E}_\theta\left[ |\mathbf{\theta - X}|^2 \right] + 2\alpha\,\mathbb{E}_\theta\left[\frac{(\mathbf{\theta-X})^T \mathbf{X}}{|\mathbf{X}|^2}\right] + \alpha^2\,\mathbb{E}_\theta\left[\frac{1}{|\mathbf{X}|^2} \right],
\end{aligned}$$

a quadratic in $\alpha$. We may simplify the middle term by considering a general "well-behaved" function $h : \mathbf{x} \mapsto h(\mathbf{x}) \in \mathbb{R}$ and using integration by parts. For $1 \le i \le n$, for any continuously differentiable $h$ growing sufficiently slowly for large $x_i$ we have:

$$\begin{aligned}
\mathbb{E}_\theta \left[ (\theta_i - X_i) h(\mathbf{X}) \mid X_j = x_j\ (j \neq i) \right] &= \int (\theta_i - x_i) h(\mathbf{x}) \left( \frac{1}{2\pi} \right)^{n/2} e^{-\frac{1}{2}(\mathbf{x}-\mathbf{\theta})^T(\mathbf{x}-\mathbf{\theta})} \, dx_i \\
&= \left[ h(\mathbf{x}) \left( \frac{1}{2\pi} \right)^{n/2} e^{-\frac{1}{2}(\mathbf{x}-\mathbf{\theta})^T(\mathbf{x}-\mathbf{\theta})} \right]_{x_i=-\infty}^{x_i=\infty} - \int \frac{\partial h}{\partial x_i}(\mathbf{x}) \left( \frac{1}{2\pi} \right)^{n/2} e^{-\frac{1}{2}(\mathbf{x}-\mathbf{\theta})^T(\mathbf{x}-\mathbf{\theta})} \, dx_i \\
&= - \mathbb{E}_\theta \left[ \frac{\partial h}{\partial x_i}(\mathbf{X}) \,\Big|\, X_j = x_j\ (j \neq i) \right].
\end{aligned}$$

Therefore,

$$\mathbb{E}_\theta \left[ (\theta_i - X_i) h(\mathbf{X}) \right] = - \mathbb{E}_\theta \left[ \frac{\partial h}{\partial x_i}(\mathbf{X}) \right].$$

(This result is known as Stein's lemma.)

Now, we choose

$$h(\mathbf{x}) = \frac{x_i}{|\mathbf{x}|^2}.$$

If $h$ met the "well-behaved" condition (it doesn't, but this can be remedied; see below), we would have

$$\frac{\partial h}{\partial x_i} = \frac{1}{|\mathbf{x}|^2} - \frac{2 x_i^2}{|\mathbf{x}|^4}$$

and so

$$\begin{aligned}
\mathbb{E}_\theta\left[\frac{(\mathbf{\theta-X})^T \mathbf{X}}{|\mathbf{X}|^2}\right] &= \sum_{i=1}^n \mathbb{E}_\theta \left[ (\theta_i - X_i) \frac{X_i}{|\mathbf{X}|^2} \right] \\
&= - \sum_{i=1}^n \mathbb{E}_\theta \left[ \frac{1}{|\mathbf{X}|^2} - \frac{2 X_i^2}{|\mathbf{X}|^4} \right] \\
&= -(n-2)\,\mathbb{E}_\theta \left[\frac{1}{|\mathbf{X}|^2}\right].
\end{aligned}$$

Then returning to the risk function of $d'$:

$$R(\theta,d') = n - 2\alpha(n-2)\,\mathbb{E}_\theta\left[\frac{1}{|\mathbf{X}|^2}\right] + \alpha^2\,\mathbb{E}_\theta\left[\frac{1}{|\mathbf{X}|^2} \right].$$

This quadratic in $\alpha$ is minimized at $\alpha = n-2$, giving

$$R(\theta,d') = R(\theta,d) - (n-2)^2\,\mathbb{E}_\theta\left[\frac{1}{|\mathbf{X}|^2} \right],$$

which of course satisfies

$$R(\theta,d') < R(\theta,d),$$

making $d$ an inadmissible decision rule.

It remains to justify the use of

$$h(\mathbf{x}) = \frac{x_i}{|\mathbf{x}|^2}.$$

This function is not continuously differentiable, since it is singular at $\mathbf{x} = 0$. However, the function

$$h(\mathbf{x}) = \frac{x_i}{\varepsilon + |\mathbf{x}|^2}$$

is continuously differentiable, and after following the algebra through and letting $\varepsilon \to 0$, one obtains the same result.
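The risk comparison above can also be checked numerically. The following is a minimal Monte Carlo sketch (not part of the proof): the dimension $n$, the mean vector, and the sample count are arbitrary choices, and NumPy is assumed to be available.

```python
import numpy as np

# Monte Carlo check that the shrinkage rule d'(x) = x - ((n-2)/|x|^2) x
# beats the ordinary rule d(x) = x for X ~ N(theta, I_n), n >= 3.
rng = np.random.default_rng(0)
n = 5                       # dimension; the result requires n >= 3
theta = np.ones(n)          # true mean (arbitrary choice)
trials = 200_000

X = rng.normal(loc=theta, size=(trials, n))      # samples X ~ N(theta, I_n)
alpha = n - 2
norm_sq = np.sum(X**2, axis=1, keepdims=True)    # |X|^2 for each sample
d_prime = X - (alpha / norm_sq) * X              # shrinkage estimator d'

# Empirical risks: R(theta, d) should be close to n, and the risk gap
# should be close to (n-2)^2 * E[1/|X|^2] as derived above.
risk_d = np.mean(np.sum((X - theta)**2, axis=1))
risk_d_prime = np.mean(np.sum((d_prime - theta)**2, axis=1))
gap = (n - 2)**2 * np.mean(1.0 / norm_sq)

print(f"R(theta, d)  ~ {risk_d:.3f}  (theory: {n})")
print(f"R(theta, d') ~ {risk_d_prime:.3f}  (theory: {n - gap:.3f})")
```

The empirical risk of $d'$ comes out strictly below that of $d$, and the gap matches $(n-2)^2\,\mathbb{E}_\theta[1/|\mathbf{X}|^2]$ up to Monte Carlo error.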


