Delta method

In statistics, the delta method is a result that gives the approximate probability distribution of a function of an asymptotically normal statistical estimator from knowledge of the limiting variance of that estimator.

Univariate delta method

While the delta method generalizes easily to a multivariate setting, careful motivation of the technique is more easily demonstrated in univariate terms. Roughly, if there is a sequence of random variables Xn satisfying

{\sqrt{n}[X_n-\theta]\,\xrightarrow{D}\,\mathcal{N}(0,\sigma^2)},

where θ and σ² are finite-valued constants and \xrightarrow{D} denotes convergence in distribution, then

{\sqrt{n}[g(X_n)-g(\theta)]\,\xrightarrow{D}\,\mathcal{N}(0,\sigma^2[g'(\theta)]^2)}

for any function g such that g′(θ) exists and is non-zero.
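
To see the statement in action, here is a minimal simulation sketch in Python with NumPy; the choice of Exponential(1) data and of g(x) = x^2 is an arbitrary illustration, not part of the result. It compares the empirical variance of \sqrt{n}[g(X_n)-g(\theta)] with the delta-method value \sigma^2[g'(\theta)]^2.

import numpy as np

# Minimal sketch (assumed setup): X_n is the mean of n i.i.d. Exponential(1) draws,
# so theta = 1 and sigma^2 = 1 by the central limit theorem, and g(x) = x^2 gives
# g'(theta) = 2.  The delta method then predicts an asymptotic variance of 4.
rng = np.random.default_rng(0)
n, reps = 5_000, 2_000

xbar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)   # X_n, one per replication
z = np.sqrt(n) * (xbar**2 - 1.0)                                 # sqrt(n)[g(X_n) - g(theta)]

print("empirical variance:   ", z.var())            # should be close to 4
print("delta-method variance:", 1.0 * (2 * 1.0)**2)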

Proof in the univariate case

Demonstration of this result is fairly straightforward under the assumption that g′ is continuous at θ. To begin, we use the mean value theorem (i.e., the first-order approximation from Taylor's theorem, with the derivative evaluated at an intermediate point):

g(X_n)=g(\theta)+g'(\tilde{\theta})(X_n-\theta),

where \tilde{\theta} lies between Xn and θ. Note that since X_n\,\xrightarrow{P}\,\theta and |\tilde{\theta}-\theta| \le |X_n-\theta|, it must be that \tilde{\theta} \,\xrightarrow{P}\,\theta, and since g′ is continuous at θ, applying the continuous mapping theorem yields

g'(\tilde{\theta})\,\xrightarrow{P}\,g'(\theta),

where \xrightarrow{P} denotes convergence in probability.

Rearranging the terms and multiplying by \sqrt{n} gives

\sqrt{n}[g(X_n)-g(\theta)]=g' \left (\tilde{\theta} \right )\sqrt{n}[X_n-\theta].

Since

{\sqrt{n}[X_n-\theta] \xrightarrow{D} \mathcal{N}(0,\sigma^2)}

by assumption, it follows immediately from Slutsky's theorem that

{\sqrt{n}[g(X_n)-g(\theta)] \xrightarrow{D} \mathcal{N}(0,\sigma^2[g'(\theta)]^2)}.

This concludes the proof.

Proof with an explicit order of approximation

Alternatively, one can add one more step at the end, to obtain the order of approximation:


\begin{align}
\sqrt{n}[g(X_n)-g(\theta)]&=g' \left (\tilde{\theta} \right )\sqrt{n}[X_n-\theta]=\sqrt{n}[X_n-\theta]\left[ g'(\tilde{\theta} )+g'(\theta)-g'(\theta)\right]\\
&=\sqrt{n}[X_n-\theta]\left[g'(\theta)\right]+\sqrt{n}[X_n-\theta]\left[ g'(\tilde{\theta} )-g'(\theta)\right]\\
&=\sqrt{n}[X_n-\theta]\left[g'(\theta)\right]+O_p(1)\cdot o_p(1)\\
&=\sqrt{n}[X_n-\theta]\left[g'(\theta)\right]+o_p(1)
\end{align}

This suggests that the error in the approximation converges to 0 in probability.
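
To make the o_p(1) remainder concrete, the sketch below reuses the hypothetical setup from the earlier sketch (X_n the mean of n Exponential(1) draws, θ = 1, g(x) = x^2) and tracks the gap between \sqrt{n}[g(X_n)-g(\theta)] and its linearization g'(\theta)\sqrt{n}[X_n-\theta]; the gap shrinks as n grows.

import numpy as np

# Hypothetical illustration of the remainder: with g(x) = x^2 and theta = 1, the gap
# sqrt(n)[g(X_n) - g(theta)] - g'(theta) * sqrt(n)[X_n - theta] is o_p(1), so its
# typical magnitude should shrink toward zero as n increases.
rng = np.random.default_rng(1)
theta, reps = 1.0, 1_000

for n in (100, 1_000, 10_000):
    xbar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
    lhs = np.sqrt(n) * (xbar**2 - theta**2)
    linear = (2 * theta) * np.sqrt(n) * (xbar - theta)
    print(n, np.abs(lhs - linear).mean())          # decreases roughly like 1/sqrt(n)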

Multivariate delta method

By definition, a consistent estimator B converges in probability to its true value β, and often a central limit theorem can be applied to obtain asymptotic normality:

\sqrt{n}\left(B-\beta\right)\,\xrightarrow{D}\,N\left(0, \Sigma \right),

where n is the number of observations and Σ is a (symmetric positive semi-definite) covariance matrix. Suppose we want to estimate the variance of a function h of the estimator B. Keeping only the first two terms of the Taylor series, and using vector notation for the gradient, we can estimate h(B) as

h(B) \approx h(\beta) + \nabla h(\beta)^T \cdot (B-\beta)

which implies the variance of h(B) is approximately

\begin{align}
\operatorname{Var}\left(h(B)\right) & \approx \operatorname{Var}\left(h(\beta) + \nabla h(\beta)^T \cdot (B-\beta)\right) \\
 & = \operatorname{Var}\left(h(\beta) + \nabla h(\beta)^T \cdot B - \nabla h(\beta)^T \cdot \beta\right) \\
 & = \operatorname{Var}\left(\nabla h(\beta)^T \cdot B\right) \\
 & = \nabla h(\beta)^T \cdot \operatorname{Cov}(B) \cdot \nabla h(\beta) \\
 & = \nabla h(\beta)^T \cdot (\Sigma / n) \cdot \nabla h(\beta)
\end{align}

One can use the mean value theorem (for real-valued functions of many variables) to see that this does not rely on taking a first-order approximation.

The delta method therefore implies that

\sqrt{n}\left(h(B)-h(\beta)\right)\,\xrightarrow{D}\,N\left(0, \nabla h(\beta)^T \cdot \Sigma \cdot \nabla h(\beta)\right)

or in univariate terms,

\sqrt{n}\left(h(B)-h(\beta)\right)\,\xrightarrow{D}\,N\left(0, \sigma^2 \cdot \left(h^\prime(\beta)\right)^2 \right).
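
The sketch below illustrates the multivariate statement under an assumed setup, chosen only for illustration: B is the pair of sample means of n i.i.d. bivariate normal observations and h(b) = b_1/b_2. It compares the delta-method variance \nabla h(\beta)^T (\Sigma/n) \nabla h(\beta) with a Monte Carlo estimate of the variance of h(B).

import numpy as np

# Assumed setup for illustration: B is the vector of sample means of n i.i.d. bivariate
# normal observations with mean beta and covariance Sigma, and h(b) = b[0] / b[1].
rng = np.random.default_rng(2)
beta = np.array([2.0, 5.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
n, reps = 2_000, 2_000

grad = np.array([1.0 / beta[1], -beta[0] / beta[1]**2])   # gradient of h at beta
delta_var = grad @ (Sigma / n) @ grad                     # nabla h^T (Sigma/n) nabla h

B = rng.multivariate_normal(beta, Sigma, size=(reps, n)).mean(axis=1)   # shape (reps, 2)
hB = B[:, 0] / B[:, 1]

print("delta-method variance:", delta_var)
print("empirical variance:   ", hB.var())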

Example

Suppose Xn is Binomial with parameters  p \in (0,1] and n. Since

{\sqrt{n} \left[ \frac{X_n}{n}-p \right]\,\xrightarrow{D}\,N(0,p (1-p))},

we can apply the delta method with g(θ) = log(θ) to see

{\sqrt{n} \left[ \log\left( \frac{X_n}{n}\right)-\log(p)\right] \,\xrightarrow{D}\,N(0,p (1-p) [1/p]^2)}

Hence, even though the variance of \log\left(\frac{X_n}{n}\right) does not actually exist for any finite n (since Xn can be zero), the asymptotic variance of  \log \left( \frac{X_n}{n} \right) does exist and is equal to

 \frac{1-p}{p\,n}.

Note that since p>0,  \Pr \left( \frac{X_n}{n} > 0 \right) \rightarrow 1 as  n \rightarrow \infty , so with probability converging to one,  \log\left(\frac{X_n}{n}\right) is finite for large n.
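
As a quick numerical check of that asymptotic variance, here is a short simulation sketch with assumed values of p and n; replications with X_n = 0 are discarded, matching the observation that X_n/n > 0 with probability tending to one.

import numpy as np

# Hypothetical values of p and n, chosen only for illustration
p, n, reps = 0.3, 500, 20_000

rng = np.random.default_rng(3)
x = rng.binomial(n, p, size=reps)
log_phat = np.log(x[x > 0] / n)          # drop the (rare) replications with X_n = 0

print("empirical variance:   ", log_phat.var())
print("delta-method variance:", (1 - p) / (p * n))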

Moreover, if \hat p and \hat q are estimates of different group rates from independent samples of sizes n and m respectively, then the logarithm of the estimated relative risk \frac{\hat p}{\hat q} has asymptotic variance equal to

 \frac{1-\hat p}{\hat p \, n}+\frac{1-\hat q}{\hat q \, m}.

This is useful for constructing a hypothesis test or a confidence interval for the relative risk.
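
For instance, a Wald-type interval follows by applying the variance formula above to the log of the estimated relative risk and exponentiating the endpoints; the sketch below uses made-up counts purely for illustration.

import numpy as np

# Made-up counts for two independent groups (illustration only)
x1, n1 = 30, 200          # events and sample size in group 1  ->  p_hat
x2, n2 = 20, 250          # events and sample size in group 2  ->  q_hat

p_hat, q_hat = x1 / n1, x2 / n2
log_rr = np.log(p_hat / q_hat)

# delta-method standard error of log(p_hat / q_hat), from the formula above
se = np.sqrt((1 - p_hat) / (p_hat * n1) + (1 - q_hat) / (q_hat * n2))

z = 1.96                  # approximate 97.5th percentile of the standard normal
lo, hi = np.exp(log_rr - z * se), np.exp(log_rr + z * se)
print("estimated relative risk:", p_hat / q_hat)
print("approximate 95% CI:", (lo, hi))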

Note

The delta method is often used in a form that is essentially identical to that above, but without the assumption that Xn or B is asymptotically normal. Often the only context is that the variance is "small". The results then just give approximations to the means and covariances of the transformed quantities. For example, the formulae presented in Klein (1953, p. 258) are:

\begin{align}
\operatorname{Var} \left(h_r \right) = & \sum_i \left( \frac{\partial h_r}{\partial B_i} \right)^2 \operatorname{Var}\left( B_i \right) +  \sum_i \sum_{j \neq i} \left( \frac{ \partial h_r }{ \partial B_i } \right) \left( \frac{ \partial h_r }{ \partial B_j } \right) \operatorname{Cov}\left( B_i, B_j \right) \\
\operatorname{Cov}\left( h_r, h_s \right) = & \sum_i \left( \frac{ \partial h_r }{ \partial B_i } \right) \left( \frac{\partial h_s }{ \partial B_i } \right) \operatorname{Var}\left( B_i \right) + \sum_i \sum_{j \neq i} \left( \frac{\partial h_r}{\partial B_i} \right) \left(\frac{\partial h_s}{\partial B_j} \right) \operatorname{Cov}\left( B_i, B_j \right)
\end{align}

where hr is the rth element of h(B) and Bi is the ith element of B. The only difference is that Klein stated these as identities, whereas they are actually approximations.
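
In matrix form these double sums are just the entries of J Cov(B) J^T, where J is the Jacobian of h evaluated at B; the sketch below computes them that way for a made-up h and covariance matrix, both hypothetical and chosen only for illustration.

import numpy as np

# Made-up h: R^3 -> R^2 and covariance matrix of B, purely for illustration
B = np.array([1.0, 2.0, 0.5])
cov_B = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.01]])

def jacobian(b):
    # partial derivatives of h(b) = (b0*b1, b1/b2), worked out by hand for this h
    return np.array([[b[1], b[0], 0.0],
                     [0.0, 1.0 / b[2], -b[1] / b[2]**2]])

J = jacobian(B)
cov_h = J @ cov_B @ J.T      # (r, r) entry = Var(h_r); (r, s) entry = Cov(h_r, h_s)
print("Var(h_1), Var(h_2):", np.diag(cov_h))
print("Cov(h_1, h_2):     ", cov_h[0, 1])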

References

Klein, L. R. (1953). A Textbook of Econometrics. p. 258.
