Power transform

In statistics, a power transform is a family of functions applied to create a monotonic transformation of data using power functions. It is a useful data transformation technique for stabilizing variance, making the data more closely resemble a normal distribution, improving the validity of measures of association (such as the Pearson correlation between variables), and for other data stabilization procedures.

Definition

The power transformation is defined as a function that varies continuously with the power parameter λ, written in piecewise form so that it remains continuous at the singular point λ = 0. For data vectors (y1, ..., yn) in which each yi > 0, the power transform is

y_i^{(\lambda)} =
\begin{cases}
\dfrac{y_i^\lambda-1}{\lambda(\operatorname{GM}(y))^{\lambda -1}} , &\text{if } \lambda \neq 0 \\[12pt]
\operatorname{GM}(y)\ln{y_i} , &\text{if } \lambda = 0
\end{cases}

where

 \operatorname{GM}(y) = (y_1\cdots y_n)^{1/n} \,

is the geometric mean of the observations y1, ..., yn. The case for \lambda = 0 is the limit as \lambda approaches 0. To see this, note that y_i^{\lambda} = \operatorname{exp}({\lambda \operatorname{log}(y_i)}) = 1 + \lambda \operatorname{log}(y_i) + O((\lambda \operatorname{log}(y_i))^2). Then \dfrac{y_i^\lambda-1}\lambda = \operatorname{log}(y_i) + O(\lambda), and everything but \operatorname{log}(y_i) becomes negligible for \lambda sufficiently small.

The inclusion of the (λ − 1)th power of the geometric mean in the denominator simplifies the scientific interpretation of any equation involving y_i^{(\lambda)}, because the units of measurement do not change as λ changes.
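The definition above can be sketched in a few lines of Python. This is a minimal illustration that uses only the standard library; the function names and the sample data are assumptions made for the example, not part of the original formulation. It evaluates the transform at λ = 0 and at a very small λ to show that the two branches join continuously.

```python
import math

def geometric_mean(y):
    """Geometric mean of strictly positive observations."""
    return math.exp(sum(math.log(v) for v in y) / len(y))

def normalized_power_transform(y, lam):
    """Geometric-mean-normalized power transform of a positive data vector."""
    if any(v <= 0 for v in y):
        raise ValueError("all observations must be strictly positive")
    gm = geometric_mean(y)
    if lam == 0:
        # Limiting case: GM(y) * ln(y_i)
        return [gm * math.log(v) for v in y]
    scale = lam * gm ** (lam - 1)
    return [(v ** lam - 1) / scale for v in y]

# Illustrative data (hypothetical values chosen for the example).
y = [0.5, 1.0, 2.0, 4.0]
at_zero = normalized_power_transform(y, 0.0)
near_zero = normalized_power_transform(y, 1e-6)

# Largest difference between the lambda = 0 branch and a nearby lambda.
max_gap = max(abs(a - b) for a, b in zip(at_zero, near_zero))
print(max_gap)
```

The printed gap should be tiny, which is the numerical counterpart of the limit argument given above.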

Box and Cox (1964) introduced the geometric mean into this transformation by first including the Jacobian of the rescaled power transformation

 \dfrac{y^\lambda-1}{\lambda} .

with the likelihood. This Jacobian is as follows:

 J(\lambda; y_1, \ldots, y_n) = \prod_{i=1}^n |d y_i^{(\lambda)} / dy_i|
= \prod_{i=1}^n y_i^{\lambda-1}
= \operatorname{GM}(y)^{n(\lambda-1)}

This allows the normal log likelihood at its maximum to be written as follows:


   \log ( \mathcal{L} (\hat\mu,\hat\sigma)) = (-n/2)(\log(2\pi\hat\sigma^2) +1) +
n(\lambda-1) \log(\operatorname{GM}(y))

       = (-n/2)(\log(2\pi\hat\sigma^2 / \operatorname{GM}(y)^{2(\lambda-1)}) + 1).

From here, absorbing \operatorname{GM}(y)^{2(\lambda-1)} into the expression for \hat\sigma^2 produces an expression that establishes that minimizing the sum of squares of residuals from y_i^{(\lambda)} is equivalent to maximizing the sum of the normal log likelihood of deviations from (y^\lambda-1)/\lambda and the log of the Jacobian of the transformation.
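The equivalence can be checked numerically. The sketch below assumes the simplest possible model, an intercept-only normal model in which the transformed observations share a single mean; that model, the function name, and the sample data are assumptions made for illustration. It computes the maximized log likelihood once as "normal log likelihood plus log Jacobian" and once from the geometric-mean-normalized transform alone.

```python
import math

def profile_loglik_two_ways(y, lam):
    """Return the maximized log likelihood computed in two equivalent ways."""
    n = len(y)
    log_gm = sum(math.log(v) for v in y) / n
    gm = math.exp(log_gm)

    # Rescaled power transform (y^lam - 1) / lam, with the log as the limit.
    z = [math.log(v) if lam == 0 else (v ** lam - 1) / lam for v in y]
    mean_z = sum(z) / n
    sigma2 = sum((v - mean_z) ** 2 for v in z) / n

    # Form 1: normal log likelihood of z plus the log of the Jacobian.
    with_jacobian = (-n / 2) * (math.log(2 * math.pi * sigma2) + 1) \
        + n * (lam - 1) * log_gm

    # Form 2: the same quantity from the GM-normalized transform alone.
    w = [v / gm ** (lam - 1) for v in z]
    mean_w = sum(w) / n
    sigma2_w = sum((v - mean_w) ** 2 for v in w) / n
    normalized = (-n / 2) * (math.log(2 * math.pi * sigma2_w) + 1)

    return with_jacobian, normalized

# Illustrative data (hypothetical values chosen for the example).
y = [1.2, 0.7, 3.4, 2.1, 5.6]
checks = [profile_loglik_two_ways(y, lam) for lam in (-1.0, 0.0, 0.5, 2.0)]
for a, b in checks:
    print(a, b)
```

For each λ the two numbers should agree up to rounding error, so maximizing the likelihood over λ amounts to minimizing the residual sum of squares of the normalized transform.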

For the rescaled transformation (Y^λ − 1)/λ, the value at Y = 1 is 0 for any λ, and the derivative with respect to Y there is 1 for any λ. Sometimes Y is a version of some other variable, scaled so that Y = 1 at some sort of average value.

The transformation is a power transformation, but done in such a way as to make it continuous with the parameter λ at λ = 0. It has proved popular in regression analysis, including econometrics.

Box and Cox also proposed a more general form of the transformation that incorporates a shift parameter.

\tau(y_i;\lambda, \alpha) = \begin{cases} \dfrac{(y_i + \alpha)^\lambda - 1}{\lambda (\operatorname{GM}(y+\alpha))^{\lambda - 1}} & \text{if } \lambda\neq 0, \\  \\
\operatorname{GM}(y+\alpha)\ln(y_i + \alpha)& \text{if } \lambda=0,\end{cases}

which holds if yi + α > 0 for all i. If τ(Y, λ, α) follows a truncated normal distribution, then Y is said to follow a Box–Cox distribution.

Bickel and Doksum eliminated the need to use a truncated distribution by extending the range of the transformation to all y, as follows:

\tau(y_i;\lambda, \alpha) = \begin{cases}
\dfrac{\operatorname{sgn}(y_i + \alpha)|y_i + \alpha|^\lambda - 1}{\lambda (\operatorname{GM}(y+\alpha))^{\lambda - 1}} & \text{if } \lambda\neq 0, \\  \\
\operatorname{GM}(y+\alpha)\operatorname{sgn}(y_i+\alpha)\ln(y_i + \alpha)& \text{if } \lambda=0,\end{cases}

where sgn(.) is the sign function. This change in definition has little practical import as long as y_i + \alpha > 0 for all i, that is, as long as \alpha is greater than -\operatorname{min}(y_i), which it usually is.[1]
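A minimal sketch of the signed power idea follows. To keep it short, it omits the geometric-mean normalization and restricts itself to λ > 0, where the signed power is defined for every real argument; both simplifications, and the function names, are assumptions made for this illustration rather than part of the cited definition.

```python
import math

def box_cox_core(x, lam):
    """Unnormalized power transform (x^lam - 1) / lam for x > 0, lam != 0."""
    return (x ** lam - 1) / lam

def bickel_doksum(y, lam, alpha=0.0):
    """Signed power transform (sgn(x) |x|^lam - 1) / lam with x = y + alpha.

    Sketch only: no geometric-mean normalization, and lam must be positive.
    """
    if lam <= 0:
        raise ValueError("this sketch only handles lam > 0")
    x = y + alpha
    signed_power = math.copysign(abs(x) ** lam, x)
    return (signed_power - 1) / lam

# For positive arguments the two transforms coincide.
print(bickel_doksum(9.0, 0.5), box_cox_core(9.0, 0.5))

# For negative arguments only the signed version is defined.
print(bickel_doksum(-4.0, 0.5))

# The signed transform stays monotonically increasing across zero.
values = [bickel_doksum(v, 0.5) for v in (-4.0, -1.0, 0.0, 1.0, 9.0)]
print(values)
```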

Bickel and Doksum also proved that the parameter estimates are consistent and asymptotically normal under appropriate regularity conditions, though the standard Cramér–Rao lower bound can substantially underestimate the variance when parameter values are small relative to the noise variance.[1] However, this problem of underestimating the variance may not be a substantive problem in many applications.[2][3]

Box–Cox transformation

The one-parameter Box–Cox transformations are defined as:

y_i^{(\lambda)} =
\begin{cases}
\dfrac{y_i^\lambda - 1}{\lambda} & \text{if } \lambda \neq 0, \\[8pt]
\ln{(y_i)} & \text{if } \lambda = 0,
\end{cases}

and the two-parameter Box–Cox transformations as:

y_i^{(\boldsymbol{\lambda})} =
\begin{cases}
\dfrac{(y_i + \lambda_2)^{\lambda_1} - 1}{\lambda_1} & \text{if } \lambda_1 \neq 0, \\[8pt]
\ln{(y_i + \lambda_2)} & \text{if } \lambda_1 = 0,
\end{cases}

as described in the original article.[4][5] The one-parameter transformation is defined for y_i > 0 and the two-parameter transformation for y_i > -\lambda_2.[4]
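Both forms, together with an inverse, can be sketched as follows. The function names are assumptions made for the example; the one-parameter transformation is obtained by leaving the shift at its default of zero.

```python
import math

def box_cox(y, lam1, lam2=0.0):
    """Box-Cox transform of a single value; lam2 is the optional shift."""
    shifted = y + lam2
    if shifted <= 0:
        raise ValueError("requires y > -lam2")
    if lam1 == 0:
        return math.log(shifted)
    return (shifted ** lam1 - 1) / lam1

def inverse_box_cox(z, lam1, lam2=0.0):
    """Map a transformed value back to the original scale."""
    if lam1 == 0:
        return math.exp(z) - lam2
    base = lam1 * z + 1
    if base <= 0:
        raise ValueError("z is outside the range of the transform")
    return base ** (1 / lam1) - lam2

# One-parameter form: square-root-like transform of 4.
print(box_cox(4.0, 0.5))

# Two-parameter form: shift by 1 before transforming.
z = box_cox(3.0, 0.5, 1.0)
print(z, inverse_box_cox(z, 0.5, 1.0))
```

The domain checks mirror the conditions stated above: the transform needs a positive shifted value, and the inverse needs a value that the transform could actually have produced.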

The parameter \lambda is estimated by maximizing the profile likelihood function.

Confidence interval

A confidence interval for the Box–Cox parameter \lambda can be constructed asymptotically by applying Wilks's theorem to the profile likelihood function: the interval consists of all values of \lambda that satisfy the following restriction:[6]

\ln L(\lambda) \ge \ln L(\hat{\lambda}) - \frac{1}{2}\chi^2_{1,1-\alpha}
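The estimate and the interval can be sketched with a simple grid search. The example assumes an intercept-only normal model, a grid of candidate values, and made-up data that are symmetric on the log scale; these choices and the function names are illustrative assumptions. The constant 3.841 is the 0.95 quantile of the chi-squared distribution with one degree of freedom, which gives an approximate 95% interval.

```python
import math

CHI2_1_95 = 3.841  # 0.95 quantile of chi-squared with 1 degree of freedom

def profile_loglik(y, lam):
    """Profile log likelihood of lam for an intercept-only normal model."""
    n = len(y)
    z = [math.log(v) if lam == 0 else (v ** lam - 1) / lam for v in y]
    mean_z = sum(z) / n
    sigma2 = sum((v - mean_z) ** 2 for v in z) / n
    log_jacobian = (lam - 1) * sum(math.log(v) for v in y)
    return (-n / 2) * (math.log(2 * math.pi * sigma2) + 1) + log_jacobian

def lambda_interval(y, grid):
    """Grid-search estimate of lam and the likelihood-based interval."""
    values = [(profile_loglik(y, lam), lam) for lam in grid]
    best_ll, lam_hat = max(values)
    cutoff = best_ll - CHI2_1_95 / 2
    inside = [lam for ll, lam in values if ll >= cutoff]
    return lam_hat, min(inside), max(inside)

# Hypothetical data: symmetric on the log scale, so lam near 0 is expected.
logs = (-1.5, -1.0, -0.6, -0.3, -0.1, 0.0, 0.1, 0.3, 0.6, 1.0, 1.5)
y = [math.exp(x) for x in logs]
grid = [i / 100 for i in range(-200, 201)]

lam_hat, lower, upper = lambda_interval(y, grid)
print(lam_hat, lower, upper)
```

Taking the smallest and largest grid points above the cutoff gives a single interval, which is appropriate when the profile log likelihood has one peak. A finer grid or a numerical optimizer would sharpen the endpoints.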

Use of the power transform

Example

The BUPA liver data set[8] contains data on liver enzymes ALT and γGT. Suppose we are interested in using log(γGT) to predict ALT. A plot of the data appears in panel (a) of the figure. There appears to be non-constant variance, and a Box–Cox transformation might help.

The log-likelihood of the power parameter appears in panel (b). The horizontal reference line is at a distance of \chi^2_1/2 from the maximum (half the 0.95 quantile of the chi-squared distribution with one degree of freedom) and can be used to read off an approximate 95% confidence interval for λ. A value close to zero appears to be appropriate, so we take logs.

Possibly, the transformation could be improved by adding a shift parameter to the log transformation. Panel (c) of the figure shows the log-likelihood. In this case, the maximum of the likelihood is close to zero, suggesting that a shift parameter is not needed. The final panel shows the transformed data with a superimposed regression line.

Note that although Box–Cox transformations can make big improvements in model fit, there are some issues that the transformation cannot help with. In the current example, the data are rather heavy-tailed so that the assumption of normality is not realistic and a robust regression approach leads to a more precise model.

Econometric application

Economists often characterize production relationships by some variant of the Box–Cox transformation.

Consider a common representation of production Q as dependent on services provided by a capital stock K and by labor hours N:

\tau(Q)=\alpha \tau(K)+ (1-\alpha)\tau(N).\,

Solving for Q by inverting the Box–Cox transformation we find

Q=\big(\alpha K^\lambda + (1-\alpha) N^\lambda\big)^{1/\lambda},\,

which is known as the constant elasticity of substitution (CES) production function.

The CES production function is a homogeneous function of degree one.

When λ = 1, this produces the linear production function:

Q=\alpha K + (1-\alpha)N.\,

When λ → 0 this produces the famous Cobb–Douglas production function:

Q=K^\alpha N^{1-\alpha}.\,
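These relationships can be verified numerically. The sketch below uses hypothetical input values and function names chosen for the example; it checks that the CES output satisfies the Box–Cox relation it was derived from, that it is homogeneous of degree one, and that the cases λ = 1 and λ → 0 give the linear and Cobb–Douglas forms.

```python
def tau(x, lam):
    """Box-Cox transform (x^lam - 1) / lam for x > 0 and lam != 0."""
    return (x ** lam - 1) / lam

def ces_output(K, N, alpha, lam):
    """CES production function; lam = 0 is taken as the Cobb-Douglas limit."""
    if lam == 0:
        return K ** alpha * N ** (1 - alpha)
    return (alpha * K ** lam + (1 - alpha) * N ** lam) ** (1 / lam)

# Hypothetical inputs chosen so the arithmetic is easy to follow.
K, N, alpha = 4.0, 9.0, 0.5

# The CES output satisfies tau(Q) = alpha*tau(K) + (1 - alpha)*tau(N).
Q = ces_output(K, N, alpha, 0.5)
lhs = tau(Q, 0.5)
rhs = alpha * tau(K, 0.5) + (1 - alpha) * tau(N, 0.5)
print(Q, lhs, rhs)

# Homogeneity of degree one: doubling both inputs doubles output.
print(ces_output(2 * K, 2 * N, alpha, 0.5), 2 * Q)

# lam = 1 gives the linear form; lam near 0 approaches Cobb-Douglas.
linear = ces_output(K, N, alpha, 1.0)
near_cobb_douglas = ces_output(K, N, alpha, 1e-6)
cobb_douglas = ces_output(K, N, alpha, 0.0)
print(linear, near_cobb_douglas, cobb_douglas)
```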

Activities and demonstrations

The SOCR resource pages contain a number of hands-on interactive activities[9] demonstrating the Box–Cox (Power) Transformation using Java applets and charts. These directly illustrate the effects of this transform on Q-Q plots, X-Y scatterplots, time-series plots and histograms.

Notes

  1. Bickel, Peter J.; Doksum, Kjell A. (June 1981). "An analysis of transformations revisited". Journal of the American Statistical Association (American Statistical Association) 76 (374): 296–311. doi:10.1080/01621459.1981.10477649.
  2. Sakia, R. M. (1992), "The Box-Cox transformation technique: a review", The Statistician 41: 169–178, doi:10.2307/2348250
  3. Li, Fengfei (April 11, 2005), Box-Cox Transformations: An Overview (PDF) (slide presentation), Sao Paulo, Brazil: University of Sao Paulo, Brazil, retrieved 2014-11-02
  4. Box, George E. P.; Cox, D. R. (1964). "An analysis of transformations". Journal of the Royal Statistical Society, Series B 26 (2): 211–252. JSTOR 2984418. MR 192611.
  5. Johnston, J. (1984). Econometric Methods (Third ed.). New York: McGraw-Hill. pp. 61–74. ISBN 0-07-032685-1.
  6. Abramovich, Felix, and Ya'acov Ritov. Statistical Theory: A Concise Introduction. CRC Press, 2013. Pages 121-122
  7. Peters, J. L.; Rushton, L.; Sutton, A. J.; Jones, D. R.; Abrams, K. R.; Mugglestone, M. A. (2005). "Bayesian methods for the cross-design synthesis of epidemiological and toxicological evidence". Journal of the Royal Statistical Society: Series C (Applied Statistics) 54: 159. doi:10.1111/j.1467-9876.2005.00476.x.
  8. BUPA liver disorder dataset
  9. Power Transform Family Graphs, SOCR webpages

This article is issued from Wikipedia - version of the Sunday, March 27, 2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.