Winsorized mean

A winsorized mean is a statistical measure of central tendency obtained by winsorizing. It is related to the mean and the median, and most closely to the truncated mean. It is calculated by replacing a given proportion of a probability distribution or sample at the high and low ends with the most extreme remaining values,^[1] and then taking the mean of the result. Usually the same proportion is replaced at each end; often 10 to 25 percent of the ends are replaced.

Advantages

The winsorized mean is a useful estimator because it is less sensitive to outliers than the mean but still gives a reasonable estimate of central tendency for almost all statistical models. In this respect it is referred to as a robust estimator.
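The reduced sensitivity to outliers can be illustrated with a small sketch. The data values, the helper name winsorize, and the choice of replacing one value at each end are assumptions made for illustration only; they do not come from the cited sources.

```python
from statistics import mean, median


def winsorize(values, k):
    """Return a sorted copy with the k smallest and k largest values
    replaced by the nearest remaining value (hypothetical helper)."""
    data = sorted(values)
    n = len(data)
    if k < 0 or 2 * k >= n:
        raise ValueError("k must satisfy 0 <= k < n / 2")
    low, high = data[k], data[n - 1 - k]
    # Clamp every value into the interval [low, high].
    return [min(max(x, low), high) for x in data]


# Illustrative sample of 10 values, and the same sample with the
# largest value replaced by a gross outlier.
clean = [21, 24, 25, 26, 28, 30, 31, 33, 34, 36]
contaminated = clean[:-1] + [3600]

for name, sample in (("clean", clean), ("contaminated", contaminated)):
    print(name,
          "mean:", mean(sample),
          "winsorized mean:", mean(winsorize(sample, 1)),
          "median:", median(sample))
```

In this sketch the outlier moves the ordinary mean from 28.8 to 385.2, while the 10% winsorized mean (28.9) and the median (29) are unchanged.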

Drawbacks

Although the winsorized mean uses more information from the distribution or sample than the median, it is unlikely to be an unbiased estimator of either the mean or the median unless the underlying distribution is symmetric.
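The bias under asymmetry can be explored with a small simulation. This is a sketch under stated assumptions: the exponential distribution with rate 1 (mean 1, median ln 2 ≈ 0.693), the sample size of 20, the replacement of two values at each end, the number of replications, and the seed are all arbitrary illustrative choices.

```python
import random
from statistics import mean


def winsorized_mean(values, k):
    """Mean after replacing the k smallest and k largest values with
    the nearest remaining value (hypothetical helper)."""
    data = sorted(values)
    n = len(data)
    if k < 0 or 2 * k >= n:
        raise ValueError("k must satisfy 0 <= k < n / 2")
    low, high = data[k], data[n - 1 - k]
    return (k * low + sum(data[k:n - k]) + k * high) / n


rng = random.Random(0)  # fixed seed so the run is repeatable
n, k, replications = 20, 2, 2000

# Draw many right-skewed samples and winsorize 10% at each end.
estimates = [
    winsorized_mean([rng.expovariate(1.0) for _ in range(n)], k)
    for _ in range(replications)
]
average_estimate = mean(estimates)

print("average winsorized mean:", average_estimate)
print("population mean: 1.0, population median: about 0.693")
```

For a right-skewed distribution such as this one, the average of the winsorized means tends to fall below the population mean and above the population median, so it targets neither exactly.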

Example

For a sample of 10 numbers (from x₁, the smallest, to x₁₀, the largest), the 10% winsorized mean is

\frac{\overbrace{x_2 + x_2} + x_3 + x_4 + x_5 + x_6 + x_7 + x_8 + \overbrace{x_9 + x_9}}{10}.

The key is the repetition of x₂ and x₉: the extra copies stand in for the original values x₁ and x₁₀, which have been discarded.
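The calculation above can be sketched in code. The function name winsorized_mean, its parameter k (the number of values replaced at each end), and the sample data are hypothetical and chosen only for illustration; taking a count instead of a percentage avoids having to pick a rounding convention when the percentage does not correspond to a whole number of observations.

```python
def winsorized_mean(values, k):
    """Mean of the sample after the k smallest values are replaced by
    the (k+1)-th smallest and the k largest by the (k+1)-th largest."""
    data = sorted(values)
    n = len(data)
    if k < 0 or 2 * k >= n:
        raise ValueError("k must satisfy 0 <= k < n / 2")
    low, high = data[k], data[n - 1 - k]
    # k copies of each boundary value replace the discarded extremes.
    winsorized = [low] * k + data[k:n - k] + [high] * k
    return sum(winsorized) / n


# Ten illustrative values; k = 1 corresponds to the 10% winsorized mean.
sample = [3, 7, 8, 9, 10, 11, 12, 13, 14, 40]
print(winsorized_mean(sample, 1))  # (7+7+8+9+10+11+12+13+14+14) / 10 = 10.5
print(sum(sample) / len(sample))   # ordinary mean, 12.7
```

Here x₁ = 3 is replaced by x₂ = 7 and x₁₀ = 40 by x₉ = 14, matching the pattern of the formula.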

Notes

[1] Dodge, Y. (2003). The Oxford Dictionary of Statistical Terms. OUP. ISBN 0-19-920613-9 (entry for "winsorized estimation").

References

Wilcox, R.R.; Keselman, H.J. (2003). "Modern robust data analysis methods: Measures of central tendency". Psychological Methods 8 (3): 254–274. doi:10.1037/1082-989X.8.3.254. PMID 14596490.

This article is issued from Wikipedia - version of the Friday, February 12, 2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.