Bayes classifier

In statistical classification, the Bayes classifier is the classifier that minimizes the probability of misclassification.[1]

Definition

Suppose a pair (X,Y) takes values in \mathbb{R}^d \times \{1,2,\dots,K\}, where Y is the class label of X. This means that the conditional distribution of X, given that the label Y takes the value r, is given by

X\mid Y=r \sim P_r for r=1,2,\dots,K

where "\sim" means "is distributed as", and where P_r denotes a probability distribution.

A classifier is a rule that assigns to an observation X=x a guess or estimate of the value of the unobserved label Y. In theoretical terms, a classifier is a measurable function C: \mathbb{R}^d \to \{1,2,\dots,K\}, with the interpretation that C classifies the point x to the class C(x). The probability of misclassification, or risk, of a classifier C is defined as

\mathcal{R}(C)  = \operatorname{P}\{C(X) \neq Y\}.
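By conditioning on X, the risk can be rewritten as

\mathcal{R}(C) = \operatorname{E}\left[\operatorname{P}(C(X) \neq Y \mid X)\right] = 1 - \operatorname{E}\left[\operatorname{P}(Y = C(X) \mid X)\right],

so the risk is minimized by choosing, for every x, the class with the largest conditional probability \operatorname{P}(Y=r \mid X=x). This is precisely the rule defined next.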

The Bayes classifier is

C^\text{Bayes}(x) = \underset{r \in \{1,2,\dots, K\}}{\operatorname{argmax}} \operatorname{P}(Y=r \mid X=x).

In practice, as in most of statistics, the difficulties and subtleties are associated with modeling the probability distributions effectively—in this case, \operatorname{P}(Y=r \mid X=x). The Bayes classifier is a useful benchmark in statistical classification.
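When the prior probabilities \pi_r = \operatorname{P}(Y=r) and the class-conditional densities p_r of the distributions P_r are known, Bayes' theorem gives \operatorname{P}(Y=r \mid X=x) \propto \pi_r \, p_r(x), so the Bayes classifier reduces to \operatorname{argmax}_r \pi_r \, p_r(x). The following Python sketch evaluates this rule for the hypothetical two-Gaussian model introduced above; the names PRIORS, MEANS and bayes_classifier are illustrative choices, not part of the original text.

    import numpy as np
    from scipy.stats import norm

    # Hypothetical model (illustrative assumption): equal priors,
    # class-conditional densities N(0, 1) for class 1 and N(1, 1) for class 2.
    PRIORS = np.array([0.5, 0.5])
    MEANS = np.array([0.0, 1.0])

    def bayes_classifier(x):
        """Return the class (1 or 2) maximizing P(Y = r | X = x).

        By Bayes' theorem the posterior is proportional to pi_r * p_r(x),
        so maximizing the posterior is the same as maximizing pi_r * p_r(x).
        """
        joint = PRIORS * norm.pdf(x, loc=MEANS, scale=1.0)
        return int(np.argmax(joint)) + 1  # labels are 1-based, as in the text

    # For this symmetric example the rule reduces to: predict class 2 iff x > 1/2.
    print(bayes_classifier(0.2))  # prints 1
    print(bayes_classifier(0.9))  # prints 2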

The excess risk of a general classifier C (possibly depending on some training data) is defined as \mathcal{R}(C) - \mathcal{R}(C^\text{Bayes}). Because the Bayes classifier minimizes the risk, the excess risk is non-negative, and it provides a natural measure for assessing the performance of different classification techniques. A classifier is said to be consistent if its excess risk converges to zero as the size of the training data set tends to infinity.
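As a rough illustration of consistency, consider again the hypothetical two-Gaussian model: the Bayes rule predicts class 2 exactly when x > 1/2, and a simple plug-in classifier can estimate this threshold from training data as the midpoint of the two sample class means. The sketch below (the names threshold_risk and plugin_threshold are illustrative, not from the text) computes the exact risk of such threshold rules under that model and compares it with the Bayes risk.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)

    # Hypothetical model (illustrative assumption): P(Y=1) = P(Y=2) = 1/2,
    # X | Y=1 ~ N(0,1), X | Y=2 ~ N(1,1).  The Bayes rule predicts class 2
    # iff x > 1/2, and the Bayes risk equals Phi(-1/2) ~ 0.309.

    def threshold_risk(t):
        """Exact risk of the rule 'predict class 2 iff x > t' under the model above."""
        return 0.5 * (1.0 - norm.cdf(t)) + 0.5 * norm.cdf(t - 1.0)

    def plugin_threshold(n):
        """Fit a plug-in rule from n training points: estimate the two class
        means and use their midpoint as the decision threshold."""
        y = rng.integers(1, 3, size=n)                        # labels 1 or 2
        x = rng.normal(loc=(y - 1).astype(float), scale=1.0)  # class-conditional draws
        return 0.5 * (x[y == 1].mean() + x[y == 2].mean())

    bayes_risk = threshold_risk(0.5)
    for n in (10, 100, 10_000):
        excess = threshold_risk(plugin_threshold(n)) - bayes_risk
        print(f"n = {n:6d}   excess risk = {excess:.5f}")

The excess risk typically shrinks toward zero as n grows, which is the behavior that the definition of consistency describes.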

References

  1. Devroye, L., Györfi, L., & Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Springer. ISBN 0-387-94618-7.