Bayes classifier

In statistical classification, the Bayes classifier is the classifier that minimizes the probability of misclassification.[1]

Definition

Suppose a pair (X,Y) takes values in \mathbb{R}^d \times \{1,2,\dots,K\}, where Y is the class label of X. This means that the conditional distribution of X, given that the label Y takes the value r, is given by

X\mid Y=r \sim P_r for r=1,2,\dots,K

where "\sim" means "is distributed as", and where P_r denotes a probability distribution.

A classifier is a rule that assigns to an observation X=x a guess or estimate of the value of the unobserved label Y. In theoretical terms, a classifier is a measurable function C: \mathbb{R}^d \to \{1,2,\dots,K\}, with the interpretation that C classifies the point x to the class C(x). The probability of misclassification, or risk, of a classifier C is defined as

\mathcal{R}(C)  = \operatorname{P}\{C(X) \neq Y\}.
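By conditioning on X, the risk can be rewritten as

\mathcal{R}(C) = \operatorname{E}\left[\operatorname{P}(C(X) \neq Y \mid X)\right] = 1 - \operatorname{E}\left[\operatorname{P}(Y = C(X) \mid X)\right],

so the risk is minimized by choosing, for every x, the class with the largest conditional probability \operatorname{P}(Y=r \mid X=x). This is precisely the rule defined next.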

The Bayes classifier is

C^\text{Bayes}(x) = \underset{r \in \{1,2,\dots, K\}}{\operatorname{argmax}} \operatorname{P}(Y=r \mid X=x).

In practice, as in most of statistics, the difficulties and subtleties are associated with modeling the probability distributions effectively—in this case, \operatorname{P}(Y=r \mid X=x). The Bayes classifier is a useful benchmark in statistical classification.
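When the prior probabilities \pi_r = \operatorname{P}(Y=r) and the class-conditional densities p_r of the distributions P_r are known, Bayes' theorem gives \operatorname{P}(Y=r \mid X=x) \propto \pi_r \, p_r(x), so the Bayes classifier reduces to \operatorname{argmax}_r \pi_r \, p_r(x). The following Python sketch evaluates this rule for the hypothetical two-Gaussian model introduced above; the names PRIORS, MEANS and bayes_classifier are illustrative choices, not part of the original text.

    import numpy as np
    from scipy.stats import norm

    # Hypothetical model (illustrative assumption): equal priors,
    # class-conditional densities N(0, 1) for class 1 and N(1, 1) for class 2.
    PRIORS = np.array([0.5, 0.5])
    MEANS = np.array([0.0, 1.0])

    def bayes_classifier(x):
        """Return the class (1 or 2) maximizing P(Y = r | X = x).

        By Bayes' theorem the posterior is proportional to pi_r * p_r(x),
        so maximizing the posterior is the same as maximizing pi_r * p_r(x).
        """
        joint = PRIORS * norm.pdf(x, loc=MEANS, scale=1.0)
        return int(np.argmax(joint)) + 1  # labels are 1-based, as in the text

    # For this symmetric example the rule reduces to: predict class 2 iff x > 1/2.
    print(bayes_classifier(0.2))  # prints 1
    print(bayes_classifier(0.9))  # prints 2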

The excess risk of a general classifier C (possibly depending on some training data) is defined as \mathcal{R}(C) - \mathcal{R}(C^\text{Bayes}). Because the Bayes classifier minimizes the risk, the excess risk is non-negative, and it provides a natural measure for assessing the performance of different classification techniques. A classifier is said to be consistent if its excess risk converges to zero as the size of the training data set tends to infinity.
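As a rough illustration of consistency, consider again the hypothetical two-Gaussian model: the Bayes rule predicts class 2 exactly when x > 1/2, and a simple plug-in classifier can estimate this threshold from training data as the midpoint of the two sample class means. The sketch below (the names threshold_risk and plugin_threshold are illustrative, not from the text) computes the exact risk of such threshold rules under that model and compares it with the Bayes risk.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)

    # Hypothetical model (illustrative assumption): P(Y=1) = P(Y=2) = 1/2,
    # X | Y=1 ~ N(0,1), X | Y=2 ~ N(1,1).  The Bayes rule predicts class 2
    # iff x > 1/2, and the Bayes risk equals Phi(-1/2) ~ 0.309.

    def threshold_risk(t):
        """Exact risk of the rule 'predict class 2 iff x > t' under the model above."""
        return 0.5 * (1.0 - norm.cdf(t)) + 0.5 * norm.cdf(t - 1.0)

    def plugin_threshold(n):
        """Fit a plug-in rule from n training points: estimate the two class
        means and use their midpoint as the decision threshold."""
        y = rng.integers(1, 3, size=n)                        # labels 1 or 2
        x = rng.normal(loc=(y - 1).astype(float), scale=1.0)  # class-conditional draws
        return 0.5 * (x[y == 1].mean() + x[y == 2].mean())

    bayes_risk = threshold_risk(0.5)
    for n in (10, 100, 10_000):
        excess = threshold_risk(plugin_threshold(n)) - bayes_risk
        print(f"n = {n:6d}   excess risk = {excess:.5f}")

The excess risk typically shrinks toward zero as n grows, which is the behavior that the definition of consistency describes.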

References

  1. Devroye, L., Györfi, L., & Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Springer. ISBN 0-387-94618-7.