Nearest neighbour classifiers
Nearest neighbour classifiers are a class of non-parametric methods used in statistical classification (or pattern recognition). The method classifies an object based on the closest training examples in the feature space.
Statistical setting
Suppose we have pairs $(X_1, Y_1), (X_2, Y_2), \dots, (X_n, Y_n)$ taking values in $\mathbb{R}^d \times \{1, 2\}$, where $Y$ is the class label of $X$, so that $X \mid Y = r \sim P_r$ for $r = 1, 2$ (and probability distributions $P_r$). Given some norm $\|\cdot\|$ on $\mathbb{R}^d$ and a point $x \in \mathbb{R}^d$, let $(X_{(1)}, Y_{(1)}), \dots, (X_{(n)}, Y_{(n)})$ be a reordering of the training data such that $\|X_{(1)} - x\| \le \dots \le \|X_{(n)} - x\|$.
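To make the reordering concrete, here is a minimal Python sketch (NumPy, the array names, and the Euclidean norm are assumptions made for illustration; any norm would do):

```python
import numpy as np

def reorder_by_distance(X_train, y_train, x):
    """Sort the training pairs by distance to the query point x.

    X_train: (n, d) array of feature vectors; y_train: (n,) array of labels.
    Returns (X_(1), Y_(1)), ..., (X_(n), Y_(n)) in order of increasing ||X_i - x||.
    """
    dists = np.linalg.norm(X_train - x, axis=1)  # Euclidean norm stands in for a generic norm
    order = np.argsort(dists)
    return X_train[order], y_train[order]
```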
The 1-nearest neighbour classifier
The most intuitive nearest neighbour type classifier is the one nearest neighbour classifier, which assigns a point $x$ to the class of its closest neighbour in the feature space, that is $C_n^{1\mathrm{nn}}(x) = Y_{(1)}$.
As the size of the training data set approaches infinity, the one nearest neighbour classifier guarantees an error rate no worse than twice the Bayes error rate (the minimum achievable error rate given the distribution of the data).
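A minimal sketch of the one nearest neighbour rule under the same illustrative assumptions (Euclidean norm; distance ties go to the lowest index, following np.argmin):

```python
import numpy as np

def one_nn_classify(X_train, y_train, x):
    """1-NN: assign x the label Y_(1) of its single closest training point."""
    dists = np.linalg.norm(X_train - x, axis=1)
    return y_train[np.argmin(dists)]
```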
The k-nearest neighbour classifier
The k-nearest neighbour classifier assigns a point $x$ to a particular class based on a majority vote among the classes of the $k$ nearest training points to $x$.
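A sketch of the majority vote, again assuming a Euclidean norm (Counter.most_common breaks vote ties in favour of the class encountered first among the neighbours):

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x, k):
    """k-NN: majority vote among the labels of the k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]          # indices of the k closest points
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]        # most frequent class label
```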
Properties
There are many results on the error rate of k nearest neighbour classifiers.[1] The k-nearest neighbour classifier is strongly (that is, for any joint distribution of $(X, Y)$) consistent provided $k := k_n$ diverges and $k_n / n$ converges to zero as $n \to \infty$; for example, $k_n = \lfloor \sqrt{n} \rfloor$ satisfies both conditions.
Let $C_n^{k\mathrm{nn}}$ denote the k nearest neighbour classifier based on a training set of size $n$. Under certain regularity conditions, the excess risk yields the following asymptotic expansion[2]
$$\mathcal{R}(C_n^{k\mathrm{nn}}) - \mathcal{R}(C^{\mathrm{Bayes}}) = \left( B_1 \frac{1}{k} + B_2 \left( \frac{k}{n} \right)^{4/d} \right) \{1 + o(1)\},$$
for some constants $B_1$ and $B_2$, where $\mathcal{R}$ denotes the risk (error probability) of a classifier and $C^{\mathrm{Bayes}}$ the Bayes classifier.
The choice $k^* = \lfloor B n^{4/(d+4)} \rfloor$, for some constant $B$, offers a trade-off between the two terms in the above display, for which the $k^*$-nearest neighbour error converges to the Bayes error at the optimal (minimax) rate $O(n^{-4/(d+4)})$.
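As a worked illustration of this choice (the constant $B$ is unknown in practice and depends on the underlying distributions; B = 1 below is purely illustrative):

```python
import math

def optimal_k(n, d, B=1.0):
    """The rate-optimal choice k* = floor(B * n^(4/(d+4)))."""
    return math.floor(B * n ** (4 / (d + 4)))

# With d = 2 features, k* grows like n^(2/3):
print(optimal_k(10_000, d=2))  # 464
```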
The weighted nearest neighbour classifier
The k-nearest neighbour classifier can be viewed as assigning the k nearest neighbours a weight $1/k$ and all others weight $0$. This can be generalised to weighted nearest neighbour classifiers, where the $i$th nearest neighbour is assigned a weight $w_{ni}$, with $\sum_{i=1}^{n} w_{ni} = 1$. An analogous result on the strong consistency of weighted nearest neighbour classifiers also holds.[3]
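A sketch of a weighted nearest neighbour prediction given a weight vector (the names and the Euclidean norm are illustrative assumptions; weights are taken in neighbour order, so weights[0] attaches to the nearest point):

```python
import numpy as np

def weighted_nn_classify(X_train, y_train, x, weights):
    """Weighted NN: score each class by the total weight of its neighbours.

    weights: length-n array with weights[i-1] = w_{ni}, assumed to sum to one.
    """
    weights = np.asarray(weights)
    dists = np.linalg.norm(X_train - x, axis=1)
    ranked_labels = y_train[np.argsort(dists)]   # labels in order of increasing distance
    classes = np.unique(y_train)
    scores = [weights[ranked_labels == c].sum() for c in classes]
    return classes[np.argmax(scores)]
```

Setting the first $k$ weights to $1/k$ and the rest to zero recovers the k-nearest neighbour classifier.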
Let $C_n^{\mathrm{wnn}}$ denote the weighted nearest neighbour classifier with weights $\{w_{ni}\}_{i=1}^{n}$. Subject to regularity conditions on the class distributions, the excess risk has the following asymptotic expansion[2]
$$\mathcal{R}(C_n^{\mathrm{wnn}}) - \mathcal{R}(C^{\mathrm{Bayes}}) = \left( B_1 s_n^2 + B_2 t_n^2 \right) \{1 + o(1)\},$$
for constants $B_1$ and $B_2$, where $s_n^2 = \sum_{i=1}^{n} w_{ni}^2$ and $t_n = n^{-2/d} \sum_{i=1}^{n} w_{ni} \left\{ i^{1+2/d} - (i-1)^{1+2/d} \right\}$.
The optimal weighting scheme $\{w_{ni}^*\}_{i=1}^{n}$, which balances the two terms in the display above, is given as follows (see the sketch after this list): set $k^* = \lfloor B n^{4/(d+4)} \rfloor$,
- $w_{ni}^* = \frac{1}{k^*} \left[ 1 + \frac{d}{2} - \frac{d}{2 (k^*)^{2/d}} \left\{ i^{1+2/d} - (i-1)^{1+2/d} \right\} \right]$ for $i = 1, 2, \dots, k^*$ and
- $w_{ni}^* = 0$ for $i = k^* + 1, \dots, n$.
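A sketch computing these optimal weights (again with an illustrative B = 1). Note that the sum $\sum_{i=1}^{k^*} \{ i^{1+2/d} - (i-1)^{1+2/d} \}$ telescopes to $(k^*)^{1+2/d}$, so the weights sum to one:

```python
import math
import numpy as np

def optimal_weights(n, d, B=1.0):
    """The weights w*_{ni} above: nonzero up to k* = floor(B * n^(4/(d+4))), zero after."""
    k = math.floor(B * n ** (4 / (d + 4)))
    i = np.arange(1, k + 1)
    incr = i ** (1 + 2 / d) - (i - 1) ** (1 + 2 / d)   # i^{1+2/d} - (i-1)^{1+2/d}
    w = (1 + d / 2 - d / (2 * k ** (2 / d)) * incr) / k
    return np.concatenate([w, np.zeros(n - k)])
```

The resulting vector can be passed directly to a weighted classifier such as the weighted_nn_classify sketch above.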
With optimal weights the dominant term in the asymptotic expansion of the excess risk is $O(n^{-4/(d+4)})$. Similar results hold when using a bagged nearest neighbour classifier.
References
- [1] Devroye, L., Györfi, L. & Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Springer. ISBN 0-3879-4618-7.
- [2] Samworth, R. J. (2012). "Optimal weighted nearest neighbour classifiers". Annals of Statistics 40 (5): 2733–2763. doi:10.1214/12-AOS1049.
- [3] Stone, C. J. (1977). "Consistent nonparametric regression". Annals of Statistics 5 (4): 595–620. doi:10.1214/aos/1176343886.