Relief (feature selection)
RELIEF is a feature selection algorithm used in binary classification (generalisable to polynomial classification by decomposition into a number of binary problems) proposed by Kira and Rendell in 1992.[1] Its strengths are that it is not dependent on heuristics, runs in low-order polynomial time, and is noise-tolerant and robust to feature interactions, as well as being applicable for binary or continuous data; however, it does not discriminate between redundant features, and low numbers of training instances fool the algorithm. Kononenko et al. proposed some updates to the algorithm (RELIEFF) in order to improve the reliability of the probability approximation, make it robust to incomplete data, and generalising it to multi-class problems.[2]
RELIEF
Take a data set with n instances of p features, belonging to two known classes. Within the data set, each feature should be scaled to the interval [0 1] (binary data should remain as 0 and 1). The algorithm will be repeated m times. Start with a p-long weight vector (W) of zeros.
At each iteration, take the feature vector (X) belonging to one random instance, and the feature vectors of the instance closest to X (by Euclidean distance) from each class. The closest same-class instance is called 'near-hit', and the closest different-class instance is called 'near-miss'. Update the weight vector such that
Thus the weight of any given feature decreases if it differs from that feature in nearby instances of the same class more than nearby instances of the other class, and increases in the reverse case.
After m iterations, divide each element of the weight vector by m. This becomes the relevance vector. Features are selected if their relevance is greater than a threshold τ.
Kira and Rendell's experiments[3] showed a clear contrast between relevant and irrelevant features, allowing τ to be determined by inspection. However, it can also be determined by Cebysev's inequality for a given confidence level (α) that a τ of 1/sqrt(α*m) is good enough to make the probability of a Type I error less than α, although it is stated that τ can be much smaller than that.
RELIEFF
Kononenko et al. propose a number of updates to RELIEF.[4] Firstly, they find the near-hit and near-miss instances using the Manhattan (L1) norm rather than the Euclidean (L2) norm, although the rationale is not specified. Furthermore, they found taking the absolute differences between xi and near-hiti, and xi and near-missi to be sufficient when updating the weight vector (rather than the square of those differences).
Reliable probability estimation
Rather than repeating the algorithm m times, implement it exhaustively (i.e. n times, once for each instance) for relatively small n (up to one thousand). Furthermore, rather than finding the single nearest hit and single nearest miss, which may cause redundant and noisy attributes to affect the selection of the nearest neighbours, RELIEFF searches for k nearest hits and misses and averages their contribution to the weights of each feature. k can be tuned for any individual problem.
Incomplete data
In RELIEFF, the contribution of missing values to the feature weight is determined using the conditional probability that two values should be the same or different, approximated with relative frequencies from the data set. This can be calculated if one or both features are missing.
Multi-class problems
Rather than use Kira and Rendell's proposed decomposition of a multinomial classification into a number of binomial problems, RELIEFF searches for k near misses from each different class and averages their contributions for updating W, weighted with the prior probability of each class.
RRELIEFF
Robnik-Šikonja and Kononenko propose further updates to RELIEFF, making it appropriate for regression.[5]
References
- ↑ Kira, Kenji and Rendell, Larry (1992). The Feature Selection Problem: Traditional Methods and a New Algorithm. AAAI-92 Proceedings.
- ↑ Kononenko, Igor et al. Overcoming the myopia of inductive learning algorithms with RELIEFF (1997), Applied Intelligence, 7(1), p39-55
- ↑ Kira, Kenji and Rendell, Larry (1992) A Practical Approach to Feature Selection, Proceedings of the Ninth International Workshop on Machine Learning, p249-256
- ↑ Kononenko, Igor et al. Overcoming the myopia of inductive learning algorithms with RELIEFF (1997), Applied Intelligence, 7(1), p39-55
- ↑ Robnik-Šikonja, Marko, and Kononenko, Igor (1997). An adaptation of Relief for attribute estimation in regression. Machine Learning: Proceedings of the Fourteenth International Conference (ICML’97) (p296-304)