Similarity learning

Similarity learning is an area of supervised machine learning in artificial intelligence. It is closely related to regression and classification, but the goal is to learn from examples a similarity function that measures how similar or related two objects are. It has applications in ranking, in recommendation systems,[1] visual identity tracking,[2] face verification[3] and speaker verification.[4]

Learning setup

There are four common setups for similarity and metric distance learning.

Regression similarity learning: pairs of objects are given together with a real-valued measure of their similarity, and the goal is to learn a function that approximates that measure for new pairs.

Classification similarity learning: pairs of objects are given with a binary label indicating whether they are similar or not, and the goal is to learn a classifier that decides whether a new pair of objects is similar.

Ranking similarity learning: training data consists of triplets of objects whose relative similarity obeys a known order (the first object is more similar to the second than to the third), and the goal is to learn a similarity function that respects that order.[5]

Locality sensitive hashing (LSH): input items are hashed so that similar items map to the same buckets with high probability; it is often applied to nearest-neighbor search over large-scale, high-dimensional data.[6][7]

A common approach for learning similarity is to model the similarity function as a bilinear form. For example, in the case of ranking similarity learning, one aims to learn a matrix W that parametrizes the similarity function f_W(x, z) = x^T W z.
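The bilinear form f_W(x, z) = x^T W z is a single matrix product sandwiched between two vectors. A minimal NumPy sketch, using a random W purely as a stand-in for a learned parameter matrix:

```python
import numpy as np

# W stands in for a learned parameter matrix; here it is random,
# for illustration only. In practice W would be fit from training
# pairs or triplets.
rng = np.random.default_rng(0)
d = 5
W = rng.standard_normal((d, d))

def bilinear_similarity(x, z, W):
    """Bilinear similarity f_W(x, z) = x^T W z."""
    return x @ W @ z

x = rng.standard_normal(d)
z = rng.standard_normal(d)
score = bilinear_similarity(x, z, W)
```

Note that unless W is constrained to be symmetric, f_W(x, z) and f_W(z, x) generally differ, so the learned function is a similarity score rather than a metric.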

Metric learning

Similarity learning is closely related to distance metric learning. Metric learning is the task of learning a distance function over objects. A metric or distance function has to obey four axioms: non-negativity, identity of indiscernibles, symmetry and subadditivity (the triangle inequality). In practice, metric learning algorithms drop the identity-of-indiscernibles condition and learn a pseudo-metric.

When the objects x_i are vectors in R^d, any matrix W in the symmetric positive semi-definite cone S_+^d defines a pseudo-metric on the space through the form D_W(x_1, x_2)^2 = (x_1-x_2)^{\top} W (x_1-x_2). When W is a symmetric positive definite matrix, D_W is a metric. Moreover, since any symmetric positive semi-definite matrix W \in S_+^d can be decomposed as W = L^{\top}L, where L \in R^{e \times d} and e \geq rank(W), the distance function D_W can be rewritten equivalently as D_W(x_1, x_2)^2 = (x_1-x_2)^{\top} L^{\top}L (x_1-x_2) = \| L (x_1-x_2) \|_2^2. The distance D_W(x_1, x_2)^2 = \| x_1' - x_2' \|_2^2 thus corresponds to the Euclidean distance between the projected feature vectors x_1' = Lx_1 and x_2' = Lx_2. Some well-known approaches for metric learning include large margin nearest neighbor (LMNN)[8] and information-theoretic metric learning (ITML).[9]
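The equivalence D_W(x_1, x_2)^2 = \| L(x_1-x_2) \|_2^2 for W = L^T L is easy to verify numerically. In the sketch below, L is an arbitrary random projection standing in for a learned one:

```python
import numpy as np

rng = np.random.default_rng(1)
d, e = 6, 3                      # ambient dimension d, projection dimension e
L = rng.standard_normal((e, d))  # stand-in for a learned projection matrix
W = L.T @ L                      # symmetric positive semi-definite by construction

x1 = rng.standard_normal(d)
x2 = rng.standard_normal(d)
diff = x1 - x2

d_sq_form = diff @ W @ diff          # (x1 - x2)^T W (x1 - x2)
d_sq_proj = np.sum((L @ diff) ** 2)  # ||L (x1 - x2)||_2^2

# The two expressions agree up to floating-point error.
```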

In statistics, the covariance matrix of the data is sometimes used to define a distance metric called Mahalanobis distance.
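Concretely, the Mahalanobis distance is the pseudo-metric D_W with W set to the inverse of the sample covariance matrix. A minimal sketch, using synthetic data purely for illustration:

```python
import numpy as np

# Synthetic correlated data, for illustration only.
rng = np.random.default_rng(2)
A = np.array([[2.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, 0.0, 0.5]])
X = rng.standard_normal((200, 3)) @ A

# Mahalanobis distance uses W = Sigma^{-1}, the inverse sample covariance.
Sigma_inv = np.linalg.inv(np.cov(X, rowvar=False))

def mahalanobis(x1, x2, Sigma_inv):
    """sqrt((x1 - x2)^T Sigma^{-1} (x1 - x2))."""
    diff = x1 - x2
    return np.sqrt(diff @ Sigma_inv @ diff)

dist = mahalanobis(X[0], X[1], Sigma_inv)
```

Unlike the Euclidean distance, this rescales and decorrelates the coordinates, so directions with large variance in the data contribute less to the distance.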

Applications

Similarity learning is used in information retrieval for learning to rank, in face verification or face identification,[10][11] and in recommendation systems. Moreover, many machine learning approaches rely on some metric. This includes unsupervised methods such as clustering, which groups together close or similar objects, as well as supervised methods such as the k-nearest neighbors algorithm, which relies on the labels of nearby objects to decide the label of a new object. Metric learning has been proposed as a preprocessing step for many of these approaches.[12]

Scalability

Metric and similarity learning naively scale quadratically with the dimension of the input space, as can easily be seen when the learned metric has a bilinear form f_W(x, z) = x^T W z. Scaling to higher dimensions can be achieved by enforcing a sparsity structure over the matrix model, as done with HDSL[13] and with COMET.[14]
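The quadratic growth is easy to quantify: a dense d x d matrix W has d^2 parameters, whereas a factorization W = L^T L with L of shape (e, d), as in the Metric learning section, needs only e * d. The dimensions below are purely illustrative (sparsity-based methods such as HDSL and COMET reduce the parameter count differently, by constraining the entries of W):

```python
# Parameter counts for a dense bilinear model versus a low-rank
# factorization W = L^T L. Dimensions are illustrative only.
d = 10_000  # input dimension
e = 100     # projection (rank) dimension

full_params = d * d      # dense W: quadratic in d
low_rank_params = e * d  # factor L: linear in d for fixed e

ratio = full_params // low_rank_params
```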

Further reading

For further information on this topic, see the surveys on metric and similarity learning by Bellet et al.[15] and Kulis.[16]

See also

Latent semantic analysis

References

  1. Barkan, O.; Koenigstein, N. (2016). "Item2Vec: Neural Item Embedding for Collaborative Filtering". arXiv:1603.04259.
  2. Bewley, A.; Ott, L.; Ramos, F.; Upcroft, B. (2016). "ALExTRAC: Affinity Learning by Exploring Temporal Reinforcement within Association Chains". In Proceedings of the IEEE International Conference on Robotics and Automation 2016.
  3. Barkan, O.; Weill, J.; Wolf, L.; Aronowitz, H. "Fast high dimensional vector multiplication face recognition". In Proceedings of the IEEE International Conference on Computer Vision 2013 (pp. 1960–1967).
  4. Barkan, O.; Aronowitz, H. "Diffusion maps for PLDA-based speaker verification". In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013 (pp. 7639–7643).
  5. Chechik, G.; Sharma, V.; Shalit, U.; Bengio, S. (2010). "Large Scale Online Learning of Image Similarity Through Ranking" (PDF). Journal of Machine Learning research 11: 1109–1135.
  6. Gionis, Aristides, Piotr Indyk, and Rajeev Motwani. "Similarity search in high dimensions via hashing." VLDB. Vol. 99. No. 6. 1999.
  7. Rajaraman, A.; Ullman, J. (2010). "Mining of Massive Datasets, Ch. 3.".
  8. Weinberger, K. Q.; Blitzer, J. C.; Saul, L. K. (2006). "Distance Metric Learning for Large Margin Nearest Neighbor Classification" (PDF). Advances in Neural Information Processing Systems 18: 1473–1480.
  9. Davis, J. V.; Kulis, B.; Jain, P.; Sra, S.; Dhillon, I. S. (2007). "Information-theoretic metric learning". International conference in machine learning (ICML): 209–216.
  10. Guillaumin, M.; Verbeek, J.; Schmid, C. (2009). "Is that you? Metric learning approaches for face identification" (PDF). IEEE International Conference on Computer Vision (ICCV).
  11. Mignon, A.; Jurie, F. (2012). "PCCA: A new approach for distance learning from sparse pairwise constraints" (PDF). IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  12. Xing, E. P.; Ng, A. Y.; Jordan, M. I.; Russell, S. (2002). "Distance Metric Learning, with Application to Clustering with Side-information". Advances in Neural Information Processing Systems (MIT Press) 15: 505–512.
  13. Liu; Bellet; Sha (2015). "Similarity Learning for High-Dimensional Sparse Data" (PDF). International Conference on Artificial Intelligence and Statistics (AISTATS).
  14. Atzmon; Shalit; Chechik (2015). "Learning Sparse Metrics, One Feature at a Time" (PDF). J. Mach. Learn. Research (JMLR).
  15. Bellet, A.; Habrard, A.; Sebban, M. (2013). "A Survey on Metric Learning for Feature Vectors and Structured Data". arXiv:1306.6709 [cs.LG].
  16. Kulis, B. (2012). "Metric Learning: A Survey" (PDF). Foundations and Trends in Machine Learning.
This article is issued from Wikipedia (version of Monday, April 18, 2016). The text is available under the Creative Commons Attribution/Share Alike license; additional terms may apply for the media files.