Unsupervised learning

Machine learning and data mining

Problems Classification Clustering Regression Anomaly detection Association rules Reinforcement learning Structured prediction Feature engineering Feature learning Online learning Semi-supervised learning Unsupervised learning Learning to rank Grammar induction
Supervised learning (classification • regression) Decision trees Ensembles (Bagging, Boosting, Random forest) k-NN Linear regression Naive Bayes Neural networks Logistic regression Perceptron Relevance vector machine (RVM) Support vector machine (SVM)
Clustering BIRCH Hierarchical k-means Expectation-maximization (EM) DBSCAN OPTICS Mean-shift
Dimensionality reduction Factor analysis CCA ICA LDA NMF PCA t-SNE
Structured prediction Graphical models (Bayes net, CRF, HMM)
Anomaly detection k-NN Local outlier factor
Neural nets Autoencoder Deep learning Multilayer perceptron RNN Restricted Boltzmann machine SOM Convolutional neural network
Theory Bias-variance dilemma Computational learning theory Empirical risk minimization Occam learning PAC learning Statistical learning VC theory
Machine learning venues NIPS ICML JMLR ArXiv:cs.LG
Machine learning portal

Unsupervised learning is the machine learning task of inferring a function to describe hidden structure from unlabeled data. Since the examples given to the learner are unlabeled, there is no error or reward signal to evaluate a potential solution. This distinguishes unsupervised learning from supervised learning and reinforcement learning.

Unsupervised learning is closely related to the problem of density estimation in statistics.^[1] However unsupervised learning also encompasses many other techniques that seek to summarize and explain key features of the data. Many methods employed in unsupervised learning are based on data mining methods used to preprocess data.

Approaches to unsupervised learning include:

clustering (e.g., k-means, mixture models, hierarchical clustering),^[2]
Approaches for learning latent variable models such as
- Expectation–maximization algorithm (EM)
- Method of moments
- Blind signal separation techniques, e.g.,

Among neural network models, the self-organizing map (SOM) and adaptive resonance theory (ART) are commonly used unsupervised learning algorithms. The SOM is a topographic organization in which nearby locations in the map represent inputs with similar properties. The ART model allows the number of clusters to vary with problem size and lets the user control the degree of similarity between members of the same clusters by means of a user-defined constant called the vigilance parameter. ART networks are also used for many pattern recognition tasks, such as automatic target recognition and seismic signal processing. The first version of ART was "ART1", developed by Carpenter and Grossberg (1988).^[4]

Method of moments

One of the approaches in unsupervised learning is the method of moments. In the method of moments, the unknown parameters (of interest) in the model are related to the moments of one or more random variables, and thus, these unknown parameters can be estimated given the moments. The moments are usually estimated from samples in an empirical way. The basic moments are first and second order moments. For a random vector, the first order moment is the mean vector, and the second order moment is the covariance matrix (when the mean is zero). Higher order moments are usually represented using tensors which are the generalization of matrices to higher orders as multi-dimensional arrays.

In particular, the method of moments is shown to be effective in learning the parameters of latent variable models.^[5] Latent variable models are statistical models where in addition to the observed variables, a set of latent variables also exists which is not observed. A highly practical example of latent variable models in machine learning is the topic modeling which is a statistical model for generating the words (observed variables) in the document based on the topic (latent variable) of the document. In the topic modeling, the words in the document are generated according to different statistical parameters when the topic of the document is changed. It is shown that method of moments (tensor decomposition techniques) consistently recover the parameters of a large class of latent variable models under some assumptions.^[5]

The Expectation–maximization algorithm (EM) is also one of the most practical methods for learning latent variable models. However, it can get stuck in local optima, and it is not guaranteed that the algorithm will converge to the true unknown parameters of the model. Alternatively, for the method of moments, the global convergence is guaranteed under some conditions.^[5]

Notes

↑ Jordan, Michael I.; Bishop, Christopher M. (2004). "Neural Networks". In Allen B. Tucker. Computer Science Handbook, Second Edition (Section VII: Intelligent Systems). Boca Raton, FL: Chapman & Hall/CRC Press LLC. ISBN 1-58488-360-X.
↑ Hastie,Trevor,Robert Tibshirani, Friedman,Jerome (2009). The Elements of Statistical Learning: Data mining,Inference,and Prediction. New York: Springer. pp. 485–586. ISBN 978-0-387-84857-0.
↑ Acharyya, Ranjan (2008); A New Approach for Blind Source Separation of Convolutive Sources, ISBN 978-3-639-07797-1 (this book focuses on unsupervised learning with Blind Source Separation)
↑ Carpenter, G.A. and Grossberg, S. (1988). "The ART of adaptive pattern recognition by a self-organizing neural network" (PDF). Computer 21: 77–88. doi:10.1109/2.33.
1 2 3 Anandkumar, Animashree; Ge, Rong; Hsu, Daniel; Kakade, Sham; Telgarsky, Matus (2014). "Tensor Decompositions for Learning Latent Variable Models" (PDF). Journal of Machine Learning Research (JMLR) 15: 2773−2832.

Unsupervised learning

Method of moments

See also

Notes

Further reading