Multiple-instance learning
In machine learning, multiple-instance learning (MIL) is a variation on supervised learning. Instead of receiving a set of individually labeled instances, the learner receives a set of labeled bags, each containing many instances. In the simple case of multiple-instance binary classification, a bag is labeled negative if all the instances in it are negative, and positive if at least one instance in it is positive. From a collection of labeled bags, the learner tries either (i) to induce a concept that will label individual instances correctly or (ii) to learn how to label bags without inducing the concept.
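The labeling rule for the binary case can be sketched in a few lines of Python. This is only an illustration: the function name `bag_label` and the encoding of instance labels as booleans are assumptions made for the example. In a real MIL problem the instance labels are hidden from the learner, so the function shows how a bag label is defined, not how it is learned.

```python
from typing import Iterable


def bag_label(instance_labels: Iterable[bool]) -> bool:
    """Return the bag label under the standard MIL assumption.

    A bag is positive if at least one instance is positive,
    and negative only if every instance is negative.
    """
    return any(instance_labels)


# Hypothetical bags, written as lists of (normally hidden) instance labels.
bags = {
    "bag_a": [False, False, False],  # all instances negative -> negative bag
    "bag_b": [False, True, False],   # one positive instance -> positive bag
}
labels = {name: bag_label(instances) for name, instances in bags.items()}
```

The asymmetry is the point of the sketch: a negative bag label reveals every instance label in the bag, while a positive bag label leaves open which instances are responsible.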
Image classification provides an example, following Amores (2013). Given an image, we want to know its target class based on its visual content. For instance, the target class might be "beach", where the image contains both "sand" and "water". In MIL terms, the image is described as a bag $X = \{X_1, \dots, X_N\}$, where each $X_i$ is the feature vector (called an instance) extracted from the corresponding $i$-th region in the image and $N$ is the total number of regions (instances) partitioning the image. The bag is labeled positive ("beach") if it contains both "sand" region instances and "water" region instances.
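The "beach" rule can be sketched as below. To keep the example short, each region is represented directly by a concept name instead of a feature vector, as if some hypothetical region-level classifier had already been applied. The names `REQUIRED_CONCEPTS` and `is_beach` are illustrative and do not come from Amores (2013).

```python
from typing import FrozenSet, Sequence

# Concepts that must each occur in at least one region of the image
# (assumption taken from the "beach" example in the text).
REQUIRED_CONCEPTS: FrozenSet[str] = frozenset({"sand", "water"})


def is_beach(region_concepts: Sequence[str]) -> bool:
    """Label a bag of regions positive if every required concept appears in it."""
    return REQUIRED_CONCEPTS.issubset(region_concepts)


# Hypothetical images, each a bag of per-region concept names.
image_1 = ["sky", "water", "sand"]  # contains both required concepts
image_2 = ["sky", "water", "rock"]  # "sand" is missing
results = [is_beach(image_1), is_beach(image_2)]
```

Unlike the "at least one positive instance" rule, this bag label depends on two different kinds of instance being present together, so a single region cannot make the bag positive by itself.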
Multiple-instance learning was originally proposed under this name by Dietterich, Lathrop & Lozano-Pérez (1997), but earlier examples of similar research exist, for instance in the work on handwritten digit recognition by Keeler, Rumelhart & Leow (1990). Reviews of the MIL literature include Amores (2013), which provides an extensive review and comparative study of the different paradigms, and Foulds & Frank (2010), which provides a thorough review of the assumptions those paradigms rely on.
Applications of MIL include:
- Molecule activity
- Predicting binding sites of Calmodulin binding proteins [1]
- Predicting function for alternatively spliced isoforms Li, Menon et al. (2014), Eksi et al. (2013)
- Image classification Maron & Ratan (1998)
- Text or document categorization Kotzias et al. (2015)
- Predicting functional binding sites of microRNA targets Bandyopadhyay, Ghosh et al. (2015)
Numerous researchers have worked on adapting classical classification techniques, such as support vector machines or boosting, to work within the context of multiple-instance learning.
References
- [1] Minhas, Fayyaz (2012), "Multiple instance learning of Calmodulin binding sites", Bioinformatics 28 (18): i416–i422, doi:10.1093/bioinformatics/bts416.
- Dietterich, Thomas G.; Lathrop, Richard H.; Lozano-Pérez, Tomás (1997), "Solving the multiple instance problem with axis-parallel rectangles", Artificial Intelligence 89 (1–2): 31–71, doi:10.1016/S0004-3702(96)00034-3.
- Amores, Jaume (2013), "Multiple instance classification: Review, taxonomy and comparative study", Artificial Intelligence 201: 81–105, doi:10.1016/j.artint.2013.06.003.
- Foulds, James; Frank, Eibe (2010), "A Review of Multi-Instance Learning Assumptions", Knowledge Engineering Review 25 (1): 1–25, doi:10.1017/S026988890999035X.
- Keeler, James D.; Rumelhart, David E.; Leow, Wee-Kheng (1990), "Integrated segmentation and recognition of hand-printed numerals", Proceedings of the 1990 Conference on Advances in Neural Information Processing Systems (NIPS 3), pp. 557–563.
- Li, H.D.; Menon, R.; et al. (2014), "The emerging era of genomic data integration for analyzing splice isoform function", Trends in Genetics 30: 340–347, doi:10.1016/j.tig.2014.05.005, PMID 24951248.
- Eksi, R.; Li, H.D.; Menon, R.; et al. (2013), "Systematically differentiating functions for alternatively spliced isoforms through integrating RNA-seq data", PLoS Computational Biology 9 (11): e1003314, doi:10.1371/journal.pcbi.1003314, PMC 3820534, PMID 24244129.
- Maron, O.; Ratan, A.L. (1998), "Multiple-instance learning for natural scene classification", Proceedings of the Fifteenth International Conference on Machine Learning, pp. 341–349.
- Kotzias, Dimitrios; Denil, Misha; De Freitas, Nando; Smyth, Padhraic (2015), "From Group to Instance Labels, using Deep Features", Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 597–606, doi:10.1145/2783258.2783380.
- Ray, Soumya; Page, David (2001), "Multiple instance regression", ICML.
- Bandyopadhyay, S.; Ghosh, D.; Mitra, R.; Zhao, Z. (2015), "MBSTAR: multiple instance learning for predicting specific functional binding sites in microRNA targets", Scientific Reports 5: 8004, doi:10.1038/srep08004, PMID 25614300.