Automated species identification
Automated species identification is a method of making the expertise of taxonomists available to ecologists, parataxonomists and others via computers, mobile devices and other digital technology.
Introduction
The automated identification of biological objects such as insects (individuals) and/or groups (e.g., species, guilds, characters) has been a dream among systematists for centuries. The goal of some of the first multivariate biometric methods was to address the perennial problem of group discrimination and inter-group characterization. Despite much preliminary work in the 1950s and '60s, progress in designing and implementing practical systems for fully automated object biological identification has proven frustratingly slow. As recently as 2004 Dan Janzen [1] updated the dream for a new audience:
The spaceship lands. He steps out. He points it around. It says ‘friendly–unfriendly—edible–poisonous—safe– dangerous—living–inanimate’. On the next sweep it says ‘Quercus oleoides—Homo sapiens—Spondias mombin—Solanum nigrum—Crotalus durissus—Morpho peleides— serpentine’. This has been in my head since reading science fiction in ninth grade half a century ago.
The species identification problem
Janzen’s preferred solution to this classic problem involved building machines to identify species from their DNA. His predicted budget and proposed research team is “US$1 million and five bright people.” However, recent developments in computer architectures, as well as innovations in software design, have placed the tools needed to realize Janzen’s vision in the hands of the systematics community not in several years hence, but now; and not just for DNA barcodes, but for digital images of organisms too. A recent survey of results accuracy results for small-scale trials (<50 taxa) obtained by such systems shows an average reproducible accuracy of over 85 percent with no significant correlation between accuracy and the number of included taxa or the type of group being assessed (e.g., butterflies, moths, bees, pollen, spores, foraminifera, dinoflagellates, vertebrates).[2] Moreover, these identifications, often involving thousands of individual specimens, can be made in a fraction of the time required by human experts and can be done on site, on demand, anywhere in the world.
These developments could not have come at a better time. As the taxonomic community already knows, the world is running out of specialists who can identify the very biodiversity whose preservation has become a global concern. In commenting on this problem in palaeontology as long ago as 1993, Roger Kaesler [3]
recognized:
“… we are running out of systematic palaeontologists who have anything approaching synoptic knowledge of a major group of organisms … Palaeontologists of the next century are unlikely to have the luxury of dealing at length with taxonomic problems … Palaeontology will have to sustain its level of excitement without the aid of systematists, who have contributed so much to its success.”
.
This expertise deficiency cuts as deeply into those commercial industries that rely on accurate identifications (e.g., agriculture, biostratigraphy) as it does into a wide range of pure and applied research programmes (e.g., conservation, biological oceanography, climatology, ecology). It is also commonly, though informally, acknowledged that the technical, taxonomic literature of all organismal groups is littered with examples of inconsistent and incorrect identifications. This is due to a variety of factors, including taxonomists being insufficiently trained and skilled in making identifications (e.g., using different rules-of-thumb in recognizing the boundaries between similar groups), insufficiently detailed original group descriptions and/or illustrations, inadequate access to current monographs and well-curated collections and, of course, taxonomists having different opinions regarding group concepts. Peer review only weeds out the most obvious errors of commission or omission in this area, and then only when an author provides adequate representations (e.g., illustrations, recordings, and gene sequences) of the specimens in question.
Systematics too has much to gain, both practically and theoretically, from the further development and use of automated identification systems. It is now widely recognized that the days of systematics as a field populated by mildly eccentric individuals pursuing knowledge in splendid isolation from funding priorities and economic imperatives are rapidly drawing to a close. In order to attract both personnel and resources, systematics must transform itself into a “large, coordinated, international scientific enterprise” [4] Many have identified use of the Internet— especially via the World Wide Web — as the medium through which this transformation can be made. While establishment of a virtual, GenBank-like system for accessing morphological data, audio clips, video files and so forth would be a significant step in the right direction, improved access to observational information and/or text-based descriptions alone will not address either the taxonomic impediment or low identification reproducibility issues successfully. Instead, the inevitable subjectivity associated with making critical decisions on the basis of qualitative criteria must be reduced or, at the very least, embedded within a more formally analytic context.
Properly designed, flexible, and robust, automated identification systems, organized around distributed computing architectures and referenced to authoritatively identified collections of training set data (e.g., images, and gene sequences) can, in principle, provide all systematists with access to the electronic data archives and the necessary analytic tools to handle routine identifications of common taxa. Properly designed systems can also recognize when their algorithms cannot make a reliable identification and refer that image to a specialist (whose address can be accessed from another database). Such systems can also include elements of artificial intelligence and so improve their performance the more they are used. Most tantalizingly, once morphological (or molecular) models of a species have been developed and demonstrated to be accurate, these models can be queried to determine which aspects of the observed patterns of variation and variation limits are being used to achieve the identification, thus opening the way for the discovery of new and (potentially) more reliable taxonomic characters.
Implementations
- Leaf Snap is an iOS app developed by the Smithsonian Institution that uses visual recognition software to identify North American tree species from photographs of leaves.
References cited
- ↑ Janzen, Daniel H. (March 22, 2004). "Now is the time.". Philosophical Transactions of the Royal Society of London. B 359: 731–732. doi:10.1098/rstb.2003.1444. PMC 1693358. PMID 15253359.
- ↑ Gaston, Kevin J.; O'Neill, Mark A. (March 22, 2004). "Automated species recognition: why not?". Philosophical Transactions of the Royal Society of London. B 359: 655–667. doi:10.1098/rstb.2003.1442. PMC 1693351. PMID 15253351.
- ↑ Kaesler, Roger L (1993). "A window of opportunity: peering into a new century of palaeontology". Journal of Palaeontology 67 (3): 329–333. JSTOR 1306022.
- ↑ Wheeler, Quentin D. (2003). "Transforming taxonomy" (PDF) (22). The Systematist: 3–5.
External links
Here are some links to the home pages of species identification systems. The SPIDA and DAISY system are essentially generic and capable of classifying any image material presented. The ABIS and DrawWing system are restricted to insects with membranous wings as they operate by matching a specific set of characters based on wing venation.