Advances in DNA sequencing and genomics promise not only to enhance human health and human-based biology but also to open doors for characterizing the biodiversity of our planet. Two major areas of modern biodiversity studies will benefit directly from the advancing technology. The first, and perhaps the most important, is the simple cataloguing of diversity on our planet; here genomic data will serve as an informatic anchor for organizing the biology of a species. The second is the use of genome-level information to create a "Tree of Life" that could serve as a foundation for all of biological science.
Officially, there are 1.7 million species of organisms on this planet. By "officially" I mean "named". A named species is important because it has been recognized as a species by experts in an area of organismal diversity (such as botany, zoology or mycology) using the methods outlined by Carl von Linné over 250 years ago. If we take just these 1.7 million named species, then arthropods (insects, crustaceans, spiders, etc.) are the most speciose group of living things on the planet, bearing out the remark attributed to the famous geneticist J.B.S. Haldane that God has "an inordinate fondness for beetles". Of those same 1.7 million species, only 6,000 are bacteria. We know from several studies, however, that this bacterial count is off by at least three, perhaps four, orders of magnitude, meaning that there are more than likely tens of millions of microbial species we have yet to discover. By contrast, far fewer vertebrate species remain to be discovered. So if we looked at the diversity of organisms based on "true" numbers, the overwhelming winner would be bacteria, and God would instead have an inordinate fondness for microbes. When we add that 99.9% of the species that have ever lived on this planet have gone extinct, the immensity of this diversity should be more than evident.
To demonstrate the utility of a DNA-based approach to classifying and discovering diversity, I want to mention an initiative called the Consortium for the Barcode of Life (CBoL). For the past decade this initiative has quietly been churning away at obtaining the sequence of a short, 600 base pair reference region of the mitochondrial genome. This project, while seemingly simple in its design, is an important one, mostly because it will gather together tissues, taxonomic data, biogeographic data and other data specific to the 1.7 million named species on this planet. The DNA sequences themselves can serve as identifiers for future biological, forensic and conservation research. Many initiatives have striven toward a centralized repository for the biodiversity of this planet and have failed. CBoL's major success is not the progress it has made (close to 1,500,000 reference sequences in its databases, covering about 150,000 named species), nor the usefulness of the DNA barcode sequences themselves, but its demonstration that the infrastructure for such an initiative is both possible and necessary if the data are to be useful.
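The idea of a sequence serving as an identifier can be made concrete with a minimal sketch. It is not CBoL's actual pipeline: the species names, the 20-base toy sequences, the function names and the divergence threshold are all hypothetical, and real barcodes would first need to be aligned. The sketch simply assigns a query to the reference it differs from least, and declines to name it when even the best match is too distant.

```python
# Toy reference library: hypothetical species names and made-up 20-base
# sequences standing in for real, much longer barcode regions.
REFERENCES = {
    "Species A": "ATGGCATTCCGATTAGGCTA",
    "Species B": "ATGGCGTTCCGTTTAGGATA",
    "Species C": "TTGACATTGCGATCAGGCTT",
}


def divergence(a: str, b: str) -> float:
    """Fraction of positions at which two pre-aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def identify(query: str, references: dict, max_divergence: float = 0.1):
    """Return (best_name, divergence); best_name is None if no close match.

    max_divergence is an illustrative cutoff, not an established standard.
    """
    best_name = min(references, key=lambda name: divergence(query, references[name]))
    best = divergence(query, references[best_name])
    if best > max_divergence:
        return None, best
    return best_name, best


# One base differs from "Species A", so the query is assigned to it.
name, d = identify("ATGGCATTCCGATTAGGCTT", REFERENCES)
print(name, d)

# A sequence unlike anything in the library is left unnamed.
unknown, _ = identify("GCCTTAGGAACCTTGGAACC", REFERENCES)
print(unknown)
```

A query left unnamed in this way is the interesting case for discovery: it flags a specimen that may belong to a species not yet in the reference library.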
The advances in genetic technology also mean that any specimen collected or used by a scientist can, and more than likely will, have its genome sequenced. In the future we will be able to use billions of base pairs as a DNA barcode. Indeed, it is one of my goals as a museum scientist to see every specimen accessioned into our collection at the American Museum of Natural History (AMNH) have its genome sequenced as part of the accessioning process. My colleagues at the AMNH might think me crazy, but this goal will most likely be a reality within two decades. And in many cases, as with bacteria, archaea and some fungi, DNA sequences are all we have to recognize new species. As an example, an entirely new phylum of fungi was recently discovered as a direct result of this advancing technology.
It's one thing to find, name, store and catalogue species, as most museum scientists do. It's another to figure out how species are related to each other, and this is the purview of a subdiscipline of biology called systematics. For the past decade the National Science Foundation (NSF) has supported and promoted a large multi-institutional project called the Tree of Life (ToL). This project hopes to construct THE branching diagram for all of the 1.7 million named species. This is a daunting task that will be made simpler by the ability to sequence whole genomes quickly and cheaply. While a DNA barcode system may not be in the cards, the Tree of Life will become a reality as a result of the influx of whole genome sequences. And the Tree of Life can serve as a cornerstone for modern biology. Why? Because a branching diagram is a very efficient way to store information: anything recorded at a branch point applies to every species that descends from it. Couple this storage capability with the idea that the branching order reflects evolutionary history, and in the near future we are in for some bizarre but, overall, pleasant surprises about the life on this planet other than ourselves.
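Why a branching diagram stores information efficiently can be illustrated with a minimal sketch. The `Node` class, the `traits_of` function and the tiny tree are hypothetical illustrations and have nothing to do with the ToL project's own data structures. Each trait is recorded once, at the branch where it applies, and a species' full description is recovered by walking down from the root.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One branch point or tip; traits are stored only where they first apply."""
    name: str
    traits: set = field(default_factory=set)
    children: list = field(default_factory=list)


def traits_of(root: Node, target: str, inherited=frozenset()):
    """Collect every trait on the path from the root down to `target`.

    Returns None if `target` is not in the tree.
    """
    current = inherited | root.traits
    if root.name == target:
        return set(current)
    for child in root.children:
        found = traits_of(child, target, current)
        if found is not None:
            return found
    return None


# A deliberately tiny, illustrative tree.
tree = Node("vertebrates", {"backbone"}, [
    Node("mammals", {"hair", "milk"}, [Node("human"), Node("mouse")]),
    Node("birds", {"feathers"}, [Node("chicken")]),
])

# "backbone" is stored once at the root, yet every species inherits it.
print(sorted(traits_of(tree, "mouse")))
print(sorted(traits_of(tree, "chicken")))
```

The tree holds four trait entries in total, whereas listing the traits separately for each of the three species would take eight; the saving grows with the size of the tree.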
Rob DeSalle, PhD, is a curator in the American Museum of Natural History's Division of Invertebrate Zoology, co-director of its molecular laboratories, and a member of CRG's Board of Directors.