By Paul Billings

The improvement in efficiency of microprocessors has led to incredible changes in our lives. The amount of information we have at our fingertips and the new ways we can communicate with others in a multitude of environments have been made possible in large part by improvements in these tiny electronic components. Their evolution inspired the formulation of Moore's Law, named for the Intel co-founder Gordon Moore, who observed that microprocessor chip capabilities were doubling in power and productivity approximately every two years.

This remarkable, transformative technical accomplishment may soon be eclipsed in some respects by the changes occurring in nucleic acid sequencing. Until the advent of the Human Genome Project, carried out over the decade of the 1990s, the field of DNA sequencing was dominated by the relatively slow, laborious but highly accurate methods developed by Fred Sanger and Walter Gilbert. In fact, these methods still dominate key areas of sequencing application today. But when faced with the daunting task of analyzing and assembling the content of 24 unique human chromosomes comprising roughly 3 billion individual DNA bases (the approximate size of a haploid human genome), researchers decided that approaches termed "shotgun" sequencing were required in order to complete the project. The genome was fragmented into small pieces, those fragments were copied faithfully, and this library of copies was decoded using tagged nucleotides. The decoding occurred as part of normal DNA synthesis, and as bases were incorporated they were identified by light-sensitive cameras. The deduced fragment sequences were then mapped and reconstructed by bioinformatic computational tools into the original whole genome sequence. With the invention of this new form of sequencing, called next generation sequencing, a rough draft of the human genome was prepared in about 10 years at a cost of roughly $3 billion.

Within the next few months, a new sequencing method that does not rely on photographic detection of base incorporation in templated DNA libraries will deliver a whole genome sequence with accuracy similar to or better than what the Human Genome Project provided. This sequencing will be done in about a day for a cost of less than $1,000. Using lessons learned from the microprocessor industry, nucleic acid sequencing is turning to sequencing chips, and the productivity of this innovation is surpassing the pace predicted by Moore's Law: the output of these chips has been doubling roughly every six months!
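To see how much faster a six-month doubling time is than Moore's Law, a quick back-of-the-envelope calculation (using only the two doubling periods cited above) is instructive:

```python
# Compare growth under Moore's Law (doubling roughly every 24 months)
# with the six-month doubling attributed to sequencing chips.

def fold_increase(months: float, doubling_period_months: float) -> float:
    """Return the multiplicative growth after `months` of steady doubling."""
    return 2 ** (months / doubling_period_months)

years = 4
months = years * 12
moore = fold_increase(months, 24)  # 2 doublings  -> 4x
chips = fold_increase(months, 6)   # 8 doublings  -> 256x
print(f"after {years} years: Moore's Law ~{moore:.0f}x, sequencing chips ~{chips:.0f}x")
```

Over the same four years, two doublings yield a 4-fold gain while eight doublings yield 256-fold, which is why chip-based sequencing output so quickly outran the cost curve of the camera-based methods.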

This incredible speed improvement and cost reduction opens up a whole array of potential biological and medical applications. Researchers across the globe can afford to analyze DNA or RNA (the methods can be used for all types of nucleic acid analysis and are highly quantitative) in a broad array of experimental systems using a desktop sequencer purchasable with a low capital expenditure. Entrants in clinical trials can easily be sequenced before commencing protocols, or in post hoc analyses, in order to better tailor therapies to specific clinical cases. This development will likely speed more personalized or individualized care and reduce the burden of adverse drug reactions. Rapid analysis of infectious agents or disease states may identify causative agents more rapidly and lead to more precise (and less wasteful) clinical care. The overall goals of this new type of rapid measurement should be to expand knowledge in related sciences while addressing, in a high quality manner, meaningful unmet needs in human health, and reducing costs and eliminating waste in clinical delivery.

The new methods can be applied to the analysis of amplicons (copied strings of nucleic acids) or complete genes, depending on their length. They can be focused on panels of gene regions that are known to vary in important ways (for instance, genetic "hotspots" important in cancer care), on all of a cell's protein-coding regions (the exome), or on whole genomes, the gene "home" in our cells. There are at least two genomes in most human cells: one resides in the nucleus and the other in the mitochondria. There appears to be a sharing of information between our two cellular repositories of DNA information.

As noted, sensitive and quantitative analyses of RNA are also possible with next generation sequencing methods. Variation in mRNA level is now used in research and in some clinical settings. For instance, in breast and lung cancer treatment planning, analysis of mRNA patterns appears to allow better predictions of which patients can forego sometimes toxic continued treatments for their cancers. Next generation sequencing methods may make these analyses even more precise, comprehensive or less costly.

The older methods of sequencing analyzed strings of nucleic acids by first marking them with tags and then terminating the growing chains one nucleotide at a time ("chain termination" methods). Next generation sequencing methods use natural enzymes that synthesize or link nucleic acids (polymerases or ligases) and note the incorporation of known nucleotides. The most common form of next generation sequencing incorporates fluorescent DNA bases recorded by a camera. The microprocessor-based method simply monitors the pH (acidity) around the artificial synthesis of DNA library fragments; the chips function as miniature pH meters. When the correct base is incorporated, a hydrogen ion is released and the microprocessor registers this event.
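The chip-based detection scheme can be illustrated with a toy sketch (not any vendor's actual algorithm): nucleotides are flowed over the chip one type at a time, and the recorded signal is proportional to how many bases were incorporated in that flow, since each incorporation releases a hydrogen ion. The flow order and run-length encoding here are illustrative assumptions.

```python
# Toy model of ion/semiconductor base calling: each flow of one nucleotide
# type produces a signal equal to the number of bases incorporated,
# i.e. the length of the matching homopolymer run in the template.

def simulate_flows(template, flow_order="TACG", cycles=8):
    """Return (base, signal) pairs for each nucleotide flow over a template."""
    signals = []
    pos = 0
    for _ in range(cycles):
        for base in flow_order:
            count = 0
            # All consecutive matching bases incorporate in a single flow.
            while pos < len(template) and template[pos] == base:
                count += 1
                pos += 1
            signals.append((base, count))
    return signals

def call_bases(signals):
    """Reconstruct the sequence from the per-flow signals."""
    return "".join(base * count for base, count in signals)

seq = "TTAGGC"
assert call_bases(simulate_flows(seq)) == seq
```

The sketch also shows why long homopolymers (discussed below) are a weak spot for these methods: the instrument must distinguish a signal of, say, 7 incorporations from 8, which is far harder than distinguishing 0 from 1.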

In the future, single molecule strings of nucleic acids may be passed through pores, one base at a time (but very rapidly) and analyzed (nanopore methods). A variety of other sensitive single molecule approaches have also been proposed. The sensitivity of these methods may improve the detection of minor nucleic acid species in mixtures within biological fluids. For instance, the blood of a pregnant woman contains a relatively small number of molecules derived from the fetus during gestation. Reviewing and counting them may help in the monitoring of fetal health. Cancers, as they grow and spread, shed DNA and RNA into a blood-borne sea of normal nucleic acid strings and fragments. Analyzing them may provide important cancer biology data.

There are many factors that impact the accuracy and utility of all sequencing techniques. First, can the target sequence be reliably purified and prepared for the analytic approach? Some areas of the genome, for biochemical and structural reasons, are hard to assess. Second, can the targeted fragment be copied and modified into libraries for processing? Third, is the sequence to be analyzed one that the chosen method can read accurately? Long strings of the same DNA base (known as homopolymers) often confound next generation methods and do exist in parts of the human genome. Finally, the bioinformatic transformation of input from the sequencer into imputed strings of DNA bases, and the calling of base changes (mutations), can vary. If a method produces accurate and long strings of output, the computational transformations of raw data will generally be more accurate.

At present, no one method of next generation sequencing allows for all the DNA of the human genome to be fully analyzed. Around 5 to 10% of the genome seems relatively inaccessible for the reasons noted above. In addition, even if the method applied in a laboratory is 99.99% accurate, a level that far exceeds most clinically applied measurements now, this would appear to be inadequate for clinical genome work. Consider that genome sequencing has at least 3 × 10^9 targets. An error rate of 0.01% (99.99% accuracy) would produce roughly 3 × 10^5 inaccurate calls and potentially false results. That would be unacceptable in many applications. The use of highly redundant applications of next generation sequencing methods, or of two differing methods on the same specimen, may be ways of improving the accuracy of newer DNA methods and limiting errors.
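The arithmetic above, and the payoff from redundancy, can be made explicit. The calculation below is a simple sketch that assumes errors in two independent measurements are uncorrelated, which real sequencing errors are not always:

```python
# Back-of-the-envelope error arithmetic for whole-genome sequencing,
# using the figures in the text: ~3e9 bases, 99.99% per-base accuracy.

genome_size = 3e9        # approximate haploid human genome, in bases
per_base_error = 1e-4    # 0.01% error rate (99.99% accuracy)

# Expected miscalls from a single pass over the genome.
single_pass_errors = genome_size * per_base_error
print(f"single pass: ~{single_pass_errors:.0e} erroneous calls")

# If a variant is only reported when two independent methods (or
# redundant reads) agree, and their errors are uncorrelated, the chance
# that both miscall the same base is roughly the product of the rates.
concordant_errors = genome_size * per_base_error ** 2
print(f"two concordant methods: ~{concordant_errors:.0f} erroneous calls")
```

Under these assumptions, requiring concordance drops the expected miscalls from hundreds of thousands to a few dozen, which is the quantitative case for the redundant or dual-method strategies mentioned above.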

In this issue of GeneWatch, authors explore how next generation sequencing and other views of the genome may alter our lives and environments. It is certain that sequencing of genomes by methods including those described as "next generation" will not provide all the answers to our biology or address all our unmet clinical needs, but the use of these methods and the resulting data will be a relatively quantifiable and reliable component of ongoing approaches to medical understanding and care. There is hope that we will be building an analytic system upon a rapidly improving series of methods with the characteristics of next generation sequencing. "Art" has dominated many measures of human variation so far. The evolution of next generation sequencing promises the application of more "science" to important issues, likely a salubrious change.

Paul Billings, MD, PhD, is Vice Chair of the Board of Directors of the Council for Responsible Genetics and Chief Medical Officer of Life Technologies, Corp. This article represents Dr. Billings' own views rather than those of Life Technologies.
